parallelvirtualcluster/pvc

Author	SHA1	Message	Date
Joshua M. Boniface	4c0d90b517	Add read lock timeouts to prevent deadlocks	2024-10-10 15:19:05 -04:00
Joshua M. Boniface	9714ac20b2	Update formatting for Black 24.4.0	2024-04-19 10:26:06 -04:00
Joshua M. Boniface	4c6aabec6a	Fix bug if d_network changes	2024-04-05 14:05:51 -04:00
Joshua M. Boniface	09269f182c	Add live migrate max downtime selector meta field. Adds a new flag to VM metadata to allow setting the VM live migration max downtime. This will enable very busy VMs that hang live migration to have this value changed.	2024-01-11 00:05:50 -05:00
Joshua M. Boniface	123c7ce857	Update copyright header on all files for 2024. Last release of 2023 is probably the best time to do this.	2023-12-29 11:16:59 -05:00
Joshua M. Boniface	e654fbba08	Move debug condition handling to Logger. Avoids many dozens of conditionals sprinkled throughout the code by centralizing this check into the main Logger instance.	2023-12-27 13:01:45 -05:00
Joshua M. Boniface	3e4cc53fdd	Add node network statistics and utilization values. Adds a new physical network interface stats parser to the node keepalives, and leverages this information to provide a network utilization overview in the Prometheus metrics.	2023-12-21 15:45:01 -05:00
Joshua M. Boniface	03a738f878	Move config parser into daemon_lib. And reformat/add config values for API.	2023-11-30 00:05:37 -05:00
Joshua M. Boniface	41f4e4fb2f	Split health monitoring into discrete daemon/pkg	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	bcc57638a9	Refactor pvcnoded to use new configuration	2023-11-26 15:41:25 -05:00
Joshua M. Boniface	e818df5dae	Use enable/disable --now instead of two commands. Avoids needing two calls here especially for the stop.	2023-11-16 02:40:35 -05:00
Joshua M. Boniface	c76a5afd04	Avoid waits during node secondary. Waiting for the daemons to stop took too much time on some nodes and could throw off the lockstep. Instead, leverage background=True to run the systemctl os_commands in the background (when they complete is irrelevant), stop the Metadata API first, and don't delay during its stop at all.	2023-11-16 02:34:12 -05:00
Joshua M. Boniface	aef38639cf	Rename pvcapid-worker to pvcworkerd	2023-11-15 20:31:39 -05:00
Joshua M. Boniface	83c4c6633d	Readd RBD lock detection and clearing on startup. This is still needed due to the nature of the locks and freeing them on startup, and to preserve lock=fail behaviour on VM startup. Also fixes the fencing lock flush to directly use the client library outside of Celery. I don't like this hack but it seems prudent until we move fencing to the workers as well.	2023-11-10 01:33:48 -05:00
Joshua M. Boniface	08411708f6	Clean up dangling references to cmd pipes. Also removes the schema references for these CMD pipes as they are no longer required.	2023-11-09 23:28:14 -05:00
Joshua M. Boniface	ce17c60a20	Port OSD on-node tasks to Celery worker system. Adds Celery versions of the osd_add, osd_replace, osd_refresh, osd_remove, and osd_db_vg_add functions.	2023-11-09 23:28:08 -05:00
Joshua M. Boniface	89681d54b9	Port VM on-node tasks to Celery worker system. Adds Celery versions of the flush_locks, device_attach, and device_detach functions.	2023-11-06 20:40:46 -05:00
Joshua M. Boniface	f0c2e9d295	Don't start pvcapid-worker on primary. It will be running anyways	2023-11-05 19:44:00 -05:00
Joshua M. Boniface	7490f13b7c	Check for partition tables on new devices	2023-11-04 03:13:58 -04:00
Joshua M. Boniface	e32054be81	Refactor refresh as well	2023-11-04 02:44:52 -04:00
Joshua M. Boniface	b3d13fe9be	Add log message for zap	2023-11-04 01:02:51 -04:00
Joshua M. Boniface	48b2ccbd95	Add timeout for safe-to-destroy. Continuously take the OSD down and out while doing so.	2023-11-04 00:55:05 -04:00
Joshua M. Boniface	1535078842	Fix lvremove, lvcreate, and update ZK details	2023-11-04 00:30:14 -04:00
Joshua M. Boniface	0e45613634	Use right key with correct data	2023-11-04 00:02:00 -04:00
Joshua M. Boniface	7f5dd385b5	Use right key for FSID elsewhere	2023-11-03 23:51:01 -04:00
Joshua M. Boniface	befce62925	Add OSD destroy before purge	2023-11-03 23:44:27 -04:00
Joshua M. Boniface	b0909aed61	Get proper FSID value	2023-11-03 23:38:24 -04:00
Joshua M. Boniface	f418b40527	Use proper FSID instead of hack	2023-11-03 16:38:19 -04:00
Joshua M. Boniface	dd0177ce10	Rework replacement procedure again. Avoid calling other functions; replicate the actual process from Ceph docs (https://docs.ceph.com/en/pacific/rados/operations/add-or-rm-osds/) to ensure things work out well (e.g. preserving OSD IDs).	2023-11-03 16:31:56 -04:00
Joshua M. Boniface	ed5bc9fb43	Fix numerous formatting and function bugs	2023-11-03 14:00:05 -04:00
Joshua M. Boniface	94d8d2cf75	Fix skip_zap_flag anomaly and add crush rm	2023-11-03 02:35:12 -04:00
Joshua M. Boniface	20497cf89d	Fix bugs and skip safe_to_destroy on force	2023-11-03 02:29:50 -04:00
Joshua M. Boniface	64e37ae963	Update OSD replacement functionality. 1. Simplify this by leveraging the existing remove_osd/add_osd functions, since its task was functionally identical to those two in sequential order. 2. Add support for split OSDs within the command (replacing all OSDs on the block device(s) as required). 3. Add additional configurability and flexibility around the old device, weight, and external DB LVs.	2023-11-03 01:45:49 -04:00
Joshua M. Boniface	3cb8a70f04	Add forcing to OSD purge	2023-11-02 23:20:48 -04:00
Joshua M. Boniface	f53af510c1	Avoid startup failures if OSD removed	2023-11-02 22:24:39 -04:00
Joshua M. Boniface	d5d783fad3	Set proper split flag	2023-11-02 22:20:22 -04:00
Joshua M. Boniface	980ea6a9e9	Adjust handling of ext_db and _count options. Avoid the use of superfluous flag options, default them to none, and add support for fixed-size DB LVs.	2023-11-02 13:29:47 -04:00
Joshua M. Boniface	8780044be6	Ensure db_device is an empty string	2023-11-02 00:52:18 -04:00
Joshua M. Boniface	f08c654f22	Fix missing fstring	2023-11-01 21:41:06 -04:00
Joshua M. Boniface	526a5f4a74	Add support for split OSD adds. Allows creating multiple OSDs on a single (NVMe) block device, leveraging the "ceph-volume lvm batch" command. Replaces the previous method of creating OSDs. Also adds a new ZK item for each OSD indicating if it is split or not.	2023-11-01 21:31:35 -04:00
Joshua M. Boniface	aa0b1f504f	Fix output bug	2023-11-01 15:46:38 -04:00
Joshua M. Boniface	794cea4a02	Reverse ordering, run checks before starting timer	2023-09-15 22:25:37 -04:00
Joshua M. Boniface	479e156234	Run monitoring plugins once on startup	2023-09-15 17:53:16 -04:00
Joshua M. Boniface	86830286f3	Adjust message printing to be on one line	2023-09-15 17:00:34 -04:00
Joshua M. Boniface	4d51318a40	Make monitoring interval configurable	2023-09-15 16:54:51 -04:00
Joshua M. Boniface	cba6f5be48	Fix wording of non-coordinator state	2023-09-15 16:51:04 -04:00
Joshua M. Boniface	254303b9d4	Use coordinator_state instead of router_state. Makes it much clearer what this variable represents.	2023-09-15 16:47:56 -04:00
Joshua M. Boniface	40b7d68853	Separate monitoring and move to 60s interval. Removes the dependency of the monitoring subsystem from the node keepalives, and runs them at a 60s interval to avoid excessive backups if a plugin takes too long. Adds its own logs and related items as required. Finally adds a new required argument to the run() of plugins, the coordinator state, which can be used by a plugin to determine actions based on whether the node is a primary, secondary, or non-coordinator.	2023-09-15 16:47:11 -04:00
Joshua M. Boniface	570da99605	Avoid failures if no children found	2023-09-02 01:36:17 -04:00
Joshua M. Boniface	5e43f9bd7c	Ensure Patroni failures do not block takeover	2023-08-29 22:00:11 -04:00

112 Commits