parallelvirtualcluster/pvc

Author	SHA1	Message	Date
Joshua M. Boniface	b5f996febd	Fix bugs for node flush for stop/shutdown/restart Previously VMs in stop/shutdown/restart states wouldn't be properly handled during a node flush. This fixes the bugs and ensures that the transient VM states (shutdown/restart) are completed before proceeding, and then avoids setting a stopped/shutdown VM to shutdown/autostart.	2023-08-18 11:25:59 -04:00
Joshua M. Boniface	2c3a3cdf52	Use try when watching health value in NodeInstance	2023-03-07 09:53:01 -05:00
Joshua M. Boniface	7c07fbefff	Adjust keepalive health printing and ordering	2023-02-24 11:08:30 -05:00
Joshua M. Boniface	202dc3ed59	Correct error handling if monitoring plugins fail	2023-02-24 10:19:41 -05:00
Joshua M. Boniface	e45b3108a2	Add health delta change to message output	2023-02-22 15:02:08 -05:00
Joshua M. Boniface	118237a53b	Fix bad string value for message	2023-02-22 15:02:08 -05:00
Joshua M. Boniface	1093ca6264	Disallow health less than 0	2023-02-15 16:50:24 -05:00
Joshua M. Boniface	0ecf219910	Run setup during plugin loads	2023-02-15 10:11:38 -05:00
Joshua M. Boniface	0f4edc54d1	Use percentage in keepalive output	2023-02-15 01:56:02 -05:00
Joshua M. Boniface	14d29f2986	Adjust text on log message	2023-02-13 22:21:23 -05:00
Joshua M. Boniface	bc88d764b0	Add logging flag for monitoring plugin output	2023-02-13 22:04:39 -05:00
Joshua M. Boniface	b07396c39a	Fix bugs if plugins fail to load	2023-02-13 21:51:48 -05:00
Joshua M. Boniface	1ea4800212	Set node health to None when restarting	2023-02-13 15:54:46 -05:00
Joshua M. Boniface	9c14d84bfc	Add node health value and send out API	2023-02-13 15:53:39 -05:00
Joshua M. Boniface	d8f346abdd	Move Ceph cluster health reporting to plugin Also removes several outputs from the normal keepalive that were superfluous/static so that the main output fits on one line.	2023-02-13 13:29:40 -05:00
Joshua M. Boniface	3c742a827b	Initial implementation of monitoring plugin system	2023-02-13 12:06:26 -05:00
Joshua M. Boniface	726d0a562b	Update copyright header year	2022-10-06 11:55:27 -04:00
Joshua M. Boniface	92e2ff7449	Fix bug with space-containing detect strings	2022-07-06 15:58:57 -04:00
Joshua M. Boniface	7a40c7a55b	Add support for replacing/refreshing OSDs Adds commands to both replace an OSD disk, and refresh (reimport) an existing OSD disk on a new node. This handles the cases where an OSD disk should be replaced (either due to upgrades or failures) or where a node is rebuilt in-place and an existing OSD must be re-imported to it. This should avoid the need to do a full remove/add sequence for either case. Also cleans up some aspects of OSD removal that are identical between methods (e.g. using safe-to-destroy and sleeping after stopping) and fixes a bug if an OSD does not truly exist when the daemon starts up.	2022-05-06 15:32:06 -04:00
Joshua M. Boniface	3801fcc07b	Fix bug with initial JSON for stats	2022-05-02 13:28:19 -04:00
Joshua M. Boniface	c741900baf	Refactor OSD removal to use new ZK data With the OSD LVM information stored in Zookeeper, we can use this to determine the actual block device to zap rather than relying on runtime determination and guesstimation.	2022-05-02 12:52:22 -04:00
Joshua M. Boniface	464f0e0356	Store additional OSD information in ZK Ensures that information like the FSIDs and the OSD LVM volume are stored in Zookeeper at creation time and updated at daemon start time (to ensure the data is populated at least once, or if the /dev/sdX path changes). This will allow safer operation of OSD removals and the potential implementation of re-activation after node replacements.	2022-05-02 12:11:39 -04:00
Joshua M. Boniface	cea8832f90	Ensure initial OSD stats is populated Values are all invalid but this ensures the client won't error out when trying to show an OSD that has never checked in yet.	2022-04-29 16:50:30 -04:00
Joshua M. Boniface	d6ca74376a	Fix bugs with forced removal	2022-04-29 14:03:07 -04:00
Joshua M. Boniface	4d698be34b	Add OSD removal force option Ensures a removal can continue even in situations where some step(s) might fail, for instance removing an obsolete OSD from a replaced node.	2022-04-29 11:16:33 -04:00
Joshua M. Boniface	67131de4f6	Fix bug when removing OSDs Ensure the OSD is down as well as out or purge might fail.	2021-12-28 03:05:34 -05:00
Joshua M. Boniface	abc23ebb18	Handle detect strings as arguments for blockdevs Allows specifying blockdevs in the OSD and OSD-DB addition commands as detect strings rather than actual block device paths. This provides greater flexibility for automation with pvcbootstrapd (which originates the concept of detect strings) and in general usage as well.	2021-12-28 02:53:02 -05:00
Joshua M. Boniface	78faa90139	Reformat recent changes with Black	2021-11-06 03:27:07 -04:00
Joshua M. Boniface	66bfad3109	Fix linting errors F522/F523 unused args	2021-11-06 03:24:50 -04:00
Joshua M. Boniface	c41664d2da	Reformat code with Black code formatter Unify the code style along PEP and Black principles using the tool.	2021-11-06 03:02:43 -04:00
Joshua M. Boniface	dfebb2d3e5	Also validate on failures	2021-10-12 17:11:03 -04:00
Joshua M. Boniface	b8204d89ac	Go back to passing if exception Validation already happened and the set happens again later.	2021-10-12 14:21:52 -04:00
Joshua M. Boniface	2d9fb9688d	Validate network MTU after initial read	2021-10-12 10:53:17 -04:00
Joshua M. Boniface	3122d73bf5	Avoid duplicate runs of MTU set It wasn't the validator duplicating, but the update duplicating, so avoid that happening properly this time.	2021-10-09 19:21:47 -04:00
Joshua M. Boniface	7ed8ef179c	Revert "Avoid duplicate runs of MTU validator" This reverts commit `56021c443a`.	2021-10-09 19:11:42 -04:00
Joshua M. Boniface	caead02b2a	Set all log messages to information state None of these were "success" messages and thus shouldn't have been ok state.	2021-10-09 19:09:38 -04:00
Joshua M. Boniface	87bc5f93e6	Avoid duplicate runs of MTU validator	2021-10-09 19:07:41 -04:00
Joshua M. Boniface	203893559e	Use correct isinstance instead of type	2021-10-09 19:03:31 -04:00
Joshua M. Boniface	2c51bb0705	Move MTU validation to function Prevents code duplication and ensures validation runs when an MTU is updated, not just on network creation.	2021-10-09 19:01:45 -04:00
Joshua M. Boniface	46d3daf686	Add logger message when setting MTU	2021-10-09 18:56:18 -04:00
Joshua M. Boniface	e9d05aa24e	Ensure vx_mtu is always an int()	2021-10-09 18:52:50 -04:00
Joshua M. Boniface	6ce28c43af	Add MTU value checking and log messages Ensures that if a specified MTU is more than the maximum it is set to the maximum instead, and adds warning messages for both situations.	2021-10-09 18:48:56 -04:00
Joshua M. Boniface	c45f8f5bd5	Have VXNetworkInstance set MTU if unset Makes this explicit in Zookeeper if a network is unset, post-migration (schema version 6). Addresses #144	2021-10-09 17:52:57 -04:00
Joshua M. Boniface	3690a2c1e0	Fix migration bugs and invalid vx_mtu Addresses #144	2021-10-09 17:35:10 -04:00
Joshua M. Boniface	50d8aa0586	Add handlers for client network MTUs Refactors some of the code in VXNetworkInterface to handle MTUs in a more streamlined fashion. Also fixes a bug whereby bridge client networks were being explicitly given the cluster dev MTU which might not be correct. Now adds support for this option explicitly in the configs, and defaults to 1500 for safety (the standard Ethernet MTU). Addresses #144	2021-10-09 17:02:27 -04:00
Joshua M. Boniface	142c999ce8	Re-add success log output during migration	2021-09-27 11:50:55 -04:00
Joshua M. Boniface	55221b3d97	Simplify VM migration down to 3 steps Remove two superfluous synchronization steps which are not needed here, since the exclusive lock handles that situation anyways. Still does not fix the weird flush->unflush lock timeout bug, but is better worked-around now due to the cancelling of the other wait freeing this up and continuing.	2021-09-27 00:03:20 -04:00
Joshua M. Boniface	0d72798814	Work around synchronization lock issues Make the block on stage C only wait for 900 seconds (15 minutes) to prevent indefinite blocking. The issue comes if a VM is being received, and the current unflush is cancelled for a flush. When this happens, this lock acquisition seems to block for no obvious reason, and no other changes seem to affect it. This is certainly some sort of locking bug within Kazoo but I can't diagnose it as-is. Leave a TODO to look into this again in the future.	2021-09-26 23:26:21 -04:00
Joshua M. Boniface	3638efc77e	Improve log messages during VM migration	2021-09-26 23:15:38 -04:00
Joshua M. Boniface	c2c888d684	Use event to non-block wait and fix inf wait	2021-09-26 22:55:39 -04:00

62 Commits