parallelvirtualcluster/pvc

Author	SHA1	Message	Date
Joshua M. Boniface	b07396c39a	Fix bugs if plugins fail to load	2023-02-13 21:51:48 -05:00
Joshua M. Boniface	1ea4800212	Set node health to None when restarting	2023-02-13 15:54:46 -05:00
Joshua M. Boniface	9c14d84bfc	Add node health value and send out API	2023-02-13 15:53:39 -05:00
Joshua M. Boniface	d8f346abdd	Move Ceph cluster health reporting to plugin Also removes several outputs from the normal keepalive that were superfluous/static so that the main output fits on one line.	2023-02-13 13:29:40 -05:00
Joshua M. Boniface	3c742a827b	Initial implementation of monitoring plugin system	2023-02-13 12:06:26 -05:00
Joshua M. Boniface	726d0a562b	Update copyright header year	2022-10-06 11:55:27 -04:00
Joshua M. Boniface	92e2ff7449	Fix bug with space-containing detect strings	2022-07-06 15:58:57 -04:00
Joshua M. Boniface	7a40c7a55b	Add support for replacing/refreshing OSDs Adds commands to both replace an OSD disk, and refresh (reimport) an existing OSD disk on a new node. This handles the cases where an OSD disk should be replaced (either due to upgrades or failures) or where a node is rebuilt in-place and an existing OSD must be re-imported to it. This should avoid the need to do a full remove/add sequence for either case. Also cleans up some aspects of OSD removal that are identical between methods (e.g. using safe-to-destroy and sleeping after stopping) and fixes a bug if an OSD does not truly exist when the daemon starts up.	2022-05-06 15:32:06 -04:00
Joshua M. Boniface	3801fcc07b	Fix bug with initial JSON for stats	2022-05-02 13:28:19 -04:00
Joshua M. Boniface	c741900baf	Refactor OSD removal to use new ZK data With the OSD LVM information stored in Zookeeper, we can use this to determine the actual block device to zap rather than relying on runtime determination and guestimation.	2022-05-02 12:52:22 -04:00
Joshua M. Boniface	464f0e0356	Store additional OSD information in ZK Ensures that information like the FSIDs and the OSD LVM volume are stored in Zookeeper at creation time and updated at daemon start time (to ensure the data is populated at least once, or if the /dev/sdX path changes). This will allow safer operation of OSD removals and the potential implementation of re-activation after node replacements.	2022-05-02 12:11:39 -04:00
Joshua M. Boniface	cea8832f90	Ensure initial OSD stats is populated Values are all invalid but this ensures the client won't error out when trying to show an OSD that has never checked in yet.	2022-04-29 16:50:30 -04:00
Joshua M. Boniface	d6ca74376a	Fix bugs with forced removal	2022-04-29 14:03:07 -04:00
Joshua M. Boniface	4d698be34b	Add OSD removal force option Ensures a removal can continue even in situations where some step(s) might fail, for instance removing an obsolete OSD from a replaced node.	2022-04-29 11:16:33 -04:00
Joshua M. Boniface	67131de4f6	Fix bug when removing OSDs Ensure the OSD is down as well as out or purge might fail.	2021-12-28 03:05:34 -05:00
Joshua M. Boniface	abc23ebb18	Handle detect strings as arguments for blockdevs Allows specifying blockdevs in the OSD and OSD-DB addition commands as detect strings rather than actual block device paths. This provides greater flexibility for automation with pvcbootstrapd (which originates the concept of detect strings) and in general usage as well.	2021-12-28 02:53:02 -05:00
Joshua M. Boniface	78faa90139	Reformat recent changes with Black	2021-11-06 03:27:07 -04:00
Joshua M. Boniface	66bfad3109	Fix linting errors F522/F523 unused args	2021-11-06 03:24:50 -04:00
Joshua M. Boniface	c41664d2da	Reformat code with Black code formatter Unify the code style along PEP and Black principles using the tool.	2021-11-06 03:02:43 -04:00
Joshua M. Boniface	dfebb2d3e5	Also validate on failures	2021-10-12 17:11:03 -04:00
Joshua M. Boniface	b8204d89ac	Go back to passing if exception Validation already happened and the set happens again later.	2021-10-12 14:21:52 -04:00
Joshua M. Boniface	2d9fb9688d	Validate network MTU after initial read	2021-10-12 10:53:17 -04:00
Joshua M. Boniface	3122d73bf5	Avoid duplicate runs of MTU set It wasn't the validator duplicating, but the update duplicating, so avoid that happening properly this time.	2021-10-09 19:21:47 -04:00
Joshua M. Boniface	7ed8ef179c	Revert "Avoid duplicate runs of MTU validator" This reverts commit `56021c443a`.	2021-10-09 19:11:42 -04:00
Joshua M. Boniface	caead02b2a	Set all log messages to information state None of these were "success" messages and thus shouldn't have been ok state.	2021-10-09 19:09:38 -04:00
Joshua M. Boniface	87bc5f93e6	Avoid duplicate runs of MTU validator	2021-10-09 19:07:41 -04:00
Joshua M. Boniface	203893559e	Use correct isinstance instead of type	2021-10-09 19:03:31 -04:00
Joshua M. Boniface	2c51bb0705	Move MTU validation to function Prevents code duplication and ensures validation runs when an MTU is updated, not just on network creation.	2021-10-09 19:01:45 -04:00
Joshua M. Boniface	46d3daf686	Add logger message when setting MTU	2021-10-09 18:56:18 -04:00
Joshua M. Boniface	e9d05aa24e	Ensure vx_mtu is always an int()	2021-10-09 18:52:50 -04:00
Joshua M. Boniface	6ce28c43af	Add MTU value checking and log messages Ensures that if a specified MTU is more than the maximum it is set to the maximum instead, and adds warning messages for both situations.	2021-10-09 18:48:56 -04:00
Joshua M. Boniface	c45f8f5bd5	Have VXNetworkInstance set MTU if unset Makes this explicit in Zookeeper if a network is unset, post-migration (schema version 6). Addresses #144	2021-10-09 17:52:57 -04:00
Joshua M. Boniface	3690a2c1e0	Fix migration bugs and invalid vx_mtu Addresses #144	2021-10-09 17:35:10 -04:00
Joshua M. Boniface	50d8aa0586	Add handlers for client network MTUs Refactors some of the code in VXNetworkInterface to handle MTUs in a more streamlined fashion. Also fixes a bug whereby bridge client networks were being explicitly given the cluster dev MTU which might not be correct. Now adds support for this option explicitly in the configs, and defaults to 1500 for safety (the standard Ethernet MTU). Addresses #144	2021-10-09 17:02:27 -04:00
Joshua M. Boniface	142c999ce8	Re-add success log output during migration	2021-09-27 11:50:55 -04:00
Joshua M. Boniface	55221b3d97	Simplify VM migration down to 3 steps Remove two superfluous synchronization steps which are not needed here, since the exclusive lock handles that situation anyways. Still does not fix the weird flush->unflush lock timeout bug, but is better worked-around now due to the cancelling of the other wait freeing this up and continuing.	2021-09-27 00:03:20 -04:00
Joshua M. Boniface	0d72798814	Work around synchronization lock issues Make the block on stage C only wait for 900 seconds (15 minutes) to prevent indefinite blocking. The issue comes if a VM is being received, and the current unflush is cancelled for a flush. When this happens, this lock acquisition seems to block for no obvious reason, and no other changes seem to affect it. This is certainly some sort of locking bug within Kazoo but I can't diagnose it as-is. Leave a TODO to look into this again in the future.	2021-09-26 23:26:21 -04:00
Joshua M. Boniface	3638efc77e	Improve log messages during VM migration	2021-09-26 23:15:38 -04:00
Joshua M. Boniface	c2c888d684	Use event to non-block wait and fix inf wait	2021-09-26 22:55:39 -04:00
Joshua M. Boniface	febef2e406	Track status of VM state thread	2021-09-26 22:55:21 -04:00
Joshua M. Boniface	2a4f38e933	Simplify locking process for VM migration Rather than using a cumbersome and overly complex ping-pong of read and write locks, instead move to a much simpler process using exclusive locks. Describing the process in ASCII or narrative is cumbersome, but the process ping-pongs via a set of exclusive locks and wait timers, so that the two sides are able to synchronize via blocking the exclusive lock. The end result is a much more streamlined migration (takes about half the time all things considered) which should be less error-prone.	2021-09-26 22:08:07 -04:00
Joshua M. Boniface	e23e2dd9bf	Fix typo in log message	2021-09-26 03:35:30 -04:00
Joshua M. Boniface	0f02c5eaef	Fix typo in sgdisk command options	2021-09-26 00:59:05 -04:00
Joshua M. Boniface	075abec5fe	Use re.search instead of re.match Required since we're not matching the start of the string.	2021-09-26 00:55:29 -04:00
Joshua M. Boniface	3a1cbf8d01	Raise basic exceptions in CephInstance Avoids no exception to reraise errors on failures.	2021-09-26 00:50:10 -04:00
Joshua M. Boniface	a438a4155a	Fix OSD creation for partition paths and fix gdisk The previous implementation did not work with /dev/nvme devices or any /dev/disk/by-* devices due to some logical failures in the partition naming scheme, so fix these, and be explicit about what is supported in the PVC CLI command output. The 'echo | gdisk' implementation of partition creation also did not work due to limitations of subprocess.run; instead, use sgdisk which allows these commands to be written out explicitly and is included in the same package as gdisk.	2021-09-26 00:12:28 -04:00
Joshua M. Boniface	65df807b09	Add support for configurable OSD DB ratios The default of 0.05 (5%) is likely ideal in the initial implementation, but allow this to be set explicitly for maximum flexibility in space-constrained or performance-critical use-cases.	2021-09-24 01:06:39 -04:00
Joshua M. Boniface	adc8a5a3bc	Add separate OSD DB device support Adds in three parts: 1. Create an API endpoint to create OSD DB volume groups on a device. Passed through to the node via the same command pipeline as creating/removing OSDs, and creates a volume group with a fixed name (osd-db). 2. Adds API support for specifying whether or not to use this DB volume group when creating a new OSD via the "ext_db" flag. Naming and sizing is fixed for simplicity and based on Ceph recommendations (5% of OSD size). The Zookeeper schema tracks the block device to use during removal. 3. Adds CLI support for the new and modified API endpoints, as well as displaying the block device and DB block device in the OSD list. While I debated supporting adding a DB device to an existing OSD, in practice this ended up being a very complex operation involving stopping the OSD and setting some options, so this is not supported; this can be specified during OSD creation only. Closes #142	2021-09-23 13:59:49 -04:00
Joshua M. Boniface	e962743e51	Add VM device hot attach/detach support Adds a new API endpoint to support hot attach/detach of devices, and the corresponding client-side logic to use this endpoint when doing VM network/storage add/remove actions. The live attach is now the default behaviour for these types of additions and removals, and can be disabled if needed. Closes #141	2021-09-12 19:33:00 -04:00
Joshua M. Boniface	fb46f5f9e9	Change default node object state to flushed	2021-08-29 03:34:08 -04:00

51 Commits