parallelvirtualcluster/pvc

Author	SHA1	Message	Date
Joshua M. Boniface	8f906c1f81	Use power off in fence instead of reset. Use a power off (and then make the power on a requirement) during a node fence. Removes some potential ambiguity in the power state, since we will know for certain if it is off.	2021-10-12 11:04:27 -04:00
Joshua M. Boniface	2d9fb9688d	Validate network MTU after initial read	2021-10-12 10:53:17 -04:00
Joshua M. Boniface	f13cc04b89	Bump version to 0.9.41	2021-10-09 19:39:21 -04:00
Joshua M. Boniface	95e01f38d5	Adjust log type of object setup message	2021-10-09 19:23:12 -04:00
Joshua M. Boniface	3122d73bf5	Avoid duplicate runs of MTU set. It wasn't the validator that was duplicating but the update, so properly prevent that from happening this time.	2021-10-09 19:21:47 -04:00
Joshua M. Boniface	7ed8ef179c	Revert "Avoid duplicate runs of MTU validator". This reverts commit `56021c443a`.	2021-10-09 19:11:42 -04:00
Joshua M. Boniface	caead02b2a	Set all log messages to information state. None of these were "success" messages and thus shouldn't have been in the ok state.	2021-10-09 19:09:38 -04:00
Joshua M. Boniface	87bc5f93e6	Avoid duplicate runs of MTU validator	2021-10-09 19:07:41 -04:00
Joshua M. Boniface	203893559e	Use correct isinstance instead of type	2021-10-09 19:03:31 -04:00
Joshua M. Boniface	2c51bb0705	Move MTU validation to function. Prevents code duplication and ensures validation runs when an MTU is updated, not just on network creation.	2021-10-09 19:01:45 -04:00
Joshua M. Boniface	46d3daf686	Add logger message when setting MTU	2021-10-09 18:56:18 -04:00
Joshua M. Boniface	e9d05aa24e	Ensure vx_mtu is always an int()	2021-10-09 18:52:50 -04:00
Joshua M. Boniface	6ce28c43af	Add MTU value checking and log messages. Ensures that if a specified MTU is more than the maximum, it is set to the maximum instead, and adds warning messages for both situations.	2021-10-09 18:48:56 -04:00
Joshua M. Boniface	c45f8f5bd5	Have VXNetworkInstance set MTU if unset. Makes this explicit in Zookeeper if a network's MTU is unset post-migration (schema version 6). Addresses #144	2021-10-09 17:52:57 -04:00
Joshua M. Boniface	3690a2c1e0	Fix migration bugs and invalid vx_mtu. Addresses #144	2021-10-09 17:35:10 -04:00
Joshua M. Boniface	50d8aa0586	Add handlers for client network MTUs. Refactors some of the code in VXNetworkInterface to handle MTUs in a more streamlined fashion. Also fixes a bug whereby bridge client networks were being explicitly given the cluster dev MTU, which might not be correct. Now adds support for this option explicitly in the configs, and defaults to 1500 for safety (the standard Ethernet MTU). Addresses #144	2021-10-09 17:02:27 -04:00
Joshua M. Boniface	6ee4c55071	Correct flawed conditional in verify_ipmi	2021-10-07 15:11:19 -04:00
Joshua M. Boniface	c27359c4bf	Bump version to 0.9.40	2021-10-07 14:42:04 -04:00
Joshua M. Boniface	46078932c3	Correct bad stop_keepalive_timer call	2021-10-07 14:41:12 -04:00
Joshua M. Boniface	bdb9db8375	Bump version to 0.9.39	2021-10-07 11:52:38 -04:00
Joshua M. Boniface	da9248cfa2	Bump version to 0.9.38	2021-10-03 22:32:41 -04:00
Joshua M. Boniface	23977b04fc	Bump version to 0.9.37	2021-09-30 02:08:14 -04:00
Joshua M. Boniface	f6f6f07488	Add timeouts to queue gets and adjust. Ensure that all keepalive timeouts are set (prevent the queue.get() actions from blocking forever) and set the thread timeouts to line up as well. Everything here is thus limited to keepalive_interval seconds (default 5s) to keep it uniform.	2021-09-27 16:10:27 -04:00
Joshua M. Boniface	142c999ce8	Re-add success log output during migration	2021-09-27 11:50:55 -04:00
Joshua M. Boniface	1de069298c	Fix missing character in log message	2021-09-27 00:49:43 -04:00
Joshua M. Boniface	55221b3d97	Simplify VM migration down to 3 steps. Remove two superfluous synchronization steps which are not needed here, since the exclusive lock handles that situation anyway. This still does not fix the weird flush->unflush lock timeout bug, but it is better worked around now, since cancelling the other wait frees this up and lets it continue.	2021-09-27 00:03:20 -04:00
Joshua M. Boniface	0d72798814	Work around synchronization lock issues. Make the block on stage C only wait for 900 seconds (15 minutes) to prevent indefinite blocking. The issue occurs if a VM is being received and the current unflush is cancelled for a flush. When this happens, this lock acquisition seems to block for no obvious reason, and no other changes seem to affect it. This is certainly some sort of locking bug within Kazoo, but I can't diagnose it as-is. Leave a TODO to look into this again in the future.	2021-09-26 23:26:21 -04:00
Joshua M. Boniface	3638efc77e	Improve log messages during VM migration	2021-09-26 23:15:38 -04:00
Joshua M. Boniface	c2c888d684	Use event for non-blocking wait and fix infinite wait	2021-09-26 22:55:39 -04:00
Joshua M. Boniface	febef2e406	Track status of VM state thread	2021-09-26 22:55:21 -04:00
Joshua M. Boniface	2a4f38e933	Simplify locking process for VM migration. Rather than using a cumbersome and overly complex ping-pong of read and write locks, instead move to a much simpler process using exclusive locks. Describing the process in ASCII or narrative is cumbersome, but the process ping-pongs via a set of exclusive locks and wait timers, so that the two sides are able to synchronize via blocking the exclusive lock. The end result is a much more streamlined migration (takes about half the time all things considered) which should be less error-prone.	2021-09-26 22:08:07 -04:00
Joshua M. Boniface	3b805cdc34	Fix failure to connect to libvirt in keepalive. This should be caught and abort the thread rather than failing and holding up keepalives.	2021-09-26 20:42:01 -04:00
Joshua M. Boniface	06f0f7ed91	Fix several bugs in fence handling. 1. Output from ipmitool was not being stripped, and stray newlines were throwing off the comparisons. Fixes this. 2. Several stages were lacking meaningful messages. Adds these in so the output is clearer about what is going on. 3. Reduce the sleep time after a fence to just 1x the keepalive_interval, rather than 2x, because this seemed excessively long even for slow IPMI interfaces, especially since we're checking the power state now anyway. 4. Set the node daemon state to an explicit 'fenced' state after a successful fence to indicate to users that the node was indeed fenced successfully and is not still 'dead'.	2021-09-26 20:07:30 -04:00
Joshua M. Boniface	fd040ab45a	Ensure pvc-flush is after network-online	2021-09-26 17:40:42 -04:00
Joshua M. Boniface	e23e2dd9bf	Fix typo in log message	2021-09-26 03:35:30 -04:00
Joshua M. Boniface	0f02c5eaef	Fix typo in sgdisk command options	2021-09-26 00:59:05 -04:00
Joshua M. Boniface	075abec5fe	Use re.search instead of re.match. Required since we're not matching the start of the string.	2021-09-26 00:55:29 -04:00
Joshua M. Boniface	3a1cbf8d01	Raise basic exceptions in CephInstance. Avoids "no exception to reraise" errors on failures.	2021-09-26 00:50:10 -04:00
Joshua M. Boniface	a438a4155a	Fix OSD creation for partition paths and fix gdisk. The previous implementation did not work with /dev/nvme devices or any /dev/disk/by-* devices due to some logical failures in the partition naming scheme, so fix these, and be explicit about what is supported in the PVC CLI command output. The 'echo | gdisk' implementation of partition creation also did not work due to limitations of subprocess.run; instead, use sgdisk, which allows these commands to be written out explicitly and is included in the same package as gdisk.	2021-09-26 00:12:28 -04:00
Joshua M. Boniface	65df807b09	Add support for configurable OSD DB ratios. The default of 0.05 (5%) is likely ideal in the initial implementation, but allow this to be set explicitly for maximum flexibility in space-constrained or performance-critical use-cases.	2021-09-24 01:06:39 -04:00
Joshua M. Boniface	d0f3e9e285	Bump version to 0.9.36	2021-09-23 14:01:38 -04:00
Joshua M. Boniface	adc8a5a3bc	Add separate OSD DB device support. Adds support in three parts: 1. Create an API endpoint to create OSD DB volume groups on a device. Passed through to the node via the same command pipeline as creating/removing OSDs, and creates a volume group with a fixed name (osd-db). 2. Adds API support for specifying whether or not to use this DB volume group when creating a new OSD via the "ext_db" flag. Naming and sizing are fixed for simplicity and based on Ceph recommendations (5% of OSD size). The Zookeeper schema tracks the block device to use during removal. 3. Adds CLI support for the new and modified API endpoints, as well as displaying the block device and DB block device in the OSD list. While I debated supporting adding a DB device to an existing OSD, in practice this ended up being a very complex operation involving stopping the OSD and setting some options, so this is not supported; this can be specified during OSD creation only. Closes #142	2021-09-23 13:59:49 -04:00
Joshua M. Boniface	df277edf1c	Move console watcher stop try up. Could cause an exception if d_domain is not defined yet.	2021-09-22 16:02:04 -04:00
Joshua M. Boniface	772807deb3	Bump version to 0.9.35	2021-09-13 02:20:46 -04:00
Joshua M. Boniface	f3fb492633	Handle VM disk/network stats gathering exceptions	2021-09-12 19:41:07 -04:00
Joshua M. Boniface	e962743e51	Add VM device hot attach/detach support. Adds a new API endpoint to support hot attach/detach of devices, and the corresponding client-side logic to use this endpoint when doing VM network/storage add/remove actions. The live attach is now the default behaviour for these types of additions and removals, and can be disabled if needed. Closes #141	2021-09-12 19:33:00 -04:00
Joshua M. Boniface	be954c1625	Don't crash cleanup if no this_node	2021-08-29 03:52:18 -04:00
Joshua M. Boniface	fb46f5f9e9	Change default node object state to flushed	2021-08-29 03:34:08 -04:00
Joshua M. Boniface	694b8e85a0	Bump version to 0.9.34	2021-08-24 16:15:25 -04:00
Joshua M. Boniface	a4c0e0befd	Fix typo in output message	2021-08-23 00:39:19 -04:00

810 Commits