parallelvirtualcluster/pvc

Author	SHA1	Message	Date
Joshua M. Boniface	56ba7b1457	Bump version to 0.9.1	2020-10-29 12:16:38 -04:00
Joshua M. Boniface	ec0b8acf90	Support per-VM migration type selectors. Allow a VM to specify its migration type as a default choice. The valid options are "default" (i.e. behave as now), "live" which forces a live migration only, and "shutdown" which forces a shutdown migration only. The new option is treated as a VM meta option and is set to default if not found.	2020-10-29 12:01:29 -04:00
Joshua M. Boniface	5d08ad9573	Fix incorrect keepalive interval setting	2020-10-26 11:44:45 -04:00
Joshua M. Boniface	0f299777f1	Modify version to 3-digit numbering. I expect 0.9 will be fairly long-lived, so add another decimal place so I may continue adding tweaks to it. THIS IS NOT SEMVER.	2020-10-26 02:13:11 -04:00
Joshua M. Boniface	890023cbfc	Make sender wait dynamic based on receiver	2020-10-21 14:43:54 -04:00
Joshua M. Boniface	28abb018e3	Improve some timeouts and conditionals	2020-10-21 12:00:10 -04:00
Joshua M. Boniface	017953c2e6	Move lock release to phase D	2020-10-21 11:07:01 -04:00
Joshua M. Boniface	82b4d3ed1b	Add missing prefix statements to loggers	2020-10-21 10:52:53 -04:00
Joshua M. Boniface	bae366a316	Add waits and only receive check on send	2020-10-21 10:43:42 -04:00
Joshua M. Boniface	351076c15e	Check if node changed during final check. Avoids situations where two migrates, to different nodes, happen in rapid succession. Aborts the migration if the current target node no longer matches what was set at the start of the execution.	2020-10-21 02:52:36 -04:00
Joshua M. Boniface	42514b9a50	Improve messages further	2020-10-21 02:41:42 -04:00
Joshua M. Boniface	611e47f338	Add messages to migration aborts. Results in some information duplication, but ensures logging of the reason a migration was aborted separate from the error(s) this may generate.	2020-10-21 02:38:42 -04:00
Joshua M. Boniface	1523959074	Move where setting last_ vars happens	2020-10-21 02:24:00 -04:00
Joshua M. Boniface	ef762359f4	Adjust timing to avoid migrating to self quickly. Add another separate state lock, release it earlier, and ensure timings are good to avoid double-migrating one VM.	2020-10-21 02:17:55 -04:00
Joshua M. Boniface	398d33778f	Avoid stopping duplicates, just lock our own key	2020-10-20 16:10:39 -04:00
Joshua M. Boniface	a6d492ed9f	Remove spurious writes and adjust sleep	2020-10-20 16:04:26 -04:00
Joshua M. Boniface	11fa3b0df3	Remove additional wait and add last_node entries. These allow for aborting a migration to retain the previous settings and override what the client set.	2020-10-20 15:58:55 -04:00
Joshua M. Boniface	442aa4e420	Tweak timers further	2020-10-20 15:43:59 -04:00
Joshua M. Boniface	3910843660	Add missing break	2020-10-20 15:39:29 -04:00
Joshua M. Boniface	70f3fdbfb9	Tweak the delays slightly on receive	2020-10-20 15:38:07 -04:00
Joshua M. Boniface	7cb0241a12	Attempt live migrates 3 times before proceeding	2020-10-20 15:33:41 -04:00
Joshua M. Boniface	9fb33ed7a7	Increase peer lock acquiring timers	2020-10-20 15:26:59 -04:00
Joshua M. Boniface	abfe0108ab	Better handle aborting migrations	2020-10-20 15:22:16 -04:00
Joshua M. Boniface	567fe8f36b	Wait for existing migrations before proceeding	2020-10-20 15:12:32 -04:00
Joshua M. Boniface	ec7b78b9b8	Add additional short sleep in receive	2020-10-20 13:29:17 -04:00
Joshua M. Boniface	224c8082ef	Alter text of synchronization messages	2020-10-20 13:08:18 -04:00
Joshua M. Boniface	f9e7e9884f	Improve handling of VM migrations. The VM migration code was very old, very spaghettified, and prone to strange failures. Improve this by taking cues from the node primary migration. Use synchronization between the nodes to ensure lockstep completion of the migration in discrete steps. A proper queue can be built later to integrate with this code more cleanly. References #108	2020-10-20 13:01:55 -04:00
Joshua M. Boniface	726501f4d4	Add additional logging to flush selector. Adds additional debug logging to the flush selector to determine how and why any given node is selected. Useful for troubleshooting strange choices.	2020-10-20 12:34:18 -04:00
Joshua M. Boniface	c6e34c7dc6	Bump base version to 0.9	2020-10-18 14:31:19 -04:00
Joshua M. Boniface	f749633f7c	Use provisioned memory for mem migration selector. Use the new "provisioned" memory field, instead of the "allocated" memory field, to determine the optimal node when using the "mem" migration selector. This will take into account non-running VMs in the calculation as well as running VMs.	2020-10-18 14:17:15 -04:00
Joshua M. Boniface	a4b80be5ed	Add provisioned memory to node info. Adds a separate field to the node memory, "provisioned", which totals the amount of memory provisioned to all VMs on the node, regardless of state, and in contrast to "allocated" which only counts running VMs. Allows for the detection of potential overprovisioned states when factoring in non-running VMs. Includes the supporting code to get this data, since the original implementation of VM memory selection was dependent on the VM being running and getting this from libvirt. Now, if the VM is not active, it gets this from the domain XML instead.	2020-10-18 14:17:15 -04:00
Joshua M. Boniface	aa5f8c93fd	Entirely disable IPv6 on bridged interfaces. Prevents any potential leakage due to autoconfigured IPv6 on bridged interfaces. These are exclusively VM-side bridges, and the PVC host should not have any IPv6 configuration on them, ever.	2020-10-15 11:00:59 -04:00
Joshua M. Boniface	9366977fe6	Copy d_domain before iterating. Prevents a bug where the thread can crash due to a change in the d_domain object while running the for loop. By copying and iterating over the copy, this becomes safer.	2020-09-16 15:12:37 -04:00
Joshua M. Boniface	65b44f2955	Avoid breaking keepalive during incoming migration. The keepalive was getting stuck gathering memoryStats from the non-running VM, since it was in a paused state. Avoid this by just skipping past the rest of the stats gathering if the VM isn't running.	2020-08-28 01:47:36 -04:00
Joshua M. Boniface	78dec77987	Bump version to 0.8	2020-08-26 10:24:44 -04:00
Joshua M. Boniface	921e57ca78	Fix syntax error	2020-08-20 23:05:56 -04:00
Joshua M. Boniface	3cc7df63f2	Add configurable VM shutdown timeout. Closes #102	2020-08-20 21:26:12 -04:00
Joshua M. Boniface	e8e65934e3	Use logger prefix for thread debug logs	2020-08-17 14:30:21 -04:00
Joshua M. Boniface	24fda8a73f	Use new debug logger for DNS Aggregator	2020-08-17 14:26:43 -04:00
Joshua M. Boniface	9b3ef6d610	Add connect timeout to Ceph. This doesn't seem to actually do anything (like most of these timeouts...) but add it just for posterity.	2020-08-17 13:58:14 -04:00
Joshua M. Boniface	b451c0e8e3	Add additional start/finish debug messages	2020-08-17 13:11:03 -04:00
Joshua M. Boniface	f9b126a106	Make zkhandler accept failures more robustly. Most of these would silently fail if there was e.g. an issue with the ZK connection. Instead, encase things in try blocks and handle the exceptions in a more graceful way, returning None or False if applicable. Except for locks, which should retry 5 times before aborting.	2020-08-17 13:03:36 -04:00
Joshua M. Boniface	553f96e7ef	Use logger for debug output. Using simple print statements was annoying (lack of timing info and formatting), so move to using the debug logger for these instead with a custom state ('d') with white text to differentiate them. Also indicate which subthread of the keepalive each task is being executed in for easier tracing of issues.	2020-08-17 12:46:52 -04:00
Joshua M. Boniface	65add58c9a	Properly properly handle issue	2020-08-16 11:38:39 -04:00
Joshua M. Boniface	0a01d84290	Tie fence timers to keepalive_interval. Also wait 2 full keepalive intervals after fencing before doing anything else, to give the Ceph cluster a chance to recover.	2020-08-15 12:38:03 -04:00
Joshua M. Boniface	4afb288429	Properly handle missing domain_name fail	2020-08-15 12:07:23 -04:00
Joshua M. Boniface	985ad5edc0	Warn if fencing will fail. Verify our IPMI state on startup, and then warn if fencing will fail. For now, this is sufficient, but in future (requires refactoring) we might want to adjust how fencing occurs based on this information.	2020-08-13 14:42:18 -04:00
Joshua M. Boniface	0587bcbd67	Go back to manual command for OSD stats. Using the Ceph library was a disaster here; it had no timeout or way to force it to continue, so keepalives would become stuck and trigger fence storms. Go back to the manual osd dump command with a 2s timeout which is far more reliable and can be adequately terminated if it runs long.	2020-08-12 22:31:25 -04:00
Joshua M. Boniface	e0cb4a58c3	Ensure zk_listener is readded after reconnect	2020-08-11 12:46:15 -04:00
Joshua M. Boniface	099c58ead8	Fix missing char in log message	2020-08-11 12:40:35 -04:00


89 Commits