parallelvirtualcluster/pvc

Author	SHA1	Message	Date
Joshua M. Boniface	598b2025e8	Use Rados and add Ceph entries to pvcnoded.yaml	2020-06-06 21:12:51 -04:00
Joshua M. Boniface	70b787d1fd	Move all VM functions into thread	2020-06-06 15:44:05 -04:00
Joshua M. Boniface	e1310a05f2	Implement recording of VM stats during keepalive	2020-06-06 15:34:03 -04:00
Joshua M. Boniface	2ad6860dfe	Move Ceph statistics gathering into thread	2020-06-06 13:25:02 -04:00
Joshua M. Boniface	cebb4bbc1a	Comment cleanup	2020-06-06 13:20:40 -04:00
Joshua M. Boniface	a672e06dd2	Move fencing to end of keepalive function	2020-06-06 13:19:11 -04:00
Joshua M. Boniface	1db73bb892	Move libvirt closure into previous section	2020-06-06 13:18:37 -04:00
Joshua M. Boniface	c1956072f0	Rename update_zookeeper function to node_keepalive	2020-06-06 12:49:50 -04:00
Joshua M. Boniface	ce60836c34	Allow enforcement of live migration Provides a CLI and API argument to force live migration, which triggers a new VM state "migrate-live". The node daemon VMInstance during migrate will read this flag from the state and, if enforced, will not trigger a shutdown migration. Closes #95	2020-06-06 12:00:44 -04:00
Joshua M. Boniface	b5434ba744	Fix typo in variable name	2020-06-06 11:29:48 -04:00
Joshua M. Boniface	f61d443773	Allow move of migrated VM to current node Will make the migrate permanent instead of throwing an error. Fixes #96	2020-06-06 11:25:10 -04:00
Joshua M. Boniface	da20b4493a	Properly return the function	2020-06-05 15:50:43 -04:00
Joshua M. Boniface	440821b136	Refactor cluster validation into a command wrapper Instead of using group-based validation, which breaks the help context for subcommands, use a decorator to validate the cluster status for each command. The eager help option will then override this decorator for help commands, while enforcing it for others.	2020-06-05 14:49:53 -04:00
Joshua M. Boniface	b9e5b14f94	Update lastnode too if a self-migrate is aborted References #92	2020-06-04 10:28:04 -04:00
Joshua M. Boniface	5d2031d99e	Prevent a VM migrating to the same node Prevents a rare edge case where a node can end up "migrating" to itself. Quick hack to fix this, though like most of the VM management should probably be rethought/rewritten later. Fixes #92	2020-06-04 10:26:47 -04:00
Joshua M. Boniface	9ee5ae4826	Volume and Snapshot are not sorted by ID	2020-05-29 13:43:44 -04:00
Joshua M. Boniface	48711000b0	Ensure stats sorting is by right key	2020-05-29 13:41:52 -04:00
Joshua M. Boniface	82c067b591	Sort list output in CLI client properly	2020-05-29 13:39:20 -04:00
Joshua M. Boniface	0fab7072ac	Sort all Ceph lists by numeric ID	2020-05-29 13:31:18 -04:00
Joshua M. Boniface	2d507f8b42	Ensure rbdlist is updated when modifying VM config	2020-05-12 11:08:47 -04:00
Joshua M. Boniface	5f9836f96d	Add error message to OSD parse fail	2020-05-12 11:04:38 -04:00
Joshua M. Boniface	95c59ba629	Improve flush handling slightly	2020-05-12 11:04:38 -04:00
Joshua M. Boniface	e724e73140	Don't show built-in bridges as invalid	2020-05-12 10:46:10 -04:00
Joshua M. Boniface	3cf90c46ad	Correct bad handling of static reservations	2020-05-09 10:20:06 -04:00
Joshua M. Boniface	7b2180b626	Get both reservations in leases by default	2020-05-09 10:05:55 -04:00
Joshua M. Boniface	72a38fd437	Correct changed dhcp_reservations key name	2020-05-09 10:00:53 -04:00
Joshua M. Boniface	73eb4fb457	Fix typo of macaddress in dhcp add	2020-05-09 00:15:25 -04:00
Joshua M. Boniface	b580760537	Add missing fmt_cyan variable	2020-05-08 18:15:02 -04:00
Joshua M. Boniface	683c3afea6	Correct spelling mistake	2020-05-06 11:29:42 -04:00
Joshua M. Boniface	4c7cb1a20c	Add further wording tweaks and details	2020-05-06 11:20:12 -04:00
Joshua M. Boniface	90feb83eab	Revamp some wording in the documentation	2020-05-06 10:41:13 -04:00
Joshua M. Boniface	b91923735c	Move some messages around	2020-05-05 16:19:18 -04:00
Joshua M. Boniface	34c4690d49	Don't convert bytes into KB in OVA import Doing so can create an image that is 1 sector (512 bytes) too large, which will then break qemu-img because it's stupid (or, VMDK is stupid, I haven't decided which it is). Current Ceph rbd commands seem to accept --size in bytes so this is fine.	2020-05-05 16:14:18 -04:00
Joshua M. Boniface	3e351bb84a	Add additional error checking for profile creation	2020-05-05 15:28:39 -04:00
Joshua M. Boniface	331027d124	Add further tweaks to takeover state checks Just ensure that everything is proper state before proceeding	2020-04-22 11:16:19 -04:00
Joshua M. Boniface	ae4f36b881	Hook flush into more services Trying to ensure that pvc-flush completes before anything tries to shut down.	2020-04-14 19:58:53 -04:00
Joshua M. Boniface	e451426c7c	Fix minor bugs from change in VM info handling	2020-04-13 22:56:19 -04:00
Joshua M. Boniface	611e0edd80	Reorder last keepalive during cleanup Make sure the stopping of the keepalive timer and final keepalive update are done as the last step before complete shutdown. The previous setup could conceivably result in a node being fenced should the cleanup operations take longer than ~45 seconds, for instance if primary node switchover took too long or blocked, or log watchers failed to stop quickly enough. Ensures that keepalives will continue to be run during the shutdown process until the last possible moment.	2020-04-12 03:49:29 -04:00
Joshua M. Boniface	b413e042a6	Improve handling of primary contention Previously, contention could occasionally cause a flap/dual primary contention state due to the lack of checking within this function. This could cause a state where a node transitions to primary then is almost immediately shifted away, which could cause undefined behaviour in the cluster. The solution includes several elements: * Implement an exclusive lock operation in zkhandler * Switch the become_primary function to use this exclusive lock * Implement exclusive locking during the contention process * As a failsafe, check stat versions before setting the node as the primary node, in case another node already has * Delay the start of takeover/relinquish operations by slightly longer than the lock timeout * Make the current router_state conditions more explicit (positive conditionals rather than negative conditionals) The new scenario ensures that during contention, only one secondary will ever succeed at acquiring the lock. Ideally, the other would then grab the lock and pass, but in testing this does not seem to be the case - the lock always times out, so the failsafe check is technically not needed but has been left as an added safety mechanism. With this setup, the node that fails the contention will never block the switchover nor will it try to force itself onto the cluster after another node has successfully won contention. Timeouts may need to be adjusted in the future, but the base timeout of 0.4 seconds (and transition delay of 0.5 seconds) seems to work reliably during preliminary tests.	2020-04-12 03:40:17 -04:00
Joshua M. Boniface	e672d799a6	Set flush after pvcapid.service This may or may not help, but should in theory prevent the flush from trying to run after a (locally-running) API daemon is terminated, which could cause an API failure and a failure to flush.	2020-04-12 01:48:50 -04:00
Joshua M. Boniface	59707bad4e	Fix some errors in the FAQ	2020-04-11 01:33:18 -04:00
Joshua M. Boniface	9c19813808	Fix link to FAQ page	2020-04-11 01:28:32 -04:00
Joshua M. Boniface	8fe50bea77	Add FAQ to documentation	2020-04-11 01:22:07 -04:00
Joshua M. Boniface	8faa3bb53d	Handle info fuzzy matches better If we are calling info, we want one VM. Don't silently discard other options or try (and fail later) to parse multiple, just say no VM found.	2020-04-09 10:26:49 -04:00
Joshua M. Boniface	a130f19a19	Depend pvcnoded on Zookeeper (harder) and libvirtd	2020-04-09 09:57:53 -04:00
Joshua M. Boniface	a671d9d457	Use consistent tense in messages	2020-04-08 22:00:51 -04:00
Joshua M. Boniface	fee1c7dd6c	Reorder cleanup and gracefully wait for flushes	2020-04-08 22:00:08 -04:00
Joshua M. Boniface	b3a75d8069	Use post instead of get on initialize	2020-04-06 15:05:33 -04:00
Joshua M. Boniface	c3bd6b6ecc	Add missing call into cluster initialize function	2020-04-06 14:48:26 -04:00
Joshua M. Boniface	5d58bee34f	Add some time around noded startup/shutdown Otherwise, systemd kills networking before the node daemon fully stops and it goes into "dead" status, which is super annoying.	2020-04-01 23:59:14 -04:00


1695 Commits