parallelvirtualcluster/pvc

Author	SHA1	Message	Date
Joshua M. Boniface	e451426c7c	Fix minor bugs from change in VM info handling	2020-04-13 22:56:19 -04:00
Joshua M. Boniface	611e0edd80	Reorder last keepalive during cleanup Make sure the stopping of the keepalive timer and final keepalive update are done as the last step before complete shutdown. The previous setup could conceivably result in a node being fenced should the cleanup operations take longer than ~45 seconds, for instance if primary node switchover took too long or blocked, or log watchers failed to stop quickly enough. Ensures that keepalives will continue to be run during the shutdown process until the last possible moment.	2020-04-12 03:49:29 -04:00
Joshua M. Boniface	b413e042a6	Improve handling of primary contention Previously, contention could occasionally cause a flap/dual primary contention state due to the lack of checking within this function. This could cause a state where a node transitions to primary then is almost immediately shifted away, which could cause undefined behaviour in the cluster. The solution includes several elements: * Implement an exclusive lock operation in zkhandler * Switch the become_primary function to use this exclusive lock * Implement exclusive locking during the contention process * As a failsafe, check stat versions before setting the node as the primary node, in case another node already has * Delay the start of takeover/relinquish operations by slightly longer than the lock timeout * Make the current router_state conditions more explicit (positive conditionals rather than negative conditionals) The new scenario ensures that during contention, only one secondary will ever succeed at acquiring the lock. Ideally, the other would then grab the lock and pass, but in testing this does not seem to be the case - the lock always times out, so the failsafe check is technically not needed but has been left as an added safety mechanism. With this setup, the node that fails the contention will never block the switchover nor will it try to force itself onto the cluster after another node has successfully won contention. Timeouts may need to be adjusted in the future, but the base timeout of 0.4 seconds (and transition delay of 0.5 seconds) seems to work reliably during preliminary tests.	2020-04-12 03:40:17 -04:00
Joshua M. Boniface	e672d799a6	Set flush after pvcapid.service This may or may not help, but should in theory prevent the flush from trying to run after a (locally-running) API daemon is terminated, which could cause an API failure and a failure to flush.	2020-04-12 01:48:50 -04:00
Joshua M. Boniface	59707bad4e	Fix some errors in the FAQ	2020-04-11 01:33:18 -04:00
Joshua M. Boniface	9c19813808	Fix link to FAQ page	2020-04-11 01:28:32 -04:00
Joshua M. Boniface	8fe50bea77	Add FAQ to documentation	2020-04-11 01:22:07 -04:00
Joshua M. Boniface	8faa3bb53d	Handle info fuzzy matches better If we are calling info, we want one VM. Don't silently discard other options or try (and fail later) to parse multiple, just say no VM found.	2020-04-09 10:26:49 -04:00
Joshua M. Boniface	a130f19a19	Depend pvcnoded on Zookeeper (harder) and libvirtd	2020-04-09 09:57:53 -04:00
Joshua M. Boniface	a671d9d457	Use consistent tense in messages	2020-04-08 22:00:51 -04:00
Joshua M. Boniface	fee1c7dd6c	Reorder cleanup and gracefully wait for flushes	2020-04-08 22:00:08 -04:00
Joshua M. Boniface	b3a75d8069	Use post instead of get on initialize	2020-04-06 15:05:33 -04:00
Joshua M. Boniface	c3bd6b6ecc	Add missing call into cluster initialize function	2020-04-06 14:48:26 -04:00
Joshua M. Boniface	5d58bee34f	Add some time around noded startup/shutdown Otherwise, systemd kills networking before the node daemon fully stops and it goes into "dead" status, which is super annoying.	2020-04-01 23:59:14 -04:00
Joshua M. Boniface	f668412941	Don't use Requires as the dep is too hard Requires seems to flush on every service restart which is NOT what we want. Use Wants instead.	2020-04-01 15:15:37 -04:00
Joshua M. Boniface	a0ebc0d3a7	Add more robust requirements to pvc-flush service	2020-04-01 15:09:44 -04:00
Joshua M. Boniface	98a7005c1b	Add significant TimeoutSec to pvc-flush service This will stop systemd from killing the service in the middle of a flush or unflush operation, which completely defeats the purpose. 30 minutes was chosen as this is a very large but still somewhat manageable value, which should cover even a very large very loaded cluster with room to spare.	2020-04-01 01:24:09 -04:00
Joshua M. Boniface	44efd66f2c	Fix error renaming keys This function was not implemented and thus failed; implement it.	2020-03-30 21:38:18 -04:00
Joshua M. Boniface	09aeb33d13	Don't convert non-integer bytes/ops	2020-03-30 19:09:16 -04:00
Joshua M. Boniface	6563053f6c	Add underlying OS and architecture blurbs	2020-03-25 15:54:03 -04:00
Joshua M. Boniface	862f7ee9a8	Reword the opening paragraph	2020-03-25 15:42:51 -04:00
Joshua M. Boniface	97a560fcbe	Update cluster documentation Add a TOC, add additional sections, improve wording in some sections, spellcheck.	2020-03-25 15:38:00 -04:00
Joshua M. Boniface	d84e94eff4	Add force_single_node script v0.7	2020-03-25 10:48:49 -04:00
Joshua M. Boniface	ce9d0e9603	Add helper scripts to CLI client	2020-03-22 01:19:55 -04:00
Joshua M. Boniface	3aea5ae34b	Correct invalid function call	2020-03-21 16:46:34 -04:00
Joshua M. Boniface	3f5076d9ca	Revamp some architecture documentation	2020-03-15 18:07:05 -04:00
Joshua M. Boniface	8ed602ef9c	Update getting started paragraph	2020-03-15 17:50:16 -04:00
Joshua M. Boniface	e501345e44	Revamp GitHub notice	2020-03-15 17:39:06 -04:00
Joshua M. Boniface	d8f97d090a	Update title in README	2020-03-15 17:37:30 -04:00
Joshua M. Boniface	082648f3b2	Mention Zookeeper in initial paragraph	2020-03-15 17:36:12 -04:00
Joshua M. Boniface	2df8f5d407	Fix pvcapid config in migrations script	2020-03-15 17:33:27 -04:00
Joshua M. Boniface	ca65cb66b8	Update Debian changelog	2020-03-15 17:32:12 -04:00
Joshua M. Boniface	616d7c43ed	Add additional info about OVA deployment	2020-03-15 17:31:12 -04:00
Joshua M. Boniface	4fe3a73980	Reorganize manuals and architecture pages	2020-03-15 17:19:51 -04:00
Joshua M. Boniface	26084741d0	Update README and index for 0.7	2020-03-15 17:17:17 -04:00
Joshua M. Boniface	4a52ff56b9	Catch failures in getPoolInformation Fixes #90	2020-03-15 16:58:13 -04:00
Joshua M. Boniface	0a367898a0	Don't trigger aggregator fail if fine	2020-03-12 13:22:12 -04:00
Joshua M. Boniface	ca5327b908	Make strtobool even more robust If strtobool fails, return False always.	2020-03-09 09:30:16 -04:00
Joshua M. Boniface	d36d8e0637	Use custom strtobool to handle weird edge cases	2020-03-06 09:40:13 -05:00
Joshua M. Boniface	36588a3a81	Work around bad RequestArgs handling	2020-03-03 16:48:20 -05:00
Joshua M. Boniface	c02bc0b46a	Correct issues with VM lock freeing Code was bad and using a deprecated feature.	2020-03-02 12:45:12 -05:00
Joshua M. Boniface	1e4350ca6f	Properly handle takeover state in VXNetworks Most of these actions/conditionals were looking for primary state, but were failing during node takeover. Update the conditionals to look for both router states instead. Also add a wait to lock flushing until a takeover is completed.	2020-03-02 10:41:00 -05:00
Joshua M. Boniface	b8852e116e	Improve handling of root disk in GRUB Since vdX names become sdX names inside VMs, use the same setup as the fstab in order to map this onto a static SCSI ID.	2020-03-02 10:02:39 -05:00
Joshua M. Boniface	9e468d3524	Increase build-and-deploy wait time to 15	2020-02-27 14:32:01 -05:00
Joshua M. Boniface	11f045f100	Support showing individual userdata and script doc Closes #89	2020-02-27 14:31:08 -05:00
Joshua M. Boniface	fd80eb9e22	Ensure profile creation works with empty lists If we get a 404 code back from the upper function, we should create an empty list rather than trying to loop through the dictionary.	2020-02-24 09:30:58 -05:00
Joshua M. Boniface	6ac82d6ce9	Ensure single-element templates are lists Ensures any list-assuming statements later on hold true even when there is only a single template entry.	2020-02-21 10:50:28 -05:00
Joshua M. Boniface	b438b9b4c2	Import gevent for production listener	2020-02-21 09:39:07 -05:00
Joshua M. Boniface	4417bd374b	Add Python requests toolbelt to CLI deps	2020-02-20 23:27:07 -05:00
Joshua M. Boniface	9d5f50f82a	Implement progress bars for file uploads Provide pretty status bars to indicate upload progress for tasks that perform large file uploads to the API ('provisioner ova upload' and 'storage volume upload') so the administrator can gauge progress and estimated time to completion.	2020-02-20 22:42:19 -05:00
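
The primary-contention fix in b413e042a6 can be illustrated with a minimal sketch. This is an in-process analogy only: it uses Python's threading.Lock in place of the Zookeeper-backed exclusive lock in zkhandler, and a simple "is a primary already set" check in place of the stat-version failsafe. The names contend, run_contention, and the node names are hypothetical; only the 0.4 second timeout and 0.5 second transition delay come from the commit message.

```python
import threading
import time

LOCK_TIMEOUT = 0.4      # base lock timeout quoted in the commit message
TRANSITION_DELAY = 0.5  # transition delay, slightly longer than the timeout


def contend(name, lock, state, results):
    """One secondary attempting to become primary (hypothetical helper)."""
    # Try to take the exclusive lock; a node that cannot get it gives up
    # rather than blocking the switchover.
    if not lock.acquire(timeout=LOCK_TIMEOUT):
        results[name] = "lost"
        return
    try:
        # Failsafe: only claim primary if no other node already has.
        if state["primary"] is None:
            state["primary"] = name
            results[name] = "won"
            # Hold the lock for longer than the timeout so the other
            # contender times out instead of acquiring it.
            time.sleep(TRANSITION_DELAY)
        else:
            results[name] = "lost"
    finally:
        lock.release()


def run_contention(nodes=("node1", "node2")):
    """Run all contenders concurrently; return the winner and each outcome."""
    lock = threading.Lock()
    state = {"primary": None}
    results = {}
    threads = [
        threading.Thread(target=contend, args=(n, lock, state, results))
        for n in nodes
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["primary"], results


if __name__ == "__main__":
    print(run_contention())
```

Whichever node starts first wins; the other either times out on the lock or, if it does acquire it late, is stopped by the failsafe check, so exactly one node ends up primary.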

2059 Commits