parallelvirtualcluster/pvc - pvc

Commit Graph

Author	SHA1	Message	Date
Joshua Boniface	7c99a7bda7	Safely reset RBD locks on failed VMs This should correct issues on cold start, as well as when a VM crashes uncleanly, either of which would prevent the VM from starting due to stale RBD locks. This implementation has four parts: 1. Update how IP addresses are handled, specifically by replacing all previous instances of "vni_ipaddr" with "vni_floatingipaddr", and then adding the "vni_ipaddr" with the real data for this node's IPs. Also include the storage IPs in this where they weren't before, so each this_node actually has the local IPs plus floating IPs. This enables the next two steps. 2. Modify flush_locks to take this_node as an argument, and update the run_command function to only operate against this node, rather than on the primary coordinator. 3. Have flush_locks check each lock against the current node, to verify that the lock is actually held by the current node. This is the only way to do this safely. During fencing, we override this by not passing a this_node, which bypasses this check. 4. Have the VM start do the check for VM failure/startup and execute a flush_locks before actually starting the VM.	2020-12-14 15:53:18 -05:00
Joshua Boniface	260b39ebf2	Lint: E302 expected 2 blank lines, found X	2020-11-07 14:45:24 -05:00
Joshua Boniface	7932be3948	Lint: E261 at least two spaces before inline comment	2020-11-07 13:11:03 -05:00
Joshua Boniface	3f242cd437	Lint: E202 whitespace before '}'	2020-11-07 12:57:42 -05:00
Joshua Boniface	e333f2b935	Lint: E201 whitespace after '{'	2020-11-07 12:38:31 -05:00
Joshua Boniface	d9e7b7ec15	Lint: F401 <library> imported but unused	2020-11-06 19:22:49 -05:00
Joshua Boniface	63f4f9aed7	Lint: E722 do not use bare 'except'	2020-11-06 18:55:10 -05:00
Joshua Boniface	224c8082ef	Alter text of synchronization messages	2020-10-20 13:08:18 -04:00
Joshua Boniface	726501f4d4	Add additional logging to flush selector Adds additional debug logging to the flush selector to determine how and why any given node is selected. Useful for troubleshooting strange choices.	2020-10-20 12:34:18 -04:00
Joshua Boniface	0e5c681ada	Clean up imports Make several imports more specific to reduce redundant code imports and improve memory utilization.	2020-08-11 12:09:10 -04:00
Joshua Boniface	95c59ba629	Improve flush handling slightly	2020-05-12 11:04:38 -04:00
Joshua Boniface	b413e042a6	Improve handling of primary contention Previously, contention could occasionally cause a flap/dual primary contention state due to the lack of checking within this function. This could cause a state where a node transitions to primary then is almost immediately shifted away, which could cause undefined behaviour in the cluster. The solution includes several elements: * Implement an exclusive lock operation in zkhandler * Switch the become_primary function to use this exclusive lock * Implement exclusive locking during the contention process * As a failsafe, check stat versions before setting the node as the primary node, in case another node already has * Delay the start of takeover/relinquish operations by slightly longer than the lock timeout * Make the current router_state conditions more explicit (positive conditionals rather than negative conditionals) The new scenario ensures that during contention, only one secondary will ever succeed at acquiring the lock. Ideally, the other would then grab the lock and pass, but in testing this does not seem to be the case - the lock always times out, so the failsafe check is technically not needed but has been left as an added safety mechanism. With this setup, the node that fails the contention will never block the switchover nor will it try to force itself onto the cluster after another node has successfully won contention. Timeouts may need to be adjusted in the future, but the base timeout of 0.4 seconds (and transition delay of 0.5 seconds) seems to work reliably during preliminary tests.	2020-04-12 03:40:17 -04:00
Joshua Boniface	0a367898a0	Don't trigger aggregator fail if fine	2020-03-12 13:22:12 -04:00
Joshua Boniface	d2a5fe59c0	Use transitional takeover states for migration Use a pair of transitional states, "takeover" and "relinquish", when transitioning between primary and secondary coordinator states. This provides a cluster-wide record that the nodes are still working during their synchronous transition states, and should allow clients to determine when the node(s) have fully switched over. Also add an additional 2 seconds of wait at the end of the transition jobs to ensure everything has had a chance to start before proceeding. References #72	2020-02-19 14:06:54 -05:00
Joshua Boniface	ce985234c3	Use consistent naming of components Rename "pvcd" to "pvcnoded", and "pvc-api" to "pvcapid" so names for the daemons are fully consistent. Update the names of the configuration files as well to match this new formatting. References #79	2020-02-08 19:34:07 -05:00

15 Commits
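Commit 7c99a7bda7 describes an ownership check in steps 2 and 3: flush only the RBD locks held by this node, and skip that check during fencing. The following is a minimal Python sketch of that check. It is not PVC's actual flush_locks. Several details are assumptions made for illustration:

- the shape of the lock records;
- the 'IP:PORT/NONCE' address format;
- the helper names select_locks_to_flush, lock_holder_ip and lock_remove_argv.

Only the `rbd lock remove <image-spec> <lock-id> <locker>` subcommand comes from the Ceph CLI, and the sketch builds that command without running it.

```python
from typing import Iterable, List, Optional, Set


def lock_holder_ip(address: str) -> str:
    """Return the IP portion of a client address like '10.0.1.11:0/2001'.

    Assumption: addresses have the form 'IP:PORT/NONCE'; IPv6 is not handled.
    """
    return address.split(":", 1)[0]


def select_locks_to_flush(locks: Iterable[dict], node_ips: Optional[Set[str]]) -> List[dict]:
    """Return only the locks that are safe for this node to remove.

    locks    -- hypothetical lock records with 'id', 'locker' and 'address' keys
    node_ips -- this node's local plus floating IPs; None means "fencing mode",
                which bypasses the ownership check and returns every lock
    """
    if node_ips is None:
        return list(locks)
    return [lock for lock in locks if lock_holder_ip(lock["address"]) in node_ips]


def lock_remove_argv(image: str, lock: dict) -> List[str]:
    """Build an `rbd lock remove <image-spec> <lock-id> <locker>` argument list."""
    return ["rbd", "lock", "remove", image, lock["id"], lock["locker"]]


# Hypothetical example data: only the lock held from 10.0.1.11 belongs to this node.
locks = [
    {"id": "auto 140001", "locker": "client.4100", "address": "10.0.1.11:0/2001"},
    {"id": "auto 140002", "locker": "client.4200", "address": "10.0.1.12:0/2002"},
]
this_node_ips = {"10.0.1.11", "10.0.1.254"}  # a local IP plus a floating IP

for lock in select_locks_to_flush(locks, this_node_ips):
    print(lock_remove_argv("vms/test_disk0", lock))
```

In fencing mode, `select_locks_to_flush(locks, None)` returns both locks. This mirrors the commit's note that omitting this_node bypasses the check.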
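Commit b413e042a6 describes a contention scheme: take an exclusive lock with a short timeout, then check the version before claiming the primary role. That scheme can be sketched as a single-process Python simulation. This is not PVC's zkhandler code. The simulation makes these substitutions and assumptions:

- threading.Lock stands in for the distributed Zookeeper lock;
- the hypothetical VersionedKey class stands in for a key with a stat version;
- an empty primary slot is assumed to be represented by the string "none".

The 0.4 second timeout is the base value quoted in the commit message.

```python
import threading
from typing import List, Tuple

LOCK_TIMEOUT = 0.4  # base lock timeout quoted in the commit message, in seconds


class VersionedKey:
    """Hypothetical in-memory stand-in for a Zookeeper key with a stat version."""

    def __init__(self, value: str):
        self._guard = threading.Lock()
        self.value = value
        self.version = 0

    def get(self) -> Tuple[str, int]:
        with self._guard:
            return self.value, self.version

    def set_if_version(self, value: str, expected_version: int) -> bool:
        """Write only if the version still matches what the caller read."""
        with self._guard:
            if self.version != expected_version:
                return False
            self.value = value
            self.version += 1
            return True


def contend_for_primary(node: str, primary: VersionedKey,
                        contention_lock: threading.Lock, winners: List[str]) -> None:
    """One secondary's attempt to win primary contention."""
    # Exclusive lock: a node that cannot get it in time simply backs off.
    if not contention_lock.acquire(timeout=LOCK_TIMEOUT):
        return
    try:
        current, version = primary.get()
        if current != "none":
            return  # another node already won; never force ourselves onto the cluster
        # Failsafe: the write fails if the key changed since we read it.
        if primary.set_if_version(node, version):
            winners.append(node)
    finally:
        contention_lock.release()


primary = VersionedKey("none")
contention_lock = threading.Lock()
winners: List[str] = []
threads = [
    threading.Thread(target=contend_for_primary,
                     args=(name, primary, contention_lock, winners))
    for name in ("hv1", "hv2", "hv3")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("new primary:", winners)
```

Every write goes through set_if_version. A node that somehow skipped the lock would therefore fail to overwrite a primary chosen after its own read. This matches the commit's description of the stat check as a redundant safety net.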