parallelvirtualcluster/pvc - pvc

Commit Graph

Author	SHA1	Message	Date
Joshua Boniface	b3483fa810	Add explicit returns from flush/ready threads	2019-12-26 11:08:00 -05:00
Joshua Boniface	47cf0a8006	Ensure migration out occurs	2019-12-25 21:11:02 -05:00
Joshua Boniface	77db36a891	Ensure migration out occurs	2019-12-25 21:02:46 -05:00
Joshua Boniface	9a39d739e8	Ensure we empty of flush_thread	2019-12-25 20:29:17 -05:00
Joshua Boniface	a66b834ae4	Fix several small bugs	2019-12-19 18:58:53 -05:00
Joshua Boniface	8c252aeecc	Implemented coordinated locked node transitions The previous method was a "throw it in the sea"-type migration with some (very arbitrary) sleep statements thrown in for good measure. Reimplement this with some hard locking. During each phase of the transition, the nodes acquire read/write shared locks to a Zookeeper key so that they can tightly coordinate the actions of transferring each part of the primary state between them. This is done in a subthread to prevent strange blocking issues that were encountered, likely due to busyness in the existing main thread.	2019-12-19 10:56:34 -05:00
Joshua Boniface	98764f1edd	Clean up some aspects of node switchover	2019-12-18 21:39:40 -05:00
Joshua Boniface	23188199cb	Handle failing Patroni events more gracefully	2019-12-18 21:12:22 -05:00
Joshua Boniface	b3e21a5bf8	Integrate metadata API into node daemon	2019-12-14 16:41:01 -05:00
Joshua Boniface	8c36e7618a	Modify node daemon to follow API	2019-12-14 14:13:26 -05:00
Joshua Boniface	6fa828e721	Don't stop the provisioner worker It should probably just be running on all nodes all the time already, but is started when a node first becomes primary.	2019-12-12 23:08:02 -05:00
Joshua Boniface	c1b6ce0ff7	Reorder starting clients	2019-12-12 23:03:34 -05:00
Joshua Boniface	b854d53fab	Add API management to node daemon	2019-12-12 22:59:07 -05:00
Joshua Boniface	03447d3374	Update copyright string year to include 2019	2019-10-13 12:09:51 -04:00
Joshua Boniface	18fc49fc6c	Use node instead of hypervisor consistently	2019-10-12 01:59:08 -04:00
Joshua Boniface	5995353597	Implement VM metadata and use it Implements the storing of three VM metadata attributes: 1. Node limits - allows specifying a list of hosts on which the VM must run. This limit influences the migration behaviour of VMs. 2. Per-VM node selectors - allows each VM to have its migration autoselection method specified, to automatically allow different methods per VM based on the administrator's preferences. 3. VM autorestart - allows a VM to be automatically restarted from a stopped state, presumably due to a failure to find a target node (either due to limits or otherwise) during a flush/fence recovery, on the next node unflush/ready state of its home hypervisor. Useful mostly in conjunction with limits to ensure that VMs which were shut down due to there being no valid migration targets are started back up when their node becomes ready again. Includes the full client interaction with these metadata options, including printing, as well as defining a new function to modify this metadata. For the CLI it is set/modified either on `vm define` or via the `vm meta` command. For the API it is set/modified either on a POST to the `/vm` endpoint (during VM definition) or on POST to the `/vm/<vm>` endpoint. For the API this replaces the previous reserved word for VM creation from scratch as this will no longer be implemented in-daemon (see #22). Closes #52	2019-10-12 01:17:39 -04:00
Joshua Boniface	7e77752ce5	Add limit to Patroni switchover attempts	2019-08-07 11:46:42 -04:00
Joshua Boniface	e2ae58b62c	Add the missing newline to the string compare	2019-08-04 17:00:33 -04:00
Joshua Boniface	d0d5ab4425	Fix bug if the switchover target is the same	2019-08-04 16:51:11 -04:00
Joshua Boniface	a329376d33	Lock primary_node key during primary switchover Also implements a loop to switch over the Patroni leader to ensure this always follows the primary, and cleans up the code around here a bit.	2019-08-04 16:42:06 -04:00
Joshua Boniface	f30be555c1	Improve message output for logging Improve some formatting of the messages being printed to make it nicer for long-term logging.	2019-07-10 22:38:32 -04:00
Joshua Boniface	8f160abf90	Handle cancelling flushes when new ones run Store the flush_thread of a node as a class object. Before starting a new flush thread (either flush or unflush), stop the existing one if it exists to prevent further migrations, then start the new thread. Set the object to None on init and again once the task actually finishes. Remove the inflush flag as this is not required when using these threads and functionally does nothing any longer, but add the flush_stopper flag to trigger cancellation of the current job.	2019-07-10 11:54:34 -04:00
Joshua Boniface	c7c8c8bcbb	Fix bug with flush	2019-07-10 00:43:55 -04:00
Joshua Boniface	7a8aee9fe7	Remove flush locking functionality This just seemed like more trouble than it was worth. Flush locks were originally intended as a way to counteract the weird issues around flushing that were mostly fixed by the code refactoring, so this will help test if those issues are truly gone. If not, will look into a cleaner solution that doesn't result in unchangeable states.	2019-07-09 23:59:17 -04:00
Joshua Boniface	17dfaf43c5	Move hypervisor selection out to common	2019-07-09 14:20:58 -04:00
Joshua Boniface	bc54ea2449	Log message when starting or stopping API client	2019-07-08 19:29:49 -04:00
Joshua Boniface	d9ebd04264	Fix missing dom_uuid values in data reads	2019-07-07 15:30:28 -04:00
Joshua Boniface	b82ccaa84d	Improve flush handling Similar to recent client changes, don't replace the previous node record of an already-migrated VM. Wait for shutdown if required. Use a continue statement instead of a needless else block.	2019-07-07 15:27:37 -04:00
Joshua Boniface	8216125b02	Enable autostart of API client on Primary Adds a config flag that turns on the API client following the Primary coordinator. The retcode of the start/stop commands is ignored so this can fail gracefully if e.g. the client isn't installed.	2019-07-06 02:42:56 -04:00
Joshua Boniface	3e591bd09e	Remove extra whitespaces on blank lines	2019-06-25 22:33:23 -04:00
Joshua Boniface	8ef21cf9f2	Sleep longer before removing gateways 1 second was just slightly too little time to wait and packets would occasionally be lost on primary switchover. Increase this to 2 seconds to provide more time for arping to run on the new primary.	2019-05-23 22:20:38 -04:00
Joshua Boniface	595cf1782c	Switch DNS aggregator to PostgreSQL MariaDB+Galera was terribly unstable, with the cluster failing to start or dying randomly, and generally seemed incredibly unsuitable for an HA solution. This commit switches the DNS aggregator SQL backend to PostgreSQL, implemented via Patroni HA. It also manages the Patroni state, forcing the primary instance to follow the PVC coordinator, such that the active DNS Aggregator instance is always able to communicate read+write with the local system. This required some logic changes to how the DNS Aggregator worked, specifically ensuring that database changes aren't attempted while the instance isn't actively running - to be honest this was a bug anyways that had just never been noticed. Closes #34	2019-05-21 01:07:41 -04:00
Joshua Boniface	3cf573baf6	Update domainstate after unflush is complete	2019-05-11 00:55:15 -04:00
Joshua Boniface	516ea1b57c	Handle unflushes like flushes sequentially Makes an unflush a controlled event like flushing, rather than a free-for-all. This does slow down unflushing somewhat (disallowing parallelism from multiple hosts to the current host), but allows the locking to actually be effective.	2019-05-11 00:30:47 -04:00
Joshua Boniface	62a71af46e	Implement locking for unflush as well References #32	2019-05-11 00:13:03 -04:00
Joshua Boniface	9d8c886811	Correct typo in flush_lock write	2019-05-11 00:08:07 -04:00
Joshua Boniface	c19902d952	Implement flush locking for nodes Implements a locking mechanism to prevent clobbering of node flushes. When a flush begins, a global cluster lock is placed which is freed once the flush completes. While the lock is in place, other flush events queue waiting for the lock to free before proceeding. Modifies the CLI output flow when the `--wait` option is specified. First, if a lock exists when running the command, the message is tweaked to indicate this, and the client will wait first for the lock to free, and then for the flush as normal. Second, the wait depends on the active lock rather than the domain_status for consistency purposes. Closes #32	2019-05-10 23:52:24 -04:00
Joshua Boniface	0dbd1c41a9	Create floating VNI address on brcluster	2019-03-18 20:17:26 -04:00
Joshua Boniface	d90fb07240	Move to YAML config and allow split functions 1. Move to a YAML-based configuration format instead of the original INI-based configuration to facilitate better organization and readability. 2. Modify the daemon to be able to operate in several modes based on configuration flags. Either networking or storage functions can be disabled using the configuration, allowing the PVC system to be used only for hypervisor management if required.	2019-03-11 01:47:40 -04:00
Joshua Boniface	be37dd954b	Fix output message inconsistency	2018-12-05 23:56:20 -05:00
Joshua Boniface	3ff4e9da29	Remove some cruft	2018-11-20 21:11:23 -05:00
Joshua Boniface	a421bde679	Fix up a few more bugs	2018-11-18 17:29:35 -05:00
Joshua Boniface	1f58d61cb0	Rewrite DNSAggregatorInstance to handle DNS well Trying to directly AXFR from dnsmasq is a mess, since their zone is barely compliant with spec, it doesn't support notifies, and it is generally really messy. This implements an advanced "AXFR parser" system, which looks at the results of an AXFR from the local dnsmasq instances per-network, and updates the real replicated MariaDB pdns backend cluster with the changed data. This allows a sensible, transferable zone with its own SOA that is dynamically reconfigured as hosts come and go from the dnsmasq zone.	2018-11-18 16:45:52 -05:00
Joshua Boniface	4c1e1b4622	Make everything work with dual-stack	2018-11-14 00:26:52 -05:00
Joshua Boniface	bfbe9188ce	Finish setup of Ceph OSD addition and basic management	2018-10-29 17:51:25 -04:00
Joshua Boniface	d8796fd6d6	Move IP creation/removal to common function	2018-10-27 16:31:31 -04:00
Joshua Boniface	fd27d3f544	Add and remove dnsaggregator nets on primary change	2018-10-25 22:09:32 -04:00
Joshua Boniface	6b5fa3d50b	Move Zookeeper update out of NodeInstance and into the main Daemon	2018-10-22 21:01:59 -04:00
Joshua Boniface	bfd42b5a7b	Make primary watching happen in the daemon not the Node object	2018-10-21 22:08:23 -04:00
Joshua Boniface	187a572c13	Make a whole bunch of things work	2018-10-17 20:05:22 -04:00

55 Commits