parallelvirtualcluster/pvc - pvc

Commit Graph

Author	SHA1	Message	Date
Joshua Boniface	f30be555c1	Improve message output for logging Improve some formatting of the messages being printed to make it nicer for long-term logging.	2019-07-10 22:38:32 -04:00
Joshua Boniface	8f160abf90	Handle cancelling flushes when new ones run Store the flush_thread of a node as a class object. Before starting a new flush thread (either flush or unflush), stop the existing one if it exists to prevent further migrations, then start the new thread. Set the object to None on init and again once the task actually finishes. Remove the inflush flag as this is not required when using these threads and functionally does nothing any longer, but add the flush_stopper flag to trigger cancellation of the current job.	2019-07-10 11:54:34 -04:00
Joshua Boniface	c7c8c8bcbb	Fix bug with flush	2019-07-10 00:43:55 -04:00
Joshua Boniface	7a8aee9fe7	Remove flush locking functionality This just seemed like more trouble than it was worth. Flush locks were originally intended as a way to counteract the weird issues around flushing that were mostly fixed by the code refactoring, so this will help test if those issues are truly gone. If not, will look into a cleaner solution that doesn't result in unchangeable states.	2019-07-09 23:59:17 -04:00
Joshua Boniface	17dfaf43c5	Move hypervisor selection out to common	2019-07-09 14:20:58 -04:00
Joshua Boniface	bc54ea2449	Log message when starting or stopping API client	2019-07-08 19:29:49 -04:00
Joshua Boniface	d9ebd04264	Fix missing dom_uuid values in data reads	2019-07-07 15:30:28 -04:00
Joshua Boniface	b82ccaa84d	Improve flush handling Similar to recent client changes, don't replace the previous node record of an already-migrated VM. Wait for shutdown if required. Use a continue statement instead of a needless else block.	2019-07-07 15:27:37 -04:00
Joshua Boniface	8216125b02	Enable autostart of API client on Primary Adds a config flag that turns on the API client following the Primary coordinator. The retcode of the start/stop commands is ignored so this can fail gracefully if e.g. the client isn't installed.	2019-07-06 02:42:56 -04:00
Joshua Boniface	3e591bd09e	Remove extra whitespaces on blank lines	2019-06-25 22:33:23 -04:00
Joshua Boniface	8ef21cf9f2	Sleep longer before removing gateways 1 second was just slightly too little time to wait and packets would occasionally be lost on primary switchover. Increase this to 2 seconds to provide more time for arping to run on the new primary.	2019-05-23 22:20:38 -04:00
Joshua Boniface	595cf1782c	Switch DNS aggregator to PostgreSQL MariaDB+Galera was terribly unstable, with the cluster failing to start or dying randomly, and generally seemed incredibly unsuitable for an HA solution. This commit switches the DNS aggregator SQL backend to PostgreSQL, implemented via Patroni HA. It also manages the Patroni state, forcing the primary instance to follow the PVC coordinator, such that the active DNS Aggregator instance is always able to communicate read+write with the local system. This required some logic changes to how the DNS Aggregator worked, specifically ensuring that database changes aren't attempted while the instance isn't actively running - to be honest this was a bug anyways that had just never been noticed. Closes #34	2019-05-21 01:07:41 -04:00
Joshua Boniface	3cf573baf6	Update domainstate after unflush is complete	2019-05-11 00:55:15 -04:00
Joshua Boniface	516ea1b57c	Handle unflushes like flushes sequentially Makes an unflush a controlled event like flushing, rather than a free-for-all. This does slow down unflushing somewhat (disallowing parallelism from multiple hosts to the current host), but allows the locking to actually be effective.	2019-05-11 00:30:47 -04:00
Joshua Boniface	62a71af46e	Implement locking for unflush as well References #32	2019-05-11 00:13:03 -04:00
Joshua Boniface	9d8c886811	Correct typo in flush_lock write	2019-05-11 00:08:07 -04:00
Joshua Boniface	c19902d952	Implement flush locking for nodes Implements a locking mechanism to prevent clobbering of node flushes. When a flush begins, a global cluster lock is placed which is freed once the flush completes. While the lock is in place, other flush events queue waiting for the lock to free before proceeding. Modifies the CLI output flow when the `--wait` option is specified. First, if a lock exists when running the command, the message is tweaked to indicate this, and the client will wait first for the lock to free, and then for the flush as normal. Second, the wait depends on the active lock rather than the domain_status for consistency purposes. Closes #32	2019-05-10 23:52:24 -04:00
Joshua Boniface	0dbd1c41a9	Create floating VNI address on brcluster	2019-03-18 20:17:26 -04:00
Joshua Boniface	d90fb07240	Move to YAML config and allow split functions 1. Move to a YAML-based configuration format instead of the original INI-based configuration to facilitate better organization and readability. 2. Modify the daemon to be able to operate in several modes based on configuration flags. Either networking or storage functions can be disabled using the configuration, allowing the PVC system to be used only for hypervisor management if required.	2019-03-11 01:47:40 -04:00
Joshua Boniface	be37dd954b	Fix output message inconsistency	2018-12-05 23:56:20 -05:00
Joshua Boniface	3ff4e9da29	Remove some cruft	2018-11-20 21:11:23 -05:00
Joshua Boniface	a421bde679	Fix up a few more bugs	2018-11-18 17:29:35 -05:00
Joshua Boniface	1f58d61cb0	Rewrite DNSAggregatorInstance to handle DNS well Trying to directly AXFR from dnsmasq is a mess, since their zone is barely compliant with spec, it doesn't support notifies, and it is generally really messy. This implements an advanced "AXFR parser" system, which looks at the results of an AXFR from the local dnsmasq instances per-network, and updates the real replicated MariaDB pdns backend cluster with the changed data. This allows a sensible, transferable zone with its own SOA that is dynamically reconfigured as hosts come and go from the dnsmasq zone.	2018-11-18 16:45:52 -05:00
Joshua Boniface	4c1e1b4622	Make everything work with dual-stack	2018-11-14 00:26:52 -05:00
Joshua Boniface	bfbe9188ce	Finish setup of Ceph OSD addition and basic management	2018-10-29 17:51:25 -04:00
Joshua Boniface	d8796fd6d6	Move IP creation/removal to common function	2018-10-27 16:31:31 -04:00
Joshua Boniface	fd27d3f544	Add and remove dnsaggregator nets on primary change	2018-10-25 22:09:32 -04:00
Joshua Boniface	6b5fa3d50b	Move Zookeeper update out of NodeInstance and into the main Daemon	2018-10-22 21:01:59 -04:00
Joshua Boniface	bfd42b5a7b	Make primary watching happen in the daemon not the Node object	2018-10-21 22:08:23 -04:00
Joshua Boniface	187a572c13	Make a whole bunch of things work	2018-10-17 20:05:22 -04:00
Joshua Boniface	87d1c7513e	Add floating IPs and better termination of daemons	2018-10-17 00:23:43 -04:00
Joshua Boniface	c13a4e84af	Add DNS aggregator via PowerDNS and sqlite3	2018-10-15 21:09:40 -04:00
Joshua Boniface	a5c76c5d41	Use new-style class definitions	2018-10-14 22:14:29 -04:00
Joshua Boniface	d4e5015db4	Shorten this string	2018-10-14 03:08:11 -04:00
Joshua Boniface	f198f62563	Massive rejigger into single daemon Completely restructure the daemon code to move the 4 discrete daemons into a single daemon that can be run on every hypervisor. Introduce the idea of a static list of "coordinator" nodes which are configured at install time to run Zookeeper and FRR in router mode, and which are allowed to take on client network management duties (gateway, DHCP, DNS, etc.) while also allowing them to run VMs (i.e. no dedicated "router" nodes required).	2018-10-14 02:40:54 -04:00

35 Commits
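
The cancellation pattern described in commit 8f160abf90 (stop any running flush thread before starting a new one) can be sketched roughly as follows. This is an illustrative sketch only, not the PVC code: the class `FlushableNode`, the methods `start_flush` and `wait`, and the `migrated` list are hypothetical names, and the sleep stands in for a VM migration. Only the attribute names `flush_thread` and `flush_stopper` are taken from the commit message.

```python
import threading
import time


class FlushableNode:
    """Hypothetical stand-in for a node object; not the actual PVC class."""

    def __init__(self):
        self.flush_thread = None     # set to None on init, as the commit describes
        self.flush_stopper = False   # flag that asks the running job to stop
        self.migrated = []           # hypothetical record of "migrated" VMs

    def _run(self, vms):
        for vm in vms:
            if self.flush_stopper:
                break                # cancelled: perform no further migrations
            time.sleep(0.01)         # placeholder for the real migration work
            self.migrated.append(vm)
        self.flush_thread = None     # cleared again once the task finishes

    def start_flush(self, vms):
        old = self.flush_thread      # local copy; the worker may clear the attribute
        if old is not None:
            self.flush_stopper = True
            old.join()               # wait for the previous job to stop
            self.flush_stopper = False
        thread = threading.Thread(target=self._run, args=(vms,))
        self.flush_thread = thread   # assign before start so the worker's reset wins
        thread.start()

    def wait(self):
        thread = self.flush_thread
        if thread is not None:
            thread.join()
```

Calling `start_flush` a second time while the first job is still running stops the first job after its current migration and then runs the second job to completion.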