parallelvirtualcluster/pvc

Author	SHA1	Message	Date
Joshua M. Boniface	1671a87dd4	Fix the flush service	2020-01-11 17:04:12 -05:00
Joshua M. Boniface	b6474198a4	Implement cluster maintenance mode Implements a "maintenance mode" for PVC clusters. For now, the only thing this mode does is disable node fencing while the state is true. This allows the administrator to tell PVC that network connectivity, etc. might be interrupted and to avoid fencing nodes. Closes #70	2020-01-09 10:53:27 -05:00
Joshua M. Boniface	4e5bce4975	Update copyright header year to 2020	2020-01-08 19:38:02 -05:00
Joshua M. Boniface	c515d63340	Add provision state for VMs	2020-01-08 17:40:02 -05:00
Joshua M. Boniface	21d87f5e51	Add v6 configurations to dnsmasq These options were only applied with v4 networks; now, use the v6 address in a dual-stack or v6-only network.	2020-01-06 23:48:04 -05:00
Joshua M. Boniface	f326fd99e2	Properly fix IPv4 no-DHCP networking	2020-01-06 22:31:37 -05:00
Joshua M. Boniface	38dae8b32f	Change name of cluster in patronictl command	2020-01-06 16:37:17 -05:00
Joshua M. Boniface	2d2bdb879e	Use get() instead of direct dict reference	2020-01-06 16:34:39 -05:00
Joshua M. Boniface	30d4470c8f	Only print AXFR errors in debug mode	2020-01-06 16:04:37 -05:00
Joshua M. Boniface	bbfadac5e1	Fix dnsmasq options for DHCP-disabled networks	2020-01-06 16:04:26 -05:00
Joshua M. Boniface	7b3e267f7a	Implement bridge_device for bridged VNIs Required due to #64. Bridged networks were being created on top of a vLAN if the Cluster network was a vLAN device, rather than being created on the underlying device. This came from a previous revision of the cluster architecture guidelines where Cluster was supposed to be a raw device rather than a vLAN. This fixed the problem by implementing a configuration field for a "bridge_device", a NIC device that can then have the bridged vLANs created on top of it. Fixes #64	2020-01-06 14:44:56 -05:00
Joshua M. Boniface	094ac8c3a8	Ensure stdout is used	2020-01-06 12:34:35 -05:00
Joshua M. Boniface	13548b791d	Add additional debugging and fix pool_idx loop var	2020-01-06 11:31:22 -05:00
Joshua M. Boniface	e7bc4f7328	Handle empty None-type hostname	2020-01-05 22:46:56 -05:00
Joshua M. Boniface	be20ba02a7	Handle VM states in flush more accurately We don't want to block forever on a failure, so limit valid waiting states to just those we know it should be in during a migration.	2020-01-05 15:21:16 -05:00
Joshua M. Boniface	7311fa561b	Fix bad join with new table name	2020-01-04 15:17:27 -05:00
Joshua M. Boniface	bf89050e8b	Update userdata table name	2020-01-04 15:10:37 -05:00
Joshua M. Boniface	20ae2186f9	Run VM state actions in a thread Prevents blocking the main thread(s) while a VM is changing state. In particular, this caused some issues with nodes not responding to cancellation/reversal of a flush/ready state until the previous migration was finished, which could cause issues. This entire subset of actions is now threaded and so can run on its own in the background.	2019-12-26 11:08:16 -05:00
Joshua M. Boniface	b3483fa810	Add explicit returns from flush/ready threads	2019-12-26 11:08:00 -05:00
Joshua M. Boniface	47cf0a8006	Ensure migration out occurs	2019-12-25 21:11:02 -05:00
Joshua M. Boniface	77db36a891	Ensure migration out occurs	2019-12-25 21:02:46 -05:00
Joshua M. Boniface	9a39d739e8	Ensure flush_thread is emptied	2019-12-25 20:29:17 -05:00
Joshua M. Boniface	a66b834ae4	Fix several small bugs	2019-12-19 18:58:53 -05:00
Joshua M. Boniface	b17b7bf22b	Add black magic to minimize ping losses This particular arping interval/count, along with forcing it to run in the foreground, seems to minimize the packet loss when the primary coordinator transitions. In extensive testing, this combination consistently gave the least loss: 1-2 pings, at a 0.025s ping interval, return "TTL exceeded", with no other loss, and only when the node hosting the test VM is the one switching to secondary state. No other combination of values here, nor tweaks to other parts of the code, seems able to reduce this further, so this is likely the best configuration possible.	2019-12-19 18:57:32 -05:00
Joshua M. Boniface	8c252aeecc	Implemented coordinated locked node transitions The previous method was a "throw it in the sea"-type migration with some (very arbitrary) sleep statements thrown in for good measure. Reimplement this with some hard locking. During each phase of the transition, the nodes acquire read/write shared locks to a Zookeeper key so that they can tightly coordinate the actions of transferring each part of the primary state between them. This is done in a subthread to prevent strange blocking issues that were encountered, likely due to busyness in the existing main thread.	2019-12-19 10:56:34 -05:00
Joshua M. Boniface	0841ddf8b0	Handle integrity errors in DNS aggregator	2019-12-19 10:45:06 -05:00
Joshua M. Boniface	98764f1edd	Clean up some aspects of node switchover	2019-12-18 21:39:40 -05:00
Joshua M. Boniface	23188199cb	Handle failing Patroni events more gracefully	2019-12-18 21:12:22 -05:00
Joshua M. Boniface	2b1b78622e	Fix invalid arping option It made little difference and didn't error, but was incorrect.	2019-12-18 12:06:40 -05:00
Joshua M. Boniface	364ab10673	Add slight delay when stopping the metadata API	2019-12-18 11:56:04 -05:00
Joshua Boniface	39c9f911cc	Increase arping interval to 0.2s	2019-12-15 14:55:34 -05:00
Joshua Boniface	686af31c08	Reduce arping interval to 0.1s	2019-12-15 12:30:45 -05:00
Joshua Boniface	0a94fac407	Fix bugs around passing master Was not passing properly and getting stuck sometimes, so modify the checking and route creation a bit to prevent it. Seems to work.	2019-12-15 00:08:18 -05:00
Joshua Boniface	b3e21a5bf8	Integrate metadata API into node daemon	2019-12-14 16:41:01 -05:00
Joshua Boniface	8c36e7618a	Modify node daemon to follow API	2019-12-14 14:13:26 -05:00
Joshua Boniface	78f053d81f	Recreate network in aggregator if DNS changes	2019-12-13 00:03:47 -05:00
Joshua Boniface	0a8dd30a48	Restart dnsmasq when network details change	2019-12-12 23:51:22 -05:00
Joshua Boniface	6fa828e721	Don't stop the provisioner worker It should probably just be running on all nodes all the time already, but is started when a node first becomes primary.	2019-12-12 23:08:02 -05:00
Joshua Boniface	c1b6ce0ff7	Reorder starting clients	2019-12-12 23:03:34 -05:00
Joshua Boniface	b854d53fab	Add API management to node daemon	2019-12-12 22:59:07 -05:00
Joshua Boniface	88a181b20d	Allow metadata API in nft rules	2019-12-11 17:04:29 -05:00
Joshua Boniface	1fb560e996	Add DNS nameservers to networks	2019-12-08 23:55:45 -05:00
Joshua Boniface	9cb5561e77	Move default NS record to upstream_domain	2019-12-08 23:05:32 -05:00
Joshua Boniface	3471f4e57a	Remove obsolete pvc-nsX and add pvc-ns name Should point towards the floating IP.	2019-12-08 20:20:20 -05:00
Joshua Boniface	356c12db2e	Add ceph df output to pool data Allows additional information visible in the `ceph df` command, including pool free space and used percentage.	2019-12-06 00:47:27 -05:00
Joshua Boniface	531578fd28	Use consistent tense for VM states Replace "failed" with "fail" and "disabled" with "disable" for consistency with the remaining states.	2019-10-23 23:57:59 -04:00
Joshua M. Boniface	040ca33683	Clean up handling of OSD dump command	2019-10-22 12:51:29 -04:00
Joshua M. Boniface	190623bdd9	Use empty string for node limit	2019-10-22 12:32:14 -04:00
Joshua M. Boniface	f0e0a38a20	Fix bug in config element retrieval	2019-10-22 12:30:23 -04:00
Joshua Boniface	237a37015d	Set upstream IP in key if changed	2019-10-21 16:50:41 -04:00
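
For readers unfamiliar with the change described in b6474198a4 (maintenance mode disables node fencing while the state is true), the following is a minimal sketch of that kind of gate. It is not taken from the PVC source: the function name `should_fence`, the dictionary keys `maintenance` and `daemon_state`, and the `"dead"` state value are all hypothetical and chosen only for illustration. The lookups use `dict.get()` with a default, in the spirit of 2d2bdb879e, so a missing key does not raise `KeyError`.

```python
def should_fence(node_state: dict, cluster_state: dict) -> bool:
    """Decide whether a node is a fencing candidate (illustrative only).

    All key names and state values here are assumptions, not PVC's
    actual data model.
    """
    # While maintenance mode is on, never fence, regardless of node state.
    # get() returns the default instead of raising KeyError if the key
    # has not been set yet.
    if cluster_state.get("maintenance", False):
        return False

    # Outside maintenance mode, fence only nodes reported as dead.
    return node_state.get("daemon_state") == "dead"
```

With this sketch, `should_fence({"daemon_state": "dead"}, {"maintenance": True})` returns `False`, while the same node with maintenance off (or with the key absent) would be considered for fencing.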

466 Commits