parallelvirtualcluster/pvc

Author	SHA1	Message	Date
Joshua M. Boniface	b50b2a827b	Add forced delays after pool add/remove Prevents returning immediately to give the cluster some breathing room before the admin can do other commands. Keep the write lock as well to prevent other clients from attempting this as well.	2019-06-18 21:56:24 -04:00
Joshua M. Boniface	537ad5de43	Make ceph pool removal confirmation verbose	2019-06-18 21:51:17 -04:00
Joshua M. Boniface	ee73676114	Fix bug with pool removal	2019-06-18 21:51:11 -04:00
Joshua M. Boniface	264c2d4748	Fix broken prompting for pool removal	2019-06-18 21:33:39 -04:00
Joshua M. Boniface	2bbbda3da5	Only trigger pool updates on primary	2019-06-18 21:26:05 -04:00
Joshua M. Boniface	612f5ab52c	Strip pv_block from stdout	2019-06-18 20:34:25 -04:00
Joshua M. Boniface	1622226c32	Add more logging during OSD creation/deletion	2019-06-18 20:31:04 -04:00
Joshua M. Boniface	3adeef6fdd	Use the fsid to activate new OSDs	2019-06-18 20:22:28 -04:00
Joshua M. Boniface	443108f53d	Add support for enable/disable keepalive detail	2019-06-18 19:54:42 -04:00
Joshua M. Boniface	79f284a0a9	Pass logger into run_command	2019-06-18 13:45:59 -04:00
Joshua M. Boniface	080ca3201c	Correct actual problem with this_node	2019-06-18 13:43:54 -04:00
Joshua M. Boniface	d076f9f4eb	Use self.this_node everywhere	2019-06-18 13:25:16 -04:00
Joshua M. Boniface	aee078f3eb	Support disabling keepalive logging	2019-06-18 12:44:07 -04:00
Joshua M. Boniface	b0411e8e1a	Remove "error" message from Ceph commands This triggers at every node start and isn't useful.	2019-06-18 12:41:38 -04:00
Joshua M. Boniface	8d9007f697	Remove OSD stat collection if count is zero Otherwise, ceph osd df will hang indefinitely trying to get data for the zero OSDs.	2019-06-18 12:36:53 -04:00
Joshua M. Boniface	5a327dc41a	Clean up Ceph pipeline and add more debug logs	2019-06-18 11:19:03 -04:00
Joshua M. Boniface	46a416bc78	Use a proper variable for vni_mtu	2019-06-18 00:01:12 -04:00
Joshua M. Boniface	1f92b90a3e	Don't encode initial data as we're using zkhandler	2019-06-17 23:53:16 -04:00
Joshua M. Boniface	d4ebe63d9b	Rename network device field It seems much nicer and more consistent as "device" rather than as "name".	2019-06-17 23:44:41 -04:00
Joshua M. Boniface	1d3f868206	Unify network devices and addresses in config The old way of doing this was a little cumbersome, with an upper YAML tree split between "devices" (name and MTU) and addresses. This commit unifies these under the root "networking" section to make this section clearer.	2019-06-17 23:41:07 -04:00
Joshua M. Boniface	e70255dbd6	Support configurable interface MTUs MTUs were hardcoded at 9000, which breaks if the underlying interface or network switch does not support jumbo frames, a possible deployment limitation. This has non-obvious consequences due to MTU mismatches for certain services (Ceph, Zookeeper, etc.). This commit adds support for configurable MTUs for each interface, set in pvcd.yaml. The example has been updated to reflect this, with a default of 1500 (the Ethernet standard). This commit also adds autoconfiguration of the VNI device MTU based on the `vni_mtu` value, the same for bridge networks and minus 50 (rather than 200 from the hardcoded value, based on the following resource [1]) for VXLAN networks. [1] http://ipengineer.net/2014/06/vxlan-mtu-vs-ip-mtu-consideration/	2019-06-17 23:34:48 -04:00
Joshua M. Boniface	c583ee1709	Revert "Wait a little longer" This reverts commit `bd7a55e9e1`. This is not really needed, but do keep the 5s wait	2019-06-17 21:56:06 -04:00
Joshua M. Boniface	bd7a55e9e1	Wait a little longer	2019-06-17 12:14:13 -04:00
Joshua M. Boniface	23994f8a11	Increase wait time for daemons and log message	2019-06-17 10:30:46 -04:00
Joshua M. Boniface	fe654aa5a2	Correct typo in daemon	2019-06-16 19:27:20 -04:00
Joshua M. Boniface	3ba3c339a7	Show vCPU count on CLI output Showing the static, total number of CPUs was pointless. Instead, show the number of allocated vCPUs. To preserve space, no longer show the host CPU count in the list.	2019-06-02 22:30:26 -04:00
Joshua M. Boniface	45da4e3f9a	Remove backup file	2019-05-30 21:59:56 -04:00
Joshua M. Boniface	7596e3c3b5	Add missing number	2019-05-28 23:41:31 -04:00
Joshua M. Boniface	b7beea2692	Fix some typos and poor wordings	2019-05-28 20:17:45 -04:00
Joshua M. Boniface	2a6157521d	Reorganize documentation	2019-05-28 20:04:55 -04:00
Joshua M. Boniface	b9774bdf03	Increase wait sleeps in node flush/unflush	2019-05-26 23:21:01 -04:00
Joshua M. Boniface	14e9ba892c	Wait on both sides for 30s Still finding issues with the flush	2019-05-24 01:23:18 -04:00
Joshua M. Boniface	703e34e8ea	Remove disable of pvc-flush Since it isn't re-enabled and this makes life difficult, don't disable the pvc-flush service if it was enabled.	2019-05-23 23:47:57 -04:00
Joshua M. Boniface	ae37afcf75	Wait 10 seconds when starting pvc-flush Without waiting the unflush will trigger too soon, before the daemon is fully ready, and as such it fails in odd ways.	2019-05-23 23:35:01 -04:00
Joshua M. Boniface	e8b666708c	Add one final keepalive update before exiting	2019-05-23 23:23:03 -04:00
Joshua M. Boniface	4c5ce9b995	Perform additional tweaks to units Use RemainAfterExit to prevent pvc-flush from auto-stopping immediately. Use PartOf to tie services to the target itself. Use --wait on flush to avoid daemon stopping before flush is complete.	2019-05-23 23:18:28 -04:00
Joshua M. Boniface	e46aa22989	Remove invalid Restart in pvc-flush.service	2019-05-23 22:51:36 -04:00
Joshua M. Boniface	0421f5cac8	Make the informational messages stand out	2019-05-23 22:49:00 -04:00
Joshua M. Boniface	7c6132f7dd	Add node autoflush service and target Add a systemd service to manage node flush/unflush, useful during system startup and shutdown to avoid requiring administrator intervention for this to occur. This is optional and the service is not enabled by default, and the postinst script informs the administrator of this. Also adds a systemd target to collect the two service units together and provide an easy way to flush+shutdown or startup+unflush the entire PVC system. Closes #28	2019-05-23 22:42:51 -04:00
Joshua M. Boniface	69462d2c7b	Ensure myhostname is short PVC now uses shortnames for node names, so ensure this is reflected in the default choices for some node-level commands.	2019-05-23 22:27:34 -04:00
Joshua M. Boniface	8ef21cf9f2	Sleep longer before removing gateways 1 second was just slightly too little time to wait and packets would occasionally be lost on primary switchover. Increase this to 2 seconds to provide more time for arping to run on the new primary.	2019-05-23 22:20:38 -04:00
Joshua M. Boniface	d59280d829	Update dependencies for Postgres	2019-05-22 21:57:06 -04:00
Joshua M. Boniface	8881b97e8b	Correct a missing capitalization	2019-05-21 23:19:19 -04:00
Joshua M. Boniface	4bfbbaa7d9	Remove commented needless call	2019-05-21 23:08:28 -04:00
Joshua M. Boniface	3893666507	Improve performance by removing spurious actions 1. Remove a number of time.sleep commands which don't really seem necessary any longer and which significantly increased the startup time while parsing the VM list. 2. Handle some variable sets during initialization of the object, rather than waiting for a management command, enabling... 3. Know when a state change, and the corresponding Libvirt lookup, is unnecessary due to the target node not matching the current node. This also removes a number of unremovable errors from Libvirt on the console which were annoying. This reduces the total time taken by the VM startup segment (lines 760-762 of Daemon.py) from 17.117s down to 0.976s for 82 VMs.	2019-05-21 22:56:40 -04:00
Joshua M. Boniface	6fd4710f7f	Remove bad replacement	2019-05-21 19:51:23 -04:00
Joshua M. Boniface	79d0a2eafc	Handle raw sorting properly with new list format	2019-05-21 14:44:45 -04:00
Joshua M. Boniface	595cf1782c	Switch DNS aggregator to PostgreSQL MariaDB+Galera was terribly unstable, with the cluster failing to start or dying randomly, and generally seemed incredibly unsuitable for an HA solution. This commit switches the DNS aggregator SQL backend to PostgreSQL, implemented via Patroni HA. It also manages the Patroni state, forcing the primary instance to follow the PVC coordinator, such that the active DNS Aggregator instance is always able to communicate read+write with the local system. This required some logic changes to how the DNS Aggregator worked, specifically ensuring that database changes aren't attempted while the instance isn't actively running - to be honest this was a bug anyways that had just never been noticed. Closes #34	2019-05-21 01:07:41 -04:00
Joshua M. Boniface	73443ecbaf	Update vm.py to allow API use	2019-05-20 22:15:28 -04:00
Joshua Boniface	9e806d30f9	Only stop log parser if it's actually running	2019-05-11 12:09:42 -04:00
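
The MTU rule described in commit e70255dbd6 above (bridge networks use the `vni_mtu` value as-is, VXLAN networks use it minus 50) can be sketched as follows. This is only an illustration under stated assumptions, not the code from the commit: the function name `vni_device_mtu`, the constant `VXLAN_OVERHEAD`, and the network type labels `"bridge"` and `"vxlan"` are hypothetical.

```python
# Hypothetical sketch of the MTU rule from commit e70255dbd6.
# None of these names are taken from the PVC source.

VXLAN_OVERHEAD = 50  # bytes subtracted for VXLAN networks, per the commit message


def vni_device_mtu(vni_mtu: int, network_type: str) -> int:
    """Return the MTU to set on a network device for the given VNI MTU."""
    if network_type == "bridge":
        # Bridge networks use the VNI device MTU unchanged.
        return vni_mtu
    if network_type == "vxlan":
        # VXLAN networks leave room for the encapsulation headers.
        return vni_mtu - VXLAN_OVERHEAD
    raise ValueError(f"unknown network type: {network_type!r}")


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, vni_device_mtu(mtu, "bridge"), vni_device_mtu(mtu, "vxlan"))
```

With the default of 1500 mentioned in the commit, a VXLAN network would get 1450 under this rule.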

1068 Commits