parallelvirtualcluster/pvc

Author	SHA1	Message	Date
Joshua M. Boniface	4d6842f942	Don't bail out if write fails, keep retrying	2021-07-19 13:09:36 -04:00
Joshua M. Boniface	6ead21a308	Handle cleanup from a failure properly	2021-07-19 12:39:13 -04:00
Joshua M. Boniface	b7c8c2ee3d	Fix handling of this_node and d_domain in cleanup	2021-07-19 12:36:35 -04:00
Joshua M. Boniface	d48f58930b	Use harder exits and add cleanup termination	2021-07-19 12:27:16 -04:00
Joshua M. Boniface	7c36388c8f	Add post-networking delay and adjust daemon delay	2021-07-19 12:23:45 -04:00
Joshua M. Boniface	e9df043c0a	Ensure ZK logging does not block startup	2021-07-19 12:19:59 -04:00
Joshua M. Boniface	71e4d0b32a	Bump version to 0.9.28 (tag: v0.9.28)	2021-07-19 09:29:34 -04:00
Joshua M. Boniface	f16bad4691	Revamp confirmation options for vm modify Before, "-y"/"--yes" only confirmed the reboot portion. Instead, modify this to confirm both the diff portion and the restart portion, and add separate flags to bypass one or the other independently, ensuring the administrator has lots of flexibility. UNSAFE mode implies "-y" so both would be auto-confirmed if that option is set.	2021-07-19 00:25:43 -04:00
Joshua M. Boniface	15d92c483f	Bump version to 0.9.27 (tag: v0.9.27)	2021-07-19 00:03:40 -04:00
Joshua M. Boniface	7dd17e71e7	Fix bug with VM editing with file Current config is needed for the diff but it was in a conditional.	2021-07-19 00:02:19 -04:00
Joshua M. Boniface	5be968123f	Readd 1 second queue get timeout Otherwise daemon stops will sometimes inexplicably block. (tag: v0.9.26)	2021-07-18 22:17:57 -04:00
Joshua M. Boniface	99fd7ebe63	Fix excessive CPU due to looping	2021-07-18 22:06:50 -04:00
Joshua M. Boniface	cffc96d156	Fix failure in creating base keys	2021-07-18 21:00:23 -04:00
Joshua M. Boniface	602093029c	Bump version to 0.9.26	2021-07-18 20:49:52 -04:00
Joshua M. Boniface	bd7a773d6b	Add node log following functionality	2021-07-18 20:37:53 -04:00
Joshua M. Boniface	8d671b3422	Add some tag tests to test-cluster.sh	2021-07-18 20:37:37 -04:00
Joshua M. Boniface	2358ad6bbe	Reduce the number of lines per call 500 was a lot every half second; 200 seems more reasonable. Even a fast kernel boot should generate < 200 lines in half a second.	2021-07-18 20:23:45 -04:00
Joshua M. Boniface	a0e9b57d39	Increase log line frequency	2021-07-18 20:19:59 -04:00
Joshua M. Boniface	2d48127e9c	Use even better/faster set comparison	2021-07-18 20:18:35 -04:00
Joshua M. Boniface	55f2b00366	Add some spaces for better readability	2021-07-18 20:18:23 -04:00
Joshua M. Boniface	ba257048ad	Improve output formatting of node logs	2021-07-18 20:06:08 -04:00
Joshua M. Boniface	b770e15a91	Fix final termination of logger We need to do a bit more finagling with the logger on termination to ensure that all messages are written and the queue drained before actually terminating.	2021-07-18 19:53:00 -04:00
Joshua M. Boniface	e23a65128a	Remove del of logger item	2021-07-18 19:03:47 -04:00
Joshua M. Boniface	982dfd52c6	Adjust date output format	2021-07-18 19:00:54 -04:00
Joshua M. Boniface	3a2478ee0c	Cleanly terminate logger on cleanup	2021-07-18 18:57:44 -04:00
Joshua M. Boniface	a088aa4484	Add node log functions to API and CLI	2021-07-18 18:54:28 -04:00
Joshua M. Boniface	323c7c41ae	Implement node logging into Zookeeper Adds the ability to send node daemon logs to Zookeeper to facilitate a command like "pvc node log", similar to "pvc vm log". Each node stores its logs in a separate tree under "/logs" which can then be combined or queried. By default, set by config, only 2000 lines are kept.	2021-07-18 17:11:43 -04:00
Joshua M. Boniface	cd1db3d587	Ensure node name is part of config	2021-07-18 16:38:58 -04:00
Joshua M. Boniface	401f102344	Add serial BIOS to default libvirt schema	2021-07-15 10:45:14 -04:00
Joshua M. Boniface	4ac020888b	Add some tag tests to test-cluster.sh	2021-07-14 15:02:03 -04:00
Joshua M. Boniface	8f3b68d48a	Mention multiple option for tags in VM define	2021-07-14 01:12:10 -04:00
Joshua M. Boniface	6d4c26c8d8	Don't show tag line in info if no tags	2021-07-14 00:59:24 -04:00
Joshua M. Boniface	75fb60b1b4	Add VM list filtering by tag Uses same method as state or node filtering, rather than altering how the main LIMIT field works.	2021-07-14 00:59:20 -04:00
Joshua M. Boniface	9ea9ac3b8a	Revamp tag handling and display Add an additional protected class, limit manipulation to one at a time, and ensure future flexibility. Also makes display consistent with other VM elements.	2021-07-13 22:39:52 -04:00
Joshua M. Boniface	27f1758791	Add tags manipulation to API Also fixes some checks for Metadata too since these two actions are almost identical, and adds tags to define endpoint.	2021-07-13 19:05:33 -04:00
Joshua M. Boniface	c0a3467b70	Simplify VM metadata reads Directly call the new common getDomainMetadata function to avoid excessive Zookeeper calls for this information.	2021-07-13 19:05:33 -04:00
Joshua M. Boniface	9a199992a1	Add functions for manipulating VM tags Adds tags to schema (v3), to VM definition, adds function to modify tags, adds function to get tags, and adds tags to VM data output. Tags will enable more granular classification of VMs based either on administrator configuration or from automated system events.	2021-07-13 19:05:33 -04:00
Joshua M. Boniface	c6d552ae57	Rework success checks for IPMI fencing Previously, if the node failed to restart, it was declared a "bad fence" and no further action would be taken. However, there are some situations, for instance critical hardware failures, where intelligent systems will not attempt (or succeed at) starting up the node, which would result in dead, known-offline nodes without recovery. Tweak this behaviour somewhat. The main path of Reboot -> Check On -> Success + fence-flush is retained, but some additional side-paths are now defined: 1. We attempt to power "on" the chassis 1 second after the reboot, just in case it is off and can be recovered. We then wait another 2 seconds and check the power status (as we did before). 2. If the reboot succeeded, follow this series of choices: a. If the chassis is on, the fence succeeded. b. If the chassis is off, the fence "succeeded" as well. c. If the chassis is in some other state, the fence failed. 3. If the reboot failed, follow this series of choices: a. If the chassis is off, the fence itself failed, but we can treat it as "succeeded" since the chassis is in a known-offline state. This is the most likely situation when there is a critical hardware failure, and the server's IPMI does not allow itself to start back up again. b. If the chassis is in any other state ("on" or unknown), the fence itself failed and we must treat this as a fence failure. Overall, this should alleviate the aforementioned issue, where a critical failure that leaves the node persistently "off" does not trigger a fence-flush, and make fencing more robust.	2021-07-13 17:54:41 -04:00
Joshua M. Boniface	2e9f6ac201	Bump version to 0.9.25 (tag: v0.9.25)	2021-07-11 23:19:09 -04:00
Joshua M. Boniface	f09849bedf	Don't overwrite shutdown state on termination Just a minor quibble and not really impactful.	2021-07-11 23:18:14 -04:00
Joshua M. Boniface	8c975e5c46	Add chroot context manager example to debootstrap Closes #132	2021-07-11 23:10:41 -04:00
Joshua M. Boniface	c76149141f	Only log ZK connections when persistent Prevents spam in the API logs.	2021-07-10 23:35:49 -04:00
Joshua M. Boniface	f00c4d07f4	Add date output to keepalive Helps track when there is a log follow in "-o cat" mode.	2021-07-10 23:24:59 -04:00
Joshua M. Boniface	20b66c10e1	Move two more commands to Rados library	2021-07-10 17:28:42 -04:00
Joshua M. Boniface	cfeba50b17	Revert "Return to all command-based Ceph gathering" This reverts commit `65d14ccd92`. This was actually a bad idea. For inexplicable reasons, running these Ceph commands manually (not even via Python, but in a normal shell) takes over 600 times longer (nearly three orders of magnitude) than running them with the Rados module, so long in fact that some basic commands like "ceph health" would sometimes take longer than the 1 second timeout to complete. The Rados commands would however take about 1ms instead. Despite the occasional issues when monitors drop out, the Rados module is clearly far superior to the shell commands for any moderately-loaded Ceph cluster. We can look into solving timeouts another way (perhaps with Processes instead of Threads) at a later time. Rados module "ceph health": b'{"checks":{},"status":"HEALTH_OK"}' 0.001204 (s) b'{"checks":{},"status":"HEALTH_OK"}' 0.001258 (s) Command "ceph health": joshua@hv1.c.bonilan.net ~ $ time ceph health >/dev/null real 0m0.772s user 0m0.707s sys 0m0.046s joshua@hv1.c.bonilan.net ~ $ time ceph health >/dev/null real 0m0.796s user 0m0.728s sys 0m0.054s	2021-07-10 03:47:45 -04:00
Joshua M. Boniface	0699c48d10	Fix bad schema path name (tag: v0.9.24)	2021-07-09 16:47:09 -04:00
Joshua M. Boniface	551bae2518	Bump version to 0.9.24	2021-07-09 15:58:36 -04:00
Joshua M. Boniface	4832245d9c	Handle non-RBD disks and non-RBD errors better	2021-07-09 15:48:57 -04:00
Joshua M. Boniface	2138f2f59f	Fail VM removal on disk removal failures Prevents bad states where the VM is "removed" but some of its disks remain due to e.g. stuck watchers. Rearrange the sequence so it goes stop, delete disks, then delete VM, and then return a failure if any of the disk(s) fail to remove, allowing the task to be rerun after fixing the problem.	2021-07-09 15:39:06 -04:00
Joshua M. Boniface	d1d355a96b	Avoid errors if stats data is None	2021-07-09 13:13:54 -04:00
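
The fence decision rules described in commit c6d552ae57 can be read as a small truth table. The following is a minimal sketch of that table only, not the project's actual code: the names `ChassisPower` and `fence_succeeded` are hypothetical, and the IPMI calls, the power-on attempt, and the 1 second and 2 second waits are left out.

```python
from enum import Enum


class ChassisPower(Enum):
    """Chassis power state as reported after the reboot attempt (hypothetical type)."""
    ON = "on"
    OFF = "off"
    UNKNOWN = "unknown"


def fence_succeeded(reboot_ok: bool, power: ChassisPower) -> bool:
    """Return True if the fence may be treated as successful.

    Mirrors the rules in the commit message:
      - reboot succeeded: "on" or "off" counts as success, anything else fails
      - reboot failed: only a known-offline ("off") chassis counts as success
    """
    if reboot_ok:
        return power in (ChassisPower.ON, ChassisPower.OFF)
    return power is ChassisPower.OFF


if __name__ == "__main__":
    # Print the full decision table for inspection.
    for reboot_ok in (True, False):
        for power in ChassisPower:
            print(reboot_ok, power.value, fence_succeeded(reboot_ok, power))
```

Keeping the decision as a pure function of the two observed values makes the side-paths easy to check in isolation. How the result is then used (for example, triggering the fence-flush) is outside this sketch.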

2646 Commits