parallelvirtualcluster/pvc

Author	SHA1	Message	Date
Joshua M. Boniface	709c9cb73e	Pause pvchealthd startup until node daemon is run If the health daemon starts too soon during a node bootup, it will generate tons of erroneous faults while the node starts up. Adds a conditional wait for the current node daemon to be in "run" state before the health daemon really starts up.	2023-12-13 14:53:54 -05:00
Joshua M. Boniface	f41c5176be	Ensure health value is an int properly	2023-12-13 14:34:02 -05:00
Joshua M. Boniface	38e43b46c3	Update health detail messages format	2023-12-13 03:17:47 -05:00
Joshua M. Boniface	ed9c37982a	Move metric collection into daemon library	2023-12-11 19:20:30 -05:00
Joshua M. Boniface	0f24184b78	Explicitly clear resources of fenced node This actually solves the bug originally "fixed" in `5f1432ccdd` without breaking VM resource allocations for working nodes.	2023-12-11 12:14:56 -05:00
Joshua M. Boniface	1ba37fe33d	Restore VM resource allocation location Commit `5f1432ccdd` changed where these happen due to a bug after fencing. However this completely broke node resource reporting as only the final instance will be queried here. Revert this change and look further into the original bug.	2023-12-11 11:52:59 -05:00
Joshua M. Boniface	1a05077b10	Fix missing fstring	2023-12-11 11:29:49 -05:00
Joshua M. Boniface	57c28376a6	Port one final Ceph function to read_many	2023-12-11 10:25:36 -05:00
Joshua M. Boniface	e781d742e6	Fix bug with volume and snapshot listing	2023-12-11 10:21:46 -05:00
Joshua M. Boniface	6c6d1508a1	Add VNC info to screenshots	2023-12-11 03:40:49 -05:00
Joshua M. Boniface	741dafb26b	Port VM functions to read_many	2023-12-11 03:34:36 -05:00
Joshua M. Boniface	032d3ebf18	Remove debug output from image	2023-12-11 03:23:10 -05:00
Joshua M. Boniface	5d9e83e8ed	Fix output bugs in VM information	2023-12-11 03:04:46 -05:00
Joshua M. Boniface	ad0bd8649f	Finish missing sentence	2023-12-11 02:39:39 -05:00
Joshua M. Boniface	9b5e53e4b6	Add Grafana dashboard screenshot	2023-12-11 00:39:24 -05:00
Joshua M. Boniface	9617660342	Update Prometheus Grafana dashboard	2023-12-11 00:23:08 -05:00
Joshua M. Boniface	ab0a1e0946	Update and streamline README and update images	2023-12-10 23:57:01 -05:00
Joshua M. Boniface	7c116b2fbc	Ensure node health value is an int	2023-12-10 23:56:50 -05:00
Joshua M. Boniface	1023c55087	Fix bug in VM state list	2023-12-10 23:44:01 -05:00
Joshua M. Boniface	9235187c6f	Port Ceph functions to read_many Only ports getOSDInformation, as all the others feature 3 or fewer reads, which is acceptable sequentially.	2023-12-10 22:24:38 -05:00
Joshua M. Boniface	0c94f1b4f8	Port Network functions to read_many	2023-12-10 22:19:21 -05:00
Joshua M. Boniface	44a4f0e1f7	Use new info detail output instead of new lists Avoids multiple additional ZK calls by using data that is now in the status detail output.	2023-12-10 22:19:09 -05:00
Joshua M. Boniface	5d53a3e529	Add state and faults detail to cluster information We already parse this information out anyways, so might as well add it to the API output JSON. This can be leveraged by the Prometheus endpoint as well to avoid duplicate listings.	2023-12-10 17:29:32 -05:00
Joshua M. Boniface	35e22cb50f	Simplify cluster status handling This significantly simplifies cluster state handling by removing most of the superfluous get_list() calls, replacing them with basic child reads since most of them are just for a count anyways. The ones that require states simplify this down to a child read plus direct reads for the exact items required while leveraging the new read_many() function.	2023-12-10 17:05:46 -05:00
Joshua M. Boniface	a3171b666b	Split node health into separate function	2023-12-10 16:52:10 -05:00
Joshua M. Boniface	48e41d7b05	Port Faults getFault and getAllFaults to read_many	2023-12-10 16:05:16 -05:00
Joshua M. Boniface	d6aecf195e	Port Node getNodeInformation to read_many	2023-12-10 15:53:28 -05:00
Joshua M. Boniface	9329784010	Implement async ZK read function Adds a function, "read_many", which can take in multiple ZK keys and return the values from all of them, using asyncio to avoid reading sequentially. Initial tests show a marked improvement in read performance of multiple read()-heavy functions (e.g. "get_list()" functions) with this method.	2023-12-10 15:35:40 -05:00
Joshua M. Boniface	9dc5097dbc	Bump version to 0.9.85 v0.9.85	2023-12-10 01:00:33 -05:00
Joshua M. Boniface	5776cb3a09	Remove Prometheus client dependencies We don't actually use this (yet!) so remove the dependency for now.	2023-12-10 00:58:09 -05:00
Joshua M. Boniface	53d632f283	Fix bug in example PVC Grafana dashboard	2023-12-10 00:50:05 -05:00
Joshua M. Boniface	7bc0760b78	Add time to "starting keepalive" message Matches the pvchealthd output and provides a useful message detail to this otherwise contextless message.	2023-12-10 00:40:32 -05:00
Joshua M. Boniface	9aee2a9075	Bump version to 0.9.84 v0.9.84	2023-12-09 23:05:40 -05:00
Joshua M. Boniface	8f0ae3e2dd	Fix config file for database migrations	2023-12-09 22:51:54 -05:00
Joshua M. Boniface	946d3eaf43	Add wait after stopping VM	2023-12-09 18:14:03 -05:00
Joshua M. Boniface	1f6347d24b	Add Prometheus monitoring examples	2023-12-09 17:42:51 -05:00
Joshua M. Boniface	e8552b471b	Require at least one FAULT_ID	2023-12-09 17:31:56 -05:00
Joshua M. Boniface	fc443a323b	Allow ack/delete of multiple faults at once	2023-12-09 17:28:13 -05:00
Joshua M. Boniface	b0557edb76	Ensure entry in name is uppercase	2023-12-09 17:01:41 -05:00
Joshua M. Boniface	47bd7bf2f5	Only run cluster-wide health checks on primary Avoids multiple coordinators trying to write updated cluster-wide fault events. Instead, they are now only written by the primary (or the incoming primary if still in a transition).	2023-12-09 16:50:51 -05:00
Joshua M. Boniface	b9fbfe2ed5	Improve fault ID format Instead of using random hex characters from an md5sum, use a nice name in all-caps similar to how Ceph does. This further helps prevent dupes but also permits a changing health delta within a single event (which would really only ever apply to plugin faults).	2023-12-09 16:48:14 -05:00
Joshua M. Boniface	764e3e3722	Fix bug in fault header format	2023-12-09 16:47:56 -05:00
Joshua M. Boniface	7e6d922877	Improve fault detail handling further Since we already had a "details" field, simply move where it gets added to the message later, in generate_fault, after the main message value was used to generate the ID.	2023-12-09 16:13:36 -05:00
Joshua M. Boniface	4ca2381077	Rework metrics output and add combined endpoint	2023-12-09 15:47:40 -05:00
Joshua M. Boniface	4003204f14	Remove bracketed text from fault_str This ensures that certain faults e.g. Ceph status faults, will be combined despite the added text in brackets, while still keeping them mostly separate. Also ensure the health text is updated each time to assist with this, as this health text may now change independent of the fault ID.	2023-12-09 15:34:18 -05:00
Joshua M. Boniface	a70c1d63b0	Separate state totals from states, separate states	2023-12-09 13:59:17 -05:00
Joshua M. Boniface	2bea78d25e	Make all remaining limits optional	2023-12-09 13:43:58 -05:00
Joshua M. Boniface	fd717b702d	Use external list of fault states	2023-12-09 12:51:41 -05:00
Joshua M. Boniface	132cde5591	Add totals and nice-format states Avoids tons of annoying rewriting in the UI later.	2023-12-09 12:50:19 -05:00
Joshua M. Boniface	ba565ead4c	Report all state combinations in Prom metrics Ensures that every state combination is always shown to metrics, even if it contains 0 entries.	2023-12-09 12:40:37 -05:00
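
The last commit above (`ba565ead4c`) describes emitting every state combination to the metrics output even when its count is zero. A minimal sketch of that idea follows; the state lists, the `example_nodes` metric name, and the `state_metrics` helper are all hypothetical illustrations and are not taken from the PVC source.

```python
from collections import Counter
from itertools import product

# Hypothetical state lists for illustration only; the real project
# defines its own sets of states.
DAEMON_STATES = ["run", "stop", "dead"]
DOMAIN_STATES = ["ready", "flushed"]


def state_metrics(nodes):
    """Return one metric line per state combination, including zeros."""
    # Count how many nodes are in each (daemon, domain) combination.
    counts = Counter((n["daemon"], n["domain"]) for n in nodes)
    lines = []
    # Iterate over the full cross product rather than over the observed
    # combinations, so that empty combinations are still reported as 0.
    for daemon, domain in product(DAEMON_STATES, DOMAIN_STATES):
        value = counts[(daemon, domain)]  # Counter yields 0 when absent
        lines.append(f'example_nodes{{state="{daemon},{domain}"}} {value}')
    return lines


nodes = [
    {"daemon": "run", "domain": "ready"},
    {"daemon": "run", "domain": "ready"},
]
for line in state_metrics(nodes):
    print(line)
```

Always emitting the full set keeps each series present in the monitoring system, so a dashboard or alert can tell "zero nodes in this state" apart from "no data".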


3367 Commits
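
Several commits on this page port functions to `read_many`, which commit `9329784010` describes as taking multiple ZK keys and returning all of their values, using asyncio to avoid reading sequentially. The sketch below illustrates that general pattern only: `FAKE_STORE`, the blocking `read` helper, and the key names are hypothetical stand-ins for a real ZooKeeper client, and the actual PVC implementation may differ.

```python
import asyncio
import time

# Hypothetical in-memory stand-in for a ZooKeeper tree (not PVC's schema).
FAKE_STORE = {
    "/nodes/hv1/state": "run",
    "/nodes/hv1/health": "100",
    "/nodes/hv2/state": "run",
}


def read(key):
    """Blocking single-key read; the sleep simulates one network round trip."""
    time.sleep(0.05)
    return FAKE_STORE.get(key)


async def _read_many_async(keys):
    loop = asyncio.get_running_loop()
    # Submit each blocking read to the default thread pool so the
    # round trips overlap instead of running one after another.
    futures = [loop.run_in_executor(None, read, key) for key in keys]
    # gather() returns results in the same order as the submitted futures.
    return await asyncio.gather(*futures)


def read_many(keys):
    """Return the values for all keys, in the same order as `keys`."""
    return asyncio.run(_read_many_async(list(keys)))


keys = ["/nodes/hv1/state", "/nodes/hv1/health", "/nodes/hv2/state"]
print(dict(zip(keys, read_many(keys))))
```

With this shape, a function that previously issued one read per field can collect its keys up front and make a single `read_many` call, which is where functions with many reads would be expected to gain the most.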