parallelvirtualcluster/pvc

Author	SHA1	Message	Date
Joshua M. Boniface	3e4cc53fdd	Add node network statistics and utilization values. Adds a new physical network interface stats parser to the node keepalives, and leverages this information to provide a network utilization overview in the Prometheus metrics.	2023-12-21 15:45:01 -05:00
Joshua M. Boniface	d2d2a9c617	Include our newline atomically. Sometimes clashing log entries would print on the same line, likely due to some sort of race condition in Python's print() built-in. Instead, add a newline to our actual message and print without an end character. This ensures atomic printing of our log messages.	2023-12-21 13:12:43 -05:00
Joshua M. Boniface	6ed4efad33	Add new network.stats key to nodes	2023-12-21 12:48:48 -05:00
Joshua M. Boniface	39f9f3640c	Rename health metrics and add resource metrics	2023-12-21 09:40:49 -05:00
Joshua M. Boniface	c64e888d30	Fix incorrect cast of None (tag: v0.9.86)	2023-12-14 16:00:53 -05:00
Joshua M. Boniface	f1249452e5	Fix bug if no nodes are present	2023-12-14 15:32:18 -05:00
Joshua M. Boniface	0a93f526e0	Bump version to 0.9.86	2023-12-14 14:46:29 -05:00
Joshua M. Boniface	7c9512fb22	Fix broken config file in API migration script	2023-12-14 14:45:58 -05:00
Joshua M. Boniface	e88b97f3a9	Print fenced state in red	2023-12-13 15:02:18 -05:00
Joshua M. Boniface	709c9cb73e	Pause pvchealthd startup until node daemon is run. If the health daemon starts too soon during a node bootup, it will generate tons of erroneous faults while the node starts up. Adds a conditional wait for the current node daemon to be in "run" state before the health daemon really starts up.	2023-12-13 14:53:54 -05:00
Joshua M. Boniface	f41c5176be	Ensure health value is an int properly	2023-12-13 14:34:02 -05:00
Joshua M. Boniface	38e43b46c3	Update health detail messages format	2023-12-13 03:17:47 -05:00
Joshua M. Boniface	ed9c37982a	Move metric collection into daemon library	2023-12-11 19:20:30 -05:00
Joshua M. Boniface	0f24184b78	Explicitly clear resources of fenced node. This actually solves the bug originally "fixed" in `5f1432ccdd` without breaking VM resource allocations for working nodes.	2023-12-11 12:14:56 -05:00
Joshua M. Boniface	1ba37fe33d	Restore VM resource allocation location. Commit `5f1432ccdd` changed where these happen due to a bug after fencing. However, this completely broke node resource reporting as only the final instance will be queried here. Revert this change and look further into the original bug.	2023-12-11 11:52:59 -05:00
Joshua M. Boniface	1a05077b10	Fix missing fstring	2023-12-11 11:29:49 -05:00
Joshua M. Boniface	57c28376a6	Port one final Ceph function to read_many	2023-12-11 10:25:36 -05:00
Joshua M. Boniface	e781d742e6	Fix bug with volume and snapshot listing	2023-12-11 10:21:46 -05:00
Joshua M. Boniface	6c6d1508a1	Add VNC info to screenshots	2023-12-11 03:40:49 -05:00
Joshua M. Boniface	741dafb26b	Port VM functions to read_many	2023-12-11 03:34:36 -05:00
Joshua M. Boniface	032d3ebf18	Remove debug output from image	2023-12-11 03:23:10 -05:00
Joshua M. Boniface	5d9e83e8ed	Fix output bugs in VM information	2023-12-11 03:04:46 -05:00
Joshua M. Boniface	ad0bd8649f	Finish missing sentence	2023-12-11 02:39:39 -05:00
Joshua M. Boniface	9b5e53e4b6	Add Grafana dashboard screenshot	2023-12-11 00:39:24 -05:00
Joshua M. Boniface	9617660342	Update Prometheus Grafana dashboard	2023-12-11 00:23:08 -05:00
Joshua M. Boniface	ab0a1e0946	Update and streamline README and update images	2023-12-10 23:57:01 -05:00
Joshua M. Boniface	7c116b2fbc	Ensure node health value is an int	2023-12-10 23:56:50 -05:00
Joshua M. Boniface	1023c55087	Fix bug in VM state list	2023-12-10 23:44:01 -05:00
Joshua M. Boniface	9235187c6f	Port Ceph functions to read_many. Only ports getOSDInformation, as all the others feature 3 or fewer reads, which is acceptable sequentially.	2023-12-10 22:24:38 -05:00
Joshua M. Boniface	0c94f1b4f8	Port Network functions to read_many	2023-12-10 22:19:21 -05:00
Joshua M. Boniface	44a4f0e1f7	Use new info detail output instead of new lists. Avoids multiple additional ZK calls by using data that is now in the status detail output.	2023-12-10 22:19:09 -05:00
Joshua M. Boniface	5d53a3e529	Add state and faults detail to cluster information. We already parse this information out anyways, so might as well add it to the API output JSON. This can be leveraged by the Prometheus endpoint as well to avoid duplicate listings.	2023-12-10 17:29:32 -05:00
Joshua M. Boniface	35e22cb50f	Simplify cluster status handling. This significantly simplifies cluster state handling by removing most of the superfluous get_list() calls, replacing them with basic child reads since most of them are just for a count anyways. The ones that require states simplify this down to a child read plus direct reads for the exact items required while leveraging the new read_many() function.	2023-12-10 17:05:46 -05:00
Joshua M. Boniface	a3171b666b	Split node health into separate function	2023-12-10 16:52:10 -05:00
Joshua M. Boniface	48e41d7b05	Port Faults getFault and getAllFaults to read_many	2023-12-10 16:05:16 -05:00
Joshua M. Boniface	d6aecf195e	Port Node getNodeInformation to read_many	2023-12-10 15:53:28 -05:00
Joshua M. Boniface	9329784010	Implement async ZK read function. Adds a function, "read_many", which can take in multiple ZK keys and return the values from all of them, using asyncio to avoid reading sequentially. Initial tests show a marked improvement in read performance of multiple read()-heavy functions (e.g. "get_list()" functions) with this method.	2023-12-10 15:35:40 -05:00
Joshua M. Boniface	9dc5097dbc	Bump version to 0.9.85 (tag: v0.9.85)	2023-12-10 01:00:33 -05:00
Joshua M. Boniface	5776cb3a09	Remove Prometheus client dependencies. We don't actually use this (yet!) so remove the dependency for now.	2023-12-10 00:58:09 -05:00
Joshua M. Boniface	53d632f283	Fix bug in example PVC Grafana dashboard	2023-12-10 00:50:05 -05:00
Joshua M. Boniface	7bc0760b78	Add time to "starting keepalive" message. Matches the pvchealthd output and provides a useful message detail to this otherwise contextless message.	2023-12-10 00:40:32 -05:00
Joshua M. Boniface	9aee2a9075	Bump version to 0.9.84 (tag: v0.9.84)	2023-12-09 23:05:40 -05:00
Joshua M. Boniface	8f0ae3e2dd	Fix config file for database migrations	2023-12-09 22:51:54 -05:00
Joshua M. Boniface	946d3eaf43	Add wait after stopping VM	2023-12-09 18:14:03 -05:00
Joshua M. Boniface	1f6347d24b	Add Prometheus monitoring examples	2023-12-09 17:42:51 -05:00
Joshua M. Boniface	e8552b471b	Require at least one FAULT_ID	2023-12-09 17:31:56 -05:00
Joshua M. Boniface	fc443a323b	Allow ack/delete of multiple faults at once	2023-12-09 17:28:13 -05:00
Joshua M. Boniface	b0557edb76	Ensure entry in name is uppercase	2023-12-09 17:01:41 -05:00
Joshua M. Boniface	47bd7bf2f5	Only run cluster-wide health checks on primary. Avoids multiple coordinators trying to write updated cluster-wide fault events. Instead, they are now only written by the primary (or the incoming primary if still in a transition).	2023-12-09 16:50:51 -05:00
Joshua M. Boniface	b9fbfe2ed5	Improve fault ID format. Instead of using random hex characters from an md5sum, use a nice name in all-caps similar to how Ceph does. This further helps prevent dupes but also permits a changing health delta within a single event (which would really only ever apply to plugin faults).	2023-12-09 16:48:14 -05:00
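
Commit d2d2a9c617 in the list above describes folding the newline into the log message and printing without an end character, so that a message and its line terminator are not handed to the stream as two separate pieces. The snippet below is a minimal sketch of that idea only. The `emit` helper and its `stream` parameter are hypothetical names chosen for illustration and are not taken from the PVC codebase.

```python
import io
import sys


def emit(message: str, stream=sys.stdout) -> None:
    """Write one log entry with its newline already part of the string.

    Assumption: this mirrors the approach the commit message describes,
    i.e. print(message + "\n", end="") rather than print(message).
    """
    # With the default end="\n", print() passes the message and the
    # terminator to the stream separately, which leaves a window for
    # another thread's output to land between them.
    print(message + "\n", end="", file=stream)
    stream.flush()


# Usage with an in-memory buffer standing in for the real log target.
buffer = io.StringIO()
emit("node1: keepalive started", buffer)
emit("node2: keepalive started", buffer)
```

Each entry in `buffer` ends with exactly one newline, since the terminator travels with the message instead of being appended by `print()`.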


3326 Commits
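
Commit 9329784010 in the list above introduces a "read_many" function that takes several ZK keys and uses asyncio to avoid reading them one after another. The following is a rough sketch of that pattern, not the project's implementation: the `FAKE_STORE` dictionary, its key paths, and the blocking `read` helper are all made-up stand-ins for a real ZooKeeper client, and running the blocking reads in the default thread pool is an assumption about how the overlap could be achieved.

```python
import asyncio
import time

# Hypothetical stand-in for ZooKeeper data; keys and values are invented.
FAKE_STORE = {
    "/nodes/node1/state": "run",
    "/nodes/node1/health": "100",
    "/nodes/node2/state": "run",
}


def read(key: str) -> str:
    """Blocking single-key read that simulates one network round trip."""
    time.sleep(0.05)
    return FAKE_STORE[key]


async def _read_many_async(keys):
    loop = asyncio.get_running_loop()
    # Schedule every blocking read on the default executor so the waits
    # overlap instead of adding up.
    futures = [loop.run_in_executor(None, read, key) for key in keys]
    # gather() returns results in the same order as the futures passed in.
    return await asyncio.gather(*futures)


def read_many(keys):
    """Return the values for all keys, in the same order as the keys."""
    return list(asyncio.run(_read_many_async(keys)))


values = read_many(list(FAKE_STORE))
```

A sequential loop over `read` would pay the simulated delay once per key, whereas this version pays roughly one delay for the whole batch, up to the size of the thread pool.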