Joshua Boniface
1a05077b10
Fix missing f-string
2023-12-11 11:29:49 -05:00
Joshua Boniface
57c28376a6
Port one final Ceph function to read_many
2023-12-11 10:25:36 -05:00
Joshua Boniface
e781d742e6
Fix bug with volume and snapshot listing
2023-12-11 10:21:46 -05:00
Joshua Boniface
6c6d1508a1
Add VNC info to screenshots
2023-12-11 03:40:49 -05:00
Joshua Boniface
741dafb26b
Port VM functions to read_many
2023-12-11 03:34:36 -05:00
Joshua Boniface
032d3ebf18
Remove debug output from image
2023-12-11 03:23:10 -05:00
Joshua Boniface
5d9e83e8ed
Fix output bugs in VM information
2023-12-11 03:04:46 -05:00
Joshua Boniface
ad0bd8649f
Finish missing sentence
2023-12-11 02:39:39 -05:00
Joshua Boniface
9b5e53e4b6
Add Grafana dashboard screenshot
2023-12-11 00:39:24 -05:00
Joshua Boniface
9617660342
Update Prometheus Grafana dashboard
2023-12-11 00:23:08 -05:00
Joshua Boniface
ab0a1e0946
Update and streamline README and update images
2023-12-10 23:57:01 -05:00
Joshua Boniface
7c116b2fbc
Ensure node health value is an int
2023-12-10 23:56:50 -05:00
Joshua Boniface
1023c55087
Fix bug in VM state list
2023-12-10 23:44:01 -05:00
Joshua Boniface
9235187c6f
Port Ceph functions to read_many
...
Only ports getOSDInformation, as all the others feature 3 or fewer reads,
which is acceptable sequentially.
2023-12-10 22:24:38 -05:00
Joshua Boniface
0c94f1b4f8
Port Network functions to read_many
2023-12-10 22:19:21 -05:00
Joshua Boniface
44a4f0e1f7
Use new info detail output instead of new lists
...
Avoids multiple additional ZK calls by using data that is now in the
status detail output.
2023-12-10 22:19:09 -05:00
Joshua Boniface
5d53a3e529
Add state and faults detail to cluster information
...
We already parse this information out anyway, so might as well add it
to the API output JSON. This can be leveraged by the Prometheus endpoint
as well to avoid duplicate listings.
2023-12-10 17:29:32 -05:00
Joshua Boniface
35e22cb50f
Simplify cluster status handling
...
This significantly simplifies cluster state handling by removing most of
the superfluous get_list() calls, replacing them with basic child reads
since most of them are just for a count anyway. The ones that require
states simplify this down to a child read plus direct reads for the
exact items required while leveraging the new read_many() function.
2023-12-10 17:05:46 -05:00
Joshua Boniface
a3171b666b
Split node health into separate function
2023-12-10 16:52:10 -05:00
Joshua Boniface
48e41d7b05
Port Faults getFault and getAllFaults to read_many
2023-12-10 16:05:16 -05:00
Joshua Boniface
d6aecf195e
Port Node getNodeInformation to read_many
2023-12-10 15:53:28 -05:00
Joshua Boniface
9329784010
Implement async ZK read function
...
Adds a function, "read_many", which can take in multiple ZK keys and
return the values from all of them, using asyncio to avoid reading
sequentially.
Initial tests show a marked improvement in read performance of multiple
read()-heavy functions (e.g. "get_list()" functions) with this method.
2023-12-10 15:35:40 -05:00
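The "read_many" idea from commit 9329784010 above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the in-memory FAKE_ZK store, the blocking read() helper, and the key paths are all hypothetical stand-ins, and the real function may schedule its reads differently.

```python
import asyncio

# Hypothetical in-memory stand-in for ZooKeeper; the key paths are illustrative only.
FAKE_ZK = {"/nodes/hv1/state": "run", "/nodes/hv2/state": "flushed"}


def read(key):
    """Blocking single-key read (simulated; a real client would do network I/O)."""
    return FAKE_ZK.get(key)


async def _read_one(key):
    # Run the blocking read in a worker thread so that several reads can overlap.
    # asyncio.to_thread requires Python 3.9 or newer.
    return await asyncio.to_thread(read, key)


async def _read_all(keys):
    # gather() returns results in the same order as the awaitables it was given.
    return await asyncio.gather(*(_read_one(key) for key in keys))


def read_many(keys):
    """Return the values for all keys, in the same order as `keys`."""
    return asyncio.run(_read_all(keys))


values = read_many(["/nodes/hv1/state", "/nodes/hv2/state"])
```

With a real client, the benefit comes from the individual reads waiting on the network concurrently rather than one after another.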
Joshua Boniface
9dc5097dbc
Bump version to 0.9.85
2023-12-10 01:00:33 -05:00
Joshua Boniface
5776cb3a09
Remove Prometheus client dependencies
...
We don't actually use this (yet!) so remove the dependency for now.
2023-12-10 00:58:09 -05:00
Joshua Boniface
53d632f283
Fix bug in example PVC Grafana dashboard
2023-12-10 00:50:05 -05:00
Joshua Boniface
7bc0760b78
Add time to "starting keepalive" message
...
Matches the pvchealthd output and provides a useful message detail to
this otherwise contextless message.
2023-12-10 00:40:32 -05:00
Joshua Boniface
9aee2a9075
Bump version to 0.9.84
2023-12-09 23:05:40 -05:00
Joshua Boniface
8f0ae3e2dd
Fix config file for database migrations
2023-12-09 22:51:54 -05:00
Joshua Boniface
946d3eaf43
Add wait after stopping VM
2023-12-09 18:14:03 -05:00
Joshua Boniface
1f6347d24b
Add Prometheus monitoring examples
2023-12-09 17:42:51 -05:00
Joshua Boniface
e8552b471b
Require at least one FAULT_ID
2023-12-09 17:31:56 -05:00
Joshua Boniface
fc443a323b
Allow ack/delete of multiple faults at once
2023-12-09 17:28:13 -05:00
Joshua Boniface
b0557edb76
Ensure entry in name is uppercase
2023-12-09 17:01:41 -05:00
Joshua Boniface
47bd7bf2f5
Only run cluster-wide health checks on primary
...
Avoids multiple coordinators trying to write updated cluster-wide fault
events. Instead, they are now only written by the primary (or the
incoming primary if still in a transition).
2023-12-09 16:50:51 -05:00
Joshua Boniface
b9fbfe2ed5
Improve fault ID format
...
Instead of using random hex characters from an md5sum, use a nice name
in all-caps similar to how Ceph does. This further helps prevent dupes
but also permits a changing health delta within a single event (which
would really only ever apply to plugin faults).
2023-12-09 16:48:14 -05:00
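The fault ID change in commit b9fbfe2ed5 above can be illustrated with a small sketch. Both helper names, the 8-character truncation, and the underscore-joined name format are assumptions made for illustration; the commit message only states that md5-derived hex characters were replaced by an all-caps name in the style of Ceph.

```python
import hashlib


def hashed_fault_id(message):
    # Older style (assumed shape): opaque hex characters taken from an md5 digest.
    return hashlib.md5(message.encode("utf-8")).hexdigest()[:8]


def named_fault_id(fault_name, entry=None):
    # Newer style (assumed shape): a readable all-caps name, optionally
    # suffixed with the affected entry, joined by underscores.
    parts = [fault_name] if entry is None else [fault_name, entry]
    return "_".join(part.upper().replace(" ", "_") for part in parts)


old_id = hashed_fault_id("Node hv1 was dead")
new_id = named_fault_id("node dead", "hv1")
```

Because the named ID does not depend on the full message text, the message or health delta can change without producing a new fault ID.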
Joshua Boniface
764e3e3722
Fix bug in fault header format
2023-12-09 16:47:56 -05:00
Joshua Boniface
7e6d922877
Improve fault detail handling further
...
Since we already had a "details" field, simply move where it gets added
to the message later, in generate_fault, after the main message value
was used to generate the ID.
2023-12-09 16:13:36 -05:00
Joshua Boniface
4ca2381077
Rework metrics output and add combined endpoint
2023-12-09 15:47:40 -05:00
Joshua Boniface
4003204f14
Remove bracketed text from fault_str
...
This ensures that certain faults, e.g. Ceph status faults, will be
combined despite the added text in brackets, while still keeping them
mostly separate.
Also ensure the health text is updated each time to assist with this, as
this health text may now change independently of the fault ID.
2023-12-09 15:34:18 -05:00
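One way to picture the bracket removal from commit 4003204f14 above is the sketch below. The function name, the choice of square brackets, and the sample messages are assumptions; the commit message does not say which bracket style or which regular expression is used.

```python
import re


def strip_bracketed(fault_str):
    # Remove any "[...]" segments (square brackets are an assumption here),
    # then collapse the whitespace left behind.
    stripped = re.sub(r"\[[^\]]*\]", "", fault_str)
    return " ".join(stripped.split())


a = strip_bracketed("Ceph HEALTH_WARN [1 osds down]")
b = strip_bracketed("Ceph HEALTH_WARN [2 osds down]")
```

Both sample messages reduce to the same string, so an ID derived from it would treat them as a single fault while the full text can still be shown to the user.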
Joshua Boniface
a70c1d63b0
Separate state totals from states, separate states
2023-12-09 13:59:17 -05:00
Joshua Boniface
2bea78d25e
Make all remaining limits optional
2023-12-09 13:43:58 -05:00
Joshua Boniface
fd717b702d
Use external list of fault states
2023-12-09 12:51:41 -05:00
Joshua Boniface
132cde5591
Add totals and nice-format states
...
Avoids tons of annoying rewriting in the UI later.
2023-12-09 12:50:19 -05:00
Joshua Boniface
ba565ead4c
Report all state combinations in Prom metrics
...
Ensures that every state combination is always shown to metrics, even if
it contains 0 entries.
2023-12-09 12:40:37 -05:00
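The behaviour described in commit ba565ead4c above, emitting every state combination even when its count is 0, might look like the following sketch. The state lists, the pvc_nodes metric name, the label format, and the node dictionary fields are all hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical state lists; the real lists live in the project's common code.
DAEMON_STATES = ["run", "stop"]
DOMAIN_STATES = ["ready", "flushed"]


def node_state_metrics(nodes):
    """Return one Prometheus-style line per state combination, including zeros."""
    counts = Counter((n["daemon_state"], n["domain_state"]) for n in nodes)
    lines = []
    for daemon, domain in product(DAEMON_STATES, DOMAIN_STATES):
        state = f"{daemon},{domain}"
        count = counts[(daemon, domain)]  # Counter yields 0 for unseen keys
        lines.append(f'pvc_nodes{{state="{state}"}} {count}')
    return lines


lines = node_state_metrics([{"daemon_state": "run", "domain_state": "ready"}])
```

Iterating over the full product of defined states, rather than only the states observed, keeps each time series present in every scrape, which avoids gaps in dashboards.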
Joshua Boniface
317ca4b98c
Move defined state combinations into common
2023-12-09 12:36:32 -05:00
Joshua Boniface
2b8abea8df
Remove debug printing
2023-12-09 12:22:36 -05:00
Joshua Boniface
9b3c9f1be5
Add Ceph metrics proxy and health fault counts
2023-12-09 12:22:36 -05:00
Joshua Boniface
7373bfed3f
Add Prometheus metric exporter
...
Adds a "fake" Prometheus metrics endpoint which returns cluster status
information in Prometheus format.
2023-12-09 12:22:36 -05:00
Joshua Boniface
d0e7c19602
Add Prometheus client dependencies
2023-12-09 12:22:36 -05:00
Joshua Boniface
f01c12c86b
Import from pvcworkerd not pvcapid
2023-12-09 12:22:19 -05:00