2bb24d3b57
Update Prometheus dashboard and add README
2023-12-27 15:57:12 -05:00
8083b7a3e6
Bump version to 0.9.87
v0.9.87
2023-12-27 13:40:51 -05:00
3346ce9bb0
Add missing shutdown state from combinations
2023-12-27 13:40:30 -05:00
572596c575
Fix missing f-string placeholder
2023-12-27 13:21:20 -05:00
e654fbba08
Move debug condition handling to Logger
...
Avoids many dozens of conditionals sprinkled throughout the code by
centralizing this check into the main Logger instance.
2023-12-27 13:01:45 -05:00
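The idea behind this commit can be sketched as a logger that owns the debug check itself. The `Logger` class, its `out()` method, and the `"d"` state code are assumptions made for this example; it is not PVC's actual Logger. Debug-state messages are dropped inside `out()` unless debug mode is on, so call sites no longer need their own `if debug:` guards.

```python
import sys


class Logger:
    """Hypothetical logger that centralizes the debug check (sketch only)."""

    def __init__(self, debug=False, write=None):
        self.debug = debug
        # Default to stdout; a custom write callable makes testing easy.
        self.write = write if write is not None else sys.stdout.write

    def out(self, message, state="i"):
        # The single debug check: debug-state ("d") messages are dropped
        # here unless debug mode is enabled, so callers need no guard.
        if state == "d" and not self.debug:
            return False
        self.write(f"[{state}] {message}\n")
        return True
```

A call site then becomes a plain `logger.out("...", state="d")` regardless of whether debugging is enabled.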
52bf5ad0ef
Update store_path set location
...
Prevents a bug if no cluster is selected while doing connection list
commands.
2023-12-27 12:42:19 -05:00
576afc1e94
Update Grafana dashboard layouts
2023-12-27 12:24:46 -05:00
4375f66793
Use proper get() for invalid values
2023-12-27 12:03:48 -05:00
3df3ca5b44
Fix value for OSD utilization
...
Ceph provides in KB; convert to bytes.
2023-12-27 11:56:50 -05:00
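A minimal sketch of the KB-to-bytes conversion, assuming the OSD stats use field names like those in `ceph osd df` JSON output (`kb`, `kb_used`, `kb_avail`) and that 1 KB is 1024 bytes. The helper name `kb_fields_to_bytes` is hypothetical.

```python
def kb_fields_to_bytes(osd_stats):
    """Return a copy of an OSD stats dict with KB fields converted to bytes.

    Assumes `kb`, `kb_used` and `kb_avail` are in KiB (1 KB = 1024 bytes);
    each is replaced by a `bytes*` key holding the converted value.
    """
    converted = dict(osd_stats)
    for key in ("kb", "kb_used", "kb_avail"):
        if key in converted:
            # "kb" -> "bytes", "kb_used" -> "bytes_used", etc.
            converted[key.replace("kb", "bytes", 1)] = int(converted.pop(key)) * 1024
    return converted
```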
cb3c2cd86d
Adjust name of PVC cluster dashboard
2023-12-27 11:42:58 -05:00
d0de4f1825
Update Grafana dashboard to overview
...
Adds resource utilization in addition to health.
2023-12-27 11:38:39 -05:00
494c20263d
Move monitoring folder to top level
2023-12-27 11:37:49 -05:00
431ee69620
Use proper percentage for pool util
2023-12-27 10:03:00 -05:00
88f4d79d5a
Handle invalid values on older Libvirt versions
2023-12-27 09:51:24 -05:00
84d22751d8
Fix bad JSON data handler
2023-12-27 09:43:37 -05:00
40ff005a09
Fix handling of Ceph OSD bytes
2023-12-26 12:43:51 -05:00
ab4ec7a5fa
Remove WebUI from README
2023-12-25 02:48:44 -05:00
9604f655d0
Improve node utilization metrics and fix bugs
2023-12-25 02:47:41 -05:00
3e4cc53fdd
Add node network statistics and utilization values
...
Adds a new physical network interface stats parser to the node
keepalives, and leverages this information to provide a network
utilization overview in the Prometheus metrics.
2023-12-21 15:45:01 -05:00
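The commit does not show the parser itself. As an illustration only, the sketch below assumes the Linux `/proc/net/dev` format (received bytes in the first counter column, transmitted bytes in the ninth) and computes utilization from two samples against a link speed in Mb/s, such as the value in `/sys/class/net/<iface>/speed`. Both function names are hypothetical, and this is not PVC's keepalive code.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev content into {iface: (rx_bytes, tx_bytes)}."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        stats[name.strip()] = (int(fields[0]), int(fields[8]))
    return stats


def utilization_percent(before, after, interval_s, link_speed_mbps):
    """Percent of link capacity used between two samples (max of rx/tx)."""
    capacity_bps = link_speed_mbps * 1_000_000
    if capacity_bps <= 0:
        # Virtual interfaces may report no usable speed (e.g. -1).
        return 0.0
    rx_bps = (after[0] - before[0]) * 8 / interval_s
    tx_bps = (after[1] - before[1]) * 8 / interval_s
    return round(max(rx_bps, tx_bps) / capacity_bps * 100, 2)
```

Taking the larger of the receive and transmit rates treats the link as full-duplex, so either direction saturating counts as 100% utilization.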
d2d2a9c617
Include our newline atomically
...
Sometimes clashing log entries would print on the same line, likely due
to some sort of race condition in Python's print() built-in.
Instead, add a newline to our actual message and print without an end
character. This ensures atomic printing of our log messages.
2023-12-21 13:12:43 -05:00
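In CPython, `print()` writes the text and the `end` string through separate `write()` calls, so output from two threads can interleave between them. A minimal sketch of the workaround described in this commit, using a hypothetical `log_line()` helper: the newline is appended to the message first, and the whole line is emitted in a single call with `end=""`.

```python
def log_line(message, write=None):
    """Emit one log record with its newline included in a single write."""
    line = f"{message}\n"
    if write is None:
        # One print() call, newline already part of the text, no separate end.
        print(line, end="", flush=True)
    else:
        write(line)
    return line
```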
6ed4efad33
Add new network.stats key to nodes
2023-12-21 12:48:48 -05:00
39f9f3640c
Rename health metrics and add resource metrics
2023-12-21 09:40:49 -05:00
c64e888d30
Fix incorrect cast of None
v0.9.86
2023-12-14 16:00:53 -05:00
f1249452e5
Fix bug if no nodes are present
2023-12-14 15:32:18 -05:00
0a93f526e0
Bump version to 0.9.86
2023-12-14 14:46:29 -05:00
7c9512fb22
Fix broken config file in API migration script
2023-12-14 14:45:58 -05:00
e88b97f3a9
Print fenced state in red
2023-12-13 15:02:18 -05:00
709c9cb73e
Pause pvchealthd startup until node daemon is run
...
If the health daemon starts too soon during a node bootup, it will
generate tons of erroneous faults while the node starts up.
Adds a conditional wait for the current node daemon to be in "run"
state before the health daemon really starts up.
2023-12-13 14:53:54 -05:00
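The conditional wait can be sketched as a simple polling loop. Here `get_state` is a hypothetical callable standing in for however the health daemon reads the node daemon's state (for example, a Zookeeper key), and the optional `timeout` is an addition for illustration; the commit only describes waiting for the "run" state.

```python
import time


def wait_for_daemon_state(get_state, wanted="run", interval=1.0, timeout=None, sleep=time.sleep):
    """Block until get_state() returns `wanted`.

    Returns True once the state matches, or False if `timeout` seconds
    elapse first (timeout=None waits indefinitely).
    """
    waited = 0.0
    while get_state() != wanted:
        if timeout is not None and waited >= timeout:
            return False
        sleep(interval)
        waited += interval
    return True
```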
f41c5176be
Ensure health value is an int properly
2023-12-13 14:34:02 -05:00
38e43b46c3
Update health detail messages format
2023-12-13 03:17:47 -05:00
ed9c37982a
Move metric collection into daemon library
2023-12-11 19:20:30 -05:00
0f24184b78
Explicitly clear resources of fenced node
...
This actually solves the bug originally "fixed" in
5f1432ccdd38996dac0f528035634cbc82827abd without breaking VM resource
allocations for working nodes.
2023-12-11 12:14:56 -05:00
1ba37fe33d
Restore VM resource allocation location
...
Commit 5f1432ccdd38996dac0f528035634cbc82827abd changed where these
happen due to a bug after fencing. However, this completely broke node
resource reporting, as only the final instance will be queried here.
Revert this change and look further into the original bug.
2023-12-11 11:52:59 -05:00
1a05077b10
Fix missing f-string
2023-12-11 11:29:49 -05:00
57c28376a6
Port one final Ceph function to read_many
2023-12-11 10:25:36 -05:00
e781d742e6
Fix bug with volume and snapshot listing
2023-12-11 10:21:46 -05:00
6c6d1508a1
Add VNC info to screenshots
2023-12-11 03:40:49 -05:00
741dafb26b
Port VM functions to read_many
2023-12-11 03:34:36 -05:00
032d3ebf18
Remove debug output from image
2023-12-11 03:23:10 -05:00
5d9e83e8ed
Fix output bugs in VM information
2023-12-11 03:04:46 -05:00
ad0bd8649f
Finish missing sentence
2023-12-11 02:39:39 -05:00
9b5e53e4b6
Add Grafana dashboard screenshot
2023-12-11 00:39:24 -05:00
9617660342
Update Prometheus Grafana dashboard
2023-12-11 00:23:08 -05:00
ab0a1e0946
Update and streamline README and update images
2023-12-10 23:57:01 -05:00
7c116b2fbc
Ensure node health value is an int
2023-12-10 23:56:50 -05:00
1023c55087
Fix bug in VM state list
2023-12-10 23:44:01 -05:00
9235187c6f
Port Ceph functions to read_many
...
Only ports getOSDInformation, as all the others perform 3 or fewer reads,
which is acceptable to do sequentially.
2023-12-10 22:24:38 -05:00
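The motivation for `read_many` can be sketched as issuing several key reads concurrently instead of one after another. This is an assumption about its behaviour, not PVC's zkhandler code: it uses a thread pool and a hypothetical single-key `read` function, and returns results in the order of `keys`.

```python
from concurrent.futures import ThreadPoolExecutor


def read_many(read, keys, max_workers=8):
    """Read several keys concurrently and return values in key order.

    `read` is a hypothetical single-key read function (e.g. a wrapper
    around a Zookeeper get); pool.map preserves the order of `keys`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read, keys))
```

N sequential reads cost roughly N round-trips of latency, which is why functions with only a few reads were left sequential. With the kazoo client, a similar effect can be had by calling `get_async()` for each path and then `.get()` on each returned result.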
0c94f1b4f8
Port Network functions to read_many
2023-12-10 22:19:21 -05:00
44a4f0e1f7
Use new info detail output instead of new lists
...
Avoids multiple additional ZK calls by using data that is now in the
status detail output.
2023-12-10 22:19:09 -05:00
5d53a3e529
Add state and faults detail to cluster information
...
We already parse this information out anyway, so might as well add it
to the API output JSON. This can be leveraged by the Prometheus endpoint
as well to avoid duplicate listings.
2023-12-10 17:29:32 -05:00
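As a rough illustration of reusing the status detail instead of separate list calls, the sketch below counts VM states from a detail dict and renders Prometheus-style gauge lines. The input shape (`{"vms": [{"name": ..., "state": ...}]}`) and the metric name `pvc_vm_state` are hypothetical, not PVC's actual API output or metric names.

```python
from collections import Counter


def vm_state_gauges(status_detail):
    """Build Prometheus-style gauge lines from cluster status detail.

    Counting states from data already in the detail output avoids a
    separate VM list call per scrape.
    """
    counts = Counter(vm["state"] for vm in status_detail.get("vms", []))
    return [
        f'pvc_vm_state{{state="{state}"}} {count}'
        for state, count in sorted(counts.items())
    ]
```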