Joshua Boniface
3346ce9bb0
Add missing shutdown state from combinations
2023-12-27 13:40:30 -05:00
Joshua Boniface
572596c575
Fix missing f-string placeholder
2023-12-27 13:21:20 -05:00
Joshua Boniface
e654fbba08
Move debug condition handling to Logger
...
Avoids many dozens of conditionals sprinkled throughout the code by
centralizing this check into the main Logger instance.
2023-12-27 13:01:45 -05:00
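A minimal sketch of the idea in the commit above, where the logger owns the debug check so callers never test the flag themselves. The class shape, the `out()` method, the `state="d"` convention and the `stream` parameter are assumptions made for illustration, not the project's actual Logger interface.

```python
import sys


class Logger:
    """Hypothetical logger that centralizes the debug check."""

    def __init__(self, debug=False, stream=sys.stdout):
        self.debug = debug
        self.stream = stream

    def out(self, message, state="i"):
        # Debug messages are dropped here, in one place, so call sites
        # do not need their own "if debug:" conditionals.
        if state == "d" and not self.debug:
            return
        self.stream.write(f"[{state}] {message}\n")
```

With this shape, a call site can simply write `logger.out("details", state="d")` and leave the decision about whether to emit it to the logger.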
Joshua Boniface
52bf5ad0ef
Update store_path set location
...
Prevents a bug if no cluster is selected while doing connection list
commands.
2023-12-27 12:42:19 -05:00
Joshua Boniface
576afc1e94
Update Grafana dashboard layouts
2023-12-27 12:24:46 -05:00
Joshua Boniface
4375f66793
Use proper get() for invalid values
2023-12-27 12:03:48 -05:00
Joshua Boniface
3df3ca5b44
Fix value for OSD utilization
...
Ceph provides in KB; convert to bytes.
2023-12-27 11:56:50 -05:00
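The conversion described above can be sketched as follows. The helper name is hypothetical, and the sketch assumes the "KB" reported by Ceph means 1024 bytes. If the source value is in units of 1000 bytes, the factor would need to change.

```python
BYTES_PER_KB = 1024  # assumption: Ceph's "KB" figures are 1024-byte units


def kb_to_bytes(kilobytes):
    """Convert a KB value (int or numeric string) to a whole number of bytes."""
    return int(kilobytes) * BYTES_PER_KB
```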
Joshua Boniface
cb3c2cd86d
Adjust name of PVC cluster dashboard
2023-12-27 11:42:58 -05:00
Joshua Boniface
d0de4f1825
Update Grafana dashboard to overview
...
Adds resource utilization in addition to health.
2023-12-27 11:38:39 -05:00
Joshua Boniface
494c20263d
Move monitoring folder to top level
2023-12-27 11:37:49 -05:00
Joshua Boniface
431ee69620
Use proper percentage for pool util
2023-12-27 10:03:00 -05:00
Joshua Boniface
88f4d79d5a
Handle invalid values on older Libvirt versions
2023-12-27 09:51:24 -05:00
Joshua Boniface
84d22751d8
Fix bad JSON data handler
2023-12-27 09:43:37 -05:00
Joshua Boniface
40ff005a09
Fix handling of Ceph OSD bytes
2023-12-26 12:43:51 -05:00
Joshua Boniface
ab4ec7a5fa
Remove WebUI from README
2023-12-25 02:48:44 -05:00
Joshua Boniface
9604f655d0
Improve node utilization metrics and fix bugs
2023-12-25 02:47:41 -05:00
Joshua Boniface
3e4cc53fdd
Add node network statistics and utilization values
...
Adds a new physical network interface stats parser to the node
keepalives, and leverages this information to provide a network
utilization overview in the Prometheus metrics.
2023-12-21 15:45:01 -05:00
Joshua Boniface
d2d2a9c617
Include our newline atomically
...
Sometimes clashing log entries would print on the same line, likely due
to some sort of race condition in Python's print() built-in.
Instead, add a newline to our actual message and print without an end
character. This ensures atomic printing of our log messages.
2023-12-21 13:12:43 -05:00
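A small sketch of the approach described above, with the newline appended to the message itself and `print()` called with an empty `end`. The function name and the `stream` parameter are illustrative assumptions. Whether this fully prevents interleaving depends on the underlying stream, so treat it as an illustration of the technique rather than a guarantee.

```python
import sys


def log_line(message, stream=sys.stdout):
    """Emit one log line with its newline as part of the same string."""
    # The message and its trailing newline travel together in one string,
    # instead of relying on print() to append the newline separately.
    print(message + "\n", end="", file=stream)
    stream.flush()
```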
Joshua Boniface
6ed4efad33
Add new network.stats key to nodes
2023-12-21 12:48:48 -05:00
Joshua Boniface
39f9f3640c
Rename health metrics and add resource metrics
2023-12-21 09:40:49 -05:00
Joshua Boniface
c64e888d30
Fix incorrect cast of None
2023-12-14 16:00:53 -05:00
Joshua Boniface
f1249452e5
Fix bug if no nodes are present
2023-12-14 15:32:18 -05:00
Joshua Boniface
0a93f526e0
Bump version to 0.9.86
2023-12-14 14:46:29 -05:00
Joshua Boniface
7c9512fb22
Fix broken config file in API migration script
2023-12-14 14:45:58 -05:00
Joshua Boniface
e88b97f3a9
Print fenced state in red
2023-12-13 15:02:18 -05:00
Joshua Boniface
709c9cb73e
Pause pvchealthd startup until node daemon is run
...
If the health daemon starts too soon during a node bootup, it will
generate tons of erroneous faults while the node starts up.
Adds a conditional wait for the current node daemon to be in "run"
state before the health daemon really starts up.
2023-12-13 14:53:54 -05:00
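The conditional wait described above could look roughly like this. The function name, the `get_state` callable and the polling interval are assumptions for illustration. How the real daemon reads the node state is not shown in the commit message.

```python
import time


def wait_for_run_state(get_state, poll_interval=1.0, sleep=time.sleep):
    """Block until get_state() reports "run"."""
    # get_state is a hypothetical callable returning the node daemon's
    # current state string; poll it until the node is fully up.
    while get_state() != "run":
        sleep(poll_interval)
```

The health daemon would call this once at startup, before beginning its checks, so faults are not recorded while the node is still coming up.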
Joshua Boniface
f41c5176be
Ensure health value is an int properly
2023-12-13 14:34:02 -05:00
Joshua Boniface
38e43b46c3
Update health detail messages format
2023-12-13 03:17:47 -05:00
Joshua Boniface
ed9c37982a
Move metric collection into daemon library
2023-12-11 19:20:30 -05:00
Joshua Boniface
0f24184b78
Explicitly clear resources of fenced node
...
This actually solves the bug originally "fixed" in 5f1432ccdd without
breaking VM resource allocations for working nodes.
2023-12-11 12:14:56 -05:00
Joshua Boniface
1ba37fe33d
Restore VM resource allocation location
...
Commit 5f1432ccdd changed where these happen due to a bug after fencing.
However, this completely broke node resource reporting, as only the final
instance will be queried here.
Revert this change and look further into the original bug.
2023-12-11 11:52:59 -05:00
Joshua Boniface
1a05077b10
Fix missing fstring
2023-12-11 11:29:49 -05:00
Joshua Boniface
57c28376a6
Port one final Ceph function to read_many
2023-12-11 10:25:36 -05:00
Joshua Boniface
e781d742e6
Fix bug with volume and snapshot listing
2023-12-11 10:21:46 -05:00
Joshua Boniface
6c6d1508a1
Add VNC info to screenshots
2023-12-11 03:40:49 -05:00
Joshua Boniface
741dafb26b
Port VM functions to read_many
2023-12-11 03:34:36 -05:00
Joshua Boniface
032d3ebf18
Remove debug output from image
2023-12-11 03:23:10 -05:00
Joshua Boniface
5d9e83e8ed
Fix output bugs in VM information
2023-12-11 03:04:46 -05:00
Joshua Boniface
ad0bd8649f
Finish missing sentence
2023-12-11 02:39:39 -05:00
Joshua Boniface
9b5e53e4b6
Add Grafana dashboard screenshot
2023-12-11 00:39:24 -05:00
Joshua Boniface
9617660342
Update Prometheus Grafana dashboard
2023-12-11 00:23:08 -05:00
Joshua Boniface
ab0a1e0946
Update and streamline README and update images
2023-12-10 23:57:01 -05:00
Joshua Boniface
7c116b2fbc
Ensure node health value is an int
2023-12-10 23:56:50 -05:00
Joshua Boniface
1023c55087
Fix bug in VM state list
2023-12-10 23:44:01 -05:00
Joshua Boniface
9235187c6f
Port Ceph functions to read_many
...
Only ports getOSDInformation, as all the others feature 3 or fewer reads,
which is acceptable sequentially.
2023-12-10 22:24:38 -05:00
Joshua Boniface
0c94f1b4f8
Port Network functions to read_many
2023-12-10 22:19:21 -05:00
Joshua Boniface
44a4f0e1f7
Use new info detail output instead of new lists
...
Avoids multiple additional ZK calls by using data that is now in the
status detail output.
2023-12-10 22:19:09 -05:00
Joshua Boniface
5d53a3e529
Add state and faults detail to cluster information
...
We already parse this information out anyways, so might as well add it
to the API output JSON. This can be leveraged by the Prometheus endpoint
as well to avoid duplicate listings.
2023-12-10 17:29:32 -05:00
Joshua Boniface
35e22cb50f
Simplify cluster status handling
...
This significantly simplifies cluster state handling by removing most of
the superfluous get_list() calls, replacing them with basic child reads
since most of them are just for a count anyways. The ones that require
states simplify this down to a child read plus direct reads for the
exact items required while leveraging the new read_many() function.
2023-12-10 17:05:46 -05:00
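The pattern described above, a child read for the count plus one batched read for the states, can be sketched against an in-memory stand-in. Only the name `read_many` comes from the commit message. Its signature here, the `FakeStore` class, the `children()` method and the key paths are all hypothetical.

```python
from collections import Counter


class FakeStore:
    """Hypothetical in-memory stand-in for the cluster's key store."""

    def __init__(self, data):
        self.data = data

    def children(self, path):
        # Return the names directly below path, like a child listing.
        prefix = path.rstrip("/") + "/"
        names = {k[len(prefix):].split("/")[0]
                 for k in self.data if k.startswith(prefix)}
        return sorted(names)

    def read_many(self, keys):
        # Assumed behaviour: one batched call returning values in key order.
        return [self.data[key] for key in keys]


def node_state_counts(store):
    """Count nodes with a child read, then fetch all states in one batch."""
    names = store.children("/nodes")
    states = store.read_many([f"/nodes/{name}/state" for name in names])
    return len(names), Counter(states)
```

The point of the design is that a count needs only the child listing, and the per-item states arrive in a single batched call rather than one full listing per item.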
Joshua Boniface
a3171b666b
Split node health into separate function
2023-12-10 16:52:10 -05:00