Commit Graph

3207 Commits

Author SHA1 Message Date
Joshua Boniface 29a740882b Update some graph positions and sizes 2023-12-29 14:37:00 -05:00
Joshua Boniface 04f6f7acab Add VM dashboard 2023-12-29 14:28:42 -05:00
Joshua Boniface c1ae571213 Add additional VM details to Prometheus 2023-12-29 14:09:39 -05:00
Joshua Boniface 88a377c1aa Update positions and IOPS field 2023-12-29 14:09:25 -05:00
Joshua Boniface 73bf256650 Update Prometheus readmes 2023-12-29 11:22:52 -05:00
Joshua Boniface 123c7ce857 Update copyright header on all files for 2024
Last release of 2023 is probably the best time to do this.
2023-12-29 11:16:59 -05:00
Joshua Boniface 3b3ffaf2d4 Add Prometheus file SD output to connection list
Allows an administrator to easily generate a Prometheus file service
discovery configuration via the CLI for all clusters they have
configured. Assumes that all the various connection details are correct,
and due to the limits of the file SD config does not include the scheme
or SSL verification options (as these are global in Prometheus).
2023-12-29 11:13:54 -05:00
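The file SD output described in the commit above could look roughly like the following sketch. It assumes the standard Prometheus file service discovery JSON layout (a list of groups, each with `targets` and optional `labels`). The function name `file_sd_config`, the connection dictionary keys `host` and `port`, and the `cluster` label are hypothetical and not taken from the PVC source.

```python
import json


def file_sd_config(connections):
    """Build a Prometheus file_sd JSON document from connection details.

    `connections` maps a cluster name to a dict with "host" and "port"
    keys. These key names are assumptions made for this illustration.
    """
    groups = []
    for name, details in sorted(connections.items()):
        groups.append(
            {
                # file_sd targets are bare host:port strings. The scheme and
                # TLS verification settings live in the scrape job, not here.
                "targets": [f"{details['host']}:{details['port']}"],
                "labels": {"cluster": name},  # hypothetical label name
            }
        )
    return json.dumps(groups, indent=2)


if __name__ == "__main__":
    # Example values only; host names and ports are made up.
    print(file_sd_config({"cluster1": {"host": "pvc1.example.com", "port": 7370}}))
```

Because the scheme is not part of a target entry, a sketch like this has nowhere to record per-cluster HTTP/HTTPS differences, which matches the limitation the commit message notes.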
Joshua Boniface 2309b9dcf0 Make final dashboard read-only 2023-12-29 10:31:48 -05:00
Joshua Boniface 51b9f062b7 Add descriptions for each panel and reset version. 2023-12-29 10:30:28 -05:00
Joshua Boniface e4ca74c201 Add Zookeeper performance to Grafana dashboard 2023-12-29 09:44:40 -05:00
Joshua Boniface 4969e90f8a Allow enable/disable of Prometheus endpoints
Since these are unauthenticated, it might be the case that an
administrator wishes to completely disable these metrics endpoints.
Provide that option via pvc.conf through pvc-ansible's existing
enable_prometheus_exporters option and the new enable_prometheus
configuration flag.

Defaults to "yes" to provide all functionality unless explicitly
disabled, as the author assumes that the PVC API is secured in other
ways as well and that metric information is not completely sensitive.
2023-12-29 09:25:10 -05:00
Joshua Boniface 52f68909f6 Update Grafana dashboard 2023-12-28 14:55:43 -05:00
Joshua Boniface 0bcf8cfe19 Add Zookeeper metrics proxy 2023-12-28 13:53:15 -05:00
Joshua Boniface 2bb24d3b57 Update Prometheus dashboard and add README 2023-12-27 15:57:12 -05:00
Joshua Boniface 8083b7a3e6 Bump version to 0.9.87 2023-12-27 13:40:51 -05:00
Joshua Boniface 3346ce9bb0 Add missing shutdown state from combinations 2023-12-27 13:40:30 -05:00
Joshua Boniface 572596c575 Fix missing f-string placeholder 2023-12-27 13:21:20 -05:00
Joshua Boniface e654fbba08 Move debug condition handling to Logger
Avoids many dozens of conditionals sprinkled throughout the code by
centralizing this check into the main Logger instance.
2023-12-27 13:01:45 -05:00
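As a rough illustration of the idea in the commit above, the debug check can live inside the logger so that callers never test a debug flag themselves. This is a minimal sketch, not the project's Logger: the `out` method, the `state` argument, and the `"d"` marker for debug messages are hypothetical.

```python
import sys


class Logger:
    """Toy logger that decides centrally whether debug output is shown."""

    def __init__(self, debug=False, stream=sys.stdout):
        self.debug = debug
        self.stream = stream

    def out(self, message, state="i"):
        # Hypothetical convention: state "d" marks a debug-only message.
        if state == "d" and not self.debug:
            return  # the single, central debug check
        self.stream.write(f"[{state}] {message}\n")


if __name__ == "__main__":
    logger = Logger(debug=False)
    logger.out("always shown")
    logger.out("hidden unless debug is enabled", state="d")
```

Call sites then shrink from `if debug: logger.out(...)` to a plain `logger.out(..., state="d")`.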
Joshua Boniface 52bf5ad0ef Update store_path set location
Prevents a bug if no cluster is selected while doing connection list
commands.
2023-12-27 12:42:19 -05:00
Joshua Boniface 576afc1e94 Update Grafana dashboard layouts 2023-12-27 12:24:46 -05:00
Joshua Boniface 4375f66793 Use proper get() for invalid values 2023-12-27 12:03:48 -05:00
Joshua Boniface 3df3ca5b44 Fix value for OSD utilization
Ceph provides in KB; convert to bytes.
2023-12-27 11:56:50 -05:00
Joshua Boniface cb3c2cd86d Adjust name of PVC cluster dashboard 2023-12-27 11:42:58 -05:00
Joshua Boniface d0de4f1825 Update Grafana dashboard to overview
Adds resource utilization in addition to health.
2023-12-27 11:38:39 -05:00
Joshua Boniface 494c20263d Move monitoring folder to top level 2023-12-27 11:37:49 -05:00
Joshua Boniface 431ee69620 Use proper percentage for pool util 2023-12-27 10:03:00 -05:00
Joshua Boniface 88f4d79d5a Handle invalid values on older Libvirt versions 2023-12-27 09:51:24 -05:00
Joshua Boniface 84d22751d8 Fix bad JSON data handler 2023-12-27 09:43:37 -05:00
Joshua Boniface 40ff005a09 Fix handling of Ceph OSD bytes 2023-12-26 12:43:51 -05:00
Joshua Boniface ab4ec7a5fa Remove WebUI from README 2023-12-25 02:48:44 -05:00
Joshua Boniface 9604f655d0 Improve node utilization metrics and fix bugs 2023-12-25 02:47:41 -05:00
Joshua Boniface 3e4cc53fdd Add node network statistics and utilization values
Adds a new physical network interface stats parser to the node
keepalives, and leverages this information to provide a network
utilization overview in the Prometheus metrics.
2023-12-21 15:45:01 -05:00
Joshua Boniface d2d2a9c617 Include our newline atomically
Sometimes clashing log entries would print on the same line, likely due
to some sort of race condition in Python's print() built-in.

Instead, add a newline to our actual message and print without an end
character. This ensures atomic printing of our log messages.
2023-12-21 13:12:43 -05:00
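The change described in the commit above can be sketched as follows. In CPython, `print(message)` hands the message and the trailing newline to the stream as separate writes, which leaves room for another writer to land in between. Appending the newline to the message first means the whole line goes out in one write. The helper name `emit` is hypothetical.

```python
import sys


def emit(message, stream=sys.stdout):
    """Write one log line with its newline included in the message itself."""
    # The newline is part of the string, and end="" stops print() from
    # adding a second, separately written line terminator.
    print(message + "\n", end="", file=stream)


if __name__ == "__main__":
    emit("node1: keepalive sent")
    emit("node1: keepalive acknowledged")
```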
Joshua Boniface 6ed4efad33 Add new network.stats key to nodes 2023-12-21 12:48:48 -05:00
Joshua Boniface 39f9f3640c Rename health metrics and add resource metrics 2023-12-21 09:40:49 -05:00
Joshua Boniface c64e888d30 Fix incorrect cast of None 2023-12-14 16:00:53 -05:00
Joshua Boniface f1249452e5 Fix bug if no nodes are present 2023-12-14 15:32:18 -05:00
Joshua Boniface 0a93f526e0 Bump version to 0.9.86 2023-12-14 14:46:29 -05:00
Joshua Boniface 7c9512fb22 Fix broken config file in API migration script 2023-12-14 14:45:58 -05:00
Joshua Boniface e88b97f3a9 Print fenced state in red 2023-12-13 15:02:18 -05:00
Joshua Boniface 709c9cb73e Pause pvchealthd startup until node daemon is run
If the health daemon starts too soon during a node bootup, it will
generate tons of erroneous faults while the node starts up.
Adds a conditional wait for the current node daemon to be in "run"
state before the health daemon really starts up.
2023-12-13 14:53:54 -05:00
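A conditional wait like the one described in the commit above might be sketched as a simple polling loop. This is an illustration under stated assumptions, not the pvchealthd code: `get_state` stands in for however the daemon state is actually read, and the function name and polling interval are hypothetical.

```python
import time


def wait_for_run_state(get_state, interval=1.0, sleep=time.sleep):
    """Block until get_state() reports "run".

    `get_state` is a caller-supplied callable returning the node daemon's
    current state string. `sleep` is injectable so the loop can be tested.
    """
    while get_state() != "run":
        sleep(interval)


if __name__ == "__main__":
    # Simulated state sequence; a real caller would query the cluster instead.
    states = iter(["init", "init", "run"])
    wait_for_run_state(lambda: next(states), interval=0.1)
    print("node daemon is in run state; starting health checks")
```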
Joshua Boniface f41c5176be Ensure health value is an int properly 2023-12-13 14:34:02 -05:00
Joshua Boniface 38e43b46c3 Update health detail messages format 2023-12-13 03:17:47 -05:00
Joshua Boniface ed9c37982a Move metric collection into daemon library 2023-12-11 19:20:30 -05:00
Joshua Boniface 0f24184b78 Explicitly clear resources of fenced node
This actually solves the bug originally "fixed" in 5f1432ccdd without
breaking VM resource allocations for working nodes.
2023-12-11 12:14:56 -05:00
Joshua Boniface 1ba37fe33d Restore VM resource allocation location
Commit 5f1432ccdd changed where these happen due to a bug after fencing.
However, this completely broke node resource reporting, as only the final
instance will be queried here.

Revert this change and look further into the original bug.
2023-12-11 11:52:59 -05:00
Joshua Boniface 1a05077b10 Fix missing fstring 2023-12-11 11:29:49 -05:00
Joshua Boniface 57c28376a6 Port one final Ceph function to read_many 2023-12-11 10:25:36 -05:00
Joshua Boniface e781d742e6 Fix bug with volume and snapshot listing 2023-12-11 10:21:46 -05:00
Joshua Boniface 6c6d1508a1 Add VNC info to screenshots 2023-12-11 03:40:49 -05:00