Update Munin plugin example

2023-02-16 16:06:00 -05:00
parent 3bd93563e6
commit eda1b95d5f
2 changed files with 63 additions and 102 deletions
--- a/node-daemon/monitoring/README.md
+++ b/node-daemon/monitoring/README.md
@@ -4,38 +4,25 @@ This directory contains several monitoring resources that can be used with vario

 ## Munin

-The included Munin plugin can be activated by linking to it from `/etc/munin/plugins/pvc`. By default, this plugin triggers a CRITICAL state when either the PVC or Storage cluster becomes Degraded, and is otherwise OK. The overall health is graphed numerically (Optimal is 0, Maintenance is 1, Degraded is 2) so that the cluster health can be tracked over time.
+The included Munin plugins can be activated by linking to them from `/etc/munin/plugins/`. Two plugins are provided:

-When using this plugin, it might be useful to adjust the thresholds with a plugin configuration. For instance, one could adjust the Degraded value from CRITICAL to WARNING by adjusting the critical threshold to a value higher than 1.99 (e.g. 3, 10, etc.) so that only the WARNING threshold will be hit. Alternatively one could instead make Maintenance mode trigger a WARNING by lowering the threshold to 0.99.
+* `pvc`: Checks the PVC cluster and node health, providing two graphs, one for each.

-Example plugin configuration:
+* `ceph_utilization`: Checks the Ceph cluster statistics, providing multiple graphs. Note that this plugin is independent of PVC and makes local calls to various Ceph commands directly.

-```
-[pvc]
-# Make cluster warn on maintenance
-env.pvc_cluster_warning 0.99
-# Disable critical threshold (>2)
-env.pvc_cluster_critical 3
-# Make storage warn on maintenance, crit on degraded (latter is default)
-env.pvc_storage_warning 0.99
-env.pvc_storage_critical 1.99
-```
+The `pvc` plugin provides no configuration; the status is hardcoded such that <=90% health is warning, <=50% health is critical, and maintenance state forces OK.
+
+The `ceph_utilization` plugin provides no configuration; only the cluster utilization graph alerts, with >80% used as warning and >90% used as critical. Ceph itself also begins warning above 80%.

 ## CheckMK

-The included CheckMK plugin is divided into two parts: the agent plugin, and the monitoring server plugin, and can be activated as follows:
+The included CheckMK plugin is divided into two parts: the agent plugin and the monitoring server plugin. The monitoring server plugin requires CheckMK version 2.0 or higher. The two parts can be installed as follows:

-### Agent plugin: `pvc`
+* `pvc`: Place this file in the `/usr/lib/check_mk_agent/plugins/` directory on each node.

-Place this file in the `/usr/lib/check_mk_agent/plugins/` directory on each node.
+* `pvc.py`: Place this file in the `~/local/lib/python3/cmk/base/plugins/agent_based/` directory on the CheckMK monitoring host for each monitoring site.

-### Server plugin: `pvc.py`
-
-This monitoring server plugin requires CheckMK version 2.0 or higher.
-
-Place this file in the `~/local/lib/python3/cmk/base/plugins/agent_based/` directory for each monitoring site.
-
-### Output
+The plugin provides no configuration: the status is hardcoded such that <=90% health is warning, <=50% health is critical, and maintenance state forces OK.

 With both the agent and server plugins installed, you can then run `cmk -II <node>` (or use WATO) to inventory each node, which should produce two new checks: