153 Commits

SHA1 Message Date
3ad6ff2d9c Initial implementation of monitoring plugin system 2023-02-13 12:06:26 -05:00
c7c47d9f86 Bump version to 0.9.61 2023-02-08 10:08:05 -05:00
0b8d26081b Bump version to 0.9.60 2022-12-06 15:42:55 -05:00
f3ba4b6294 Bump version to 0.9.59 2022-11-15 15:50:15 -05:00
a28df75a5d Bump version to 0.9.58 2022-11-07 12:27:48 -05:00
d63e80675a Bump version to 0.9.57 2022-11-06 01:39:50 -04:00
ef3c22d793 Bump version to 0.9.56 2022-10-27 14:21:04 -04:00
078f85b431 Add node autoready oneshot unit
This replicates some of the more important functionality of the defunct
pvc-flush.service unit. On presence of a trigger file (i.e.
/etc/pvc/autoready), it will trigger a "node ready" on boot. It does
nothing on shutdown as this must be handled by other mechanisms, though
a similar autoflush could be added as well.
2022-10-27 14:09:14 -04:00
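For illustration, a minimal sketch of what such an autoready oneshot unit might look like. Only the /etc/pvc/autoready trigger path comes from the message above; the unit and service names, the pvc CLI invocation, and the ordering are assumptions.

    [Unit]
    Description=PVC node autoready (illustrative sketch, names assumed)
    # Only run when the trigger file described above is present
    ConditionPathExists=/etc/pvc/autoready
    After=pvcnoded.service

    [Service]
    Type=oneshot
    RemainAfterExit=true
    # Hypothetical invocation; the real CLI entry point may differ
    ExecStart=/usr/bin/pvc node ready %H

    [Install]
    WantedBy=multi-user.target

Such a unit would be enabled once (e.g. systemctl enable pvc-autoready.service, name assumed) and then do nothing on hosts where the trigger file is absent.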
c84ee0f4f1 Bump version to 0.9.55 2022-10-04 13:21:40 -04:00
4b41ee2817 Bump version to 0.9.54 2022-08-23 11:01:05 -04:00
6146b062d6 Bump version to 0.9.53 2022-08-12 17:47:11 -04:00
73c1ac732e Bump version to 0.9.52 2022-08-12 11:09:25 -04:00
5ae430e1c5 Bump version to 0.9.51 2022-07-25 23:25:41 -04:00
4731faa2f0 Remove pvc-flush service
This service caused more headaches than it was worth, so remove it.

The original goal was to cleanly flush nodes on shutdown and unflush
them on startup, but this is now tightly controlled by Ansible
playbooks, and it is something best left to the Administrator and their
particular situation anyway.
2022-07-25 23:21:34 -04:00
8cfcd02ac2 Fix bad changelog entries 2022-07-06 16:57:55 -04:00
e464dcb483 Bump version to 0.9.50 2022-07-06 16:01:14 -04:00
baf5a132ff Bump version to 0.9.49 2022-05-06 15:49:39 -04:00
85463f9aec Bump version to 0.9.48 2022-04-29 15:03:52 -04:00
313a5d1c7d Bump version to 0.9.47 2021-12-28 22:03:08 -05:00
c3d255be65 Bump version to 0.9.46 2021-12-28 15:02:14 -05:00
02a2f6a27a Bump version to 0.9.45 2021-11-25 09:34:20 -05:00
3aa20fbaa3 Bump version to 0.9.44 2021-11-11 16:20:38 -05:00
6febcfdd97 Bump version to 0.9.43 2021-11-08 02:29:17 -05:00
52c3e8ced3 Fix bad test in postinst 2021-10-19 00:27:12 -04:00
23165482df Bump version to 0.9.42 2021-10-12 15:25:42 -04:00
d1f2ce0b0a Bump version to 0.9.41 2021-10-09 19:39:21 -04:00
ee348593c9 Bump version to 0.9.40 2021-10-07 14:42:04 -04:00
e79d200244 Bump version to 0.9.39 2021-10-07 11:52:38 -04:00
3449069e3d Bump version to 0.9.38 2021-10-03 22:32:41 -04:00
e9b69c4124 Revamp postinst for the API daemon
Ensures that the worker is always restarted and makes the NOTE
conditional more specific.
2021-10-03 15:15:26 -04:00
19ac1e17c3 Bump version to 0.9.37 2021-09-30 02:08:14 -04:00
eba142f470 Bump version to 0.9.36 2021-09-23 14:01:38 -04:00
3e3776a25b Bump version to 0.9.35 2021-09-13 02:20:46 -04:00
e9735113af Bump version to 0.9.34 2021-08-24 16:15:25 -04:00
560c013e95 Bump version to 0.9.33 2021-08-21 03:28:48 -04:00
4014ef7714 Bump version to 0.9.32 2021-08-19 12:37:58 -04:00
7ecc6a2635 Bump version to 0.9.31 2021-07-30 12:08:12 -04:00
32613ff119 Remove obsolete Suggests lines from control 2021-07-20 00:35:21 -04:00
2a99a27feb Bump version to 0.9.30 2021-07-20 00:01:45 -04:00
fa1d93e933 Bump version to 0.9.29 2021-07-19 16:55:41 -04:00
71e4d0b32a Bump version to 0.9.28 2021-07-19 09:29:34 -04:00
15d92c483f Bump version to 0.9.27 2021-07-19 00:03:40 -04:00
602093029c Bump version to 0.9.26 2021-07-18 20:49:52 -04:00
2e9f6ac201 Bump version to 0.9.25 2021-07-11 23:19:09 -04:00
cfeba50b17 Revert "Return to all command-based Ceph gathering"
This reverts commit 65d14ccd92f3c008e8728aea85c47beb0644c1ec.

This was actually a bad idea. For inexplicable reasons, running these
Ceph commands manually (not even via Python, but in a normal shell)
takes roughly 700 times, nearly three orders of magnitude, longer than
running them with the Rados module, so long in fact that some basic
commands like "ceph health" would sometimes take longer than the
1-second timeout to complete. The Rados calls, by contrast, take about
1 ms.

Despite the occasional issues when monitors drop out, the Rados module
is clearly far superior to the shell commands for any moderately-loaded
Ceph cluster. We can look into solving timeouts another way (perhaps
with Processes instead of Threads) at a later time.

Rados module "ceph health":
    b'{"checks":{},"status":"HEALTH_OK"}'
    0.001204 (s)
    b'{"checks":{},"status":"HEALTH_OK"}'
    0.001258 (s)
Command "ceph health":
    joshua@hv1.c.bonilan.net ~ $ time ceph health >/dev/null
    real    0m0.772s
    user    0m0.707s
    sys     0m0.046s
    joshua@hv1.c.bonilan.net ~ $ time ceph health >/dev/null
    real    0m0.796s
    user    0m0.728s
    sys     0m0.054s
2021-07-10 03:47:45 -04:00
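A rough, hedged reproduction of the comparison above in Python: time "ceph health" once through the rados binding and once through the CLI. The ceph.conf path, authentication setup, and CLI flags are assumptions; only the JSON health output and the roughly 700x gap come from the measurements in the message.

    import json
    import subprocess
    import time

    import rados

    # Librados path: a mon_command round-trip, as restored by this revert.
    # May additionally need a keyring/client name depending on the cluster.
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    start = time.time()
    ret, outbuf, outs = cluster.mon_command(
        json.dumps({"prefix": "health", "format": "json"}), b""
    )
    print(outbuf)
    print("%f (s)" % (time.time() - start))
    cluster.shutdown()

    # Shell path: the same query via the ceph CLI, which the timings above
    # show is hundreds of times slower on this cluster.
    start = time.time()
    result = subprocess.run(["ceph", "health", "--format", "json"],
                            capture_output=True)
    print(result.stdout)
    print("%f (s)" % (time.time() - start))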
551bae2518 Bump version to 0.9.24 2021-07-09 15:58:36 -04:00
65d14ccd92 Return to all command-based Ceph gathering
Using the Rados module was very problematic, specifically because it had
no sensible timeout parameters and thus would hang for many seconds.
This is a serious problem, since a hung call blocks further keepalives.

Instead, remove the Rados usage entirely and go back completely to using
manual OS commands to gather this information. While this may cause PID
exhaustion more quickly, it's worthwhile to avoid failure scenarios when
Ceph stats time out.

Closes #137
2021-07-06 11:30:45 -04:00
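A minimal sketch of the command-based gathering described above, with an explicit timeout so a hung monitor cannot stall the keepalive loop. The run_os_command helper name, the 1-second budget, and the exact ceph flags are assumptions for illustration.

    import subprocess

    def run_os_command(command, timeout=1):
        # Run a shell command with a hard timeout so a slow or hung Ceph
        # monitor cannot block further keepalives indefinitely.
        try:
            result = subprocess.run(
                command.split(),
                capture_output=True,
                timeout=timeout,
            )
            return (result.returncode,
                    result.stdout.decode(),
                    result.stderr.decode())
        except subprocess.TimeoutExpired:
            # Treat a timeout as a soft failure instead of hanging
            return 124, "", "timed out after %ds" % timeout

    retcode, stdout, stderr = run_os_command("ceph health --format json")
    if retcode == 0:
        ceph_health = stdout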
adc022f55d Add missing install of pvcapid-worker.sh 2021-07-06 09:40:42 -04:00
7082982a33 Bump version to 0.9.23 2021-07-05 23:40:32 -04:00
37cd278bc2 Bump version to 0.9.22 2021-07-05 14:18:51 -04:00