389 Commits

Author SHA1 Message Date
e8f0005894 Bump version to 0.9.69 (nice) 2023-08-29 22:02:13 -04:00
e15f9ed509 Ensure Patroni failures do not block takeover 2023-08-29 22:00:11 -04:00
26921d81cc Round cpuload to 2 decimal places 2023-08-29 21:41:44 -04:00
2e1269eaae Bump version to 0.9.68 2023-08-27 20:59:23 -04:00
1c79ce05ac Bump version to 0.9.67 2023-08-27 14:47:20 -04:00
d08b90f90d Bump version to 0.9.66 2023-08-27 11:41:22 -04:00
ce9eaaac8e Bump version to 0.9.65 2023-08-23 01:56:57 -04:00
529ecfdcf0 Bump version to 0.9.64 2023-08-18 12:34:27 -04:00
36558c73b8 Fix bugs for node flush for stop/shutdown/restart
Previously VMs in stop/shutdown/restart states wouldn't be properly
handled during a node flush. This fixes the bugs and ensures that the
transient VM states (shutdown/restart) are completed before proceeding,
and then avoids setting a stopped/shutdown VM to shutdown/autostart.
2023-08-18 11:25:59 -04:00
3fa111aba5 Bump version to 0.9.63 2023-04-28 14:47:04 -04:00
2af217ced1 Use try when watching health value in NodeInstance 2023-03-07 09:53:01 -05:00
6ac4b7a54e Adjust keepalive health printing and ordering 2023-02-24 11:08:30 -05:00
faa96ff6c4 Correct error handling if monitoring plugins fail 2023-02-24 10:19:41 -05:00
646785b7f8 Bump version to 0.9.62 2023-02-22 18:13:45 -05:00
a9e7713abf Add health delta change to message output 2023-02-22 15:02:08 -05:00
0f3cd13da1 Fix bad string value for message 2023-02-22 15:02:08 -05:00
4ab0bdd9e8 Disallow health less than 0 2023-02-15 16:50:24 -05:00
3a1b8f0e7a Add JSON health to cluster data 2023-02-15 15:26:57 -05:00
fc16e26f23 Run setup during plugin loads 2023-02-15 10:11:38 -05:00
8aa74aae62 Use percentage in keepalive output 2023-02-15 01:56:02 -05:00
8e6632bf10 Adjust text on log message 2023-02-13 22:21:23 -05:00
96d3aff7ad Add logging flag for monitoring plugin output 2023-02-13 22:04:39 -05:00
54373c5bec Fix bugs if plugins fail to load 2023-02-13 21:51:48 -05:00
af436a93cc Set node health to None when restarting 2023-02-13 15:54:46 -05:00
edb3aea990 Add node health value and send out API 2023-02-13 15:53:39 -05:00
4d786c11e3 Move Ceph cluster health reporting to plugin
Also removes several outputs from the normal keepalive that were
superfluous/static so that the main output fits on one line.
2023-02-13 13:29:40 -05:00
25f3faa08f Move Ceph cluster health reporting to plugin
Also removes several outputs from the normal keepalive that were
superfluous/static so that the main output fits on one line.
2023-02-13 12:13:56 -05:00
3ad6ff2d9c Initial implementation of monitoring plugin system 2023-02-13 12:06:26 -05:00
c7c47d9f86 Bump version to 0.9.61 2023-02-08 10:08:05 -05:00
0b8d26081b Bump version to 0.9.60 2022-12-06 15:42:55 -05:00
f3ba4b6294 Bump version to 0.9.59 2022-11-15 15:50:15 -05:00
a28df75a5d Bump version to 0.9.58 2022-11-07 12:27:48 -05:00
d63e80675a Bump version to 0.9.57 2022-11-06 01:39:50 -04:00
ef3c22d793 Bump version to 0.9.56 2022-10-27 14:21:04 -04:00
a81d419a2e Update copyright header year 2022-10-06 11:55:27 -04:00
c84ee0f4f1 Bump version to 0.9.55 2022-10-04 13:21:40 -04:00
76c51460b0 Avoid raise/handle deadlocks
Can cause log flooding in some edge cases and isn't really needed any
longer. Use a proper conditional followed by an actual error handler.
2022-10-03 14:04:12 -04:00
4b41ee2817 Bump version to 0.9.54 2022-08-23 11:01:05 -04:00
6146b062d6 Bump version to 0.9.53 2022-08-12 17:47:11 -04:00
73c1ac732e Bump version to 0.9.52 2022-08-12 11:09:25 -04:00
58dd5830eb Add additional kb_ values to OSD stats
Allows for easier parsing later to get e.g. % values and more details on
the used amounts.
2022-08-11 11:06:36 -04:00
5ae430e1c5 Bump version to 0.9.51 2022-07-25 23:25:41 -04:00
e464dcb483 Bump version to 0.9.50 2022-07-06 16:01:14 -04:00
27214c8190 Fix bug with space-containing detect strings 2022-07-06 15:58:57 -04:00
baf5a132ff Bump version to 0.9.49 2022-05-06 15:49:39 -04:00
21bbb0393f Add support for replacing/refreshing OSDs
Adds commands to both replace an OSD disk, and refresh (reimport) an
existing OSD disk on a new node. This handles the cases where an OSD
disk should be replaced (either due to upgrades or failures) or where a
node is rebuilt in-place and an existing OSD must be re-imported to it.

This should avoid the need to do a full remove/add sequence for either
case.

Also cleans up some aspects of OSD removal that are identical between
methods (e.g. using safe-to-destroy and sleeping after stopping) and
fixes a bug if an OSD does not truly exist when the daemon starts up.
2022-05-06 15:32:06 -04:00
1f8f3252a6 Fix bug with initial JSON for stats 2022-05-02 13:28:19 -04:00
b47c9832b7 Refactor OSD removal to use new ZK data
With the OSD LVM information stored in Zookeeper, we can use this to
determine the actual block device to zap rather than relying on runtime
determination and guesstimation.
2022-05-02 12:52:22 -04:00
d2757004db Store additional OSD information in ZK
Ensures that information like the FSIDs and the OSD LVM volume are
stored in Zookeeper at creation time and updated at daemon start time
(to ensure the data is populated at least once, or if the /dev/sdX
path changes).

This will allow safer operation of OSD removals and the potential
implementation of re-activation after node replacements.
2022-05-02 12:11:39 -04:00
7323269775 Ensure initial OSD stats is populated
Values are all invalid but this ensures the client won't error out when
trying to show an OSD that has never checked in yet.
2022-04-29 16:50:30 -04:00