Commit Graph

413 Commits

Author SHA1 Message Date
Joshua Boniface 0769f1ea52 Increase service start time to 10s 2023-10-23 22:24:03 -04:00
Joshua Boniface c6c44bf775 Bump version to 0.9.78 2023-09-30 12:57:55 -04:00
Joshua Boniface 7c0f12750e Bump version to 0.9.77 2023-09-19 11:05:55 -04:00
Joshua Boniface 51e78480fa Bump version to 0.9.76 2023-09-18 10:15:52 -04:00
Joshua Boniface f46bfc962f Bump version to 0.9.75 2023-09-16 23:06:38 -04:00
Joshua Boniface 457b7bed3d Handle exceptions in fence migrations 2023-09-16 22:56:09 -04:00
Joshua Boniface 86115b2928 Add startup message for IPMI reachability
It's good to know that this succeeded in addition to knowing if it
failed.
2023-09-16 22:41:58 -04:00
Joshua Boniface 1a906b589e Bump version to 0.9.74 2023-09-16 00:18:13 -04:00
Joshua Boniface 48662e90c1 Remove obsolete monitoring_instance passing 2023-09-15 22:47:45 -04:00
Joshua Boniface 079381c03e Move printing to end and add runtime 2023-09-15 22:40:09 -04:00
Joshua Boniface 794cea4a02 Reverse ordering, run checks before starting timer 2023-09-15 22:25:37 -04:00
Joshua Boniface 479e156234 Run monitoring plugins once on startup 2023-09-15 17:53:16 -04:00
Joshua Boniface 86830286f3 Adjust message printing to be on one line 2023-09-15 17:00:34 -04:00
Joshua Boniface 4d51318a40 Make monitoring interval configurable 2023-09-15 16:54:51 -04:00
Joshua Boniface cba6f5be48 Fix wording of non-coordinator state 2023-09-15 16:51:04 -04:00
Joshua Boniface 254303b9d4 Use coordinator_state instead of router_state
Makes it much clearer what this variable represents.
2023-09-15 16:47:56 -04:00
Joshua Boniface 40b7d68853 Separate monitoring and move to 60s interval
Removes the monitoring subsystem's dependency on the node keepalives,
and runs the plugins at a 60s interval to avoid excessive backups if a
plugin takes too long.

Adds its own logs and related items as required.

Finally adds a new required argument to the run() of plugins, the
coordinator state, which can be used by a plugin to determine actions
based on whether the node is a primary, secondary, or non-coordinator.
2023-09-15 16:47:11 -04:00
Joshua Boniface a8115cafd1 Bump version to 0.9.73 2023-09-02 02:16:19 -04:00
Joshua Boniface 570da99605 Avoid failures if no children found 2023-09-02 01:36:17 -04:00
Joshua Boniface fdda47e8a2 Bump version to 0.9.72 2023-09-01 16:34:45 -04:00
Joshua Boniface bb2aac145d Bump version to 0.9.71 2023-09-01 00:36:38 -04:00
Joshua Boniface 6c407d54c3 Bump version to 0.9.70 2023-08-31 14:15:54 -04:00
Joshua Boniface cb413e5ce6 [Bookworm] Fix Ceph 16 OSD stat parsing 2023-08-31 00:45:03 -04:00
Joshua Boniface 123499f75f [Bookworm] Specify YAML loader explicitly 2023-08-31 00:16:19 -04:00
Joshua Boniface 83b8ce7b62 Bump version to 0.9.69 (nice) 2023-08-29 22:02:13 -04:00
Joshua Boniface 5e43f9bd7c Ensure Patroni failures do not block takeover 2023-08-29 22:00:11 -04:00
Joshua Boniface ed087d83c2 Round cpuload to 2 decimal places 2023-08-29 21:41:44 -04:00
Joshua Boniface 83d475bd15 Bump version to 0.9.68 2023-08-27 20:59:23 -04:00
Joshua Boniface 705ec802a3 Bump version to 0.9.67 2023-08-27 14:47:20 -04:00
Joshua Boniface 0b90f37518 Bump version to 0.9.66 2023-08-27 11:41:22 -04:00
Joshua Boniface 1e083d7652 Bump version to 0.9.65 2023-08-23 01:56:57 -04:00
Joshua Boniface 075dbe7cc9 Bump version to 0.9.64 2023-08-18 12:34:27 -04:00
Joshua Boniface b5f996febd Fix bugs for node flush for stop/shutdown/restart
Previously VMs in stop/shutdown/restart states wouldn't be properly
handled during a node flush. This fixes the bugs and ensures that the
transient VM states (shutdown/restart) are completed before proceeding,
and then avoids setting a stopped/shutdown VM to shutdown/autostart.
2023-08-18 11:25:59 -04:00
Joshua Boniface 3a90fda109 Bump version to 0.9.63 2023-04-28 14:47:04 -04:00
Joshua Boniface 2c3a3cdf52 Use try when watching health value in NodeInstance 2023-03-07 09:53:01 -05:00
Joshua Boniface 7c07fbefff Adjust keepalive health printing and ordering 2023-02-24 11:08:30 -05:00
Joshua Boniface 202dc3ed59 Correct error handling if monitoring plugins fail 2023-02-24 10:19:41 -05:00
Joshua Boniface 45ad3b9a17 Bump version to 0.9.62 2023-02-22 18:13:45 -05:00
Joshua Boniface e45b3108a2 Add health delta change to message output 2023-02-22 15:02:08 -05:00
Joshua Boniface 118237a53b Fix bad string value for message 2023-02-22 15:02:08 -05:00
Joshua Boniface 1093ca6264 Disallow health less than 0 2023-02-15 16:50:24 -05:00
Joshua Boniface f4eef30770 Add JSON health to cluster data 2023-02-15 15:26:57 -05:00
Joshua Boniface 0ecf219910 Run setup during plugin loads 2023-02-15 10:11:38 -05:00
Joshua Boniface 0f4edc54d1 Use percentage in keepalive output 2023-02-15 01:56:02 -05:00
Joshua Boniface 14d29f2986 Adjust text on log message 2023-02-13 22:21:23 -05:00
Joshua Boniface bc88d764b0 Add logging flag for monitoring plugin output 2023-02-13 22:04:39 -05:00
Joshua Boniface b07396c39a Fix bugs if plugins fail to load 2023-02-13 21:51:48 -05:00
Joshua Boniface 1ea4800212 Set node health to None when restarting 2023-02-13 15:54:46 -05:00
Joshua Boniface 9c14d84bfc Add node health value and send out API 2023-02-13 15:53:39 -05:00
Joshua Boniface d8f346abdd Move Ceph cluster health reporting to plugin
Also removes several outputs from the normal keepalive that were
superfluous/static so that the main output fits on one line.
2023-02-13 13:29:40 -05:00
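
Illustration for commit 40b7d68853 and its follow-ups (479e156234, 794cea4a02, 4d51318a40, 079381c03e): a minimal Python sketch of a monitoring loop that is independent of node keepalives, runs the plugins once before the first wait, uses a configurable interval, and passes the coordinator state to each plugin's run(). This is not the project's actual code. The names ExamplePlugin, run_all_plugins and monitoring_loop, the result dictionary shape, and the state strings "primary", "secondary" and "non-coordinator" are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class ExamplePlugin:
    """Hypothetical plugin; the name and result shape are illustrative only."""

    name = "example"

    def run(self, coordinator_state: str) -> Dict[str, object]:
        # Assumed state values: "primary", "secondary", "non-coordinator".
        if coordinator_state == "primary":
            return {"plugin": self.name, "message": "cluster-wide check ran"}
        return {"plugin": self.name, "message": "skipped on " + coordinator_state}


def run_all_plugins(
    plugins: List[object], coordinator_state: str
) -> Tuple[List[Dict[str, object]], float]:
    """Run every plugin once and return (results, total runtime in seconds)."""
    results = []
    start = time.monotonic()
    for plugin in plugins:
        try:
            results.append(plugin.run(coordinator_state))
        except Exception as exc:
            # A failing plugin is recorded, not allowed to stop the others.
            results.append({"plugin": plugin.name, "message": f"failed: {exc}"})
    return results, time.monotonic() - start


def monitoring_loop(
    plugins: List[object],
    get_coordinator_state: Callable[[], str],
    interval: float = 60.0,
    cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run checks first, then wait `interval` seconds; repeat `cycles` times
    (forever when cycles is None). Returns the number of completed cycles."""
    completed = 0
    while cycles is None or completed < cycles:
        results, runtime = run_all_plugins(plugins, get_coordinator_state())
        # Print at the end of the cycle, on one line, with the runtime.
        summary = "; ".join(f"{r['plugin']}: {r['message']}" for r in results)
        print(f"Monitoring run finished in {runtime:.2f}s: {summary}")
        completed += 1
        if cycles is not None and completed >= cycles:
            break
        sleep(interval)
    return completed
```

In this sketch a slow plugin only delays the next monitoring cycle, since the loop owns its own timer instead of sharing the keepalive's.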