38 Commits

Author SHA1 Message Date
e654fbba08 Move debug condition handling to Logger
Avoids many dozens of conditionals sprinkled throughout the code by
centralizing this check into the main Logger instance.
2023-12-27 13:01:45 -05:00
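
The idea in this commit can be illustrated with a minimal sketch. The class below is hypothetical, not the project's actual Logger; the `debug` flag, the `out` method, and the `"d"` state code are assumptions made for illustration. It only shows how a single check inside the logger can replace a conditional at every call site.

```python
class Logger:
    """Hypothetical logger: the debug check lives here, not at call sites."""

    def __init__(self, debug=False):
        self.debug = debug
        self.lines = []  # collected output, kept in memory for illustration

    def out(self, message, state="i"):
        # Drop debug-state messages centrally when debug mode is off
        if state == "d" and not self.debug:
            return
        self.lines.append(f"[{state}] {message}")


logger = Logger(debug=False)
logger.out("starting up")                   # recorded
logger.out("raw plugin output", state="d")  # dropped by the logger itself
```

Callers then log debug messages unconditionally and leave the decision to the logger.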
0a93f526e0 Bump version to 0.9.86 2023-12-14 14:46:29 -05:00
709c9cb73e Pause pvchealthd startup until node daemon is run
If the health daemon starts too soon during a node bootup, it will
generate tons of erroneous faults while the node starts up.
Adds a conditional wait for the current node daemon to be in "run"
state before the health daemon really starts up.
2023-12-13 14:53:54 -05:00
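
A conditional wait of this kind might look roughly like the sketch below. The `wait_for_state` helper and its parameters are hypothetical; how the real daemon reads the node state is not shown in this log, so the state source is passed in as a callable.

```python
import time


def wait_for_state(get_state, target="run", interval=1.0, timeout=None):
    """Block until get_state() returns target; return False on timeout."""
    start = time.monotonic()
    while get_state() != target:
        if timeout is not None and time.monotonic() - start >= timeout:
            return False
        time.sleep(interval)
    return True


# Simulated node daemon that reaches "run" on the third poll
states = iter(["init", "init", "run"])
ready = wait_for_state(lambda: next(states), interval=0.01)
```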
9dc5097dbc Bump version to 0.9.85 2023-12-10 01:00:33 -05:00
9aee2a9075 Bump version to 0.9.84 2023-12-09 23:05:40 -05:00
b0557edb76 Ensure entry in name is uppercase 2023-12-09 17:01:41 -05:00
47bd7bf2f5 Only run cluster-wide health checks on primary
Avoids multiple coordinators trying to write updated cluster-wide fault
events. Instead, they are now only written by the primary (or the
incoming primary if still in a transition).
2023-12-09 16:50:51 -05:00
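
As a rough sketch of the gating described above: the function and argument names here are hypothetical, and the real code may track the transition differently. A coordinator runs cluster-wide checks only if it is the primary, or the incoming primary while a transition is in progress.

```python
def should_run_cluster_checks(this_node, primary_node, incoming_primary=None):
    """Return True only for the primary, or the incoming primary mid-transition."""
    # During a transition, the incoming primary takes over the writes
    if incoming_primary is not None:
        return this_node == incoming_primary
    return this_node == primary_node
```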
b9fbfe2ed5 Improve fault ID format
Instead of using random hex characters from an md5sum, use a nice name
in all-caps similar to how Ceph does. This further helps prevent dupes
but also permits a changing health delta within a single event (which
would really only ever apply to plugin faults).
2023-12-09 16:48:14 -05:00
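
The difference between the two ID styles can be sketched as follows. Both helpers are hypothetical and the example names are invented; the sketch only assumes that the older ID was a short md5 prefix of the message and that the newer one is an all-caps name.

```python
import hashlib
import re


def hashed_fault_id(message):
    """Older style: first 8 hex characters of an md5 digest of the message."""
    return hashlib.md5(message.encode("utf-8")).hexdigest()[:8]


def named_fault_id(name):
    """Newer style: a readable all-caps name, independent of the message."""
    return re.sub(r"[^A-Z0-9]+", "_", name.upper()).strip("_")


# The hashed ID changes whenever the message (e.g. a health delta) changes
a = hashed_fault_id("plugin 'psql' health delta 10")
b = hashed_fault_id("plugin 'psql' health delta 25")

# The named ID stays stable for the same fault
c = named_fault_id("plugin_psql_node1")
```

Because the named ID does not depend on the message text, the health delta can change within one event without creating a duplicate fault.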
7e6d922877 Improve fault detail handling further
Since we already had a "details" field, simply move where it gets added
to the message later, in generate_fault, after the main message value
was used to generate the ID.
2023-12-09 16:13:36 -05:00
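
The ordering this commit describes can be sketched like so. This is not the project's generate_fault; the function name, the return shape, and the ": " separator are assumptions. The point is only that the ID is computed from the base message before the details are appended.

```python
import hashlib


def generate_fault_sketch(message, details=None):
    """Hypothetical: derive the ID first, then append details to the message."""
    # ID is derived from the base message only
    fault_id = hashlib.md5(message.encode("utf-8")).hexdigest()[:8]
    # Details are appended afterwards, so they cannot change the ID
    if details:
        message = f"{message}: {details}"
    return {"id": fault_id, "message": message}


first = generate_fault_sketch("psql check failed", "timeout after 5s")
second = generate_fault_sketch("psql check failed", "connection refused")
```

Both calls yield the same ID with different full messages.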
82a7fd3c80 Add more debugging info to psql 2023-12-07 21:36:05 -05:00
ddd9d9ee07 Adjust psql check to avoid weird failures 2023-12-07 15:07:59 -05:00
9e2e749c55 Combine pvchealthd output into single log message 2023-12-07 14:00:43 -05:00
157b8c20bf Add Patroni output to debug logs 2023-12-07 14:00:35 -05:00
bf158dc2d9 Shorten debug output 2023-12-07 13:31:20 -05:00
1b84553405 Use passed coordinator state 2023-12-07 11:19:26 -05:00
60dac143f2 Use simpler health calculation 2023-12-07 11:17:31 -05:00
a13273335d Add colon to result text 2023-12-07 11:15:42 -05:00
e7f21b7058 Enhance and fix bugs in psql plugin
1. Check Patronictl statuses
2. Don't error during node primary transitions
2023-12-07 11:14:16 -05:00
9dbadfdd6e Move back to per-plugin fault reporting 2023-12-07 11:13:56 -05:00
5691f75ac9 Fix bad import 2023-12-06 14:28:32 -05:00
4a02c2c8e3 Add additional faults 2023-12-06 13:27:39 -05:00
79eb54d5da Move fault generation to common library 2023-12-06 13:17:10 -05:00
067e73337f Shorten health IDs to 8 characters 2023-12-04 15:48:27 -05:00
b59f743690 Improve logging and handling of fault entries 2023-12-01 17:38:28 -05:00
4c3f235e05 Avoid running fault updates in maintenance mode
When the cluster is in maintenance mode, all faults should be ignored.
2023-12-01 17:38:28 -05:00
9c2b1b29ee Add node health to fault states
Adjusts ordering and ensures that node health states are included in
faults if they are less than 50%.

Also adjusts fault ID generation and runs fault checks only on coordinator
nodes to avoid too many runs.
2023-12-01 17:38:28 -05:00
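
The 50% rule above amounts to a simple filter. In this sketch the helper name and the dictionary of per-node health percentages are hypothetical.

```python
def unhealthy_nodes(node_health, threshold=50):
    """Return only the nodes whose health percentage is below the threshold."""
    return {node: health for node, health in node_health.items() if health < threshold}


# Exactly 50% is not below the threshold, so only node2 is selected
flagged = unhealthy_nodes({"node1": 100, "node2": 45, "node3": 50})
```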
8594eb697f Add initial fault generation in pvchealthd
References: #164
2023-12-01 17:38:27 -05:00
988de1218f Bump version to 0.9.83 2023-12-01 17:37:42 -05:00
915a84ee3c Fix psql check for new configs 2023-12-01 03:58:21 -05:00
03a738f878 Move config parser into daemon_lib
And reformat/add config values for API.
2023-11-30 00:05:37 -05:00
97eb63ebab Clean up config naming and dead files 2023-11-29 21:21:51 -05:00
077dd8708f Add check start message 2023-11-29 21:21:51 -05:00
b6b5786c3b Output list in cyan (s state) 2023-11-29 21:21:51 -05:00
d2b764a2c7 Output more details on startup 2023-11-29 21:21:51 -05:00
7a7c975eff Ensure return from health shutdown 2023-11-29 21:21:51 -05:00
647cba3cf5 Expand startup width for new daemon name 2023-11-29 21:21:51 -05:00
921ecb3a05 Fix name in kydb plugin 2023-11-29 21:21:51 -05:00
41f4e4fb2f Split health monitoring into discrete daemon/pkg 2023-11-29 21:21:51 -05:00