Joshua Boniface
709c9cb73e
Pause pvchealthd startup until node daemon is run
...
If the health daemon starts too soon during a node bootup, it will
generate tons of erroneous faults while the node starts up.
Adds a conditional wait for the current node daemon to be in "run"
state before the health daemon really starts up.
2023-12-13 14:53:54 -05:00
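A minimal sketch of the conditional wait described in the commit above, not the actual pvchealthd code. The function name, the `get_state` callback, and the `interval`/`timeout` parameters are all assumptions made for illustration; only the "run" state name comes from the commit message.

```python
import time
from typing import Callable, Optional


def wait_for_daemon_state(
    get_state: Callable[[], str],
    target: str = "run",
    interval: float = 1.0,
    timeout: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block until get_state() returns `target`.

    get_state is a hypothetical callback that returns the node daemon's
    current state string. Returns True once the target state is seen, or
    False if `timeout` seconds of waiting elapse first (None = wait forever).
    """
    waited = 0.0
    while get_state() != target:
        if timeout is not None and waited >= timeout:
            return False
        sleep(interval)
        waited += interval
    return True
```

A caller would run this once at startup and begin the health checks only after it returns True, so that no faults are generated while the node daemon is still initializing.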
Joshua Boniface
9dc5097dbc
Bump version to 0.9.85
2023-12-10 01:00:33 -05:00
Joshua Boniface
9aee2a9075
Bump version to 0.9.84
2023-12-09 23:05:40 -05:00
Joshua Boniface
b0557edb76
Ensure entry in name is uppercase
2023-12-09 17:01:41 -05:00
Joshua Boniface
47bd7bf2f5
Only run cluster-wide health checks on primary
...
Avoids multiple coordinators trying to write updated cluster-wide fault
events. Instead, they are now only written by the primary (or the
incoming primary if still in a transition).
2023-12-09 16:50:51 -05:00
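A rough sketch of the gating described in the commit above. The state strings are assumptions: "primary" is taken from the commit message, while "takeover" is an assumed name for the incoming primary during a transition. The function name is hypothetical.

```python
# Assumed coordinator state names; "takeover" stands in for the
# "incoming primary" mentioned in the commit message.
PRIMARY_STATE = "primary"
INCOMING_PRIMARY_STATE = "takeover"


def should_run_cluster_checks(coordinator_state: str) -> bool:
    """Return True only on the node allowed to write cluster-wide faults."""
    return coordinator_state in (PRIMARY_STATE, INCOMING_PRIMARY_STATE)
```

With a check like this, every other coordinator skips the cluster-wide checks entirely, so only one node writes each cluster-wide fault event.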
Joshua Boniface
b9fbfe2ed5
Improve fault ID format
...
Instead of using random hex characters from an md5sum, use a nice name
in all-caps similar to how Ceph does. This further helps prevent dupes
but also permits a changing health delta within a single event (which
would really only ever apply to plugin faults).
2023-12-09 16:48:14 -05:00
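To illustrate the difference between the two ID styles mentioned in the commit above, here is a hedged sketch. Both function names and the exact naming scheme (name plus entry, joined with underscores) are assumptions; the commit only states that the ID moved from md5sum hex characters to an all-caps name in the style of Ceph health codes such as OSD_DOWN.

```python
import hashlib
import re


def hashed_fault_id(message: str, length: int = 8) -> str:
    """Old style (assumed): a short hex prefix of the message's md5sum."""
    return hashlib.md5(message.encode("utf-8")).hexdigest()[:length]


def named_fault_id(name: str, entry: str = "") -> str:
    """New style (assumed scheme): an all-caps, underscore-separated name.

    The ID depends only on the fault name and the affected entry, not on
    the message text, so a changing detail or health delta keeps the same ID.
    """
    parts = [p for p in (name, entry) if p]
    joined = "_".join(parts).upper()
    # Collapse any run of non-alphanumeric characters into one underscore.
    return re.sub(r"[^A-Z0-9]+", "_", joined).strip("_")
```

For example, `named_fault_id("node daemon dead", "hv1")` yields `NODE_DAEMON_DEAD_HV1`, whereas the hashed form changes whenever any character of the message changes.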
Joshua Boniface
7e6d922877
Improve fault detail handling further
...
Since we already had a "details" field, simply move where it gets added
to the message later, in generate_fault, after the main message value
was used to generate the ID.
2023-12-09 16:13:36 -05:00
Joshua Boniface
82a7fd3c80
Add more debugging info to psql
2023-12-07 21:36:05 -05:00
Joshua Boniface
ddd9d9ee07
Adjust psql check to avoid weird failures
2023-12-07 15:07:59 -05:00
Joshua Boniface
9e2e749c55
Combine pvchealthd output into single log message
2023-12-07 14:00:43 -05:00
Joshua Boniface
157b8c20bf
Add Patroni output to debug logs
2023-12-07 14:00:35 -05:00
Joshua Boniface
bf158dc2d9
Shorten debug output
2023-12-07 13:31:20 -05:00
Joshua Boniface
1b84553405
Use passed coordinator state
2023-12-07 11:19:26 -05:00
Joshua Boniface
60dac143f2
Use simpler health calculation
2023-12-07 11:17:31 -05:00
Joshua Boniface
a13273335d
Add colon to result text
2023-12-07 11:15:42 -05:00
Joshua Boniface
e7f21b7058
Enhance and fix bugs in psql plugin
...
1. Check Patronictl statuses
2. Don't error during node primary transitions
2023-12-07 11:14:16 -05:00
Joshua Boniface
9dbadfdd6e
Move back to per-plugin fault reporting
2023-12-07 11:13:56 -05:00
Joshua Boniface
5691f75ac9
Fix bad import
2023-12-06 14:28:32 -05:00
Joshua Boniface
4a02c2c8e3
Add additional faults
2023-12-06 13:27:39 -05:00
Joshua Boniface
79eb54d5da
Move fault generation to common library
2023-12-06 13:17:10 -05:00
Joshua Boniface
067e73337f
Shorten health IDs to 8 characters
2023-12-04 15:48:27 -05:00
Joshua Boniface
b59f743690
Improve logging and handling of fault entries
2023-12-01 17:38:28 -05:00
Joshua Boniface
4c3f235e05
Avoid running fault updates in maintenance mode
...
When the cluster is in maintenance mode, all faults should be ignored.
2023-12-01 17:38:28 -05:00
Joshua Boniface
9c2b1b29ee
Add node health to fault states
...
Adjusts ordering and ensures that node health states are included in
faults if they are less than 50%.
Also adjusts fault ID generation and runs fault checks only on coordinator
nodes to avoid too many runs.
2023-12-01 17:38:28 -05:00
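A small sketch of the threshold rule from the commit above, assuming node health is available as a mapping of node name to an integer percentage. The function name and data shape are hypothetical; only the 50% cutoff comes from the commit message.

```python
from typing import Dict, List


def low_health_nodes(node_health: Dict[str, int], threshold: int = 50) -> List[str]:
    """Return the names of nodes whose health is strictly below threshold."""
    return sorted(
        name for name, health in node_health.items() if health < threshold
    )
```

Each returned node would then produce one fault entry; a node at exactly 50% is not included, matching the "less than 50%" wording.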
Joshua Boniface
8594eb697f
Add initial fault generation in pvchealthd
...
References: #164
2023-12-01 17:38:27 -05:00
Joshua Boniface
988de1218f
Bump version to 0.9.83
2023-12-01 17:37:42 -05:00
Joshua Boniface
915a84ee3c
Fix psql check for new configs
2023-12-01 03:58:21 -05:00
Joshua Boniface
03a738f878
Move config parser into daemon_lib
...
And reformat/add config values for API.
2023-11-30 00:05:37 -05:00
Joshua Boniface
97eb63ebab
Clean up config naming and dead files
2023-11-29 21:21:51 -05:00
Joshua Boniface
077dd8708f
Add check start message
2023-11-29 21:21:51 -05:00
Joshua Boniface
b6b5786c3b
Output list in cyan (s state)
2023-11-29 21:21:51 -05:00
Joshua Boniface
d2b764a2c7
Output more details on startup
2023-11-29 21:21:51 -05:00
Joshua Boniface
7a7c975eff
Ensure return from health shutdown
2023-11-29 21:21:51 -05:00
Joshua Boniface
647cba3cf5
Expand startup width for new daemon name
2023-11-29 21:21:51 -05:00
Joshua Boniface
921ecb3a05
Fix name in kydb plugin
2023-11-29 21:21:51 -05:00
Joshua Boniface
41f4e4fb2f
Split health monitoring into discrete daemon/pkg
2023-11-29 21:21:51 -05:00