24 Commits

Author SHA1 Message Date
Joshua Boniface e654fbba08 Move debug condition handling to Logger
Avoids many dozens of conditionals sprinkled throughout the code by
centralizing this check into the main Logger instance.
2023-12-27 13:01:45 -05:00
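A minimal sketch of the idea in this commit, not the project's actual code: the debug check lives inside the logger, so callers log unconditionally. The `Logger` class, the `out` method, and the `"d"` state marker here are hypothetical names chosen for illustration.

```python
class Logger:
    """Hypothetical logger that owns the debug check."""

    def __init__(self, debug=False):
        self.debug = debug
        self.lines = []  # emitted lines, kept so the behaviour can be inspected

    def out(self, message, state="i"):
        # Debug-state messages are dropped here unless debug mode is on,
        # so call sites no longer need their own "if debug:" guard.
        if state == "d" and not self.debug:
            return False
        line = f"[{state}] {message}"
        self.lines.append(line)
        print(line)
        return True


logger = Logger(debug=False)
logger.out("Starting health check")           # always emitted
logger.out("Raw plugin result: ok", state="d")  # silently skipped
```

With this shape, turning debug output on or off is a single constructor argument rather than a condition repeated at each call site.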
Joshua Boniface b0557edb76 Ensure entry in name is uppercase 2023-12-09 17:01:41 -05:00
Joshua Boniface 47bd7bf2f5 Only run cluster-wide health checks on primary
Avoids multiple coordinators trying to write updated cluster-wide fault
events. Instead, they are now only written by the primary (or the
incoming primary if still in a transition).
2023-12-09 16:50:51 -05:00
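As a rough illustration of the gating this commit describes, the check below lets only the primary, or a node that is in the middle of becoming primary, write cluster-wide fault events. The function name and the state strings (`"primary"`, `"takeover"`, `"secondary"`) are assumptions made for this sketch, not taken from the source.

```python
# Hypothetical coordinator states allowed to write cluster-wide faults.
# "takeover" stands in for "incoming primary, still in transition".
CLUSTER_CHECK_STATES = ("primary", "takeover")


def should_run_cluster_checks(coordinator_state):
    """Return True only for the primary or the incoming primary."""
    return coordinator_state in CLUSTER_CHECK_STATES


for state in ("primary", "takeover", "secondary"):
    print(state, should_run_cluster_checks(state))
```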
Joshua Boniface b9fbfe2ed5 Improve fault ID format
Instead of using random hex characters from an md5sum, use a nice name
in all-caps similar to how Ceph does. This further helps prevent dupes
but also permits a changing health delta within a single event (which
would really only ever apply to plugin faults).
2023-12-09 16:48:14 -05:00
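The naming scheme could look roughly like the sketch below: the ID is built from descriptive name parts in all-caps rather than from a hash of the message, so the same fault keeps the same ID even when its health delta changes. The `make_fault_id` helper and the example name parts are hypothetical.

```python
import re


def make_fault_id(*parts):
    """Join name parts into an all-caps, underscore-separated ID."""
    joined = "_".join(str(part) for part in parts)
    # Uppercase first, then collapse anything that is not A-Z or 0-9
    # into a single underscore and trim stray underscores at the ends.
    return re.sub(r"[^A-Z0-9]+", "_", joined.upper()).strip("_")


# The health delta is deliberately not part of the ID, so it can change
# between runs without creating a second fault entry.
fault = {"id": make_fault_id("plugin", "disk-usage"), "health_delta": 10}
print(fault["id"])
```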
Joshua Boniface 7e6d922877 Improve fault detail handling further
Since we already had a "details" field, simply move where it gets added
to the message later, in generate_fault, after the main message value
was used to generate the ID.
2023-12-09 16:13:36 -05:00
Joshua Boniface 9e2e749c55 Combine pvchealthd output into single log message 2023-12-07 14:00:43 -05:00
Joshua Boniface bf158dc2d9 Shorten debug output 2023-12-07 13:31:20 -05:00
Joshua Boniface 60dac143f2 Use simpler health calculation 2023-12-07 11:17:31 -05:00
Joshua Boniface a13273335d Add colon to result text 2023-12-07 11:15:42 -05:00
Joshua Boniface 9dbadfdd6e Move back to per-plugin fault reporting 2023-12-07 11:13:56 -05:00
Joshua Boniface 5691f75ac9 Fix bad import 2023-12-06 14:28:32 -05:00
Joshua Boniface 4a02c2c8e3 Add additional faults 2023-12-06 13:27:39 -05:00
Joshua Boniface 79eb54d5da Move fault generation to common library 2023-12-06 13:17:10 -05:00
Joshua Boniface 067e73337f Shorten health IDs to 8 characters 2023-12-04 15:48:27 -05:00
Joshua Boniface b59f743690 Improve logging and handling of fault entries 2023-12-01 17:38:28 -05:00
Joshua Boniface 4c3f235e05 Avoid running fault updates in maintenance mode
When the cluster is in maintenance mode, all faults should be ignored.
2023-12-01 17:38:28 -05:00
Joshua Boniface 9c2b1b29ee Add node health to fault states
Adjusts ordering and ensures that node health states are included in
faults if they are less than 50%.

Also adjusts fault ID generation and runs fault checks only on coordinator
nodes to avoid too many runs.
2023-12-01 17:38:28 -05:00
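A small sketch of the threshold rule mentioned in this commit, assuming node health is available as a percentage per node: only nodes below 50% produce a fault. The function name, the dictionary layout, and the node names are illustrative assumptions.

```python
def nodes_needing_fault(node_health, threshold=50):
    """Return the names of nodes whose health is below the threshold."""
    return sorted(
        name for name, health in node_health.items() if health < threshold
    )


# Hypothetical health percentages for three nodes
health = {"hv1": 100, "hv2": 45, "hv3": 50}
print(nodes_needing_fault(health))
```

A node at exactly 50% is not reported here, matching the "less than 50%" wording of the commit message.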
Joshua Boniface 8594eb697f Add initial fault generation in pvchealthd
References: #164
2023-12-01 17:38:27 -05:00
Joshua Boniface 97eb63ebab Clean up config naming and dead files 2023-11-29 21:21:51 -05:00
Joshua Boniface 077dd8708f Add check start message 2023-11-29 21:21:51 -05:00
Joshua Boniface b6b5786c3b Output list in cyan (s state) 2023-11-29 21:21:51 -05:00
Joshua Boniface d2b764a2c7 Output more details on startup 2023-11-29 21:21:51 -05:00
Joshua Boniface 7a7c975eff Ensure return from health shutdown 2023-11-29 21:21:51 -05:00
Joshua Boniface 41f4e4fb2f Split health monitoring into discrete daemon/pkg 2023-11-29 21:21:51 -05:00