Commit Graph

55 Commits

Author SHA1 Message Date
Joshua Boniface 8cb44c0c5d Bump version to 0.9.100 2024-08-30 11:03:33 -04:00
Joshua Boniface 02a775c99b Bump version to 0.9.99 2024-08-28 11:15:55 -04:00
Joshua Boniface 9aca8e215b Run IPMI check 3 times with 2s timeout
Avoids potential timeouts or deadlocks, and retries if a single try
fails.
2024-07-28 12:36:01 -04:00
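
The retry behaviour described in this commit can be sketched roughly as below. This is an illustration only, not the code from the commit: the helper name `run_with_retries` and its parameters are hypothetical, and a harmless Python subprocess stands in for the real IPMI command.

```python
import subprocess
import sys


def run_with_retries(command, attempts=3, timeout=2):
    """Run `command` up to `attempts` times, each limited to `timeout` seconds.

    Hypothetical helper: returns True on the first attempt that exits 0,
    or False if every attempt fails or times out.
    """
    for _ in range(attempts):
        try:
            result = subprocess.run(command, capture_output=True, timeout=timeout)
        except (subprocess.TimeoutExpired, OSError):
            # A hung or unrunnable command counts as one failed attempt
            continue
        if result.returncode == 0:
            return True
    return False


# Stand-in command; a real check would invoke the IPMI tooling instead
ok = run_with_retries([sys.executable, "-c", "pass"])
```

Bounding each attempt with a timeout keeps a hung check from blocking the caller, while the retries absorb a single transient failure.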
Joshua Boniface 1aa5999109 Bump version to 0.9.98 2024-06-05 12:01:31 -04:00
Joshua Boniface f1fe0c63f5 Bump version to 0.9.97 2024-04-19 10:32:16 -04:00
Joshua Boniface 78c774b607 Bump version to 0.9.96 2024-03-08 14:23:07 -05:00
Joshua Boniface dee8d186cf Bump version to 0.9.95 2024-02-12 13:12:48 -05:00
Joshua Boniface d63cc2e661 Bump version to 0.9.94 2024-02-06 13:31:50 -05:00
Joshua Boniface 18f09196be Bump version to 0.9.93 2024-01-30 09:51:21 -05:00
Joshua Boniface df40b779af Bump version to 0.9.92 2024-01-29 09:39:10 -05:00
Joshua Boniface f29b4c2755 Bump version to 0.9.91 2024-01-23 10:40:59 -05:00
Joshua Boniface 86ca363697 Bump version to 0.9.90 2024-01-11 10:22:48 -05:00
Joshua Boniface e9b6072fa0 Bump version to 0.9.89 2024-01-09 12:15:53 -05:00
Joshua Boniface 1d480f5629 Bump version to 0.9.88 2023-12-29 14:56:33 -05:00
Joshua Boniface 123c7ce857 Update copyright header on all files for 2024
Last release of 2023 is probably the best time to do this.
2023-12-29 11:16:59 -05:00
Joshua Boniface 8083b7a3e6 Bump version to 0.9.87 2023-12-27 13:40:51 -05:00
Joshua Boniface 572596c575 Fix missing f-string placeholder 2023-12-27 13:21:20 -05:00
Joshua Boniface e654fbba08 Move debug condition handling to Logger
Avoids many dozens of conditionals sprinkled throughout the code by
centralizing this check into the main Logger instance.
2023-12-27 13:01:45 -05:00
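
A minimal sketch of the idea, assuming a simplified logger: the class below, its `out` method, and the `"d"` state marker are hypothetical stand-ins rather than the project's actual Logger interface.

```python
class Logger:
    """Simplified stand-in for a daemon logger (hypothetical interface)."""

    def __init__(self, debug=False):
        self.debug = debug
        self.lines = []  # collected output, kept in memory for illustration

    def out(self, message, state="i"):
        # State "d" marks a debug message; drop it here unless debug is on,
        # so callers never need their own "if debug:" conditional
        if state == "d" and not self.debug:
            return
        self.lines.append(f"[{state}] {message}")


quiet = Logger(debug=False)
quiet.out("Starting checks")
quiet.out("Raw plugin output", state="d")  # silently dropped

verbose = Logger(debug=True)
verbose.out("Raw plugin output", state="d")  # kept
```

With the check inside the logger, call sites simply log with the debug state and the decision to emit is made in one place.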
Joshua Boniface 0a93f526e0 Bump version to 0.9.86 2023-12-14 14:46:29 -05:00
Joshua Boniface 709c9cb73e Pause pvchealthd startup until node daemon is run
If the health daemon starts too soon during a node bootup, it will
generate tons of erroneous faults while the node starts up.
Adds a conditional wait for the current node daemon to be in "run"
state before the health daemon really starts up.
2023-12-13 14:53:54 -05:00
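
The conditional wait could look something like the following sketch. The function `wait_for_state`, the `get_state` callable, and the `max_checks` limit are assumptions made for illustration; how the real daemon reads the node daemon state is not shown in this log.

```python
import time


def wait_for_state(get_state, wanted="run", interval=1.0, max_checks=None):
    """Poll `get_state()` until it returns `wanted`.

    Hypothetical helper: `get_state` is any callable returning the current
    node daemon state as a string. Returns True once the wanted state is
    seen, or False if `max_checks` polls pass without seeing it.
    """
    checks = 0
    while get_state() != wanted:
        checks += 1
        if max_checks is not None and checks >= max_checks:
            return False
        time.sleep(interval)
    return True


# Simulated state source: the node reports "init" twice, then "run"
states = iter(["init", "init", "run"])
ready = wait_for_state(lambda: next(states), interval=0)
```

Health checks would only begin once this returns True, so faults caused purely by the node still booting are never generated.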
Joshua Boniface 9dc5097dbc Bump version to 0.9.85 2023-12-10 01:00:33 -05:00
Joshua Boniface 9aee2a9075 Bump version to 0.9.84 2023-12-09 23:05:40 -05:00
Joshua Boniface b0557edb76 Ensure entry in name is uppercase 2023-12-09 17:01:41 -05:00
Joshua Boniface 47bd7bf2f5 Only run cluster-wide health checks on primary
Avoids multiple coordinators trying to write updated cluster-wide fault
events. Instead, they are now only written by the primary (or the
incoming primary if still in a transition).
2023-12-09 16:50:51 -05:00
Joshua Boniface b9fbfe2ed5 Improve fault ID format
Instead of using random hex characters from an md5sum, use a nice name
in all-caps similar to how Ceph does. This further helps prevent dupes
but also permits a changing health delta within a single event (which
would really only ever apply to plugin faults).
2023-12-09 16:48:14 -05:00
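
To illustrate the difference between the two ID styles, here is a rough sketch. Both function names and the exact naming scheme (underscore-joined, all-caps) are assumptions; the commit does not spell out the real format.

```python
import hashlib
import re


def md5_fault_id(message, length=8):
    """Old style (sketch): a short hex ID derived from the message text."""
    return hashlib.md5(message.encode()).hexdigest()[:length]


def named_fault_id(name, entry=None):
    """New style (sketch): a readable all-caps ID built from a fault name
    and an optional entry such as a node or plugin name."""
    parts = [name] if entry is None else [name, entry]
    cleaned = (re.sub(r"[^A-Za-z0-9]+", "_", p).strip("_").upper() for p in parts)
    return "_".join(cleaned)


# A hash-based ID changes whenever the message text (e.g. a health delta) changes
old_a = md5_fault_id("plugin fault with health delta 10")
old_b = md5_fault_id("plugin fault with health delta 25")

# A name-based ID stays the same, so one event can carry a changing delta
new_id = named_fault_id("node health", "hv1")
```

Because the name-based ID does not depend on the message body, repeated reports of the same condition map onto a single fault entry.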
Joshua Boniface 7e6d922877 Improve fault detail handling further
Since we already had a "details" field, simply move where it gets added
to the message later, in generate_fault, after the main message value
was used to generate the ID.
2023-12-09 16:13:36 -05:00
Joshua Boniface 82a7fd3c80 Add more debugging info to psql 2023-12-07 21:36:05 -05:00
Joshua Boniface ddd9d9ee07 Adjust psql check to avoid weird failures 2023-12-07 15:07:59 -05:00
Joshua Boniface 9e2e749c55 Combine pvchealthd output into single log message 2023-12-07 14:00:43 -05:00
Joshua Boniface 157b8c20bf Add Patroni output to debug logs 2023-12-07 14:00:35 -05:00
Joshua Boniface bf158dc2d9 Shorten debug output 2023-12-07 13:31:20 -05:00
Joshua Boniface 1b84553405 Use passed coordinator state 2023-12-07 11:19:26 -05:00
Joshua Boniface 60dac143f2 Use simpler health calculation 2023-12-07 11:17:31 -05:00
Joshua Boniface a13273335d Add colon to result text 2023-12-07 11:15:42 -05:00
Joshua Boniface e7f21b7058 Enhance and fix bugs in psql plugin
1. Check Patronictl statuses
2. Don't error during node primary transitions
2023-12-07 11:14:16 -05:00
Joshua Boniface 9dbadfdd6e Move back to per-plugin fault reporting 2023-12-07 11:13:56 -05:00
Joshua Boniface 5691f75ac9 Fix bad import 2023-12-06 14:28:32 -05:00
Joshua Boniface 4a02c2c8e3 Add additional faults 2023-12-06 13:27:39 -05:00
Joshua Boniface 79eb54d5da Move fault generation to common library 2023-12-06 13:17:10 -05:00
Joshua Boniface 067e73337f Shorten health IDs to 8 characters 2023-12-04 15:48:27 -05:00
Joshua Boniface b59f743690 Improve logging and handling of fault entries 2023-12-01 17:38:28 -05:00
Joshua Boniface 4c3f235e05 Avoid running fault updates in maintenance mode
When the cluster is in maintenance mode, all faults should be ignored.
2023-12-01 17:38:28 -05:00
Joshua Boniface 9c2b1b29ee Add node health to fault states
Adjusts ordering and ensures that node health states are included in
faults if they are less than 50%.

Also adjusts fault ID generation and runs fault checks only on coordinator
nodes to avoid too many runs.
2023-12-01 17:38:28 -05:00
Joshua Boniface 8594eb697f Add initial fault generation in pvchealthd
References: #164
2023-12-01 17:38:27 -05:00
Joshua Boniface 988de1218f Bump version to 0.9.83 2023-12-01 17:37:42 -05:00
Joshua Boniface 915a84ee3c Fix psql check for new configs 2023-12-01 03:58:21 -05:00
Joshua Boniface 03a738f878 Move config parser into daemon_lib
And reformat/add config values for API.
2023-11-30 00:05:37 -05:00
Joshua Boniface 97eb63ebab Clean up config naming and dead files 2023-11-29 21:21:51 -05:00
Joshua Boniface 077dd8708f Add check start message 2023-11-29 21:21:51 -05:00
Joshua Boniface b6b5786c3b Output list in cyan (s state) 2023-11-29 21:21:51 -05:00