Commit Graph

3305 Commits

Author SHA1 Message Date
Joshua Boniface 9e2e749c55 Combine pvchealthd output into single log message 2023-12-07 14:00:43 -05:00
Joshua Boniface 157b8c20bf Add Patroni output to debug logs 2023-12-07 14:00:35 -05:00
Joshua Boniface bf158dc2d9 Shorten debug output 2023-12-07 13:31:20 -05:00
Joshua Boniface 1b84553405 Use passed coordinator state 2023-12-07 11:19:26 -05:00
Joshua Boniface 60dac143f2 Use simpler health calculation 2023-12-07 11:17:31 -05:00
Joshua Boniface a13273335d Add colon to result text 2023-12-07 11:15:42 -05:00
Joshua Boniface e7f21b7058 Enhance and fix bugs in psql plugin
1. Check Patronictl statuses
2. Don't error during node primary transitions
2023-12-07 11:14:16 -05:00
Joshua Boniface 9dbadfdd6e Move back to per-plugin fault reporting 2023-12-07 11:13:56 -05:00
Joshua Boniface 61b39d0739 Fix incorrect cluster health calculation 2023-12-07 11:13:36 -05:00
Joshua Boniface 4bf80a5913 Fix missing datetime shrink 2023-12-06 17:15:36 -05:00
Joshua Boniface 6c0dfe16cf Improve word splitting for fault messages
This ensures that fault messages are split on word boundaries and that
the column length is equal to the longest of these if applicable.
2023-12-06 17:10:19 -05:00
Joshua Boniface 3fde494fc5 Add status back to short fault list 2023-12-06 16:53:23 -05:00
Joshua Boniface 0945b3faf3 Use same fault formatting for short and long 2023-12-06 16:19:44 -05:00
Joshua Boniface 1416f9edc0 Remove bad sort values 2023-12-06 14:38:29 -05:00
Joshua Boniface 5691f75ac9 Fix bad import 2023-12-06 14:28:32 -05:00
Joshua Boniface e0bf7f7d1a Fix bad ID values in acknowledge 2023-12-06 14:18:31 -05:00
Joshua Boniface 0c34c88a1f Fix bad dict key name 2023-12-06 14:16:19 -05:00
Joshua Boniface 20acf3295f Add mass ack/delete of faults 2023-12-06 13:59:39 -05:00
Joshua Boniface 4a02c2c8e3 Add additional faults 2023-12-06 13:27:39 -05:00
Joshua Boniface 6fc5c927a1 Properly sort status faults 2023-12-06 13:27:18 -05:00
Joshua Boniface d1e34e7333 Store fault times only to the second
Any more precision is unnecessary and saves 6 chars when displaying
these times elsewhere.
2023-12-06 13:20:18 -05:00
Joshua Boniface 79eb54d5da Move fault generation to common library 2023-12-06 13:17:10 -05:00
Joshua Boniface 536fb2080f Fix get_terminal_size over SSH 2023-12-06 13:11:28 -05:00
Joshua Boniface 2267a9c85d Improve output formatting for simplicity 2023-12-05 10:37:35 -05:00
Joshua Boniface 067e73337f Shorten health IDs to 8 characters 2023-12-04 15:48:27 -05:00
Joshua Boniface 672e58133f Implement interfaces to faults 2023-12-04 01:37:54 -05:00
Joshua Boniface b59f743690 Improve logging and handling of fault entries 2023-12-01 17:38:28 -05:00
Joshua Boniface 4c3f235e05 Avoid running fault updates in maintenance mode
When the cluster is in maintenance mode, all faults should be ignored.
2023-12-01 17:38:28 -05:00
Joshua Boniface 3dc48c1783 Lower default monitoring interval to 15s
Faults are also reported on the monitoring interval, so 60s seems like
too long. Lower this to 15 seconds by default instead.
2023-12-01 17:38:28 -05:00
Joshua Boniface 9c2b1b29ee Add node health to fault states
Adjusts ordering and ensures that node health states are included in
faults if they are less than 50%.

Also adjusts fault ID generation and runs fault checks only coordinator
nodes to avoid too many runs.
2023-12-01 17:38:28 -05:00
Joshua Boniface 8594eb697f Add initial fault generation in pvchealthd
References: #164
2023-12-01 17:38:27 -05:00
Joshua Boniface 988de1218f Bump version to 0.9.83 2023-12-01 17:37:42 -05:00
Joshua Boniface 0ffcbf3152 Fix bad file paths 2023-12-01 17:25:12 -05:00
Joshua Boniface ad8d8cf7a7 Avoid removing changelog file until the end
Avoids losing a changelog if something else fails.
2023-12-01 17:23:43 -05:00
Joshua Boniface 915a84ee3c Fix psql check for new configs 2023-12-01 03:58:21 -05:00
Joshua Boniface 6315a068d1 Use SafeLoader for config load 2023-12-01 02:01:24 -05:00
Joshua Boniface 2afd064445 Update CLI to read from pvc.conf 2023-12-01 01:53:33 -05:00
Joshua Boniface 7cb9ebae6b Remove legacy configuration handler
This is not going to be needed.
2023-12-01 01:25:40 -05:00
Joshua Boniface 1fb0463dea Adjust daemon service startup
Add healthd, adjust workerd, lower waittime
2023-11-30 03:28:02 -05:00
Joshua Boniface 13549fc995 Depend pvcnoded on pvcworkerd 2023-11-30 03:24:01 -05:00
Joshua Boniface 102c3c3106 Port all Celery worker functions to discrete pkg
Moves all tasks run by the Celery worker into a discrete package/module
for easier installation. Also adjusts several parameters throughout to
accomplish this.
2023-11-30 02:24:54 -05:00
Joshua Boniface 0c0fb65c62 Rework Flask API to route Celery tasks manually
Avoids needing to define any of these tasks here; they can all be
defined in the pvcworkerd code.
2023-11-30 00:40:09 -05:00
Joshua Boniface 03a738f878 Move config parser into daemon_lib
And reformat/add config values for API.
2023-11-30 00:05:37 -05:00
Joshua Boniface 4df5fdbca6 Update description of example conf 2023-11-29 21:21:51 -05:00
Joshua Boniface 97eb63ebab Clean up config naming and dead files 2023-11-29 21:21:51 -05:00
Joshua Boniface 4a2eba0961 Improve node output messages (from pvchealthd)
1. Output startup "list" entries in cyan with s state
2. Add start of keepalive run message
2023-11-29 21:21:51 -05:00
Joshua Boniface 077dd8708f Add check start message 2023-11-29 21:21:51 -05:00
Joshua Boniface b6b5786c3b Output list in cyan (s state) 2023-11-29 21:21:51 -05:00
Joshua Boniface ad738dec40 Clean up plugin pycache too 2023-11-29 21:21:51 -05:00
Joshua Boniface d2b764a2c7 Output more details on startup 2023-11-29 21:21:51 -05:00