Commit Graph

3295 Commits

Author SHA1 Message Date
Joshua Boniface 6c0dfe16cf Improve word splitting for fault messages
This ensures that fault messages are split on word boundaries and that
the column length is equal to the longest of these if applicable.
2023-12-06 17:10:19 -05:00
Joshua Boniface 3fde494fc5 Add status back to short fault list 2023-12-06 16:53:23 -05:00
Joshua Boniface 0945b3faf3 Use same fault formatting for short and long 2023-12-06 16:19:44 -05:00
Joshua Boniface 1416f9edc0 Remove bad sort values 2023-12-06 14:38:29 -05:00
Joshua Boniface 5691f75ac9 Fix bad import 2023-12-06 14:28:32 -05:00
Joshua Boniface e0bf7f7d1a Fix bad ID values in acknowledge 2023-12-06 14:18:31 -05:00
Joshua Boniface 0c34c88a1f Fix bad dict key name 2023-12-06 14:16:19 -05:00
Joshua Boniface 20acf3295f Add mass ack/delete of faults 2023-12-06 13:59:39 -05:00
Joshua Boniface 4a02c2c8e3 Add additional faults 2023-12-06 13:27:39 -05:00
Joshua Boniface 6fc5c927a1 Properly sort status faults 2023-12-06 13:27:18 -05:00
Joshua Boniface d1e34e7333 Store fault times only to the second
Any more precision is unnecessary and saves 6 chars when displaying
these times elsewhere.
2023-12-06 13:20:18 -05:00
Joshua Boniface 79eb54d5da Move fault generation to common library 2023-12-06 13:17:10 -05:00
Joshua Boniface 536fb2080f Fix get_terminal_size over SSH 2023-12-06 13:11:28 -05:00
Joshua Boniface 2267a9c85d Improve output formatting for simplicity 2023-12-05 10:37:35 -05:00
Joshua Boniface 067e73337f Shorten health IDs to 8 characters 2023-12-04 15:48:27 -05:00
Joshua Boniface 672e58133f Implement interfaces to faults 2023-12-04 01:37:54 -05:00
Joshua Boniface b59f743690 Improve logging and handling of fault entries 2023-12-01 17:38:28 -05:00
Joshua Boniface 4c3f235e05 Avoid running fault updates in maintenance mode
When the cluster is in maintenance mode, all faults should be ignored.
2023-12-01 17:38:28 -05:00
Joshua Boniface 3dc48c1783 Lower default monitoring interval to 15s
Faults are also reported on the monitoring interval, so 60s seems like
too long. Lower this to 15 seconds by default instead.
2023-12-01 17:38:28 -05:00
Joshua Boniface 9c2b1b29ee Add node health to fault states
Adjusts ordering and ensures that node health states are included in
faults if they are less than 50%.

Also adjusts fault ID generation and runs fault checks only coordinator
nodes to avoid too many runs.
2023-12-01 17:38:28 -05:00
Joshua Boniface 8594eb697f Add initial fault generation in pvchealthd
References: #164
2023-12-01 17:38:27 -05:00
Joshua Boniface 988de1218f Bump version to 0.9.83 2023-12-01 17:37:42 -05:00
Joshua Boniface 0ffcbf3152 Fix bad file paths 2023-12-01 17:25:12 -05:00
Joshua Boniface ad8d8cf7a7 Avoid removing changelog file until the end
Avoids losing a changelog if something else fails.
2023-12-01 17:23:43 -05:00
Joshua Boniface 915a84ee3c Fix psql check for new configs 2023-12-01 03:58:21 -05:00
Joshua Boniface 6315a068d1 Use SafeLoader for config load 2023-12-01 02:01:24 -05:00
Joshua Boniface 2afd064445 Update CLI to read from pvc.conf 2023-12-01 01:53:33 -05:00
Joshua Boniface 7cb9ebae6b Remove legacy configuration handler
This is not going to be needed.
2023-12-01 01:25:40 -05:00
Joshua Boniface 1fb0463dea Adjust daemon service startup
Add healthd, adjust workerd, lower waittime
2023-11-30 03:28:02 -05:00
Joshua Boniface 13549fc995 Depend pvcnoded on pvcworkerd 2023-11-30 03:24:01 -05:00
Joshua Boniface 102c3c3106 Port all Celery worker functions to discrete pkg
Moves all tasks run by the Celery worker into a discrete package/module
for easier installation. Also adjusts several parameters throughout to
accomplish this.
2023-11-30 02:24:54 -05:00
Joshua Boniface 0c0fb65c62 Rework Flask API to route Celery tasks manually
Avoids needing to define any of these tasks here; they can all be
defined in the pvcworkerd code.
2023-11-30 00:40:09 -05:00
Joshua Boniface 03a738f878 Move config parser into daemon_lib
And reformat/add config values for API.
2023-11-30 00:05:37 -05:00
Joshua Boniface 4df5fdbca6 Update description of example conf 2023-11-29 21:21:51 -05:00
Joshua Boniface 97eb63ebab Clean up config naming and dead files 2023-11-29 21:21:51 -05:00
Joshua Boniface 4a2eba0961 Improve node output messages (from pvchealthd)
1. Output startup "list" entries in cyan with s state
2. Add start of keepalive run message
2023-11-29 21:21:51 -05:00
Joshua Boniface 077dd8708f Add check start message 2023-11-29 21:21:51 -05:00
Joshua Boniface b6b5786c3b Output list in cyan (s state) 2023-11-29 21:21:51 -05:00
Joshua Boniface ad738dec40 Clean up plugin pycache too 2023-11-29 21:21:51 -05:00
Joshua Boniface d2b764a2c7 Output more details on startup 2023-11-29 21:21:51 -05:00
Joshua Boniface b8aecd9c83 Wait less time between restarts 2023-11-29 21:21:51 -05:00
Joshua Boniface 11db3c5b20 Fix ordering during termination 2023-11-29 21:21:51 -05:00
Joshua Boniface 7a7c975eff Ensure return from health shutdown 2023-11-29 21:21:51 -05:00
Joshua Boniface fa12a3c9b1 Permit buffered log appending 2023-11-29 21:21:51 -05:00
Joshua Boniface 787f4216b3 Expand Zookeeper log daemon prefix to match 2023-11-29 21:21:51 -05:00
Joshua Boniface 647cba3cf5 Expand startup width for new daemon name 2023-11-29 21:21:51 -05:00
Joshua Boniface 921ecb3a05 Fix name in kydb plugin 2023-11-29 21:21:51 -05:00
Joshua Boniface 6a68cf665b Wait between service restarts 2023-11-29 21:21:51 -05:00
Joshua Boniface 41f4e4fb2f Split health monitoring into discrete daemon/pkg 2023-11-29 21:21:51 -05:00
Joshua Boniface 74a416165d Move default autobackup config to pvc.conf 2023-11-29 21:21:37 -05:00