Commit Graph

815 Commits

Author SHA1 Message Date
Joshua Boniface 38e43b46c3 Update health detail messages format 2023-12-13 03:17:47 -05:00
Joshua Boniface 0f24184b78 Explicitly clear resources of fenced node
This actually solves the bug originally "fixed" in
5f1432ccdd without breaking VM resource
allocations for working nodes.
2023-12-11 12:14:56 -05:00
Joshua Boniface 1ba37fe33d Restore VM resource allocation location
Commit 5f1432ccdd changed where these
happen due to a bug after fencing. However this completely broke node
resource reporting as only the final instance will be queried here.

Revert this change and look further into the original bug.
2023-12-11 11:52:59 -05:00
Joshua Boniface 1a05077b10 Fix missing fstring 2023-12-11 11:29:49 -05:00
Joshua Boniface 9617660342 Update Prometheus Grafana dashboard 2023-12-11 00:23:08 -05:00
Joshua Boniface 9dc5097dbc Bump version to 0.9.85 2023-12-10 01:00:33 -05:00
Joshua Boniface 53d632f283 Fix bug in example PVC Grafana dashboard 2023-12-10 00:50:05 -05:00
Joshua Boniface 7bc0760b78 Add time to "starting keepalive" message
Matches the pvchealthd output and provides a useful message detail to
this otherwise contextless message.
2023-12-10 00:40:32 -05:00
Joshua Boniface 9aee2a9075 Bump version to 0.9.84 2023-12-09 23:05:40 -05:00
Joshua Boniface 1f6347d24b Add Prometheus monitoring examples 2023-12-09 17:42:51 -05:00
Joshua Boniface 988de1218f Bump version to 0.9.83 2023-12-01 17:37:42 -05:00
Joshua Boniface 1fb0463dea Adjust daemon service startup
Add healthd, adjust workerd, lower waittime
2023-11-30 03:28:02 -05:00
Joshua Boniface 03a738f878 Move config parser into daemon_lib
And reformat/add config values for API.
2023-11-30 00:05:37 -05:00
Joshua Boniface 4a2eba0961 Improve node output messages (from pvchealthd)
1. Output startup "list" entries in cyan with s state
2. Add start of keepalive run message
2023-11-29 21:21:51 -05:00
Joshua Boniface 647cba3cf5 Expand startup width for new daemon name 2023-11-29 21:21:51 -05:00
Joshua Boniface 41f4e4fb2f Split health monitoring into discrete daemon/pkg 2023-11-29 21:21:51 -05:00
Joshua Boniface 83ceb41138 Add daemon name to Logger entries 2023-11-29 15:18:37 -05:00
Joshua Boniface 2545a7b744 Allow similar for IPMI hostnames 2023-11-28 16:09:01 -05:00
Joshua Boniface ce907ff26a Allow specifying static IPs instead of a file 2023-11-28 15:28:31 -05:00
Joshua Boniface 71e589e461 Remove superflous debug output
This is printed in the startup logo block anyways.
2023-11-27 13:46:30 -05:00
Joshua Boniface fc3d292081 Add missing subdirectory configs 2023-11-27 13:40:07 -05:00
Joshua Boniface eab1ae873b Ensure upstream_gateway key will exist 2023-11-27 13:37:57 -05:00
Joshua Boniface eaf93cdf96 Readd missing subsystem configurations 2023-11-27 13:33:41 -05:00
Joshua Boniface c8f4cbb39e Fix node entry keys 2023-11-27 13:24:01 -05:00
Joshua Boniface 786fae7769 Improve logo output 2023-11-27 13:01:43 -05:00
Joshua Boniface bcc57638a9 Refactor pvcnoded to use new configuration 2023-11-26 15:41:25 -05:00
Joshua Boniface 2666e0603e Update dnsmasq script to use new config file 2023-11-26 14:18:13 -05:00
Joshua Boniface dab7396196 Move to unified pvc.conf configuration file 2023-11-26 14:16:21 -05:00
Joshua Boniface 460a2dd09f Bump version to 0.9.82 2023-11-25 15:38:50 -05:00
Joshua Boniface 3e001b08b6 Bump version to 0.9.81 2023-11-17 01:29:41 -05:00
Joshua Boniface e818df5dae Use enable/disable --now instead of two commands
Avoids needing two calls here especially for the stop.
2023-11-16 02:40:35 -05:00
Joshua Boniface c76a5afd04 Avoid waits during node secondary
Waiting for the daemons to stop took too much time on some nodes and
could throw off the lockstep. Instead, leverage background=True to run
the systemctl os_commands in the background (when they complete is
irrelevant), stop the Metadata API first, and don't delay during its
stop at all.
2023-11-16 02:34:12 -05:00
Joshua Boniface 18e43a9377 Adjust name in worker log output 2023-11-16 02:25:14 -05:00
Joshua Boniface aef38639cf Rename pvcapid-worker to pvcworkerd 2023-11-15 20:31:39 -05:00
Joshua Boniface 5f1432ccdd Fix memory allocation updates and add more debug
Previously, we were assigning memalloc/memprov/vcpualloc during an
earlier phase using the main d_domain list. I'm not sure exactly why,
but this was throwing off stats after a fence. Instead, set these values
later on while parsing the actually-active VMs.
2023-11-10 10:29:32 -05:00
Joshua Boniface d6b8808448 Clean up fencing handler
1. Remove all format strings in favour of f-strings
2. Ensure all logger messages have a prefix
3. Add a few more logger messages for clarity
2023-11-10 10:09:54 -05:00
Joshua Boniface 83c4c6633d Readd RBD lock detection and clearing on startup
This is still needed due to the nature of the locks and freeing them on
startup, and to preserve lock=fail behaviour on VM startup.

Also fixes the fencing lock flush to directly use the client library
outside of Celery. I don't like this hack but it seems prudent until we
move fencing to the workers as well.
2023-11-10 01:33:48 -05:00
Joshua Boniface 2a9bc632fa Add node monitoring plugin for KeyDB/Redis 2023-11-10 00:56:46 -05:00
Joshua Boniface 08411708f6 Clean up dangling references to cmd pipes
Also removes the schema references for these CMD pipes as they are no
longer required.
2023-11-09 23:28:14 -05:00
Joshua Boniface ce17c60a20 Port OSD on-node tasks to Celery worker system
Adds Celery versions of the osd_add, osd_replace, osd_refresh,
osd_remove, and osd_db_vg_add functions.
2023-11-09 23:28:08 -05:00
Joshua Boniface 89681d54b9 Port VM on-node tasks to Celery worker system
Adds Celery versions of the flush_locks, device_attach, and
device_detach functions.
2023-11-06 20:40:46 -05:00
Joshua Boniface f0c2e9d295 Don't start pvcapid-worker on primary
It will be running anyways
2023-11-05 19:44:00 -05:00
Joshua Boniface 2c15036f86 Add KeyDB to node startup services
Also ensure API worker starts on all nodes, not just coordinators.
2023-11-05 19:26:38 -05:00
Joshua Boniface 30d7e49401 Start API worker with node daemon on coordinators 2023-11-04 13:08:16 -04:00
Joshua Boniface 7490f13b7c Check for partition tables on new devices 2023-11-04 03:13:58 -04:00
Joshua Boniface e32054be81 Refactor refresh as well 2023-11-04 02:44:52 -04:00
Joshua Boniface b3d13fe9be Add log message for zap 2023-11-04 01:02:51 -04:00
Joshua Boniface 48b2ccbd95 Add timeout for safe-to-destroy
Continuously take the OSD down and out while doing so.
2023-11-04 00:55:05 -04:00
Joshua Boniface 1535078842 Fix lvremove, lvcreate, and update ZK details 2023-11-04 00:30:14 -04:00
Joshua Boniface 0e45613634 Use right key with correct data 2023-11-04 00:02:00 -04:00