39f9f3640c
Rename health metrics and add resource metrics
2023-12-21 09:40:49 -05:00
0a93f526e0
Bump version to 0.9.86
2023-12-14 14:46:29 -05:00
38e43b46c3
Update health detail messages format
2023-12-13 03:17:47 -05:00
0f24184b78
Explicitly clear resources of fenced node
...
This actually solves the bug originally "fixed" in
5f1432ccdd38996dac0f528035634cbc82827abd without breaking VM resource
allocations for working nodes.
2023-12-11 12:14:56 -05:00
1ba37fe33d
Restore VM resource allocation location
...
Commit 5f1432ccdd38996dac0f528035634cbc82827abd changed where these
happen due to a bug after fencing. However this completely broke node
resource reporting as only the final instance will be queried here.
Revert this change and look further into the original bug.
2023-12-11 11:52:59 -05:00
1a05077b10
Fix missing fstring
2023-12-11 11:29:49 -05:00
9617660342
Update Prometheus Grafana dashboard
2023-12-11 00:23:08 -05:00
9dc5097dbc
Bump version to 0.9.85
2023-12-10 01:00:33 -05:00
53d632f283
Fix bug in example PVC Grafana dashboard
2023-12-10 00:50:05 -05:00
7bc0760b78
Add time to "starting keepalive" message
...
Matches the pvchealthd output and provides a useful message detail to
this otherwise contextless message.
2023-12-10 00:40:32 -05:00
9aee2a9075
Bump version to 0.9.84
2023-12-09 23:05:40 -05:00
1f6347d24b
Add Prometheus monitoring examples
2023-12-09 17:42:51 -05:00
988de1218f
Bump version to 0.9.83
2023-12-01 17:37:42 -05:00
1fb0463dea
Adjust daemon service startup
...
Add healthd, adjust workerd, lower waittime
2023-11-30 03:28:02 -05:00
03a738f878
Move config parser into daemon_lib
...
And reformat/add config values for API.
2023-11-30 00:05:37 -05:00
4a2eba0961
Improve node output messages (from pvchealthd)
...
1. Output startup "list" entries in cyan with s state
2. Add start of keepalive run message
2023-11-29 21:21:51 -05:00
647cba3cf5
Expand startup width for new daemon name
2023-11-29 21:21:51 -05:00
41f4e4fb2f
Split health monitoring into discrete daemon/pkg
2023-11-29 21:21:51 -05:00
83ceb41138
Add daemon name to Logger entries
2023-11-29 15:18:37 -05:00
2545a7b744
Allow similar for IPMI hostnames
2023-11-28 16:09:01 -05:00
ce907ff26a
Allow specifying static IPs instead of a file
2023-11-28 15:28:31 -05:00
71e589e461
Remove superfluous debug output
...
This is printed in the startup logo block anyways.
2023-11-27 13:46:30 -05:00
fc3d292081
Add missing subdirectory configs
2023-11-27 13:40:07 -05:00
eab1ae873b
Ensure upstream_gateway key will exist
2023-11-27 13:37:57 -05:00
eaf93cdf96
Readd missing subsystem configurations
2023-11-27 13:33:41 -05:00
c8f4cbb39e
Fix node entry keys
2023-11-27 13:24:01 -05:00
786fae7769
Improve logo output
2023-11-27 13:01:43 -05:00
bcc57638a9
Refactor pvcnoded to use new configuration
2023-11-26 15:41:25 -05:00
2666e0603e
Update dnsmasq script to use new config file
2023-11-26 14:18:13 -05:00
dab7396196
Move to unified pvc.conf configuration file
2023-11-26 14:16:21 -05:00
460a2dd09f
Bump version to 0.9.82
2023-11-25 15:38:50 -05:00
3e001b08b6
Bump version to 0.9.81
2023-11-17 01:29:41 -05:00
e818df5dae
Use enable/disable --now instead of two commands
...
Avoids needing two calls here especially for the stop.
2023-11-16 02:40:35 -05:00
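As a rough illustration of e818df5dae: `systemctl enable --now` and `systemctl disable --now` combine the enable/disable step with an immediate start/stop, so one call replaces two. The helper name `systemctl_toggle` and the unit name used below are hypothetical and not taken from the PVC code.

```python
def systemctl_toggle(unit, enable):
    """Build a single systemctl call that (en|dis)ables and starts/stops a unit.

    Hypothetical helper for illustration only; it returns the argv list
    rather than executing anything.
    """
    action = "enable" if enable else "disable"
    # --now starts the unit on enable, or stops it on disable, in the same call
    return ["systemctl", action, "--now", unit]


# Illustrative unit name (assumption)
print(systemctl_toggle("example.service", enable=False))
```

Returning the argument list keeps the sketch side-effect free; a caller would hand it to whatever command runner the daemon already uses.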
c76a5afd04
Avoid waits during node secondary
...
Waiting for the daemons to stop took too much time on some nodes and
could throw off the lockstep. Instead, leverage background=True to run
the systemctl os_commands in the background (when they complete is
irrelevant), stop the Metadata API first, and don't delay during its
stop at all.
2023-11-16 02:34:12 -05:00
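A minimal sketch of the idea in c76a5afd04, assuming that running a command "in the background" means launching it without waiting for it to exit. It uses `subprocess.Popen` as a stand-in; the real `background=True` option mentioned in the commit is not reproduced here, and the function names are hypothetical.

```python
import subprocess
import sys


def run_background(command):
    """Start a command and return immediately, without waiting for it to exit."""
    return subprocess.Popen(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def stop_services_nonblocking(stop_commands):
    """Launch each stop command in list order; never block on completion.

    Putting the most time-sensitive service first in the list means its stop
    is issued first, which is the ordering the commit message describes.
    """
    return [run_background(command) for command in stop_commands]


# Stand-in commands (assumption): each one sleeps briefly, like a slow stop.
slow_stop = [sys.executable, "-c", "import time; time.sleep(0.2)"]
processes = stop_services_nonblocking([slow_stop, slow_stop])
print(f"launched {len(processes)} commands without waiting")

# The demo reaps the children only to exit cleanly; the commit message notes
# that when the real commands complete is irrelevant to the caller.
for process in processes:
    process.wait()
```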
18e43a9377
Adjust name in worker log output
2023-11-16 02:25:14 -05:00
aef38639cf
Rename pvcapid-worker to pvcworkerd
2023-11-15 20:31:39 -05:00
5f1432ccdd
Fix memory allocation updates and add more debug
...
Previously, we were assigning memalloc/memprov/vcpualloc during an
earlier phase using the main d_domain list. I'm not sure exactly why,
but this was throwing off stats after a fence. Instead, set these values
later on while parsing the actually-active VMs.
2023-11-10 10:29:32 -05:00
d6b8808448
Clean up fencing handler
...
1. Remove all format strings in favour of f-strings
2. Ensure all logger messages have a prefix
3. Add a few more logger messages for clarity
2023-11-10 10:09:54 -05:00
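To illustrate the first two points of d6b8808448, here is a small sketch comparing `str.format` with an f-string and adding a fixed prefix to a log message. The `prefixed` helper, the prefix text, and the node name are hypothetical; the actual logger calls in the fencing handler are not shown in this log.

```python
def prefixed(prefix, message):
    """Return a log message with a consistent prefix (hypothetical helper)."""
    return f"{prefix}: {message}"


node_name = "node1"  # illustrative value (assumption)

# Older style: positional format string
old_style = "Fencing node {}".format(node_name)

# f-string: the variable is interpolated inline
new_style = f"Fencing node {node_name}"

# Both produce the same text; only the notation differs
assert old_style == new_style
print(prefixed("fence", new_style))
```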
83c4c6633d
Readd RBD lock detection and clearing on startup
...
This is still needed due to the nature of the locks and freeing them on
startup, and to preserve lock=fail behaviour on VM startup.
Also fixes the fencing lock flush to directly use the client library
outside of Celery. I don't like this hack but it seems prudent until we
move fencing to the workers as well.
2023-11-10 01:33:48 -05:00
2a9bc632fa
Add node monitoring plugin for KeyDB/Redis
2023-11-10 00:56:46 -05:00
08411708f6
Clean up dangling references to cmd pipes
...
Also removes the schema references for these CMD pipes as they are no
longer required.
2023-11-09 23:28:14 -05:00
ce17c60a20
Port OSD on-node tasks to Celery worker system
...
Adds Celery versions of the osd_add, osd_replace, osd_refresh,
osd_remove, and osd_db_vg_add functions.
2023-11-09 23:28:08 -05:00
89681d54b9
Port VM on-node tasks to Celery worker system
...
Adds Celery versions of the flush_locks, device_attach, and
device_detach functions.
2023-11-06 20:40:46 -05:00
f0c2e9d295
Don't start pvcapid-worker on primary
...
It will be running anyways
2023-11-05 19:44:00 -05:00
2c15036f86
Add KeyDB to node startup services
...
Also ensure API worker starts on all nodes, not just coordinators.
2023-11-05 19:26:38 -05:00
30d7e49401
Start API worker with node daemon on coordinators
2023-11-04 13:08:16 -04:00
7490f13b7c
Check for partition tables on new devices
2023-11-04 03:13:58 -04:00
e32054be81
Refactor refresh as well
2023-11-04 02:44:52 -04:00
b3d13fe9be
Add log message for zap
2023-11-04 01:02:51 -04:00
48b2ccbd95
Add timeout for safe-to-destroy
...
Continuously take the OSD down and out while doing so.
2023-11-04 00:55:05 -04:00
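The behaviour described in 48b2ccbd95 can be sketched as a polling loop with a deadline that re-issues the "down and out" step on every pass. This is an assumption about the shape of the logic, not the project's code: `is_safe` and `mark_down_and_out` are hypothetical callables standing in for the real Ceph interactions, and the timeout and interval values are arbitrary.

```python
import time


def wait_safe_to_destroy(is_safe, mark_down_and_out, timeout=5.0, interval=0.1):
    """Poll until is_safe() returns True or the timeout expires.

    mark_down_and_out() is called on every pass, so the OSD is repeatedly
    forced down and out while waiting. Returns True if the OSD became safe
    to destroy in time, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        mark_down_and_out()
        if is_safe():
            return True
        time.sleep(interval)
    return False


# Demo with fake callables: the OSD reports safe on the third check.
checks = {"count": 0}


def fake_is_safe():
    checks["count"] += 1
    return checks["count"] >= 3


print(wait_safe_to_destroy(fake_is_safe, lambda: None, timeout=2.0, interval=0.01))
```

Passing the two steps in as callables keeps the loop testable without a cluster; the caller decides what to do when the function returns False.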