Joshua Boniface
494c20263d
Move monitoring folder to top level
2023-12-27 11:37:49 -05:00
Joshua Boniface
3e4cc53fdd
Add node network statistics and utilization values
Adds a new physical network interface stats parser to the node
keepalives, and leverages this information to provide a network
utilization overview in the Prometheus metrics.
2023-12-21 15:45:01 -05:00
Joshua Boniface
39f9f3640c
Rename health metrics and add resource metrics
2023-12-21 09:40:49 -05:00
Joshua Boniface
0a93f526e0
Bump version to 0.9.86
2023-12-14 14:46:29 -05:00
Joshua Boniface
38e43b46c3
Update health detail messages format
2023-12-13 03:17:47 -05:00
Joshua Boniface
0f24184b78
Explicitly clear resources of fenced node
This actually solves the bug originally "fixed" in 5f1432ccdd without
breaking VM resource allocations for working nodes.
2023-12-11 12:14:56 -05:00
Joshua Boniface
1ba37fe33d
Restore VM resource allocation location
Commit 5f1432ccdd changed where these happen due to a bug after fencing.
However, this completely broke node resource reporting, as only the
final instance will be queried here. Revert this change and look further
into the original bug.
2023-12-11 11:52:59 -05:00
Joshua Boniface
1a05077b10
Fix missing fstring
2023-12-11 11:29:49 -05:00
Joshua Boniface
9617660342
Update Prometheus Grafana dashboard
2023-12-11 00:23:08 -05:00
Joshua Boniface
9dc5097dbc
Bump version to 0.9.85
2023-12-10 01:00:33 -05:00
Joshua Boniface
53d632f283
Fix bug in example PVC Grafana dashboard
2023-12-10 00:50:05 -05:00
Joshua Boniface
7bc0760b78
Add time to "starting keepalive" message
Matches the pvchealthd output and provides a useful message detail to
this otherwise contextless message.
2023-12-10 00:40:32 -05:00
Joshua Boniface
9aee2a9075
Bump version to 0.9.84
2023-12-09 23:05:40 -05:00
Joshua Boniface
1f6347d24b
Add Prometheus monitoring examples
2023-12-09 17:42:51 -05:00
Joshua Boniface
988de1218f
Bump version to 0.9.83
2023-12-01 17:37:42 -05:00
Joshua Boniface
1fb0463dea
Adjust daemon service startup
Add healthd, adjust workerd, lower waittime
2023-11-30 03:28:02 -05:00
Joshua Boniface
03a738f878
Move config parser into daemon_lib
And reformat/add config values for API.
2023-11-30 00:05:37 -05:00
Joshua Boniface
4a2eba0961
Improve node output messages (from pvchealthd)
1. Output startup "list" entries in cyan with the 's' state
2. Add start of keepalive run message
2023-11-29 21:21:51 -05:00
Joshua Boniface
647cba3cf5
Expand startup width for new daemon name
2023-11-29 21:21:51 -05:00
Joshua Boniface
41f4e4fb2f
Split health monitoring into discrete daemon/pkg
2023-11-29 21:21:51 -05:00
Joshua Boniface
83ceb41138
Add daemon name to Logger entries
2023-11-29 15:18:37 -05:00
Joshua Boniface
2545a7b744
Allow similar for IPMI hostnames
2023-11-28 16:09:01 -05:00
Joshua Boniface
ce907ff26a
Allow specifying static IPs instead of a file
2023-11-28 15:28:31 -05:00
Joshua Boniface
71e589e461
Remove superfluous debug output
This is printed in the startup logo block anyway.
2023-11-27 13:46:30 -05:00
Joshua Boniface
fc3d292081
Add missing subdirectory configs
2023-11-27 13:40:07 -05:00
Joshua Boniface
eab1ae873b
Ensure upstream_gateway key will exist
2023-11-27 13:37:57 -05:00
Joshua Boniface
eaf93cdf96
Re-add missing subsystem configurations
2023-11-27 13:33:41 -05:00
Joshua Boniface
c8f4cbb39e
Fix node entry keys
2023-11-27 13:24:01 -05:00
Joshua Boniface
786fae7769
Improve logo output
2023-11-27 13:01:43 -05:00
Joshua Boniface
bcc57638a9
Refactor pvcnoded to use new configuration
2023-11-26 15:41:25 -05:00
Joshua Boniface
2666e0603e
Update dnsmasq script to use new config file
2023-11-26 14:18:13 -05:00
Joshua Boniface
dab7396196
Move to unified pvc.conf configuration file
2023-11-26 14:16:21 -05:00
Joshua Boniface
460a2dd09f
Bump version to 0.9.82
2023-11-25 15:38:50 -05:00
Joshua Boniface
3e001b08b6
Bump version to 0.9.81
2023-11-17 01:29:41 -05:00
Joshua Boniface
e818df5dae
Use enable/disable --now instead of two commands
Avoids needing two calls here, especially for the stop.
2023-11-16 02:40:35 -05:00
Joshua Boniface
c76a5afd04
Avoid waits during node secondary
Waiting for the daemons to stop took too much time on some nodes and
could throw off the lockstep. Instead, leverage background=True to run
the systemctl os_commands in the background (when they complete is
irrelevant), stop the Metadata API first, and don't delay during its
stop at all.
2023-11-16 02:34:12 -05:00
Joshua Boniface
18e43a9377
Adjust name in worker log output
2023-11-16 02:25:14 -05:00
Joshua Boniface
aef38639cf
Rename pvcapid-worker to pvcworkerd
2023-11-15 20:31:39 -05:00
Joshua Boniface
5f1432ccdd
Fix memory allocation updates and add more debug
Previously, we were assigning memalloc/memprov/vcpualloc during an
earlier phase using the main d_domain list. I'm not sure exactly why,
but this was throwing off stats after a fence. Instead, set these values
later on while parsing the actually-active VMs.
2023-11-10 10:29:32 -05:00
Joshua Boniface
d6b8808448
Clean up fencing handler
1. Remove all format strings in favour of f-strings
2. Ensure all logger messages have a prefix
3. Add a few more logger messages for clarity
2023-11-10 10:09:54 -05:00
Joshua Boniface
83c4c6633d
Re-add RBD lock detection and clearing on startup
This is still needed due to the nature of the locks and freeing them on
startup, and to preserve lock=fail behaviour on VM startup.
Also fixes the fencing lock flush to directly use the client library
outside of Celery. I don't like this hack but it seems prudent until we
move fencing to the workers as well.
2023-11-10 01:33:48 -05:00
Joshua Boniface
2a9bc632fa
Add node monitoring plugin for KeyDB/Redis
2023-11-10 00:56:46 -05:00
Joshua Boniface
08411708f6
Clean up dangling references to cmd pipes
Also removes the schema references for these CMD pipes as they are no
longer required.
2023-11-09 23:28:14 -05:00
Joshua Boniface
ce17c60a20
Port OSD on-node tasks to Celery worker system
Adds Celery versions of the osd_add, osd_replace, osd_refresh,
osd_remove, and osd_db_vg_add functions.
2023-11-09 23:28:08 -05:00
Joshua Boniface
89681d54b9
Port VM on-node tasks to Celery worker system
Adds Celery versions of the flush_locks, device_attach, and
device_detach functions.
2023-11-06 20:40:46 -05:00
Joshua Boniface
f0c2e9d295
Don't start pvcapid-worker on primary
It will be running anyway.
2023-11-05 19:44:00 -05:00
Joshua Boniface
2c15036f86
Add KeyDB to node startup services
Also ensure API worker starts on all nodes, not just coordinators.
2023-11-05 19:26:38 -05:00
Joshua Boniface
30d7e49401
Start API worker with node daemon on coordinators
2023-11-04 13:08:16 -04:00
Joshua Boniface
7490f13b7c
Check for partition tables on new devices
2023-11-04 03:13:58 -04:00
Joshua Boniface
e32054be81
Refactor refresh as well
2023-11-04 02:44:52 -04:00