Commit Graph

830 Commits

Author SHA1 Message Date
Joshua Boniface 18f09196be Bump version to 0.9.93 2024-01-30 09:51:21 -05:00
Joshua Boniface df40b779af Bump version to 0.9.92 2024-01-29 09:39:10 -05:00
Joshua Boniface f29b4c2755 Bump version to 0.9.91 2024-01-23 10:40:59 -05:00
Joshua Boniface 86ca363697 Bump version to 0.9.90 2024-01-11 10:22:48 -05:00
Joshua Boniface a5763c9d25 Fix possible race condition applying schemas
Found an instance where two of these fired too close together, and
caused a fatal error. Use a write lock, and then catch the schema.apply
function in case it fails anyways.
2024-01-11 10:21:01 -05:00
Joshua Boniface 09269f182c Add live migrate max downtime selector meta field
Adds a new flag to VM metadata to allow setting the VM live migration
max downtime. This will enable very busy VMs that hang live migration to
have this value changed.
2024-01-11 00:05:50 -05:00
Joshua Boniface e9b6072fa0 Bump version to 0.9.89 2024-01-09 12:15:53 -05:00
Joshua Boniface 1d480f5629 Bump version to 0.9.88 2023-12-29 14:56:33 -05:00
Joshua Boniface 123c7ce857 Update copyright header on all files for 2024
Last release of 2023 is probably the best time to do this.
2023-12-29 11:16:59 -05:00
Joshua Boniface 8083b7a3e6 Bump version to 0.9.87 2023-12-27 13:40:51 -05:00
Joshua Boniface e654fbba08 Move debug condition handling to Logger
Avoids many dozens of conditionals sprinkled throughout the code by
centralizing this check into the main Logger instance.
2023-12-27 13:01:45 -05:00
Joshua Boniface 494c20263d Move monitoring folder to top level 2023-12-27 11:37:49 -05:00
Joshua Boniface 3e4cc53fdd Add node network statistics and utilization values
Adds a new physical network interface stats parser to the node
keepalives, and leverages this information to provide a network
utilization overview in the Prometheus metrics.
2023-12-21 15:45:01 -05:00
Joshua Boniface 39f9f3640c Rename health metrics and add resource metrics 2023-12-21 09:40:49 -05:00
Joshua Boniface 0a93f526e0 Bump version to 0.9.86 2023-12-14 14:46:29 -05:00
Joshua Boniface 38e43b46c3 Update health detail messages format 2023-12-13 03:17:47 -05:00
Joshua Boniface 0f24184b78 Explicitly clear resources of fenced node
This actually solves the bug originally "fixed" in 5f1432ccdd without
breaking VM resource allocations for working nodes.
2023-12-11 12:14:56 -05:00
Joshua Boniface 1ba37fe33d Restore VM resource allocation location
Commit 5f1432ccdd changed where these happen due to a bug after fencing.
However, this completely broke node resource reporting as only the final
instance will be queried here.

Revert this change and look further into the original bug.
2023-12-11 11:52:59 -05:00
Joshua Boniface 1a05077b10 Fix missing fstring 2023-12-11 11:29:49 -05:00
Joshua Boniface 9617660342 Update Prometheus Grafana dashboard 2023-12-11 00:23:08 -05:00
Joshua Boniface 9dc5097dbc Bump version to 0.9.85 2023-12-10 01:00:33 -05:00
Joshua Boniface 53d632f283 Fix bug in example PVC Grafana dashboard 2023-12-10 00:50:05 -05:00
Joshua Boniface 7bc0760b78 Add time to "starting keepalive" message
Matches the pvchealthd output and provides a useful message detail to
this otherwise contextless message.
2023-12-10 00:40:32 -05:00
Joshua Boniface 9aee2a9075 Bump version to 0.9.84 2023-12-09 23:05:40 -05:00
Joshua Boniface 1f6347d24b Add Prometheus monitoring examples 2023-12-09 17:42:51 -05:00
Joshua Boniface 988de1218f Bump version to 0.9.83 2023-12-01 17:37:42 -05:00
Joshua Boniface 1fb0463dea Adjust daemon service startup
Add healthd, adjust workerd, lower waittime
2023-11-30 03:28:02 -05:00
Joshua Boniface 03a738f878 Move config parser into daemon_lib
And reformat/add config values for API.
2023-11-30 00:05:37 -05:00
Joshua Boniface 4a2eba0961 Improve node output messages (from pvchealthd)
1. Output startup "list" entries in cyan with s state
2. Add start of keepalive run message
2023-11-29 21:21:51 -05:00
Joshua Boniface 647cba3cf5 Expand startup width for new daemon name 2023-11-29 21:21:51 -05:00
Joshua Boniface 41f4e4fb2f Split health monitoring into discrete daemon/pkg 2023-11-29 21:21:51 -05:00
Joshua Boniface 83ceb41138 Add daemon name to Logger entries 2023-11-29 15:18:37 -05:00
Joshua Boniface 2545a7b744 Allow similar for IPMI hostnames 2023-11-28 16:09:01 -05:00
Joshua Boniface ce907ff26a Allow specifying static IPs instead of a file 2023-11-28 15:28:31 -05:00
Joshua Boniface 71e589e461 Remove superfluous debug output
This is printed in the startup logo block anyways.
2023-11-27 13:46:30 -05:00
Joshua Boniface fc3d292081 Add missing subdirectory configs 2023-11-27 13:40:07 -05:00
Joshua Boniface eab1ae873b Ensure upstream_gateway key will exist 2023-11-27 13:37:57 -05:00
Joshua Boniface eaf93cdf96 Re-add missing subsystem configurations 2023-11-27 13:33:41 -05:00
Joshua Boniface c8f4cbb39e Fix node entry keys 2023-11-27 13:24:01 -05:00
Joshua Boniface 786fae7769 Improve logo output 2023-11-27 13:01:43 -05:00
Joshua Boniface bcc57638a9 Refactor pvcnoded to use new configuration 2023-11-26 15:41:25 -05:00
Joshua Boniface 2666e0603e Update dnsmasq script to use new config file 2023-11-26 14:18:13 -05:00
Joshua Boniface dab7396196 Move to unified pvc.conf configuration file 2023-11-26 14:16:21 -05:00
Joshua Boniface 460a2dd09f Bump version to 0.9.82 2023-11-25 15:38:50 -05:00
Joshua Boniface 3e001b08b6 Bump version to 0.9.81 2023-11-17 01:29:41 -05:00
Joshua Boniface e818df5dae Use enable/disable --now instead of two commands
Avoids needing two calls here especially for the stop.
2023-11-16 02:40:35 -05:00
Joshua Boniface c76a5afd04 Avoid waits during node secondary
Waiting for the daemons to stop took too much time on some nodes and
could throw off the lockstep. Instead, leverage background=True to run
the systemctl os_commands in the background (when they complete is
irrelevant), stop the Metadata API first, and don't delay during its
stop at all.
2023-11-16 02:34:12 -05:00
Joshua Boniface 18e43a9377 Adjust name in worker log output 2023-11-16 02:25:14 -05:00
Joshua Boniface aef38639cf Rename pvcapid-worker to pvcworkerd 2023-11-15 20:31:39 -05:00
Joshua Boniface 5f1432ccdd Fix memory allocation updates and add more debug
Previously, we were assigning memalloc/memprov/vcpualloc during an
earlier phase using the main d_domain list. I'm not sure exactly why,
but this was throwing off stats after a fence. Instead, set these values
later on while parsing the actually-active VMs.
2023-11-10 10:29:32 -05:00