1. Move fence monitoring to its own thread rather than doing the listing
and triggering within the main keepalive thread.
2. Add a global lock key at /config/fence_lock, used to prevent multiple
nodes from trying to run fences simultaneously.
3. Run the fencing monitor for each node sequentially within the context
of the main fence monitoring thread, so that fences of multiple nodes
happen one at a time rather than in parallel.
Together these changes should prevent anomalies where one node tries to
fence multiple nodes at once with no coordination, as sketched below.
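A minimal sketch of the intended flow, assuming a kazoo-style Zookeeper
client; the helper names (list_dead_nodes, fence_node) are hypothetical
stand-ins for the real monitoring logic, not the actual PVC internals:

```python
import threading

from kazoo.client import KazooClient

FENCE_LOCK_PATH = "/config/fence_lock"

def fence_monitor(zk: KazooClient, list_dead_nodes, fence_node):
    """Dedicated fence monitoring pass, run in its own thread so the
    main keepalive thread never blocks on fencing work."""
    # The global lock ensures only one node cluster-wide runs fences
    # at any given moment.
    with zk.Lock(FENCE_LOCK_PATH):
        # Fence each dead node one at a time, never in parallel.
        for node in list_dead_nodes():
            fence_node(node)

def start_fence_monitor(zk, list_dead_nodes, fence_node):
    thread = threading.Thread(
        target=fence_monitor,
        args=(zk, list_dead_nodes, fence_node),
        daemon=True,
    )
    thread.start()
    return thread
```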
Snapshot mirrors should normally be promoted using "mirror promote"
rather than started manually. This adds guard rails to the "start",
"stop", and "disable" state commands, refusing to change a mirror's
state unless an explicit "--force" option is given.
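A rough sketch of the guard-rail check, assuming a simple state-string
model; MIRROR_STATE and the function shape are illustrative, not the
actual PVC state machinery:

```python
MIRROR_STATE = "mirror"  # assumed marker state for snapshot mirrors
GUARDED_COMMANDS = {"start", "stop", "disable"}

def check_state_change(current_state: str, command: str, force: bool = False) -> None:
    """Refuse manual state changes on snapshot mirrors unless --force is given."""
    if current_state == MIRROR_STATE and command in GUARDED_COMMANDS and not force:
        raise ValueError(
            "VM is a snapshot mirror; use 'mirror promote' to activate it, "
            "or pass --force to override"
        )
```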
Adds two "vm mirror" helper commands which automate sending and
promoting VM snapshots.
"vm mirror create" replicates the functionality of "snapshot create" and
"snapshot send", performing both in one single task using an
autogenerated dated snapshot name for automatic cross-cluster
replication.
"vm mirror promote" replicates the functionality of "vm shutdown",
"snapshot create", "snapshot send", "vm start" (remote), and,
optionally, "vm remove", performing in one single task an entire
cross-cluster VM move with or without retaining the copy on the local
cluster (if retained, the local copy becomes a snapshot mirror of the
remote, flipping their statuses).
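The promote flow, sketched with print-only stand-ins for the real tasks;
the helper names and the "mirror_" prefix are assumptions for
illustration, and only the ordering reflects the description above:

```python
from datetime import datetime

# Print-only stand-ins for the real PVC tasks; only the ordering matters here.
def vm_shutdown(vm): print(f"shutdown {vm}")
def snapshot_create(vm, name): print(f"snapshot create {vm}@{name}")
def snapshot_send(vm, name, cluster): print(f"snapshot send {vm}@{name} -> {cluster}")
def vm_start_remote(cluster, vm): print(f"start {vm} on {cluster}")
def vm_remove(vm): print(f"remove {vm}")

def vm_mirror_promote(vm: str, remote_cluster: str, remove_local: bool = False) -> None:
    """Entire cross-cluster VM move as one task."""
    snapshot_name = "mirror_" + datetime.now().strftime("%Y%m%d%H%M%S")
    vm_shutdown(vm)                      # quiesce before the final snapshot
    snapshot_create(vm, snapshot_name)
    snapshot_send(vm, snapshot_name, remote_cluster)
    vm_start_remote(remote_cluster, vm)  # VM now runs on the remote cluster
    if remove_local:
        vm_remove(vm)                    # full move: drop the local copy
    # otherwise the local copy remains as a snapshot mirror of the remote,
    # flipping the two clusters' mirror statuses
```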
Adds polled information on node CPU, memory, and network bandwidth for
the node running the test, providing additional useful context for
interpreting the results.
Also bumps the test format to 2 to ensure clients can handle the changes
properly.
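A sketch of the kind of polling involved, using psutil for illustration;
the real poller and its field names may differ:

```python
import psutil

def poll_node_stats(interval: float = 1.0) -> dict:
    """Sample node CPU, memory, and network bandwidth over one interval."""
    net_before = psutil.net_io_counters()
    cpu_percent = psutil.cpu_percent(interval=interval)  # blocks for `interval` seconds
    net_after = psutil.net_io_counters()
    return {
        "cpu_percent": cpu_percent,
        "memory_percent": psutil.virtual_memory().percent,
        "net_tx_bps": (net_after.bytes_sent - net_before.bytes_sent) * 8 / interval,
        "net_rx_bps": (net_after.bytes_recv - net_before.bytes_recv) * 8 / interval,
    }
```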
Allows an imported volume to be scanned for stats independently. This is
designed to be used as part of a snapshot import via the API: the
"create" happens before the real import (to check for available space,
etc.), and this stats scan runs afterwards, once the RBD volume actually
exists.
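The intended call ordering, sketched with hypothetical endpoint names
(the api object and its methods are stand-ins, not the real PVC API):

```python
# Hypothetical names; only the ordering reflects the description above.
def import_volume_snapshot(api, pool: str, volume: str, stream) -> None:
    api.volume_create(pool, volume)          # runs first: space checks, metadata
    api.volume_import(pool, volume, stream)  # the real RBD data transfer
    api.volume_scan_stats(pool, volume)      # the RBD volume now exists; scan it
```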
Some newer servers do not report NVMe device paths properly using
`lsscsi` as expected. To work around this, add an `nvme`-based detection
parser that is called whenever the `lsscsi` parser returns a `-` (or
None).
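A sketch of the fallback, assuming nvme-cli's JSON output (`nvme list -o
json` yields a `Devices` array with `DevicePath` entries); the matching
logic here is simplified relative to a real parser:

```python
import json
import subprocess

def detect_device_path(lsscsi_result):
    """Use the lsscsi result if usable; otherwise fall back to `nvme list`."""
    if lsscsi_result not in (None, "-"):
        return lsscsi_result
    result = subprocess.run(
        ["nvme", "list", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    devices = json.loads(result.stdout).get("Devices", [])
    # Simplified: return the first NVMe device path found, if any.
    return devices[0]["DevicePath"] if devices else None
```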
It was far too cumbersome to report every possible stage here in a
consistent way. Realistically, this command will be run silently from
cron 99.95% of the time, so the added complexity of handling individual
Celery state updates just isn't worth it.
With the original system, the failure of one VM's backups would not
trigger a total fault, thus allowing other backups to complete.
Restore that behaviour.
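A sketch of the restored per-VM fault isolation; the helper names are
illustrative only:

```python
def run_autobackups(vms, backup_one_vm):
    """Back up every VM, recording failures instead of aborting the run."""
    failures = {}
    for vm in vms:
        try:
            backup_one_vm(vm)
        except Exception as exc:
            # One VM's failure must not block the remaining backups.
            failures[vm] = str(exc)
    return failures  # reported at the end, not raised as a total fault
```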
Avoid trying to subcall other Celery worker tasks, as this interacts
very badly with stage tracking. Instead, reimplement what is needed
directly here. While this does cause a fair bit of code duplication, I
believe the resulting clarity is worthwhile.
Moves VM autobackups from the CLI to the pvcworkerd system on the
primary coordinator, and turns the CLI autobackup command into an actual
API client endpoint rather than keeping its logic in the CLI.
In addition, modifies the new autobackup to leverage the new "pvc vm
snapshot" function set, just with special snapshot names, which folds
autobackups into the new snapshot scaffolding.
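A sketch of how autobackups can ride on the snapshot function set via a
reserved name prefix; the prefix and helper shape are assumptions, not
the actual naming scheme:

```python
from datetime import datetime

AUTOBACKUP_PREFIX = "autobackup"  # assumed reserved prefix

def autobackup_vm(vm: str, snapshot_create) -> str:
    """Create one autobackup as an ordinary VM snapshot with a special name."""
    name = f"{AUTOBACKUP_PREFIX}_{datetime.now().strftime('%Y%m%d%H%M%S')}"
    snapshot_create(vm, name)
    return name
```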