Commit Graph

475 Commits

Author SHA1 Message Date
Joshua Boniface 28b8b3bb44 Use proper response parsing instead of raise_for 2024-10-11 10:32:15 -04:00
Joshua Boniface a6f8500309 Improve fence handling to prevent anomalies
1. Move fence monitoring to its own thread rather than doing the listing
and triggering within the main keepalive thread.
2. Add a global lock key at /config/fence_lock and use this lock key to
prevent multiple nodes from trying to run fences simultaneously.
3. Run the fencing monitor for each node sequentially within the context
of the main fence monitoring thread, to ensure that fences of multiple
nodes happen sequentially rather than in parallel.

All of these should help to prevent any anomalies where one node can try
to fence multiple nodes at once without recourse.
2024-10-10 16:42:57 -04:00
Joshua Boniface ebec1332e9 Return to relative paths for SCHEMA_ROOT_PATH 2024-10-10 16:20:02 -04:00
Joshua Boniface 214e7f835a Properly preserve state on promotion
Ensure if the state is start, stop, or disable, that state is preserved;
if it's anything else, the remote side will be started.
2024-10-10 01:21:05 -04:00
Joshua Boniface c4763ac596 Fix invalid responses during promote 2024-10-09 01:14:19 -04:00
Joshua Boniface ea5512e3d8 Only shut down VM if it is running 2024-10-09 01:10:42 -04:00
Joshua Boniface 6d31bf439e Update error text 2024-10-09 01:00:51 -04:00
Joshua Boniface c714093a2e Ensure VM start is forced 2024-10-09 00:58:43 -04:00
Joshua Boniface 04a09b9269 Fix invalid data in state change 2024-10-09 00:55:13 -04:00
Joshua Boniface 3ede0c7d38 Name mirror snapshots like autobackup snapshots 2024-10-09 00:49:22 -04:00
Joshua Boniface ab9390fdb8 Fix another bad stage counting instance 2024-10-09 00:44:20 -04:00
Joshua Boniface 1c83584788 Set correct verbage 2024-10-09 00:38:59 -04:00
Joshua Boniface 7f3ab4e119 Fix stage counting in tasks 2024-10-09 00:37:13 -04:00
Joshua Boniface 16eb09dc22 Fix ordering bug with vm_detail 2024-10-09 00:33:00 -04:00
Joshua Boniface 7ba75adef4 Fix bug if destination is missing 2024-10-09 00:27:42 -04:00
Joshua Boniface 1d90b066bc Add guard rails against manipulating mirrors
Snapshot mirrors should normally be promoted using "mirror promote", and
not started manually. This adds guard rails against that to the "start",
"stop", and "disable" state commands to prevent changing mirror states
without an explicit "--force" option.
2024-10-08 23:51:48 -04:00
Joshua Boniface 3ea7421f09 Implement friendlier VM mirror commands
Adds two helper commands which automate sending and promoting VM
snapshots as "vm mirror" commands.

"vm mirror create" replicates the functionality of "snapshot create" and
"snapshot send", performing both in one single task using an
autogenerated dated snapshot name for automatic cross-cluster
replication.

"vm mirror promote" replicates the functionality of "vm shutdown",
"snapshot create", "snapshot send", "vm start" (remote), and,
optionally, "vm remove", performing in one single task an entire
cross-cluster VM move with or without retaining the copy on the local
cluster (if retained, the local copy becomes a snapshot mirror of the
remote, flipping their statuses).
2024-10-08 23:51:39 -04:00
Joshua Boniface 235299942a Add volume resize if changed 2024-09-30 20:51:59 -04:00
Joshua Boniface 75eac356d5 Increase send blocksize and add total speed
It's much faster and seems to cause no issues.
2024-09-30 20:11:12 -04:00
Joshua Boniface fb8561cc5d Actually fix incremental sending 2024-09-30 17:00:18 -04:00
Joshua Boniface 5f7aa0b2d6 Improve incremental send speed 2024-09-30 04:15:17 -04:00
Joshua Boniface b19642aa2e Fix bug where snapshot rollback was never called 2024-09-30 03:04:35 -04:00
Joshua Boniface 7785166a7e Finish working implementation of send/receive
Required some significant refactoring due to issues with the diff send,
but it works.
2024-09-30 02:53:23 -04:00
Joshua Boniface 34f0a2f388 Add mostly complete implementation of VM send 2024-09-29 01:31:13 -04:00
Joshua Boniface f462ebbc6b Add VM snapshot send (initial) 2024-09-28 10:49:35 -04:00
Joshua Boniface 1cbadb1172 Add "mirror" VM state 2024-09-28 02:01:56 -04:00
Joshua Boniface 0e389ba1f4 Fix bug when setting split count = 1
Would set the OSD as split in Zookeeper, even though it wasn't.
2024-09-23 13:06:05 -04:00
Joshua Boniface 41cd34ba4d Allow specifying job names for benchmarks 2024-09-18 14:55:12 -04:00
Joshua Boniface 736762901c Update benchmarks to include resource utilization
Adds additional polled information on node cpu, memory, and network
bandwidth for the node running the test. This should provide additional
useful information about the results of the test.

Also bumps the test format to 2 to ensure clients can handle the changes
properly.
2024-09-18 14:32:03 -04:00
Joshua Boniface 2de999c700 Add total cluster utilization stats
Useful for evaluating the cluster resources as a whole.
2024-09-05 16:05:33 -04:00
Joshua Boniface 7543eb839d Add dedicated volume scan endpoint
Allows an imported volume to be scanned for stats independently.

Designed to be used as part of a snapshot import via API, to allow the
"create" to happen before the real import (to check for available space,
etc.) and then run this import after when the RBD volume actually
exists.
2024-09-03 20:32:27 -04:00
Joshua Boniface 0f578d7c7d Ensure decimals are captured from size regex 2024-08-30 10:51:41 -04:00
Joshua Boniface f87b96887c Add detect string parser with nvme
Some newer servers do not report NVMe device paths properly using
`lsscsi` as expected. To work around this, add an `nvme`-based detect
parser that is called if the `lsscsi` parser returns a `-` (or None).
2024-08-30 10:41:56 -04:00
Joshua Boniface 8177d5f8b7 Use absolute path for ZK schema 2024-08-27 09:40:24 -04:00
Joshua Boniface f57b8d4a15 Simplify Celery event handling
It was far too cumbersome to report every possible stage here in a
consistent way. Realistically, this command will be run silently from
cron 99.95% of the time, so all this overcomplexity to handle individual
Celery state updates just isn't worth it.
2024-08-25 21:59:12 -04:00
Joshua Boniface e938140414 Refactor autobackups to make more sense 2024-08-25 19:21:00 -04:00
Joshua Boniface 4ef5fbdbe8 Restore previous autobackup continue behaviour
With the original system, the failure of one VM's backups would not
trigger a total fault, thus allowing other backups to complete.
Restore that behaviour.
2024-08-25 17:04:43 -04:00
Joshua Boniface f7926726f2 Adjust snapshot name again 2024-08-25 16:20:59 -04:00
Joshua Boniface a957218976 Fix staging for summary report 2024-08-25 16:11:35 -04:00
Joshua Boniface 61365e6e01 Adjust autobackup snap name and output messages 2024-08-25 16:09:52 -04:00
Joshua Boniface 35fe16ce75 Revert "Adjust stage naming to reflect autobackup stages"
This reverts commit c1f320ede2.
2024-08-25 15:58:25 -04:00
Joshua Boniface c1f320ede2 Adjust stage naming to reflect autobackup stages 2024-08-25 15:55:16 -04:00
Joshua Boniface f1668bffcc Refactor autobackups to implement vm.worker defs
Avoid trying to subcall other Celery worker tasks, as this just gets
very screwy with the stages. Instead reimplement what is needed directly
here. While this does cause a fair bit of code duplication, I believe
the resulting clarity is worthwhile.
2024-08-25 15:54:03 -04:00
Joshua Boniface c0686fc5c7 Remove stage overrides
These aren't needed after pending refactor.
2024-08-25 15:17:46 -04:00
Joshua Boniface 4b37c4fea3 Fix assignment bug 2024-08-25 14:10:59 -04:00
Joshua Boniface 0d918d66fe Port VM autobackups into pvcworkerd with snaps
Moves VM autobackups from being in-CLI to being handled by the
pvcworkerd system on the primary coordinator. Turns the CLI autobackup
command into an actual API client endpoint rather than having its logic
in the CLI.

In addition, modifies the new autobackup to leverage the new "pvc vm
snapshot" function set, just with special snapshot names. This helps
automate this within the new snapshot scaffolding.
2024-08-23 17:23:06 -04:00
Joshua Boniface f6c009beac Allow overriding stages in some commands
This allows them to be called by autobackup commands while still
preserving the current Celery report flow.
2024-08-23 11:21:02 -04:00
Joshua Boniface fc89f4f2f5 Fix error message contents 2024-08-23 10:23:51 -04:00
Joshua Boniface 565011b277 Set snapshot name before start 2024-08-20 23:01:52 -04:00
Joshua Boniface 0bf9cc6b06 Improve stage handling
Run start() at the beginning, and leverage the new tweaks to the CLI to
update the total steps later. Allows errors to be handled gracefully
2024-08-20 17:50:27 -04:00