We supported creating snapshots, but not doing anything with them. This
removes the manual task of restoring a snapshot and replaces it with a
PVC abstraction for rolling back to a snapshot.
While Ceph recommends cloning a snapshot rather than rolling back to it,
due to the time a rollback takes, I don't think that is the optimal
strategy for our use case, as it would leave dangling clones that we
would then have to manage.
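
At the Ceph layer the rollback itself is a single "rbd snap rollback"
call; a minimal sketch of wrapping it (the function name and argument
handling are illustrative, not the actual PVC code):

    import subprocess

    def rollback_volume_to_snapshot(pool, volume, snapshot):
        # Roll the RBD volume back to the named snapshot in place, rather
        # than cloning the snapshot out to a new volume.
        subprocess.run(
            ["rbd", "snap", "rollback", f"{pool}/{volume}@{snapshot}"],
            check=True,
        )
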
Closes #183
Adds a check that a volume creation or resize won't violate the 80% full
rule for the storage cluster. This ensures the cluster cannot become
dangerously full if a storage volume is filled to capacity.
Also adds a force flag throughout the pipeline to override this check,
should an administrator really want to do so.
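
A minimal sketch of the intended check, assuming hypothetical helpers
that supply the cluster's total and currently-used capacity in bytes
(the names and exact call path are illustrative):

    def volume_fits_within_limit(total_bytes, used_bytes, requested_bytes, force=False):
        # Reject a creation or resize that could push the storage cluster
        # past 80% full, unless the administrator passes the force flag.
        if force:
            return True
        return (used_bytes + requested_bytes) <= (total_bytes * 0.80)
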
Closes #177
The previous approach did not work, as the backup volume was already
unmounted by the time we tried to read the backups from it. Instead,
populate the backup summary earlier in the run, during the actual backup.
1. Use the actual response code from the server on error, or 504 on
timeouts, instead of always returning 500.
2. Retry GET requests up to 3 times and only raise an error if the final
attempt fails (see the sketch below).
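
Roughly, the retry behaviour looks like the following (a sketch using
the requests library; the real client's timeout values and error mapping
may differ):

    import requests

    def get_with_retries(url, retries=3, timeout=15):
        # Retry GETs up to `retries` times; only raise if the final
        # attempt also fails. Timeouts are mapped to a 504 by the caller.
        last_exc = None
        for _ in range(retries):
            try:
                return requests.get(url, timeout=timeout)
            except requests.exceptions.RequestException as exc:
                last_exc = exc
        raise last_exc
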
Shows each subarg of the task_args as its own element, if applicable,
and fits the width to the terminal using MAX_CONTENT_WIDTH instead of an
arbitrary value.
Adds a new flag to VM metadata to allow setting the VM live migration
maximum downtime. This allows the value to be raised for very busy VMs
whose live migrations would otherwise hang.
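
At the hypervisor level this value maps to libvirt's migration
max-downtime setting; a hedged sketch with the libvirt Python bindings
(how PVC actually reads the metadata flag is omitted):

    import libvirt

    def apply_migration_max_downtime(uri, vm_name, downtime_ms):
        # A larger downtime window lets a very busy VM's live migration
        # converge instead of hanging indefinitely.
        conn = libvirt.open(uri)
        dom = conn.lookupByName(vm_name)
        dom.migrateSetMaxDowntime(downtime_ms, 0)
        conn.close()
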
Major improvements to autobackup and backups, including additional
information/fields in the backup JSON itself, improved error handling,
and the ability to email reports of autobackups using a local sendmail
utility.
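
The report email path amounts to handing a fully formed message to the
local sendmail binary; a minimal sketch (header layout and report
formatting are simplified):

    import subprocess

    def email_backup_report(recipients, subject, body):
        # Pipe a complete message into the local sendmail utility; "-t"
        # tells sendmail to read the recipients from the To: header.
        message = f"To: {', '.join(recipients)}\nSubject: {subject}\n\n{body}"
        subprocess.run(["/usr/sbin/sendmail", "-t"], input=message.encode(), check=True)
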
Allows an administrator to easily generate a Prometheus file service
discovery configuration via the CLI for all clusters they have
configured. Assumes that the various connection details are correct and,
due to the limits of the file SD format, does not include the scheme or
SSL verification options (as these are global in Prometheus).
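
For reference, the file SD format is a list of target groups with
"targets" and "labels" entries; a minimal sketch of generating it
(cluster names, addresses, and label keys here are illustrative, not the
actual CLI output):

    import json

    def generate_file_sd(clusters):
        # clusters: mapping of cluster name -> API "host:port" string
        target_groups = [
            {"targets": [address], "labels": {"pvc_cluster": name}}
            for name, address in clusters.items()
        ]
        return json.dumps(target_groups, indent=2)
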
Ensures that messages are fully read before each append. This adds more
Zookeeper hits, but ensures logs won't be overwritten by multiple
daemons.
Also avoid using a set on the client side, so that legitimately
duplicate entries are not erroneously removed.
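
In practice each append becomes a read-modify-write of the log key; a
rough sketch with the kazoo client (the path and message encoding are
illustrative):

    def append_log_message(zk, path, new_message):
        # Re-read the existing messages immediately before appending, so a
        # write made by another daemon in the meantime is not clobbered.
        data, _ = zk.get(path)
        messages = data.decode().split("\n") if data else []
        messages.append(new_message)
        zk.set(path, "\n".join(messages).encode())
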
Tasks are no longer bound to the primary coordinator for state updates,
due to the use of KeyDB and a proper shared queue and result backend, so
this warning is now obsolete.
This would interrupt "--wait" commands on provisioner tasks, but we no
longer believe it warrants a warning, as the affected user can simply
run "pvc cluster task" to validate or resume the watcher.
Removes the obsoleted "pvc provisioner status" command and replaces it
with a generalized "pvc cluster task" command to show all
currently-active or pending tasks on the cluster workers.
Move the create_vm and run_benchmark tasks to use the new Celery
subsystem, handlers, and wait command. Remove the obsolete, dedicated
API endpoints.
Standardize the CLI client and move the repeated handler code into a
separate common function.
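
As a rough illustration, a task under the new subsystem looks something
like this (the app name, broker URL, and state/metadata fields are
assumptions, not the actual PVC handlers):

    from celery import Celery

    celery = Celery(
        "pvcworkerd",
        broker="redis://localhost:6379/0",   # KeyDB speaks the Redis protocol
        backend="redis://localhost:6379/0",
    )

    @celery.task(bind=True)
    def run_benchmark(self, pool):
        # Progress flows through the shared result backend, so any API node
        # (not just the primary coordinator) can service a "--wait" watcher.
        self.update_state(state="RUNNING", meta={"current": 1, "total": 3})
        ...
        return {"status": "complete"}
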
1. Simplify this by leveraging the existing remove_osd/add_osd
functions, since its task was functionally identical to running those
two in sequence (sketched after this list).
2. Add support for split OSDs within the command (replacing all OSDs on
the block device(s) as required).
3. Add additional configurability and flexibility around the old device,
weight, and external DB LVs.
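
Conceptually the new replace logic reduces to the sketch below;
remove_osd/add_osd are the existing helpers named above, and the
signatures shown are simplified assumptions:

    def replace_osd(osd_id, new_device, weight=None, ext_db_ratio=None):
        # A replace is functionally a removal of the old OSD followed by an
        # addition on the target device, carrying over the requested options.
        remove_osd(osd_id)  # existing helper, assumed signature
        add_osd(new_device, weight=weight, ext_db_ratio=ext_db_ratio)  # existing helper, assumed signature
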
Allows creating multiple OSDs on a single (NVMe) block device,
leveraging the "ceph-volume lvm batch" command. Replaces the previous
method of creating OSDs.
Also adds a new ZK item for each OSD indicating whether it is split.
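
Under the hood this wraps "ceph-volume lvm batch"; a minimal sketch of
the call (other flags the daemon passes are omitted):

    import subprocess

    def create_split_osds(device, osds_per_device):
        # Let ceph-volume carve one (NVMe) block device into several OSDs
        # in a single batch operation.
        subprocess.run(
            ["ceph-volume", "lvm", "batch", "--yes",
             "--osds-per-device", str(osds_per_device), device],
            check=True,
        )
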
This version instead still requires --yes with --restart to skip the
confirmation prompt, but avoids duplicate prompts.
This might be slightly more cumbersome, but it ensures consistency:
every situation that could cause a restart is confirmed, even if
--restart is given.
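
In terms of CLI flow the behaviour is roughly the following (a sketch
with Click; the prompt text and surrounding option handling are
illustrative):

    import click

    def confirm_restart(restart, yes):
        # Every action that would restart the VM is confirmed exactly once,
        # unless both --restart and --yes were given.
        if restart and not yes:
            click.confirm("This action will restart the VM. Continue?", abort=True)
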