Adds a new physical network interface stats parser to the node
keepalives, and leverages this information to provide a network
utilization overview in the Prometheus metrics.
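As an illustration of the kind of parsing involved, a minimal sketch assuming a Linux host exposing per-interface counters in /proc/net/dev (the function names are illustrative, not the actual implementation):

    def read_interface_stats():
        # Parse /proc/net/dev: skip the two header lines, then take
        # rx_bytes (field 0) and tx_bytes (field 8) per interface.
        stats = {}
        with open("/proc/net/dev") as f:
            for line in f.readlines()[2:]:
                iface, data = line.split(":", 1)
                fields = data.split()
                stats[iface.strip()] = {
                    "rx_bytes": int(fields[0]),
                    "tx_bytes": int(fields[8]),
                }
        return stats

    def utilization(prev, cur, interval_s):
        # Convert counter deltas between two keepalive runs into bits/sec,
        # suitable for export as Prometheus gauges.
        return {
            iface: {
                "rx_bps": (cur[iface]["rx_bytes"] - prev[iface]["rx_bytes"]) * 8 / interval_s,
                "tx_bps": (cur[iface]["tx_bytes"] - prev[iface]["tx_bytes"]) * 8 / interval_s,
            }
            for iface in cur
            if iface in prev
        }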
Moves all tasks run by the Celery worker into a discrete package/module
for easier installation. Also adjusts several parameters throughout to
accomplish this.
For whatever reason, a Celery worker on <5.2.x was not picking these up.
Move them back to the root of the module so they are properly picked up
on these older versions, while still preventing the routing functions
from being called during API doc generation.
Avoids calling unworkable functions when generating API docs etc. by
isolating them into a Celery startup function called by Daemon.py.
Also updates to the Celery 4+ settings format.
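For reference, the Celery 4+ format replaces the old uppercase
CELERY_*/BROKER_* setting names with lowercase keys; a minimal sketch
with placeholder values:

    from celery import Celery

    celery = Celery("pvc_tasks")  # app name is illustrative

    # The pre-4.x style used e.g. BROKER_URL and CELERY_RESULT_BACKEND;
    # the 4.x+ equivalents are lowercase:
    celery.conf.update(
        broker_url="pyamqp://localhost//",  # placeholder broker
        result_backend="rpc://",            # placeholder backend
    )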
Full UUIDs were obnoxiously long, so switch to using just the first
8-character section of a UUID instead. Keeps the list nice and short,
makes them easier to copy, and is just generally nicer.
Could this cause uniqueness problems? Perhaps, but I don't see that
happening nearly frequently enough to matter.
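For scale, the first dash-separated section of a UUID4 is 8 hex
characters, i.e. about 4.3 billion possible values:

    from uuid import uuid4

    task_id = uuid4()
    short_id = str(task_id).split("-")[0]  # e.g. "1a2b3c4d"
    # 16**8 = 4,294,967,296 values, so collisions within a short-lived
    # task list are vanishingly unlikely in practice.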
Move the create_vm and run_benchmark tasks to use the new Celery
subsystem, handlers, and wait command. Remove the obsolete, dedicated
API endpoints.
Standardize the CLI client and move the repeated handler code into a
separate common function.
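A sketch of what such a shared handler might look like, assuming a
polling-style status call against the API (the function names and
response shape are assumptions, not the actual client code):

    import time

    def wait_for_task(get_task_status, task_id, interval=1.0):
        # Poll the task-status endpoint until the Celery task reaches a
        # terminal state, then hand the result back to the caller.
        while True:
            status = get_task_status(task_id)
            if status["state"] in ("SUCCESS", "FAILURE"):
                return status
            time.sleep(interval)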
By default, tasks will continue to run as they did, on the primary
coordinator's task runner. However, this opens the possibility of
defining more tasks that will run on other nodes or coordinators.
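In Celery terms this can be done with standard per-queue routing; a
hedged sketch (the broker, queue, and app names are made up):

    from celery import Celery

    celery = Celery("pvc_tasks", broker="pyamqp://localhost//")  # placeholder

    @celery.task
    def run_benchmark(pool):
        ...

    # A worker on a specific node would subscribe to its own queue, e.g.:
    #   celery -A pvc_tasks worker -Q run_on.hv1
    # and a task can then be pinned to that node at submission time:
    result = run_benchmark.apply_async(args=("testpool",), queue="run_on.hv1")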
Redis did not provide a distributed solution for the worker, which
precluded several important planned functions. So instead, move to using
Zookeeper + PostgreSQL as the broker and result backend, respectively.
This should be a seamless drop-in change, but for future uses it
requires the database host to be the primary coordinator IP rather than
localhost, so that writes to the database can occur from non-primary
hosts.
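In Celery configuration terms this maps to the standard Zookeeper
transport and database result backend URL schemes; a minimal sketch with
placeholder host and credentials:

    from celery import Celery

    celery = Celery(
        "pvc_tasks",
        # Kombu's Zookeeper transport as the message broker:
        broker="zookeeper://10.0.0.1:2181/",
        # Database result backend pointed at the primary coordinator's
        # PostgreSQL rather than localhost, so non-primary hosts can write:
        backend="db+postgresql://pvcuser:pvcpass@10.0.0.1:5432/pvcdb",
    )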
1. Simplify this by leveraging the existing remove_osd/add_osd
functions, since its task was functionally identical to running those
two in sequence (see the sketch after this list).
2. Add support for split OSDs within the command (replacing all OSDs on
the block device(s) as required).
3. Add additional configurability and flexibility around the old device,
weight, and external DB LVs.
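A sketch of the simplification in item 1; only the remove_osd/add_osd
names come from the actual change, and the signatures here are assumed:

    def replace_osd(zkhandler, osd_id, new_device, weight=None):
        # Hypothetical wrapper: a replacement is functionally a removal of
        # the old OSD followed by an addition on the new device, in sequence.
        if not remove_osd(zkhandler, osd_id):
            return False
        return add_osd(zkhandler, new_device, weight=weight)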
Allows creating multiple OSDs on a single (NVMe) block device,
leveraging the "ceph-volume lvm batch" command. Replaces the previous
method of creating OSDs.
Also adds a new ZK item for each OSD indicating if it is split or not.
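For reference, "ceph-volume lvm batch" supports this directly via its
--osds-per-device flag; an illustrative invocation (the device path and
OSD count are examples):

    import subprocess

    # Create two OSDs on a single NVMe block device in one pass:
    subprocess.run(
        ["ceph-volume", "lvm", "batch", "--osds-per-device", "2", "/dev/nvme0n1"],
        check=True,
    )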
Adds a new API query parameter to define the file size, which is then
used for the temporary image. This is required for at least VMDK files
to work properly in qemu-img convert.
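A sketch of the flow this enables, with illustrative paths and size
(qemu-img's -n flag skips creation of the pre-sized target):

    import subprocess

    file_size = "20G"  # supplied via the new API query parameter

    # Pre-create the temporary image at the declared size, then convert
    # the uploaded file into it without recreating the target:
    subprocess.run(
        ["qemu-img", "create", "-f", "raw", "/tmp/upload.img", file_size],
        check=True,
    )
    subprocess.run(
        ["qemu-img", "convert", "-n", "-f", "vmdk", "-O", "raw",
         "/srv/ova/disk0.vmdk", "/tmp/upload.img"],
        check=True,
    )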
It didn't make any sense to me for mem(prov) to be the default selector,
since this has too many caveats versus mem(free). Switch to using
mem(free) as the default (i.e. "mem") and make memprov the alternative.
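The difference between the two selectors, as a sketch (the field names
are illustrative):

    def select_target_node(nodes, selector="mem"):
        # "mem" (default): pick the node with the most actually-free memory.
        # "memprov" (alternative): pick the node with the least provisioned
        # memory, i.e. memory promised to defined VMs whether used or not.
        if selector == "mem":
            return max(nodes, key=lambda n: n["memory_free"])
        if selector == "memprov":
            return min(nodes, key=lambda n: n["memory_provisioned"])
        raise ValueError(f"unknown selector {selector}")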
1. Ensure that system_template and script are not nullable in the DB
(see the schema sketch after this list).
2. Ensure that the CLI and API enforce the above and clean up CLI
arguments for profile add.
3. Ensure that, before uploading OVAs, a 'default_ova' provisioning
script is present.
4. Use the 'default_ova' script for new OVA uploads.
5. Ensure that OVA details are properly added to the vm_data dict in the
provisioner vmbuilder.
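What item 1 implies at the schema level, sketched in SQLAlchemy terms
(the model and column types are assumptions based on the description):

    from sqlalchemy import Column, Integer
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class Profile(Base):  # hypothetical model; the real schema may differ
        __tablename__ = "profile"
        id = Column(Integer, primary_key=True)
        system_template = Column(Integer, nullable=False)  # no longer nullable
        script = Column(Integer, nullable=False)           # no longer nullable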
1. Add documentation on the node selector flags. In the API, reference
the daemon configuration manual which now includes details in this
section; in the CLI, provide the help in "pvc vm define" in detail and
then reference that command's help in the other commands that use this
field.
2. Ensure the naming is consistent in the CLI, using the flag name
"--node-selector" everywhere (was "--selector" for "pvc vm" commands and
"--node-selector" for "pvc provisioner" commands).
Adds commands to both replace an OSD disk, and refresh (reimport) an
existing OSD disk on a new node. This handles the cases where an OSD
disk should be replaced (either due to upgrades or failures) or where a
node is rebuilt in-place and an existing OSD must be re-imported to it.
This should avoid the need to do a full remove/add sequence for either
case.
Also cleans up some aspects of OSD removal that are identical between
methods (e.g. using safe-to-destroy and sleeping after stopping) and
fixes a bug that occurred if an OSD did not truly exist when the daemon
started up.
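The shared safe-to-destroy handling might look roughly like this,
polling Ceph's own check before proceeding (the helper name and poll
interval are illustrative):

    import subprocess
    import time

    def wait_until_safe_to_destroy(osd_id, interval=5):
        # "ceph osd safe-to-destroy" exits 0 once the OSD can be removed
        # without risking data; poll until then, sleeping between checks.
        while True:
            ret = subprocess.run(
                ["ceph", "osd", "safe-to-destroy", f"osd.{osd_id}"],
                capture_output=True,
            )
            if ret.returncode == 0:
                return
            time.sleep(interval)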