Move the old manage script to _legacy, and add a new _flask version with
modern Flask tooling. Decide which one to call via pvc-api-db-migrate
using /etc/debian_version call.
It didn't make any sense to me for mem(prov) to be the default selector,
since this has too many caveats versus mem(free). Switch to using
mem(free) as the default (i.e. "mem") and make memprov the alternative.
1. Move the test_matrix, volume name, and size to module-level variables
so they can be accessed externally if this is imported.
2. Separate the volume creation and volume cleanup into functions.
3. Separate the individual benchmark runs into a function.
This should enable easier calling of the various subcomponents
externally, e.g. for external benchmark scripts.
Provisions Rocky Linux 8 and 9 systems, and potentially older
CentOS/Fedora/Scientific Linux/SuSE systems. Depends on a custom build
of rinse (3.7.1) with Rocky 9 support.
1. Ensure that system_template and script are not nullable in the DB.
2. Ensure that the CLI and API enforce the above and clean up CLI
arguments for profile add.
3. Ensure that, before uploading OVAs, a 'default_ova' provisioning
script is present.
4. Use the 'default_ova' script for new OVA uploads.
5. Ensure that OVA details are properly added to the vm_data dict in the
provisioner vmbuilder.
1. Add documentation on the node selector flags. In the API, reference
the daemon configuration manual which now includes details in this
section; in the CLI, provide the help in "pvc vm define" in detail and
then reference that command's help in the other commands that use this
field.
2. Ensure the naming is consistent in the CLI, using the flag name
"--node-selector" everywhere (was "--selector" for "pvc vm" commands and
"--node-selector" for "pvc provisioner" commands).
Adds commands to both replace an OSD disk, and refresh (reimport) an
existing OSD disk on a new node. This handles the cases where an OSD
disk should be replaced (either due to upgrades or failures) or where a
node is rebuilt in-place and an existing OSD must be re-imported to it.
This should avoid the need to do a full remove/add sequence for either
case.
Also cleans up some aspects of OSD removal that are identical between
methods (e.g. using safe-to-destroy and sleeping after stopping) and
fixes a bug if an OSD does not truly exist when the daemon starts up.
Allows an administrator to adjust the PG count of a given pool. This can
be used to increase the PGs (for example after adding more OSDs) or
decrease it (to remove OSDs, reduce CPU load, etc.).
Allows specifying a particular device class ("tier") for a given pool,
for instance SSD-only or NVMe-only. This is implemented with Crush
rules on the Ceph side, and via an additional new key in the pool
Zookeeper schema which is defaulted to "default".
Allows specifying blockdevs in the OSD and OSD-DB addition commands as
detect strings rather than actual block device paths. This provides
greater flexibility for automation with pvcbootstrapd (which originates
the concept of detect strings) and in general usage as well.
Instead of requiring the VM to already be stopped, instead allow disable
state changes to perform a shutdown first. Also add a force option which
will do a hard stop instead of a shutdown.
References #148
When using the "state", "node", or "tag" arguments to a VM list, add
support for a "negate" flag to look for all VMs *not in* the state,
node, or tag state.
1. Remove ramp_time as this was giving very strange results.
2. Up the runtime to 75 seconds to compensate.
3. Print the fio command to the console to validate.
1. Runs `fio` with the `--format=json` option and removes all terse
format parsing from the results.
2. Adds a 15-second ramp time to minimize wonky ramp-up results.
3. Sets group_reporting, which isn't necessary with only a single job,
but is here for consistency.
Adds a test_format database column and a value in the API return for the
test format version, starting at 0 for the existing format as of 0.9.37.
References #143
1. Move to a time-based (60s) benchmark to avoid these taking an absurd
amount of time to show the same information.
2. Eliminate the 256k random benchmarks, since they don't really add
anything.
3. Add in a 4k single-queue benchmark as this might provide valuable
insight into latency.
4. Adjust the output to reflect the above changes.
While this does change the benchmarking, this should not invalidate any
existing benchmarks since most of the test suit is unchanged (especially
the most important 4M sequential and 4K random tests). It simply removes
an unused entry and adds a more helpful one. The time-based change
should not significantly affect the results either, just reduces the
total runtime for long-tests and increase the runtime for quick tests to
provide a better picture.
The default of 0.05 (5%) is likely ideal in the initial implementation,
but allow this to be set explicitly for maximum flexibility in
space-constrained or performance-critical use-cases.
Adds in three parts:
1. Create an API endpoint to create OSD DB volume groups on a device.
Passed through to the node via the same command pipeline as
creating/removing OSDs, and creates a volume group with a fixed name
(osd-db).
2. Adds API support for specifying whether or not to use this DB volume
group when creating a new OSD via the "ext_db" flag. Naming and sizing
is fixed for simplicity and based on Ceph recommendations (5% of OSD
size). The Zookeeper schema tracks the block device to use during
removal.
3. Adds CLI support for the new and modified API endpoints, as well as
displaying the block device and DB block device in the OSD list.
While I debated supporting adding a DB device to an existing OSD, in
practice this ended up being a very complex operation involving stopping
the OSD and setting some options, so this is not supported; this can be
specified during OSD creation only.
Closes#142
Adds a new API endpoint to support hot attach/detach of devices, and the
corresponding client-side logic to use this endpoint when doing VM
network/storage add/remove actions.
The live attach is now the default behaviour for these types of
additions and removals, and can be disabled if needed.
Closes#141
Add an additional protected class, limit manipulation to one at a time,
and ensure future flexibility. Also makes display consistent with other
VM elements.
Like the other Celery job this does not work properly with the
ZKConnection decorator due to conflicting "self", so just connect
manually exactly like the provisioner task does.
Celery 5.x introduced a new worker argument format that is not
backwards-compatible with the older Celery 4.x format. This created a
conundrum since we use one service unit for both Debian 10 (4.x) and
Debian 11 (5.x). Instead of worse hacks, create a wrapper script to
start the worker with the correct arguments instead.
Done to make the resulting config match the expectations when using "vm
network add", which is that networks are below disks, not above.
Not a functional change, just ensures the VM XML is consistent after
many changes.
Ensures that the bytes_tohuman returns an integer to avoid the hacky
workaround of stripping off the B.
Adds a verification on the size of a new volume, that it is not larger
than the free space of the pool to prevent errors/excessively-large
volumes from being created.
Closes#120
Add nicer easy-to-find (yay ASCII art) banners for the startup printouts
of both the node and API daemons. Also adds the safe loader to pvcnoded
to prevent hassle messages and a version string in the API daemon file.