It didn't make any sense to me for mem(prov) to be the default selector,
since this has too many caveats versus mem(free). Switch to using
mem(free) as the default (i.e. "mem") and make memprov the alternative.
1. Error out when trying to add a new network to a VM if the network
doesn't exist on the cluster.
2. When showing the VM list, only show invalid networks in red, not the
whole list.
1. Ensure that system_template and script are not nullable in the DB.
2. Ensure that the CLI and API enforce the above and clean up CLI
arguments for profile add.
3. Ensure that, before uploading OVAs, a 'default_ova' provisioning
script is present.
4. Use the 'default_ova' script for new OVA uploads.
5. Ensure that OVA details are properly added to the vm_data dict in the
provisioner vmbuilder.
This is a breakage between the older version of Celery (Deb10) and
newer. The hard removal broke Deb10 instances.
So try that first, and on failure, assume newer Celery format.
1. Add documentation on the node selector flags. In the API, reference
the daemon configuration manual which now includes details in this
section; in the CLI, provide the help in "pvc vm define" in detail and
then reference that command's help in the other commands that use this
field.
2. Ensure the naming is consistent in the CLI, using the flag name
"--node-selector" everywhere (was "--selector" for "pvc vm" commands and
"--node-selector" for "pvc provisioner" commands).
Adds commands to both replace an OSD disk, and refresh (reimport) an
existing OSD disk on a new node. This handles the cases where an OSD
disk should be replaced (either due to upgrades or failures) or where a
node is rebuilt in-place and an existing OSD must be re-imported to it.
This should avoid the need to do a full remove/add sequence for either
case.
Also cleans up some aspects of OSD removal that are identical between
methods (e.g. using safe-to-destroy and sleeping after stopping) and
fixes a bug if an OSD does not truly exist when the daemon starts up.
If there is...
1. No '--cluster' passed, and
2. No 'local' cluster, and
3. There is exactly one cluster configured
...then use that cluster by default in the CLI.
Allows an administrator to adjust the PG count of a given pool. This can
be used to increase the PGs (for example after adding more OSDs) or
decrease it (to remove OSDs, reduce CPU load, etc.).
Allows specifying a particular device class ("tier") for a given pool,
for instance SSD-only or NVMe-only. This is implemented with Crush
rules on the Ceph side, and via an additional new key in the pool
Zookeeper schema which is defaulted to "default".
Allows specifying blockdevs in the OSD and OSD-DB addition commands as
detect strings rather than actual block device paths. This provides
greater flexibility for automation with pvcbootstrapd (which originates
the concept of detect strings) and in general usage as well.
This ensures that any client command is logged by the local system.
Helps ensure Accounting for users of the CLI. Currently logs the full
command executed along with the $USER environment variable contents.
This is recommended by the Python Requests documentation:
> It’s a good practice to set connect timeouts to slightly larger than a
multiple of 3, which is the default TCP packet retransmission window.
This allows us to keep a very low connect timeout of 3 seconds, but also
ensure that long commands (e.g. --wait or VM disable) can take as long
as the API requires to complete.
Avoids having to explicitly set very long single-instance timeouts for
other functions which would block forever on an unreachable API.
Allows preserving colour within e.g. watch, where Click would normally
determine that it is "not a terminal". This is done via the wrapper echo
which filters via the local config.
Instead of erroring, just use the implication that restarting a VM does
not want a live modification, and proceed from there. Update the help
text to match.
When using the "state", "node", or "tag" arguments to a VM list, add
support for a "negate" flag to look for all VMs *not in* the state,
node, or tag state.
Allows choosing different list and info functions based on the benchmark
version found. Currently only implements "legacy" version 0 with more to
be added.
1. Move to a time-based (60s) benchmark to avoid these taking an absurd
amount of time to show the same information.
2. Eliminate the 256k random benchmarks, since they don't really add
anything.
3. Add in a 4k single-queue benchmark as this might provide valuable
insight into latency.
4. Adjust the output to reflect the above changes.
While this does change the benchmarking, this should not invalidate any
existing benchmarks since most of the test suit is unchanged (especially
the most important 4M sequential and 4K random tests). It simply removes
an unused entry and adds a more helpful one. The time-based change
should not significantly affect the results either, just reduces the
total runtime for long-tests and increase the runtime for quick tests to
provide a better picture.
The previous implementation did not work with /dev/nvme devices or any
/dev/disk/by-* devices due to some logical failures in the partition
naming scheme, so fix these, and be explicit about what is supported in
the PVC CLI command output.
The 'echo | gdisk' implementation of partition creation also did not
work due to limitations of subprocess.run; instead, use sgdisk which
allows these commands to be written out explicitly and is included in
the same package as gdisk.
The default of 0.05 (5%) is likely ideal in the initial implementation,
but allow this to be set explicitly for maximum flexibility in
space-constrained or performance-critical use-cases.
Adds in three parts:
1. Create an API endpoint to create OSD DB volume groups on a device.
Passed through to the node via the same command pipeline as
creating/removing OSDs, and creates a volume group with a fixed name
(osd-db).
2. Adds API support for specifying whether or not to use this DB volume
group when creating a new OSD via the "ext_db" flag. Naming and sizing
is fixed for simplicity and based on Ceph recommendations (5% of OSD
size). The Zookeeper schema tracks the block device to use during
removal.
3. Adds CLI support for the new and modified API endpoints, as well as
displaying the block device and DB block device in the OSD list.
While I debated supporting adding a DB device to an existing OSD, in
practice this ended up being a very complex operation involving stopping
the OSD and setting some options, so this is not supported; this can be
specified during OSD creation only.
Closes#142
Adds a new API endpoint to support hot attach/detach of devices, and the
corresponding client-side logic to use this endpoint when doing VM
network/storage add/remove actions.
The live attach is now the default behaviour for these types of
additions and removals, and can be disabled if needed.
Closes#141
Before, "-y"/"--yes" only confirmed the reboot portion. Instead, modify
this to confirm both the diff portion and the restart portion, and add
separate flags to bypass one or the other independently, ensuring the
administrator has lots of flexibility. UNSAFE mode implies "-y" so both
would be auto-confirmed if that option is set.
Add an additional protected class, limit manipulation to one at a time,
and ensure future flexibility. Also makes display consistent with other
VM elements.
Matches the new node list output format with the additional header line,
as well as revamps some other aspects:
1. Adjusts the UUID to be under the name in the info output.
2. Removes the UUID from the list output to save space, because this
is generally not needed in day-to-day quick-list output.
3. Renames the "Node" header to "Current" to better reflect what
that column actually means and avoid conflicting with the parent
header.
Also adjusts the layout of the node list output to avoid excessively
long lines. Adds another header line with categories and spacing dashes
for easier visual parsing.
Also fixes up the Debian packaging such that this works how I would
want, with proper module installation while leaving everything else
untouched. Finally implements automatic installation and removal of the
BASH completion for the PVC command.
For the "vm modify", revamp the way confirmations are presented. Do the
edits/load, show changes, verify XML, then prompt to write and the
restart. The previous order didn't make much sense.
For any of these `--restart` triggered VM modifications, also alter how
the confirmation works. If the user declines the restart, do not abort;
instead, just set restart=False and continue with the modification.
Allows another way (beyond --yes) to avoid confirming "unsafe"
operations. While there is probably nearly zero usecase for this (at
least to any sane admin), it is provided to allow maximum flexibility.
Sets in the node daemon, returns via the API, and shows in the CLI,
information about the live VNC listen address and port for VNC-enabled
VMs.
Closes#115
Adds cluster backup (JSON dump) and restore functions for use in
disaster recovery.
Further, adds additional confirmation to the initialization (as well as
restore) endpoints to avoid accidental triggering, and also groups the
init, backup, and restore commands in the CLI into a new "task"
subsection.
Before, the default False was problematic and would reset consoles if
the template was otherwise modified. Instead switch the flags to be full
true/false flags, and on modify, adjust the default to be None so they
will not be changed.
Allow a VM to specify its migration type as a default choice. The valid
options are "default" (i.e. behave as now), "live" which forces a live
migration only, and "shutdown" which forces a shutdown migration only.
The new option is treated as a VM meta option and is set to default if
not found.
Adds a separate field to the node memory, "provisioned", which totals
the amount of memory provisioned to all VMs on the node, regardless of
state, and in contrast to "allocated" which only counts running VMs.
Allows for the detection of potential overprovisioned states when
factoring in non-running VMs.
Includes the supporting code to get this data, since the original
implementation of VM memory selection was dependent on the VM being
running and getting this from libvirt. Now, if the VM is not active, it
gets this from the domain XML instead.
Since these will almost always connect to an IP rather than a "real"
hostname, don't verify the SSL cert (if applicable). Also allow the
overriding of SSL verification via an environment variable.
As a consequence, to reduce spam, SSL warnings are disabled for urllib3.
Instead, we warn in the "Using cluster" output whenever verification is
disabled.
Don't try to chmod every time, instead only chmod when first creating
the file. Also allow loading the default permission from an envvar
rather than hardcoding it.
Provide textual explanations for the degraded status, including
specific node/VM/OSD issues as well as detailed Ceph health. "Single
pane of glass" mentality.
Makes this output a little more realistic and allows proper monitoring
of the Ceph cluster status (separate from the PVC status which is
tracking only OSD up/in state).
Allow the specifying of arbitrary provisioner script install() args on
the provisioner create CLI, either overriding or adding additional
per-VM arguments to those found in the profile. Reference example is
setting a "vm_fqdn" on a per-run basis.
Closes#100
Provides a CLI and API argument to force live migration, which triggers
a new VM state "migrate-live". The node daemon VMInstance during migrate
will read this flag from the state and, if enforced, will not trigger a
shutdown migration.
Closes#95