Adds commands to both replace an OSD disk, and refresh (reimport) an
existing OSD disk on a new node. This handles the cases where an OSD
disk should be replaced (either due to upgrades or failures) or where a
node is rebuilt in-place and an existing OSD must be re-imported to it.
This should avoid the need to do a full remove/add sequence for either
case.
Also cleans up some aspects of OSD removal that are identical between
methods (e.g. using safe-to-destroy and sleeping after stopping) and
fixes a bug that occurs when an OSD does not truly exist at daemon
startup.
Allows an administrator to adjust the PG count of a given pool. This can
be used to increase the PG count (for example after adding more OSDs) or
decrease it (for example before removing OSDs, or to reduce CPU load).
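For illustration, a PG count change of this sort boils down to the
following Ceph calls (the helper name and the direct subprocess
invocation are illustrative only, not the actual PVC code):

```python
import subprocess

def set_pool_pg_count(pool, pg_count):
    # Adjust both pg_num and pgp_num so the data is actually redistributed
    # across the new placement groups rather than only split logically.
    for key in ("pg_num", "pgp_num"):
        subprocess.run(
            ["ceph", "osd", "pool", "set", pool, key, str(pg_count)],
            check=True,
        )
```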
Allows specifying a particular device class ("tier") for a given pool,
for instance SSD-only or NVMe-only. This is implemented with CRUSH rules
on the Ceph side, and via an additional new key in the pool Zookeeper
schema which defaults to "default".
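As a rough sketch, the Ceph side of this amounts to creating a
class-restricted CRUSH rule and pointing the pool at it (the helper,
rule naming, and example pool below are illustrative, not the actual
implementation):

```python
import subprocess

def set_pool_tier(pool, device_class):
    # Create a replicated CRUSH rule restricted to the given device class
    # (e.g. "ssd" or "nvme") and assign it to the pool.
    rule_name = f"{device_class}_rule"
    subprocess.run(
        ["ceph", "osd", "crush", "rule", "create-replicated",
         rule_name, "default", "host", device_class],
        check=True,
    )
    subprocess.run(
        ["ceph", "osd", "pool", "set", pool, "crush_rule", rule_name],
        check=True,
    )

set_pool_tier("vms", "nvme")
```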
Allows specifying blockdevs in the OSD and OSD-DB addition commands as
detect strings rather than actual block device paths. This provides
greater flexibility for automation with pvcbootstrapd (which originates
the concept of detect strings) and in general usage as well.
Instead of requiring the VM to already be stopped, allow disable state
changes to perform a shutdown first. Also add a force option which will
do a hard stop instead of a shutdown.
References #148
When using the "state", "node", or "tag" arguments to a VM list, add
support for a "negate" flag to match all VMs *not* in the given state,
on the given node, or with the given tag.
1. Remove ramp_time as this was giving very strange results.
2. Up the runtime to 75 seconds to compensate.
3. Print the fio command to the console to validate.
1. Runs `fio` with the `--output-format=json` option and removes all
terse format parsing from the results.
2. Adds a 15-second ramp time to minimize wonky ramp-up results.
3. Sets group_reporting, which isn't necessary with only a single job,
but is here for consistency.
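A sketch of what such an invocation looks like; the target volume path
and the specific job parameters here are placeholders, not the exact
benchmark job:

```python
import json
import subprocess

# Illustrative fio run reflecting the options above: JSON output, a ramp
# time, and group_reporting. Filename and job parameters are placeholders.
fio_cmd = [
    "fio", "--name=benchmark",
    "--filename=/dev/rbd/testpool/testvolume",
    "--ioengine=libaio", "--direct=1",
    "--bs=4M", "--rw=write", "--iodepth=64", "--numjobs=1",
    "--ramp_time=15", "--time_based", "--runtime=60",
    "--group_reporting", "--output-format=json",
]
result = json.loads(
    subprocess.run(fio_cmd, capture_output=True, check=True).stdout
)
print(result["jobs"][0]["write"]["bw"])  # bandwidth in KiB/s
```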
Adds a test_format database column and a value in the API return for the
test format version, starting at 0 for the existing format as of 0.9.37.
References #143
1. Move to a time-based (60s) benchmark to avoid these taking an absurd
amount of time to show the same information.
2. Eliminate the 256k random benchmarks, since they don't really add
anything.
3. Add in a 4k single-queue benchmark as this might provide valuable
insight into latency.
4. Adjust the output to reflect the above changes.
While this does change the benchmarking, this should not invalidate any
existing benchmarks since most of the test suite is unchanged (especially
the most important 4M sequential and 4K random tests). It simply removes
an unused entry and adds a more helpful one. The time-based change should
not significantly affect the results either; it just reduces the total
runtime for long tests and increases the runtime for quick tests to
provide a better picture.
The default of 0.05 (5%) is likely ideal in the initial implementation,
but allow this to be set explicitly for maximum flexibility in
space-constrained or performance-critical use-cases.
Adds in three parts:
1. Create an API endpoint to create OSD DB volume groups on a device.
Passed through to the node via the same command pipeline as
creating/removing OSDs, and creates a volume group with a fixed name
(osd-db).
2. Adds API support for specifying whether or not to use this DB volume
group when creating a new OSD via the "ext_db" flag. Naming and sizing
is fixed for simplicity and based on Ceph recommendations (5% of OSD
size). The Zookeeper schema tracks the block device to use during
removal.
3. Adds CLI support for the new and modified API endpoints, as well as
displaying the block device and DB block device in the OSD list.
While I debated supporting adding a DB device to an existing OSD, in
practice this ended up being a very complex operation involving stopping
the OSD and setting some options, so this is not supported; this can be
specified during OSD creation only.
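As a sketch of the sizing logic (the helper name, LV naming, and the
direct LVM call are illustrative; the real code goes through the normal
command pipeline):

```python
import subprocess

def create_osd_db_lv(osd_id, osd_size_bytes, db_ratio=0.05):
    # Size the DB logical volume at a fixed ratio (5% by default) of the
    # OSD size and carve it out of the fixed-name "osd-db" volume group.
    db_size_bytes = int(osd_size_bytes * db_ratio)
    subprocess.run(
        ["lvcreate", "-L", f"{db_size_bytes}B",
         "-n", f"osd-db-{osd_id}", "osd-db"],
        check=True,
    )
```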
Closes #142
Adds a new API endpoint to support hot attach/detach of devices, and the
corresponding client-side logic to use this endpoint when doing VM
network/storage add/remove actions.
The live attach is now the default behaviour for these types of
additions and removals, and can be disabled if needed.
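At the libvirt level, a live attach is essentially the following (a
minimal sketch; the helper and its arguments are illustrative):

```python
import libvirt

def attach_device_live(conn, vm_name, device_xml):
    # Apply the new device to the running domain and to its persistent
    # configuration at the same time, so the change survives a restart.
    dom = conn.lookupByName(vm_name)
    flags = libvirt.VIR_DOMAIN_AFFECT_LIVE | libvirt.VIR_DOMAIN_AFFECT_CONFIG
    dom.attachDeviceFlags(device_xml, flags)
```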
Closes #141
Add an additional protected class, limit manipulation to one at a time,
and ensure future flexibility. Also makes display consistent with other
VM elements.
Like the other Celery job, this does not work properly with the
ZKConnection decorator due to a conflicting "self", so just connect
manually, exactly like the provisioner task does.
Done to make the resulting config match the expectations when using "vm
network add", which is that networks are below disks, not above.
Not a functional change, just ensures the VM XML is consistent after
many changes.
Ensures that bytes_tohuman returns an integer, avoiding the hacky
workaround of stripping off the B.
Adds a verification that a new volume is not larger than the free space
of the pool, to prevent errors and excessively large volumes from being
created.
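A minimal sketch of the check, assuming the pool free space is read from
`ceph df` (the helper is illustrative, not the actual implementation):

```python
import json
import subprocess

def volume_fits_in_pool(pool_name, requested_bytes):
    # Compare the requested volume size against the pool's available
    # space as reported by "ceph df".
    df = json.loads(
        subprocess.run(["ceph", "df", "--format", "json"],
                       capture_output=True, check=True).stdout
    )
    for pool in df["pools"]:
        if pool["name"] == pool_name:
            return requested_bytes <= pool["stats"]["max_avail"]
    return False
```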
Closes #120
Add nicer easy-to-find (yay ASCII art) banners for the startup printouts
of both the node and API daemons. Also adds the safe loader to pvcnoded
to prevent hassle messages, and adds a version string to the API daemon
file.
Sets in the node daemon, returns via the API, and shows in the CLI,
information about the live VNC listen address and port for VNC-enabled
VMs.
Closes #115
Adds cluster backup (JSON dump) and restore functions for use in
disaster recovery.
Further, adds additional confirmation to the initialization (as well as
restore) endpoints to avoid accidental triggering, and also groups the
init, backup, and restore commands in the CLI into a new "task"
subsection.
Properly fixes the issue with OVA upload bodies by allowing the
restriction of the 'location' directive when parsing specific request
args. Thus the 'form' location can be included by default but removed
for those parsers that have a file body.
This reverts commit d63e757c32.
This did not work; by re-adding 'form' checking, the attempt to isolate
the large file upload was again thwarted. Another solution, perhaps
specific to the uploads, is needed instead.
Allow a VM to specify its migration type as a default choice. The valid
options are "default" (i.e. behave as now), "live" which forces a live
migration only, and "shutdown" which forces a shutdown migration only.
The new option is treated as a VM meta option and is set to default if
not found.
Gevent was a complete failure. The API would block during large file
uploads with no obvious solutions beyond "use gunicorn", which is not
suited to this. I originally had this working with the Flask "debug"
server, so just move to using that all the time. SSL is added using a
custom context with the OpenSSL library, so include that as a
dependency.
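A minimal sketch of the idea using the standard-library ssl module (the
actual implementation builds its context with the OpenSSL library, and
the certificate paths and port here are placeholders):

```python
import ssl
from flask import Flask

app = Flask(__name__)

# Serve the API over TLS with the built-in development server by passing
# a pre-built SSL context; certificate paths and port are placeholders.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain("/etc/pvc/api-cert.pem", "/etc/pvc/api-key.pem")
app.run(host="0.0.0.0", port=7370, ssl_context=context, threaded=True)
```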
By default, Werkzeug would require the entire file (be it an OVA or
image file) to be uploaded and saved to a temporary, fake file under
`/tmp`, before any further processing could occur. This blocked most of
the execution of these functions until the upload was completed.
This entirely defeated the purpose of what I was trying to do, which was
to save the uploads directly to the temporary blockdev in each case,
thus avoiding any sort of memory or (host) disk usage.
The solution is two-fold:
1. First, ensure that the `location='args'` value is set in
RequestParser; without this, the `files` portion would be parsed
during the argument parsing, which was the original source of this
blocking behaviour.
2. Instead of the convoluted request handling that was originally being
done here, entirely defer the parsing of the `files` arguments until
the point in the code where they are ready to be saved. Then, using an
overridden stream_factory that simply opens the temporary blockdev, the
upload can commence while being written directly out to it, rather than
using `/tmp` space.
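A minimal sketch of this deferred parsing, assuming an illustrative
block device path:

```python
import flask
from werkzeug.formparser import parse_form_data

def blockdev_stream_factory(blockdev_path):
    # Return a stream_factory that writes the incoming upload directly to
    # the prepared temporary block device instead of a spool file in /tmp.
    def stream_factory(total_content_length, content_type, filename,
                       content_length=None):
        return open(blockdev_path, "wb")
    return stream_factory

# Only once the upload is ready to be received is the request body parsed,
# streaming the file data straight onto the block device.
stream, form, files = parse_form_data(
    flask.request.environ,
    stream_factory=blockdev_stream_factory("/dev/rbd/vms/ova_tmp"),
)
```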
This does alter the error handling slightly; it is impossible to check
if the argument was passed until this point in the code, so it may take
longer to fail if the API consumer does not specify a file as they
should. This is a minor trade-off and I would expect my API consumers to
be sane here.
Adds a separate field to the node memory, "provisioned", which totals
the amount of memory provisioned to all VMs on the node, regardless of
state, and in contrast to "allocated" which only counts running VMs.
Allows for the detection of potential overprovisioned states when
factoring in non-running VMs.
Includes the supporting code to get this data, since the original
implementation of VM memory selection was dependent on the VM being
running and getting this from libvirt. Now, if the VM is not active, it
gets this from the domain XML instead.
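A minimal sketch of that fallback (the helper is illustrative; units
assume the default KiB values from libvirt):

```python
from xml.etree import ElementTree

def get_provisioned_memory_mb(dom):
    # "dom" is a libvirt.virDomain handle. For a running VM libvirt reports
    # the memory directly; for an inactive VM, read it from the persistent
    # domain XML instead.
    if dom.isActive():
        return dom.info()[1] // 1024  # info()[1] is the maximum memory in KiB
    root = ElementTree.fromstring(dom.XMLDesc(0))
    return int(root.find("memory").text) // 1024  # <memory> defaults to KiB
```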
Makes this output a little more realistic and allows proper monitoring
of the Ceph cluster status (separate from the PVC status which is
tracking only OSD up/in state).
Make the provisioner a bit more robust. This way, even if a provisioning
step fails, cleanup is still performed, thus preventing the system from
being left in an undefined state requiring manual correction.
Addresses #91
Allow the specifying of arbitrary provisioner script install() args on
the provisioner create CLI, either overriding or adding additional
per-VM arguments to those found in the profile. Reference example is
setting a "vm_fqdn" on a per-run basis.
Closes #100
Provides a CLI and API argument to force live migration, which triggers
a new VM state "migrate-live". The node daemon VMInstance during migrate
will read this flag from the state and, if enforced, will not trigger a
shutdown migration.
Closes #95
Doing so can create an image that is 1 sector (512 bytes) too large,
which will then break qemu-img because it's stupid (or VMDK is stupid;
I haven't decided which it is). Current Ceph rbd commands seem to accept
--size in bytes, so this is fine.
Ensure all API return messages are formatted the same: no "error", a
final period except when displaying Exception text, and a regular spaced
out format.
Finishes a basic form of OVA provisioning within the existing create_vm
function. Future plans should include separating out the functions and
cleaning them up a bit more, but this is sufficient for basic operation.
Closes #71
Add functions for uploading, listing, and removing OVA images to the API
and CLI interfaces. Includes improved parsing of the OVF and creation of
a system_template and profile for each OVA.
Also modifies some behaviour around profiles, making most components
optional at creation to support both profile types (and incomplete
profiles generally).
Implementation part 2/3 - remaining: OVA VM creation
References #71
Add management of the pvcprov database with SQLAlchemy, allowing
seamless handling of database migrations. Add automatic tasks to the
postinst of the API to execute these migrations.
Allow the user to specify other, non-raw files and upload them,
performing a conversion with qemu-img convert and a temporary block
device as a shim (since qemu-img can't use FIFOs).
Also ensures that the target volume exists before proceeding.
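Roughly, the conversion step looks like this (the helper name, paths,
and pool/volume names are placeholders):

```python
import subprocess

def convert_upload_to_volume(temp_blockdev, src_format, pool, volume):
    # The non-raw upload lands on a temporary block device first (qemu-img
    # cannot read from a FIFO), then is converted into the target RBD
    # volume as raw data.
    subprocess.run(
        ["qemu-img", "convert",
         "-f", src_format, "-O", "raw",
         temp_blockdev, f"rbd:{pool}/{volume}"],
        check=True,
    )
```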
Addresses #68
Rename "pvcd" to "pvcnoded", and "pvc-api" to "pvcapid" so names for the
daemons are fully consistent. Update the names of the configuration
files as well to match this new formatting.
References #79