For the "vm modify", revamp the way confirmations are presented. Do the
edits/load, show changes, verify XML, then prompt to write and the
restart. The previous order didn't make much sense.
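In rough terms, the new flow is (a sketch using click and lxml; the write/restart helpers are hypothetical):
```
import difflib

import click
from lxml import etree

def modify_vm(current_xml, new_xml):
    # 1. Show the changes as a unified diff
    diff = difflib.unified_diff(
        current_xml.splitlines(), new_xml.splitlines(),
        fromfile='current', tofile='modified', lineterm=''
    )
    for line in diff:
        click.echo(line)
    # 2. Verify that the new XML actually parses before offering to write it
    try:
        etree.fromstring(new_xml.encode())
    except etree.XMLSyntaxError as error:
        click.echo('Error: invalid XML: {}'.format(error))
        return
    # 3. Only now prompt to write, and finally to restart
    if not click.confirm('Write modifications to cluster?'):
        return
    write_vm_config(new_xml)      # hypothetical helper
    if click.confirm('Restart VM to apply changes?'):
        restart_vm()              # hypothetical helper
```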
For any of these `--restart` triggered VM modifications, also alter how
the confirmation works. If the user declines the restart, do not abort;
instead, just set restart=False and continue with the modification.
This provides another way (beyond `--yes`) to avoid confirming "unsafe"
operations. While there is probably near-zero use case for this (at
least for any sane admin), it is provided to allow maximum flexibility.
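In click terms the change amounts to something like this (a sketch; names are illustrative):
```
import click

def confirm_restart(vm_name, restart, yes):
    # Previously, declining aborted the whole operation
    # (click.confirm(..., abort=True)); now a declined restart simply
    # downgrades to a non-restarting modification.
    if restart and not yes:
        if not click.confirm('Restart VM {}?'.format(vm_name)):
            restart = False
    return restart
```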
Sets in the node daemon, returns via the API, and shows in the CLI the
live VNC listen address and port for VNC-enabled VMs.
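A sketch of the extraction on the node daemon side; the `<graphics>` element is standard libvirt, while the helper itself is illustrative:
```
import xml.etree.ElementTree as ET

def get_vnc_info(domain_xml):
    # For a running, VNC-enabled domain, libvirt fills in the actual
    # listen address and port on the <graphics> element.
    root = ET.fromstring(domain_xml)
    graphics = root.find("./devices/graphics[@type='vnc']")
    if graphics is None:
        return None
    return {'listen': graphics.get('listen'), 'port': graphics.get('port')}
```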
Closes#115
Adds cluster backup (JSON dump) and restore functions for use in
disaster recovery.
Further, adds additional confirmation to the initialization (as well as
restore) endpoints to avoid accidental triggering, and groups the init,
backup, and restore commands in the CLI into a new "task" subsection.
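The backup side is conceptually just a recursive walk of the Zookeeper tree into JSON (a sketch in kazoo terms; function names are illustrative):
```
import json

def backup_tree(zk, path='/'):
    # Recursively dump key data and children into a JSON-serializable dict
    data, _stat = zk.get(path)
    node = {'data': (data or b'').decode(errors='replace'), 'children': {}}
    for child in zk.get_children(path):
        child_path = path.rstrip('/') + '/' + child
        node['children'][child] = backup_tree(zk, child_path)
    return node

def backup_cluster(zk):
    return json.dumps(backup_tree(zk))
```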
Before, the default of False was problematic: it would reset consoles
whenever the template was otherwise modified. Instead, switch the flags
to be full true/false flags and, on modify, adjust the default to None
so unspecified flags are not changed.
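i.e. the click pattern becomes (option names illustrative):
```
import click

@click.command()
@click.option('--serial/--no-serial', 'serial', default=None,
              help='Enable/disable the serial console (unchanged if omitted).')
@click.option('--vnc/--no-vnc', 'vnc', default=None,
              help='Enable/disable the VNC console (unchanged if omitted).')
def modify(serial, vnc):
    # None means "the administrator didn't say", so leave the template alone
    updates = {key: value for key, value in
               (('serial', serial), ('vnc', vnc)) if value is not None}
    click.echo('Applying updates: {}'.format(updates))
```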
Allow a VM to specify its migration type as a default choice. The valid
options are "default" (i.e. behave as now), "live" which forces a live
migration only, and "shutdown" which forces a shutdown migration only.
The new option is treated as a VM meta option and is set to default if
not found.
Adds a separate field to the node memory, "provisioned", which totals
the amount of memory provisioned to all VMs on the node, regardless of
state, and in contrast to "allocated" which only counts running VMs.
Allows for the detection of potential overprovisioned states when
factoring in non-running VMs.
Includes the supporting code to get this data, since the original
implementation of VM memory selection was dependent on the VM being
running and getting this from libvirt. Now, if the VM is not active, it
gets this from the domain XML instead.
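A sketch of the inactive-VM path, reading the provisioned memory out of the stored domain XML (the helper name is illustrative):
```
import xml.etree.ElementTree as ET

def get_inactive_memory(domain_xml):
    # libvirt stores <memory unit='KiB'>N</memory>; convert to MiB so it
    # can be summed alongside the values reported for running VMs.
    memory = ET.fromstring(domain_xml).find('memory')
    if memory.get('unit', 'KiB') != 'KiB':
        raise ValueError('unexpected memory unit')
    return int(memory.text) // 1024
```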
Since these clients will almost always connect to an IP rather than a
"real" hostname, don't verify the SSL cert (if applicable). Also allow
overriding of SSL verification via an environment variable.
As a consequence, to reduce spam, SSL warnings are disabled for urllib3.
Instead, we warn in the "Using cluster" output whenever verification is
disabled.
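Mechanically, in requests/urllib3 terms (the environment variable name here is illustrative):
```
import os
import requests
import urllib3

# Default to no verification, since the target is usually a bare IP;
# an environment variable can force it back on.
verify = os.environ.get('PVC_CLIENT_SSL_VERIFY', 'no') in ('yes', 'true', '1')
if not verify:
    # Suppress the per-request InsecureRequestWarning spam; the CLI
    # warns once in its "Using cluster" output instead.
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

response = requests.get('https://10.0.0.10:7370/api/v1/', verify=verify)
```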
Don't try to chmod every time; instead, only chmod when first creating
the file. Also allow loading the default permission from an environment
variable rather than hardcoding it.
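Roughly (the environment variable name is illustrative):
```
import os

def write_store(path, data):
    # Only set permissions when the file is first created; afterwards,
    # leave whatever the administrator may have changed them to.
    first_write = not os.path.exists(path)
    with open(path, 'w') as fh:
        fh.write(data)
    if first_write:
        mode = int(os.environ.get('PVC_CLIENT_STORE_MODE', '600'), 8)
        os.chmod(path, mode)
```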
Provide textual explanations for the degraded status, including
specific node/VM/OSD issues as well as detailed Ceph health. "Single
pane of glass" mentality.
Makes this output a little more realistic and allows proper monitoring
of the Ceph cluster status (separate from the PVC status which is
tracking only OSD up/in state).
Allow specifying arbitrary provisioner script install() args on the
provisioner create CLI, either overriding or adding additional
per-VM arguments to those found in the profile. A reference example is
setting a "vm_fqdn" on a per-run basis.
Closes#100
Provides a CLI and API argument to force live migration, which triggers
a new VM state "migrate-live". The node daemon VMInstance during migrate
will read this flag from the state and, if enforced, will not trigger a
shutdown migration.
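Conceptually, on the node daemon side (a sketch; the method names are hypothetical):
```
class VMInstance:
    # ...existing state handling elided...
    def do_migrate(self):
        # 'migrate-live' forbids the shutdown fallback entirely
        force_live = (self.state == 'migrate-live')
        if not self.live_migrate():
            if force_live:
                self.logger.out('Live migration failed but was required; aborting')
                self.revert_state()
            else:
                self.shutdown_migrate()
```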
Closes#95
Instead of using group-based validation, which breaks the help context
for subcommands, use a decorator to validate the cluster status for each
command. The eager help option will then override this decorator for
help commands, while enforcing it for others.
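The decorator shape, sketched with click (the connection check is hypothetical):
```
from functools import wraps
import click

def cluster_required(func):
    # Validated per-command rather than per-group; click's eager --help
    # short-circuits before the wrapped callback ever runs, so help
    # still renders without a reachable cluster.
    @wraps(func)
    def wrapper(*args, **kwargs):
        if not cluster_is_reachable():  # hypothetical check
            click.echo('Error: no valid cluster connection')
            raise SystemExit(1)
        return func(*args, **kwargs)
    return wrapper

@click.command(name='list')
@cluster_required
def node_list():
    ...
```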
Provide pretty status bars to indicate upload progress for tasks that
perform large file uploads to the API ('provisioner ova upload' and
'storage volume upload') so the administrator can gauge progress and
estimated time to completion.
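One way to get this with click's progress bar over a chunked, streamed upload (a sketch, not necessarily the exact implementation):
```
import os
import click
import requests

def upload_with_progress(url, path, chunk_size=1024 * 1024):
    size = os.path.getsize(path)
    with open(path, 'rb') as fh, \
         click.progressbar(length=size, label='Uploading') as bar:
        def chunks():
            while True:
                data = fh.read(chunk_size)
                if not data:
                    return
                bar.update(len(data))
                yield data
        # requests streams a generator body using chunked transfer encoding
        return requests.post(url, data=chunks())
```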
Use a different wait method that queries the node status every
half-second during the transition, so the client can wait for the
transition to complete if desired.
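i.e. a simple poll loop (the status getter is hypothetical):
```
import time

def wait_for_flush(node, target_state='flushed'):
    # Poll the node's state every half-second until the transition completes
    while get_node_state(node) != target_state:  # hypothetical getter
        time.sleep(0.5)
```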
Closes#72
Add functions for uploading, listing, and removing OVA images to the API
and CLI interfaces. Includes improved parsing of the OVF and creation of
a system_template and profile for each OVA.
Also modifies some behaviour around profiles, making most components
optional at creation to support both profile types (and incomplete
profiles generally).
Implementation part 2/3 - remaining: OVA VM creation
References #71
Rename "pvcd" to "pvcnoded", and "pvc-api" to "pvcapid" so names for the
daemons are fully consistent. Update the names of the configuration
files as well to match this new formatting.
References #79
Warn the administrator if there are active provisioning jobs while
adjusting the current primary node. This is the simplest, cleanest
solution to #69 without trying to implement any hacks or blocking
operations. The administrator can then decide to revert the action
if needed, or will at least know how many jobs are running/queued and
may need to be cancelled.
Move everything from under "storage ceph" to "storage" to simplify the
CLI; additional subclasses can be re-added at a future time if and when
additional storage classes are supported.
Implements a "maintenance mode" for PVC clusters. For now, the only
thing this mode does is disable node fencing while the state is true.
This allows the administrator to tell PVC that network connectivity,
etc. might be interrupted and to avoid fencing nodes.
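A sketch of the fencing-side check (the key path is illustrative):
```
def fence_is_allowed(zk):
    # Cluster-wide maintenance flag; while true, dead nodes are left
    # alone rather than fenced
    data, _stat = zk.get('/maintenance')
    return data.decode() != 'true'
```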
Closes#70
Move all API calls to a new common function called call_api to
facilitate easier future changes. Use this function to implement API key
handling via request header value as well as integrate the request URI
generation and debug output handling.
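The common function looks roughly like this (a sketch; the config key names are illustrative):
```
import requests

def call_api(config, operation, request_uri, params=None, data=None):
    # Build the full request URI from the cluster config in one place
    uri = '{}://{}{}{}'.format(config['api_scheme'], config['api_host'],
                               config['api_prefix'], request_uri)
    # API key handling via a request header value
    headers = {}
    if config.get('api_key'):
        headers['X-Api-Key'] = config['api_key']
    response = requests.request(operation, uri, headers=headers,
                                params=params, data=data)
    if config.get('debug'):
        print('API endpoint: {} {}'.format(operation.upper(), uri))
        print('Response code: {}'.format(response.status_code))
    return response
```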
Closes#65
Implements the storing of three VM metadata attributes:
1. Node limits - allows specifying a list of hosts on which the VM must
run. This limit influences the migration behaviour of VMs.
2. Per-VM node selectors - allows each VM to have its migration
autoselection method specified, to automatically allow different methods
per VM based on the administrator's preferences.
3. VM autorestart - allows a VM to be automatically restarted from a
stopped state (presumably after a failure to find a target node, whether
due to limits or otherwise, during a flush/fence recovery) the next time
its home hypervisor returns to the unflush/ready state. Useful mostly in
conjunction with limits, to ensure that VMs which were shut down because
there were no valid migration targets are started back up when their
node becomes ready again.
Includes the full client interaction with these metadata options,
including printing, as well as defining a new function to modify this
metadata. For the CLI it is set/modified either on `vm define` or via the
`vm meta` command. For the API it is set/modified either on a POST to
the `/vm` endpoint (during VM definition) or on POST to the `/vm/<vm>`
endpoint. For the API this replaces the previous reserved word for VM
creation from scratch as this will no longer be implemented in-daemon
(see #22).
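In storage terms the three attributes are just per-domain keys; a sketch of the modify function in kazoo terms (the key paths are illustrative):
```
def modify_vm_metadata(zk, dom_uuid, limit=None, selector=None, autostart=None):
    # Only touch the attributes the caller actually specified
    updates = {
        '/domains/{}/node_limit'.format(dom_uuid): limit,
        '/domains/{}/node_selector'.format(dom_uuid): selector,
        '/domains/{}/node_autostart'.format(dom_uuid): autostart,
    }
    for path, value in updates.items():
        if value is not None:
            zk.set(path, str(value).encode())
```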
Closes#52
This is likely not going to be used with the planned implementation of
the automatic provisioning daemon, which will be either an API client or
direct Python binding client.
Includes a simple implementation of a zookeeper "rename" facility,
allowing a key and all data to be replaced by a new key with a different
name but containing all the same child elements and data.
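A minimal recursive version in kazoo terms (assuming a connected client; transactional safety is elided):
```
def rename_key(zk, old_path, new_path):
    # Copy data and all children to the new key, then remove the old tree
    data, _stat = zk.get(old_path)
    zk.create(new_path, data or b'', makepath=True)
    for child in zk.get_children(old_path):
        rename_key(zk, '{}/{}'.format(old_path, child),
                   '{}/{}'.format(new_path, child))
    zk.delete(old_path, recursive=True)
```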
[2/2] Implements #44
Add the "storage" prefix to all Ceph-based commands in both the CLI and
the API. This partially abstracts the storage subsystem from the Ceph
tool specifically, should future storage subsystems be added or changed.
The name "ceph" is still used due to the non-abstracted components of
the Ceph management, e.g. referencing Ceph-specific concepts like OSDs
or pools.
This just seemed like more trouble than it was worth. Flush locks were
originally intended as a way to counteract the weird issues around
flushing that were mostly fixed by the code refactoring, so this will
help test if those issues are truly gone. If not, will look into a
cleaner solution that doesn't result in unchangeable states.
Previously, if one force-migrated a VM, the previous_node value would
be updated to the current node, which is likely never what an
administrator would want. Change this behaviour so that the previous
node value is not changed, and update the documentation to reflect this.
Implements a locking mechanism to prevent clobbering of node
flushes. When a flush begins, a global cluster lock is placed
which is freed once the flush completes. While the lock is in place,
other flush events queue waiting for the lock to free before
proceeding.
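In kazoo terms this is essentially the standard Lock recipe (the lock path is illustrative):
```
def flush_node(zk, node_name):
    lock = zk.Lock('/locks/flush_lock', node_name)
    # Blocks until any in-progress flush releases the lock; queued
    # flush events then proceed one at a time in acquisition order
    with lock:
        perform_flush(node_name)  # hypothetical flush routine
```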
Modifies the CLI output flow when the `--wait` option is specified.
First, if a lock exists when running the command, the message is
tweaked to indicate this, and the client will wait first for the
lock to free, and then for the flush as normal. Second, the wait
depends on the active lock rather than the domain_status for
consistency purposes.
Closes#32
Implements the ability for a client to watch almost-live domain
console logs from the hypervisors. It does this using a deque-based
"tail -f" mechanism (with a configurable buffer per-VM) that watches
the domain console logfile in the (configurable) directory every
half-second. It then stores the current buffer in Zookeeper when
changed, where a client can then request it, either as a static piece
of text in the `less` pager, or via a similar "tail -f" functionality
implemented using fixed line splitting and comparison to provide a
generally-seamless output.
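The hypervisor-side watcher is conceptually a bounded deque refreshed twice a second (a sketch; the buffer size and paths are illustrative):
```
import time
from collections import deque

def watch_console_log(logfile, zk, zk_path, buffer_lines=1000):
    last_buffer = None
    while True:
        # Keep only the most recent N lines, "tail -f" style
        with open(logfile, 'r') as fh:
            buffer = ''.join(deque(fh, maxlen=buffer_lines))
        # Only push to Zookeeper when the buffer actually changed
        if buffer != last_buffer:
            zk.set(zk_path, buffer.encode())
            last_buffer = buffer
        time.sleep(0.5)
```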
Enabling this feature requires each guest VM to implement a Libvirt
serial log and write its (text) console to it, for example using the
default logging directory:
```
<serial type='pty'>
  <log file='/var/log/libvirt/vmname.log' append='off'/>
</serial>
```
The append mode can be either on or off; on grows files unbounded,
off causes the log (and hence the PVC log data) to be truncated on
initial VM startup from offline. The administrator must choose how
they best want to handle this until Libvirt implements their own
clog-type logging format.
Completely restructure the daemon code to move the 4 discrete daemons
into a single daemon that can be run on every hypervisor. Introduce the
idea of a static list of "coordinator" nodes which are configured at
install time to run Zookeeper and FRR in router mode, and which are
allowed to take on client network management duties (gateway, DHCP, DNS,
etc.) while also allowing them to run VMs (i.e. no dedicated "router"
nodes required).