We need to do a bit more finagling with the logger on termination to
ensure that all messages are written and the queue drained before
actually terminating.
Adds the ability to send node daemon logs to Zookeeper to facilitate a
command like "pvc node log", similar to "pvc vm log". Each node stores
its logs in a separate tree under "/logs" which can then be combined or
queried. By default, set by config, only 2000 lines are kept.
Add an additional protected class, limit manipulation to one at a time,
and ensure future flexibility. Also makes display consistent with other
VM elements.
Adds tags to schema (v3), to VM definition, adds function to modify
tags, adds function to get tags, and adds tags to VM data output.
Tags will enable more granular classification of VMs based either on
administrator configuration or from automated system events.
Prevents bad states where the VM is "removed" but some of its disks
remain due to e.g. stuck watchers.
Rearrange the sequence so it goes stop, delete disks, then delete VM,
and then return a failure if any of the disk(s) fail to remove, allowing
the task to be rerun after fixing the problem.
Regenerating the ZK connection was fraught with issues, including
duplicate connections, strange failures to reconnect, and various other
wonkiness.
Instead let Kazoo handle states sensibly. Kazoo moves to SUSPENDED state
when it loses connectivity, and stays there indefinitely (based on
cursory tests). And Kazoo seems to always resume from this just fine on
its own. Thus all that hackery did nothing but complicate reconnection.
This therefore turns the listener into a purely informational function,
providing logs of when/why it failed, and we also add some additional
output messages during initial connection and final disconnection.
When trying to write to sub-item paths that don't yet exist, the
previous method would just blindly write to whatever the root key is,
which is never what we actually want.
Instead, check explicitly for a "base path" situation, and handle that.
Then, if we try to get a subpath that isn't valid, return None. Finally
in the various functions, if the path is None, just continue (or return
false/None) and (try to) chug along.
The zkhandler read() function takes care of ensuring there is a None
value returned if these fail, so these aren't required. Makes the code a
fair bit more readable here.
This helps parallelize the numerous Zookeeper calls a little bit, at
least within the bounds of the GIL, to improve performance when getting
a large list of VMs. The max_workers value is capped at 32 to avoid
causing too many threads during concurrent executions, but still
provides a noticeable speedup (on the order of 0.2-0.4 seconds with 75
VMs, scaling up further as counts grow).
Usable anywhere that the global daemon "config" parameter can be passed
in (e.g. pvcapid/helper.py, pvcnoded/Daemon.py, etc.). Stores results in
a subdirectory of the PVC logdir called "profiler" if this directory can
be created, or prints results.
The debug config parameter ensures that the profiler can be added to
functions and not run unless the server is explicitly in debug mode.
Might not be useful as I don't initially plan to add this to every
function (only when investigating performance problems), but this
flexibility allows that to change later.
Instead of looping 5+ times acquiring an impossible lock on a
nonexistent key, just fail on a different error and return failure
immediately.
This is likely a major corner case that shouldn't happen, but better to
be safe than 500.
These cause a major (2x) slowdown in read calls since Zookeeper
connections are expensive/slow. Instead, just try the thing and return
None if there's no key there.
Also wrap the children command in similar error handling since that did
not exist and could likely cause some bugs at some point.
This *was* valuable when passing a full UUID in, so go back to that.
Verify first that the limit string is an actual UUID, and then compare
against it if applicable.
I can see no possible reason to want to do limits against UUIDs, but
supporting that means match is not what one would expect since a random
UUID could match the limit. So only limit based on the name.
With many VMs this slows down linearly. Rework it a bit so there are
fewer calls to getInformationFromXML and so the processing could happen
in parallel at some point.
Trying to do this on the VMInstance side had problems because we can't
differentiate the 3 types of migration there. So, just update this in
the API side and hope everything goes well.
This introduces an edge bug: if a VM is using a macvtap SR-IOV device,
and then tries to migrate, and the migrate is aborted, the NIC lists
will be inconsistent.
When I revamp the VMInstance in the future, I should be able to correct
this, but for now we'll have to live with that edgecase.
Adds support for the node daemon managing SR-IOV PF and VF instances.
PFs are added to Zookeeper automatically based on the config at startup
during network configuration, and are otherwise completely static. PFs
are automatically removed from Zookeeper, along with all coresponding
VFs, should the PF phy device be removed from the configuration.
VFs are configured based on the (autocreated) VFs of each PF device,
added to Zookeeper, and then a new class instance, SRIOVVFInstance, is
used to watch them for configuration changes. This will enable the
runtime management of VF settings by the API. The set of keys ensures
that both configuration and details of the NIC can be tracked.
Most keys are self-explanatory, especially for PFs and the basic keys
for VFs. The configuration tree is also self-explanatory, being based
entirely on the options available in the `ip link set {dev} vf` command.
Two additional keys are also present: `used` and `used_by`, which will
be able to track the (boolean) state of usage, as well as the VM that
uses a given VIF. Since the VM side implementation will support both
macvtap and direct "hostdev" assignments, this will ensure that this
state can be tracked on both the VF and the VM side.