This may or may not help, but should in theory prevent the flush from
trying to run after a (locally-running) API daemon is terminated, which
could cause an API failure and a failure to flush.
This will stop systemd from killing the service in the middle of a flush
or unflush operation, which completely defeats the purpose. 30 minutes
was chosen as this is a very large but still somewhat manageable value,
which should cover even a very large, very loaded cluster with room to
spare.
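For both of the above service changes, a minimal sketch of the relevant
unit directives might look like the following (the unit and daemon names
here are assumptions, not necessarily the shipped files):

    # pvc-flush.service (illustrative fragment only)
    [Unit]
    # Stopped before the API daemon on shutdown (reverse of start order),
    # so the flush-on-stop action can still reach a running local API
    After=pvcapid.service
    Requires=pvcapid.service

    [Service]
    # Allow up to 30 minutes for a flush/unflush before systemd gives up
    TimeoutStopSec=30min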
Most of these actions/conditionals were looking for primary state, but
were failing during node takeover. Update the conditionals to look for
both router states instead.
Also add a wait that locks out flushing until a takeover is completed.
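A hedged sketch of the sort of change involved, assuming kazoo and an
illustrative key layout:

    import time
    from kazoo.client import KazooClient

    zk_conn = KazooClient(hosts='127.0.0.1:2181')  # assumed connection details
    zk_conn.start()

    node = 'hv1'  # hypothetical node name
    state_path = '/nodes/{}/routerstate'.format(node)  # assumed key layout

    # Accept both "live" router states so the action still matches mid-takeover
    router_state = zk_conn.get(state_path)[0].decode('ascii')
    if router_state in ('primary', 'takeover'):
        pass  # perform the action previously gated on 'primary' alone

    # Before flushing, wait for any in-progress takeover to complete
    while zk_conn.get(state_path)[0].decode('ascii') == 'takeover':
        time.sleep(0.5)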
Use a pair of transitional states, "takeover" and "relinquish", when
transitioning between primary and secondary coordinator states. This
provides a cluster-wide record that the nodes are still working during
their synchronous transition states, and should allow clients to
determine when the node(s) have fully switched over. Also add an
additional 2 seconds of wait at the end of the transition jobs to ensure
everything has had a chance to start before proceeding.
References #72
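A rough sketch of the transitional-state flow (the key names are
assumptions; the 2-second wait is the value described above):

    import time

    def become_primary(zk_conn, node):
        state_path = '/nodes/{}/routerstate'.format(node)  # assumed key
        # Record the transitional state cluster-wide so clients can tell
        # the switchover is still in progress
        zk_conn.set(state_path, b'takeover')
        # ... perform the synchronous takeover work (floating IPs, DNS, etc.) ...
        zk_conn.set(state_path, b'primary')
        time.sleep(2)  # give everything a chance to start before proceeding

    def become_secondary(zk_conn, node):
        state_path = '/nodes/{}/routerstate'.format(node)
        zk_conn.set(state_path, b'relinquish')
        # ... perform the synchronous relinquish work ...
        zk_conn.set(state_path, b'secondary')
        time.sleep(2)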
Rename "pvcd" to "pvcnoded", and "pvc-api" to "pvcapid" so names for the
daemons are fully consistent. Update the names of the configuration
files as well to match this new formatting.
References #79
Modifies the storage and upstream networks to mirror the cluster
network, with a bridge on top of the specified underlying device, and
all IPs bound to the bridge.
Allows creating VMs in the storage or upstream networks, as well as the
cluster network, should the administrator choose to do so (manually).
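As a rough illustration (device names, bridge names, and addresses are
hypothetical), the storage and upstream networks now end up looking like
the cluster network:

    import subprocess

    def create_node_network(bridge_name, underlying_dev, ip_cidr):
        # Bridge on top of the specified underlying device, with the node's
        # IP bound to the bridge rather than to the raw device
        subprocess.run(['ip', 'link', 'add', bridge_name, 'type', 'bridge'], check=True)
        subprocess.run(['ip', 'link', 'set', underlying_dev, 'master', bridge_name], check=True)
        subprocess.run(['ip', 'link', 'set', underlying_dev, 'up'], check=True)
        subprocess.run(['ip', 'link', 'set', bridge_name, 'up'], check=True)
        subprocess.run(['ip', 'address', 'add', ip_cidr, 'dev', bridge_name], check=True)

    create_node_network('brstorage', 'ens4.102', '10.1.0.11/24')
    create_node_network('brupstream', 'ens4.101', '198.51.100.11/24')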
Implements a "maintenance mode" for PVC clusters. For now, the only
thing this mode does is disable node fencing while the state is true.
This allows the administrator to tell PVC that network connectivity,
etc. might be interrupted and to avoid fencing nodes.
Closes #70
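A minimal sketch of how the fencing path might consult the maintenance
flag (the key path and value encoding are assumptions):

    def fence_node(zk_conn, node):
        # In maintenance mode, never fence: the administrator has declared
        # that connectivity interruptions are expected
        maintenance = zk_conn.get('/maintenance')[0].decode('ascii')
        if maintenance == 'true':
            return
        # ... normal dead-node verification and fencing logic follows ...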
Required due to #64. Bridged networks were being created on top of a
vLAN if the Cluster network was a vLAN device, rather than being created
on the underlying device. This came from a previous revision of the
cluster architecture guidelines where Cluster was supposed to be a raw
device rather than a vLAN. This fixes the problem by implementing a
configuration field for a "bridge_device", a NIC device that can then
have the bridged vLANs created on top of it.
Fixes #64
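A hedged sketch of the corrected behaviour, assuming a bridge_device
configuration field and illustrative device naming:

    import subprocess

    def create_bridged_network(vni, bridge_device):
        # Create the client network's vLAN on the raw bridge_device rather
        # than on the (possibly vLAN-tagged) Cluster device, then bridge it
        vlan_dev = 'vlan{}'.format(vni)
        bridge_dev = 'vmbr{}'.format(vni)
        subprocess.run(['ip', 'link', 'add', 'link', bridge_device, 'name', vlan_dev,
                        'type', 'vlan', 'id', str(vni)], check=True)
        subprocess.run(['ip', 'link', 'add', bridge_dev, 'type', 'bridge'], check=True)
        subprocess.run(['ip', 'link', 'set', vlan_dev, 'master', bridge_dev], check=True)
        subprocess.run(['ip', 'link', 'set', vlan_dev, 'up'], check=True)
        subprocess.run(['ip', 'link', 'set', bridge_dev, 'up'], check=True)

    create_bridged_network(1001, 'ens4')  # hypothetical VNI and raw NIC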
Prevents blocking the main thread(s) while a VM is changing state. In
particular, this caused some issues with nodes not responding to
cancellation/reversal of a flush/ready state until the previous
migration was finished, which could cause issues. This entire subset of
actions is now threaded and so can run on its own in the background.
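A minimal sketch of the threaded approach (function names are
illustrative):

    import threading

    def change_vm_state(vm, new_state):
        # The actual migrate/shutdown/start work, which may block for minutes
        ...

    def on_state_update(vm, new_state):
        # Run the state change in a background thread so the main watcher
        # loop stays responsive to cancellation/reversal of a flush or ready
        # state while a previous migration is still in flight
        worker = threading.Thread(target=change_vm_state, args=(vm, new_state), daemon=True)
        worker.start()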
This particular arping interval/count, along with forcing it to run in
the foreground, seems to minimize the packet loss when the primary
coordinator transitions. Through extensive testing, these values
consistently result in the least loss: at a 0.025s ping interval, only
1-2 pings return "TTL exceeded", with no other loss, and only when the
node hosting the test VM is the one switching to secondary state. No
other combination of values here, nor any tweaks to other parts of the
code, seems able to reduce this further, so this is likely the best
configuration possible.
The previous method was a "throw it in the sea"-type migration with some
(very arbitrary) sleep statements thrown in for good measure.
Reimplement this with some hard locking. During each phase of the
transition, the nodes acquire read/write shared locks to a Zookeeper key
so that they can tightly coordinate the actions of transferring each
part of the primary state between them. This is done in a subthread to
prevent strange blocking issues that were encountered, likely due to
busyness in the existing main thread.
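A rough sketch of the lock-based handoff, assuming kazoo's shared-lock
recipes and an illustrative lock key:

    import threading
    from kazoo.client import KazooClient
    from kazoo.recipe.lock import ReadLock, WriteLock

    def transition_to_primary(zk_conn, node):
        lock_path = '/locks/primary_node'  # assumed key

        # Each phase of the handoff is bounded by a shared lock so the two
        # coordinators move through it strictly in step
        with WriteLock(zk_conn, lock_path):
            pass  # ... perform this node's half of the current phase ...
        with ReadLock(zk_conn, lock_path):
            pass  # ... observe/acknowledge the peer's half of the phase ...

    zk_conn = KazooClient(hosts='127.0.0.1:2181')  # assumed connection details
    zk_conn.start()
    # Run in a subthread to avoid blocking the already-busy main thread
    threading.Thread(target=transition_to_primary, args=(zk_conn, 'hv1'), daemon=True).start()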
Implements the storing of three VM metadata attributes:
1. Node limits - allows specifying a list of hosts on which the VM must
run. This limit influences the migration behaviour of VMs.
2. Per-VM node selectors - allows each VM to have its migration
autoselection method specified, to automatically allow different methods
per VM based on the administrator's preferences.
3. VM autorestart - allows a VM that ended up stopped, presumably because
no target node could be found (due to limits or otherwise) during a
flush/fence recovery, to be started automatically the next time its home
hypervisor is unflushed or returns to the ready state. Useful mostly in
conjunction with limits, to ensure that VMs which were shut down because
there were no valid migration targets are started back up when their
node becomes ready again.
Includes the full client interaction with these metadata options,
including printing, as well as defining a new function to modify this
metadata. For the CLI it is set/modified either on `vm define` or via the
`vm meta` command. For the API it is set/modified either on a POST to
the `/vm` endpoint (during VM definition) or on POST to the `/vm/<vm>`
endpoint. For the API this replaces the previous reserved word for VM
creation from scratch as this will no longer be implemented in-daemon
(see #22).
Closes #52
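As an illustration only (the key layout, field names, and CLI flags here
are assumptions, not the exact schema):

    # Hypothetical per-VM Zookeeper keys backing the three metadata attributes
    vm_metadata = {
        '/domains/<uuid>/node_limit':     'hv1,hv2',  # hosts the VM may run on
        '/domains/<uuid>/node_selector':  'mem',      # per-VM migration autoselection method
        '/domains/<uuid>/node_autostart': 'True',     # start on next unflush/ready of its node
    }

    # Hypothetical CLI usage via the new `vm meta` command:
    #   pvc vm meta web1 --limit hv1,hv2 --selector mem --autostart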
Adds some logic to allow an active shutdown state to be aborted by
changing the VM to another state. Useful mostly if a VM is doing funky
things and not responding to the shutdown, but the administrator either
doesn't want to wait for the timer to expire (forcing an immediate
termination) or wishes to abort the shutdown attempt.
Fixes #49
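A minimal sketch of the abortable wait loop (the key path, timeout, and
helper shape are assumptions):

    import time

    def shutdown_vm(zk_conn, dom_uuid, domain, timeout=90):
        domain.shutdown()  # ask the guest to shut down cleanly (libvirt domain)
        state_path = '/domains/{}/state'.format(dom_uuid)  # assumed key

        for _ in range(timeout):
            time.sleep(1)
            # If the administrator changed the requested state (e.g. back to
            # "start"), abort the shutdown wait rather than forcing termination
            if zk_conn.get(state_path)[0].decode('ascii') != 'shutdown':
                return
            if not domain.isActive():
                return
        # Timer expired without a clean shutdown; force termination
        domain.destroy()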