Includes a simple implementation of a zookeeper "rename" facility,
allowing a key and all data to be replaced by a new key with a different
name but containing all the same child elements and data.
[2/2] Implements #44
Store the flush_thread of a node as a class object. Before starting a
new flush thread (either flush or unflush), stop the existing one if it
exists to prevent further migrations, then start the new thread. Set the
object to None on init and again once the task actually finishes. Remove
the inflush flag as this is not required when using these threads and
functionally does nothing any longer, but add the flush_stopper flag to
trigger cancellation of the current job.
This just seemed like more trouble that it was worth. Flush locks were
originally intended as a way to counteract the weird issues around
flushing that were mostly fixed by the code refactoring, so this will
help test if those issues are truly gone. If not, will look into a
cleaner solution that doesn't result in unchangeable states.
Without this, the IPMI information set during initial node creation can
never be changed, which can cause issues later. Instead, always set it
fresh on each node boot.
Similar to recent client changes, don't replace the previous node record
of an already-migrated VM. Wait for shutdown if required. Use a
continue statement instead of a needless else block.
Adds a config flag that turns on the API client following the Primary
coordinator. The retcode of the start/stop commands is ignore so this
can fail gracefully if e.g. the client isn't installed.
This was very old code that was hard to follow and quite fragile, with
failures and infinite loops occurring fairly frequently. These changes
make the code more robust, including the addition of timeouts, some code
cleanup, and some improvements to the logical flow.
Also forces the libvirt migration to occur on the cluster network, which
couples to changes in the libvirtd listen (via pvc-ansible) and in
Daemon.py via the previous commit.
Reference: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=717215#68
Without this, DHCP fails when traversing only the local bridge, for
Debian Jessie or earlier (and possibly other OSes as well), due to the
missing UDP checksums. This disables the offload and hence reenables
the checksums even on the software-only bridge.
Also rearranged the steps and added comments arround this section to
better clarify what each command is doing.
There was really no need for this to be shared among all the
coordinators, which seemed more fragile. This way only the primary will
try to fence dead nodes.