This branch commit refactors the pvcnoded component to better adhere to
good programming practices. The previous Daemon.py was a massive file
which contained almost 2000 lines of direct, root-level code which was
directly imported. Not only was this poor practice, but this resulted
in a nigh-unmaintainable file which was hard even for me to understand.
This refactoring splits a large section of the code from Daemon.py into
separate small modules and functions in the `util/` directory. This will
hopefully make most of the functionality easy to find and modify without
having to dig through a single large file.
Further the existing subcomponents have been moved to the `objects/`
directory which clearly separates them.
Finally, the Daemon.py code has mostly been moved into a function,
`entrypoint()`, which is then called from the `pvcnoded.py` stub.
An additional item is that most format strings have been replaced by
f-strings to make use of the Python 3.6 features in Daemon.py and the
utility files.
We need to do a bit more finagling with the logger on termination to
ensure that all messages are written and the queue drained before
actually terminating.
Adds the ability to send node daemon logs to Zookeeper to facilitate a
command like "pvc node log", similar to "pvc vm log". Each node stores
its logs in a separate tree under "/logs" which can then be combined or
queried. By default, set by config, only 2000 lines are kept.
Previously, if the node failed to restart, it was declared a "bad fence"
and no further action would be taken. However, there are some
situations, for instance critical hardware failures, where intelligent
systems will not attempt (or succeed at) starting up the node in such a
case, which would result in dead, known-offline nodes without recovery.
Tweak this behaviour somewhat. The main path of Reboot -> Check On ->
Success + fence-flush is retained, but some additional side-paths are
now defined:
1. We attempt to power "on" the chassis 1 second after the reboot, just
in case it is off and can be recovered. We then wait another 2 seconds
and check the power status (as we did before).
2. If the reboot succeeded, follow this series of choices:
a. If the chassis is on, the fence succeeded.
b. If the chassis is off, the fence "succeeded" as well.
c. If the chassis is in some other state, the fence failed.
3. If the reboot failed, follow this series of choices:
a. If the chassis is off, the fence itself failed, but we can treat
it as "succeeded"" since the chassis is in a known-offline state.
This is the most likely situation when there is a critical hardware
failure, and the server's IPMI does not allow itself to start back
up again.
b. If the chassis is in any other state ("on" or unknown), the fence
itself failed and we must treat this as a fence failure.
Overall, this should alleviate the aforementioned issue of a critical
failure rendering the node persistently "off" not triggering a
fence-flush and ensure fencing is more robust.
This reverts commit 65d14ccd92.
This was actually a bad idea. For inexplicable reasons, running these
Ceph commands manually (not even via Python, but in a normal shell)
takes 7 * two orders of magnitude longer than running them with the
Rados module, so long in fact that some basic commands like "ceph
health" would sometimes take longer than the 1 second timeout to
complete. The Rados commands would however take about 1ms instead.
Despite the occasional issues when monitors drop out, the Rados module
is clearly far superior to the shell commands for any moderately-loaded
Ceph cluster. We can look into solving timeouts another way (perhaps
with Processes instead of Threads) at a later time.
Rados module "ceph health":
b'{"checks":{},"status":"HEALTH_OK"}'
0.001204 (s)
b'{"checks":{},"status":"HEALTH_OK"}'
0.001258 (s)
Command "ceph health":
joshua@hv1.c.bonilan.net ~ $ time ceph health >/dev/null
real 0m0.772s
user 0m0.707s
sys 0m0.046s
joshua@hv1.c.bonilan.net ~ $ time ceph health >/dev/null
real 0m0.796s
user 0m0.728s
sys 0m0.054s
Using the Rados module was very problematic, specifically because it had
no sensible timeout parameters and thus would hang for many seconds.
This has poor implications since it blocks further keepalives.
Instead, remove the Rados usage entirely and go back completely to using
manual OS commands to gather this information. While this may cause PID
exhaustion more quickly it's worthwhile to avoid failure scenarios when
Ceph stats time out.
Closes#137
Not sure how this didn't cause an issue until now, but the wrong key
path was used and this was getting unexpected data with the newly-added
version string instead of the proper mode string.
When doing a stop_vm or terminate_vm, check again after 0.2 seconds
and try re-terminating if it's still running. Covers cases where a VM
doesn't stop if given the 'stop' state.
Trying to do this on the VMInstance side had problems because we can't
differentiate the 3 types of migration there. So, just update this in
the API side and hope everything goes well.
This introduces an edge bug: if a VM is using a macvtap SR-IOV device,
and then tries to migrate, and the migrate is aborted, the NIC lists
will be inconsistent.
When I revamp the VMInstance in the future, I should be able to correct
this, but for now we'll have to live with that edgecase.