Joshua M. Boniface
c6d552ae57
Previously, if the node failed to restart, it was declared a "bad fence" and no further action was taken. However, there are some situations, for instance critical hardware failures, in which intelligent systems will not attempt (or will not succeed at) starting the node back up, resulting in dead, known-offline nodes with no path to recovery. Tweak this behaviour somewhat. The main path of Reboot -> Check On -> Success + fence-flush is retained, but some additional side-paths are now defined:

1. We attempt to power "on" the chassis 1 second after the reboot, just in case it is off and can be recovered. We then wait another 2 seconds and check the power status (as we did before).

2. If the reboot succeeded, follow this series of choices:

   a. If the chassis is on, the fence succeeded.
   b. If the chassis is off, the fence "succeeded" as well.
   c. If the chassis is in some other state, the fence failed.

3. If the reboot failed, follow this series of choices:

   a. If the chassis is off, the fence itself failed, but we can treat it as "succeeded", since the chassis is in a known-offline state. This is the most likely situation when there is a critical hardware failure and the server's IPMI will not allow it to start back up again.
   b. If the chassis is in any other state ("on" or unknown), the fence itself failed and we must treat this as a fence failure.

Overall, this should alleviate the aforementioned issue of a critical failure leaving the node persistently "off" without triggering a fence-flush, and should make fencing more robust.
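The decision tree above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `set_power_on` and `get_power_state` are hypothetical stand-ins for the IPMI chassis power helpers, and the delays are parameterized only so the sketch is testable.

```python
import time


def follow_up_fence(reboot_ok, set_power_on, get_power_state,
                    on_delay=1, check_delay=2):
    """Return True if the fence can be considered successful, i.e. the
    node is in a known state: running again, or verifiably powered off.

    set_power_on and get_power_state are hypothetical IPMI helpers;
    get_power_state returns "on", "off", or None for an unknown state.
    """
    # Side-path 1: try to power the chassis on shortly after the reboot,
    # in case it is off and can be recovered, then re-check its state.
    time.sleep(on_delay)
    set_power_on()
    time.sleep(check_delay)
    state = get_power_state()

    if reboot_ok:
        # 2a/2b: "on" is a clean fence; "off" is a known-offline state,
        # so it also counts as success. 2c: any other state is a failure.
        return state in ("on", "off")
    else:
        # 3a: the reboot failed, but the chassis is verifiably off, so
        # the node is known-offline and the fence is treated as a success.
        # 3b: any other state ("on" or unknown) is a hard fence failure.
        return state == "off"
```

The key design point is that "off" is acceptable on both branches: fencing only needs the node to be in a *known* state, not necessarily a running one, before a fence-flush is safe.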