[BUG] Handle the whole Zookeeper cluster going down gracefully #4
As-is, if the whole ZK cluster goes down, each node will go down as well and possibly fence one another. Handle this gracefully (perhaps via all-VMs-stop?). Needs research: how do we know the state of the ZK cluster? Kazoo doesn't seem to expose this.
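For reference, Kazoo only reports the state of its own connection, not the health of the ensemble. A minimal sketch of that listener API (the host string is a placeholder):

```python
from kazoo.client import KazooClient, KazooState

zk = KazooClient(hosts="127.0.0.1:2181")  # placeholder host string

def connection_listener(state):
    # Kazoo reports only this client's connection state,
    # never the health of the ZK cluster as a whole.
    if state == KazooState.SUSPENDED:
        # Connection dropped; the session may still be recoverable
        pass
    elif state == KazooState.LOST:
        # Session expired; ephemeral nodes are gone
        pass
    else:  # KazooState.CONNECTED
        pass

zk.add_listener(connection_listener)
zk.start()
```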
As far as I can tell there isn't an easy way to do this. However, this also ties into another potential feature - attempting to connect to other nodes' Zookeeper instances to reconnect. I'm very torn on this - on the one hand I do like the tight coupling of PVC to its own node's Zookeeper, especially given that the ZK connections would need to be re-handled.
I suppose a valid solution is to simply attempt a connection to the other nodes' instances and, if all are unsuccessful, assume the cluster is dead and disable fencing; otherwise, just hang on for the local instance to come back (without actually switching connections).
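A rough sketch of that probe, assuming a hypothetical `peer_hosts` list supplied by the daemon; `KazooTimeoutError` is what `start()` raises when it cannot connect in time:

```python
from kazoo.client import KazooClient
from kazoo.handlers.threading import KazooTimeoutError

def any_zookeeper_alive(peer_hosts, timeout=2):
    """Try a short-lived connection to each peer's ZK instance.

    peer_hosts is a hypothetical list like ["node2:2181", "node3:2181"].
    Returns True as soon as any instance accepts a session.
    """
    for host in peer_hosts:
        probe = KazooClient(hosts=host)
        try:
            probe.start(timeout=timeout)
            return True
        except KazooTimeoutError:
            continue
        finally:
            probe.stop()
    return False

# If no instance answers, assume the cluster is dead and disable fencing;
# otherwise keep waiting for the local instance to return.
```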
Actually, I'm realizing that this is already "solved" in the architecture.
If the entire ZK cluster goes down, ALL nodes will stop their keepalive threads. They will therefore never try to do fencing (since the `update_zookeeper` function will no longer be called).

This still leaves the failure scenario where the ZK cluster is dead and all nodes are in a disconnected state, but I'm not sure that taking any action at all at that point is desirable. PVC basically always acts on the assumption that the ZK instance will come back, and it will wait indefinitely for that to happen. But at what threshold is it worthwhile to consider the whole cluster dead and take the very drastic action of shutting everything down? I'm not sure that will ever make sense. Comments, @michal?
Won't fix due to logic above.
closed
Don't assume; allow it to be configurable and set a sane default.
I would say this is fine as the default (if state is unknown, stay alive), but the admin should be able to configure it based on their use case. For example, in some situations the admin might prefer the node to "commit suicide" if the state is unknown for longer than 60 seconds - unknown states might be untenable for certain classes of users.
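Something like the following dead-man timer could express that, with `suicide_interval` a hypothetical config knob (0 meaning the current stay-alive-forever default); `is_state_known` and `shut_down_node` are placeholder callables the daemon would supply:

```python
import time

suicide_interval = 60  # hypothetical config option; 0 = stay alive forever

def watch_unknown_state(is_state_known, shut_down_node, poll=1):
    """Trigger shut_down_node() if state stays unknown too long."""
    unknown_since = None
    while True:
        if is_state_known():
            unknown_since = None  # healthy again; reset the timer
        elif unknown_since is None:
            unknown_since = time.monotonic()  # unknown state begins
        elif suicide_interval and time.monotonic() - unknown_since >= suicide_interval:
            shut_down_node()
            return
        time.sleep(poll)
```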
reopened
Right, but there is also zero way to determine cluster health from within Kazoo (because even if every node is unreachable, that by itself means nothing), so the point is moot - I can't actually implement this in any reasonably clean way. Fencing is 100% a remote-host operation, not a local one.
I think you misunderstood. If the cluster is unreachable, it is by definition unhealthy - perhaps not in reality, but from that node's point of view.
This isn't so much about fencing decisions (though I suppose you could view it that way) as about what the daemon should do if it can't determine the current state. An admin should be able to control what they want to do via a config option.
The sane default, I agree, is "do nothing", but in certain cluster designs, where the storage network and control/system networks are fully gapped, you can run into situations where you may want the response to an isolation event to be to commit suicide.

Situation 1: A node gets isolated on its control/system network, but the rest of the cluster is healthy, and the storage network is healthy. As the cluster is healthy, it tries to fence the node, but because the system networks are down, it fails. The node stays up, the cluster restarts the VMs, and now you have split-brain at the storage level.

Situation 2: A node gets isolated on its control/system/storage networks, but the cluster overall is healthy. As the cluster is healthy, it tries to fence the node, but fails. The node stays up, and the cluster restarts the VMs. The network isolation gets repaired (either by manual intervention or automatically, after a transient event) and now the VMs are alive in two locations - split-brain.
OK, I think the real solution then is to make sure a node is actually fenced before trying to recover the VMs - which I planned to implement eventually, once I can test fencing on physical hardware 😆
In that case, neither of those can happen without manual intervention, because fencing won't try to migrate VMs away if the fence fails. Only when a fence succeeds (as defined by the ipmitool command) will it say "OK, I killed the node, let's start the VMs elsewhere".
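As a sketch of that ordering (IPMI host and credentials are placeholders), recovery is gated purely on the ipmitool exit status:

```python
import subprocess

def fence_node(ipmi_host, ipmi_user, ipmi_password):
    """Power off a node over IPMI; True only if ipmitool reports success."""
    result = subprocess.run(
        ["ipmitool", "-I", "lanplus",
         "-H", ipmi_host, "-U", ipmi_user, "-P", ipmi_password,
         "chassis", "power", "off"],
        capture_output=True,
    )
    return result.returncode == 0

# Only a confirmed fence allows recovery elsewhere:
# if fence_node(...) returns True, migrate the VMs away;
# otherwise leave them alone and alert the admin.
```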
I have to look at it as a fencing problem, because if ZK is unreachable then it's impossible for a node to self-fence and migrate anyway. A node has to be dead from the perspective of another, live node.
What I CAN do is implement a "kill my VMs forcibly" timeout - that stays node-local, but in that case, without an external fence event, there will be no migration: the VMs would restart on the same node once it becomes healthy again.
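Roughly, with libvirt, that node-local force-kill could look like the following - an assumption about the mechanism, not the shipped code. Note it only destroys domains locally; nothing marks them for migration:

```python
import libvirt

def kill_local_vms():
    """Forcibly stop all running domains on this node only.

    No state changes in Zookeeper, so without an external fence
    event the VMs simply restart here once the node is healthy.
    """
    conn = libvirt.open("qemu:///system")
    try:
        for dom in conn.listAllDomains(libvirt.VIR_CONNECT_LIST_DOMAINS_ACTIVE):
            dom.destroy()  # hard power-off, equivalent to pulling the plug
    finally:
        conn.close()
```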
From discussion - add config options for:
mentioned in commit ad4d3d794b
mentioned in commit 9ef5fcb836
mentioned in commit f5054b1bc7
mentioned in commit 8052dce50d
This configurability was added in the above commits but will need testing. Closing!
closed