Handle rollback of node primary migration #42

Closed
opened 2019-07-09 11:01:51 -04:00 by JoshuaBoniface · 3 comments
JoshuaBoniface commented 2019-07-09 11:01:51 -04:00 (Migrated from git.bonifacelabs.ca)

If the Patroni database fails to move, the node should back out of its attempt to become primary. As-is, it simply forces forward, but the domain aggregator then fails to start. This might be a very difficult issue to solve, so it is still up in the air.

JoshuaBoniface commented 2019-08-04 16:27:36 -04:00 (Migrated from git.bonifacelabs.ca)

Leaning towards this being a non-issue, or maybe a case for locking the state transition to prevent another switch of the primary while one is still in progress. Going to try implementing the latter just to see how it goes.

JoshuaBoniface commented 2019-08-04 17:03:47 -04:00 (Migrated from git.bonifacelabs.ca)

This has been implemented in a safe way to handle the various failure modes.

  1. If the Patroni leader is already us, just print a warning.
  2. If the Patroni leader switchover fails, wait 2s then try again indefinitely.
  3. Write-lock the `/primary_node` key when performing a `become_primary()` action. Combined with the above, this will lock up the node if Patroni is in a bad state, which is less than ideal but is a situation that requires administrator intervention.
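
For illustration, the three behaviours above could be sketched roughly as follows. This is not the actual PVC code: `get_leader`, `request_switchover`, and the injectable `sleep` are hypothetical callables standing in for the real Patroni calls, and a plain `threading.Lock` stands in for the write lock on the `/primary_node` key.

```python
import threading
import time
from typing import Callable


def become_primary(
    my_name: str,
    get_leader: Callable[[], str],             # hypothetical: returns current Patroni leader
    request_switchover: Callable[[str], bool],  # hypothetical: True if the switchover succeeded
    primary_lock: threading.Lock,               # stand-in for the /primary_node write lock
    retry_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the number of switchover attempts made (0 if already leader)."""
    attempts = 0
    # Hold the lock for the whole transition so no second primary
    # switch can begin while this one is still in progress.
    with primary_lock:
        # Case 1: we are already the leader, so only warn.
        if get_leader() == my_name:
            print(f"warning: {my_name} is already the Patroni leader")
            return attempts
        # Case 2: retry indefinitely, pausing between attempts.
        # If Patroni never recovers, this loop (and the lock) never
        # releases, which is the "requires administrator intervention" case.
        while True:
            attempts += 1
            if request_switchover(my_name):
                return attempts
            sleep(retry_delay)
```

Passing `sleep` as a parameter is only a convenience so the retry path can be exercised without real 2s delays.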
JoshuaBoniface commented 2019-08-04 17:03:50 -04:00 (Migrated from git.bonifacelabs.ca)

closed

Reference: parallelvirtualcluster/pvc#42