Use power off in fence instead of reset

Use a power off (and then make the power on a requirement) during a node
fence. Removes some potential ambiguity in the power state, since we
will know for certain if it is off.
This commit is contained in:
Joshua Boniface 2021-10-12 10:59:09 -04:00
parent 2d9fb9688d
commit 8f906c1f81
2 changed files with 24 additions and 8 deletions

View File

@ -322,7 +322,7 @@ Once the cluster, and specifically one node in the cluster, has determined that
During the `dead` process, the failed node has 6 chances, called "saving throws", at `keepalive_interval` second windows, to send another keepalive before it is fenced. This additional, fixed, delay helps ensure that the cluster will gracefully recover from intermittent network failures or loss of Zookeeper contact, by providing nodes up to another 6 keepalive intervals to save themselves once the fence timer actually begins. This bring the total time, with default options, of a node stopping contact to a node being fenced, to between 60 and 65 seconds. This duration is considered by the author an acceptable compromise between speedy recovery and avoiding false positives (and hence larger outages). During the `dead` process, the failed node has 6 chances, called "saving throws", at `keepalive_interval` second windows, to send another keepalive before it is fenced. This additional, fixed, delay helps ensure that the cluster will gracefully recover from intermittent network failures or loss of Zookeeper contact, by providing nodes up to another 6 keepalive intervals to save themselves once the fence timer actually begins. This bring the total time, with default options, of a node stopping contact to a node being fenced, to between 60 and 65 seconds. This duration is considered by the author an acceptable compromise between speedy recovery and avoiding false positives (and hence larger outages).
Once a node has been marked `dead` and has failed its 6 "saving throws", the fence process triggers an IPMI chassis reset sequence. First, the node is issued the standard IPMI `chassis power reset` command to trigger a cold system reset. Next, it waits a fixed 1 second and then issues a `chassis power on` signal to ensure the node is powered on (just in case it had already shut itself off). The node then waits a fixed 2 seconds, and then checks the current `chassis power status`. Using the results of these 3 commands, PVC is then able to determine with near certainty whether the node has truly been forced offline or not, and it can proceed to the next step. Once a node has been marked `dead` and has failed its 6 "saving throws", the fence process triggers an IPMI chassis reset sequence. First, the node is issued an IPMI `chassis power off` command to trigger a cold system shutdown. Next, it waits a fixed 1 second and then checks and logs the current `chassis power state`, and then issues a `chassis power on` signal to start up the node. It then finally waits a fixed 2 seconds, and then checks the current `chassis power status`. Using the results of these 3 commands, PVC is then able to determine with near certainty whether the node has truly been forced offline or not, and it can proceed to the next step.
#### Recovery from Node Fences #### Recovery from Node Fences

View File

@ -133,23 +133,39 @@ def migrateFromFencedNode(zkhandler, node_name, config, logger):
# Perform an IPMI fence # Perform an IPMI fence
# #
def reboot_via_ipmi(ipmi_hostname, ipmi_user, ipmi_password, logger): def reboot_via_ipmi(ipmi_hostname, ipmi_user, ipmi_password, logger):
# Forcibly reboot the node # Power off the node the node
ipmi_command_reset = '/usr/bin/ipmitool -I lanplus -H {} -U {} -P {} chassis power reset'.format( logger.out('Sending power off to dead node', state='i')
ipmi_command_stop = '/usr/bin/ipmitool -I lanplus -H {} -U {} -P {} chassis power off'.format(
ipmi_hostname, ipmi_user, ipmi_password ipmi_hostname, ipmi_user, ipmi_password
) )
ipmi_reset_retcode, ipmi_reset_stdout, ipmi_reset_stderr = common.run_os_command(ipmi_command_reset) ipmi_stop_retcode, ipmi_stop_stdout, ipmi_stop_stderr = common.run_os_command(ipmi_command_stop)
if ipmi_reset_retcode != 0: if ipmi_stop_retcode != 0:
logger.out(f'Failed to reboot dead node: {ipmi_reset_stderr}', state='e') logger.out(f'Failed to power off dead node: {ipmi_stop_stderr}', state='e')
time.sleep(1) time.sleep(1)
# Power on the node (just in case it is offline) # Check the chassis power state
logger.out('Checking power state of dead node', state='i')
ipmi_command_status = '/usr/bin/ipmitool -I lanplus -H {} -U {} -P {} chassis power status'.format(
ipmi_hostname, ipmi_user, ipmi_password
)
ipmi_status_retcode, ipmi_status_stdout, ipmi_status_stderr = common.run_os_command(ipmi_command_status)
if ipmi_status_retcode == 0:
logger.out(f'Current chassis power state is: {ipmi_status_stdout.strip()}', state='i')
else:
logger.out(f'Current chassis power state is: Unknown', state='w')
# Power on the node
logger.out('Sending power on to dead node', state='i')
ipmi_command_start = '/usr/bin/ipmitool -I lanplus -H {} -U {} -P {} chassis power on'.format( ipmi_command_start = '/usr/bin/ipmitool -I lanplus -H {} -U {} -P {} chassis power on'.format(
ipmi_hostname, ipmi_user, ipmi_password ipmi_hostname, ipmi_user, ipmi_password
) )
ipmi_start_retcode, ipmi_start_stdout, ipmi_start_stderr = common.run_os_command(ipmi_command_start) ipmi_start_retcode, ipmi_start_stdout, ipmi_start_stderr = common.run_os_command(ipmi_command_start)
if ipmi_start_retcode != 0:
logger.out(f'Failed to power on dead node: {ipmi_start_stderr}', state='w')
time.sleep(2) time.sleep(2)
# Check the chassis power state # Check the chassis power state
@ -159,7 +175,7 @@ def reboot_via_ipmi(ipmi_hostname, ipmi_user, ipmi_password, logger):
) )
ipmi_status_retcode, ipmi_status_stdout, ipmi_status_stderr = common.run_os_command(ipmi_command_status) ipmi_status_retcode, ipmi_status_stdout, ipmi_status_stderr = common.run_os_command(ipmi_command_status)
if ipmi_reset_retcode == 0: if ipmi_stop_retcode == 0:
if ipmi_status_stdout.strip() == "Chassis Power is on": if ipmi_status_stdout.strip() == "Chassis Power is on":
# We successfully rebooted the node and it is powered on; this is a succeessful fence # We successfully rebooted the node and it is powered on; this is a succeessful fence
logger.out('Successfully rebooted dead node', state='o') logger.out('Successfully rebooted dead node', state='o')