Compare commits

..

2 Commits

Author SHA1 Message Date
8832a81fa7 Correct spelling errors 2023-09-21 22:50:53 -04:00
09b485988a Fix spelling errors 2023-09-21 22:50:02 -04:00
2 changed files with 12 additions and 12 deletions

View File

@ -12,7 +12,7 @@ You can also view a video demonstration of the fencing process in action here:
## Overview
Fencing in PVC provides a mechanism for a cluster's nodes to determine if one of their active (`run` state) peers has stopped responding, take action to ensure the failed node is fully powercycled, and then, if successful, automatically bring up affected VMs from the dead node onto others awaiting its return to service.
Fencing in PVC provides a mechanism for a cluster's nodes to determine if one of their active (`run` state) peers has stopped responding, take action to ensure the failed node is fully power-cycled, and then, if successful, automatically bring up affected VMs from the dead node onto others awaiting its return to service.
Properly configured fencing can thus help ensure the maximum uptime for VMs in the case of a faulty node.
@ -25,8 +25,8 @@ Fencing can be temporarily disabled by setting the cluster maintenance mode to `
For fencing to be enabled, several configurations must be correctly set.
* The node must have a proper IPMI interface, as detailed in the [Hardware Requirements](hardware-requirements.md#ipmilights-out-management) documentation.
* The IPMI interface must be either in the [cluster "upstream" network](cluster-architecture.md#upstream), or in another network reachable by it. The former is strongly recommended, because the latter is potentially susceptable to network faults in the routing between the networks which might cause fencing to fail in otherwise valid scenarios.
* The IPMI BMC must be configured with an `Administrator`-level user with IPMI-over-LAN privilieges enabled.
* The IPMI interface must be either in the [cluster "upstream" network](cluster-architecture.md#upstream), or in another network reachable by it. The former is strongly recommended, because the latter is potentially susceptible to network faults in the routing between the networks which might cause fencing to fail in otherwise valid scenarios.
* The IPMI BMC must be configured with an `Administrator`-level user with IPMI-over-LAN privileges enabled.
* The IPMI interface (IP or hostname) and aforementioned user of each node must be configured in the `fencing` -> `ipmi` section of the `pvcnoded.yaml` file of that node.
PVC will automatically check the reachability of its IPMI and its functionality early during node startup. The functionality can also be tested via the `ipmitool -I lanplus` command from a node.
@ -39,7 +39,7 @@ The [PVC Ansible framework](../deployment/getting-started.md) will automatically
Node fencing is handled during regular node keepalive events. Keepalives occur every 5 seconds (default `keepalive_interval`), during which each node checks into the cluster by providing the current UNIX epoch timestamp in a configuration key.
At the end of each keepalive event, all nodes check their peers' timestamps and compare them against the current time. If the peers detect that a node in `run` daemon state has not checked in for 6 intervals (default `fence_intervals`), or 30 seconds by default, one node at random will begin the fencing process as the watching node. First, a timer is started for 6 more `keepalive_intervals` (hardcoded), during which a checkin from the dead node will cancel the fence (a "saving throw").
At the end of each keepalive event, all nodes check their peers' timestamps and compare them against the current time. If the peers detect that a node in `run` daemon state has not checked in for 6 intervals (default `fence_intervals`), or 30 seconds by default, one node at random will begin the fencing process as the watching node. First, a timer is started for 6 more `keepalive_intervals` (hard-coded), during which a check-in from the dead node will cancel the fence (a "saving throw").
### Dead Node Fencing
@ -58,7 +58,7 @@ With these 6 steps and the 2 saved results of the `chassis power state`, PVC can
Once a dead node has been successfully fenced and at least 1 more `keepalive_interval` has passed, the watching node will begin fencing recovery.
What action is taken during fencing recovery is depdendent on the `successful_fence` configuration key, which can either be `migrate`, which will perform the below steps, or `none` which will perform no recovery action and stop here.
What action is taken during fencing recovery is dependent on the `successful_fence` configuration key, which can either be `migrate`, which will perform the below steps, or `none` which will perform no recovery action and stop here.
First, the node is put into a special `fencing-flush` domain state, to indicate that it is undergoing a forced flush after fencing. Then, for each VM which was running on the dead node:
@ -78,9 +78,9 @@ As an alternative to remote fencing, nodes can be configured to kill themselves
## Valid Fencing Conditions
The conditions in which a node can be successfully fenced are limited, and thus, autorecovery is limited only to those situations where a fence can succeed. In short, any situation whereby a node's OS is not responding normally, but its IPMI interface is still up and available, should succeed in a fence; in contrast, those where the IPMI interface is also unavailable will fail.
The conditions in which a node can be successfully fenced are limited, and thus, auto-recovery is limited only to those situations where a fence can succeed. In short, any situation whereby a node's OS is not responding normally, but its IPMI interface is still up and available, should succeed in a fence; in contrast, those where the IPMI interface is also unavailable will fail.
The following table covers some common scenarios, and whether fencing (and subsequent automatic recovery) can be exepected to occur.
The following table covers some common scenarios, and whether fencing (and subsequent automatic recovery) can be expected to occur.
| Situation | Fence? | Notes |
| --------- | --------------------- | ----- |
@ -98,4 +98,4 @@ Care should be taken to understand these scenarios and which situations can be r
## Future Development
Future versions of PVC may add support for additional fencing modes, for instance the ability for a fence to trigger a remote power device (switched PDU, etc.) or to detect more esoteric situations with the node power state via IPMI, as need requires. The author however believes that the current implementation satisfies the vast majority of potential situations for which autorecovery is beneficial and thus such work would not see much benefit, though he is open to changing his mind.
Future versions of PVC may add support for additional fencing modes, for instance the ability for a fence to trigger a remote power device (switched PDU, etc.) or to detect more esoteric situations with the node power state via IPMI, as need requires. The author however believes that the current implementation satisfies the vast majority of potential situations for which auto-recovery is beneficial and thus such work would not see much benefit, though he is open to changing his mind.

View File

@ -32,11 +32,11 @@ Since hypervisors are not affected by nor affect the quorum, any number can be p
## Fencing
PVC's [fencing mechanism](fencing.md) relies entirely on network access. First, network access is required for a node to updte its keepalives to the other nodes via Zookeeper. Second, IPMI out-of-band connectivity is required for the remaining nodes to fence a dead node.
PVC's [fencing mechanism](fencing.md) relies entirely on network access. First, network access is required for a node to update its keepalives to the other nodes via Zookeeper. Second, IPMI out-of-band connectivity is required for the remaining nodes to fence a dead node.
Georedundancy introduces significant complications to this process. First, it makes network cuts more likely, as the cut can occur outside of a single physical location. Second, the nature of the cut means, that without backup connectivity for the IPMI functionality, any fencing attempt would fail, thus preventing automatic recovery of VMs on the cut site.
Thus, in this design, several normally-possible recovert situations become harder. This does not preclude georedundancy in the author's opinion, but is something that must be more carefully considered, especially in network design.
Thus, in this design, several normally-possible recovery situations become harder. This does not preclude georedundancy in the author's opinion, but is something that must be more carefully considered, especially in network design.
## Network Speed
@ -48,10 +48,10 @@ The storage write process for PVC is heavily dependent on network latency. To ex
[![Ceph Write Process](images/pvc-ceph-write-process.png)](images/pvc-ceph-write-process.png)
As illustrated in this diagram, a write will only be accepted by the client once it has been successfully written to at least `min_copies` OSDs, as defined by the pool replication level (usually 2). Thus, the latency of network communications between the client and another node becomes a major factor in storage performance for writes, as the write cannot complete without at least 4x this latency (send, ack, recieve, ack). Significant physical distances and thus latencies (more than about 3ms) begin to introduce performance degredation, and latencies above about 5-10ms can result in a significant drop in wrie performance.
As illustrated in this diagram, a write will only be accepted by the client once it has been successfully written to at least `min_copies` OSDs, as defined by the pool replication level (usually 2). Thus, the latency of network communications between the client and another node becomes a major factor in storage performance for writes, as the write cannot complete without at least 4x this latency (send, ack, receive, ack). Significant physical distances and thus latencies (more than about 3ms) begin to introduce performance degradation, and latencies above about 5-10ms can result in a significant drop in write performance.
To combat this, georedundant nodes should be as close as possible, both geographically and physically.
## Suggested Scenarios
The author cannot cover every possible option, but georedundancy must be very carefully considered. Ultimately, PVC is designed to ease several very common failure modes (matenance outages, hardware failures, etc.); while rarer ones (fire, flood, meteor strike, etc.) might be possible to mitigate as well, these are not primary considerations in the design of PVC. It is thus up to each cluster administrator to define the correct balance between failure modes, risk, liability, and mitigations (e.g. remote backups) to best account for their particular needs.
The author cannot cover every possible option, but georedundancy must be very carefully considered. Ultimately, PVC is designed to ease several very common failure modes (maintenance outages, hardware failures, etc.); while rarer ones (fire, flood, meteor strike, etc.) might be possible to mitigate as well, these are not primary considerations in the design of PVC. It is thus up to each cluster administrator to define the correct balance between failure modes, risk, liability, and mitigations (e.g. remote backups) to best account for their particular needs.