Joshua Boniface
# PVC Oneshot Playbooks

This directory contains playbooks to assist in automating day-to-day maintenance of a PVC cluster. These playbooks can be used to automate tasks independently of the main `pvc.yml` and roles setup.
## `update-pvc-cluster.yml` and `reboot-pvc-cluster.yml`

The `update-pvc-cluster.yml` playbook performs a sequential full upgrade on all nodes in a PVC cluster.

The `reboot-pvc-cluster.yml` playbook performs the same shutdown and restart steps as `update-pvc-cluster.yml`. It always performs them on every host and skips the update steps.
### Running the Playbook

`$ ansible-playbook -i hosts -l [cluster] update-pvc-cluster.yml`
### Caveats, Warnings, and Notes

- Ensure the cluster is in Optimal health before executing this playbook; all nodes should be up, reachable, and operating normally
- Be prepared to intervene if step 9 (the reboot) times out; out-of-band (OOB) access may be required
- This playbook is safe to run against a given host multiple times (e.g. to rerun after a failure); if a reboot is not required, it will not be performed
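Reruns are safe because of step 4 of the process, which checks library freshness and the kernel version before deciding whether to reboot. The playbook's actual checks are not shown in this README. The following is a minimal bash sketch that makes two assumptions. First, "kernel version" means comparing the running kernel (`uname -r`) against the newest installed kernel. Second, "library freshness" means finding processes that still map deleted shared libraries. All function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the step 4 reboot decision; not the playbook's actual tasks.

# Print the newest of the given kernel version strings (GNU `sort -V` ordering).
newest_kernel() {
    printf '%s\n' "$@" | sort -V | tail -n 1
}

# Count processes that still map a shared library that was deleted on disk,
# i.e. replaced by an upgrade while the process kept the old copy loaded.
# Reading other users' /proc/<pid>/maps generally requires root.
count_stale_lib_processes() {
    grep -l '\.so.* (deleted)$' /proc/[0-9]*/maps 2>/dev/null | wc -l
}

# Usage: reboot_needed <running-kernel> <stale-process-count> <installed-kernel>...
# Returns 0 if a reboot is needed, 1 if it can be skipped.
reboot_needed() {
    local running="$1" stale="$2"
    shift 2
    if [[ "$running" != "$(newest_kernel "$@")" ]]; then
        return 0    # a newer kernel is installed than the one running
    fi
    if (( stale > 0 )); then
        return 0    # some processes still use replaced libraries
    fi
    return 1
}
```

For example, `reboot_needed 6.1.0-17-amd64 0 6.1.0-17-amd64 6.1.0-18-amd64` returns 0 (reboot needed), because a newer kernel is installed than the one running. The sketch uses version sorting rather than plain string sorting so that `6.1.0-18` correctly ranks above `6.1.0-9`.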
### Process and Steps

For each host in the cluster sequentially, do:

1. Enable cluster maintenance mode
2. Perform a full apt update, upgrade, autoremove, and clean
3. Clean up obsolete kernels (`kernel-cleanup.sh`), packages/updated configuration files (`dpkg-cleanup.sh`), and the apt archive
4. Verify library freshness and kernel version; if these produce no warnings, go to step 14 (skip reboot)
5. Secondary the node, then wait 30 seconds
6. Flush the node, wait for all VMs to migrate, then wait 15 seconds
7. Stop the `pvcnoded` and `zookeeper` daemons, then wait 15 seconds
8. Set Ceph OSD `noout` and stop all Ceph OSD, monitor, and manager processes, then wait 30 seconds
9. Reboot the system and wait for it to come back up (maximum wait time of 1800 seconds)
10. Ensure all OSDs become active and all PGs recover, then unset Ceph OSD `noout`
11. Unflush the node, wait for all VMs to migrate, then wait 30 seconds
12. Reset any systemd failures
13. Disable cluster maintenance mode, then wait 30 seconds
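Steps 8 and 10 use Ceph's `noout` flag, which stops the cluster from marking the rebooting node's OSDs out and rebalancing data away from them. The bash functions below are a hedged sketch of manual equivalents, not the playbook's tasks, and they only print commands unless `DRY_RUN=0` is set. The `ceph osd set/unset noout` and `ceph pg stat` commands and the `ceph-*.target` systemd units are standard Ceph. The helper names are hypothetical, and so is the exact `ceph pg stat` summary format that `all_pgs_clean` parses, which may differ between Ceph releases. The sketch checks only the PG half of step 10, not OSD state.

```bash
#!/usr/bin/env bash
# Hypothetical manual equivalent of steps 8 and 10; not the playbook's actual tasks.
# Commands are only printed unless DRY_RUN=0 is set in the environment.

run() {
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Step 8: stop Ceph from marking this node's OSDs out, then stop local Ceph daemons.
ceph_pre_reboot() {
    run ceph osd set noout
    run systemctl stop ceph-osd.target ceph-mon.target ceph-mgr.target
}

# Return 0 if a `ceph pg stat` summary line reports every PG as active+clean.
# ASSUMPTION: the plain-text form "<N> pgs: <N> active+clean; ..."; any other
# state mix (even active+clean+scrubbing) is treated as not yet recovered.
all_pgs_clean() {
    local line="$1" total states
    total="${line%% pgs:*}"
    states="${line#*pgs: }"
    states="${states%%;*}"
    [[ "$states" == "$total active+clean" ]]
}

# Step 10: wait until all PGs are active+clean, then unset noout.
ceph_post_reboot() {
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
        printf '+ %s\n' "wait until 'ceph pg stat' reports all PGs active+clean"
    else
        until all_pgs_clean "$(ceph pg stat)"; do
            sleep 10
        done
    fi
    run ceph osd unset noout
}
```

Dry-run is the default so that sourcing and calling these functions on a healthy node cannot accidentally stop its OSDs. `DRY_RUN=0 ceph_pre_reboot`, run as root, would execute the commands.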
## `update-pvc-daemons.yml`

This playbook performs a sequential upgrade of the PVC software daemons via apt on all nodes in a PVC cluster. This is a less invasive update process than the `update-pvc-cluster.yml` playbook, because it does not flush or reboot the nodes. It does, however, restart all PVC daemons (`pvcnoded`, `pvcapid`, and `pvcapid-worker`).

### Running the Playbook

`$ ansible-playbook -i hosts -l [cluster] update-pvc-daemons.yml`
### Caveats, Warnings, and Notes

- Ensure the cluster is in Optimal health before executing this playbook; all nodes should be up, reachable, and operating normally
- This playbook is safe to run against a given host multiple times; if service restarts are not required, they will not be performed
- This playbook should only be used in exceptional circumstances, when a full `update-pvc-cluster.yml` run would be too disruptive; it is always preferable to update all packages and reboot the nodes instead
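Restarts are skipped when nothing changed, which corresponds to step 4 of the process below. One way to express that check by hand is to snapshot the installed versions of the four PVC packages with `dpkg-query` before and after the install and compare the two snapshots. This is a minimal bash sketch under that assumption, not the playbook's implementation. `PVC_PACKAGES`, `package_versions`, and `changed_packages` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the "did the packages change?" check; not the playbook's tasks.

PVC_PACKAGES=(pvc-client-cli pvc-daemon-common pvc-daemon-api pvc-daemon-node)

# Print one "package=version" line per installed package given as an argument.
package_versions() {
    dpkg-query -W -f='${Package}=${Version}\n' "$@" 2>/dev/null
}

# Given "before" and "after" snapshots from package_versions, print the names of
# packages whose line exists only in "after" (newly installed or upgraded).
changed_packages() {
    comm -13 <(printf '%s\n' "$1" | sort) <(printf '%s\n' "$2" | sort) | cut -d= -f1
}
```

A run would proceed in four steps:

1. Take `before="$(package_versions "${PVC_PACKAGES[@]}")"`.
2. Run `apt-get update` and `apt-get install -y "${PVC_PACKAGES[@]}"`.
3. Take an `after` snapshot the same way.
4. Compare them. An empty result from `changed_packages "$before" "$after"` corresponds to skipping ahead to the final step.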
### Process and Steps

For each node in the cluster sequentially, do:

1. Enable cluster maintenance mode
2. Perform an apt update and install the 4 PVC packages (`pvc-client-cli`, `pvc-daemon-common`, `pvc-daemon-api`, `pvc-daemon-node`)
3. Clean up the apt archive
4. Check whether any packages changed; if not, go to step 8 (skip restarts)
5. Secondary the node, then wait 30 seconds
6. Restart both active PVC daemons (`pvcapid-worker`, `pvcnoded`), then wait 60 seconds; since the node is not the primary coordinator, `pvcapid` will not be running
7. Verify that the daemons are running
8. Disable cluster maintenance mode, then wait 30 seconds
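Steps 6 and 7 amount to restarting two services and confirming they stayed up. The following minimal bash sketch assumes the daemons run as systemd units named `pvcapid-worker.service` and `pvcnoded.service`. These unit names are inferred from the daemon names, not confirmed by this README, and `PVC_UNITS`, `restart_pvc_daemons`, and `daemons_running` are hypothetical. Restart commands are only printed unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of steps 6 and 7; unit names are ASSUMED from the daemon names.
# Restart commands are only printed unless DRY_RUN=0 is set in the environment.

PVC_UNITS=(pvcapid-worker.service pvcnoded.service)

run() {
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Step 6: restart the two active daemons, then give them time to settle.
restart_pvc_daemons() {
    run systemctl restart "${PVC_UNITS[@]}"
    run sleep 60
}

# Step 7: return 0 only if every given unit is active (read-only, so not dry-run gated).
daemons_running() {
    local unit
    for unit in "$@"; do
        systemctl is-active --quiet "$unit" || return 1
    done
    return 0
}
```

After a real restart, `daemons_running "${PVC_UNITS[@]}" || echo "a PVC daemon is not running"` reports a failed start. `pvcapid` is deliberately absent from the list because it does not run on a non-primary node.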