Joshua Boniface
15fc3261de
Ensures that configurations are always updated whenever the daemons are. This will be necessary for 0.9.83 with the fundamental change from pvcXd.yaml to pvc.conf configuration formats, while also ensuring that future daemon updates also include any configuration changes that may be pending in the group_vars.
PVC Oneshot Playbooks
This directory contains playbooks to assist in automating day-to-day maintenance of a PVC cluster. These playbooks can be used independently of the main pvc.yml and roles setup to automate tasks.
update-pvc-cluster.yml and reboot-pvc-cluster.yml
The update-pvc-cluster.yml playbook performs a sequential full upgrade on all nodes in a PVC cluster.
The reboot-pvc-cluster.yml playbook performs the same shutdown and restart steps as update-pvc-cluster.yml, but forces them for all hosts and omits the update part.
Running the Playbook
$ ansible-playbook -i hosts -l [cluster] update-pvc-cluster.yml
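When the playbook is launched from a script, it can help to build the command as an argument list and preview the matched hosts first. The following is a minimal sketch, not part of this repository: the helper name `build_playbook_command` and the group name `cluster1` are hypothetical, while `-i`, `-l`, and `--list-hosts` are standard `ansible-playbook` options.

```python
import shlex
from typing import List


def build_playbook_command(playbook: str, cluster: str,
                           inventory: str = "hosts",
                           preview: bool = False) -> List[str]:
    """Return the ansible-playbook argument list for one cluster."""
    argv = ["ansible-playbook", "-i", inventory, "-l", cluster, playbook]
    if preview:
        # --list-hosts prints the hosts matched by the limit and runs no tasks.
        argv.append("--list-hosts")
    return argv


# "cluster1" is a hypothetical inventory group name.
cmd = build_playbook_command("update-pvc-cluster.yml", "cluster1", preview=True)
print(shlex.join(cmd))
```

Once the preview shows the expected hosts, the same list built without `preview=True` could be passed to `subprocess.run(cmd, check=True)`.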
Caveats, Warnings, and Notes
- Ensure the cluster is in Optimal health before executing this playbook; all nodes should be up, reachable, and operating normally.
- Be prepared to intervene if step 9 (the reboot) times out; out-of-band (OOB) access may be required.
- This playbook is safe to run against a given host multiple times (e.g. to rerun after a failure); if a reboot is not required, it will not be performed.
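The reboot wait in step 9 is a bounded poll: keep checking whether the node has come back, and give up after a maximum wait (1800 seconds in this playbook). The sketch below illustrates that pattern only; it is not how the playbook itself is implemented. The function names, the 5-second interval, and the SSH port probe are all assumptions made for illustration.

```python
import socket
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               timeout: float = 1800.0,
               interval: float = 5.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False  # timed out: this is where manual or OOB intervention starts
        sleep(interval)


def ssh_port_open(host: str, port: int = 22) -> bool:
    """Assumed liveness probe: can a TCP connection to the SSH port be opened?"""
    try:
        with socket.create_connection((host, port), timeout=3):
            return True
    except OSError:
        return False
```

For a hypothetical node name, `wait_until(lambda: ssh_port_open("node1"))` returns `False` once the 1800 seconds are used up, which is the point at which an operator would reach for out-of-band access.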
Process and Steps
For each host in the cluster sequentially, do:
1. Enable cluster maintenance mode.
2. Perform a full apt update, upgrade, autoremove, and clean.
3. Clean up obsolete kernels (kernel-cleanup.sh), packages/updated configuration files (dpkg-cleanup.sh), and the apt archive.
4. Verify library freshness and kernel version; if these produce no warnings, skip the reboot.
5. Secondary the node, then wait 30 seconds.
6. Flush the node, wait for all VMs to migrate, then wait 15 seconds.
7. Stop the pvcnoded and zookeeper daemons, then wait 15 seconds.
8. Set Ceph OSD noout and stop all Ceph OSD, monitor, and manager processes, then wait 30 seconds.
9. Reboot the system and wait for it to come back up (maximum wait time of 1800 seconds).
10. Ensure all OSDs become active and all PGs recover, then unset Ceph OSD noout.
11. Unflush the node, wait for all VMs to migrate, then wait 30 seconds.
12. Reset any systemd failures.
13. Disable cluster maintenance mode, then wait 30 seconds.
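The per-host flow above can be modelled as a short plan with one conditional section. This is an illustrative sketch, not the playbook's actual task list. It makes two assumptions: a skipped reboot resumes at resetting systemd failures, and the "update part" dropped by reboot-pvc-cluster.yml is the apt, cleanup, and verification steps. The step strings are abbreviated from the list above.

```python
from typing import List

PRE = ["enable cluster maintenance mode"]
UPDATE = [
    "apt update, upgrade, autoremove, clean",
    "clean up obsolete kernels, packages, and the apt archive",
    "verify library freshness and kernel version",
]
REBOOT = [
    "secondary the node",
    "flush the node",
    "stop pvcnoded and zookeeper",
    "set Ceph OSD noout, stop Ceph processes",
    "reboot and wait (up to 1800 seconds)",
    "wait for OSDs and PGs, unset Ceph OSD noout",
    "unflush the node",
]
POST = ["reset systemd failures", "disable cluster maintenance mode"]


def update_plan(needs_reboot: bool) -> List[str]:
    """Steps for one host in the update flow."""
    # Assumption: when no reboot is needed, the flow resumes at the POST steps.
    return PRE + UPDATE + (REBOOT if needs_reboot else []) + POST


def reboot_plan() -> List[str]:
    """Steps for one host in the forced-reboot flow."""
    # Assumption: "without the update part" means the UPDATE steps are dropped.
    return PRE + REBOOT + POST
```

Keeping the reboot sequence as a single block makes the rerun behaviour easy to see: a second pass over an already-updated host yields the short plan, so no node is flushed or rebooted unnecessarily.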
upgrade-pvc-daemons.yml
This playbook performs a sequential upgrade of the PVC software daemons via apt on all nodes in a PVC cluster. This is a less invasive update process than the update-pvc-cluster.yml playbook as it does not flush or reboot the nodes, but does restart all PVC daemons (pvcnoded, pvcapid, and pvcworkerd).
Running the Playbook
$ ansible-playbook -i hosts -l [cluster] upgrade-pvc-daemons.yml
Caveats, Warnings, and Notes
- Ensure the cluster is in Optimal health before executing this playbook; all nodes should be up, reachable, and operating normally.
- This playbook is safe to run against a given host multiple times; if service restarts are not required, they will not be performed.
- This playbook should only be used in exceptional circumstances when performing a full update-pvc-cluster.yml would be too disruptive; it is always preferable to update all packages and reboot the nodes instead.
Process and Steps
For each node in the cluster sequentially, do:
1. Enable cluster maintenance mode.
2. Perform an apt update, and install the 4 PVC packages (pvc-client-cli, pvc-daemon-common, pvc-daemon-api, pvc-daemon-node).
3. Clean up the apt archive.
4. Verify if packages changed; if not, go to step 8 (skip restarts).
5. Secondary the node, then wait 30 seconds.
6. Restart both active PVC daemons (pvcworkerd, pvcnoded), then wait 60 seconds; since the node is not the primary coordinator, pvcapid will not be running.
7. Verify daemons are running.
8. Disable cluster maintenance mode, then wait 30 seconds.
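The jump from step 4 to step 8 and the one-node-at-a-time ordering can be sketched as follows. This is a hedged illustration of the control flow only: the function names and the node names are hypothetical, and the `changed` callback stands in for whatever the playbook uses to detect that packages changed.

```python
from typing import Callable, Dict, List

# Steps 5-7 (secondary the node, restart daemons, verify daemons) are skipped
# when step 4 finds that no packages changed.
RESTART_STEPS = {5, 6, 7}


def steps_for_node(packages_changed: bool) -> List[int]:
    """Return the step numbers (1-8) that run on a single node."""
    return [n for n in range(1, 9)
            if packages_changed or n not in RESTART_STEPS]


def rolling_upgrade(nodes: List[str],
                    changed: Callable[[str], bool]) -> Dict[str, List[int]]:
    """Visit nodes one at a time, in order, recording the steps each would run."""
    return {node: steps_for_node(changed(node)) for node in nodes}


# Hypothetical example: only node1 received new packages.
result = rolling_upgrade(["node1", "node2"], lambda node: node == "node1")
print(result)
```

Under these assumptions, node2 still enters and leaves maintenance mode (steps 1 and 8) but its daemons are left running, which matches the note that reruns do not restart services needlessly.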