Joshua M. Boniface 2c80c187c3 Include another upgrade in deb11 playbook

Ensures that the system is fully updated after re-enabling the security
repository during the base run.

2021-10-10 05:10:57 -04:00

PVC Oneshot Playbooks

This directory contains playbooks that automate day-to-day maintenance of a PVC cluster. They can be used independently of the main pvc.yml and roles setup.

`update-pvc-cluster.yml`

This playbook performs a sequential full upgrade on all nodes in a PVC cluster.

Running the Playbook

$ ansible-playbook -i hosts -l [cluster] update-pvc-cluster.yml

Caveats, Warnings, and Notes

* Ensure the cluster is in Optimal health before executing this playbook; all nodes should be up, reachable, and operating normally.
* Be prepared to intervene if step 9 (the reboot) times out; out-of-band (OOB) access may be required.
* This playbook is safe to run against a given host multiple times (e.g. to rerun after a failure); if a reboot is not required, it will not be performed.
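
Before a real run, it can help to confirm which hosts the `-l` limit matches and that the playbook parses. The following is a minimal sketch, not part of the repository: the helper name `staged_run_commands` is hypothetical, and it only prints the commands so that each one can be reviewed and run by hand. `--list-hosts` and `--syntax-check` are standard `ansible-playbook` options.

```shell
#!/bin/sh
# Hypothetical helper: print a cautious, staged sequence of commands
# for running one playbook against one cluster. Nothing is executed.
# "hosts" is the inventory file name used elsewhere in this README.
staged_run_commands() {
    cluster="$1"
    playbook="$2"
    # 1. Show which hosts the limit actually matches.
    echo "ansible-playbook -i hosts -l $cluster $playbook --list-hosts"
    # 2. Confirm the playbook parses without running any tasks.
    echo "ansible-playbook -i hosts -l $cluster $playbook --syntax-check"
    # 3. The real run, exactly as documented above.
    echo "ansible-playbook -i hosts -l $cluster $playbook"
}
```

For example, `staged_run_commands cluster1 update-pvc-cluster.yml` prints three lines, where `cluster1` stands in for an inventory group name.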

Process and Steps

For each host in the cluster sequentially, do:

1. Enable cluster maintenance mode
2. Perform a full apt update, upgrade, autoremove, and clean
3. Clean up obsolete kernels (kernel-cleanup.sh), packages/updated configuration files (dpkg-cleanup.sh), and the apt archive
4. Verify library freshness and kernel version; if these produce no warnings, go to step 14 (skip reboot)
5. Secondary the node, then wait 30 seconds
6. Flush the node, wait for all VMs to migrate, then wait 15 seconds
7. Stop and disable the pvc-flush daemon, stop the pvcnoded and zookeeper daemons, then wait 15 seconds
8. Set Ceph OSD noout and stop all Ceph OSD, monitor, and manager processes, then wait 30 seconds
9. Reboot the system and wait for it to come back up (maximum wait time of 1800 seconds)
10. Ensure all OSDs become active and all PGs recover, then unset Ceph OSD noout
11. Unflush the node, wait for all VMs to migrate, then wait 30 seconds
12. Start and enable the pvc-flush daemon
13. Reset any systemd failures
14. Disable cluster maintenance mode, then wait 30 seconds
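
The reboot decision in the kernel-version check can be illustrated with a small shell sketch. This is an assumption about how such a check might work, not the playbook's actual implementation: the function names are hypothetical, and it assumes Debian-style kernel images named `vmlinuz-<version>` and a `sort` that supports `-V` (as GNU coreutils does).

```shell
#!/bin/sh
# Hypothetical helper: print the newest kernel version found in a boot
# directory (default /boot), based on vmlinuz-<version> file names.
newest_installed_kernel() {
    boot_dir="${1:-/boot}"
    for f in "$boot_dir"/vmlinuz-*; do
        [ -e "$f" ] || continue          # glob matched nothing
        echo "${f##*/vmlinuz-}"          # strip path and prefix
    done | sort -V | tail -n 1           # version sort, keep the highest
}

# Hypothetical helper: succeed (exit 0) when the running kernel differs
# from the newest installed one, i.e. a reboot would be needed.
kernel_needs_reboot() {
    running="$1"
    newest="$2"
    [ -n "$newest" ] && [ "$running" != "$newest" ]
}
```

On a node, this could be used as `kernel_needs_reboot "$(uname -r)" "$(newest_installed_kernel)" && echo "reboot required"`. Version sorting matters here: a plain lexical sort would rank `5.10.0-9` above `5.10.0-10`.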

`update-pvc-daemons.yml`

This playbook performs a sequential upgrade of the PVC software daemons via apt on all nodes in a PVC cluster. It is less invasive than the update-pvc-cluster.yml playbook because it does not flush or reboot the nodes, but it does restart all PVC daemons (pvcnoded, pvcapid, and pvcapid-worker).

Running the Playbook

$ ansible-playbook -i hosts -l [cluster] update-pvc-daemons.yml

Caveats, Warnings, and Notes

* Ensure the cluster is in Optimal health before executing this playbook; all nodes should be up, reachable, and operating normally.
* This playbook is safe to run against a given host multiple times; if service restarts are not required, they will not be performed.
* This playbook should only be used in exceptional circumstances, when performing a full update-pvc-cluster.yml run would be too disruptive; it is always preferable to update all packages and reboot the nodes instead.

Process and Steps

For each node in the cluster sequentially, do:

1. Enable cluster maintenance mode
2. Perform an apt update, and install the four PVC packages (pvc-client-cli, pvc-daemon-common, pvc-daemon-api, pvc-daemon-node)
3. Clean up the apt archive
4. Check whether any packages changed; if not, go to step 8 (skip restarts)
5. Secondary the node, then wait 30 seconds
6. Restart both active PVC daemons (pvcapid-worker, pvcnoded), then wait 60 seconds; since the node is not the primary coordinator, pvcapid will not be running
7. Verify daemons are running
8. Disable cluster maintenance mode, then wait 30 seconds
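
The daemon verification step can also be repeated by hand after a run. The sketch below is illustrative only and is not taken from the playbook: the function name `verify_daemons` is hypothetical, and it assumes the systemd unit names match the daemon names given above. The `SYSTEMCTL` variable is an assumption added so the check can be pointed at a stub when testing.

```shell
#!/bin/sh
# Hypothetical helper: check that every named systemd unit is active.
# Prints each inactive unit to stderr and returns non-zero if any failed.
# SYSTEMCTL may be overridden (e.g. with a stub) for testing.
verify_daemons() {
    failed=0
    for unit in "$@"; do
        if ! ${SYSTEMCTL:-systemctl} is-active --quiet "$unit"; then
            echo "not active: $unit" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

On a non-primary node, `verify_daemons pvcnoded pvcapid-worker` would cover the two daemons restarted by the playbook; pvcapid is left out because it is not expected to be running there.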