Commit Graph

36 Commits

Author SHA1 Message Date
Joshua Boniface d3391aa080 Move to new maintenance mode and check legacy first 2023-09-01 15:42:29 -04:00
Joshua Boniface 1f9a74301f Alter deb12 upgrade 2023-09-01 15:42:29 -04:00
Joshua Boniface 642813e4e3 Remove obsolete cset configurations 2023-09-01 15:42:29 -04:00
Joshua Boniface e3c1d28674 Add upgrade to Debian 12 playbook 2023-09-01 15:42:29 -04:00
Joshua Boniface 7e829f04ae Restore unknown state as not-reboot 2023-09-01 15:42:29 -04:00
Joshua Boniface 2c63500011 Split upgrade stage and add dpkg cleanup
Avoid problems if one or more nodes are upgrading libvirt/QEMU and live
migrations fail.
2023-09-01 15:42:29 -04:00
Joshua Boniface 7a0c596281 Add node daemon confirmation before continue 2023-09-01 15:42:29 -04:00
Joshua Boniface 3d4e66471e Trigger restart even with rc=3 2023-09-01 15:42:29 -04:00
Joshua Boniface cbae685b45 Ignore needrestart unknown case 2023-09-01 15:42:29 -04:00
Joshua Boniface 7cac7b26ce Ensure freshness check is proper 2023-09-01 15:42:28 -04:00
Joshua Boniface be091f66d4 Remove pvc-flush references
This service usually causes more problems than it solves, so it is being
removed in the next PVC version.
2023-09-01 15:42:28 -04:00
Joshua Boniface 5aeca53212 Add PostgreSQL cleanup to upgrade 2023-09-01 15:42:28 -04:00
Joshua Boniface d8cad85e91 Add oneshot playbook to reboot cluster 2023-09-01 15:42:28 -04:00
Joshua Boniface 9e20e47903 Update freshness checks 2023-09-01 15:42:28 -04:00
Joshua Boniface 5de3ab0c3a Move pvc maintenance to separate plays
This ensures that maintenance mode is enabled before all tasks and
disabled after all tasks, rather than toggled intermittently.
2023-09-01 15:42:28 -04:00
Joshua Boniface 94e9bf9133 Ignore errors during flush commands
These might inexplicably fail, but that is fine.
2023-09-01 15:42:27 -04:00
Joshua Boniface 197000a48d Revert "Add symlink for Ceph file pickup"
This reverts commit 3ac946bf2e.
2023-09-01 15:42:27 -04:00
Joshua Boniface 94c019f960 Add symlink for Ceph file pickup 2023-09-01 15:42:27 -04:00
Joshua Boniface 3161708593 Include another upgrade in deb11 playbook
Ensures that the system is fully updated after re-enabling the security
repository during the base run.
2023-09-01 15:42:27 -04:00
Joshua Boniface 593599c148 Add Debian 10 -> Debian 11 upgrade playbook 2023-09-01 15:42:27 -04:00
Joshua Boniface ec2fd99eb6 Avoid errors if noout fails 2023-09-01 15:42:27 -04:00
Joshua Boniface b9f00e3faf Increase flush/unflush wait timeout
Bump this from 10 minutes (60 * 10 seconds) to 30 minutes (180 * 10
seconds) to ensure there is sufficient time for (relatively) large VMs
to migrate with (relatively) slow networking.
2023-09-01 15:42:26 -04:00
Joshua Boniface 4fe6204dfb Use wait on secondary and delay for 15 seconds 2023-09-01 15:42:26 -04:00
Joshua Boniface 43d4f69608 Rename Daemon upgrade playbook to match 2023-09-01 15:42:26 -04:00
Joshua Boniface e55f465034 Reduce timeouts in upgrade playbook 2023-09-01 15:42:26 -04:00
Joshua Boniface 822e39b325 Fix name to be more clear 2023-09-01 15:42:26 -04:00
Joshua Boniface 2d9a5a9d31 Adjust ordering of flush task 2023-09-01 15:42:26 -04:00
Joshua Boniface 1cfbc25f37 Add norestart policy for apt updates 2023-09-01 15:42:25 -04:00
Joshua Boniface ccc6489512 Add README and daemon upgrade playbook, cleanups 2023-09-01 15:42:25 -04:00
Joshua Boniface f34b2a5f7e Add cleanup to update oneshot playbook 2023-09-01 15:42:25 -04:00
Joshua Boniface 0ddd11844e Reorder Ceph stop and lower some waits 2023-09-01 15:42:25 -04:00
Joshua Boniface cf609eb609 Add tasks to verify node has finished (un)flushing 2023-09-01 15:42:25 -04:00
Joshua Boniface c29cdd5305 Increase all wait timeouts to 30s
Ensure that even on slow(er) clusters, these timeouts have more time to
complete before proceeding so the task won't fail.
2023-09-01 15:42:24 -04:00
Joshua Boniface 750cb4b55c Disable pvc-flush service while rebooting
Prevents the flush daemon from starting on node boot, before the
playbook is actually ready to unflush the node.
2023-09-01 15:42:24 -04:00
Joshua Boniface cdc7e3377b Tweak oneshot script
Cleanly stop daemons; check if OSDs are back before continuing; wait less.
2023-09-01 15:42:24 -04:00
Joshua Boniface 9962ceaf0a Add cluster safe update playbook
This playbook will perform a oneshot upgrade of the systems in the
cluster, including performing a clean and safe reboot of the node(s) if
required (either due to services needing a restart, or the kernel
changing). It runs in serial=1 and only reboots if needed.
2023-09-01 15:42:24 -04:00