13 Commits

Author SHA1 Message Date
Joshua Boniface ec2fd99eb6 Avoid errors if noout fails 2023-09-01 15:42:27 -04:00
Joshua Boniface b9f00e3faf Increase flush/unflush wait timeout
Bump this from 10 minutes (60 * 10 seconds) to 30 minutes (180 * 10
seconds) to ensure there is sufficient time for (relatively) large VMs
to migrate with (relatively) slow networking.
2023-09-01 15:42:26 -04:00
Joshua Boniface 4fe6204dfb Use wait on secondary and delay for 15 seconds 2023-09-01 15:42:26 -04:00
Joshua Boniface 2d9a5a9d31 Adjust ordering of flush task 2023-09-01 15:42:26 -04:00
Joshua Boniface 1cfbc25f37 Add norestart policy for apt updates 2023-09-01 15:42:25 -04:00
Joshua Boniface ccc6489512 Add README and daemon upgrade playbook, cleanups 2023-09-01 15:42:25 -04:00
Joshua Boniface f34b2a5f7e Add cleanup to update oneshot playbook 2023-09-01 15:42:25 -04:00
Joshua Boniface 0ddd11844e Reorder Ceph stop and lower some waits 2023-09-01 15:42:25 -04:00
Joshua Boniface cf609eb609 Add tasks to verify node has finished (un)flushing 2023-09-01 15:42:25 -04:00
Joshua Boniface c29cdd5305 Increase all wait timeouts to 30s
Ensure that even on slow(er) clusters, these timeouts have more time to
complete before proceeding so the task won't fail.
2023-09-01 15:42:24 -04:00
Joshua Boniface 750cb4b55c Disable pvc-flush service while rebooting
Prevents the flush daemon from starting on node boot, before the
playbook is actually ready to unflush the node.
2023-09-01 15:42:24 -04:00
Joshua Boniface cdc7e3377b Tweak oneshot script
Cleanly stop daemons; check if OSDs are back before continuing; wait
less
2023-09-01 15:42:24 -04:00
Joshua Boniface 9962ceaf0a Add cluster safe update playbook
This playbook will perform a oneshot upgrade of the systems in the
cluster, including performing a clean and safe reboot of the node(s) if
required (either due to services needing a restart, or the kernel
changing). It runs in serial=1 and only reboots if needed.
2023-09-01 15:42:24 -04:00
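
Commit b9f00e3faf above describes its wait budget as a count multiplied by a 10-second interval (60 * 10 seconds = 10 minutes, raised to 180 * 10 seconds = 30 minutes). The sketch below illustrates that arithmetic with a generic polling loop. It is not taken from the playbook itself: the names `wait_until` and `max_wait_seconds`, and the injectable `sleep` parameter, are hypothetical and exist only for illustration.

```python
import time
from typing import Callable


def max_wait_seconds(retries: int, delay: float) -> float:
    """Upper bound on time spent sleeping: one delay per failed attempt."""
    return retries * delay


def wait_until(
    check: Callable[[], bool],
    retries: int = 180,      # assumed default, mirroring "180 * 10 seconds"
    delay: float = 10.0,     # assumed default interval in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `retries` times, sleeping `delay` after each failure.

    Returns True as soon as check() succeeds, or False once all attempts
    have been used up.
    """
    for _ in range(retries):
        if check():
            return True
        sleep(delay)
    return False


if __name__ == "__main__":
    # Old budget versus new budget, in minutes.
    print(max_wait_seconds(60, 10) / 60)
    print(max_wait_seconds(180, 10) / 60)
```

Passing `sleep` as a parameter lets the loop be exercised without real delays, for example by recording the requested sleeps in a list. The time taken by `check()` itself is not counted, so the real worst case is somewhat longer than `retries * delay`.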