Joshua Boniface
ec2fd99eb6
Avoid errors if noout fails
2023-09-01 15:42:27 -04:00
Joshua Boniface
b9f00e3faf
Increase flush/unflush wait timeout
Bump this from 10 minutes (60 * 10 seconds) to 30 minutes (180 * 10
seconds) to ensure there is sufficient time for (relatively) large VMs
to migrate with (relatively) slow networking.
2023-09-01 15:42:26 -04:00
Joshua Boniface
4fe6204dfb
Use wait on secondary and delay for 15 seconds
2023-09-01 15:42:26 -04:00
Joshua Boniface
43d4f69608
Rename Daemon upgrade playbook to match
2023-09-01 15:42:26 -04:00
Joshua Boniface
e55f465034
Reduce timeouts in upgrade playbook
2023-09-01 15:42:26 -04:00
Joshua Boniface
822e39b325
Fix name to be more clear
2023-09-01 15:42:26 -04:00
Joshua Boniface
2d9a5a9d31
Adjust ordering of flush task
2023-09-01 15:42:26 -04:00
Joshua Boniface
1cfbc25f37
Add norestart policy for apt updates
2023-09-01 15:42:25 -04:00
Joshua Boniface
ccc6489512
Add README and daemon upgrade playbook, cleanups
2023-09-01 15:42:25 -04:00
Joshua Boniface
f34b2a5f7e
Add cleanup to update oneshot playbook
2023-09-01 15:42:25 -04:00
Joshua Boniface
0ddd11844e
Reorder Ceph stop and lower some waits
2023-09-01 15:42:25 -04:00
Joshua Boniface
cf609eb609
Add tasks to verify node has finished (un)flushing
2023-09-01 15:42:25 -04:00
Joshua Boniface
c29cdd5305
Increase all wait timeouts to 30s
Ensure that even on slow(er) clusters, these waits have more time to
complete before the playbook proceeds, so the task won't fail.
2023-09-01 15:42:24 -04:00
Joshua Boniface
750cb4b55c
Disable pvc-flush service while rebooting
Prevents the flush daemon from starting on node boot, before the
playbook is actually ready to unflush the node.
2023-09-01 15:42:24 -04:00
Joshua Boniface
cdc7e3377b
Tweak oneshot script
Cleanly stop daemons; check if OSDs are back before continuing; wait
less
2023-09-01 15:42:24 -04:00
Joshua Boniface
9962ceaf0a
Add cluster safe update playbook
This playbook will perform a oneshot upgrade of the systems in the
cluster, including performing a clean and safe reboot of the node(s) if
required (either due to services needing a restart, or the kernel
changing). It runs in serial=1 and only reboots if needed.
2023-09-01 15:42:24 -04:00
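
Note on the wait timeouts mentioned in the commits above: the flush/unflush change describes the limit as a retry count multiplied by a fixed delay (60 * 10 seconds before, 180 * 10 seconds after). The playbook tasks themselves are not shown here, so the following is only a minimal Python sketch of that retry-and-delay pattern. The names wait_until, check, and sleep are hypothetical and are not taken from the repository.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    retries: int = 180,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() up to `retries` times, sleeping `delay` seconds after each failure.

    Hypothetical helper: it only illustrates the retries * delay arithmetic
    from the commit message, not the playbook's actual tasks.
    """
    for _ in range(retries):
        if check():
            return True
        sleep(delay)
    return False


def max_wait_seconds(retries: int, delay: float) -> float:
    """Worst-case time spent sleeping before wait_until() gives up."""
    return retries * delay


# Old limit: 60 retries * 10 s = 600 s (10 minutes).
# New limit: 180 retries * 10 s = 1800 s (30 minutes).
old_limit = max_wait_seconds(60, 10)
new_limit = max_wait_seconds(180, 10)
```

Under these assumptions, raising only the retry count leaves the polling interval unchanged, so a node that finishes flushing quickly is still detected within one delay period, and only the give-up point moves from 10 to 30 minutes.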