28 Commits

Author SHA1 Message Date
Joshua Boniface d0bcbf123f Move kernel cleanup to after reboot
Otherwise, modules etc. might fail when the kernel package is purged
before the reboot, causing odd failures.
2023-10-24 10:41:47 -04:00
Joshua Boniface 7fe682aa60 Handle freshness for all 3 types separately
If microcode was missing, checking the other two would be UNKN and thus
not restart. But, if microcode *is* present, we want to restart for
either of the other two as well.

So separate into 3 distinct checks and restart if any one is changed.
2023-10-24 10:41:47 -04:00
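The decision rule described in this commit can be sketched roughly as follows. This is only an illustration, not the playbook's actual implementation: the `Freshness` enum, the `needs_restart` helper, and the check names other than microcode are hypothetical. It assumes each of the three checks reports its own state, and that an unknown state on its own does not trigger a restart (consistent with the earlier "Restore unknown state as not-reboot" commit).

```python
from enum import Enum


class Freshness(Enum):
    """Hypothetical per-check result; the names are illustrative assumptions."""
    CURRENT = "current"
    CHANGED = "changed"
    UNKNOWN = "unknown"


def needs_restart(checks: dict[str, Freshness]) -> bool:
    """Return True if any single check reports a change.

    Each check is evaluated on its own, so an UNKNOWN result for one
    check (e.g. missing microcode) cannot mask a CHANGED result from
    another. UNKNOWN alone never triggers a restart.
    """
    return any(state is Freshness.CHANGED for state in checks.values())


# Microcode missing (unknown), but one of the other checks changed.
# The keys "check_a" and "check_b" are placeholders for the other two types.
mixed = {
    "microcode": Freshness.UNKNOWN,
    "check_a": Freshness.CHANGED,
    "check_b": Freshness.CURRENT,
}

# Nothing changed; one check is unknown.
quiet = {
    "microcode": Freshness.UNKNOWN,
    "check_a": Freshness.CURRENT,
    "check_b": Freshness.CURRENT,
}
```

With this rule, `needs_restart(mixed)` is true even though microcode is unknown, while `needs_restart(quiet)` is false.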
Joshua Boniface 6d05f40242 Fix import for newer Ansible versions 2023-09-18 09:42:01 -04:00
Joshua Boniface 7a2d5ac0c4 Ensure PVC daemons are updated before reboots 2023-09-01 15:42:30 -04:00
Joshua Boniface d3391aa080 Move to new maintenance mode and check legacy first 2023-09-01 15:42:29 -04:00
Joshua Boniface 7e829f04ae Restore unknown state as not-reboot 2023-09-01 15:42:29 -04:00
Joshua Boniface 2c63500011 Split upgrade stage and add dpkg cleanup
Avoid problems if one or more nodes are upgrading libvirt/QEMU and live
migrations fail.
2023-09-01 15:42:29 -04:00
Joshua Boniface 7a0c596281 Add node daemon confirmation before continue 2023-09-01 15:42:29 -04:00
Joshua Boniface 3d4e66471e Trigger restart even with rc=3 2023-09-01 15:42:29 -04:00
Joshua Boniface cbae685b45 Ignore needrestart unknown case 2023-09-01 15:42:29 -04:00
Joshua Boniface 7cac7b26ce Ensure freshness check is proper 2023-09-01 15:42:28 -04:00
Joshua Boniface be091f66d4 Remove pvc-flush references
This service usually causes more problems than it solves, so it is being
removed in the next PVC version.
2023-09-01 15:42:28 -04:00
Joshua Boniface 9e20e47903 Update freshness checks 2023-09-01 15:42:28 -04:00
Joshua Boniface 5de3ab0c3a Move pvc maintenance to separate plays
This ensures that maintenance mode is turned on before all tasks and
turned off after all tasks, rather than intermittently.
2023-09-01 15:42:28 -04:00
Joshua Boniface 94e9bf9133 Ignore errors during flush commands
These might inexplicably fail, but that is fine.
2023-09-01 15:42:27 -04:00
Joshua Boniface ec2fd99eb6 Avoid errors if noout fails 2023-09-01 15:42:27 -04:00
Joshua Boniface b9f00e3faf Increase flush/unflush wait timeout
Bump this from 10 minutes (60 * 10 seconds) to 30 minutes (180 * 10
seconds) to ensure there is sufficient time for (relatively) large VMs
to migrate with (relatively) slow networking.
2023-09-01 15:42:26 -04:00
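The arithmetic in this commit message can be checked with a short sketch. It assumes the wait is a simple poll loop whose worst-case duration is the retry count multiplied by the delay between attempts; the function name `total_wait_seconds` is hypothetical and not taken from the playbook.

```python
def total_wait_seconds(retries: int, delay_seconds: int) -> int:
    """Worst-case wait for a poll loop: number of retries times the delay."""
    return retries * delay_seconds


# Values quoted in the commit message.
old_wait = total_wait_seconds(retries=60, delay_seconds=10)
new_wait = total_wait_seconds(retries=180, delay_seconds=10)

# Convert to minutes for comparison with the commit message.
old_minutes = old_wait // 60
new_minutes = new_wait // 60
```

Here `old_minutes` works out to 10 and `new_minutes` to 30, matching the figures stated in the message.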
Joshua Boniface 4fe6204dfb Use wait on secondary and delay for 15 seconds 2023-09-01 15:42:26 -04:00
Joshua Boniface 2d9a5a9d31 Adjust ordering of flush task 2023-09-01 15:42:26 -04:00
Joshua Boniface 1cfbc25f37 Add norestart policy for apt updates 2023-09-01 15:42:25 -04:00
Joshua Boniface ccc6489512 Add README and daemon upgrade playbook, cleanups 2023-09-01 15:42:25 -04:00
Joshua Boniface f34b2a5f7e Add cleanup to update oneshot playbook 2023-09-01 15:42:25 -04:00
Joshua Boniface 0ddd11844e Reorder Ceph stop and lower some waits 2023-09-01 15:42:25 -04:00
Joshua Boniface cf609eb609 Add tasks to verify node has finished (un)flushing 2023-09-01 15:42:25 -04:00
Joshua Boniface c29cdd5305 Increase all wait timeouts to 30s
Ensure that even on slow(er) clusters, these timeouts have more time to
complete before proceeding so the task won't fail.
2023-09-01 15:42:24 -04:00
Joshua Boniface 750cb4b55c Disable pvc-flush service while rebooting
Prevents the flush daemon from starting on node boot, before the
playbook is actually ready to unflush the node.
2023-09-01 15:42:24 -04:00
Joshua Boniface cdc7e3377b Tweak oneshot script
Cleanly stop daemons; check if OSDs are back before continuing; wait
less
2023-09-01 15:42:24 -04:00
Joshua Boniface 9962ceaf0a Add cluster safe update playbook
This playbook will perform a oneshot upgrade of the systems in the
cluster, including performing a clean and safe reboot of the node(s) if
required (either due to services needing a restart, or the kernel
changing). It runs in serial=1 and only reboots if needed.
2023-09-01 15:42:24 -04:00