Commit Graph

30 Commits

Author SHA1 Message Date
Joshua Boniface aa6488854a Add forced upgrade of vhostmd 2023-12-26 01:08:53 -05:00
Joshua Boniface 677287fd2e Add additional wait after stopping OSDs
Allows the Ceph cluster to properly reconcile first.
2023-10-24 10:42:15 -04:00
Joshua Boniface d0bcbf123f Move kernel cleanup to after reboot
Otherwise, kernel modules and the like might fail when the kernel package
is purged before the reboot, causing odd failures.
2023-10-24 10:41:47 -04:00
Joshua Boniface 7fe682aa60 Handle freshness for all 3 types separately
If microcode was missing, checking the other two would be UNKN and thus
would not trigger a restart. But if microcode *is* present, we want to
restart for either of the other two as well.

So separate into 3 distinct checks and restart if any one is changed.
2023-10-24 10:41:47 -04:00
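
The logic described in this commit can be sketched roughly as follows. This is a minimal illustration, not the playbook's actual code: the `Freshness` enum, the `should_restart` helper, and the check names `kernel` and `services` are assumptions made for the example (the commit message names only microcode).

```python
from enum import Enum


class Freshness(Enum):
    """Hypothetical result of one freshness check."""
    CURRENT = "current"
    CHANGED = "changed"
    UNKNOWN = "unknown"


def should_restart(checks: dict[str, Freshness]) -> bool:
    """Restart if any single check reports a change.

    Each check is evaluated on its own, so an UNKNOWN result for one
    type (for example, missing microcode) does not mask a change
    reported by another. UNKNOWN alone never triggers a restart.
    """
    return any(state is Freshness.CHANGED for state in checks.values())


# Illustrative check names; only "microcode" appears in the commit message.
example = {
    "microcode": Freshness.UNKNOWN,
    "kernel": Freshness.CHANGED,
    "services": Freshness.CURRENT,
}
print(should_restart(example))
```

Evaluating the checks independently means a missing data source for one type only removes that type from the decision rather than suppressing the whole result.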
Joshua Boniface 6d05f40242 Fix import for newer Ansible versions 2023-09-18 09:42:01 -04:00
Joshua Boniface 7a2d5ac0c4 Ensure PVC daemons are updated before reboots 2023-09-01 15:42:30 -04:00
Joshua Boniface d3391aa080 Move to new maintenance mode and check legacy first 2023-09-01 15:42:29 -04:00
Joshua Boniface 7e829f04ae Restore unknown state as not-reboot 2023-09-01 15:42:29 -04:00
Joshua Boniface 2c63500011 Split upgrade stage and add dpkg cleanup
Avoid problems if one or more nodes are upgrading libvirt/QEMU and live
migrations fail.
2023-09-01 15:42:29 -04:00
Joshua Boniface 7a0c596281 Add node daemon confirmation before continue 2023-09-01 15:42:29 -04:00
Joshua Boniface 3d4e66471e Trigger restart even with rc=3 2023-09-01 15:42:29 -04:00
Joshua Boniface cbae685b45 Ignore needrestart unknown case 2023-09-01 15:42:29 -04:00
Joshua Boniface 7cac7b26ce Ensure freshness check is proper 2023-09-01 15:42:28 -04:00
Joshua Boniface be091f66d4 Remove pvc-flush references
This service usually causes more problems than it solves, so it is being
removed in the next PVC version.
2023-09-01 15:42:28 -04:00
Joshua Boniface 9e20e47903 Update freshness checks 2023-09-01 15:42:28 -04:00
Joshua Boniface 5de3ab0c3a Move pvc maintenance to separate plays
This ensures that maintenance mode is turned on before all tasks and
turned off after all tasks, rather than intermittently.
2023-09-01 15:42:28 -04:00
Joshua Boniface 94e9bf9133 Ignore errors during flush commands
These might inexplicably fail, but that is fine.
2023-09-01 15:42:27 -04:00
Joshua Boniface ec2fd99eb6 Avoid errors if noout fails 2023-09-01 15:42:27 -04:00
Joshua Boniface b9f00e3faf Increase flush/unflush wait timeout
Bump this from 10 minutes (60 * 10 seconds) to 30 minutes (180 * 10
seconds) to ensure there is sufficient time for (relatively) large VMs
to migrate with (relatively) slow networking.
2023-09-01 15:42:26 -04:00
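
The arithmetic in this commit message can be checked with a short sketch. The function and parameter names (`total_wait_seconds`, `wait_until`, `retries`, `delay_seconds`) are assumptions for illustration and are not taken from the playbook; the polling helper only models a generic retry loop with a fixed delay.

```python
import time
from typing import Callable


def total_wait_seconds(retries: int, delay_seconds: int) -> int:
    """Upper bound on the time spent waiting between retries."""
    return retries * delay_seconds


def wait_until(
    condition: Callable[[], bool],
    retries: int,
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `condition` up to `retries` times, sleeping between attempts.

    Returns True as soon as the condition holds, or False if every
    attempt fails.
    """
    for _ in range(retries):
        if condition():
            return True
        sleep(delay_seconds)
    return False


old_limit = total_wait_seconds(60, 10)    # 600 seconds = 10 minutes
new_limit = total_wait_seconds(180, 10)   # 1800 seconds = 30 minutes
print(old_limit // 60, new_limit // 60)
```

Keeping the delay at 10 seconds and tripling the retry count raises the ceiling without slowing down the common case, since the loop still exits as soon as the condition is met.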
Joshua Boniface 4fe6204dfb Use wait on secondary and delay for 15 seconds 2023-09-01 15:42:26 -04:00
Joshua Boniface 2d9a5a9d31 Adjust ordering of flush task 2023-09-01 15:42:26 -04:00
Joshua Boniface 1cfbc25f37 Add norestart policy for apt updates 2023-09-01 15:42:25 -04:00
Joshua Boniface ccc6489512 Add README and daemon upgrade playbook, cleanups 2023-09-01 15:42:25 -04:00
Joshua Boniface f34b2a5f7e Add cleanup to update oneshot playbook 2023-09-01 15:42:25 -04:00
Joshua Boniface 0ddd11844e Reorder Ceph stop and lower some waits 2023-09-01 15:42:25 -04:00
Joshua Boniface cf609eb609 Add tasks to verify node has finished (un)flushing 2023-09-01 15:42:25 -04:00
Joshua Boniface c29cdd5305 Increase all wait timeouts to 30s
Ensure that even on slow(er) clusters, these waits have more time to
complete before proceeding so the task won't fail.
2023-09-01 15:42:24 -04:00
Joshua Boniface 750cb4b55c Disable pvc-flush service while rebooting
Prevents the flush daemon from starting on node boot, before the
playbook is actually ready to unflush the node.
2023-09-01 15:42:24 -04:00
Joshua Boniface cdc7e3377b Tweak oneshot script
Cleanly stop daemons; check if OSDs are back before continuing; wait
less
2023-09-01 15:42:24 -04:00
Joshua Boniface 9962ceaf0a Add cluster safe update playbook
This playbook will perform a oneshot upgrade of the systems in the
cluster, including performing a clean and safe reboot of the node(s) if
required (either due to services needing a restart, or the kernel
changing). It runs in serial=1 and only reboots if needed.
2023-09-01 15:42:24 -04:00
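
The flow this commit describes (one node at a time, reboot only when required) might look roughly like the sketch below. It models the control flow only and is not the playbook itself: `rolling_update` and its `upgrade`, `needs_reboot`, and `reboot` callbacks are hypothetical names introduced for the example.

```python
from typing import Callable, Iterable


def rolling_update(
    nodes: Iterable[str],
    upgrade: Callable[[str], None],
    needs_reboot: Callable[[str], bool],
    reboot: Callable[[str], None],
) -> list[str]:
    """Upgrade nodes strictly one at a time and reboot only when needed.

    Processing a single node per iteration mirrors the serial=1
    behaviour: the next node is not touched until the current one has
    finished its upgrade and, if required, its reboot.
    Returns the list of nodes that were rebooted.
    """
    rebooted = []
    for node in nodes:
        upgrade(node)
        # Hypothetical check: services needing a restart, or a changed kernel.
        if needs_reboot(node):
            reboot(node)
            rebooted.append(node)
    return rebooted
```

A caller would supply the three callbacks, for example thin wrappers around whatever tooling actually performs the upgrade and reboot on each node.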