Joshua Boniface
e26a2f3ca5
Update aptitude -> apt references
2024-06-29 01:33:36 -04:00
Joshua Boniface
aa6488854a
Add forced upgrade of vhostmd
2023-12-26 01:08:53 -05:00
Joshua Boniface
677287fd2e
Add additional wait after stopping OSDs
Allows the Ceph cluster to properly reconcile first.
2023-10-24 10:42:15 -04:00
Joshua Boniface
d0bcbf123f
Move kernel cleanup to after reboot
Otherwise, purging the old kernel package before the reboot can cause odd
failures, e.g. kernel modules failing to load.
2023-10-24 10:41:47 -04:00
Joshua Boniface
7fe682aa60
Handle freshness for all 3 types separately
If microcode was missing, the checks for the other two would also return
UNKN, and thus no restart would happen. But if microcode *is* present, we
want to restart for either of the other two as well.
So separate these into 3 distinct checks and restart if any one indicates
a change.
2023-10-24 10:41:47 -04:00
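The decision rule in this commit can be modelled in a few lines of Python. This is a hypothetical sketch, not the playbook's code: it assumes each check yields a Nagios-style status (0 OK, 1 WARN, 2 CRIT, 3 UNKN), assumes the three types are kernel, microcode, and services, and assumes the old combined check let UNKN mask the other results.

```python
# Hypothetical model of the freshness decision; it does not run needrestart
# or Ansible. Assumption: each check returns a Nagios-style status code.
OK, WARN, CRIT, UNKN = 0, 1, 2, 3


def combined_check(statuses):
    """Assumed old behaviour: one combined result where UNKN masks everything."""
    if UNKN in statuses:
        return UNKN
    return max(statuses)


def needs_restart(kernel, microcode, services):
    """New behaviour: evaluate each of the three checks on its own.

    A check counts as "changed" only if it reports WARN or CRIT. UNKN means
    "no restart" for that check alone, so an unknown microcode status no
    longer hides a pending kernel or service change.
    """
    return any(status in (WARN, CRIT) for status in (kernel, microcode, services))


if __name__ == "__main__":
    # Microcode status unknown, kernel change pending.
    print(combined_check([WARN, UNKN, OK]) in (WARN, CRIT))  # False: restart skipped
    print(needs_restart(kernel=WARN, microcode=UNKN, services=OK))  # True
```

Treating UNKN as "no restart" per check keeps the conservative default while letting any single confirmed change trigger the reboot.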
Joshua Boniface
6d05f40242
Fix import for newer Ansible versions
2023-09-18 09:42:01 -04:00
Joshua Boniface
7a2d5ac0c4
Ensure PVC daemons are updated before reboots
2023-09-01 15:42:30 -04:00
Joshua Boniface
d3391aa080
Move to new maintenance mode and check legacy first
2023-09-01 15:42:29 -04:00
Joshua Boniface
7e829f04ae
Restore unknown state as not-reboot
2023-09-01 15:42:29 -04:00
Joshua Boniface
2c63500011
Split upgrade stage and add dpkg cleanup
Avoid problems if one or more nodes are upgrading libvirt/QEMU and live
migrations fail.
2023-09-01 15:42:29 -04:00
Joshua Boniface
7a0c596281
Add node daemon confirmation before continue
2023-09-01 15:42:29 -04:00
Joshua Boniface
3d4e66471e
Trigger restart even with rc=3
2023-09-01 15:42:29 -04:00
Joshua Boniface
cbae685b45
Ignore needrestart unknown case
2023-09-01 15:42:29 -04:00
Joshua Boniface
7cac7b26ce
Ensure freshness check is proper
2023-09-01 15:42:28 -04:00
Joshua Boniface
be091f66d4
Remove pvc-flush references
This service usually causes more problems than it solves, so it is being
removed in the next PVC version.
2023-09-01 15:42:28 -04:00
Joshua Boniface
9e20e47903
Update freshness checks
2023-09-01 15:42:28 -04:00
Joshua Boniface
5de3ab0c3a
Move pvc maintenance to separate plays
This ensures that maintenance mode is turned on once before all tasks and
off once after all tasks, rather than intermittently.
2023-09-01 15:42:28 -04:00
Joshua Boniface
94e9bf9133
Ignore errors during flush commands
These might inexplicably fail, but that is fine.
2023-09-01 15:42:27 -04:00
Joshua Boniface
ec2fd99eb6
Avoid errors if noout fails
2023-09-01 15:42:27 -04:00
Joshua Boniface
b9f00e3faf
Increase flush/unflush wait timeout
Bump this from 10 minutes (60 * 10 seconds) to 30 minutes (180 * 10
seconds) to ensure there is sufficient time for (relatively) large VMs
to migrate with (relatively) slow networking.
2023-09-01 15:42:26 -04:00
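The timeouts above are a retry count multiplied by a fixed delay, in the style of Ansible's `until`/`retries`/`delay` polling. A minimal Python sketch of such a poll loop follows; the `check` callable stands in for a hypothetical "node has finished (un)flushing" test and is not part of PVC.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], retries: int = 180, delay: float = 10.0) -> bool:
    """Poll check() up to `retries` times, sleeping `delay` seconds between tries.

    The worst-case wait is roughly retries * delay seconds. No sleep follows
    the final failed attempt, since nothing would use that time.
    """
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            time.sleep(delay)
    return False


def max_wait_minutes(retries: int, delay: float) -> float:
    """Approximate upper bound on the wait, in minutes."""
    return retries * delay / 60


if __name__ == "__main__":
    print(max_wait_minutes(60, 10))   # 10.0 (old timeout)
    print(max_wait_minutes(180, 10))  # 30.0 (new timeout)
```

Raising only the retry count keeps the polling interval unchanged, so a fast migration still completes within about 10 seconds of finishing while slow ones get the longer ceiling.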
Joshua Boniface
4fe6204dfb
Use wait on secondary and delay for 15 seconds
2023-09-01 15:42:26 -04:00
Joshua Boniface
2d9a5a9d31
Adjust ordering of flush task
2023-09-01 15:42:26 -04:00
Joshua Boniface
1cfbc25f37
Add norestart policy for apt updates
2023-09-01 15:42:25 -04:00
Joshua Boniface
ccc6489512
Add README and daemon upgrade playbook, cleanups
2023-09-01 15:42:25 -04:00
Joshua Boniface
f34b2a5f7e
Add cleanup to update oneshot playbook
2023-09-01 15:42:25 -04:00
Joshua Boniface
0ddd11844e
Reorder Ceph stop and lower some waits
2023-09-01 15:42:25 -04:00
Joshua Boniface
cf609eb609
Add tasks to verify node has finished (un)flushing
2023-09-01 15:42:25 -04:00
Joshua Boniface
c29cdd5305
Increase all wait timeouts to 30s
Ensure that even on slower clusters, these waits have enough time to
complete before proceeding, so the task won't fail.
2023-09-01 15:42:24 -04:00
Joshua Boniface
750cb4b55c
Disable pvc-flush service while rebooting
Prevents the flush daemon from starting on node boot, before the
playbook is actually ready to unflush the node.
2023-09-01 15:42:24 -04:00
Joshua Boniface
cdc7e3377b
Tweak oneshot script
Cleanly stop daemons; check if OSDs are back before continuing; wait
less
2023-09-01 15:42:24 -04:00
Joshua Boniface
9962ceaf0a
Add cluster safe update playbook
This playbook will perform a oneshot upgrade of the systems in the
cluster, including performing a clean and safe reboot of the node(s) if
required (either due to services needing a restart, or the kernel
changing). It runs in serial=1 and only reboots if needed.
2023-09-01 15:42:24 -04:00
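The "serial=1, reboot only if needed" flow described in this commit can be sketched as a small Python model. The node names and the `upgrade`, `reboot_required`, and `reboot` callables are hypothetical stand-ins; the real playbook is Ansible and performs more steps (maintenance mode, flushing, Ceph handling) than shown here.

```python
from typing import Callable, Iterable, List


def rolling_update(
    nodes: Iterable[str],
    upgrade: Callable[[str], None],
    reboot_required: Callable[[str], bool],
    reboot: Callable[[str], None],
) -> List[str]:
    """Process nodes strictly one at a time, like Ansible's `serial: 1`.

    Each node is upgraded, then rebooted only if required. The next node is
    not touched until the current one is finished. Returns rebooted nodes.
    """
    rebooted = []
    for node in nodes:
        upgrade(node)
        if reboot_required(node):  # e.g. kernel changed or services need a restart
            reboot(node)
            rebooted.append(node)
    return rebooted


if __name__ == "__main__":
    log = []
    result = rolling_update(
        ["node1", "node2", "node3"],
        upgrade=lambda n: log.append(("upgrade", n)),
        reboot_required=lambda n: n == "node2",
        reboot=lambda n: log.append(("reboot", n)),
    )
    print(result)  # ['node2']
```

Handling one node at a time means at most one node is ever out of service, which is what keeps the rest of the cluster available during the upgrade.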