f00b43f20f
Add extra waits before unsetting maintenance
Avoids issues after restarting the API.
2024-08-28 12:42:01 -04:00
e26a2f3ca5
Update aptitude -> apt references
2024-06-29 01:33:36 -04:00
aa6488854a
Add forced upgrade of vhostmd
2023-12-26 01:08:53 -05:00
677287fd2e
Add additional wait after stopping OSDs
Allows the Ceph cluster to properly reconcile first.
2023-10-24 10:42:15 -04:00
d0bcbf123f
Move kernel cleanup to after reboot
Otherwise, modules etc. might fail when the kernel package is purged
before the reboot, causing odd failures.
2023-10-24 10:41:47 -04:00
7fe682aa60
Handle freshness for all 3 types separately
If microcode was missing, the checks for the other two would be UNKN and
thus not trigger a restart. But if microcode *is* present, we want to
restart for either of the other two as well.
So separate these into 3 distinct checks and restart if any one has changed.
2023-10-24 10:41:47 -04:00
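The per-check logic described in the commit above can be sketched roughly as follows. This is an illustration only, not the playbook's actual tasks: the status labels other than `UNKN`, the check names `check_a` and `check_b`, and the function `needs_restart` are all hypothetical.

```python
# Hypothetical status labels; only "UNKN" appears in the commit message.
CHANGED = "CHANGED"
CURRENT = "CURRENT"
UNKNOWN = "UNKN"


def needs_restart(statuses: dict[str, str]) -> bool:
    """Return True if any individual freshness check reports a change.

    Each check is evaluated on its own, so an unknown result for one
    check (e.g. missing microcode) does not mask a change in another.
    """
    return any(status == CHANGED for status in statuses.values())


# Microcode is unknown, but one of the other (hypothetically named)
# checks has changed, so a restart is still wanted.
results = {"microcode": UNKNOWN, "check_a": CHANGED, "check_b": CURRENT}
restart = needs_restart(results)
```

Treating an unknown result as "no restart" for that one check matches the intent stated in the message; whether the real checks report exactly these states is an assumption.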
6d05f40242
Fix import for newer Ansible versions
2023-09-18 09:42:01 -04:00
7a2d5ac0c4
Ensure PVC daemons are updated before reboots
2023-09-01 15:42:30 -04:00
d3391aa080
Move to new maintenance mode and check legacy first
2023-09-01 15:42:29 -04:00
7e829f04ae
Restore unknown state as not-reboot
2023-09-01 15:42:29 -04:00
2c63500011
Split upgrade stage and add dpkg cleanup
Avoid problems if one or more nodes are upgrading libvirt/QEMU and live
migrations fail.
2023-09-01 15:42:29 -04:00
7a0c596281
Add node daemon confirmation before continue
2023-09-01 15:42:29 -04:00
3d4e66471e
Trigger restart even with rc=3
2023-09-01 15:42:29 -04:00
cbae685b45
Ignore needrestart unknown case
2023-09-01 15:42:29 -04:00
7cac7b26ce
Ensure freshness check is proper
2023-09-01 15:42:28 -04:00
be091f66d4
Remove pvc-flush references
This service usually causes more problems than it solves, so it is being
removed in the next PVC version.
2023-09-01 15:42:28 -04:00
9e20e47903
Update freshness checks
2023-09-01 15:42:28 -04:00
5de3ab0c3a
Move pvc maintenance to separate plays
This ensures that maintenance on/off happens before all tasks and after
all tasks, not intermittently.
2023-09-01 15:42:28 -04:00
94e9bf9133
Ignore errors during flush commands
These might inexplicably fail, but that is fine.
2023-09-01 15:42:27 -04:00
ec2fd99eb6
Avoid errors if noout fails
2023-09-01 15:42:27 -04:00
b9f00e3faf
Increase flush/unflush wait timeout
Bump this from 10 minutes (60 * 10 seconds) to 30 minutes (180 * 10
seconds) to ensure there is sufficient time for (relatively) large VMs
to migrate with (relatively) slow networking.
2023-09-01 15:42:26 -04:00
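The timeout arithmetic in the commit above (a retry count multiplied by a fixed delay) can be illustrated with a small polling sketch. The helper names `wait_until` and `max_wait_minutes` are hypothetical and do not come from the playbook; the sketch only assumes the wait is a loop of up to N checks spaced 10 seconds apart.

```python
import time
from typing import Callable


def max_wait_minutes(retries: int, delay_seconds: float) -> float:
    """Upper bound on the total wait, in minutes."""
    return retries * delay_seconds / 60


def wait_until(
    check: Callable[[], bool],
    retries: int,
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` up to `retries` times, sleeping between attempts.

    Returns True as soon as `check` succeeds, or False once all
    attempts are used up.
    """
    for _ in range(retries):
        if check():
            return True
        sleep(delay_seconds)
    return False


old_limit = max_wait_minutes(60, 10)   # previous setting
new_limit = max_wait_minutes(180, 10)  # setting after this commit
```

The `sleep` parameter is injectable so the loop can be exercised without actually waiting.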
4fe6204dfb
Use wait on secondary and delay for 15 seconds
2023-09-01 15:42:26 -04:00
2d9a5a9d31
Adjust ordering of flush task
2023-09-01 15:42:26 -04:00
1cfbc25f37
Add norestart policy for apt updates
2023-09-01 15:42:25 -04:00
ccc6489512
Add README and daemon upgrade playbook, cleanups
2023-09-01 15:42:25 -04:00
f34b2a5f7e
Add cleanup to update oneshot playbook
2023-09-01 15:42:25 -04:00
0ddd11844e
Reorder Ceph stop and lower some waits
2023-09-01 15:42:25 -04:00
cf609eb609
Add tasks to verify node has finished (un)flushing
2023-09-01 15:42:25 -04:00
c29cdd5305
Increase all wait timeouts to 30s
Ensure that even on slow(er) clusters, these waits have more time to
complete before proceeding, so the task won't fail.
2023-09-01 15:42:24 -04:00
750cb4b55c
Disable pvc-flush service while rebooting
Prevents the flush daemon from starting on node boot, before the
playbook is actually ready to unflush the node.
2023-09-01 15:42:24 -04:00
cdc7e3377b
Tweak oneshot script
Cleanly stop daemons; check if OSDs are back before continuing; wait
less
2023-09-01 15:42:24 -04:00
9962ceaf0a
Add cluster safe update playbook
This playbook will perform a oneshot upgrade of the systems in the
cluster, including performing a clean and safe reboot of the node(s) if
required (either due to services needing a restart, or the kernel
changing). It runs in serial=1 and only reboots if needed.
2023-09-01 15:42:24 -04:00