Joshua Boniface
70c7c76605
Move pvc maintenance to separate plays
This ensures that maintenance mode is turned on before all tasks and
turned off after all tasks, rather than intermittently.
2021-11-11 15:54:22 -05:00
Joshua Boniface
820e2a64d0
Ignore errors during flush commands
These might inexplicably fail, but that is fine.
2021-10-13 10:34:36 -04:00
Joshua Boniface
74066e6ceb
Avoid errors if noout fails
2021-10-07 16:31:52 -04:00
Joshua Boniface
311f388f56
Increase flush/unflush wait timeout
Bump this from 10 minutes (60 * 10 seconds) to 30 minutes (180 * 10
seconds) to ensure there is sufficient time for (relatively) large VMs
to migrate with (relatively) slow networking.
2021-07-22 16:16:27 -04:00
Joshua Boniface
942743daef
Use wait on secondary and delay for 15 seconds
2021-07-22 09:35:00 -04:00
Joshua Boniface
bb094193b4
Adjust ordering of flush task
2021-07-06 09:28:59 -04:00
Joshua Boniface
cae8cfc4cb
Add norestart policy for apt updates
2021-05-27 01:38:43 -04:00
Joshua Boniface
491ea77306
Add README and daemon upgrade playbook, cleanups
2021-05-20 11:02:47 -04:00
Joshua Boniface
510db0df58
Add cleanup to update oneshot playbook
2021-02-02 15:41:38 -05:00
Joshua Boniface
97869ca5c3
Reorder Ceph stop and lower some waits
2021-01-07 11:11:16 -05:00
Joshua Boniface
d35250b870
Add tasks to verify node has finished (un)flushing
2021-01-07 10:49:23 -05:00
Joshua Boniface
cd164d1984
Increase all wait timeouts to 30s
Ensure that even on slow(er) clusters, these waits have enough time to
complete before proceeding, so the tasks won't fail.
2021-01-05 16:17:19 -05:00
Joshua Boniface
f277acc974
Disable pvc-flush service while rebooting
Prevents the flush daemon from starting on node boot, before the
playbook is actually ready to unflush the node.
2020-12-15 14:32:50 -05:00
Joshua Boniface
8b474760ed
Tweak oneshot script
Cleanly stop daemons; check that OSDs are back before continuing; use
shorter waits.
2020-11-26 10:51:54 -05:00
Joshua Boniface
b4ba4f9eda
Add cluster safe update playbook
This playbook performs a oneshot upgrade of the systems in the cluster,
including a clean and safe reboot of the node(s) if required (either
because services need a restart or because the kernel changed). It runs
with serial=1 and only reboots if needed.
2020-10-27 15:41:20 -04:00