d69770b776
Avoid writing hosts if empty
2023-09-01 15:42:25 -04:00
f4e49b9d3e
Ensure apt-update runs if configs update
2023-09-01 15:42:25 -04:00
9438ab46d7
Add bullseye support
2023-09-01 15:42:25 -04:00
dc83f91bd8
Add directory creation to backup script
2023-09-01 15:42:25 -04:00
5466df7065
Add PostgreSQL to daily backup script
2023-09-01 15:42:25 -04:00
c9742fe2e5
Update tags and fix backup keys to var
2023-09-01 15:42:25 -04:00
1cfbc25f37
Add norestart policy for apt updates
2023-09-01 15:42:25 -04:00
ccc6489512
Add README and daemon upgrade playbook, cleanups
2023-09-01 15:42:25 -04:00
7c7ca4a229
Allow intra-cluster orphan NTP sync
...
Because Ceph requires all peer nodes to be tightly synchronized with
each other before it comes online, PVC nodes need a way to synchronize
to each other even in the absence of an external time reference. This
matters most when a set of nodes is left offline for an extended period
(>1-2 weeks), since their hardware clocks will drift. If Internet
connectivity in turn depends on a VM hosted by the cluster, this creates
a catch-22 and the cluster will not start properly.
This configuration accomplishes that: if no suitable peers with a
stratum below 6 are found, the hosts enter orphan mode. Since they are
now all configured as "peers" of each other, they collectively elect one
of them as the source and sync to it. A local stratum 10 fudge is added
so that at least one of the nodes can become this source.
While this is not an ideal use of NTP, it is by far the cleanest
solution to this problem, and it does not affect normal operation when
the two configured stratum-2 servers are reachable.
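A minimal sketch of the ntpd directives this message describes, written as a Python helper that renders the relevant lines. The function name, the example hostnames, and the `iburst` option are assumptions for illustration, and the repository's actual template may differ. Only the orphan stratum of 6 and the local stratum of 10 come from the message above.

```python
def render_ntp_conf(upstream_servers, cluster_peers,
                    orphan_stratum=6, local_stratum=10):
    """Render a hypothetical ntp.conf fragment for orphan-mode peering.

    upstream_servers: external time sources (hostnames are assumptions).
    cluster_peers:    the other nodes in the same cluster.
    """
    lines = []
    # Normal operation: sync to the external servers when reachable.
    for server in upstream_servers:
        lines.append(f"server {server} iburst")
    # Every node lists the other nodes as symmetric peers.
    for peer in cluster_peers:
        lines.append(f"peer {peer}")
    # Enter orphan mode when no better source than this stratum is found.
    lines.append(f"tos orphan {orphan_stratum}")
    # Undisciplined local clock, fudged to a poor stratum, so that at
    # least one node can act as the source of last resort.
    lines.append("server 127.127.1.0")
    lines.append(f"fudge 127.127.1.0 stratum {local_stratum}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ntp_conf(
        ["ntp1.example.org", "ntp2.example.org"],  # hypothetical servers
        ["node2", "node3"],                        # hypothetical peers
    ))
```

Because the local clock is fudged to stratum 10, it should only be selected when nothing better is reachable, which matches the stated goal of leaving normal operation unaffected.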
2023-09-01 15:42:25 -04:00
027a819a83
Move some other tasks to bootstrap role
...
Avoids an issue where the pvcnoded service is stopped on non-bootstrap
runs.
2023-09-01 15:42:25 -04:00
e53342474c
Remove GRUB config from base role
...
This is not actually ideal.
2023-09-01 15:42:25 -04:00
4666db17cb
Fix version sorting bugs in kernel-cleanup.sh
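The message does not say what the bugs were. One common version-sorting pitfall is plain lexical ordering, which places a `-10-` revision before a `-9-` one. The sketch below illustrates a natural-sort key for that case in Python; the real script is shell, and the function name and package names here are made-up examples rather than anything taken from it.

```python
import re


def version_key(name):
    """Split a name into text and integer chunks for natural sorting.

    re.split with a capturing group always alternates text, digits,
    text, ... starting with text, so chunks at the same index in two
    keys always have the same type and compare safely.
    """
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", name)]


if __name__ == "__main__":
    packages = [  # hypothetical installed kernel packages
        "linux-image-4.19.0-10-amd64",
        "linux-image-4.19.0-9-amd64",
        "linux-image-4.19.0-16-amd64",
    ]
    for package in sorted(packages, key=version_key):
        print(package)
```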
2023-09-01 15:42:25 -04:00
6903627150
Add additional items to base role
...
Backups, GRUB configuration, and IPMI configuration.
2023-09-01 15:42:25 -04:00
c96ad603b0
Fix sudoers to use conditional deploy_username
2023-09-01 15:42:25 -04:00
29363ebf80
Allow configurable fail2ban IPs
2023-09-01 15:42:25 -04:00
d9be39a048
Allow customization of deploy username
2023-09-01 15:42:25 -04:00
95b9d6d786
Fix group_vars to match new setup
2023-09-01 15:42:25 -04:00
4dc5ebdba0
Move to more dynamic apt configs
...
Allow specifying repository URLs in the group_vars, and add
release-specific template files to support future version changes.
2023-09-01 15:42:25 -04:00
5873ce7d0c
Update root password in default group_vars
2023-09-01 15:42:25 -04:00
6a61f8f7bf
Update relative path to bootstrap files
2023-09-01 15:42:25 -04:00
4caab67d03
Remove superfluous symlink
2023-09-01 15:42:25 -04:00
57e5953fd1
Add sensible sorting of kernel removals
2023-09-01 15:42:25 -04:00
2a72a826f5
Remove cruft and add mkpasswd setup
2023-09-01 15:42:25 -04:00
3a24e6dd8a
Update file copyright header
2023-09-01 15:42:25 -04:00
bf02da693f
Correct bad indentation in base role
2023-09-01 15:42:25 -04:00
39b8229c35
Add libguestfs-tools to libvirt role deps
2023-09-01 15:42:25 -04:00
f34b2a5f7e
Add cleanup to update oneshot playbook
2023-09-01 15:42:25 -04:00
1f6cb077fa
Update tags and add kernel-cleanup script
2023-09-01 15:42:25 -04:00
0bf9c6209c
Fix incorrect systemd enabling in Patroni
2023-09-01 15:42:25 -04:00
8d41650619
Add reboot to purge
2023-09-01 15:42:25 -04:00
c634823ba5
Remove log dirs during purge
2023-09-01 15:42:25 -04:00
c0dc6fad4e
Add some additional compression libraries
2023-09-01 15:42:25 -04:00
a4be011884
Add local domain to resolver config
2023-09-01 15:42:25 -04:00
4f5dbee8ee
Correct bugs during bootstrap
...
1. Ensure Zookeeper restarts and checks out successfully before
proceeding with other steps.
2. Make sure PVC itself doesn't start prematurely.
2023-09-01 15:42:25 -04:00
05a2c1949d
Add removal of Zookeeper keys too
2023-09-01 15:42:25 -04:00
a4f1d6eedc
Update purge script
2023-09-01 15:42:25 -04:00
26dbd082ef
Retry pgsql bootstrap startup 6 times
...
This step sometimes fails, so retry it several times.
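As an illustration of the retry pattern, here is a small Python sketch that attempts an action up to six times before giving up. The helper name, the delay value, and the injectable `sleep` parameter are assumptions; the playbook itself presumably relies on the automation tool's own retry mechanism rather than code like this.

```python
import time


def retry(action, attempts=6, delay=5.0, sleep=time.sleep):
    """Call action() until it succeeds or attempts are exhausted.

    Re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # pause before the next attempt
    raise last_error
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real delays.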
2023-09-01 15:42:25 -04:00
e9f08ad100
Retry msgr2 enabling 6 times
...
This step sometimes fails, so retry it several times.
2023-09-01 15:42:25 -04:00
a77e41bf7c
Remove invalid timezone entries in postgres conf
2023-09-01 15:42:25 -04:00
0ddd11844e
Reorder Ceph stop and lower some waits
2023-09-01 15:42:25 -04:00
cf609eb609
Add tasks to verify node has finished (un)flushing
2023-09-01 15:42:25 -04:00
c29cdd5305
Increase all wait timeouts to 30s
...
Ensure that even on slower clusters, the awaited operations have enough
time to complete before proceeding, so the task won't fail.
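A wait with a deadline can be sketched in Python as a polling loop. The 30 second default mirrors the subject line; the function name and the polling interval are assumptions and are not taken from the playbooks.

```python
import time


def wait_for(condition, timeout=30.0, interval=1.0):
    """Poll condition() until it returns True or timeout seconds pass.

    Returns True if the condition was met, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A longer timeout costs nothing on a fast cluster, because the loop returns as soon as the condition holds.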
2023-09-01 15:42:24 -04:00
cba276e248
Add default values
2023-09-01 15:42:24 -04:00
be94bc134f
Add configurable ZK memory limits
2023-09-01 15:42:24 -04:00
6e74ac44a5
Remove libjemalloc package
2023-09-01 15:42:24 -04:00
2bd5cc5a25
Tune Zookeeper memory usage
...
Set both Xms and Xmx to 128M to reduce overall Zookeeper memory usage.
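For reference, the two JVM heap options take the form `-Xms<size>` and `-Xmx<size>`. The Python sketch below builds that flag string from a single heap size. The helper name and the validation rule are assumptions, and how the flags reach the Zookeeper process depends on the packaging and is not shown.

```python
import re


def zookeeper_heap_flags(heap="128M"):
    """Return JVM flags pinning initial and maximum heap to one size."""
    # Accept a number followed by a k/m/g unit suffix, e.g. "128M".
    if not re.fullmatch(r"\d+[kKmMgG]", heap):
        raise ValueError(f"unsupported heap size: {heap!r}")
    return f"-Xms{heap} -Xmx{heap}"
```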
2023-09-01 15:42:24 -04:00
b4e36d146a
Add tuning for Ceph OSDs
2023-09-01 15:42:24 -04:00
24764fe704
Don't use libjemalloc for Ceph daemons
...
This was an artifact of a much, much older Ceph configuration I ran, and
is not relevant with newer Ceph versions like those used in PVC.
Performance testing with Nautilus and Bluestore shows only a minimal
performance hit without `jemalloc`, and using it prevents cache
autotuning from being effective, so remove it.
2023-09-01 15:42:24 -04:00
750cb4b55c
Disable pvc-flush service while rebooting
...
Prevents the flush daemon from starting on node boot, before the
playbook is actually ready to unflush the node.
2023-09-01 15:42:24 -04:00
cdc7e3377b
Tweak oneshot script
...
Cleanly stop daemons; check if OSDs are back before continuing; wait
less
2023-09-01 15:42:24 -04:00