Commit Graph

348 Commits

Author SHA1 Message Date
Joshua Boniface 95b47f8b09 Fix a few more extraneous splits
Just use this_node if applicable, or the raw node.hostname.
2023-09-01 15:42:28 -04:00
Joshua Boniface 87803cb7a2 Remove extraneous splits
The node.hostname should always be short.
2023-09-01 15:42:28 -04:00
Joshua Boniface d24cb8a8ef Unify and standardize inventory_hostname
This was causing some confusing conflicts, so create a new fact called
"this_node" which is inventory_hostname.split('.')[0], i.e. the short
name, and use that everywhere instead of an FQDN or true inventory
hostname.
2023-09-01 15:42:28 -04:00
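The short-name derivation described in the commit above can be sketched in plain Python. This is only an illustration of the `split('.')[0]` expression quoted in the message, not the role's actual task; the function name `short_node_name` and the sample hostnames are assumptions made for the example.

```python
def short_node_name(inventory_hostname: str) -> str:
    """Return the part of a hostname before the first dot.

    A hostname with no dot is returned unchanged, so the result is the
    same whether the inventory lists short names or FQDNs.
    """
    return inventory_hostname.split(".")[0]


if __name__ == "__main__":
    # Hypothetical hostnames, used only to show both input forms.
    print(short_node_name("node1.example.tld"))
    print(short_node_name("node1"))
```

Because the split is a no-op for names that are already short, a derived value like this can be used everywhere without first checking which form the inventory uses.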
Joshua Boniface 056c325486 Add option for setting CPU governor
Allows the administrator to set a CPU frequency governor if they need
to, though the default of ondemand is usually sufficient.
2023-09-01 15:42:28 -04:00
Joshua Boniface fc5bcf139c Fix name of IPMI check again 2023-09-01 15:42:28 -04:00
Joshua Boniface 44cedf66c9 Fix name of ipmi check 2023-09-01 15:42:28 -04:00
Joshua Boniface 9f7dbfb4f8 Add IPMI check to tasks 2023-09-01 15:42:28 -04:00
Joshua Boniface b9ae4d1009 Adjust headers and add LOM check 2023-09-01 15:42:27 -04:00
Joshua Boniface 48fb21af75 Add node list to PVC MOTD 2023-09-01 15:42:27 -04:00
Joshua Boniface e009cf4076 Fix whitespaced manufacturer and bad [[ 2023-09-01 15:42:27 -04:00
Joshua Boniface e65f1d15a6 Add coordinator state to MOTD 2023-09-01 15:42:27 -04:00
Joshua Boniface 894ce9b517 Support unknown manufacturers in MOTD 2023-09-01 15:42:27 -04:00
Joshua Boniface 55ec177919 Ignore errors restarting libvirtd
This seems to inexplicably fail sometimes. We can just ignore it.
2023-09-01 15:42:27 -04:00
Joshua Boniface b814ec60f6 Add resolv.conf customization 2023-09-01 15:42:27 -04:00
Joshua Boniface ddecb94348 Disable unified cgroup hierarchy on kernel cmdline
This is required on Debian 11 to use the cset tool, since the newer
systemd implementation of a unified cgroup hierarchy is not compatible
with the cset tool.

Ref for future use:
  https://github.com/lpechacek/cpuset/issues/40
2023-09-01 15:42:27 -04:00
Joshua Boniface be3ce67574 Use inventory_hostname in IPMI fragment 2023-09-01 15:42:27 -04:00
Joshua Boniface 5f05835721 Update bondX configuration 2023-09-01 15:42:27 -04:00
Joshua Boniface 4cb2d7835c Add setting bridge_mtu to config 2023-09-01 15:42:27 -04:00
Joshua Boniface 9f16995f59 Add smartmontools to base package list 2023-09-01 15:42:27 -04:00
Joshua Boniface 6e2d661134 Adjust documentation and behaviour of cpuset
1. Detail the caveats and specific situations and ref the documentation
which will provide more details.

2. Always install the configs, but use /etc/default/ceph-osd-cpuset to
control if the script does anything or not (so the "osd" cset set is
always active, just not set in a special way).
2023-09-01 15:42:27 -04:00
Joshua Boniface 83bd1b1efd Install cset configs even if disabled
The setup script handles this instead.
2023-09-01 15:42:27 -04:00
Joshua Boniface 7927ec4f11 Allow dynamic enabling/disabling of cset
Add a separate config to handle enable/disable on the system itself.
2023-09-01 15:42:27 -04:00
Joshua Boniface 2ae9b9075a Adjust default ceph.conf parameters
1. Remove an explicit OSD journal size, especially such a small one (no
clue why I ever added that...)

2. Add max scrubs, disable scrub during recovery, and set scrub sleep.

3. Add max backfills, tune recovery sleep to 0 to prioritize recovery.
2023-09-01 15:42:27 -04:00
Joshua Boniface 6e48d6fe84 Add Ceph OSD cpuset tuning options
Allows an administrator to set CPU pinning with the cpuset tool for Ceph
OSDs, in situations where CPU contention with VMs or other system tasks
may be negatively affecting OSD performance. This is optional, advanced
tuning and is disabled by default.
2023-09-01 15:42:27 -04:00
Joshua Boniface 45424a28ce Fix bad flag 2023-09-01 15:42:27 -04:00
Joshua Boniface 044a14fa6d Add package installs for different Debian versions 2023-09-01 15:42:27 -04:00
Joshua Boniface ae40227ea1 Move paths and keys to defaults 2023-09-01 15:42:27 -04:00
Joshua Boniface f25a80ff53 Add additional CMK checks 2023-09-01 15:42:26 -04:00
Joshua Boniface 8c2d117a3c Wait longer when restarting services
Increase the wait from 15 to 30 seconds to allow more time for
stabilization before proceeding with the next service.
2023-09-01 15:42:26 -04:00
Joshua Boniface 647ca1c446 Add default features flag to ceph.conf generator
Coupled with the removal of explicit --image-features flags to the RBD
command in PVC itself, this ensures that only the two features supported
on kernel 4.19 are enabled by default.
2023-09-01 15:42:26 -04:00
Joshua Boniface 86eaeed2b4 Fix sources.list for Bullseye 2023-09-01 15:42:26 -04:00
Joshua Boniface 3d64ad2420 Typo fix 2023-09-01 15:42:26 -04:00
Joshua Boniface eaea860b61 Lower autopurge interval to 1 hour 2023-09-01 15:42:26 -04:00
Joshua Boniface 524f857f56 Add some Zookeeper configuration tweaks 2023-09-01 15:42:26 -04:00
Joshua Boniface 13556918d7 Disable any systemd start rate limiting
Because Zookeeper is supremely stupid (see last commit) we want to
disable start limiting. It needs to keep trying forever until it starts.
2023-09-01 15:42:26 -04:00
Joshua Boniface 8eecc95f2f Ensure Zookeeper restarts itself
The Zookeeper daemon does not appear to exit with any status other than
0, even after a fatal error. Work around this.
2023-09-01 15:42:26 -04:00
Joshua Boniface b03ecf0125 Add -XX:+AlwaysPreTouch option for Zookeeper 2023-09-01 15:42:26 -04:00
Joshua Boniface b842276002 Lower keep count for Zookeeper vacuum to 3
Required to keep disk space growth down when using zookeeper_logging
functionality.
2023-09-01 15:42:26 -04:00
Joshua Boniface 681afd1d1b Fix excessive whitespace 2023-09-01 15:42:26 -04:00
Joshua Boniface 2d31e6c8ea Fix memory tuning issues 2023-09-01 15:42:26 -04:00
Joshua Boniface 71b6da6555 Adjust package lists per Debian version 2023-09-01 15:42:26 -04:00
Joshua Boniface 4b0a4ae73c Fix bad Ansible variable name 2023-09-01 15:42:26 -04:00
Joshua Boniface a52d4cbf37 Add Zookeeper logging configs 2023-09-01 15:42:26 -04:00
Joshua Boniface 7bacbd5dd6 Don't fail if IPMI tasks fail 2023-09-01 15:42:26 -04:00
Joshua Boniface eef0f959dd Add GRUB, Plymouth themes and issue for PVC 2023-09-01 15:42:26 -04:00
Joshua Boniface 6d3e5ac728 Fix zkcli for good 2023-09-01 15:42:26 -04:00
Joshua Boniface e760114b8d Fix bootstrap collection path for Ceph 2023-09-01 15:42:26 -04:00
Joshua Boniface bace67b8bf Add GRUB configuration to Ansible role 2023-09-01 15:42:26 -04:00
Joshua Boniface 0802cca980 Support both versions of psycopg2 and kazoo 2023-09-01 15:42:26 -04:00
Joshua Boniface 31a677b444 Fix Patroni ACL to use subnet mask 2023-09-01 15:42:26 -04:00
Joshua Boniface 35089f6dda Fix zkcli alias to use hostname 2023-09-01 15:42:26 -04:00
Joshua Boniface 9dc9139c35 Use short ansible_hostname in ipmi fragment 2023-09-01 15:42:26 -04:00
Joshua Boniface 329bc9690e Add ipmitool to packages list 2023-09-01 15:42:26 -04:00
Joshua Boniface a2ed38b459 Add generic SR-IOV configuration 2023-09-01 15:42:26 -04:00
Joshua Boniface 0fc889df32 Ensure we can connect to Patroni 2023-09-01 15:42:26 -04:00
Joshua Boniface 388db6ad1d Use IPs for Patroni configuration 2023-09-01 15:42:26 -04:00
Joshua Boniface d455b31905 Bump max connections in Zookeeper to 200 2023-09-01 15:42:26 -04:00
Joshua Boniface f105f0497c Configure Zookeeper only on Cluster address 2023-09-01 15:42:26 -04:00
Joshua Boniface 7e94dddb4c Ensure libvirtd restarts when unit changes 2023-09-01 15:42:26 -04:00
Joshua Boniface c9df64bc7d Ensure deb-src is present for bullseye 2023-09-01 15:42:26 -04:00
Joshua Boniface 0bbb91fc8b Add override custom libvirtd.service unit
This has no functional change on Buster, but on Bullseye this overrides
the stupid socket-based activation shenanigans that the default unit
tries to do, as well as the breaking replacement of the
/etc/default/libvirt variable names.
2023-09-01 15:42:26 -04:00
Joshua Boniface 3a67dc129b Ensure DEBIAN_FRONTEND is noninteractive 2023-09-01 15:42:26 -04:00
Joshua Boniface 0114ad8ed5 Add python3 version of psycopg2 explicitly 2023-09-01 15:42:26 -04:00
Joshua Boniface a548bdcc6a Use inventory_hostname for IPMI dict 2023-09-01 15:42:26 -04:00
Joshua Boniface 6104e0a5a5 Use independent fact to work around codename 2023-09-01 15:42:26 -04:00
Joshua Boniface 5c46bb0db7 Ensure backup_keys isn't empty 2023-09-01 15:42:25 -04:00
Joshua Boniface d69770b776 Avoid writing hosts if empty 2023-09-01 15:42:25 -04:00
Joshua Boniface f4e49b9d3e Ensure apt-update runs if configs update 2023-09-01 15:42:25 -04:00
Joshua Boniface 9438ab46d7 Add bullseye support 2023-09-01 15:42:25 -04:00
Joshua Boniface dc83f91bd8 Add directory creation to backup script 2023-09-01 15:42:25 -04:00
Joshua Boniface 5466df7065 Add PostgreSQL to daily backup script 2023-09-01 15:42:25 -04:00
Joshua Boniface c9742fe2e5 Update tags and fix backup keys to var 2023-09-01 15:42:25 -04:00
Joshua Boniface 7c7ca4a229 Allow inter-cluster orphan NTP sync
Due to the requirement of Ceph to have all peer nodes tightly
synchronized with each other to come online, PVC nodes need a way to
synchronize to each other even in the absence of an external time
reference. This is especially prevalent if a set of nodes is left
offline for an extended period (>1-2 weeks), since their hardware clocks
will drift. If the resulting Internet connectivity is then dependent on
a VM, this will cause a catch-22 and the cluster will not properly
start.

This configuration will accomplish that - if no suitable >6 stratum
peers are found, the hosts will enter orphan mode. Since they are now
all configured as "peers" with each other, they will collectively decide
on one of them to become the source and sync to it. A local stratum 10
fudge is added so that at least one of the nodes can become this source.

While this is not an ideal use of NTP, it is by far the cleanest
solution to this problem, and does not impact normal functionality when
the two configured stratum-2 servers are reachable.
2023-09-01 15:42:25 -04:00
Joshua Boniface 027a819a83 Move some other tasks to bootstrap role
Avoids an issue where the pvcnoded service is stopped on non-bootstrap
runs.
2023-09-01 15:42:25 -04:00
Joshua Boniface e53342474c Remove GRUB config from base role
This is not actually ideal.
2023-09-01 15:42:25 -04:00
Joshua Boniface 4666db17cb Fix version sorting bugs in kernel-cleanup.sh 2023-09-01 15:42:25 -04:00
Joshua Boniface 6903627150 Add additional items to base role
Backups, GRUB configuration, and IPMI configuration.
2023-09-01 15:42:25 -04:00
Joshua Boniface c96ad603b0 Fix sudoers to use conditional deploy_username 2023-09-01 15:42:25 -04:00
Joshua Boniface 29363ebf80 Allow configurable fail2ban IPs 2023-09-01 15:42:25 -04:00
Joshua Boniface d9be39a048 Allow customization of deploy username 2023-09-01 15:42:25 -04:00
Joshua Boniface 4dc5ebdba0 Move to more dynamic apt configs
Allow specifying repository URLs in the group_vars, and add
release-specific template files to support future version changes.
2023-09-01 15:42:25 -04:00
Joshua Boniface 6a61f8f7bf Update relative path to bootstrap files 2023-09-01 15:42:25 -04:00
Joshua Boniface 4caab67d03 Remove superfluous symlink 2023-09-01 15:42:25 -04:00
Joshua Boniface 57e5953fd1 Add sensible sorting of kernel removals 2023-09-01 15:42:25 -04:00
Joshua Boniface 2a72a826f5 Remove cruft and add mkpasswd setup 2023-09-01 15:42:25 -04:00
Joshua Boniface bf02da693f Correct bad indentation in base role 2023-09-01 15:42:25 -04:00
Joshua Boniface 39b8229c35 Add libguestfs-tools to libvirt role deps 2023-09-01 15:42:25 -04:00
Joshua Boniface 1f6cb077fa Update tags and add kernel-cleanup script 2023-09-01 15:42:25 -04:00
Joshua Boniface 0bf9c6209c Fix incorrect systemd enabling in Patroni 2023-09-01 15:42:25 -04:00
Joshua Boniface c0dc6fad4e Add some additional compression libraries 2023-09-01 15:42:25 -04:00
Joshua Boniface a4be011884 Add local domain to resolver config 2023-09-01 15:42:25 -04:00
Joshua Boniface 4f5dbee8ee Correct bugs during bootstrap
1. Ensure Zookeeper restarts and checks out successfully before
proceeding with other steps.
2. Make sure PVC itself doesn't start prematurely.
2023-09-01 15:42:25 -04:00
Joshua Boniface 26dbd082ef Retry pgsql bootstrap startup 6 times
This will sometimes fail, so retry it several times.
2023-09-01 15:42:25 -04:00
Joshua Boniface e9f08ad100 Retry msgr2 enabling 6 times
This will sometimes fail, so retry it several times.
2023-09-01 15:42:25 -04:00
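The bounded-retry approach in the two commits above can be sketched as a small Python helper. This is a generic illustration under stated assumptions, not the Ansible tasks themselves: the name `retry`, the `delay` parameter, and the choice to re-raise the last error are all hypothetical. Only the count of 6 attempts comes from the commit messages.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 6, delay: float = 0.0) -> T:
    """Call `action` until it succeeds or `attempts` calls have failed.

    Any exception counts as a failure. After the final failed attempt the
    last exception is re-raised so the caller still sees the real error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            last_error = exc
            # Only pause when another attempt will follow.
            if attempt < attempts:
                time.sleep(delay)
    assert last_error is not None
    raise last_error
```

A fixed attempt count keeps a flaky startup step from failing the whole run on the first error, while still surfacing a persistent failure instead of looping forever.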
Joshua Boniface a77e41bf7c Remove invalid timezone entries in postgres conf 2023-09-01 15:42:25 -04:00
Joshua Boniface cba276e248 Add default values 2023-09-01 15:42:24 -04:00
Joshua Boniface be94bc134f Add configurable ZK memory limits 2023-09-01 15:42:24 -04:00
Joshua Boniface 6e74ac44a5 Remove libjemalloc package 2023-09-01 15:42:24 -04:00
Joshua Boniface 2bd5cc5a25 Tune Zookeeper memory usage
Use Xms and Xmx=128M to reduce overall Zookeeper memory usage.
2023-09-01 15:42:24 -04:00
Joshua Boniface b4e36d146a Add tuning for Ceph OSDs 2023-09-01 15:42:24 -04:00
Joshua Boniface 24764fe704 Don't use libjemalloc for Ceph daemons
This was an artifact of a much, much older Ceph configuration I ran, and
is not relevant with newer Ceph versions like those used in PVC.
Performance testing with Nautilus and Bluestore reveals a minimal
performance hit, and using `jemalloc` prevents cache autotuning from
being effective, so remove it.
2023-09-01 15:42:24 -04:00
Joshua Boniface 458e7b4872 Use new init command location
Command was renamed in the PVC CLI to facilitate other "task" actions
like backup/restore.
2023-09-01 15:42:24 -04:00
Joshua Boniface bcb5962353 Add jute.maxbuffer to Zookeeper environment ops
Adds this option based on the findings of
https://github.com/python-zk/kazoo/issues/630, whereby restores of >1MB
in size would fail. This is considered an unsafe option, but given our
use case no actual znode should ever exceed this limit; this is purely
for the large transactions that come from a `pvc task restore` action to
an empty Zookeeper instance.
2023-09-01 15:42:24 -04:00
Joshua Boniface 075ce8ea22 Add PVC status MOTD script 2023-09-01 15:42:24 -04:00
Joshua Boniface 68a475ccf9 Set proper mode on agent plugins 2023-09-01 15:42:24 -04:00
Joshua Boniface f86ec62416 Add check-mk-agent plugin installs
These are used by various Ansible tasks, even if the administrator is
not using Check_MK for monitoring.
2023-09-01 15:42:24 -04:00
Joshua Boniface 62d53b0c9c Add PCI and USB utils 2023-09-01 15:42:24 -04:00
Joshua Boniface f79fb605de Support using existing SSL certs on system
Add the additional pvc_api_ssl_cert_path and pvc_api_ssl_key_path
group_vars options, which can be used to set the SSL details to existing
files on the filesystem if desired. If these are empty (or nonexistent),
the original pvc_api_ssl_cert and pvc_api_ssl_key raw format options
will be used as they were.

Allows the administrator to use outside methods (such as Let's Encrypt)
to obtain the certs locally on the system, avoiding changes to the
group_vars and redeployment to manage SSL keys.
2023-09-01 15:42:24 -04:00
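The fallback described in the commit above (use the path options when set, otherwise the raw options) can be sketched in Python. This is a hedged illustration only: the function `select_ssl_source` and its return shape are hypothetical, and the assumption that both paths must be set together is not stated in the commit message. The parameter names mirror the group_vars option names quoted above.

```python
from typing import Optional, Tuple


def select_ssl_source(
    pvc_api_ssl_cert_path: Optional[str],
    pvc_api_ssl_key_path: Optional[str],
    pvc_api_ssl_cert: str,
    pvc_api_ssl_key: str,
) -> Tuple[str, str, str]:
    """Pick which SSL options to use.

    Returns ("path", cert_path, key_path) when both path options are
    non-empty, otherwise ("raw", cert, key) so the raw-format options
    keep working as before.
    """
    # Assumption: a cert path without a key path (or vice versa) is
    # treated as unset, and the raw options are used instead.
    if pvc_api_ssl_cert_path and pvc_api_ssl_key_path:
        return ("path", pvc_api_ssl_cert_path, pvc_api_ssl_key_path)
    return ("raw", pvc_api_ssl_cert, pvc_api_ssl_key)
```

Treating empty and undefined values the same way means existing deployments that never set the path options see no change in behaviour.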
Joshua Boniface a8419be587 Use generic Debian repos and PVC component 2023-09-01 15:42:24 -04:00
Joshua Boniface 2caed2ae12 Rename remaining "pvc_prov" items to pvc_api 2023-09-01 15:42:24 -04:00
Joshua Boniface 2a2d318dbc Change name of default API database
From pvcprov to pvcapi to reflect the changing use of this database.
2023-09-01 15:42:24 -04:00
Joshua Boniface 833d99a360 Add comments to defaults 2023-09-01 15:42:24 -04:00
Joshua Boniface 8109f13386 Add additional configuration to group_vars
Also include defaults and the new pvc_vm_shutdown_timeout option.
2023-09-01 15:42:24 -04:00
Joshua Boniface 72df058684 Ensure ZK prioritizes IPv4 2023-09-01 15:42:24 -04:00
Joshua Boniface 457e18a850 Use FQDN for Zookeeper server entries 2023-09-01 15:42:24 -04:00
Joshua Boniface 777a4693a1 Improve SSH configuration for nodes
Ensure hostbased auth works with configs, remove erroneous old
conditional for authtypes, remove obsolete config option.
2023-09-01 15:42:24 -04:00
Joshua Boniface 88209a2b70 Use Google DNS instead of Cloudflare
For some reason Cloudflare works in fewer places than Google, so just
use Google instead.
2023-09-01 15:42:24 -04:00
Joshua Boniface fbbf5ffe09 Use cluster_group variable for paths
Instead of trying to automagic this group out of the Ansible hostvars,
just make it explicitly defined in the group_vars to avoid any
confusion.
2023-09-01 15:42:23 -04:00
Joshua Boniface a925e4bd40 Ignore errors in bringing up bootstrap interfaces 2023-09-01 15:42:23 -04:00
Joshua Boniface e3ad750412 Add storage components to default pvcnoded.yaml 2023-09-01 15:42:23 -04:00
Joshua Boniface 715fa103cd Ensure uuid-runtime is installed 2023-09-01 15:42:23 -04:00
Joshua Boniface 12d50cfca6 Use correct syntax for init command 2023-09-01 15:42:23 -04:00
Joshua Boniface 92ccc0a737 Use consistent naming in patroni.yml 2023-09-01 15:42:23 -04:00
Joshua Boniface 0566aadfb0 Remove obsolete issue-gen script on install 2023-09-01 15:42:23 -04:00
Joshua Boniface c35c58389d Use short names in PVC configs 2023-09-01 15:42:23 -04:00
Joshua Boniface 157c56fd46 Use shortname for Zookeeper 2023-09-01 15:42:23 -04:00
Joshua Boniface 7e653d52c3 Include upstream and short names in hosts 2023-09-01 15:42:23 -04:00
Joshua Boniface 6a3c32f306 Use local CLI command instead of API to init 2023-09-01 15:42:23 -04:00
Joshua Boniface c71415317a Use only short names in Ceph MON config 2023-09-01 15:42:23 -04:00
Joshua Boniface 52862f9daf Fix conditional checks with inventory_hostname 2023-09-01 15:42:23 -04:00
Joshua Boniface 91313e848e Handle bridge creation more sensibly 2023-09-01 15:42:23 -04:00
Joshua Boniface 6d3999eaab Don't restart pvcd.service on bootstrap 2023-09-01 15:42:23 -04:00
Joshua Boniface 0d9e209b45 Allow deb migrations to be installed 2023-09-01 15:42:23 -04:00
Joshua Boniface 4b89aff1d8 Add symlink for pvc files dir 2023-09-01 15:42:23 -04:00
Joshua Boniface 8c15edd75c Handle creation and collection on bootstrap better 2023-09-01 15:42:23 -04:00
Joshua Boniface b4079cae88 Use new in-built database migrations in API 2023-09-01 15:42:23 -04:00
Joshua Boniface 0e5cb688dc Use new package and file names
References parallelvirtualcluster/pvc#79
2023-09-01 15:42:23 -04:00
Joshua Boniface 999e50a68f Don't mess with upstream at all during bootstrap
This caused some major breakage and is not required.
2023-09-01 15:42:23 -04:00
Joshua Boniface e7e7f2cc96 Don't remove nano 2023-09-01 15:42:22 -04:00
Joshua Boniface 42d76618e3 Modify add_cluster_ips to support new bridges 2023-09-01 15:42:22 -04:00
Joshua Boniface 32b719cb4a Enable and start vhostmd service 2023-09-01 15:42:22 -04:00
Joshua Boniface b654be8825 Add source_volume column to storage table 2023-09-01 15:42:22 -04:00
Joshua Boniface e3f83713a0 Add new empty script entry 2023-09-01 15:42:22 -04:00
Joshua Boniface f68ba7a735 Add bridge_device entry to config
Used to properly allow bridged networks to be formed.

Ref parallelvirtualcluster/pvc#64
2023-09-01 15:42:22 -04:00
Joshua Boniface 9848eb10bb Fix additional reference to userdata_template 2023-09-01 15:42:22 -04:00
Joshua Boniface f3212d5e4f Adjust provisioner database schema 2023-09-01 15:42:22 -04:00
Joshua Boniface bc1d9cd33b Set msgr2 mode on Ceph monitors 2023-09-01 15:42:22 -04:00
Joshua Boniface 372b949930 Apply fix with some tweaks to other serial handlers 2023-09-01 15:42:22 -04:00
Joshua Boniface 15768130e2 Change ordering of networks in file 2023-09-01 15:42:22 -04:00
Joshua Boniface 146e660a21 Replace broken "serial" restarts with a new method 2023-09-01 15:42:22 -04:00