179 Commits

Author SHA1 Message Date
0ddd11844e Reorder Ceph stop and lower some waits 2023-09-01 15:42:25 -04:00
cf609eb609 Add tasks to verify node has finished (un)flushing 2023-09-01 15:42:25 -04:00
c29cdd5305 Increase all wait timeouts to 30s
Ensure that even on slow(er) clusters, these timeouts have more time to
complete before proceeding so the task won't fail.
2023-09-01 15:42:24 -04:00
cba276e248 Add default values 2023-09-01 15:42:24 -04:00
be94bc134f Add configurable ZK memory limits 2023-09-01 15:42:24 -04:00
6e74ac44a5 Remove libjemalloc package 2023-09-01 15:42:24 -04:00
2bd5cc5a25 Tune Zookeeper memory usage
Use Xms and Xmx=128M to reduce overall Zookeeper memory usage.
2023-09-01 15:42:24 -04:00
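As a rough illustration of the heap tuning described in this commit, the sketch below builds the matching JVM flag string. The function name `zk_heap_flags` and its `heap_mb` parameter are hypothetical; the commit does not show which file or variable the flags are written to.

```python
def zk_heap_flags(heap_mb: int = 128) -> str:
    """Build JVM heap flags with identical initial and maximum sizes.

    The 128 MB default mirrors the value named in the commit message;
    the function itself is a hypothetical helper, not part of the role.
    """
    if heap_mb <= 0:
        raise ValueError("heap size must be a positive number of megabytes")
    # -Xms sets the initial heap size and -Xmx the maximum heap size.
    return f"-Xms{heap_mb}M -Xmx{heap_mb}M"


if __name__ == "__main__":
    print(zk_heap_flags())
```

Setting both flags to the same value fixes the heap size up front instead of letting the JVM grow it on demand.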
b4e36d146a Add tuning for Ceph OSDs 2023-09-01 15:42:24 -04:00
24764fe704 Don't use libjemalloc for Ceph daemons
This was an artifact of a much, much older Ceph configuration I ran, and
is not relevant with newer Ceph versions like those used in PVC.
Performance testing with Nautilus and Bluestore reveals a minimal
performance hit, and using `jemalloc` prevents cache autotuning from
being effective, so remove it.
2023-09-01 15:42:24 -04:00
750cb4b55c Disable pvc-flush service while rebooting
Prevents the flush daemon from starting on node boot, before the
playbook is actually ready to unflush the node.
2023-09-01 15:42:24 -04:00
cdc7e3377b Tweak oneshot script
Cleanly stop daemons; check if OSDs are back before continuing; wait
less
2023-09-01 15:42:24 -04:00
458e7b4872 Use new init command location
Command was renamed in the PVC CLI to facilitate other "task" actions
like backup/restore.
2023-09-01 15:42:24 -04:00
bcb5962353 Add jute.maxbuffer to Zookeeper environment ops
Adds this option based on the findings of
https://github.com/python-zk/kazoo/issues/630, whereby restores of >1MB
in size would fail. This is considered an unsafe option, but given our
use case no actual znode should ever exceed this limit; this is purely
for the large transactions that come from a `pvc task restore` action to
an empty Zookeeper instance.
2023-09-01 15:42:24 -04:00
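A minimal sketch of the option this commit adds, assuming it is passed to the JVM as a system property. The helper names and the 4 MiB example value are assumptions for illustration only; the commit message does not state the value actually configured.

```python
# ZooKeeper's documented default for jute.maxbuffer is 0xfffff bytes,
# just under 1 MiB.
DEFAULT_JUTE_MAXBUFFER = 0xFFFFF


def jute_maxbuffer_flag(max_bytes: int) -> str:
    """Format the JVM system property that raises the buffer limit."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return f"-Djute.maxbuffer={max_bytes}"


def exceeds_default_limit(payload_bytes: int) -> bool:
    """Report whether a payload would be rejected under the default limit."""
    return payload_bytes > DEFAULT_JUTE_MAXBUFFER


if __name__ == "__main__":
    # Hypothetical value: 4 MiB, chosen only for the example.
    print(jute_maxbuffer_flag(4 * 1024 * 1024))
```

Under these assumptions, a restore transaction of 2 MiB would exceed the default limit but fit within the raised one.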
075ce8ea22 Add PVC status MOTD script 2023-09-01 15:42:24 -04:00
68a475ccf9 Set proper mode on agent plugins 2023-09-01 15:42:24 -04:00
9962ceaf0a Add cluster safe update playbook
This playbook will perform a oneshot upgrade of the systems in the
cluster, including performing a clean and safe reboot of the node(s) if
required (either due to services needing a restart, or the kernel
changing). It runs in serial=1 and only reboots if needed.
2023-09-01 15:42:24 -04:00
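The behaviour described here (one node at a time, reboot only when required) can be sketched as follows. This is an illustrative model, not the playbook itself: `NodeState`, `needs_reboot`, and `plan_updates` are hypothetical names, and the two flags stand in for whatever checks the playbook performs after upgrading a node.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class NodeState:
    """Hypothetical post-upgrade state of a single cluster node."""
    name: str
    services_need_restart: bool
    kernel_changed: bool


def needs_reboot(node: NodeState) -> bool:
    # Reboot if either condition named in the commit message holds.
    return node.services_need_restart or node.kernel_changed


def plan_updates(nodes: Iterable[NodeState]) -> List[str]:
    """List the steps in order, handling one node at a time (like serial=1)."""
    steps: List[str] = []
    for node in nodes:
        steps.append(f"upgrade {node.name}")
        if needs_reboot(node):
            steps.append(f"reboot {node.name}")
    return steps


if __name__ == "__main__":
    cluster = [
        NodeState("node1", services_need_restart=False, kernel_changed=True),
        NodeState("node2", services_need_restart=False, kernel_changed=False),
    ]
    for step in plan_updates(cluster):
        print(step)
```

Processing nodes strictly in sequence means at most one node is out of service at any point in the run.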
f86ec62416 Add check-mk-agent plugin installs
These are used by various Ansible tasks, even if the administrator is
not using Check_MK for monitoring.
2023-09-01 15:42:24 -04:00
62d53b0c9c Add PCI and USB utils 2023-09-01 15:42:24 -04:00
f79fb605de Support using existing SSL certs on system
Add the additional pvc_api_ssl_cert_path and pvc_api_ssl_key_path
group_vars options, which can be used to set the SSL details to existing
files on the filesystem if desired. If these are empty (or nonexistent),
the original pvc_api_ssl_cert and pvc_api_ssl_key raw format options
will be used as they were.

Allows the administrator to use outside methods (such as Let's Encrypt)
to obtain the certs locally on the system, avoiding changes to the
group_vars and redeployment to manage SSL keys.
2023-09-01 15:42:24 -04:00
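The fallback between the two sets of options can be sketched like this. Only the four `pvc_api_ssl_*` variable names come from the commit message; the `select_ssl_source` helper and the plain dictionary standing in for group_vars are assumptions made for the example.

```python
from typing import Dict, Tuple


def select_ssl_source(group_vars: Dict[str, str], kind: str) -> Tuple[str, str]:
    """Pick where the SSL material for `kind` ("cert" or "key") comes from.

    Returns ("path", <file path>) when the *_path option is set and
    non-empty, otherwise ("raw", <raw value>) from the original option.
    """
    if kind not in ("cert", "key"):
        raise ValueError("kind must be 'cert' or 'key'")
    # A missing key and an empty string are treated the same way.
    path = group_vars.get(f"pvc_api_ssl_{kind}_path") or ""
    if path:
        return ("path", path)
    return ("raw", group_vars.get(f"pvc_api_ssl_{kind}", ""))


if __name__ == "__main__":
    # Hypothetical values for illustration.
    example = {
        "pvc_api_ssl_cert_path": "/etc/ssl/example/fullchain.pem",
        "pvc_api_ssl_cert": "RAW CERT DATA",
        "pvc_api_ssl_key": "RAW KEY DATA",
    }
    print(select_ssl_source(example, "cert"))
    print(select_ssl_source(example, "key"))
```

In this model the certificate is taken from the existing file while the key, which has no path set, falls back to the raw value.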
a8419be587 Use generic Debian repos and PVC component 2023-09-01 15:42:24 -04:00
2caed2ae12 Rename remaining "pvc_prov" items to pvc_api 2023-09-01 15:42:24 -04:00
2a2d318dbc Change name of default API database
From pvcprov to pvcapi to reflect the changing use of this database.
2023-09-01 15:42:24 -04:00
833d99a360 Add comments to defaults 2023-09-01 15:42:24 -04:00
503d99d0fe Add more detailed comments 2023-09-01 15:42:24 -04:00
8109f13386 Add additional configuration to group_vars
Also include defaults and the new pvc_vm_shutdown_timeout option.
2023-09-01 15:42:24 -04:00
98c3586511 Add nice warning to purge script 2023-09-01 15:42:24 -04:00
72df058684 Ensure ZK prioritizes IPv4 2023-09-01 15:42:24 -04:00
457e18a850 Use FQDN for Zookeeper server entries 2023-09-01 15:42:24 -04:00
777a4693a1 Improve SSH configuration for nodes
Ensure hostbased auth works with configs, remove erroneous old
conditional for authtypes, remove obsolete config option.
2023-09-01 15:42:24 -04:00
88209a2b70 Use Google DNS instead of Cloudflare
For some reason Cloudflare works in fewer places than Google, so just
use Google instead.
2023-09-01 15:42:24 -04:00
fbbf5ffe09 Use cluster_group variable for paths
Instead of trying to automagic this group out of the Ansible hostvars,
just make it explicitly defined in the group_vars to avoid any
confusion.
2023-09-01 15:42:23 -04:00
a925e4bd40 Ignore errors in bringing up bootstrap interfaces 2023-09-01 15:42:23 -04:00
e3ad750412 Add storage components to default pvcnoded.yaml 2023-09-01 15:42:23 -04:00
715fa103cd Ensure uuid-runtime is installed 2023-09-01 15:42:23 -04:00
7dc6efdf9a Add update to purge command 2023-09-01 15:42:23 -04:00
12d50cfca6 Use correct syntax for init command 2023-09-01 15:42:23 -04:00
92ccc0a737 Use consistent naming in patroni.yml 2023-09-01 15:42:23 -04:00
0566aadfb0 Remove obsolete issue-gen script on install 2023-09-01 15:42:23 -04:00
c35c58389d Use short names in PVC configs 2023-09-01 15:42:23 -04:00
157c56fd46 Use shortname for Zookeeper 2023-09-01 15:42:23 -04:00
82406e9da8 Add purge script 2023-09-01 15:42:23 -04:00
7e653d52c3 Include upstream and short names in hosts 2023-09-01 15:42:23 -04:00
6a3c32f306 Use local CLI command instead of API to init 2023-09-01 15:42:23 -04:00
c71415317a Use only short names in Ceph MON config 2023-09-01 15:42:23 -04:00
52862f9daf Fix conditional checks with inventory_hostname 2023-09-01 15:42:23 -04:00
91313e848e Handle bridge creation more sensibly 2023-09-01 15:42:23 -04:00
18735c0657 Fix a grammatical error 2023-09-01 15:42:23 -04:00
0aeb4fbc89 Update README with GitHub notice and links 2023-09-01 15:42:23 -04:00
6d3999eaab Don't restart pvcd.service on bootstrap 2023-09-01 15:42:23 -04:00
0d9e209b45 Allow deb migrations to be installed 2023-09-01 15:42:23 -04:00