Compare commits

...

31 Commits

Author SHA1 Message Date
ecbaaa39a1 Update Ceph agent to the latest version 2024-11-24 18:45:01 -05:00
dd451c70c3 Add automirror support to Ansible 2024-11-15 11:49:53 -05:00
6d75b33d17 Correct bugs with Patroni tasks on Ansible 2.14 2024-11-07 02:01:03 -05:00
f5fe9c1f70 Ignore errors on stupid tasks 2024-11-04 16:03:37 -05:00
fe050509c0 Add missing tags from main tasks 2024-11-03 15:40:29 -05:00
e4a0e0be7c Add needrestart override for PVC daemons 2024-11-03 15:38:41 -05:00
36fcdd151b Improve PostgreSQL migration
Just dump and reimport the database, it's easier.
2024-10-29 13:46:22 -04:00
cbb2352b01 Avoid error if noded version doesn't work 2024-10-29 11:32:32 -04:00
662083d87e Update README badge order 2024-10-25 23:47:49 -04:00
c53043f79d Update README 2024-10-25 23:44:15 -04:00
8f6a59d1ba Update comments in default group_vars 2024-10-25 23:41:04 -04:00
9fe3e438ec Update README to match other repositories 2024-10-25 23:40:50 -04:00
463c1985d3 Add additional wait time 2024-10-15 11:18:18 -04:00
b4c2b9bdf8 Lower sync and init tick limits
Lower both of these to 5 seconds to ensure Zookeeper doesn't linger on
startup or synchronization while pvcnoded is starting up (15s in
0.9.101).
2024-10-15 11:11:10 -04:00
d82d057956 Remove ALL the quotes 2024-09-27 23:52:46 -04:00
597f9bdb92 Remove quotes from example 2024-09-27 23:51:28 -04:00
4fed4ecc64 Add support for arbitrary NIC options
And add a proper example to the default group_vars.
2024-09-27 23:47:07 -04:00
1925100589 Add gunicorn install to update playbook 2024-09-19 16:16:37 -04:00
c2dfb2d30e Update README note for latest versions 2024-09-05 04:10:23 -04:00
6b390d7082 Fix issue with newhost inverse definition 2024-09-05 01:27:31 -04:00
31728c0915 Fix unsafe conditional 2024-09-03 21:24:56 -04:00
8f5e162fd6 Skip update-motd and update-issue run on bootstrap 2024-08-30 12:16:35 -04:00
506d2ada49 Fix typo in script 2024-08-30 11:07:49 -04:00
ce82f72241 Ensure schema updates are run after upgrade 2024-08-29 02:57:12 -04:00
f85136ed62 Add final role runs to normalize cluster 2024-08-29 02:33:30 -04:00
805477b8be Ignore more errors for user: module 2024-08-29 01:08:44 -04:00
65af8ef149 Ignore errors on all user commands
After a D10->D12 upgrade, these all fail. Let them.
2024-08-29 01:02:01 -04:00
6f2aeed3c9 Avoid failing if setting root password fails 2024-08-29 00:55:55 -04:00
beef030656 Fix ansible_lsb call on Debian 10
Fails due to empty ansible_lsb, so skip it
2024-08-29 00:13:46 -04:00
f00b43f20f Add extra waits before unsetting maintenance
Avoids issues after restarting the API.
2024-08-28 12:42:01 -04:00
4e59ad5efe Remove obsolete upgrade script
Debian 11 is now deprecated.
2024-08-28 12:41:50 -04:00
25 changed files with 678 additions and 474 deletions

View File

@ -1,16 +1,42 @@
# PVC Ansible
<p align="center">
<img alt="Logo banner" src="https://docs.parallelvirtualcluster.org/en/latest/images/pvc_logo_black.png"/>
<br/><br/>
<a href="https://www.parallelvirtualcluster.org"><img alt="Website" src="https://img.shields.io/badge/visit-website-blue"/></a>
<a href="https://github.com/parallelvirtualcluster/pvc/releases"><img alt="Latest Release" src="https://img.shields.io/github/release-pre/parallelvirtualcluster/pvc"/></a>
<a href="https://docs.parallelvirtualcluster.org/en/latest/?badge=latest"><img alt="Documentation Status" src="https://readthedocs.org/projects/parallelvirtualcluster/badge/?version=latest"/></a>
<a href="https://github.com/parallelvirtualcluster/pvc"><img alt="License" src="https://img.shields.io/github/license/parallelvirtualcluster/pvc"/></a>
<a href="https://github.com/psf/black"><img alt="Code style: Black" src="https://img.shields.io/badge/code%20style-black-000000.svg"/></a>
</p>
**NOTICE FOR GITHUB**: This repository is a read-only mirror of the PVC repositories from my personal GitLab instance. Pull requests submitted here will not be merged. Issues submitted here will however be treated as authoritative.
## What is PVC?
A set of Ansible roles to set up PVC nodes. Part of the [Parallel Virtual Cluster system](https://github.com/parallelvirtualcluster/pvc).
PVC is a Linux KVM-based hyperconverged infrastructure (HCI) virtualization cluster solution that is fully Free Software, scalable, redundant, self-healing, self-managing, and designed for administrator simplicity. It is an alternative to other HCI solutions such as Ganeti, Harvester, Nutanix, and VMWare, as well as to other common virtualization stacks such as ProxMox and OpenStack.
PVC is a complete HCI solution, built from well-known and well-trusted Free Software tools, to assist an administrator in creating and managing a cluster of servers to run virtual machines, as well as self-managing several important aspects including storage failover, node failure and recovery, virtual machine failure and recovery, and network plumbing. It is designed to act consistently, reliably, and unobtrusively, letting the administrator concentrate on more important things.
PVC is highly scalable. From a minimum (production) node count of 3, up to 12 or more, and supporting many dozens of VMs, PVC scales along with your workload and requirements. Deploy a cluster once and grow it as your needs expand.
As a consequence of its features, PVC makes administrating very high-uptime VMs extremely easy, featuring VM live migration, built-in always-enabled shared storage with transparent multi-node replication, and consistent network plumbing throughout the cluster. Nodes can also be seamlessly removed from or added to service, with zero VM downtime, to facilitate maintenance, upgrades, or other work.
PVC also features an optional, fully customizable VM provisioning framework, designed to automate and simplify VM deployments using custom provisioning profiles, scripts, and CloudInit userdata API support.
Installation of PVC is accomplished by two main components: a [Node installer ISO](https://github.com/parallelvirtualcluster/pvc-installer) which creates on-demand installer ISOs, and an [Ansible role framework](https://github.com/parallelvirtualcluster/pvc-ansible) to configure, bootstrap, and administrate the nodes. Installation can also be fully automated with a companion [cluster bootstrapping system](https://github.com/parallelvirtualcluster/pvc-bootstrap). Once up, the cluster is managed via an HTTP REST API, accessible via a Python Click CLI client ~~or WebUI~~ (eventually).
Just give it physical servers, and it will run your VMs without you having to think about it, all in just an hour or two of setup time.
More information about PVC, its motivations, the hardware requirements, and setting up and managing a cluster [can be found over at our docs page](https://docs.parallelvirtualcluster.org).
# PVC Ansible Management Framework
This repository contains a set of Ansible roles for setting up and managing PVC nodes.
Tested on Ansible 2.2 through 2.10; it is not guaranteed to work properly on older or newer versions.
## Roles
# Roles
This repository contains two roles:
#### base
### base
This role provides a standardized and configured base system for PVC. This role expects that
the system was installed via the PVC installer ISO, which results in a Debian Buster system.
@ -18,21 +44,22 @@ the system was installed via the PVC installer ISO, which results in a Debian Bu
This role is optional; the administrator may configure the base system however they please so
long as the `pvc` role can be installed thereafter.
#### pvc
### pvc
This role configures the various subsystems required by PVC, including Ceph, Libvirt, Zookeeper,
FRR, and Patroni, as well as the main PVC components themselves.
## Variables
# Variables
A default example set of configuration variables can be found in `group_vars/default/`.
A full explanation of all variables can be found in [the manual](https://parallelvirtualcluster.readthedocs.io/en/latest/manuals/ansible/).
## Using
# Using
*NOTE:* These roles expect a Debian 10.X (Buster) or Debian 11.X (Bullseye) system specifically.
This is currently the only operating environment supported for PVC.
*NOTE:* These roles expect a Debian 12.X (Bookworm) system specifically (as of PVC 0.9.100).
This is currently the only operating environment supported for PVC. This role MAY work
on Debian derivatives but this is not guaranteed!
*NOTE:* All non-`default` directories under `group_vars/` and `files/`, and the `hosts` file,
are ignored by this Git repository. It is advisable to manage these files securely
@ -51,19 +78,3 @@ For full details, please see the general [PVC install documentation](https://par
0. Run the `pvc.yml` playbook against the servers. If this is the very first run for a given
cluster, use the `-e do_bootstrap=yes` variable to ensure the Ceph, Patroni, and PVC
clusters are initialized.
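For illustration, a first bootstrap run might look like the following; the `hosts` inventory is the file mentioned above, while the `mycluster` group name is only an assumption to adjust for your own layout:

    ansible-playbook -i hosts pvc.yml -l mycluster -e do_bootstrap=yes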
## License
Copyright (C) 2018-2021 Joshua M. Boniface <joshua@boniface.me>
This repository, and all contained files, is free software: you can
redistribute it and/or modify it under the terms of the GNU General
Public License as published by the Free Software Foundation, version 3.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.

View File

@ -60,72 +60,80 @@ ipmi:
password: "{{ root_password }}"
pvc:
username: "host"
password: ""
password: "" # Set a random password here
# > use pwgen to generate
hosts:
"pvchv1": # This name MUST match the Ansible inventory_hostname's first portion, i.e. "inventory_hostname.split('.')[0]"
hostname: pvchv1-lom # A valid short name (e.g. from /etc/hosts) or an FQDN must be used here and it must resolve to address.
"hv1": # This name MUST match the Ansible inventory_hostname's first portion, i.e. "inventory_hostname.split('.')[0]"
hostname: hv1-lom # A valid short name (e.g. from /etc/hosts) or an FQDN must be used here and it must resolve to address.
# PVC connects to this *hostname* for fencing.
address: 192.168.100.101
address: 10.100.0.101 # The IPMI address should usually be in the "upstream" network, but can be routed if required
netmask: 255.255.255.0
gateway: 192.168.100.1
channel: 1 # Optional: defaults to "1" if not set
"pvchv2": # This name MUST match the Ansible inventory_hostname's first portion, i.e. "inventory_hostname.split('.')[0]"
hostname: pvchv2-lom # A valid short name (e.g. from /etc/hosts) or an FQDN must be used here and it must resolve to address.
gateway: 10.100.0.254
channel: 1 # Optional: defaults to "1" if not set; defines the IPMI LAN channel which is usually 1
"hv2": # This name MUST match the Ansible inventory_hostname's first portion, i.e. "inventory_hostname.split('.')[0]"
hostname: hv2-lom # A valid short name (e.g. from /etc/hosts) or an FQDN must be used here and it must resolve to address.
# PVC connects to this *hostname* for fencing.
address: 192.168.100.102
netmask: 255.255.255.0
gateway: 192.168.100.1
channel: 1 # Optional: defaults to "1" if not set
"pvchv3": # This name MUST match the Ansible inventory_hostname's first portion, i.e. "inventory_hostname.split('.')[0]"
hostname: pvchv3-lom # A valid short name (e.g. from /etc/hosts) or an FQDN must be used here and it must resolve to address.
channel: 1 # Optional: defaults to "1" if not set; defines the IPMI LAN channel which is usually 1
"hv3": # This name MUST match the Ansible inventory_hostname's first portion, i.e. "inventory_hostname.split('.')[0]"
hostname: hv3-lom # A valid short name (e.g. from /etc/hosts) or an FQDN must be used here and it must resolve to address.
# PVC connects to this *hostname* for fencing.
address: 192.168.100.103
netmask: 255.255.255.0
gateway: 192.168.100.1
channel: 1 # Optional: defaults to "1" if not set
channel: 1 # Optional: defaults to "1" if not set; defines the IPMI LAN channel which is usually 1
# IPMI user configuration
# > Adjust this based on the specific hardware you are using; the cluster_hardware variable is
# used as the key in this dictionary.
# > If you run multiple clusters with different hardware, it may be prudent to move this to an
# 'all' group_vars file instead.
ipmi_user_configuration:
"default":
channel: 1
admin:
id: 1
role: 0x4 # ADMINISTRATOR
username: "{{ ipmi['users']['admin']['username'] }}"
password: "{{ ipmi['users']['admin']['password'] }}"
pvc:
id: 2
role: 0x4 # ADMINISTRATOR
channel: 1 # The IPMI user channel, usually 1
admin: # Configuration for the Admin user
id: 1 # The user ID, usually 1 for the Admin user
role: 0x4 # ADMINISTRATOR privileges
username: "{{ ipmi['users']['admin']['username'] }}" # Loaded from the above section
password: "{{ ipmi['users']['admin']['password'] }}" # Loaded from the above section
pvc: # Configuration for the PVC user
id: 2 # The user ID, usually 2 for the PVC user
role: 0x4 # ADMINISTRATOR privileges
username: "{{ ipmi['users']['pvc']['username'] }}"
password: "{{ ipmi['users']['pvc']['password'] }}"
# Log rotation configuration
# > The defaults here are usually sufficient and should not need to be changed without good reason
logrotate_keepcount: 7
logrotate_interval: daily
# Root email name (usually "root")
# > Can be used to send email destined for the root user (e.g. cron reports) to a real email address if desired
username_email_root: root
# Hosts entries
# > Define any static `/etc/hosts` entries here; the provided example shows the format but should be removed
hosts:
- name: test
ip: 127.0.0.1
ip: 1.2.3.4
# Administrative shell users for the cluster
# > These users will be permitted SSH access to the cluster, with the user created automatically and its
# SSH public keys set based on the provided lists. In addition, all keys will be allowed access to the
# Ansible deploy user for managing the cluster
admin_users:
- name: "myuser"
uid: 500
- name: "myuser" # Set the username
uid: 500 # Set the UID; the first admin user should be 500, then 501, 502, etc.
keys:
# These SSH public keys will be added if missing
- "ssh-ed25519 MyKey 2019-06"
removed:
# These SSH public keys will be removed if present
- "ssh-ed25519 ObsoleteKey 2017-01"
# Backup user SSH user keys, for remote backups separate from administrative users (e.g. rsync)
# > Uncomment to activate this functionality.
# > Useful for tools like BackupPC (the author's preferred backup tool) or remote rsync backups.
#backup_keys:
# - "ssh-ed25519 MyKey 2019-06"
@ -133,43 +141,70 @@ admin_users:
# > The "type" can be one of three NIC types: "nic" for raw NIC devices, "bond" for ifenslave bonds,
# or "vlan" for vLAN interfaces. The PVC role will write out an interfaces file matching these specs.
# > Three names are reserved for the PVC-specific interfaces: upstream, cluster, and storage; others
# may be used at will to describe the other devices.
# > All devices should be using the newer device name format (i.e. enp1s0f0 instead of eth0).
# > In this example configuration, the "upstream" device is an LACP bond of the first two onboard NICs,
# with the two other PVC networks being vLANs on top of this device.
# > Usually, the Upstream network provides Internet connectivity for nodes in the cluster, and all
# nodes are part of it regardless of function for this reason; an optional, advanced, configuration
# will have only coordinators in the upstream network, however this configuration is out of the scope
# of this role.
# may be used at will to describe the other devices. These devices have IP info which is then written
# into `pvc.conf`.
# > All devices should be using the predictable device name format (i.e. enp1s0f0 instead of eth0). If
# you do not know these names, consult the manual of your selected node hardware, or boot a Linux
# LiveCD to see the generated interface configuration.
# > This example configuration is one the author uses frequently, to demonstrate all possible options.
# First, two base NIC devices are set with some custom ethtool options; these are optional of course.
# The "timing" value for a "custom_options" entry must be "pre" or "post". The command can include $IFACE
# which is written as-is (to be interpreted by Debian ifupdown at runtime).
# Second, a bond interface is created on top of the two NIC devices in 802.3ad (LACP) mode with high MTU.
# Third, the 3 PVC interfaces are created as vLANs (1000, 1001, and 1002) on top of the bond.
# This should cover most normal usecases, though consult the template files for more detail if needed.
networks:
"upstream":
device: "bondU"
type: "bond"
bond_mode: "802.3ad"
enp1s0f0:
device: enp1s0f0
type: nic
mtu: 9000 # Forms a post-up ip link set $IFACE mtu statement; a high MTU is recommended for optimal backend network performance
custom_options:
- timing: pre # Forms a pre-up statement
command: ethtool -K $IFACE rx-gro-hw off
- timing: post # Forms a post-up statement
command: sysctl -w net.ipv6.conf.$IFACE.accept_ra=0
enp1s0f1:
device: enp1s0f1
type: nic
mtu: 9000 # Forms a post-up ip link set $IFACE mtu statement; a high MTU is recommended for optimal backend network performance
custom_options:
- timing: pre # Forms a pre-up statement
command: ethtool -K $IFACE rx-gro-hw off
- timing: post # Forms a post-up statement
command: sysctl -w net.ipv6.conf.$IFACE.accept_ra=0
bond0:
device: bond0
type: bond
bond_mode: 802.3ad # Can also be active-backup for active-passive failover, but LACP is advised
bond_devices:
- "enp1s0f0"
- "enp1s0f1"
mtu: 1500
domain: "{{ local_domain }}"
netmask: "24"
subnet: "192.168.100.0"
floating_ip: "192.168.100.10"
gateway_ip: "192.168.100.1"
"cluster":
device: "vlan1001"
type: "vlan"
raw_device: "bondU"
mtu: 1500
domain: "pvc-cluster.local"
netmask: "24"
subnet: "10.0.0.0"
floating_ip: "10.0.0.254"
"storage":
device: "vlan1002"
type: "vlan"
raw_device: "bondU"
mtu: 1500
domain: "pvc-storage.local"
netmask: "24"
subnet: "10.0.1.0"
floating_ip: "10.0.1.254"
- enp1s0f0
- enp1s0f1
mtu: 9000 # Forms a post-up ip link set $IFACE mtu statement; a high MTU is recommended for optimal backend network performance
upstream:
device: vlan1000
type: vlan
raw_device: bond0
mtu: 1500 # Use a lower MTU on upstream for compatibility with upstream networks to avoid fragmentation
domain: "{{ local_domain }}" # This should be the local_domain for the upstream network
subnet: 10.100.0.0 # The CIDR subnet address without the netmask
netmask: 24 # The CIDR netmask
floating_ip: 10.100.0.250 # The floating IP used by the cluster primary coordinator; should be a high IP that won't conflict with any node IDs
gateway_ip: 10.100.0.254 # The default gateway IP
cluster:
device: vlan1001
type: vlan
raw_device: bond0
mtu: 9000 # Use a higher MTU on cluster for performance
domain: pvc-cluster.local # This domain is arbitrary; using this default example is a good practice
subnet: 10.0.0.0 # The CIDR subnet address without the netmask; this should be an UNROUTED network (no gateway)
netmask: 24 # The CIDR netmask
floating_ip: 10.0.0.254 # The floating IP used by the cluster primary coordinator; should be a high IP that won't conflict with any node IDs
storage:
device: vlan1002
type: vlan
raw_device: bond0
mtu: 9000 # Use a higher MTU on storage for performance
domain: pvc-storage.local # This domain is arbitrary; using this default example is a good practice
subnet: 10.0.1.0 # The CIDR subnet address without the netmask; this should be an UNROUTED network (no gateway)
netmask: 24 # The CIDR netmask
floating_ip: 10.0.1.254 # The floating IP used by the cluster primary coordinator; should be a high IP that won't conflict with any node IDs

View File

@ -1,125 +1,128 @@
---
# Logging configuration (uncomment to override defaults)
# These default options are generally best for most clusters; override these if you want more granular
# control over the logging output of the PVC system.
#pvc_log_to_file: False # Log to a file in /var/log/pvc
#pvc_log_to_stdout: True # Log to stdout (i.e. journald)
#pvc_log_to_zookeeper: True # Log to Zookeeper (required for 'node log' commands)
#pvc_log_colours: True # Log colourful prompts for states instead of text
#pvc_log_dates: False # Log dates (useful with log_to_file, not useful with log_to_stdout as journald adds these)
#pvc_log_keepalives: True # Log keepalive event every pvc_keepalive_interval seconds
#pvc_log_keepalive_cluster_details: True # Log cluster details (VMs, OSDs, load, etc.) during keepalive events
#pvc_log_keepalive_plugin_details: True # Log health plugin details (messages) during keepalive events
#pvc_log_console_lines: 1000 # The number of VM console log lines to store in Zookeeper for 'vm log' commands.
#pvc_log_node_lines: 2000 # The number of node log lines to store in Zookeeper for 'node log' commands.
# These default options are generally best for most clusters; override these if you want more granular
# control over the logging output of the PVC system.
#pvc_log_to_stdout: True # Log to stdout (i.e. journald)
#pvc_log_to_file: False # Log to files in /var/log/pvc
#pvc_log_to_zookeeper: True # Log to Zookeeper; required for 'node log' commands to function, but writes a lot of data to Zookeeper - disable if using very small system disks
#pvc_log_colours: True # Log colourful prompts for states instead of text
#pvc_log_dates: False # Log dates (useful with log_to_file, not useful with log_to_stdout as journald adds these)
#pvc_log_keepalives: True # Log each keepalive event every pvc_keepalive_interval seconds
#pvc_log_keepalive_cluster_details: True # Log cluster details (VMs, OSDs, load, etc.) during keepalive events
#pvc_log_keepalive_plugin_details: True # Log health plugin details (messages) during keepalive events
#pvc_log_console_lines: 1000 # The number of VM console log lines to store in Zookeeper for 'vm log' commands.
#pvc_log_node_lines: 2000 # The number of node log lines to store in Zookeeper for 'node log' commands.
# Timing and fencing configuration (uncomment to override defaults)
# These default options are generally best for most clusters; override these if you want more granular
# control over the timings of various areas of the cluster, for instance if your hardware is slow or error-prone.
#pvc_vm_shutdown_timeout: 180 # Number of seconds before a 'shutdown' VM is forced off
#pvc_keepalive_interval: 5 # Number of seconds between keepalive ticks
#pvc_monitoring_interval: 15 # Number of seconds between monitoring plugin runs
#pvc_fence_intervals: 6 # Number of keepalive ticks before a node is considered dead
#pvc_suicide_intervals: 0 # Number of keepalive ticks before a node considers itself dead (0 to disable)
#pvc_fence_successful_action: migrate # What to do with VMs when a fence is successful (migrate, None)
#pvc_fence_failed_action: None # What to do with VMs when a fence is failed (migrate, None) - migrate is DANGEROUS without pvc_suicide_intervals set to < pvc_fence_intervals
#pvc_migrate_target_selector: mem # The selector to use for migrating VMs if not explicitly set; one of mem, memfree, load, vcpus, vms
# These default options are generally best for most clusters; override these if you want more granular
# control over the timings of various areas of the cluster, for instance if your hardware is slow or error-prone.
# DO NOT lower most of these values; this will NOT provide "more reliability", but the contrary.
#pvc_vm_shutdown_timeout: 180 # Number of seconds before a 'shutdown' VM is forced off
#pvc_keepalive_interval: 5 # Number of seconds between keepalive ticks
#pvc_monitoring_interval: 15 # Number of seconds between monitoring plugin runs
#pvc_fence_intervals: 6 # Number of keepalive ticks before a node is considered dead
#pvc_suicide_intervals: 0 # Number of keepalive ticks before a node considers itself dead and forcibly restarts itself (0 to disable, recommended)
#pvc_fence_successful_action: migrate # What to do with VMs when a fence is successful (migrate, None)
#pvc_fence_failed_action: None # What to do with VMs when a fence is failed (migrate, None) - migrate is DANGEROUS without pvc_suicide_intervals set to < pvc_fence_intervals
#pvc_migrate_target_selector: mem # The selector to use for migrating VMs if not explicitly set by the VM; one of mem, memfree, load, vcpus, vms
# Client API basic configuration
pvc_api_listen_address: "{{ pvc_upstream_floatingip }}"
pvc_api_listen_port: "7370"
pvc_api_secret_key: "" # Use pwgen to generate
pvc_api_listen_address: "{{ pvc_upstream_floatingip }}" # This should usually be the upstream floating IP
pvc_api_listen_port: "7370" # This can be any port, including low ports, if desired, but be mindful of port conflicts
pvc_api_secret_key: "" # Use pwgen to generate
# Client API user tokens
# Create a token (random UUID or password) for each user you wish to have access to the PVC API.
# The first token will always be used for the "local" connection, and thus at least one token MUST be defined.
# WARNING: All tokens function at the same privilege level and provide FULL CONTROL over the cluster. Keep them secret!
pvc_api_enable_authentication: True
pvc_api_tokens:
# - description: "myuser"
# token: "a3945326-d36c-4024-83b3-2a8931d7785a"
- description: "myuser" # The description is purely cosmetic for current iterations of PVC
token: "a3945326-d36c-4024-83b3-2a8931d7785a" # The token should be random for security; use uuidgen or pwgen to generate
# PVC API SSL configuration
# Use these options to enable SSL for the API listener, providing security over WAN connections.
# There are two options for defining the SSL certificate and key to use:
# a) Set both pvc_api_ssl_cert_path and pvc_api_ssl_key_path to paths to an existing SSL combined (CA + cert) certificate and key, respectively, on the system.
# b) Set both pvc_api_ssl_cert and pvc_api_ssl_key to the raw PEM-encoded contents of an SSL combined (CA + cert) certificate and key, respectively, which will be installed under /etc/pvc.
# a) Set both pvc_api_ssl_cert_path and pvc_api_ssl_key_path to paths to an existing SSL combined (CA + intermediate + cert) certificate and key, respectively, on the system.
# b) Set both pvc_api_ssl_cert and pvc_api_ssl_key to the raw PEM-encoded contents of an SSL combined (CA + intermediate + cert) certificate and key, respectively, which will be installed under /etc/pvc.
# If the _path options are non-empty, the raw entries are ignored and will not be used.
pvc_api_enable_ssl: False
pvc_api_ssl_cert_path:
pvc_api_enable_ssl: False # Enable SSL listening; this is highly recommended when using the API over the Internet!
pvc_api_ssl_cert_path: "" # Set a path here, or...
pvc_api_ssl_cert: >
# A RAW CERTIFICATE FILE, installed to /etc/pvc/api-cert.pem
pvc_api_ssl_key_path:
# Enter the RAW CERTIFICATE FILE content, installed to /etc/pvc/api-cert.pem
pvc_api_ssl_key_path: "" # Set a path here, or...
pvc_api_ssl_key: >
# A RAW KEY FILE, installed to /etc/pvc/api-key.pem
# Enter the RAW KEY FILE content, installed to /etc/pvc/api-key.pem
# Ceph storage configuration
pvc_ceph_storage_secret_uuid: "" # Use uuidgen to generate
# Database configuration
pvc_dns_database_name: "pvcdns"
pvc_dns_database_user: "pvcdns"
pvc_dns_database_name: "pvcdns" # Should usually be "pvcdns" unless there is good reason to change it
pvc_dns_database_user: "pvcdns" # Should usually be "pvcdns" unless there is good reason to change it
pvc_dns_database_password: "" # Use pwgen to generate
pvc_api_database_name: "pvcapi"
pvc_api_database_user: "pvcapi"
pvc_api_database_name: "pvcapi" # Should usually be "pvcapi" unless there is good reason to change it
pvc_api_database_user: "pvcapi" # Should usually be "pvcapi" unless there is good reason to change it
pvc_api_database_password: "" # Use pwgen to generate
pvc_replication_database_user: "replicator"
pvc_replication_database_user: "replicator" # Should be "replicator" for Patroni
pvc_replication_database_password: "" # Use pwgen to generate
pvc_superuser_database_user: "postgres"
pvc_superuser_database_user: "postgres" # Should be "postgres"
pvc_superuser_database_password: "" # Use pwgen to generate
# Network routing configuration
# > The ASN should be a private ASN number.
# > The list of routers are those which will learn routes to the PVC client networks via BGP;
# they should speak BGP and allow sessions from the PVC nodes.
# The ASN should be a private ASN number, usually "65500"
# The list of routers are those which will learn routes to the PVC client networks via BGP;
# they should speak BGP and allow sessions from the PVC nodes.
# If you do not have any upstream BGP routers, e.g. if you wish to use static routing to and from managed networks, leave the list empty
pvc_asn: "65500"
pvc_routers:
- "192.168.100.1"
- "10.100.0.254"
# PVC Node list
# > Every node configured with this playbook must be specified in this list.
# Every node configured with this playbook must be specified in this list.
pvc_nodes:
- hostname: "pvchv1" # This name MUST match the Ansible inventory_hostname's first portion, i.e. "inventory_hostname.split('.')[0]"
is_coordinator: yes
node_id: 1
router_id: "192.168.100.11"
upstream_ip: "192.168.100.11"
- hostname: "hv1" # This name MUST match the Ansible inventory_hostname's first portion, i.e. "inventory_hostname.split('.')[0]"
is_coordinator: yes # Coordinators should be set to "yes", hypervisors to "no"
node_id: 1 # Should match the number portion of the hostname
router_id: "10.100.0.1"
upstream_ip: "10.100.0.1"
cluster_ip: "10.0.0.1"
storage_ip: "10.0.1.1"
ipmi_host: "{{ ipmi['hosts']['pvchv1']['hostname'] }}" # Note the node hostname key in here
ipmi_host: "{{ ipmi['hosts']['hv1']['hostname'] }}" # Note the node hostname as the key here
ipmi_user: "{{ ipmi['users']['pvc']['username'] }}"
ipmi_password: "{{ ipmi['users']['pvc']['password'] }}"
cpu_tuning: # Example of cpu_tuning overrides per-node, only relevant if enabled; see below
system_cpus: 2
osd_cpus: 2
- hostname: "pvchv2" # This name MUST match the Ansible inventory_hostname's first portion, i.e. "inventory_hostname.split('.')[0]"
cpu_tuning: # Example of cpu_tuning overrides per-node, only relevant if enabled; see below
system_cpus: 2 # Number of CPU cores (+ their hyperthreads) to allocate to the system
osd_cpus: 2 # Number of CPU cores (+ their hyperthreads) to allocate to the storage OSDs
- hostname: "hv2" # This name MUST match the Ansible inventory_hostname's first portion, i.e. "inventory_hostname.split('.')[0]"
is_coordinator: yes
node_id: 2
router_id: "192.168.100.12"
upstream_ip: "192.168.100.12"
router_id: "10.100.0.2"
upstream_ip: "10.100.0.2"
cluster_ip: "10.0.0.2"
storage_ip: "10.0.1.2"
ipmi_host: "{{ ipmi['hosts']['pvchv2']['hostname'] }}" # Note the node hostname key in here
ipmi_host: "{{ ipmi['hosts']['hv2']['hostname'] }}" # Note the node hostname as the key here
ipmi_user: "{{ ipmi['users']['pvc']['username'] }}"
ipmi_password: "{{ ipmi['users']['pvc']['password'] }}"
- hostname: "pvchv3" # This name MUST match the Ansible inventory_hostname's first portion, i.e. "inventory_hostname.split('.')[0]"
- hostname: "hv3" # This name MUST match the Ansible inventory_hostname's first portion, i.e. "inventory_hostname.split('.')[0]"
is_coordinator: yes
node_id: 3
router_id: "192.168.100.13"
upstream_ip: "192.168.100.13"
router_id: "10.100.0.3"
upstream_ip: "10.100.0.3"
cluster_ip: "10.0.0.3"
storage_ip: "10.0.1.3"
ipmi_host: "{{ ipmi['hosts']['pvchv3']['hostname'] }}" # Note the node hostname key in here
ipmi_host: "{{ ipmi['hosts']['hv3']['hostname'] }}" # Note the node hostname as the key here
ipmi_user: "{{ ipmi['users']['pvc']['username'] }}"
ipmi_password: "{{ ipmi['users']['pvc']['password'] }}"
# Bridge device entry
# This device is passed to PVC and is used when creating bridged networks. Normal managed networks are
# created on top of the "cluster" interface defined below, however bridged networks must be created
# directly on an underlying non-vLAN network device. This can be the same underlying device as the
# upstream/cluster/storage networks (especially if the upstream network device is not a vLAN itself),
# or a different device separate from the other 3 main networks.
pvc_bridge_device: bondU # Replace based on your network configuration
pvc_bridge_mtu: 1500 # Replace based on your network configuration
# This device is used when creating bridged networks. Normal managed networks are created on top of the
# "cluster" interface defined below, however bridged networks must be created directly on an underlying
# non-vLAN network device. This can be the same underlying device as the upstream/cluster/storage networks
# (especially if the upstream network device is not a vLAN itself), or a different device separate from the
# other 3 main networks.
pvc_bridge_device: bond0 # Replace based on your network configuration
pvc_bridge_mtu: 1500 # Replace based on your network configuration; bridges will have this MTU by default unless otherwise specified
# SR-IOV device configuration
# SR-IOV enables the passing of hardware-virtualized network devices (VFs), created on top of SR-IOV-enabled
@ -154,7 +157,6 @@ pvc_sriov_enable: False
# > ADVANCED TUNING: These options are strongly recommended due to the performance gains possible, but
# most users would be able to use the default without too much issue. Read the following notes
# carefully to determine if this setting should be enabled in your cluster.
# > NOTE: CPU tuning is only supported on Debian Bullseye (11) or newer
# > NOTE: CPU tuning playbooks require jmespath (e.g. python3-jmespath) installed on the controller node
# > Defines CPU tuning/affinity options for various subsystems within PVC. This is useful to
# help limit the impact that noisy elements may have on other elements, e.g. busy VMs on
@ -298,6 +300,88 @@ pvc_autobackup:
# This example shows an fusermount3 unmount (e.g. for SSHFS) leveraging the backup_root_path variable
# - "/usr/bin/fusermount3 -u {backup_root_path}"
# PVC VM automirrors
# > PVC supports automirrors, which can perform automatic snapshot-level VM mirrors of selected
# virtual machines based on tags. The mirrors are fully managed on a consistent schedule, and
# include both full and incremental varieties.
# > To solve the shared storage issue and ensure mirrors are taken off-cluster, automatic mounting
# of remote filesystems is supported by automirror.
pvc_automirror:
# Enable or disable automirror
# > If disabled, no timers or "/etc/pvc/automirror.yaml" configuration will be installed, and any
# existing timers or configuration will be REMOVED on each run (even if manually created).
# > Since automirror is an integrated PVC CLI feature, the command will always be available regardless
# of this setting, but without this option enabled, the lack of a "/etc/pvc/automirror.yaml" will
# prevent its use.
enabled: no
# List of possible remote clusters to mirror to
destinations:
# The name of the cluster, used in tags (e.g. 'automirror:cluster2')
cluster2:
# The destination address, either an IP or an FQDN the destination API is reachable at
address: pvc.cluster2.mydomain.tld
# The destination port (usually 7370)
port: 7370
# The API prefix (usually '/api/v1') without a trailing slash
prefix: "/api/v1"
# The API key of the destination
key: 00000000-0000-0000-0000-000000000000
# Whether or not to use SSL for the connection
ssl: yes
# Whether or not to verify SSL for the connection
verify_ssl: yes
# Storage pool for VMs on the destination
pool: vms
# The default destination to send to, for VMs tagged without an explicit cluster
# > This is required even if there is only one destination!
default_destination: cluster2
# Set the VM tag(s) which will be selected for automirror
# > Automirror selects VMs based on their tags. If a VM has a tag present in this list, it will be
# selected for automirror at runtime; if not it will be ignored.
# > Usually, the tag "automirror" here is sufficient; the administrator should then add this tag
# to any VM(s) they want to use automirrors. However, any tag may be specified to keep the tag list
# cleaner and more focused, should the administrator choose to.
# > A cluster can be explicitly set by suffixing `:clustername` from the cluster list to the tag.
# > If a VM has multiple `:clustername` tags, it will be mirrored to all of them sequentially
# using the same source snapshot.
tags:
- automirror
# Automirror scheduling
schedule:
# Mirrors are performed at regular intervals via a systemd timer
# > This default schedule performs a mirror every 4 hours starting at midnight
# > These options use a systemd timer date string; see "man systemd.time" for details
time: "*-*-* 00/4:00:00"
# The retention count specifies how many mirrors should be kept, based on the destination count
# for a given VM and cluster pairing to prevent failed mirrors from being cleaned up on the source
# > Retention cleanup is run after each full mirror, and thus, that mirror is counted in this number
# > For example, a value of 7 means that there will always be at least 7 mirrors on the remote side.
# When a new full mirror is taken, the oldest (i.e. 8th) mirror is removed.
# > Thus, this schedule combined with this retention will ensure there's always 24 hours of mirrors.
retention: 7
# Set reporting options for automirrors
# NOTE: By default, pvc-ansible installs a local Postfix MTA and Postfix sendmail to send emails
# This may not be what you want! If you want an alternate sendmail MTA (e.g. msmtp) you must install it
# yourself in a custom role!
reporting:
# Enable or disable email reporting; if disabled ("no"), no reports are ever sent
enabled: no
# Email a report to these addresses; at least one MUST be specified if enabled
emails:
- myuser@domain.tld
- otheruser@domain.tld
# Email a report on the specified job results
# > These options are like this for clarity. Functionally, only "error" changes anything:
# * If yes & yes, all results send a report.
# * If yes & no, all results send a report.
# * If no & yes, only errors send a report.
# * If no & no, no reports are ever sent; functionally equivalent to setting enabled:no above.
report_on:
# Report on a successful job (all snapshots were sent successfully)
success: no
# Report on an error (at least one snapshot was not sent successfully)
error: yes
# Configuration file networks
# > Taken from base.yml's configuration; DO NOT MODIFY THIS SECTION.
pvc_upstream_device: "{{ networks['upstream']['device'] }}"
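Several of the secrets above call for pwgen or uuidgen; as a quick sketch, typical invocations on the Ansible controller might be (any equivalent generator works):

    pwgen -s 32 1    # one random 32-character secret, e.g. for pvc_api_secret_key or the database passwords
    uuidgen          # a random UUID, e.g. for pvc_ceph_storage_secret_uuid or an API token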

View File

@ -160,6 +160,12 @@
become_user: root
gather_facts: yes
tasks:
- name: wait 15 seconds for system to stabilize
pause:
seconds: 15
become: no
connection: local
- name: unset PVC maintenance mode
command: pvc cluster maintenance off
run_once: yes

View File

@ -244,6 +244,12 @@
become_user: root
gather_facts: yes
tasks:
- name: wait 15 seconds for system to stabilize
pause:
seconds: 15
become: no
connection: local
- name: unset PVC maintenance mode
command: pvc cluster maintenance off
run_once: yes

View File

@ -46,6 +46,9 @@
autoremove: yes
autoclean: yes
package:
- gunicorn
- python3-gunicorn
- python3-setuptools
- pvc-client-cli
- pvc-daemon-common
- pvc-daemon-api
@ -87,9 +90,9 @@
- pvcworkerd
- pvchealthd
- name: wait 15 seconds for system to stabilize
- name: wait 30 seconds for system to stabilize
pause:
seconds: 15
seconds: 30
become: no
connection: local
@ -108,9 +111,9 @@
command: systemctl reset-failed
when: packages.changed
- name: wait 15 seconds for system to stabilize
- name: wait 30 seconds for system to stabilize
pause:
seconds: 15
seconds: 30
become: no
connection: local
@ -145,6 +148,12 @@
- pvcapid
run_once: yes
- name: wait 30 seconds for system to stabilize
pause:
seconds: 30
become: no
connection: local
- name: unset PVC maintenance mode on first node
command: pvc cluster maintenance off
run_once: yes

View File

@ -1,229 +0,0 @@
---
- hosts: all
remote_user: deploy
become: yes
become_user: root
gather_facts: yes
serial: 1
tasks:
- name: set PVC maintenance mode
command: pvc cluster maintenance on
- name: secondary node
command: "pvc node secondary {{ ansible_hostname }}"
ignore_errors: yes
- name: wait 30 seconds for system to stabilize
pause:
seconds: 30
become: no
connection: local
- name: flush node
command: "pvc node flush {{ ansible_hostname }} --wait"
- name: ensure VMs are migrated away
shell: "virsh list | grep running | wc -l"
register: virshcount
failed_when: virshcount.stdout != "0"
until: virshcount.stdout == "0"
retries: 60
delay: 10
- name: make sure all VMs have migrated
shell: "pvc node info {{ ansible_hostname }} | grep '^Domain State' | awk '{ print $NF }'"
register: pvcflush
failed_when: pvcflush.stdout != 'flushed'
until: pvcflush.stdout == 'flushed'
retries: 60
delay: 10
- name: wait 15 seconds for system to stabilize
pause:
seconds: 15
become: no
connection: local
- name: stop PVC daemon cleanly
service:
name: pvcnoded
state: stopped
- name: stop Zookeeper daemon cleanly
service:
name: zookeeper
state: stopped
- name: wait 15 seconds for system to stabilize
pause:
seconds: 15
become: no
connection: local
- name: set OSD noout
command: pvc storage osd set noout
- name: get running OSD services
shell: "systemctl | awk '{ print $1 }' | grep 'ceph-osd@[0-9]*.service'"
ignore_errors: yes
register: osd_services
- name: stop Ceph OSD daemons cleanly
service:
name: "{{ item }}"
state: stopped
ignore_errors: yes
with_items: "{{ osd_services.stdout_lines }}"
- name: stop Ceph Monitor daemon cleanly
service:
name: "ceph-mon@{{ ansible_hostname }}"
state: stopped
ignore_errors: yes
- name: stop Ceph Manager daemon cleanly
service:
name: "ceph-mgr@{{ ansible_hostname }}"
state: stopped
ignore_errors: yes
- name: wait 30 seconds for system to stabilize
pause:
seconds: 30
become: no
connection: local
- name: remove possible obsolete cset configuration
file:
dest: /etc/systemd/system/ceph-osd@.service.d
state: absent
- name: replace sources.list entries with bullseye
replace:
dest: "{{ item }}"
regexp: "buster"
replace: "bullseye"
with_items:
- /etc/apt/sources.list
- name: remove security entry
lineinfile:
dest: /etc/apt/sources.list
regexp: "security.debian.org"
state: absent
- name: update apt cache
apt:
update_cache: yes
- name: install python-is-python3
apt:
name: python-is-python3
state: latest
- name: apt dist upgrade and cleanup
apt:
update_cache: yes
autoremove: yes
autoclean: yes
upgrade: dist
- name: clean up obsolete kernels
command: /usr/local/sbin/kernel-cleanup.sh
- name: clean up obsolete packages
command: /usr/local/sbin/dpkg-cleanup.sh
- name: clean apt archives
file:
dest: /var/cache/apt/archives
state: absent
- name: regather facts
setup:
- name: include base role
import_role:
name: base
- name: include pvc role
import_role:
name: pvc
- name: apt full upgrade and cleanup
apt:
update_cache: yes
autoremove: yes
autoclean: yes
upgrade: full
- name: remove obsolete database directories
file:
dest: "{{ item }}"
state: absent
with_items:
- "/etc/postgresql/13"
- "/var/lib/postgresql/13"
- name: restart system
reboot:
post_reboot_delay: 15
reboot_timeout: 1800
- name: make sure all OSDs are active
shell: "ceph osd stat | grep 'osds:' | awk '{ if ( $1 == $3 ) { print \"OK\" } else { print \"NOK\" } }'"
register: osdstat
failed_when: osdstat.stdout == "NOK"
until: osdstat.stdout == "OK"
retries: 60
delay: 10
- name: make sure all PGs have recovered
shell: "ceph health | grep -wo 'Degraded data redundancy'"
register: cephhealth
failed_when: cephhealth.stdout == "Degraded data redundancy"
until: cephhealth.stdout == ""
retries: 60
delay: 10
- name: unset OSD noout
command: pvc storage osd unset noout
- name: unflush node
command: "pvc node ready {{ ansible_hostname }} --wait"
- name: make sure all VMs have returned
shell: "pvc node info {{ ansible_hostname }} | grep '^Domain State' | awk '{ print $NF }'"
register: pvcunflush
failed_when: pvcunflush.stdout != 'ready'
until: pvcunflush.stdout == 'ready'
retries: 60
delay: 10
- name: wait 30 seconds for system to stabilize
pause:
seconds: 30
become: no
connection: local
- name: reset any systemd failures
command: systemctl reset-failed
- name: wait 30 seconds for system to stabilize
pause:
seconds: 30
become: no
connection: local
- name: unset PVC maintenance mode
command: pvc cluster maintenance off
- hosts: all
remote_user: deploy
become: yes
become_user: root
gather_facts: yes
tasks:
- name: disable insecure global id reclaim in Ceph
command: ceph config set mon auth_allow_insecure_global_id_reclaim false
run_once: yes

View File

@ -55,17 +55,24 @@
failed_when: check_output.stdout == ""
- name: stop and mask patroni service on followers to perform database upgrade (later)
service:
name: patroni
systemd_service:
name: patroni.service
state: stopped
masked: yes
run_once: yes
delegate_to: "{{ item }}"
loop: "{{ patroni_followers }}"
- name: perform a backup of the primary
shell:
cmd: "sudo -u postgres /usr/bin/pg_dumpall > upgrade_dump.sql"
chdir: "/var/lib/postgresql"
run_once: yes
delegate_to: "{{ patroni_leader }}"
- name: stop and mask patroni service on leader to perform database upgrade (later)
service:
name: patroni
systemd_service:
name: patroni.service
state: stopped
masked: yes
run_once: yes
@ -185,7 +192,7 @@
- ceph-osd@.service.d
- ceph-osd-cpuset.service
- name: replace sources.list entries will bookworm
- name: replace sources.list entries with bookworm
replace:
dest: "{{ item }}"
regexp: "{{ debian_codename }}"
@ -474,29 +481,14 @@
delegate_to: "{{ patroni_leader }}"
- block:
- name: initialize new postgres database
shell:
cmd: "sudo -u postgres /usr/lib/postgresql/{{ new_postgres_version }}/bin/initdb -D /var/lib/postgresql/{{ new_postgres_version }}/pvc"
chdir: "/var/lib/postgresql"
- name: enable data checksums in new database
shell:
cmd: "sudo -u postgres /usr/lib/postgresql/{{ new_postgres_version }}/bin/pg_checksums --enable /var/lib/postgresql/{{ new_postgres_version }}/pvc"
chdir: "/var/lib/postgresql"
- name: run postgres upgrade
shell:
cmd: "sudo -u postgres /usr/lib/postgresql/{{ new_postgres_version }}/bin/pg_upgrade -b {{ old_postgres_bin_dir }} -d /var/lib/postgresql/patroni/pvc -D /var/lib/postgresql/{{ new_postgres_version }}/pvc"
chdir: "/var/lib/postgresql"
- name: move old postgres database out of the way
shell:
cmd: "sudo -u postgres mv /var/lib/postgresql/patroni/pvc /var/lib/postgresql/patroni/pvc.old"
chdir: "/var/lib/postgresql"
- name: move new postgres database into place
- name: initialize new postgres database
shell:
cmd: "sudo -u postgres mv /var/lib/postgresql/{{ new_postgres_version }}/pvc /var/lib/postgresql/patroni/pvc"
cmd: "sudo -u postgres /usr/lib/postgresql/{{ new_postgres_version }}/bin/initdb -D /var/lib/postgresql/patroni/pvc"
chdir: "/var/lib/postgresql"
- name: ensure recovery.conf is absent
@ -508,10 +500,31 @@
shell: "/usr/share/zookeeper/bin/zkCli.sh -server {{ ansible_hostname }}:2181 deleteall /patroni/pvc"
- name: start patroni service on leader
service:
name: patroni
systemd_service:
name: patroni.service
state: started
masked: no
- name: wait 15 seconds for system to stabilize
pause:
seconds: 15
become: no
connection: local
- name: import backup of the primary
shell:
cmd: "sudo -u postgres /usr/bin/psql < upgrade_dump.sql"
chdir: "/var/lib/postgresql"
- name: apply schema updates
shell: /usr/share/pvc/pvc-api-db-upgrade
ignore_errors: yes
- name: remove temporary backup
file:
dest: /var/lib/postgresql/upgrade_dump.sql
state: absent
run_once: yes
delegate_to: "{{ patroni_leader }}"
@ -524,8 +537,8 @@
loop: "{{ patroni_followers }}"
- name: start patroni service on followers
service:
name: patroni
systemd_service:
name: patroni.service
state: started
masked: no
run_once: yes
@ -544,10 +557,38 @@
delegate_to: "{{ item }}"
loop: "{{ ansible_play_hosts }}"
- name: wait 30 seconds for system to stabilize
pause:
seconds: 30
become: no
connection: local
- name: set first node as primary coordinator
command: "pvc node primary --wait {{ ansible_play_hosts[0].split('.')[0] }}"
run_once: yes
delegate_to: "{{ ansible_play_hosts[0] }}"
- name: wait 15 seconds for system to stabilize
pause:
seconds: 15
become: no
connection: local
# Play 5: Final role updates to normalize cluster
- hosts: all
remote_user: deploy
become: yes
become_user: root
gather_facts: yes
tasks:
- name: include base role
import_role:
name: base
- name: include pvc role
import_role:
name: pvc
- name: unset PVC maintenance mode
command: pvc cluster maintenance off
run_once: yes

View File

@ -1,24 +1,125 @@
#!/bin/bash
#!/usr/bin/env python3
# -*- encoding: utf-8; py-indent-offset: 4 -*-
# Ceph check for Check_MK
# Installed by PVC ansible
# (c) 2021 Heinlein Support GmbH
# Robert Sander <r.sander@heinlein-support.de>
CMK_VERSION="2.1.0"
# This is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation in version 2. This file is distributed
# in the hope that it will be useful, but WITHOUT ANY WARRANTY; with-
# out even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE. See the GNU General Public License for more de-
tails. You should have received a copy of the GNU General Public
# License along with GNU Make; see the file COPYING. If not, write
# to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
# Boston, MA 02110-1301 USA.
USER=client.admin
KEYRING=/etc/ceph/ceph.client.admin.keyring
import json
import rados
import os, os.path
import subprocess
import socket
if [ -n "$USER" ] && [ -n "$KEYRING" ]; then
CEPH_CMD="ceph -n $USER --keyring=$KEYRING"
echo "<<<ceph_status>>>"
$CEPH_CMD -s -f json-pretty
if OUT="$($CEPH_CMD df detail --format json)"; then
echo "<<<ceph_df_json:sep(0)>>>"
$CEPH_CMD version --format json
echo "$OUT"
else
# fallback for old versions if json output is not available
echo "<<<ceph_df>>>"
$CEPH_CMD df detail
fi
fi
class RadosCMD(rados.Rados):
def command_mon(self, cmd, params=None):
data = {'prefix': cmd, 'format': 'json'}
if params:
data.update(params)
return self.mon_command(json.dumps(data), b'', timeout=5)
def command_mgr(self, cmd):
return self.mgr_command(json.dumps({'prefix': cmd, 'format': 'json'}), b'', timeout=5)
def command_osd(self, osdid, cmd):
return self.osd_command(osdid, json.dumps({'prefix': cmd, 'format': 'json'}), b'', timeout=5)
def command_pg(self, pgid, cmd):
return self.pg_command(pgid, json.dumps({'prefix': cmd, 'format': 'json'}), b'', timeout=5)
ceph_config='/etc/ceph/ceph.conf'
ceph_client='client.admin'
try:
with open(os.path.join(os.environ['MK_CONFDIR'], 'ceph.cfg'), 'r') as config:
for line in config.readlines():
if '=' in line:
key, value = line.strip().split('=')
if key == 'CONFIG':
ceph_config = value
if key == 'CLIENT':
ceph_client = value
except FileNotFoundError:
pass
cluster = RadosCMD(conffile=ceph_config, name=ceph_client)
cluster.connect()
hostname = socket.gethostname().split('.', 1)[0]
fqdn = socket.getfqdn()
res = cluster.command_mon("status")
if res[0] == 0:
status = json.loads(res[1])
mons = status.get('quorum_names', [])
fsid = status.get("fsid", "")
if hostname in mons or fqdn in mons:
# only on MON hosts
print("<<<cephstatus:sep(0)>>>")
print(json.dumps(status))
res = cluster.command_mon("df", params={'detail': 'detail'})
if res[0] == 0:
print("<<<cephdf:sep(0)>>>")
print(json.dumps(json.loads(res[1])))
localosds = []
res = cluster.command_mon("osd metadata")
if res[0] == 0:
print("<<<cephosdbluefs:sep(0)>>>")
out = {'end': {}}
for osd in json.loads(res[1]):
if osd.get('hostname') in [hostname, fqdn]:
localosds.append(osd['id'])
if "container_hostname" in osd:
adminsocket = "/run/ceph/%s/ceph-osd.%d.asok" % (fsid, osd['id'])
else:
adminsocket = "/run/ceph/ceph-osd.%d.asok" % osd['id']
if os.path.exists(adminsocket):
chunks = []
try:
sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.connect(adminsocket)
sock.sendall(b'{"prefix": "perf dump"}\n')
sock.shutdown(socket.SHUT_WR)
while len(chunks) == 0 or chunks[-1] != b'':
chunks.append(sock.recv(4096))
sock.close()
chunks[0] = chunks[0][4:]
except:
chunks = [b'{"bluefs": {}}']
out[osd['id']] = {'bluefs': json.loads(b"".join(chunks))['bluefs']}
print(json.dumps(out))
osddf_raw = cluster.command_mon("osd df")
osdperf_raw = cluster.command_mon("osd perf")
if osddf_raw[0] == 0 and osdperf_raw[0] == 0:
osddf = json.loads(osddf_raw[1])
osdperf = json.loads(osdperf_raw[1])
osds = []
for osd in osddf['nodes']:
if osd['id'] in localosds:
osds.append(osd)
summary = osddf['summary']
perfs = []
if 'osd_perf_infos' in osdperf:
for osd in osdperf['osd_perf_infos']:
if osd['id'] in localosds:
perfs.append(osd)
if 'osdstats' in osdperf and 'osd_perf_infos' in osdperf['osdstats']:
for osd in osdperf['osdstats']['osd_perf_infos']:
if osd['id'] in localosds:
perfs.append(osd)
print("<<<cephosd:sep(0)>>>")
out = {'df': {'nodes': osds},
'perf': {'osd_perf_infos': perfs}}
print(json.dumps(out))
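For reference, the optional override file this agent reads (ceph.cfg under the agent's MK_CONFDIR; the exact directory depends on your Check_MK agent installation) uses plain KEY=value lines, for example:

    CONFIG=/etc/ceph/ceph.conf
    CLIENT=client.admin

If the file is absent, the defaults shown above are used.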

View File

@ -303,18 +303,21 @@
- prometheus-node-exporter
- prometheus-process-exporter
when: enable_prometheus_exporters is defined and enable_prometheus_exporters
tags: base-packages
- name: install Intel-specific microcode package
apt:
name:
- intel-microcode
when: "'GenuineIntel' in ansible_processor"
tags: base-packages
- name: install AMD-specific microcode package
apt:
name:
- amd64-microcode
when: "'AuthenticAMD' in ansible_processor"
tags: base-packages
- name: install cleanup scripts
template:
@ -412,6 +415,14 @@
mode: 0440
tags: base-system
# needrestart
- name: write the needrestart pvc blacklist file
template:
src: "etc/needrestart/conf.d/pvc.conf.j2"
dest: "/etc/needrestart/conf.d/pvc.conf"
mode: 0444
tags: base-system
# dns
- name: write the hosts config
template:
@ -435,6 +446,7 @@
file:
state: directory
dest: "/usr/share/grub-pvc"
tags: base-bootloader
- name: install PVC grub style
copy:
@ -443,6 +455,7 @@
with_items:
- background.png
- theme.txt
tags: base-bootloader
- name: install GRUB configuration
template:
@ -451,6 +464,7 @@
notify:
- update grub
- regenerate uefi entries
tags: base-bootloader
# Plymouth theme
- name: install PVC Plymouth theme archive
@ -460,14 +474,17 @@
creates: "/usr/share/plymouth/themes/pvc"
owner: root
group: root
tags: base-bootloader
- name: install PVC Plymouth background file
copy:
src: "usr/share/grub-pvc/background.png"
dest: "/usr/share/plymouth/themes/pvc/background-tile.png"
tags: base-bootloader
- name: set PVC Plymouth theme as the default
command: plymouth-set-default-theme -R pvc
tags: base-bootloader
# syslog
- name: install rsyslog and logrotate configs
@ -664,13 +681,15 @@
- name: run update-motd on change
command: /usr/local/sbin/update-motd.sh
when: profile_scripts.changed
when: profile_scripts.changed and newhost is not defined and not newhost
tags: base-shell
ignore_errors: yes
- name: run update-issue on change
command: /usr/local/sbin/update-issue.sh
when: profile_scripts.changed
when: profile_scripts.changed and newhost is not defined and not newhost
tags: base-shell
ignore_errors: yes
# htop
- name: install htop configuration
@ -799,6 +818,7 @@
tags:
- users
- user-root
ignore_errors: yes
- name: remove Root known_hosts
file:
@ -845,6 +865,7 @@
tags:
- users
- user-backup
ignore_errors: yes
- name: create backup .ssh directory
file:
@ -910,6 +931,7 @@
tags:
- users
- user-deploy
ignore_errors: yes
- name: ensure homedir has right permissions
file:
@ -974,6 +996,7 @@
tags:
- users
- user-admin
ignore_errors: yes
- name: ensure homedir has right permissions
file:

View File

@ -0,0 +1,4 @@
# needrestart - Blacklist PVC binaries
# {{ ansible_managed }}
push @{$nrconf{blacklist}}, qr(^/usr/share/pvc/);

View File

@ -3,7 +3,14 @@
auto {{ network.value['device'] }}
iface {{ network.value['device'] }} inet {{ network.value['mode']|default('manual') }}
{% if network.value['custom_options'] is defined %}
{% for option in network.value['custom_options'] %}
{{ option['timing'] }}-up {{ option['command'] }}
{% endfor %}
{% endif %}
{% if network.value['mtu'] is defined %}
post-up ip link set $IFACE mtu {{ network.value['mtu'] }}
{% endif %}
{% if network.value['type'] == 'bond' %}
bond-mode {{ network.value['bond_mode'] }}
bond-slaves {{ network.value['bond_devices'] | join(' ') }}
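As a sketch of the output, the enp1s0f0 example NIC from the default group_vars earlier in this diff (type nic, MTU 9000, one pre and one post custom option) should render through this template to roughly the following stanza, assuming the default manual mode:

    auto enp1s0f0
    iface enp1s0f0 inet manual
    pre-up ethtool -K $IFACE rx-gro-hw off
    post-up sysctl -w net.ipv6.conf.$IFACE.accept_ra=0
    post-up ip link set $IFACE mtu 9000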

View File

@ -15,7 +15,7 @@ NAME="$( grep '^NAME=' /etc/os-release | awk -F'"' '{ print $2 }' )"
VERSION_ID="$( cat /etc/debian_version )"
VERSION_CODENAME="$( grep '^VERSION_CODENAME=' /etc/os-release | awk -F'=' '{ print $2 }' )"
DEBVER="${NAME} ${VERSION_ID} \"$(tc ${VERSION_CODENAME} )\""
PVCVER="$( /usr/share/pvc/pvcnoded.py --version )"
PVCVER="$( /usr/share/pvc/pvcnoded.py --version || echo 'Unknown' )"
echo >> $TMPFILE
echo -e "\033[01;34mParallel Virtual Cluster (PVC) Node\033[0m" >> $TMPFILE
@ -23,7 +23,7 @@ echo -e "> \033[1;34mNode name:\033[0m \033[01;36m$(hostname)\033[0m" >> $TMPFIL
echo -e "> \033[1;34mCluster name:\033[0m \033[01;36m{{ cluster_group }}\033[0m" >> $TMPFILE
echo -e "> \033[1;34mSystem type:\033[0m PVC \033[1;36m{% if is_coordinator %}coordinator{% else %}hypervisor{% endif %}\033[0m node" >> $TMPFILE
echo -e "> \033[1;34mPVC version:\033[0m ${PVCVER}" >> $TMPFILE
echo -e "> \033[1;34mBase system:\033[0m {{ ansible_lsb.description }}" >> $TMPFILE
echo -e "> \033[1;34mBase system:\033[0m {{ ansible_lsb.description if ansible_lsb else 'Debian GNU/Linux' }}" >> $TMPFILE
echo -e "> \033[1;34mKernel:\033[0m $(/bin/uname -vm)" >> $TMPFILE
# Get machine information

View File

@ -91,6 +91,36 @@ pvc_autobackup:
# Example: Unmount the {backup_root_path}
# - "/usr/bin/umount {backup_root_path}"
# PVC VM automirrors
# This is uncommented but disabled, so it is not installed by default; enable it in your per-cluster configs
# Automirror allows the sending of VM snapshots automatically to an external cluster.
# These values are default; ensure you modify them in your own group_vars to match your system!
pvc_automirror:
enabled: no
destinations:
cluster2:
address: pvc.cluster2.mydomain.tld
port: 7370
prefix: "/api/v1"
key: 00000000-0000-0000-0000-000000000000
ssl: yes
verify_ssl: yes
pool: vms
default_destination: cluster2
tags:
- automirror
schedule:
time: "*-*-* 00/4:00:00"
retention: 7
reporting:
enabled: no
emails:
- myuser@domain.tld
- otheruser@domain.tld
report_on:
success: no
error: yes
# Coordinators & Nodes list
pvc_nodes:
- hostname: "pvc1" # The full ansible inventory hostname of the node

View File

@ -0,0 +1,24 @@
---
- name: disable timer units
systemd:
name: "{{ item }}"
state: stopped
enabled: false
loop:
- pvc-automirror.timer
ignore_errors: yes
- name: remove automirror configurations
file:
dest: "{{ item }}"
state: absent
loop:
- /etc/systemd/system/pvc-automirror.timer
- /etc/systemd/system/pvc-automirror.service
register: systemd
ignore_errors: yes
- name: reload systemd to apply changes
command: systemctl daemon-reload
when: systemd.changed

View File

@ -0,0 +1,23 @@
---
- name: install automirror systemd units
template:
src: "automirror/pvc-automirror.{{ item }}.j2"
dest: "/etc/systemd/system/pvc-automirror.{{ item }}"
loop:
- timer
- service
register: systemd
- name: reload systemd to apply changes
command: systemctl daemon-reload
when: systemd.changed
- name: enable timer units
systemd:
name: "{{ item }}"
state: started
enabled: true
loop:
- pvc-automirror.timer

View File

@ -0,0 +1,7 @@
---
- include: enable.yml
when: pvc_automirror.enabled
- include: disable.yml
when: not pvc_automirror.enabled

View File

@ -18,6 +18,7 @@
groups: ceph
append: yes
with_items: "{{ admin_users }}"
ignore_errors: yes
- name: install sysctl tweaks
template:
@ -171,10 +172,10 @@
# Single-node cluster ruleset
- name: remove default CRUSH replicated_rule ruleset
command: ceph osd crush rule rm replicated_rule
when: "{{ pvc_nodes | length }} == 1"
when: pvc_nodes | length == 1
- name: add single-node CRUSH replicated_rule ruleset
command: ceph osd crush rule create-replicated replicated_rule default osd
when: "{{ pvc_nodes | length }} == 1"
when: pvc_nodes | length == 1
- meta: flush_handlers

View File

@ -28,6 +28,7 @@
name: libvirt-qemu
groups: ceph
append: yes
ignore_errors: yes
- name: add admin users to libvirt groups
user:
@ -35,6 +36,7 @@
groups: kvm,libvirt
append: yes
with_items: "{{ admin_users }}"
ignore_errors: yes
- name: install libvirt configurations
template:

View File

@ -65,6 +65,11 @@
tags: pvc-autobackup
when: pvc_autobackup is defined
# Install PVC automirror
- include: automirror/main.yml
tags: pvc-automirror
when: pvc_automirror is defined
# Install CPU tuning
- include: cputuning/main.yml
tags: pvc-cputuning

View File

@ -1,24 +0,0 @@
---
# PVC Autobackup configuration
# {{ ansible_managed }}
autobackup:
backup_root_path: {{ pvc_autobackup.backup_root_path }}
backup_root_suffix: {{ pvc_autobackup.backup_root_suffix }}
backup_tags:
{% for tag in pvc_autobackup.backup_tags %}
- {{ tag }}
{% endfor %}
backup_schedule:
full_interval: {{ pvc_autobackup.schedule.full_interval }}
full_retention: {{ pvc_autobackup.schedule.full_retention }}
auto_mount:
enabled: {{ pvc_autobackup.auto_mount.enabled }}
mount_cmds:
{% for cmd in pvc_autobackup.auto_mount.mount_cmds %}
- "{{ cmd }}"
{% endfor %}
unmount_cmds:
{% for cmd in pvc_autobackup.auto_mount.unmount_cmds %}
- "{{ cmd }}"
{% endfor %}

View File

@ -0,0 +1,9 @@
[Unit]
Description=[Cron] PVC VM automirror
[Service]
Type=oneshot
IgnoreSIGPIPE=false
KillMode=process
ExecCondition=/usr/bin/pvc --quiet node is-primary
ExecStart=/usr/bin/pvc --quiet vm automirror --cron {% if pvc_automirror.reporting.enabled and (pvc_automirror.reporting.report_on.error or pvc_automirror.reporting.report_on.success) %}--email-report {{ pvc_automirror.reporting.emails|join(',') }}{% endif %} {% if pvc_automirror.reporting.enabled and (pvc_automirror.reporting.report_on.error and not pvc_automirror.reporting.report_on.success) %}--email-errors-only{% endif %}
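As a worked example of the reporting logic described in the group_vars: with reporting enabled and the example success: no / error: yes settings, the ExecStart line above should render roughly as

    ExecStart=/usr/bin/pvc --quiet vm automirror --cron --email-report myuser@domain.tld,otheruser@domain.tld --email-errors-only

With reporting disabled, both conditional blocks are skipped and only the base --cron invocation remains.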

View File

@ -0,0 +1,9 @@
[Unit]
Description=[Timer] PVC VM automirror
[Timer]
Unit=pvc-automirror.service
OnCalendar={{ pvc_automirror.schedule.time }}
[Install]
WantedBy=pvc.target

View File

@ -174,7 +174,7 @@ autobackup:
full_interval: {{ pvc_autobackup.schedule.full_interval }}
full_retention: {{ pvc_autobackup.schedule.full_retention }}
auto_mount:
enabled: {{ pvc_autobackup.auto_mount.enabled }}
enabled: {{ 'yes' if pvc_autobackup.auto_mount.enabled else 'no' }}
mount_cmds:
{% for cmd in pvc_autobackup.auto_mount.mount_cmds %}
- "{{ cmd }}"
@ -184,6 +184,26 @@ autobackup:
- "{{ cmd }}"
{% endfor %}
{% endif %}
automirror:
{% if pvc_automirror is defined and pvc_automirror.enabled is defined and pvc_automirror.enabled %}
destinations:
{% for destination in pvc_automirror.destinations %}
{{ destination }}:
address: {{ pvc_automirror.destinations[destination].address }}
port: {{ pvc_automirror.destinations[destination].port }}
prefix: {{ pvc_automirror.destinations[destination].prefix }}
key: {{ pvc_automirror.destinations[destination].key }}
ssl: {{ 'yes' if pvc_automirror.destinations[destination].ssl else 'no' }}
verify_ssl: {{ 'yes' if pvc_automirror.destinations[destination].verify_ssl else 'no' }}
pool: {{ pvc_automirror.destinations[destination].pool }}
{% endfor %}
default_destination: {{ pvc_automirror.default_destination }}
mirror_tags:
{% for tag in pvc_automirror.tags %}
- {{ tag }}
{% endfor %}
keep_snapshots: {{ pvc_automirror.schedule.retention }}
{% endif %}
# VIM modeline, requires "set modeline" in your VIMRC
# vim: expandtab shiftwidth=2 tabstop=2 filetype=yaml
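For reference, with the example pvc_automirror values shown earlier in this diff (and enabled flipped to yes, since the block only renders when enabled), the generated automirror section of pvc.conf should look roughly like:

    automirror:
      destinations:
        cluster2:
          address: pvc.cluster2.mydomain.tld
          port: 7370
          prefix: /api/v1
          key: 00000000-0000-0000-0000-000000000000
          ssl: yes
          verify_ssl: yes
          pool: vms
      default_destination: cluster2
      mirror_tags:
        - automirror
      keep_snapshots: 7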

View File

@ -5,10 +5,10 @@
dataDir=/var/lib/zookeeper
# Set our tick time to 1 second
tickTime=1000
# Initialization can take up to 30 ticks
initLimit=30
# Syncing can take up to 15 ticks
syncLimit=15
# Initialization can take up to 5 ticks
initLimit=5
# Syncing can take up to 5 ticks
syncLimit=5
# Lower snapshot count from 100k to 10k
snapCount=10000
# Halve the snapshot size to 2GB
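For clarity on the arithmetic behind commit b4c2b9bdf8: with tickTime=1000 (1000 ms per tick), the new limits work out to

    initLimit = 5 ticks x 1000 ms = 5 s   (down from 30 ticks = 30 s)
    syncLimit = 5 ticks x 1000 ms = 5 s   (down from 15 ticks = 15 s)

matching the 5-second target described in the commit message.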