commit 4310da805f7f4b9ec9daf691c60cf8489d92a5a5 Author: Joshua M. Boniface Date: Wed Dec 29 22:31:01 2021 -0500 Initial commit of PVC Bootstrap system Adds the PVC Bootstrap system, which allows the automated deployment of one or more PVC clusters. diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..9c923c3 --- /dev/null +++ b/.gitignore @@ -0,0 +1,3 @@ +*.pyc +*.tmp +*.swp diff --git a/README.md b/README.md new file mode 100644 index 0000000..b7fcb5f --- /dev/null +++ b/README.md @@ -0,0 +1,67 @@ +# PVC Bootstrap System + +The PVC bootstrap system provides a convenient way to deploy PVC clusters. Rather than manual node installation, this system provides a fully-automated deployment from node powering to cluster readiness, based on pre-configured values. It is useful if an administrator will deploy several PVC clusters or for repeated re-deployment for testing purposes. + +## Setup + +Setting up the PVC bootstrap system is fairly complicated and is mostly manual. This is due both to some requirements that cannot be satisfied by Debian packaging, and also to provide maximum flexibility to the administrator. However, some helper scripts are provided to automate some aspects, and the entire setup process is documented here. + +### Preparing to use the PVC Bootstrap system + +1. Prepare a Git repository to store cluster configurations. This can be done automatically with the `create-local-repo.sh` script in the [PVC Ansible](https://github.com/parallelvirtualcluster/pvc-ansible) repository. + +2. Create `group_vars` for each cluster you plan to bootstrap. Additionally, ensure you configure the `bootstrap.yml` file for each cluster with the relevant details of the hardware you will be using. This step can be repeated for each cluster in the future as new clusters are required, and the system will automatically pull changes to the local PVC repository once configured. + +### Preparing a PVC Bootstrap host + +1. 
The recommended OS for a PVC Bootstrap host is Debian GNU/Linux or a similar derivative. In terms of hardware, a small single-board computer like a Raspberry Pi or small desktop will work, as the host does not require significant CPU, memory, or disk resources. + +2. Install the required dependencies for the following steps: python3, python3-pip, and Ansible. + +3. Set up the network as detailed in the "Networking for Bootstrap" section. + +4. Create a working directory for `pvcbootstrapd`, usually `/srv/tftp` or something similar. + +5. Clone this repository under the working directory. + +6. Run the `./install-pvcbootstrapd.sh` script from the root of the repository to install the required systemd units and template configuration files. It will prompt for several configuration parameters. + +### Running the PVC Bootstrap daemon + +1. Edit the `/etc/pvc/pvcbootstrapd.yaml` configuration file to suit your needs. + +2. Start the `pvcbootstrapd.service` and `pvcbootstrapd-worker.service` units. + +3. Observe the logs for each service. + +### Networking for Bootstrap + +When using the pvcbootstrapd system, a dedicated network is required to provide bootstrap DHCP and TFTP to the cluster. This network can either have a dedicated, upstream router that does not provide DHCP, or the network can be routed with network address translation (NAT) through the bootstrap host. + +In bootstrap mode (as opposed to manual install mode), new nodes are configured with their interfaces as follows: + + * BMC: bootstrap + * Interface 1 (first among all LOM ports): bootstrap + * Interface 2+ (all other ports): LACP (802.3ad) bond0 + +The Bootstrap interfaces do DHCP from the bootstrap host, and are thus responsible for autoconfiguration. The remaining interfaces, in an LACP bond, are used to underlay the various standard PVC networks. 
+ +Care must therefore be taken to ensure that the BMC and *first* lan-on-motherboard interface are connected as vLAN access ports in the bootstrap network, and that the remaining ports have some connectivity along the various configured PVC networks, before proceeding. + +Consider the following diagram for reference: + +![Per-Node Physical Connections](/docs/images/pvcbootstrapd-phy.png) + +![Overall Network Topology](/docs/images/pvcbootstrapd-net.png) + +### Deploying a Cluster + +1. Ensure the cluster configuration is committed to the repository, including the BMC MAC addresses, default IPMI credentials, and all other cluster configurations. + +2. Connect the network ports as outlined above. + +3. Connect power to the servers, but do not power on. + +4. Wait for the cluster bootstrapping to complete. + +5. Power off the servers and put them into production. diff --git a/bootstrap-daemon/clusters.yaml.sample b/bootstrap-daemon/clusters.yaml.sample new file mode 100644 index 0000000..73fd85a --- /dev/null +++ b/bootstrap-daemon/clusters.yaml.sample @@ -0,0 +1,7 @@ +--- +# clusters.yml +# This file defines a list of Clusters that pvcbootstrapd should be aware of. 
+ +clusters: + - cluster1 + - cluster2 diff --git a/bootstrap-daemon/pvcbootstrapd-worker.service b/bootstrap-daemon/pvcbootstrapd-worker.service new file mode 100644 index 0000000..fde632b --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd-worker.service @@ -0,0 +1,16 @@ +# Parallel Virtual Cluster Provisioner API provisioner worker unit file + +[Unit] +Description = Parallel Virtual Cluster Bootstrap API worker +After = network-online.target + +[Service] +Type = simple +WorkingDirectory = /usr/share/pvc +Environment = PYTHONUNBUFFERED=true +Environment = PVC_CONFIG_FILE=/etc/pvc/pvcbootstrapd.yaml +ExecStart = /usr/share/pvc/pvcbootstrapd-worker.sh +Restart = on-failure + +[Install] +WantedBy = multi-user.target diff --git a/bootstrap-daemon/pvcbootstrapd-worker.sh b/bootstrap-daemon/pvcbootstrapd-worker.sh new file mode 100755 index 0000000..3063f81 --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd-worker.sh @@ -0,0 +1,40 @@ +#!/usr/bin/env bash + +# pvcbootstrapd-worker.py - API Celery worker daemon startup stub +# Part of the Parallel Virtual Cluster (PVC) system +# +# Copyright (C) 2018-2021 Joshua M. Boniface +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, version 3. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . +# +############################################################################### + +CELERY_BIN="$( which celery )" + +# This absolute hackery is needed because Celery got the bright idea to change how their +# app arguments work in a non-backwards-compatible way with Celery 5. 
+case "$( cat /etc/debian_version )" in + 10.*) + CELERY_ARGS="worker --app pvcbootstrapd.flaskapi.celery --concurrency 99 --pool gevent --loglevel DEBUG" + ;; + 11.*) + CELERY_ARGS="--app pvcbootstrapd.flaskapi.celery worker --concurrency 99 --pool gevent --loglevel DEBUG" + ;; + *) + echo "Invalid Debian version found!" + exit 1 + ;; +esac + +${CELERY_BIN} ${CELERY_ARGS} +exit $? diff --git a/bootstrap-daemon/pvcbootstrapd.py b/bootstrap-daemon/pvcbootstrapd.py new file mode 100755 index 0000000..5d12e8f --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd.py @@ -0,0 +1,24 @@ +#!/usr/bin/env python3 + +# pvcbootstrapd.py - Bootstrap API daemon startup stub +# Part of the Parallel Virtual Cluster (PVC) system +# +# Copyright (C) 2018-2021 Joshua M. Boniface +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, version 3. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . 
+# +############################################################################### + +import pvcbootstrapd.Daemon # noqa: F401 + +pvcbootstrapd.Daemon.entrypoint() diff --git a/bootstrap-daemon/pvcbootstrapd.service b/bootstrap-daemon/pvcbootstrapd.service new file mode 100644 index 0000000..1fe3616 --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd.service @@ -0,0 +1,16 @@ +# Parallel Virtual Cluster Bootstrap API daemon unit file + +[Unit] +Description = Parallel Virtual Cluster Bootstrap API daemon +After = network-online.target + +[Service] +Type = simple +WorkingDirectory = /usr/share/pvc +Environment = PYTHONUNBUFFERED=true +Environment = PVC_CONFIG_FILE=/etc/pvc/pvcbootstrapd.yaml +ExecStart = /usr/share/pvc/pvcbootstrapd.py +Restart = on-failure + +[Install] +WantedBy = multi-user.target diff --git a/bootstrap-daemon/pvcbootstrapd.yaml.sample b/bootstrap-daemon/pvcbootstrapd.yaml.sample new file mode 100644 index 0000000..f890c6b --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd.yaml.sample @@ -0,0 +1,91 @@ +--- +pvc: + # Enable debug mode + debug: true + + # Deploy username + deploy_username: deploy + + # Database (SQLite) configuration + database: + # Path to the database file + path: /srv/tftp/pvcbootstrapd.sql + + # Flask API configuration + api: + # Listen address + address: 10.199.199.254 + + # Listen port + port: 9999 + + # Redis Celery queue configuration + queue: + # Connect address + address: 127.0.0.1 + + # Connect port + port: 6379 + + # Redis path (almost always 0) + path: "/0" + + # DNSMasq DHCP configuration + dhcp: + # Listen address + address: 10.199.199.254 + + # Default gateway address + gateway: 10.199.199.1 + + # Local domain + domain: pvcbootstrap.local + + # DHCP lease range start + lease_start: 10.199.199.10 + + # DHCP lease range end + lease_end: 10.199.199.99 + + # DHCP lease time + lease_time: 1h + + # DNSMasq TFTP configuration + tftp: + # Root TFTP path (contents of the "buildpxe.sh" output directory; generally read-only) + 
root_path: "/srv/tftp/pvc-installer" + + # Per-host TFTP path (almost always "/host" under "root_path"; must be writable) + host_path: "/srv/tftp/pvc-installer/host" + + # PVC Ansible repository configuration + # Note: If "path" does not exist, "remote" will be cloned to it via Git using SSH private key "keyfile". + # Note: The VCS will be refreshed regularly via the API in response to webhooks. + ansible: + # Path to the VCS repository + path: "/var/home/joshua/pvc" + + # Clusters configuration file + clusters_file: "clusters.yml" + + # Path to the deploy key (if applicable) used to clone and pull the repository + keyfile: "/var/home/joshua/id_ed25519.joshua.key" + + # Git remote URI for the repository + remote: "ssh://git@git.bonifacelabs.ca:2222/bonifacelabs/pvc.git" + + # Git branch to use + branch: "master" + + # Filenames of the various group_vars components of a cluster + # Generally with pvc-ansible this will contain 2 files: "base.yml", and "pvc.yml"; refer to the + # pvc-ansible documentation and examples for details on these files. + # The third file, "bootstrap.yml", is used by pvcbootstrapd to map BMC MAC addresses to hosts and + # to simplify hardware detection. It must be present or the cluster will not be bootstrapped. + # Adjust these entries to match the actual filenames of your clusters; the pvc-ansible defaults + # are provided here. All clusters using this pvcbootstrapd instance must share identical filenames + # here. 
+ cspec_files: + base: "base.yml" + pvc: "pvc.yml" + bootstrap: "bootstrap.yml" diff --git a/bootstrap-daemon/pvcbootstrapd.yaml.template b/bootstrap-daemon/pvcbootstrapd.yaml.template new file mode 100644 index 0000000..7a165f8 --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd.yaml.template @@ -0,0 +1,33 @@ +--- +pvc: + debug: true + deploy_username: DEPLOY_USERNAME + database: + path: ROOT_DIRECTORY/pvcbootstrapd.sql + api: + address: BOOTSTRAP_ADDRESS + port: 9999 + queue: + address: 127.0.0.1 + port: 6379 + path: "/0" + dhcp: + address: BOOTSTRAP_ADDRESS + gateway: BOOTSTRAP_ADDRESS + domain: pvcbootstrap.local + lease_start: BOOTSTRAP_DHCPSTART + lease_end: BOOTSTRAP_DHCPEND + lease_time: 1h + tftp: + root_path: "ROOT_DIRECTORY/tftp" + host_path: "ROOT_DIRECTORY/tftp/host" + ansible: + path: "ROOT_DIRECTORY/repo" + clusters_file: "clusters.yml" + keyfile: "ROOT_DIRECTORY/id_ed25519" + remote: "GIT_REMOTE" + branch: "GIT_BRANCH" + cspec_files: + base: "base.yml" + pvc: "pvc.yml" + bootstrap: "bootstrap.yml" diff --git a/bootstrap-daemon/pvcbootstrapd/Daemon.py b/bootstrap-daemon/pvcbootstrapd/Daemon.py new file mode 100755 index 0000000..431f4f8 --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd/Daemon.py @@ -0,0 +1,242 @@ +#!/usr/bin/env python3 + +# Daemon.py - PVC HTTP API daemon +# Part of the Parallel Virtual Cluster (PVC) system +# +# Copyright (C) 2018-2021 Joshua M. Boniface +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, version 3. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . 
+# +############################################################################### + +import os +import yaml +import signal + +import pvcbootstrapd.lib.dnsmasq as dnsmasqd +import pvcbootstrapd.lib.lib as lib +import pvcbootstrapd.lib.db as db +import pvcbootstrapd.lib.git as git +import pvcbootstrapd.lib.tftp as tftp +import pvcbootstrapd.lib.ansible as ansible + +from distutils.util import strtobool as dustrtobool + +# Daemon version +version = "0.1" + +# API version +API_VERSION = 1.0 + + +########################################################## +# Exceptions +########################################################## + + +class MalformedConfigurationError(Exception): + """ + An exception when parsing the PVC daemon configuration file + """ + + def __init__(self, error=None): + self.msg = f"ERROR: Configuration file is malformed: {error}" + + def __str__(self): + return str(self.msg) + + +########################################################## +# Helper Functions +########################################################## + + +def strtobool(stringv): + if stringv is None: + return False + if isinstance(stringv, bool): + return bool(stringv) + try: + return bool(dustrtobool(stringv)) + except Exception: + return False + + +########################################################## +# Configuration Parsing +########################################################## + +def get_config_path(): + try: + return os.environ["PVC_CONFIG_FILE"] + except KeyError: + print('ERROR: The "PVC_CONFIG_FILE" environment variable must be set.') + os._exit(1) + + +def read_config(): + pvcbootstrapd_config_file = get_config_path() + + print(f"Loading configuration from file '{pvcbootstrapd_config_file}'") + + # Load the YAML config file + with open(pvcbootstrapd_config_file, "r") as cfgfile: + try: + o_config = yaml.load(cfgfile, Loader=yaml.SafeLoader) + except Exception as e: + print(f"ERROR: Failed to parse configuration file: {e}") + os._exit(1) + + # Create the 
configuration dictionary + config = dict() + + # Get the base configuration + try: + o_base = o_config["pvc"] + except KeyError as k: + raise MalformedConfigurationError(f"Missing top-level category {k}") + + for key in ['debug', 'deploy_username']: + try: + config[key] = o_base[key] + except KeyError as k: + raise MalformedConfigurationError(f"Missing first-level key {k}") + + # Get the first-level categories + try: + o_database = o_base["database"] + o_api = o_base["api"] + o_queue = o_base["queue"] + o_dhcp = o_base["dhcp"] + o_tftp = o_base["tftp"] + o_ansible = o_base["ansible"] + except KeyError as k: + raise MalformedConfigurationError(f"Missing first-level category {k}") + + # Get the Database configuration + for key in ['path']: + try: + config[f"database_{key}"] = o_database[key] + except Exception: + raise MalformedConfigurationError(f"Missing second-level key '{key}' under 'database'") + + # Get the API configuration + for key in ['address', 'port']: + try: + config[f"api_{key}"] = o_api[key] + except Exception: + raise MalformedConfigurationError(f"Missing second-level key '{key}' under 'api'") + + # Get the queue configuration + for key in ['address', 'port', 'path']: + try: + config[f"queue_{key}"] = o_queue[key] + except Exception: + raise MalformedConfigurationError(f"Missing second-level key '{key}' under 'queue'") + + # Get the DHCP configuration + for key in ['address', 'gateway', 'domain', 'lease_start', 'lease_end', 'lease_time']: + try: + config[f"dhcp_{key}"] = o_dhcp[key] + except Exception: + raise MalformedConfigurationError(f"Missing second-level key '{key}' under 'dhcp'") + + # Get the TFTP configuration + for key in ['root_path', 'host_path']: + try: + config[f"tftp_{key}"] = o_tftp[key] + except Exception: + raise MalformedConfigurationError(f"Missing second-level key '{key}' under 'tftp'") + + # Get the Ansible configuration + for key in ['path', 'clusters_file', 'keyfile', 'remote', 'branch']: + try: + config[f"ansible_{key}"] = 
o_ansible[key] + except Exception: + raise MalformedConfigurationError(f"Missing second-level key '{key}' under 'ansible'") + + # Get the second-level categories under Ansible + try: + o_ansible_cspec_files = o_ansible['cspec_files'] + except KeyError as k: + raise MalformedConfigurationError(f"Missing second-level category {k} under 'ansible'") + + # Get the Ansible CSpec Files configuration + for key in ['base', 'pvc', 'bootstrap']: + try: + config[f"ansible_cspec_files_{key}"] = o_ansible_cspec_files[key] + except Exception: + raise MalformedConfigurationError(f"Missing third-level key '{key}' under 'ansible/cspec_files'") + + return config + + +config = read_config() + + +########################################################## +# Entrypoint +########################################################## + + +def entrypoint(): + import pvcbootstrapd.flaskapi as pvcbootstrapd # noqa: E402 + + # Print our startup messages + print("") + print("|----------------------------------------------------------|") + print("| |") + print("| ███████████ ▜█▙ ▟█▛ █████ █ █ █ |") + print("| ██ ▜█▙ ▟█▛ ██ |") + print("| ███████████ ▜█▙ ▟█▛ ██ |") + print("| ██ ▜█▙▟█▛ ███████████ |") + print("| |") + print("|----------------------------------------------------------|") + print("| Parallel Virtual Cluster Bootstrap API daemon v{0: <9} |".format(version)) + print("| Debug: {0: <49} |".format(str(config["debug"]))) + print("| API version: v{0: <42} |".format(API_VERSION)) + print( + "| Listen: {0: <48} |".format( + "{}:{}".format(config["api_address"], config["api_port"]) + ) + ) + print("|----------------------------------------------------------|") + print("") + + # Initialize the database + db.init_database(config) + + # Initialize the Ansible repository + git.init_repository(config) + + # Initialize the tftp root + tftp.init_tftp(config) + + # Start DNSMasq + dnsmasq = dnsmasqd.DNSMasq(config) + dnsmasq.start() + + def cleanup(retcode): + dnsmasq.stop() + exit(retcode) + + def 
term(signum="", frame=""): + print("Received TERM, exiting.") + cleanup(0) + + signal.signal(signal.SIGTERM, term) + signal.signal(signal.SIGINT, term) + signal.signal(signal.SIGQUIT, term) + + # Start Flask + pvcbootstrapd.app.run(config['api_address'], config['api_port'], use_reloader=False, threaded=False, processes=4) diff --git a/bootstrap-daemon/pvcbootstrapd/__init__.py b/bootstrap-daemon/pvcbootstrapd/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/bootstrap-daemon/pvcbootstrapd/dnsmasq-lease.py b/bootstrap-daemon/pvcbootstrapd/dnsmasq-lease.py new file mode 100755 index 0000000..e7bbe73 --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd/dnsmasq-lease.py @@ -0,0 +1,122 @@ +#!/usr/bin/env python3 + +# dnsmasq-lease.py - DNSMasq lease interface for pvcnodedprov +# Part of the Parallel Virtual Cluster (PVC) system +# +# Copyright (C) 2018-2021 Joshua M. Boniface +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, version 3. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . 
+# +############################################################################### + +from os import environ +from sys import argv +from requests import post +from json import dumps + +# Request log +# dnsmasq-dhcp[877466]: 2067194916 available DHCP range: 10.199.199.10 -- 10.199.199.19 +# dnsmasq-dhcp[877466]: 2067194916 DHCPDISCOVER(ens8) 52:54:00:34:36:40 +# dnsmasq-dhcp[877466]: 2067194916 tags: ens8 +# dnsmasq-dhcp[877466]: 2067194916 DHCPOFFER(ens8) 10.199.199.14 52:54:00:34:36:40 +# dnsmasq-dhcp[877466]: 2067194916 requested options: 1:netmask, 28:broadcast, 2:time-offset, 3:router, +# dnsmasq-dhcp[877466]: 2067194916 requested options: 15:domain-name, 6:dns-server, 12:hostname +# dnsmasq-dhcp[877466]: 2067194916 next server: 10.199.199.1 +# dnsmasq-dhcp[877466]: 2067194916 sent size: 1 option: 53 message-type 2 +# dnsmasq-dhcp[877466]: 2067194916 sent size: 4 option: 54 server-identifier 10.199.199.1 +# dnsmasq-dhcp[877466]: 2067194916 sent size: 4 option: 51 lease-time 1h +# dnsmasq-dhcp[877466]: 2067194916 sent size: 4 option: 58 T1 30m +# dnsmasq-dhcp[877466]: 2067194916 sent size: 4 option: 59 T2 52m30s +# dnsmasq-dhcp[877466]: 2067194916 sent size: 4 option: 1 netmask 255.255.255.0 +# dnsmasq-dhcp[877466]: 2067194916 sent size: 4 option: 28 broadcast 10.199.199.255 +# dnsmasq-dhcp[877466]: 2067194916 sent size: 4 option: 3 router 10.199.199.1 +# dnsmasq-dhcp[877466]: 2067194916 sent size: 4 option: 6 dns-server 10.199.199.1 +# dnsmasq-dhcp[877466]: 2067194916 sent size: 8 option: 15 domain-name test.com +# dnsmasq-dhcp[877466]: 2067194916 available DHCP range: 10.199.199.10 -- 10.199.199.19 +# dnsmasq-dhcp[877466]: 2067194916 DHCPREQUEST(ens8) 10.199.199.14 52:54:00:34:36:40 +# dnsmasq-dhcp[877466]: 2067194916 tags: ens8 +# dnsmasq-dhcp[877466]: 2067194916 DHCPACK(ens8) 10.199.199.14 52:54:00:34:36:40 +# dnsmasq-dhcp[877466]: 2067194916 requested options: 1:netmask, 28:broadcast, 2:time-offset, 3:router, +# dnsmasq-dhcp[877466]: 2067194916 requested 
options: 15:domain-name, 6:dns-server, 12:hostname +# dnsmasq-dhcp[877466]: 2067194916 next server: 10.199.199.1 +# dnsmasq-dhcp[877466]: 2067194916 sent size: 1 option: 53 message-type 5 +# dnsmasq-dhcp[877466]: 2067194916 sent size: 4 option: 54 server-identifier 10.199.199.1 +# dnsmasq-dhcp[877466]: 2067194916 sent size: 4 option: 51 lease-time 1h +# dnsmasq-dhcp[877466]: 2067194916 sent size: 4 option: 58 T1 30m +# dnsmasq-dhcp[877466]: 2067194916 sent size: 4 option: 59 T2 52m30s +# dnsmasq-dhcp[877466]: 2067194916 sent size: 4 option: 1 netmask 255.255.255.0 +# dnsmasq-dhcp[877466]: 2067194916 sent size: 4 option: 28 broadcast 10.199.199.255 +# dnsmasq-dhcp[877466]: 2067194916 sent size: 4 option: 3 router 10.199.199.1 +# dnsmasq-dhcp[877466]: 2067194916 sent size: 4 option: 6 dns-server 10.199.199.1 +# dnsmasq-dhcp[877466]: 2067194916 sent size: 8 option: 15 domain-name test.com +# dnsmasq-script[877466]: ['/var/home/joshua/dnsmasq-lease.py', 'add', '52:54:00:34:36:40', '10.199.199.14'] +# dnsmasq-script[877466]: environ({'DNSMASQ_INTERFACE': 'ens8', 'DNSMASQ_LEASE_EXPIRES': '1638422308', 'DNSMASQ_REQUESTED_OPTIONS': '1,28,2,3,15,6,12', 'DNSMASQ_TAGS': 'ens8', 'DNSMASQ_TIME_REMAINING': '3600', 'DNSMASQ_LOG_DHCP': '1', 'LC_CTYPE': 'C.UTF-8'}) + +# Renew log +# dnsmasq-dhcp[877466]: 1471211555 available DHCP range: 10.199.199.10 -- 10.199.199.19 +# dnsmasq-dhcp[877466]: 1471211555 DHCPREQUEST(ens8) 10.199.199.14 52:54:00:34:36:40 +# dnsmasq-dhcp[877466]: 1471211555 tags: ens8 +# dnsmasq-dhcp[877466]: 1471211555 DHCPACK(ens8) 10.199.199.14 52:54:00:34:36:40 +# dnsmasq-dhcp[877466]: 1471211555 requested options: 1:netmask, 28:broadcast, 2:time-offset, 3:router, +# dnsmasq-dhcp[877466]: 1471211555 requested options: 15:domain-name, 6:dns-server, 12:hostname +# dnsmasq-dhcp[877466]: 1471211555 next server: 10.199.199.1 +# dnsmasq-dhcp[877466]: 1471211555 sent size: 1 option: 53 message-type 5 +# dnsmasq-dhcp[877466]: 1471211555 sent size: 4 option: 54 
server-identifier 10.199.199.1 +# dnsmasq-dhcp[877466]: 1471211555 sent size: 4 option: 51 lease-time 1h +# dnsmasq-dhcp[877466]: 1471211555 sent size: 4 option: 58 T1 30m +# dnsmasq-dhcp[877466]: 1471211555 sent size: 4 option: 59 T2 52m30s +# dnsmasq-dhcp[877466]: 1471211555 sent size: 4 option: 1 netmask 255.255.255.0 +# dnsmasq-dhcp[877466]: 1471211555 sent size: 4 option: 28 broadcast 10.199.199.255 +# dnsmasq-dhcp[877466]: 1471211555 sent size: 4 option: 3 router 10.199.199.1 +# dnsmasq-dhcp[877466]: 1471211555 sent size: 4 option: 6 dns-server 10.199.199.1 +# dnsmasq-dhcp[877466]: 1471211555 sent size: 8 option: 15 domain-name test.com +# dnsmasq-script[877466]: ['/var/home/joshua/dnsmasq-lease.py', 'old', '52:54:00:34:36:40', '10.199.199.14'] +# dnsmasq-script[877466]: environ({'DNSMASQ_INTERFACE': 'ens8', 'DNSMASQ_LEASE_EXPIRES': '1638422371', 'DNSMASQ_REQUESTED_OPTIONS': '1,28,2,3,15,6,12', 'DNSMASQ_TAGS': 'ens8', 'DNSMASQ_TIME_REMAINING': '3600', 'DNSMASQ_LOG_DHCP': '1', 'LC_CTYPE': 'C.UTF-8'}) + +action = argv[1] + +api_uri = environ.get('API_URI', 'http://127.0.0.1:9999/checkin/dnsmasq') +api_headers = { + 'Content-Type': 'application/json' +} + +print(environ) + +if action in ['add']: + macaddr = argv[2] + ipaddr = argv[3] + api_data = dumps({ + 'action': action, + 'macaddr': macaddr, + 'ipaddr': ipaddr, + 'hostname': environ.get('DNSMASQ_SUPPLIED_HOSTNAME'), + 'client_id': environ.get('DNSMASQ_CLIENT_ID'), + 'expiry': environ.get('DNSMASQ_LEASE_EXPIRES'), + 'vendor_class': environ.get('DNSMASQ_VENDOR_CLASS'), + 'user_class': environ.get('DNSMASQ_USER_CLASS0') + }) + post(api_uri, headers=api_headers, data=api_data, verify=False) + +elif action in ['tftp']: + size = argv[2] + destaddr = argv[3] + filepath = argv[4] + api_data = dumps({ + 'action': action, + 'size': size, + 'destaddr': destaddr, + 'filepath': filepath + }) + post(api_uri, headers=api_headers, data=api_data, verify=False) + +exit(0) diff --git a/bootstrap-daemon/pvcbootstrapd/flaskapi.py 
b/bootstrap-daemon/pvcbootstrapd/flaskapi.py new file mode 100755 index 0000000..2bdea5a --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd/flaskapi.py @@ -0,0 +1,235 @@ +#!/usr/bin/env python3 + +# pvcbootstrapd.py - PVC Cluster Auto-bootstrap +# Part of the Parallel Virtual Cluster (PVC) system +# +# Copyright (C) 2018-2021 Joshua M. Boniface +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, version 3. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . +# +############################################################################### + +import flask +import json + +from pvcbootstrapd.Daemon import config, API_VERSION + +import pvcbootstrapd.lib.dnsmasq as dnsmasq +import pvcbootstrapd.lib.lib as lib +import pvcbootstrapd.lib.db as db +import pvcbootstrapd.lib.git as git +import pvcbootstrapd.lib.ansible as ansible + +from time import sleep +from threading import Thread, Event +from dataclasses import dataclass +from flask_restful import Resource, Api, abort +from celery import Celery +from celery.utils.log import get_task_logger + + +logger = get_task_logger(__name__) + + +# Create Flask app and set config values +app = flask.Flask(__name__) +blueprint = flask.Blueprint('api', __name__, url_prefix='') +api = Api(blueprint) +app.register_blueprint(blueprint) + +app.config["CELERY_BROKER_URL"] = f"redis://{config['queue_address']}:{config['queue_port']}{config['queue_path']}" + +celery = Celery(app.name, broker=app.config["CELERY_BROKER_URL"]) +celery.conf.update(app.config) + + +# +# Celery functions +# +@celery.task(bind=True) 
+def dnsmasq_checkin(self, data): + lib.dnsmasq_checkin(config, data) + + +@celery.task(bind=True) +def host_checkin(self, data): + lib.host_checkin(config, data) + + +# +# API routes +# +class API_Root(Resource): + def get(self): + """ + Return basic details of the API + --- + tags: + - root + responses: + 200: + description: OK + schema: + type: object + id: Message + properties: + message: + type: string + description: A text message describing the result + example: "The foo was successfully maxed" + """ + return { "message": "pvcbootstrapd API" }, 200 +api.add_resource(API_Root, '/') + + +class API_Checkin(Resource): + def get(self): + """ + Return checkin details of the API + --- + tags: + - checkin + responses: + 200: + description: OK + schema: + type: object + id: Message + """ + return { "message": "pvcbootstrapd API Checkin interface" }, 200 +api.add_resource(API_Checkin, '/checkin') + + +class API_Checkin_DNSMasq(Resource): + def post(self): + """ + Register a checkin from the DNSMasq subsystem + --- + tags: + - checkin + consumes: + - application/json + parameters: + - in: body + name: dnsmasq_checkin_event + description: An event checkin from an external bootstrap tool component. + schema: + type: object + required: + - action + properties: + action: + type: string + description: The action of the event. + example: "add" + macaddr: + type: string + description: (add, old) The MAC address from a DHCP request. + example: "ff:ff:ff:ab:cd:ef" + ipaddr: + type: string + description: (add, old) The IP address from a DHCP request. + example: "10.199.199.10" + hostname: + type: string + description: (add, old) The client hostname from a DHCP request. + example: "pvc-installer-live" + client_id: + type: string + description: (add, old) The client ID from a DHCP request. + example: "01:ff:ff:ff:ab:cd:ef" + vendor_class: + type: string + description: (add, old) The DHCP vendor-class option from a DHCP request. 
+ example: "CPQRIB3 (HP Proliant DL360 G6 iLO)" + user_class: + type: string + description: (add, old) The DHCP user-class option from a DHCP request. + example: None + responses: + 200: + description: OK + schema: + type: object + id: Message + """ + try: + data = json.loads(flask.request.data) + except Exception as e: + logger.warning(e) + data = { 'action': None } + logger.info(f"Handling DNSMasq checkin for: {data}") + + task = dnsmasq_checkin.delay(data) + return { "message": "received checkin from DNSMasq" }, 200 +api.add_resource(API_Checkin_DNSMasq, '/checkin/dnsmasq') + + +class API_Checkin_Host(Resource): + def post(self): + """ + Register a checkin from the Host subsystem + --- + tags: + - checkin + consumes: + - application/json + parameters: + - in: body + name: host_checkin_event + description: An event checkin from an external bootstrap tool component. + schema: + type: object + required: + - action + properties: + action: + type: string + description: The action of the event. + example: "begin" + hostname: + type: string + description: The system hostname. + example: "hv1.mydomain.tld" + host_macaddr: + type: string + description: The MAC address of the system provisioning interface. + example: "ff:ff:ff:ab:cd:ef" + host_ipaddr: + type: string + description: The IP address of the system provisioning interface. + example: "10.199.199.11" + bmc_macaddr: + type: string + description: The MAC address of the system BMC interface. + example: "ff:ff:ff:01:23:45" + bmc_ipaddr: + type: string + description: The IP address of the system BMC interface. 
+ example: "10.199.199.10" + responses: + 200: + description: OK + schema: + type: object + id: Message + """ + try: + data = json.loads(flask.request.data) + except Exception as e: + data = { 'action': None } + logger.info(f"Handling Host checkin for: {data}") + + task = host_checkin.delay(data) + return { "message": "received checkin from Host" }, 200 +api.add_resource(API_Checkin_Host, '/checkin/host') diff --git a/bootstrap-daemon/pvcbootstrapd/lib/__init__.py b/bootstrap-daemon/pvcbootstrapd/lib/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/bootstrap-daemon/pvcbootstrapd/lib/ansible.py b/bootstrap-daemon/pvcbootstrapd/lib/ansible.py new file mode 100755 index 0000000..a63c98c --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd/lib/ansible.py @@ -0,0 +1,63 @@ +#!/usr/bin/env python3 + +# ansible.py - PVC Cluster Auto-bootstrap Ansible libraries +# Part of the Parallel Virtual Cluster (PVC) system +# +# Copyright (C) 2018-2021 Joshua M. Boniface +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, version 3. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . 
+# +############################################################################### + +import pvcbootstrapd.lib.git as git + +import ansible_runner +import tempfile +import yaml + +from time import sleep +from celery.utils.log import get_task_logger + + +logger = get_task_logger(__name__) + + +def run_bootstrap(config, cspec, cluster, nodes): + """ + Run an Ansible bootstrap against a cluster + """ + logger.debug(nodes) + + # Construct our temporary INI inventory string + logger.info(f"Constructing virtual Ansible inventory") + base_yaml = git.load_base_yaml(config, cluster.name) + local_domain = base_yaml.get('local_domain') + inventory = [f"""[{cluster.name}]"""] + for node in nodes: + inventory.append(f"""{node.name}.{local_domain} ansible_host={node.host_ipaddr}""") + inventory = '\n'.join(inventory) + logger.debug(inventory) + + # Wait 30 seconds to ensure everything is booted and stabilized + logger.info("Waiting 30s before starting Ansible bootstrap.") + sleep(30) + + # Run the Ansible playbooks + with tempfile.TemporaryDirectory(prefix="pvc-ansible-bootstrap_") as pdir: + r = ansible_runner.run(private_data_dir=f"{pdir}", inventory=inventory, limit=f"{cluster.name}", playbook=f"{config['ansible_path']}/pvc.yml", extravars={"bootstrap": "yes"}) + logger.info("Final status:") + logger.info("{}: {}".format(r.status, r.rc)) + logger.info(r.stats) + if r.rc == 0: + git.commit_repository(config) + git.push_repository(config) diff --git a/bootstrap-daemon/pvcbootstrapd/lib/dataclasses.py b/bootstrap-daemon/pvcbootstrapd/lib/dataclasses.py new file mode 100755 index 0000000..612f9c5 --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd/lib/dataclasses.py @@ -0,0 +1,49 @@ +#!/usr/bin/env python3 + +# dataclasses.py - PVC Cluster Auto-bootstrap dataclasses +# Part of the Parallel Virtual Cluster (PVC) system +# +# Copyright (C) 2018-2021 Joshua M.
Boniface +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, version 3. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . +# +############################################################################### + +from dataclasses import dataclass + + +@dataclass +class Cluster: + """ + An instance of a Cluster + """ + id: int + name: str + state: str + + +@dataclass +class Node: + """ + An instance of a Node + """ + id: int + cluster: str + state: str + name: str + nid: int + bmc_macaddr: str + bmc_ipaddr: str + host_macaddr: str + host_ipaddr: str + diff --git a/bootstrap-daemon/pvcbootstrapd/lib/db.py b/bootstrap-daemon/pvcbootstrapd/lib/db.py new file mode 100755 index 0000000..89821c7 --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd/lib/db.py @@ -0,0 +1,219 @@ +#!/usr/bin/env python3 + +# db.py - PVC Cluster Auto-bootstrap database libraries +# Part of the Parallel Virtual Cluster (PVC) system +# +# Copyright (C) 2018-2021 Joshua M. Boniface +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, version 3. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see .
+# +############################################################################### + +import os +import sqlite3 +import contextlib +import json + +from time import sleep +from pvcbootstrapd.lib.dataclasses import Cluster, Node + + +# +# Database functions +# +@contextlib.contextmanager +def dbconn(db_path): + conn = sqlite3.connect(db_path) + conn.execute("PRAGMA foreign_keys = 1") + cur = conn.cursor() + yield cur + conn.commit() + conn.close() + + +def init_database(config): + db_path = config["database_path"] + if not os.path.isfile(db_path): + # Initializing the database + with dbconn(db_path) as cur: + # Table listing all clusters + cur.execute( + """CREATE TABLE clusters + (id INTEGER PRIMARY KEY AUTOINCREMENT, + name TEXT UNIQUE NOT NULL, + state TEXT NOT NULL)""" + ) + # Table listing all nodes + # FK: cluster -> clusters.id + cur.execute( + """CREATE TABLE nodes + (id INTEGER PRIMARY KEY AUTOINCREMENT, + cluster INTEGER NOT NULL, + state TEXT NOT NULL, + name TEXT UNIQUE NOT NULL, + nodeid INTEGER NOT NULL, + bmc_macaddr TEXT NOT NULL, + bmc_ipaddr TEXT NOT NULL, + host_macaddr TEXT NOT NULL, + host_ipaddr TEXT NOT NULL, + CONSTRAINT cluster_col FOREIGN KEY (cluster) REFERENCES clusters(id) ON DELETE CASCADE )""" + ) + + +# +# Cluster functions +# +def get_cluster(config, cid=None, name=None): + if cid is None and name is None: + return None + elif cid is not None: + findfield = 'id' + datafield = cid + elif name is not None: + findfield = 'name' + datafield = name + + with dbconn(config["database_path"]) as cur: + cur.execute( + f"""SELECT * FROM clusters WHERE {findfield} = ?""", + (datafield,) + ) + rows = cur.fetchall() + + if len(rows) > 0: + row = rows[0] + else: + return None + + return Cluster(row[0], row[1], row[2]) + + +def add_cluster(config, name, state): + with dbconn(config["database_path"]) as cur: + cur.execute( + """INSERT INTO clusters + (name, state) + VALUES + (?, ?)""", + (name, state) + ) + + return get_cluster(config, name=name) + + 
+def update_cluster_state(config, name, state): + with dbconn(config["database_path"]) as cur: + cur.execute( + """UPDATE clusters + SET state = ? + WHERE name = ?""", + (state, name) + ) + + return get_cluster(config, name=name) + + +# +# Node functions +# +def get_node(config, cluster_name, nid=None, name=None, bmc_macaddr=None): + cluster = get_cluster(config, name=cluster_name) + + if nid is None and name is None and bmc_macaddr is None: + return None + elif nid is not None: + findfield = 'id' + datafield = nid + elif bmc_macaddr is not None: + findfield = 'bmc_macaddr' + datafield = bmc_macaddr + elif name is not None: + findfield = 'name' + datafield = name + + with dbconn(config["database_path"]) as cur: + cur.execute( + f"""SELECT * FROM nodes WHERE {findfield} = ? AND cluster = ?""", + (datafield, cluster.id) + ) + rows = cur.fetchall() + + + if len(rows) > 0: + row = rows[0] + else: + return None + + return Node(row[0], cluster.name, row[2], row[3], row[4], row[5], row[6], row[7], row[8]) + + +def get_nodes_in_cluster(config, cluster_name): + cluster = get_cluster(config, name=cluster_name) + + with dbconn(config["database_path"]) as cur: + cur.execute( + """SELECT * FROM nodes WHERE cluster = ?""", + (cluster.id, ) + ) + rows = cur.fetchall() + + node_list = list() + for row in rows: + node_list.append( + Node(row[0], cluster.name, row[2], row[3], row[4], row[5], row[6], row[7], row[8]) + ) + + return node_list + + +def add_node(config, cluster_name, state, name, nodeid, bmc_macaddr, bmc_ipaddr, host_macaddr, host_ipaddr): + cluster = get_cluster(config, name=cluster_name) + + with dbconn(config["database_path"]) as cur: + cur.execute( + """INSERT INTO nodes + (cluster, state, name, nodeid, bmc_macaddr, bmc_ipaddr, host_macaddr, host_ipaddr) + VALUES + (?, ?, ?, ?, ?, ?, ?, ?)""", + (cluster.id, state, name, nodeid, bmc_macaddr, bmc_ipaddr, host_macaddr, host_ipaddr) + ) + + return get_node(config, cluster_name, name=name) + + +def 
update_node_state(config, cluster_name, name, state): + cluster = get_cluster(config, name=cluster_name) + + with dbconn(config["database_path"]) as cur: + cur.execute( + """UPDATE nodes + SET state = ? + WHERE name = ? AND cluster = ?""", + (state, name, cluster.id) + ) + + return get_node(config, cluster_name, name=name) + + +def update_node_addresses(config, cluster_name, name, bmc_macaddr, bmc_ipaddr, host_macaddr, host_ipaddr): + cluster = get_cluster(config, name=cluster_name) + + with dbconn(config["database_path"]) as cur: + cur.execute( + """UPDATE nodes + SET bmc_macaddr = ?, bmc_ipaddr = ?, host_macaddr = ?, host_ipaddr = ? + WHERE name = ? AND cluster = ?""", + (bmc_macaddr, bmc_ipaddr, host_macaddr, host_ipaddr, name, cluster.id) + ) + + return get_node(config, cluster_name, name=name) diff --git a/bootstrap-daemon/pvcbootstrapd/lib/dnsmasq.py b/bootstrap-daemon/pvcbootstrapd/lib/dnsmasq.py new file mode 100755 index 0000000..2766477 --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd/lib/dnsmasq.py @@ -0,0 +1,108 @@ +#!/usr/bin/env python3 + +# dnsmasq.py - PVC Cluster Auto-bootstrap DNSMasq instance +# Part of the Parallel Virtual Cluster (PVC) system +# +# Copyright (C) 2018-2021 Joshua M. Boniface +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, version 3. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . 
+# +############################################################################### + +import os +import flask +import click +import requests +import subprocess +import signal +import json +import pvcbootstrapd.lib.lib as lib +from time import sleep +from threading import Thread, Event +from dataclasses import dataclass +from flask_restful import Resource, Api, abort +from celery import Celery + + +class DNSMasq: + """ + Implements a daemonized instance of DNSMasq for providing DHCP and TFTP services + + The DNSMasq instance listens on the configured 'dhcp_address' and, instead of using a "real" + leases database, forwards lease events to the 'dnsmasq-lease.py' script. This script will + then hit the pvcbootstrapd '/checkin/dnsmasq' API endpoint to perform actions. + + TFTP is provided to automate the bootstrap of a node, providing the pvc-installer + over TFTP as well as a seed configuration which is created by the API. + """ + def __init__(self, config): + self.environment = { + "API_URI": f"http://{config['api_address']}:{config['api_port']}/checkin/dnsmasq" + } + self.dnsmasq_cmd = [ + "/usr/sbin/dnsmasq", + "--bogus-priv", + "--no-hosts", + "--dhcp-authoritative", + "--filterwin2k", + "--expand-hosts", + "--domain-needed", + f"--domain={config['dhcp_domain']}", + f"--local=/{config['dhcp_domain']}/", + "--log-facility=-", + "--log-dhcp", + "--keep-in-foreground", + f"--dhcp-script={os.getcwd()}/pvcbootstrapd/dnsmasq-lease.py", + "--bind-interfaces", + f"--listen-address={config['dhcp_address']}", + f"--dhcp-option=3,{config['dhcp_gateway']}", + f"--dhcp-range={config['dhcp_lease_start']},{config['dhcp_lease_end']},{config['dhcp_lease_time']}", + "--enable-tftp", + f"--tftp-root={config['tftp_root_path']}/", + # This block of dhcp-match, tag-if, and dhcp-boot statements sets the following TFTP setup: + # If the machine sends client-arch 0, and is not tagged iPXE, load undionly.kpxe (chainload) + # If the machine sends client-arch 7 or 9, and is not tagged iPXE, load ipxe.efi
(chainload) + # If the machine sends the iPXE option, load boot.ipxe (iPXE boot configuration) + "--dhcp-match=set:o_bios,option:client-arch,0", + "--dhcp-match=set:o_uefi,option:client-arch,7", + "--dhcp-match=set:o_uefi,option:client-arch,9", + "--dhcp-match=set:ipxe,175", + "--tag-if=set:bios,tag:!ipxe,tag:o_bios", + "--tag-if=set:uefi,tag:!ipxe,tag:o_uefi", + f"--dhcp-boot=tag:bios,undionly.kpxe", + f"--dhcp-boot=tag:uefi,ipxe.efi", + f"--dhcp-boot=tag:ipxe,boot.ipxe", + ] + if config["debug"]: + self.dnsmasq_cmd.append( + "--leasefile-ro" + ) + + print(self.dnsmasq_cmd) + self.stdout = subprocess.PIPE + + def execute(self): + self.proc = subprocess.Popen( + self.dnsmasq_cmd, + env=self.environment, + ) + + def start(self): + self.thread = Thread(target=self.execute, args=()) + self.thread.start() + + def stop(self): + self.proc.send_signal(signal.SIGTERM) + + def reload(self): + self.proc.send_signal(signal.SIGHUP) diff --git a/bootstrap-daemon/pvcbootstrapd/lib/git.py b/bootstrap-daemon/pvcbootstrapd/lib/git.py new file mode 100755 index 0000000..e1d57b1 --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd/lib/git.py @@ -0,0 +1,166 @@ +#!/usr/bin/env python3 + +# git.py - PVC Cluster Auto-bootstrap Git repository libraries +# Part of the Parallel Virtual Cluster (PVC) system +# +# Copyright (C) 2018-2021 Joshua M. Boniface +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, version 3. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . 
+# +############################################################################### + +import os.path +import git +import yaml + +from celery.utils.log import get_task_logger + + +logger = get_task_logger(__name__) + + +def init_repository(config): + """ + Clone the Ansible git repository + """ + if not os.path.exists(config['ansible_path']): + logger.info(f"Cloning configuration repository {config['ansible_remote']} branch {config['ansible_branch']} to {config['ansible_path']}") + git_ssh_cmd = f"ssh -i {config['ansible_keyfile']}" + with git.Git().custom_environment(GIT_SSH_COMMAND=git_ssh_cmd): + git.Repo.clone_from(config['ansible_remote'], config['ansible_path'], branch=config['ansible_branch']) + g = git.cmd.Git(f"{config['ansible_path']}") + else: + g = git.cmd.Git(f"{config['ansible_path']}") + g.checkout(config['ansible_branch']) + + for submodule in g.submodules: + submodule.update(init=True) + + +def pull_repository(config): + """ + Pull (with rebase) the Ansible git repository + """ + logger.info(f"Updating local configuration repository {config['ansible_path']}") + try: + git_ssh_cmd = f"ssh -i {config['ansible_keyfile']}" + with git.Git().custom_environment(GIT_SSH_COMMAND=git_ssh_cmd): + g = git.cmd.Git(f"{config['ansible_path']}") + g.pull(rebase=True) + except Exception as e: + logger.warn(e) + + +def commit_repository(config): + """ + Commit uncommitted changes to the Ansible git repository + """ + logger.info(f"Committing changes to local configuration repository {config['ansible_path']}") + + try: + g = git.cmd.Git(f"{config['ansible_path']}") + g.add('--all') + g.commit( + '-m', + 'Automated commit from PVC Bootstrap Ansible subsystem', + author="PVC Bootstrap " + ) + except Exception as e: + logger.warn(e) + + +def push_repository(config): + """ + Push changes to the default remote + """ + logger.info(f"Pushing changes from local configuration repository {config['ansible_path']}") + + try: + g = git.cmd.Git(f"{config['ansible_path']}") + 
+ origin = git.Repo(config['ansible_path']).remote(name='origin') + origin.push() + except Exception as e: + logger.warning(e) + + +def load_cspec_yaml(config): + """ + Load the bootstrap group_vars for all known clusters + """ + # Pull down the repository + pull_repository(config) + + # Load our clusters file and read the clusters from it + clusters_file = f"{config['ansible_path']}/{config['ansible_clusters_file']}" + logger.info(f"Loading cluster configuration from file '{clusters_file}'") + with open(clusters_file, 'r') as clustersfh: + clusters = yaml.load(clustersfh, Loader=yaml.SafeLoader).get('clusters', list()) + + # Define a base cspec + cspec = { + 'bootstrap': dict(), + 'hooks': dict(), + } + + # Read each cluster's cspec and update the base cspec + logger.info(f"Loading per-cluster specifications...") + for cluster in clusters: + cspec_file = f"{config['ansible_path']}/group_vars/{cluster}/{config['ansible_cspec_files_bootstrap']}" + if os.path.exists(cspec_file): + with open(cspec_file, 'r') as cspecfh: + try: + cspec_yaml = yaml.load(cspecfh, Loader=yaml.SafeLoader) + except Exception as e: + logger.warning(f"Failed to load {config['ansible_cspec_files_bootstrap']} for cluster {cluster}: {e}") + continue + + # Convert the MAC address keys to lowercase + # DNSMasq operates with lowercase keys, but often these are written with uppercase. + # Convert them to lowercase to prevent discrepancies later on.
+ cspec_yaml['bootstrap'] = {k.lower(): v for k, v in cspec_yaml['bootstrap'].items()} + + # Load in the base YAML for the cluster + base_yaml = load_base_yaml(config, cluster) + + # Set per-node values from elsewhere + for node in cspec_yaml['bootstrap']: + # Set the cluster value automatically + cspec_yaml['bootstrap'][node]['node']['cluster'] = cluster + + # Set the domain value automatically via base config + cspec_yaml['bootstrap'][node]['node']['domain'] = base_yaml['local_domain'] + + # Set the node FQDN value automatically + cspec_yaml['bootstrap'][node]['node']['fqdn'] = f"{cspec_yaml['bootstrap'][node]['node']['hostname']}.{cspec_yaml['bootstrap'][node]['node']['domain']}" + + # Append bootstrap entries to the main dictionary + cspec['bootstrap'] = {**cspec['bootstrap'], **cspec_yaml['bootstrap']} + + # Append hooks to the main dictionary (per-cluster) + if cspec_yaml.get('hooks'): + cspec['hooks'][cluster] = cspec_yaml['hooks'] + + logger.info(f"Finished loading per-cluster specifications") + logger.debug(f"cspec = {cspec}") + return cspec + + +def load_base_yaml(config, cluster): + """ + Load the base.yml group_vars for a cluster + """ + base_file = f"{config['ansible_path']}/group_vars/{cluster}/base.yml" + with open(base_file, 'r') as varsfile: + base_yaml = yaml.load(varsfile, Loader=yaml.SafeLoader) + + return base_yaml diff --git a/bootstrap-daemon/pvcbootstrapd/lib/hooks.py b/bootstrap-daemon/pvcbootstrapd/lib/hooks.py new file mode 100755 index 0000000..53a9cb1 --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd/lib/hooks.py @@ -0,0 +1,267 @@ +#!/usr/bin/env python3 + +# hooks.py - PVC Cluster Auto-bootstrap Hook libraries +# Part of the Parallel Virtual Cluster (PVC) system +# +# Copyright (C) 2018-2021 Joshua M. Boniface +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, version 3. 
+# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . +# +############################################################################### + +import pvcbootstrapd.lib.git as git +import pvcbootstrapd.lib.db as db + +import ansible_runner +import tempfile +import yaml +import paramiko +import contextlib + +from re import match +from time import sleep +from celery.utils.log import get_task_logger + + +logger = get_task_logger(__name__) + + +@contextlib.contextmanager +def run_paramiko(node_address, username): + ssh_client = paramiko.SSHClient() + ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy()) + ssh_client.connect(hostname=node_address, username=username) + yield ssh_client + ssh_client.close() + + +def run_hook_osddb(config, targets, args): + """ + Add an OSD DB defined by args['disk'] + """ + for node in targets: + node_name = node.name + node_address = node.host_ipaddr + + device = args['disk'] + + logger.info(f"Creating OSD DB on node {node_name} device {device}") + + # Using a direct command on the target here is somewhat messy, but avoids many + # complexities of determining a valid API listen address, etc.
+ pvc_cmd_string = f"pvc storage osd create-db-vg --yes {node_name} {device}" + + with run_paramiko(node_address, config['deploy_username']) as c: + stdin, stdout, stderr = c.exec_command(pvc_cmd_string) + logger.debug(stdout.readlines()) + logger.debug(stderr.readlines()) + + +def run_hook_osd(config, targets, args): + """ + Add an OSD defined by args['disk'] with weight args['weight'] + """ + for node in targets: + node_name = node.name + node_address = node.host_ipaddr + + device = args['disk'] + weight = args.get('weight', 1) + ext_db_flag = args.get('ext_db', False) + ext_db_ratio = args.get('ext_db_ratio', 0.05) + + logger.info(f"Creating OSD on node {node_name} device {device} weight {weight}") + + # Using a direct command on the target here is somewhat messy, but avoids many + # complexities of determining a valid API listen address, etc. + pvc_cmd_string = f"pvc storage osd add --yes {node_name} {device} --weight {weight}" + if ext_db_flag: + pvc_cmd_string = f"{pvc_cmd_string} --ext-db --ext-db-ratio {ext_db_ratio}" + + with run_paramiko(node_address, config['deploy_username']) as c: + stdin, stdout, stderr = c.exec_command(pvc_cmd_string) + logger.debug(stdout.readlines()) + logger.debug(stderr.readlines()) + + +def run_hook_pool(config, targets, args): + """ + Add a pool defined by args['name'] on device tier args['tier'] + """ + for node in targets: + node_name = node.name + node_address = node.host_ipaddr + + name = args['name'] + pgs = args.get('pgs', '64') + tier = args.get('tier', 'default') # Does nothing yet + + logger.info(f"Creating storage pool on node {node_name} name {name} pgs {pgs} tier {tier}") + + # Using a direct command on the target here is somewhat messy, but avoids many + # complexities of determining a valid API listen address, etc.
+ pvc_cmd_string = f"pvc storage pool add {name} {pgs}" + + with run_paramiko(node_address, config['deploy_username']) as c: + stdin, stdout, stderr = c.exec_command(pvc_cmd_string) + logger.debug(stdout.readlines()) + logger.debug(stderr.readlines()) + + # This only runs once on whatever the first node is + break + + +def run_hook_network(config, targets, args): + """ + Add a network defined by args (many) + """ + for node in targets: + node_name = node.name + node_address = node.host_ipaddr + + vni = args['vni'] + description = args['description'] + nettype = args['type'] + mtu = args.get('mtu', None) + + pvc_cmd_string = f"pvc network add {vni} --description {description} --type {nettype}" + + if mtu is not None and mtu not in ['auto', 'default']: + pvc_cmd_string = f"{pvc_cmd_string} --mtu {mtu}" + + if nettype == 'managed': + domain = args['domain'] + pvc_cmd_string = f"{pvc_cmd_string} --domain {domain}" + + dns_servers = args.get('dns_servers', []) + for dns_server in dns_servers: + pvc_cmd_string = f"{pvc_cmd_string} --dns-server {dns_server}" + + is_ip4 = args['ip4'] + if is_ip4: + ip4_network = args['ip4_network'] + pvc_cmd_string = f"{pvc_cmd_string} --ipnet {ip4_network}" + + ip4_gateway = args['ip4_gateway'] + pvc_cmd_string = f"{pvc_cmd_string} --gateway {ip4_gateway}" + + ip4_dhcp = args['ip4_dhcp'] + if ip4_dhcp: + pvc_cmd_string = f"{pvc_cmd_string} --dhcp" + ip4_dhcp_start = args['ip4_dhcp_start'] + ip4_dhcp_end = args['ip4_dhcp_end'] + pvc_cmd_string = f"{pvc_cmd_string} --dhcp-start {ip4_dhcp_start} --dhcp-end {ip4_dhcp_end}" + else: + pvc_cmd_string = f"{pvc_cmd_string} --no-dhcp" + + is_ip6 = args['ip6'] + if is_ip6: + ip6_network = args['ip6_network'] + pvc_cmd_string = f"{pvc_cmd_string} --ipnet6 {ip6_network}" + + ip6_gateway = args['ip6_gateway'] + pvc_cmd_string = f"{pvc_cmd_string} --gateway6 {ip6_gateway}" + + logger.info(f"Creating network on node {node_name} VNI {vni} type {nettype}") + + with run_paramiko(node_address,
config['deploy_username']) as c: + stdin, stdout, stderr = c.exec_command(pvc_cmd_string) + logger.debug(stdout.readlines()) + logger.debug(stderr.readlines()) + + # This only runs once on whatever the first node is + break + + +def run_hook_script(config, targets, args): + for node in targets: + node_name = node.name + node_address = node.host_ipaddr + + script = args.get('script', None) + source = args.get('source', None) + path = args.get('path', None) + + logger.info(f"Running script on node {node_name}") + + with run_paramiko(node_address, config['deploy_username']) as c: + if script is not None: + remote_path = '/tmp/pvcbootstrapd.hook' + with tempfile.NamedTemporaryFile(mode='w') as tf: + tf.write(script) + tf.seek(0) + + # Send the file to the remote system + tc = c.open_sftp() + tc.put(tf.name, remote_path) + tc.chmod(remote_path, 0o755) + tc.close() + elif source == 'local': + if path is None: + continue + + if not match(r'^/', path): + path = config['ansible_path'] + '/' + path + remote_path = '/tmp/pvcbootstrapd.hook' + + tc = c.open_sftp() + tc.put(path, remote_path) + tc.chmod(remote_path, 0o755) + tc.close() + elif source == 'remote': + remote_path = path + + stdin, stdout, stderr = c.exec_command(remote_path) + logger.debug(stdout.readlines()) + logger.debug(stderr.readlines()) + + +hook_functions = { + 'osddb': run_hook_osddb, + 'osd': run_hook_osd, + 'pool': run_hook_pool, + 'network': run_hook_network, + 'script': run_hook_script +} + + +def run_hooks(config, cspec, cluster, nodes): + """ + Run the configured hooks against a cluster + """ + logger.debug(nodes) + + cluster_hooks = cspec['hooks'][cluster.name] + + logger.debug(cspec) + + cluster_nodes = db.get_nodes_in_cluster(config, cluster.name) + + for hook in cluster_hooks: + hook_target = hook['target'] + hook_name = hook['name'] + logger.info(f"Running hook on {hook_target}: {hook_name}") + + if 'all' in hook_target: + target_nodes = cluster_nodes + else: + target_nodes = [node for node in
cluster_nodes if node.name in hook_target] + + hook_type = hook['type'] + hook_args = hook['args'] + + # Run the hook function + hook_functions[hook_type](config, target_nodes, hook_args) + + # Wait 5s between hooks + sleep(5) diff --git a/bootstrap-daemon/pvcbootstrapd/lib/host.py b/bootstrap-daemon/pvcbootstrapd/lib/host.py new file mode 100755 index 0000000..d0f6cdc --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd/lib/host.py @@ -0,0 +1,71 @@ +#!/usr/bin/env python3 + +# host.py - PVC Cluster Auto-bootstrap host libraries +# Part of the Parallel Virtual Cluster (PVC) system +# +# Copyright (C) 2018-2021 Joshua M. Boniface +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, version 3. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . 
+# +############################################################################### + +from celery.utils.log import get_task_logger + +import pvcbootstrapd.lib.db as db + + +logger = get_task_logger(__name__) + + +def installer_init(config, cspec, data): + bmc_macaddr = data['bmc_macaddr'] + bmc_ipaddr = data['bmc_ipaddr'] + host_macaddr = data['host_macaddr'] + host_ipaddr = data['host_ipaddr'] + cspec_cluster = cspec['bootstrap'][bmc_macaddr]['node']['cluster'] + cspec_hostname = cspec['bootstrap'][bmc_macaddr]['node']['hostname'] + cspec_nid = int(''.join(filter(str.isdigit, cspec_hostname))) + + cluster = db.get_cluster(config, name=cspec_cluster) + if cluster is None: + cluster = db.add_cluster(config, cspec_cluster, "provisioning") + logger.debug(cluster) + + node = db.get_node(config, cspec_cluster, name=cspec_hostname) + if node is None: + node = db.add_node(config, cspec_cluster, "installing", cspec_hostname, cspec_nid, bmc_macaddr, bmc_ipaddr, host_macaddr, host_ipaddr) + else: + node = db.update_node_addresses(config, cspec_cluster, cspec_hostname, bmc_macaddr, bmc_ipaddr, host_macaddr, host_ipaddr) + logger.debug(node) + + +def installer_complete(config, cspec, data): + bmc_macaddr = data['bmc_macaddr'] + cspec_hostname = cspec['bootstrap'][bmc_macaddr]['node']['hostname'] + cspec_cluster = cspec['bootstrap'][bmc_macaddr]['node']['cluster'] + + node = db.update_node_state(config, cspec_cluster, cspec_hostname, "installed") + logger.debug(node) + + +def set_boot_state(config, cspec, data, state): + bmc_macaddr = data['bmc_macaddr'] + bmc_ipaddr = data['bmc_ipaddr'] + host_macaddr = data['host_macaddr'] + host_ipaddr = data['host_ipaddr'] + cspec_cluster = cspec['bootstrap'][bmc_macaddr]['node']['cluster'] + cspec_hostname = cspec['bootstrap'][bmc_macaddr]['node']['hostname'] + + node = db.update_node_addresses(config, cspec_cluster, cspec_hostname, bmc_macaddr, bmc_ipaddr, host_macaddr, host_ipaddr) + node = db.update_node_state(config, cspec_cluster, 
cspec_hostname, state) + logger.debug(node) diff --git a/bootstrap-daemon/pvcbootstrapd/lib/installer.py b/bootstrap-daemon/pvcbootstrapd/lib/installer.py new file mode 100755 index 0000000..290232e --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd/lib/installer.py @@ -0,0 +1,79 @@ +#!/usr/bin/env python3 + +# installer.py - PVC Cluster Auto-bootstrap installer libraries +# Part of the Parallel Virtual Cluster (PVC) system +# +# Copyright (C) 2018-2021 Joshua M. Boniface +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, version 3. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see .
+# +############################################################################### + +from jinja2 import Template + + +# +# Worker Functions - PXE/Installer Per-host Templates +# +def add_pxe(config, cspec_node, host_macaddr): + # Generate a per-client iPXE configuration for this host + destination_filename = f"{config['tftp_host_path']}/mac-{host_macaddr.replace(':', '')}.ipxe" + template_filename = f"{config['tftp_root_path']}/host-ipxe.j2" + + with open(template_filename, 'r') as tfh: + template = Template(tfh.read()) + + imgargs_host_list = cspec_node.get('config', {}).get('kernel_options') + if imgargs_host_list is not None: + imgargs_host = ' '.join(imgargs_host_list) + else: + imgargs_host = None + + rendered = template.render( + imgargs_host=imgargs_host + ) + + with open(destination_filename, 'w') as dfh: + dfh.write(rendered) + dfh.write('\n') + + +def add_preseed(config, cspec_node, host_macaddr, system_drive_target): + # Generate a per-client Installer configuration for this host + destination_filename = f"{config['tftp_host_path']}/mac-{host_macaddr.replace(':', '')}.preseed" + template_filename = f"{config['tftp_root_path']}/host-preseed.j2" + + with open(template_filename, 'r') as tfh: + template = Template(tfh.read()) + + add_packages_list = cspec_node.get('config', {}).get('packages') + if add_packages_list is not None: + add_packages = ','.join(add_packages_list) + else: + add_packages = None + + # We use the dhcp_address here to allow the listen_address to be 0.0.0.0 + rendered = template.render( + debrelease=cspec_node.get('config', {}).get('release'), + debmirror=cspec_node.get('config', {}).get('mirror'), + addpkglist=add_packages, + filesystem=cspec_node.get('config', {}).get('filesystem'), + skip_blockcheck=False, + fqdn=cspec_node['node']['fqdn'], + target_disk=system_drive_target, + pvcbootstrapd_checkin_uri=f"http://{config['dhcp_address']}:{config['api_port']}/checkin/host" + ) + + with open(destination_filename, 'w') as dfh: + 
dfh.write(rendered) + dfh.write('\n') diff --git a/bootstrap-daemon/pvcbootstrapd/lib/lib.py b/bootstrap-daemon/pvcbootstrapd/lib/lib.py new file mode 100755 index 0000000..06d1233 --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd/lib/lib.py @@ -0,0 +1,148 @@ +#!/usr/bin/env python3 + +# lib.py - PVC Cluster Auto-bootstrap libraries +# Part of the Parallel Virtual Cluster (PVC) system +# +# Copyright (C) 2018-2021 Joshua M. Boniface +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, version 3. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . +# +############################################################################### + +import pvcbootstrapd.lib.db as db +import pvcbootstrapd.lib.git as git +import pvcbootstrapd.lib.redfish as redfish +import pvcbootstrapd.lib.host as host +import pvcbootstrapd.lib.ansible as ansible +import pvcbootstrapd.lib.hooks as hooks + +from pvcbootstrapd.lib.dataclasses import Cluster, Node + +from time import sleep +from threading import Thread, Event +from celery import Celery +from celery.utils.log import get_task_logger +from jinja2 import Template + + +logger = get_task_logger(__name__) + + +# +# Worker Functions - Checkins (Celery root tasks) +# +def dnsmasq_checkin(config, data): + """ + Handle checkins from DNSMasq + """ + logger.debug(f"data = {data}") + + # This is an add event; what we do depends on some stuff + if data['action'] in ['add']: + logger.info(f"Receiving 'add' checkin from DNSMasq for MAC address '{data['macaddr']}'") + cspec = git.load_cspec_yaml(config) + is_in_bootstrap_map 
= True if data['macaddr'] in cspec['bootstrap'] else False + if is_in_bootstrap_map: + if cspec['bootstrap'][data['macaddr']]['bmc'].get('redfish', None) is not None: + if cspec['bootstrap'][data['macaddr']]['bmc']['redfish']: + is_redfish = True + else: + is_redfish = False + else: + is_redfish = redfish.check_redfish(config, data) + + logger.info(f"Is device '{data['macaddr']}' Redfish capable? {is_redfish}") + if is_redfish: + redfish.redfish_init(config, cspec, data) + else: + logger.warn(f"Device '{data['macaddr']}' not in bootstrap map; ignoring.") + + return + + # This is a tftp event; a node installer has booted + if data['action'] in ['tftp']: + logger.info(f"Receiving 'tftp' checkin from DNSMasq for IP address '{data['destaddr']}'") + return + +def host_checkin(config, data): + """ + Handle checkins from the PVC node + """ + logger.info(f"Registering checkin for host {data['hostname']}") + logger.debug(f"data = {data}") + cspec = git.load_cspec_yaml(config) + bmc_macaddr = data['bmc_macaddr'] + cspec_cluster = cspec['bootstrap'][bmc_macaddr]['node']['cluster'] + + if data['action'] in ['install-start']: + # Node install has started + logger.info(f"Registering install start for host {data['hostname']}") + host.installer_init(config, cspec, data) + + elif data['action'] in ['install-complete']: + # Node install has finished + logger.info(f"Registering install complete for host {data['hostname']}") + host.installer_complete(config, cspec, data) + + elif data['action'] in ['system-boot_initial']: + # Node has booted for the first time and can begin Ansible runs once all nodes up + logger.info(f"Registering first boot for host {data['hostname']}") + target_state = "booted-initial" + + host.set_boot_state(config, cspec, data, target_state) + sleep(1) + + all_nodes = db.get_nodes_in_cluster(config, cspec_cluster) + ready_nodes = [node for node in all_nodes if node.state == target_state] + + # Continue once all nodes are in the booted-initial state + 
logger.info(f"Ready: {len(ready_nodes)} All: {len(all_nodes)}") + if len(ready_nodes) >= len(all_nodes): + cluster = db.update_cluster_state(config, cspec_cluster, "ansible-running") + + ansible.run_bootstrap(config, cspec, cluster, ready_nodes) + + elif data['action'] in ['system-boot_configured']: + # Node has been booted after Ansible run and can begin hook runs + logger.info(f"Registering post-Ansible boot for host {data['hostname']}") + target_state = "booted-configured" + + host.set_boot_state(config, cspec, data, target_state) + sleep(1) + + all_nodes = db.get_nodes_in_cluster(config, cspec_cluster) + ready_nodes = [node for node in all_nodes if node.state == target_state] + + # Continue once all nodes are in the booted-configured state + logger.info(f"Ready: {len(ready_nodes)} All: {len(all_nodes)}") + if len(ready_nodes) >= len(all_nodes): + cluster = db.update_cluster_state(config, cspec_cluster, "hooks-running") + + hooks.run_hooks(config, cspec, cluster, ready_nodes) + + elif data['action'] in ['system-boot_completed']: + # Node has been fully configured and can be shut down for the final time + logger.info(f"Registering post-hooks boot for host {data['hostname']}") + target_state = "booted-completed" + + host.set_boot_state(config, cspec, data, target_state) + sleep(1) + + all_nodes = db.get_nodes_in_cluster(config, cspec_cluster) + ready_nodes = [node for node in all_nodes if node.state == target_state] + + logger.info(f"Ready: {len(ready_nodes)} All: {len(all_nodes)}") + if len(ready_nodes) >= len(all_nodes): + cluster = db.update_cluster_state(config, cspec_cluster, "completed") + + # Hosts will now power down ready for real activation in production diff --git a/bootstrap-daemon/pvcbootstrapd/lib/redfish.py b/bootstrap-daemon/pvcbootstrapd/lib/redfish.py new file mode 100755 index 0000000..d1079ee --- /dev/null +++ b/bootstrap-daemon/pvcbootstrapd/lib/redfish.py @@ -0,0 +1,785 @@ +#!/usr/bin/env python3 + +# redfish.py - PVC Cluster Auto-bootstrap 
Redfish libraries +# Part of the Parallel Virtual Cluster (PVC) system +# +# Copyright (C) 2018-2021 Joshua M. Boniface +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, version 3. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . +# +############################################################################### + +# Refs: +# https://downloads.dell.com/manuals/all-products/esuprt_software/esuprt_it_ops_datcentr_mgmt/dell-management-solution-resources_white-papers11_en-us.pdf +# https://downloads.dell.com/solutions/dell-management-solution-resources/RESTfulSerConfig-using-iDRAC-REST%20API%28DTC%20copy%29.pdf + +import requests +import urllib3 +import json +import re +import math +from sys import stderr, argv +from time import sleep +from celery.utils.log import get_task_logger + +import pvcbootstrapd.lib.installer as installer +import pvcbootstrapd.lib.db as db + + +logger = get_task_logger(__name__) + + +# +# Helper Classes +# +class AuthenticationException(Exception): + def __init__(self, error=None, response=None): + if error is not None: + self.short_message = error + else: + self.short_message = "Generic authentication failure" + + if response is not None: + self.status_code = response.status_code + + rinfo = response.json()['error']['@Message.ExtendedInfo'][0] + if rinfo.get('Message') is not None: + self.full_message = rinfo['Message'] + self.res_message = rinfo['Resolution'] + self.severity = rinfo['Severity'] + self.message_id = rinfo['MessageId'] + else: + self.full_message = '' + self.res_message = '' + self.severity
= 'Fatal' + self.message_id = rinfo['MessageId'] + else: + self.status_code = None + + def __str__(self): + if self.status_code is not None: + message = f"{self.short_message}: {self.full_message} {self.res_message} (HTTP Code: {self.status_code}, Severity: {self.severity}, ID: {self.message_id})" + else: + message = f"{self.short_message}" + return str(message) + + +class RedfishSession: + def __init__(self, host, username, password): + # Disable urllib3 warnings + urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning) + + # Perform login + login_payload = { "UserName": username, "Password": password } + login_uri = f"{host}/redfish/v1/Sessions" + login_headers = {'content-type': 'application/json'} + + self.host = None + login_response = None + + tries = 1 + max_tries = 25 + while tries < max_tries: + logger.info(f"Trying to log in to Redfish ({tries}/{max_tries - 1})...") + try: + login_response = requests.post( + login_uri, + data=json.dumps(login_payload), + headers=login_headers, + verify=False, + timeout=5 + ) + break + except Exception: + sleep(2) + tries += 1 + + if login_response is None: + logger.error("Failed to log in to Redfish") + return + + if login_response.status_code not in [200, 201]: + raise AuthenticationException( + "Login failed", + response=login_response + ) + logger.info(f"Logged in to Redfish at {host} successfully") + + self.host = host + self.token = login_response.headers.get('X-Auth-Token') + self.headers = { 'content-type': 'application/json', 'x-auth-token': self.token } + + logout_uri = login_response.headers.get('Location') + if re.match(r"^/", logout_uri): + self.logout_uri = f"{host}{logout_uri}" + else: + self.logout_uri = logout_uri + + def __del__(self): + if self.host is None: + return + + logout_headers = { "Content-Type": "application/json", "X-Auth-Token": self.token } + + logout_response = requests.delete( + self.logout_uri, + headers=logout_headers, + verify=False, + timeout=15 + ) + + if
logout_response.status_code not in [200, 201]: + raise AuthenticationException( + "Logout failed", + response=logout_response + ) + logger.info(f"Logged out of Redfish at {self.host} successfully") + + def get(self, uri): + url = f"{self.host}{uri}" + + response = requests.get(url, headers=self.headers, verify=False) + + if response.status_code in [200, 201]: + return response.json() + else: + rinfo = response.json()['error']['@Message.ExtendedInfo'][0] + if rinfo.get('Message') is not None: + message = f"{rinfo['Message']} {rinfo['Resolution']}" + severity = rinfo['Severity'] + message_id = rinfo['MessageId'] + else: + message = rinfo + severity = 'Error' + message_id = 'N/A' + logger.warn(f"! Error: GET request to {url} failed") + logger.warn(f"! HTTP Code: {response.status_code} Severity: {severity} ID: {message_id}") + logger.warn(f"! Details: {message}") + return None + + def delete(self, uri): + url = f"{self.host}{uri}" + + response = requests.delete(url, headers=self.headers, verify=False) + + if response.status_code in [200, 201]: + return response.json() + else: + rinfo = response.json()['error']['@Message.ExtendedInfo'][0] + if rinfo.get('Message') is not None: + message = f"{rinfo['Message']} {rinfo['Resolution']}" + severity = rinfo['Severity'] + message_id = rinfo['MessageId'] + else: + message = rinfo + severity = 'Error' + message_id = 'N/A' + + logger.warn(f"! Error: DELETE request to {url} failed") + logger.warn(f"! HTTP Code: {response.status_code} Severity: {severity} ID: {message_id}") + logger.warn(f"!
Details: {message}") + return None + + def post(self, uri, data): + url = f"{self.host}{uri}" + payload = json.dumps(data) + + response = requests.post(url, data=payload, headers=self.headers, verify=False) + + if response.status_code in [200, 201]: + return response.json() + else: + rinfo = response.json()['error']['@Message.ExtendedInfo'][0] + if rinfo.get('Message') is not None: + message = f"{rinfo['Message']} {rinfo['Resolution']}" + severity = rinfo['Severity'] + message_id = rinfo['MessageId'] + else: + message = rinfo + severity = 'Error' + message_id = 'N/A' + + logger.warn(f"! Error: POST request to {url} failed") + logger.warn(f"! HTTP Code: {response.status_code} Severity: {severity} ID: {message_id}") + logger.warn(f"! Details: {message}") + return None + + def put(self, uri, data): + url = f"{self.host}{uri}" + payload = json.dumps(data) + + response = requests.put(url, data=payload, headers=self.headers, verify=False) + + if response.status_code in [200, 201]: + return response.json() + else: + rinfo = response.json()['error']['@Message.ExtendedInfo'][0] + if rinfo.get('Message') is not None: + message = f"{rinfo['Message']} {rinfo['Resolution']}" + severity = rinfo['Severity'] + message_id = rinfo['MessageId'] + else: + message = rinfo + severity = 'Error' + message_id = 'N/A' + + logger.warn(f"! Error: PUT request to {url} failed") + logger.warn(f"! HTTP Code: {response.status_code} Severity: {severity} ID: {message_id}") + logger.warn(f"! 
Details: {message}") + return None + + def patch(self, uri, data): + url = f"{self.host}{uri}" + payload = json.dumps(data) + + response = requests.patch(url, data=payload, headers=self.headers, verify=False) + + if response.status_code in [200, 201]: + return response.json() + else: + rinfo = response.json()['error']['@Message.ExtendedInfo'][0] + if rinfo.get('Message') is not None: + message = f"{rinfo['Message']} {rinfo['Resolution']}" + severity = rinfo['Severity'] + message_id = rinfo['MessageId'] + else: + message = rinfo + severity = 'Error' + message_id = 'N/A' + + logger.warn(f"! Error: PATCH request to {url} failed") + logger.warn(f"! HTTP Code: {response.status_code} Severity: {severity} ID: {message_id}") + logger.warn(f"! Details: {message}") + return None + + +# +# Helper functions +# +def format_bytes_tohuman(databytes): + """ + Format a string of bytes into a human-readable value (using base-1000) + """ + # Matrix of human-to-byte values + byte_unit_matrix = { + "B": 1, + "KB": 1000, + "MB": 1000 * 1000, + "GB": 1000 * 1000 * 1000, + "TB": 1000 * 1000 * 1000 * 1000, + "PB": 1000 * 1000 * 1000 * 1000 * 1000, + "EB": 1000 * 1000 * 1000 * 1000 * 1000 * 1000, + } + + datahuman = "" + for unit in sorted(byte_unit_matrix, key=byte_unit_matrix.get, reverse=True): + if unit in ['TB', 'PB', 'EB']: + # Handle the situation where we might want to round to integer values + # for some entries (2TB) but not others (e.g. 1.92TB). We round if the + # result is within +/- 2% of the integer value, otherwise we use two + # decimal places. 
+ new_bytes = databytes / byte_unit_matrix[unit] + new_bytes_plustwopct = new_bytes * 1.02 + new_bytes_minustwopct = new_bytes * 0.98 + ceiled_bytes = int(math.ceil(databytes / byte_unit_matrix[unit])) + rounded_bytes = round(databytes / byte_unit_matrix[unit], 2) + if ceiled_bytes > new_bytes_minustwopct and ceiled_bytes < new_bytes_plustwopct: + new_bytes = ceiled_bytes + else: + new_bytes = rounded_bytes + + # Round up if 5 or more digits + if new_bytes > 999: + # We can jump down another level + continue + else: + # We're at the end, display with this size + datahuman = "{}{}".format(new_bytes, unit) + + return datahuman + + +def get_system_drive_target(session, cspec_node, storage_root): + """ + Determine the system drive target for the installer + """ + # Handle an invalid >2 number of system disks, use only first 2 + if len(cspec_node['config']['system_disks']) > 2: + cspec_drives = cspec_node['config']['system_disks'][0:2] + else: + cspec_drives = cspec_node['config']['system_disks'] + + # If we have no storage root, we just return the first entry from + # the cspec_drives as-is and hope the administrator has the right + # format here.
+ if storage_root is None: + return cspec_drives[0] + # We proceed with Redfish configuration to determine the disks + else: + storage_detail = session.get(storage_root) + + # Grab a full list of drives + drive_list = list() + for storage_member in storage_detail['Members']: + storage_member_root = storage_member['@odata.id'] + storage_member_detail = session.get(storage_member_root) + for drive in storage_member_detail['Drives']: + drive_root = drive['@odata.id'] + drive_detail = session.get(drive_root) + drive_list.append(drive_detail) + + system_drives = list() + + # Iterate through each drive and include those that match + for cspec_drive in cspec_drives: + if re.match(r"^\/dev", cspec_drive) or re.match(r"^detect:", cspec_drive): + # We only match the first drive that has these conditions for use in the preseed config + logger.info("Found a drive with a 'detect:' string or Linux '/dev' path, using it for bootstrap.") + return cspec_drive + + # Match any chassis-ID spec drives + for drive in drive_list: + # Like "Disk.Bay.2:Enclosure.Internal.0-1:RAID.Integrated.1-1" + drive_name = drive['Id'].split(':')[0] + # Craft up the cspec version of this + cspec_drive_name = f"Drive.Bay.{cspec_drive}" + if drive_name == cspec_drive_name: + system_drives.append(drive) + + # We found a single drive, so determine its actual detect string + if len(system_drives) == 1: + logger.info("Found a single drive matching the requested chassis ID, using it as the system disk.") + + # Get the model's first word + drive_model = system_drives[0].get('Model', 'INVALID').split()[0] + # Get and convert the size in bytes value to human + drive_size_bytes = system_drives[0].get('CapacityBytes', 0) + drive_size_human = format_bytes_tohuman(drive_size_bytes) + # Get the drive ID out of all the valid entries + # How this works is that, for each non-array disk, we must find what position our exact disk is + # So for example, say we want disk 3 out of 4, and all 4 are the same size and model and
not in + # another (RAID) volume. This will give us an index of 2. Then in the installer this will match + # the 3rd list entry from "lsscsi". This is probably an unnecessary hack, since people will + # probably just give the first disk if they want one, or 2 disks if they want a RAID-1, but this + # is here just in case + idx = 0 + for drive in drive_list: + list_drive_model = drive.get('Model', 'INVALID').split()[0] + list_drive_size_bytes = drive.get('CapacityBytes', 0) + list_drive_in_array = False if drive.get('Links', {}).get('Volumes', [''])[0].get('@odata.id').split('/')[-1] == drive.get('Id') else True + if drive_model == list_drive_model and drive_size_bytes == list_drive_size_bytes and not list_drive_in_array: + index = idx + idx += 1 + drive_id = index + + # Create the target string + system_drive_target = f"detect:{drive_model}:{drive_size_human}:{drive_id}" + + # We found two drives, so create a RAID-1 array then determine the volume's detect string + elif len(system_drives) == 2: + logger.info("Found two drives matching the requested chassis IDs, creating a RAID-1 and using it as the system disk.") + + drive_one = system_drives[0] + drive_one_id = drive_one.get('Id', 'INVALID') + drive_one_path = drive_one.get('@odata.id', 'INVALID') + drive_one_controller = drive_one_id.split(':')[-1] + drive_two = system_drives[1] + drive_two_id = drive_two.get('Id', 'INVALID') + drive_two_path = drive_two.get('@odata.id', 'INVALID') + drive_two_controller = drive_two_id.split(':')[-1] + + # Determine that the drives are on the same controller + if drive_one_controller != drive_two_controller: + logger.error("Two drives are not on the same controller; this should not happen") + return None + + # Get the controller details + controller_root = f"{storage_root}/{drive_one_controller}" + controller_detail = session.get(controller_root) + + # Get the name of the controller (for crafting the detect string) + controller_name = controller_detail.get('Name',
'INVALID').split()[0] + + # Get the volume root for the controller + controller_volume_root = controller_detail.get('Volumes', {}).get('@odata.id') + + # Get the pre-creation list of volumes on the controller + controller_volumes_pre = [volume['@odata.id'] for volume in session.get(controller_volume_root).get('Members', [])] + + # Create the RAID-1 volume + payload = { + "VolumeType": "Mirrored", + "Drives": [ + { + "@odata.id": drive_one_path + }, + { + "@odata.id": drive_two_path + } + ] + } + session.post(controller_volume_root, payload) + + # Wait for the volume to be created + new_volume_list = [] + while len(new_volume_list) < 1: + sleep(5) + controller_volumes_post = [volume['@odata.id'] for volume in session.get(controller_volume_root).get('Members', [])] + new_volume_list = list(set(controller_volumes_post).difference(controller_volumes_pre)) + new_volume_root = new_volume_list[0] + + # Get the IDX of the volume out of any others + volume_id = 0 + for idx, volume in enumerate(controller_volumes_post): + if volume == new_volume_root: + volume_id = idx + break + + # Get and convert the size in bytes value to human + volume_detail = session.get(new_volume_root) + volume_size_bytes = volume_detail.get('CapacityBytes', 0) + volume_size_human = format_bytes_tohuman(volume_size_bytes) + + # Create the target string + system_drive_target = f"detect:{controller_name}:{volume_size_human}:{volume_id}" + + # We found too few or too many drives, error + else: + system_drive_target = None + + return system_drive_target + +# +# Redfish Task functions +# +def set_indicator_state(session, system_root, redfish_vendor, state): + """ + Set the system indicator LED to the desired state (on/off) + """ + state_values_write = { + 'Dell': { + 'on': 'Blinking', + 'off': 'Off', + }, + 'default': { + 'on': 'Lit', + 'off': 'Off', + }, + } + + state_values_read = { + 'default': { + 'on': 'Lit', + 'off': 'Off', + }, + } + + try: + # Allow vendor-specific overrides + if redfish_vendor 
not in state_values_write: + redfish_vendor = "default" + # Allow nice names ("on"/"off") + if state in state_values_write[redfish_vendor]: + state = state_values_write[redfish_vendor][state] + + # Get current state + system_detail = session.get(system_root) + current_state = system_detail['IndicatorLED'] + except KeyError: + return False + + try: + state_read = state + # Allow vendor-specific overrides + if redfish_vendor not in state_values_read: + redfish_vendor = "default" + # Allow nice names ("on"/"off") + if state_read in state_values_read[redfish_vendor]: + state_read = state_values_read[redfish_vendor][state] + + if state_read == current_state: + return False + except KeyError: + return False + + session.patch( + system_root, + { "IndicatorLED": state } + ) + + return True + + +def set_power_state(session, system_root, redfish_vendor, state): + """ + Set the system power state to the desired state + """ + state_values = { + 'default': { + 'on': 'On', + 'off': 'ForceOff', + }, + } + + try: + # Allow vendor-specific overrides + if redfish_vendor not in state_values: + redfish_vendor = "default" + # Allow nice names ("on"/"off") + if state in state_values[redfish_vendor]: + state = state_values[redfish_vendor][state] + + # Get current state, target URI, and allowable values + system_detail = session.get(system_root) + current_state = system_detail['PowerState'] + power_root = system_detail['Actions']['#ComputerSystem.Reset']['target'] + power_choices = system_detail['Actions']['#ComputerSystem.Reset']['ResetType@Redfish.AllowableValues'] + except KeyError: + return False + + # Remap some namings so we can check the current state against the target state + if state in ['ForceOff']: + target_state = 'Off' + else: + target_state = state + + if target_state == current_state: + return False + + if state not in power_choices: + return False + + session.post( + power_root, + { "ResetType": state } + ) + + return True + + +def set_boot_override(session, system_root, 
redfish_vendor, target): + """ + Set the system boot override to the desired target + """ + try: + system_detail = session.get(system_root) + boot_targets = system_detail['Boot']['BootSourceOverrideSupported'] + except KeyError: + return False + + if target not in boot_targets: + return False + + session.patch( + system_root, + { "Boot": { "BootSourceOverrideTarget": target } } + ) + + return True + + +def check_redfish(config, data): + """ + Validate that a BMC is Redfish-capable + """ + headers = { "Content-Type": "application/json" } + logger.info("Checking for Redfish response...") + count = 0 + while True: + try: + count += 1 + if count > 30: + retcode = 500 + logger.warn("Aborted after 300s; device too slow or not booting.") + break + resp = requests.get(f"https://{data['ipaddr']}/redfish/v1", headers=headers, verify=False, timeout=10) + retcode = resp.status_code + break + except Exception: + logger.info(f"Attempt {count}...") + continue + + if retcode == 200: + return True + else: + return False + + +# +# Entry function +# +def redfish_init(config, cspec, data): + """ + Initialize a new node with Redfish + """ + bmc_ipaddr = data['ipaddr'] + bmc_macaddr = data['macaddr'] + bmc_host = f"https://{bmc_ipaddr}" + + cspec_node = cspec['bootstrap'][bmc_macaddr] + logger.debug(f"cspec_node = {cspec_node}") + + bmc_username = cspec_node['bmc']['username'] + bmc_password = cspec_node['bmc']['password'] + + host_macaddr = '' + host_ipaddr = '' + + cspec_cluster = cspec_node['node']['cluster'] + cspec_hostname = cspec_node['node']['hostname'] + cspec_nid = int(''.join(filter(str.isdigit, cspec_hostname))) + + cluster = db.get_cluster(config, name=cspec_cluster) + if cluster is None: + cluster = db.add_cluster(config, cspec_cluster, "provisioning") + logger.debug(cluster) + + node = db.get_node(config, cspec_cluster, name=cspec_hostname) + if node is None: + node = db.add_node(config, cspec_cluster, "characterizing", cspec_hostname, cspec_nid, bmc_macaddr, bmc_ipaddr,
host_macaddr, host_ipaddr) + else: + node = db.update_node_addresses(config, cspec_cluster, cspec_hostname, bmc_macaddr, bmc_ipaddr, host_macaddr, host_ipaddr) + logger.debug(node) + + # Create the session and log in + session = RedfishSession(bmc_host, bmc_username, bmc_password) + if session.host is None: + logger.info("Aborting Redfish configuration; reboot BMC to try again.") + del session + return + + logger.info("Characterizing node...") + # Get Redfish bases + redfish_base_root = '/redfish/v1' + redfish_base_detail = session.get(redfish_base_root) + + redfish_vendor = list(redfish_base_detail['Oem'].keys())[0] + redfish_name = redfish_base_detail['Name'] + redfish_version = redfish_base_detail['RedfishVersion'] + + systems_base_root = redfish_base_detail['Systems']['@odata.id'].rstrip('/') + systems_base_detail = session.get(systems_base_root) + + system_root = systems_base_detail['Members'][0]['@odata.id'].rstrip('/') + + # Force off the system and turn on the indicator + set_power_state(session, system_root, redfish_vendor, 'off') + set_indicator_state(session, system_root, redfish_vendor, 'on') + + # Get the system details + system_detail = session.get(system_root) + + system_sku = system_detail['SKU'].strip() + system_serial = system_detail['SerialNumber'].strip() + system_power_state = system_detail['PowerState'].strip() + system_indicator_state = system_detail['IndicatorLED'].strip() + system_health_state = system_detail['Status']['Health'].strip() + + # Walk down the EthernetInterfaces construct to get the bootstrap interface MAC address + try: + ethernet_root = system_detail['EthernetInterfaces']['@odata.id'].rstrip('/') + ethernet_detail = session.get(ethernet_root) + first_interface_root = ethernet_detail['Members'][0]['@odata.id'].rstrip('/') + first_interface_detail = session.get(first_interface_root) + # Something went wrong, so fall back + except KeyError: + first_interface_detail = dict() + + # Try to get the MAC address directly from the
interface detail (Redfish standard) + if first_interface_detail.get('MACAddress') is not None: + bootstrap_mac_address = first_interface_detail['MACAddress'].strip().lower() + # Try to get the MAC address from the HostCorrelation->HostMACAddress (HP DL360x G8) + elif len(system_detail.get('HostCorrelation', {}).get('HostMACAddress', [])) > 0: + bootstrap_mac_address = system_detail['HostCorrelation']['HostMACAddress'][0].strip().lower() + # We can't find it, so log an error and abort + else: + logger.error("Could not find a valid MAC address for the bootstrap interface.") + return + + # Display the system details + logger.info("Found details from node characterization:") + logger.info(f"> System Manufacturer: {redfish_vendor}") + logger.info(f"> System Redfish Version: {redfish_version}") + logger.info(f"> System Redfish Name: {redfish_name}") + logger.info(f"> System SKU: {system_sku}") + logger.info(f"> System Serial: {system_serial}") + logger.info(f"> Power State: {system_power_state}") + logger.info(f"> Indicator LED: {system_indicator_state}") + logger.info(f"> Health State: {system_health_state}") + logger.info(f"> Bootstrap NIC MAC: {bootstrap_mac_address}") + + # Update node host MAC address + host_macaddr = bootstrap_mac_address + node = db.update_node_addresses(config, cspec_cluster, cspec_hostname, bmc_macaddr, bmc_ipaddr, host_macaddr, host_ipaddr) + logger.debug(node) + + logger.info("Determining system disk...") + storage_root = system_detail.get('Storage', {}).get('@odata.id') + system_drive_target = get_system_drive_target(session, cspec_node, storage_root) + if system_drive_target is None: + logger.error("No valid drives found; configure a single system drive as a 'detect:' string or Linux '/dev' path instead and try again.") + return + logger.info(f"Found system disk {system_drive_target}") + + # Create our preseed configuration + logger.info("Creating node boot configurations...") + installer.add_pxe(config, cspec_node, host_macaddr) +
installer.add_preseed(config, cspec_node, host_macaddr, system_drive_target) + + # Adjust any BIOS settings + logger.info("Adjusting BIOS settings...") + bios_root = system_detail.get('Bios', {}).get('@odata.id') + if bios_root is not None: + bios_detail = session.get(bios_root) + bios_attributes = list(bios_detail['Attributes'].keys()) + for setting, value in cspec_node['bmc'].get('bios_settings', {}).items(): + if setting not in bios_attributes: + continue + + payload = { "Attributes": { setting: value } } + session.patch(f"{bios_root}/Settings", payload) + + # Set boot override to Pxe for the installer boot + logger.info("Setting temporary PXE boot...") + set_boot_override(session, system_root, redfish_vendor, 'Pxe') + + # Turn on the system + logger.info("Powering on node...") + set_power_state(session, system_root, redfish_vendor, 'on') + + node = db.update_node_state(config, cspec_cluster, cspec_hostname, 'pxe-booting') + + logger.info("Waiting for completion of node and cluster installation...") + # Wait for the system to install and be configured + while node.state != "booted-completed": + sleep(60) + # Keep the Redfish session alive + session.get(redfish_base_root) + # Refresh our node state + node = db.get_node(config, cspec_cluster, name=cspec_hostname) + + # Graceful shutdown of the machine + set_power_state(session, system_root, redfish_vendor, 'GracefulShutdown') + system_power_state = "On" + while system_power_state != "Off": + sleep(5) + # Refresh our power state from the system details + system_detail = session.get(system_root) + system_power_state = system_detail['PowerState'].strip() + + # Turn off the indicator to indicate bootstrap has completed + set_indicator_state(session, system_root, redfish_vendor, 'off') + + # We must delete the session + del session + return diff --git a/bootstrap-daemon/pvcbootstrapd/lib/tftp.py b/bootstrap-daemon/pvcbootstrapd/lib/tftp.py new file mode 100755 index 0000000..18cbb63 --- /dev/null +++ 
b/bootstrap-daemon/pvcbootstrapd/lib/tftp.py @@ -0,0 +1,45 @@ +#!/usr/bin/env python3 + +# tftp.py - PVC Cluster Auto-bootstrap TFTP preparation libraries +# Part of the Parallel Virtual Cluster (PVC) system +# +# Copyright (C) 2018-2021 Joshua M. Boniface +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, version 3. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . +# +############################################################################### + +import os.path +import git +import yaml + +from celery.utils.log import get_task_logger + + +logger = get_task_logger(__name__) + + +def build_tftp_repository(config): + # Generate an installer config + os.system(f"{config['root_path']}/repo/pvc-installer/buildpxe.sh -o {config['tftp_root_path']} -u {config['deploy_user']}") + + +def init_tftp(config): + """ + Prepare a TFTP root + """ + if not os.path.exists(config['tftp_root_path']): + os.makedirs(config['tftp_root_path']) + os.makedirs(config['tftp_host_path']) + + build_tftp_repository(config) diff --git a/bootstrap-daemon/requirements.txt b/bootstrap-daemon/requirements.txt new file mode 100644 index 0000000..5344efc --- /dev/null +++ b/bootstrap-daemon/requirements.txt @@ -0,0 +1,9 @@ +ansible +ansible_runner +pyyaml +gitpython +requests +flask +flask_restful +click +sqlite3 diff --git a/docs/images/pvcbootstrapd-net.png b/docs/images/pvcbootstrapd-net.png new file mode 100644 index 0000000..ff2c2f5 Binary files /dev/null and b/docs/images/pvcbootstrapd-net.png differ diff --git a/docs/images/pvcbootstrapd-phy.png
b/docs/images/pvcbootstrapd-phy.png new file mode 100644 index 0000000..374730e Binary files /dev/null and b/docs/images/pvcbootstrapd-phy.png differ diff --git a/docs/swagger.html b/docs/swagger.html new file mode 100644 index 0000000..a724f5b --- /dev/null +++ b/docs/swagger.html @@ -0,0 +1,13 @@ + + + + PVC Bootstrap API Documentation + + + + + + + + + diff --git a/docs/swagger.json b/docs/swagger.json new file mode 100644 index 0000000..0f57543 --- /dev/null +++ b/docs/swagger.json @@ -0,0 +1,191 @@ +{ + "definitions": { + "Message": { + "properties": { + "message": { + "description": "A text message describing the result", + "example": "The foo was successfully maxed", + "type": "string" + } + }, + "type": "object" + } + }, + "host": "localhost:9999", + "info": { + "title": "PVC Bootstrap API", + "version": "1.0" + }, + "paths": { + "/": { + "get": { + "description": "", + "responses": { + "200": { + "description": "OK", + "schema": { + "$ref": "#/definitions/Message" + } + } + }, + "summary": "Return basic details of the API", + "tags": [ + "root" + ] + } + }, + "/checkin": { + "get": { + "description": "", + "responses": { + "200": { + "description": "OK", + "schema": { + "$ref": "#/definitions/Message" + } + } + }, + "summary": "Return checkin details of the API", + "tags": [ + "checkin" + ] + } + }, + "/checkin/dnsmasq": { + "post": { + "consumes": [ + "application/json" + ], + "description": "", + "parameters": [ + { + "description": "An event checkin from an external bootstrap tool component.", + "in": "body", + "name": "dnsmasq_checkin_event", + "schema": { + "properties": { + "action": { + "description": "The action of the event.", + "example": "add", + "type": "string" + }, + "client_id": { + "description": "(add, old) The client ID from a DHCP request.", + "example": "01:ff:ff:ff:ab:cd:ef", + "type": "string" + }, + "hostname": { + "description": "(add, old) The client hostname from a DHCP request.", + "example": "pvc-installer-live", + "type": 
"string" + }, + "ipaddr": { + "description": "(add, old) The IP address from a DHCP request.", + "example": "10.199.199.10", + "type": "string" + }, + "macaddr": { + "description": "(add, old) The MAC address from a DHCP request.", + "example": "ff:ff:ff:ab:cd:ef", + "type": "string" + }, + "user_class": { + "description": "(add, old) The DHCP user-class option from a DHCP request.", + "example": "None", + "type": "string" + }, + "vendor_class": { + "description": "(add, old) The DHCP vendor-class option from a DHCP request.", + "example": "CPQRIB3 (HP Proliant DL360 G6 iLO)", + "type": "string" + } + }, + "required": [ + "action" + ], + "type": "object" + } + } + ], + "responses": { + "200": { + "description": "OK", + "schema": { + "$ref": "#/definitions/Message" + } + } + }, + "summary": "Register a checkin from the DNSMasq subsystem", + "tags": [ + "checkin" + ] + } + }, + "/checkin/host": { + "post": { + "consumes": [ + "application/json" + ], + "description": "", + "parameters": [ + { + "description": "An event checkin from an external bootstrap tool component.", + "in": "body", + "name": "host_checkin_event", + "schema": { + "properties": { + "action": { + "description": "The action of the event.", + "example": "begin", + "type": "string" + }, + "bmc_ipaddr": { + "description": "The IP addres of the system BMC interface.", + "example": "10.199.199.10", + "type": "string" + }, + "bmc_macaddr": { + "description": "The MAC address of the system BMC interface.", + "example": "ff:ff:ff:01:23:45", + "type": "string" + }, + "host_ipaddr": { + "description": "The IP address of the system provisioning interface.", + "example": "10.199.199.11", + "type": "string" + }, + "host_macaddr": { + "description": "The MAC address of the system provisioning interface.", + "example": "ff:ff:ff:ab:cd:ef", + "type": "string" + }, + "hostname": { + "description": "The system hostname.", + "example": "hv1.mydomain.tld", + "type": "string" + } + }, + "required": [ + "action" + ], + 
"type": "object" + } + } + ], + "responses": { + "200": { + "description": "OK", + "schema": { + "$ref": "#/definitions/Message" + } + } + }, + "summary": "Register a checkin from the Host subsystem", + "tags": [ + "checkin" + ] + } + } + }, + "swagger": "2.0" +} \ No newline at end of file diff --git a/gen-api-doc b/gen-api-doc new file mode 100755 index 0000000..b44b6fb --- /dev/null +++ b/gen-api-doc @@ -0,0 +1,24 @@ +#!/usr/bin/env python3 + +# gen-doc.py - Generate a Swagger JSON document for the API +# Part of the Parallel Virtual Cluster (PVC) system + +from flask_swagger import swagger +import os +import sys +import json + +os.environ['PVCD_CONFIG_FILE'] = "./bootstrap-daemon/pvcbootstrapd.yaml.sample" + +sys.path.append('bootstrap-daemon') + +import pvcbootstrapd.flaskapi as pvcbootstrapd + +swagger_file = "docs/swagger.json" +swagger_data = swagger(pvcbootstrapd.app) +swagger_data['info']['version'] = "1.0" +swagger_data['info']['title'] = "PVC Bootstrap API" +swagger_data['host'] = "localhost:9999" + +with open(swagger_file, 'w') as fd: + fd.write(json.dumps(swagger_data, sort_keys=True, indent=4)) diff --git a/install-pvcbootstrapd.sh b/install-pvcbootstrapd.sh new file mode 100755 index 0000000..cdb83db --- /dev/null +++ b/install-pvcbootstrapd.sh @@ -0,0 +1,211 @@ +#!/usr/bin/env bash + +# PVC Bootstrap system installer + +echo "Welcome to the PVC bootstrap installer. This will guide you through the setup process." +echo +echo "Please enter the bootstrap root directory; all components will be installed here:" +echo -n "[/srv/pvc] > " +read root_directory +if [[ -z ${root_directory} ]]; then + root_directory="/srv/pvc" +fi +echo + +echo "Please enter the IP network for the Bootstrap network (should be a /24):" +echo -n "[10.255.255.0/24] > " +read bootstrap_network +if [[ -z ${bootstrap_network} ]]; then + bootstrap_network="10.255.255.0/24" +fi +echo + +echo "Will the bootstrap interface be a vLAN? Note: It should not be configured yet if so!" 
+echo -n "[y/N] > " +read is_bootstrap_interface_vlan +case ${is_bootstrap_interface_vlan} in + y|Y|yes|Yes|YES) is_bootstrap_interface_vlan="yes" ;; + *) is_bootstrap_interface_vlan="no" ;; +esac +echo + +all_interfaces=( $( + ip address | grep '^[0-9]' | grep 'bond\|eno\|enp\|ens\|wlp' | awk '{ print $2 }' | tr -d ':' +) ) +if [[ "${is_bootstrap_interface_vlan}" == "yes" ]]; then +echo "Please enter the underlying device for the Bootstrap network vLAN:" +else +echo "Please enter the Bootstrap network interface:" +fi +echo "Available interfaces: ${all_interfaces[@]}" +bootstrap_interface="" +while true; do + echo -n "> " + read bootstrap_interface + if [[ -n ${bootstrap_interface} && "${all_interfaces[@]}" =~ "${bootstrap_interface}" ]]; then + break + fi +done +echo + +if [[ "${is_bootstrap_interface_vlan}" == "yes" ]]; then +echo "Please enter the Bootstrap network vLAN ID:" +echo -n "> " +read bootstrap_vlan +echo +fi + +echo "Please enter the Git remote (SSH-only) for your local PVC repository:" +while [[ -z ${git_remote} ]]; do +echo -n "> " +read git_remote +done +echo + +echo "Please enter the branch to use from the local PVC repository:" +echo -n "[master] > " +read git_branch +if [[ -z ${git_branch} ]]; then + git_branch="master" +fi +echo + +echo "Please enter a username for Ansible management of the cluster:" +echo -n "[deploy] > " +read deploy_username +if [[ -z ${deploy_username} ]]; then + deploy_username="deploy" +fi +echo + +echo "Proceeding with setup!" +echo + +echo "Installing dependencies..." +apt-get update +apt-get install --yes vlan iptables redis python3 python3-pip python3-virtualenv virtualenv + +echo "Creating root directory..." +sudo mkdir -p ${root_directory} +sudo chown $USER ${root_directory} + +echo "Creating virtualenv..." +virtualenv --python python3 ${root_directory}/venv + +echo "Installing pvcbootstrapd..." +cp -a bootstrap-daemon ${root_directory}/pvcbootstrapd + +echo "Determining IP addresses..." 
+bootstrap_address="$( awk -F'.' '{ print $1"."$2"."$3".1" }' <<<"${bootstrap_network}" )" +bootstrap_dhcpstart="$( awk -F'.' '{ print $1"."$2"."$3".100" }' <<<"${bootstrap_network}" )" +bootstrap_dhcpend="$( awk -F'.' '{ print $1"."$2"."$3".199" }' <<<"${bootstrap_network}" )" + +echo "Creating configuration..." +cp ${root_directory}/pvcbootstrapd/pvcbootstrapd.yaml.template ${root_directory}/pvcbootstrapd/pvcbootstrapd.yaml +# Use '|' as the sed delimiter, since paths and Git remotes contain '/' +sed -i "s|DEPLOY_USERNAME|${deploy_username}|" ${root_directory}/pvcbootstrapd/pvcbootstrapd.yaml +sed -i "s|ROOT_DIRECTORY|${root_directory}|" ${root_directory}/pvcbootstrapd/pvcbootstrapd.yaml +sed -i "s|BOOTSTRAP_ADDRESS|${bootstrap_address}|" ${root_directory}/pvcbootstrapd/pvcbootstrapd.yaml +sed -i "s|BOOTSTRAP_DHCPSTART|${bootstrap_dhcpstart}|" ${root_directory}/pvcbootstrapd/pvcbootstrapd.yaml +sed -i "s|BOOTSTRAP_DHCPEND|${bootstrap_dhcpend}|" ${root_directory}/pvcbootstrapd/pvcbootstrapd.yaml +sed -i "s|GIT_REMOTE|${git_remote}|" ${root_directory}/pvcbootstrapd/pvcbootstrapd.yaml +sed -i "s|GIT_BRANCH|${git_branch}|" ${root_directory}/pvcbootstrapd/pvcbootstrapd.yaml + +echo "Creating network configuration for interface ${bootstrap_interface} (is vLAN? ${is_bootstrap_interface_vlan})..." +if [[ "${is_bootstrap_interface_vlan}" == "yes" ]]; then +cat < /proc/sys/net/ipv4/ip_forward + post-up iptables -A FORWARD -i $IFACE -j ACCEPT + post-up iptables -A FORWARD -o $IFACE -m state --state ESTABLISHED,RELATED -j ACCEPT + post-up iptables -t nat -A POSTROUTING -i $IFACE -j MASQUERADE +EOF +else +cat < /proc/sys/net/ipv4/ip_forward + post-up iptables -A FORWARD -i $IFACE -j ACCEPT + post-up iptables -A FORWARD -o $IFACE -m state --state ESTABLISHED,RELATED -j ACCEPT + post-up iptables -t nat -A POSTROUTING -i $IFACE -j MASQUERADE +EOF +fi + +echo "Installing service units..." 
+cat < " +read edit_flag +case ${edit_flag} in + y|Y|yes|Yes|YES) + vim ${root_directory}/pvcbootstrapd/pvcbootstrapd.yaml + ;; + *) + true + ;; +esac +echo + +echo "Restart system to activate?" +echo -n "[Y/n] > " +read reboot_flag +case ${reboot_flag} in + n|N|no|No|NO) + exit 0 + ;; + *) + true + sudo reboot + ;; +esac
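
The "Determining IP addresses..." step of `install-pvcbootstrapd.sh` derives the bootstrap host address (`.1`) and the DHCP range (`.100` to `.199`) by splitting the entered network on dots, which only gives sensible results for a /24. A minimal sketch of the same derivation in Python, with the prefix length checked explicitly, might look like the following. The function name `derive_bootstrap_addresses` and the returned dictionary keys are hypothetical and are not part of the PVC Bootstrap code.

```python
import ipaddress


def derive_bootstrap_addresses(network: str) -> dict:
    """Derive the host address and DHCP range for a /24 bootstrap network.

    Hypothetical helper mirroring the awk-based logic in the installer:
    .1 for the bootstrap host, .100 through .199 for the DHCP pool.
    """
    # strict=True rejects inputs with host bits set, e.g. 10.255.255.7/24
    net = ipaddress.ip_network(network, strict=True)
    if net.version != 4 or net.prefixlen != 24:
        raise ValueError(f"Expected an IPv4 /24 network, got {network}")

    base = net.network_address
    return {
        "address": str(base + 1),
        "dhcp_start": str(base + 100),
        "dhcp_end": str(base + 199),
    }


if __name__ == "__main__":
    # The installer's default bootstrap network
    print(derive_bootstrap_addresses("10.255.255.0/24"))
```

Rejecting anything other than a /24 up front makes the installer prompt's "should be a /24" hint an enforced condition, rather than letting a differently sized network produce addresses outside the intended range.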