* [API Daemon] Fixes a bug that caused uploads of RAW block devices to fail in "storage volume upload"
* [API Daemon/CLI Client] Adds support for VM automirrors, replicating the functionality of autobackup but for cross-cluster mirroring
* [CLI Client] Improves the help output of several commands
* [API Daemon/CLI Client] Moves the conversion of VM snapshot ages into human-readable values out of the API and into the client, to allow more programmatic handling in the future
* [Worker Daemon] Improves the clarity of Celery logging output by including the calling function in all task output
-- Joshua M. Boniface <joshua@boniface.me> Mon, 18 Nov 2024 10:53:56 -0500
**New Feature**: Adds VM snapshot sending (`vm snapshot send`), VM mirroring (`vm mirror create`), and (offline) mirror promotion (`vm mirror promote`). Permits transferring VM snapshots to remote clusters, individually or repeatedly, and promoting them to active status, for disaster recovery and migration between clusters.
**Breaking Change**: Migrates the API daemon into Gunicorn when in production mode. Permits more scalable and performant operation of the API. **Requires additional dependency packages on all coordinator nodes** (`gunicorn`, `python3-gunicorn`, `python3-setuptools`); upgrade via `pvc-ansible` is strongly recommended.
**Enhancement**: Provides whole cluster utilization stats in the cluster status data. Permits better observability into the overall resource utilization of the cluster.
**Enhancement**: Adds a new storage benchmark format (v2) which includes additional resource utilization statistics. This allows for better evaluation of storage performance impact on the cluster as a whole. The updated format also permits arbitrary benchmark job names for easier parsing and tracking.
* [API Daemon] Allows scanning of new volumes added manually via other commands
* [API Daemon/CLI Client] Adds whole cluster utilization statistics to cluster status
* [API Daemon] Moves production API execution into Gunicorn
* [API Daemon] Adds a new storage benchmark format (v2) with additional resource tracking
* [API Daemon] Adds support for named storage benchmark jobs
* [API Daemon] Fixes a bug in OSD creation which would create `split` OSDs if `--osd-count` was set to 1
* [API Daemon] Adds support for the `mirror` VM state used by snapshot mirrors
* [CLI Client] Fixes several output display bugs in various commands and in Worker task outputs
* [CLI Client] Improves and shrinks the status progress bar output to support longer messages
* [API Daemon] Adds support for sending snapshots to remote clusters
* [API Daemon] Adds support for updating and promoting snapshot mirrors to remote clusters
* [Node Daemon] Improves timeouts during primary/secondary coordinator transitions to avoid deadlocks
* [Node Daemon] Improves timeouts during keepalive updates to avoid deadlocks
* [Node Daemon] Refactors fencing thread structure to ensure a single fencing task per cluster and sequential node fences to avoid potential anomalies (e.g. fencing 2 nodes simultaneously)
* [Node Daemon] Fixes a bug in fencing that left VMs in an invalid state if their locks had already been freed
* [Node Daemon] Increases the wait time during system startup to ensure Zookeeper has more time to synchronize
-- Joshua M. Boniface <joshua@boniface.me> Tue, 15 Oct 2024 11:39:11 -0400
**Deprecation Warning**: `pvc vm backup` commands are now deprecated and will be removed in **0.9.100**. Use `pvc vm snapshot` commands instead.
**Breaking Change**: The on-disk format of VM snapshot exports differs from backup exports, and the PVC autobackup system now leverages these. It is recommended to start fresh with a new tree of backups for `pvc autobackup` for maximum compatibility.
**Breaking Change**: VM autobackups now run in `pvcworkerd` instead of directly in the CLI client, allowing them to be triggered from any node (or externally). It is important to apply the timer unit changes from the `pvc-ansible` role after upgrading to 0.9.99 to avoid duplicate runs.
**Usage Note**: VM snapshots are displayed in the `pvc vm list` and `pvc vm info` outputs, not in a unique "list" endpoint.
* [API Daemon] Adds a proper error when an invalid provisioner profile is specified
* [Node Daemon] Sorts Ceph pools properly in node keepalive to avoid incorrect ordering
* [Health Daemon] Improves handling of IPMI checks by adding multiple retries with a shorter timeout
* [API Daemon] Improves handling of XML parsing errors in VM configurations
* [ALL] Adds support for whole VM snapshots, including configuration XML details, and direct rollback to snapshots
* [ALL] Adds support for exporting and importing whole VM snapshots
* [Client CLI] Removes vCPU topology from short VM info output
* [Client CLI] Improves output format of VM info output
* [API Daemon] Adds an endpoint to get the current primary node
* [Client CLI] Fixes a bug where API requests were made 3 times
* [Other] Improves the build-and-deploy.sh script
* [API Daemon] Improves the "vm rename" command to avoid redefining the VM, preserving its history
* [API Daemon] Adds an indication when a task is run on the primary node
* [API Daemon] Fixes a bug where the ZK schema relative path sometimes didn't work
-- Joshua M. Boniface <joshua@boniface.me> Wed, 28 Aug 2024 11:15:55 -0400
**Breaking Changes:** This release features a major reconfiguration of how cluster health is monitored and reported. Node health plugins now report "faults", as do several other checks that were previously performed manually in the "cluster" daemon library for the "/status" endpoint and now run from within the Health daemon. These faults are persistent: each fault identifier is triggered once, and subsequent triggers simply update its "last reported" time. A new set of API endpoints and commands manages these faults, either by "ack"(nowledging) them (keeping the alert around to be further updated but setting its health delta to 0%) or by "delete"ing them (removing the fault entirely unless it retriggers), individually or, from the CLI, several or all at once. Cluster health is now calculated solely from these faults, and the default health check interval is reduced to 15 seconds to accommodate this. In addition, Prometheus metrics have been added for the PVC cluster itself, along with an example Grafana dashboard and a proxy to the Ceph cluster metrics. This release also fixes some bugs in the VM provisioner that were introduced in 0.9.83; these fixes require a **reimport or reconfiguration of any provisioner scripts**; see the updated examples for details.
* [All] Adds persistent fault reporting to clusters, replacing the old cluster health calculations.
* [API Daemon] Adds cluster-level Prometheus metric exporting as well as a Ceph Prometheus proxy to the API.
* [CLI Client] Improves formatting output of "pvc cluster status".
* [Node Daemon] Fixes several bugs and enhances the working of the psql health check plugin.
* [Worker Daemon] Fixes several bugs in the example provisioner scripts, and moves the libvirt_schema library into the daemon common libraries.
-- Joshua M. Boniface <joshua@boniface.me> Sat, 09 Dec 2023 23:05:40 -0500
**Breaking Changes:** This release features a breaking change to the daemon configuration. A new unified "pvc.conf" file is required for all daemons (and for the CLI client's Autobackup and API-on-this-host functionality), and it will be written by the "pvc" role in the PVC Ansible framework. Using the "update-pvc-daemons" oneshot playbook from PVC Ansible is **required** to update to this release. It ensures that this file is written to the proper place before the new package versions are deployed, and that the old entries are cleaned up afterwards. In addition, this release fully splits the node worker and health subsystems into discrete daemons ("pvcworkerd" and "pvchealthd") and packages ("pvc-daemon-worker" and "pvc-daemon-health"), respectively. The "pvc-daemon-node" package now depends on both packages. The "pvc-daemon-api" package can now be reliably used outside of the PVC nodes themselves (for instance, in a VM) without any strange cross-dependency issues.
* [All] Unifies all daemon (and on-node CLI task) configuration into a "pvc.conf" YAML configuration.
* [All] Splits the node worker subsystem into a discrete codebase and package ("pvc-daemon-worker"), still named "pvcworkerd".
* [All] Splits the node health subsystem into a discrete codebase and package ("pvc-daemon-health"), named "pvchealthd".
* [All] Improves Zookeeper node logging to avoid bugs and to support multiple simultaneous daemon writes.
* [All] Fixes several bugs in file logging and splits file logs by daemon.
* [Node Daemon] Improves several log messages to match new standards from Health daemon.
* [API Daemon] Reworks Celery task routing and handling to move all worker tasks to Worker daemon.
-- Joshua M. Boniface <joshua@boniface.me> Fri, 01 Dec 2023 17:33:53 -0500
**Breaking Changes:** This large release features a number of major changes. While these should all be a seamless transition, the behaviour of several commands and the backend system that handles them has changed significantly, along with new dependencies from PVC Ansible. A full cluster configuration update via `pvc.yml` is recommended after installing this version. Redis is replaced with KeyDB on coordinator nodes as the Celery backend; this transition will be handled gracefully by the `pvc-ansible` playbooks, though note that KeyDB will be exposed on the Upstream interface. The Celery worker system is renamed `pvcworkerd` and is now active on all nodes (coordinator and non-coordinator). It is also expanded to cover several commands that previously used a similar, custom setup within the node daemons, including "pvc vm flush-locks" and all "pvc storage osd" tasks. These CLI commands now all feature "--wait"/"--no-wait" flags, with "--wait" showing a progress bar and status output of the task run. The "pvc cluster task" command can now be used to view all task types, replacing the previously custom, provisioner-specific "pvc provisioner status" command. All example provisioner scripts have been updated to leverage new helper functions in the Celery system. Updating existing scripts is optional, but administrators are recommended to do so for optimal log output behaviour.
* [All] Adds support for multiple OSDs on individual disks (NVMe workloads).
* [All] Corrects and updates OSD replace, refresh, remove, and add functionality; replace no longer purges.
* [All] Switches to KeyDB (multi-master) instead of Redis and adds node monitoring plugin.
* [All] Replaces Zookeeper/Node Daemon-based message passing and task handling with pvcworkerd Celery workers on all nodes; increases worker concurrency to 3 (per node).
* [All] Moves all task-like functions to Celery and updates existing Celery tasks to use new helpers and ID system.
* [CLI Client] Adds "--wait/--no-wait" options with progress bars to all Celery-based tasks, "--wait" default; adds a standardized task interface under "pvc cluster task".
* [Node Daemon] Cleans up the fencing handler and related functions.
* [Node Daemon] Fixes bugs with VM memory reporting during keepalives.
* [Node Daemon] Fixes a potential race condition during primary/secondary transition by backgrounding systemctl commands.
* [API Daemon] Updates example provisioner plugins to use new Celery functions.
-- Joshua M. Boniface <joshua@boniface.me> Fri, 17 Nov 2023 01:29:41 -0500
Ensure you have updated to the latest version of the PVC Ansible repository before deploying this version or using PVC Ansible oneshot playbooks for management.
**Breaking Change [CLI]**: The `--restart` option for VM configuration changes now has an explicit `--no-restart` to disable restarting, or a prompt if neither is specified; `--unsafe` no longer bypasses this prompt which was a bug. Applies to most `vm <cmd> set` commands like `vm vcpu set`, `vm memory set`, etc. All instances also feature restart confirmation afterwards, which, if `--restart` is provided, will prompt for confirmation unless `--yes` or `--unsafe` is specified.
**Breaking Change [CLI]**: The `--long` option previously on some `info` commands no longer exists; use `-f long`/`--format long` instead.
* [CLI] Significantly refactors the CLI client code for consistency and cleanliness
* [CLI] Implements `-f`/`--format` options for all `list` and `info` commands in a consistent way
* [CLI] Changes the behaviour of VM modification options with "--restart" to provide a "--no-restart"; defaults to a prompt if neither is specified and ignores the "--unsafe" global entirely
* [API] Fixes several bugs in the 3-debootstrap.py provisioner example script
* [Node] Fixes some bugs around VM shutdown on node flush
* [Documentation] Adds mentions of Ganeti and Harvester
-- Joshua M. Boniface <joshua@boniface.me> Fri, 18 Aug 2023 12:20:43 -0400
* [Documentation] Reworks and updates various documentation sections
* [Node Daemon] Adjusts the fencing process to use a power off rather than a power reset for maximum certainty
* [Node Daemon] Ensures that MTU values are also validated on first read
* [Node Daemon] Corrects the loading of the bridge_mtu value to use the current active setting rather than a fixed default to prevent unintended surprises
-- Joshua M. Boniface <joshua@boniface.me> Tue, 12 Oct 2021 13:48:19 -0400
* [Node Daemon] Fixes several bugs and crashes in node daemon
* [General] Updates linting rules for newer Flake8 linter
* [Daemons/CLI client] Adds VM network and disk hot attach/detach support; NOTE: Changes the default behaviour of `pvc vm network add`/`remove` and `pvc vm volume add`/`remove`
* [API Daemon] Adds checks for pool size when resizing volumes
* [API Daemon] Adds checks for RAM and vCPU sizes when defining or modifying VMs
* [Node Daemon] Adjusts log text of VM migrations to show the correct source node
* [API Daemon] Adjusts the OVA importer to support floppy RASD types for compatibility
* [API Daemon] Ensures that volume resize commands without a unit suffix have "B" appended
* [API Daemon] Removes the explicit setting of image-features in PVC; defaulting to the limited set has been moved to the ceph.conf configuration on nodes via PVC Ansible
* [Node Daemon] Replaces Rados module polling of the Ceph cluster with command-based polling again, for timeout purposes, and removes some flaky return statements