Compare commits
206 Commits
Author | SHA1 | Date |
---|---|---|
Joshua Boniface | 9441cb3b2e | |
Joshua Boniface | b16542c8fc | |
Joshua Boniface | de0c7e37f2 | |
Joshua Boniface | ae26a071c7 | |
Joshua Boniface | 49a34acd14 | |
Joshua Boniface | 82365ea539 | |
Joshua Boniface | 86f0c5c3ae | |
Joshua Boniface | 83294298e1 | |
Joshua Boniface | 4187aacc5b | |
Joshua Boniface | 35c82b5249 | |
Joshua Boniface | e80b797e3a | |
Joshua Boniface | 7c8c71dff7 | |
Joshua Boniface | 861fef91e3 | |
Joshua Boniface | d1fcac1f0a | |
Joshua Boniface | 6ace2ebf6a | |
Joshua Boniface | 962fba7621 | |
Joshua Boniface | 49bf51da38 | |
Joshua Boniface | 1293e8ae7e | |
Joshua Boniface | ae2cf8a070 | |
Joshua Boniface | ab5bd3c57d | |
Joshua Boniface | 35153cd6b6 | |
Joshua Boniface | 7f7047dd52 | |
Joshua Boniface | 9a91767405 | |
Joshua Boniface | bcfa6851e1 | |
Joshua Boniface | 28b8b3bb44 | |
Joshua Boniface | 02425159ef | |
Joshua Boniface | a6f8500309 | |
Joshua Boniface | ebec1332e9 | |
Joshua Boniface | c08c3b2d7d | |
Joshua Boniface | 4c0d90b517 | |
Joshua Boniface | 70c588d3a8 | |
Joshua Boniface | 214e7f835a | |
Joshua Boniface | 96cebfb42a | |
Joshua Boniface | c4763ac596 | |
Joshua Boniface | ea5512e3d8 | |
Joshua Boniface | ac00f7c4c8 | |
Joshua Boniface | 6d31bf439e | |
Joshua Boniface | c714093a2e | |
Joshua Boniface | 04a09b9269 | |
Joshua Boniface | 3ede0c7d38 | |
Joshua Boniface | ab9390fdb8 | |
Joshua Boniface | 1c83584788 | |
Joshua Boniface | 7f3ab4e119 | |
Joshua Boniface | 16eb09dc22 | |
Joshua Boniface | 7ba75adef4 | |
Joshua Boniface | a691d26c30 | |
Joshua Boniface | 1d90b066bc | |
Joshua Boniface | 3ea7421f09 | |
Joshua Boniface | df4d437d31 | |
Joshua Boniface | 8295e2089d | |
Joshua Boniface | 4ccb570762 | |
Joshua Boniface | 235299942a | |
Joshua Boniface | 9aa32134a9 | |
Joshua Boniface | 75eac356d5 | |
Joshua Boniface | fb8561cc5d | |
Joshua Boniface | 5f7aa0b2d6 | |
Joshua Boniface | 7fac7a62cf | |
Joshua Boniface | b19642aa2e | |
Joshua Boniface | 974e0d6ac2 | |
Joshua Boniface | 7785166a7e | |
Joshua Boniface | 34f0a2f388 | |
Joshua Boniface | 8fa37d21c0 | |
Joshua Boniface | f462ebbc6b | |
Joshua Boniface | 0d533f3658 | |
Joshua Boniface | 792d135950 | |
Joshua Boniface | a64e0c1985 | |
Joshua Boniface | 1cbadb1172 | |
Joshua Boniface | b1c4b2e928 | |
Joshua Boniface | 7fe1262887 | |
Joshua Boniface | 0e389ba1f4 | |
Joshua Boniface | 41cd34ba4d | |
Joshua Boniface | 736762901c | |
Joshua Boniface | ecb812ccac | |
Joshua Boniface | a2e5df9f6d | |
Joshua Boniface | 73c0834f85 | |
Joshua Boniface | 2de999c700 | |
Joshua Boniface | 7543eb839d | |
Joshua Boniface | 8cb44c0c5d | |
Joshua Boniface | c55021f30c | |
Joshua Boniface | 783c9e46c2 | |
Joshua Boniface | b7f33c1fcb | |
Joshua Boniface | 0f578d7c7d | |
Joshua Boniface | f87b96887c | |
Joshua Boniface | 02a775c99b | |
Joshua Boniface | 8177d5f8b7 | |
Joshua Boniface | 26d0d08873 | |
Joshua Boniface | f57b8d4a15 | |
Joshua Boniface | 10de85cce3 | |
Joshua Boniface | e938140414 | |
Joshua Boniface | fd87a28eb3 | |
Joshua Boniface | 4ef5fbdbe8 | |
Joshua Boniface | 8fa6bed736 | |
Joshua Boniface | f7926726f2 | |
Joshua Boniface | de58efdaa9 | |
Joshua Boniface | 8ca6976892 | |
Joshua Boniface | a957218976 | |
Joshua Boniface | 61365e6e01 | |
Joshua Boniface | 35fe16ce75 | |
Joshua Boniface | c45e488958 | |
Joshua Boniface | c1f320ede2 | |
Joshua Boniface | 03db9604e1 | |
Joshua Boniface | f1668bffcc | |
Joshua Boniface | c0686fc5c7 | |
Joshua Boniface | 7ecc05b413 | |
Joshua Boniface | 4b37c4fea3 | |
Joshua Boniface | 0d918d66fe | |
Joshua Boniface | fd199f405b | |
Joshua Boniface | f6c009beac | |
Joshua Boniface | fc89f4f2f5 | |
Joshua Boniface | 565011b277 | |
Joshua Boniface | 0bf9cc6b06 | |
Joshua Boniface | f2dfada73e | |
Joshua Boniface | f63c392ba6 | |
Joshua Boniface | 7663ad72c5 | |
Joshua Boniface | 9b3075be18 | |
Joshua Boniface | 9a661d0173 | |
Joshua Boniface | 4a0680b27f | |
Joshua Boniface | 6597f7aef6 | |
Joshua Boniface | f42a1bad0e | |
Joshua Boniface | 3fb52a13c2 | |
Joshua Boniface | 8937ddf331 | |
Joshua Boniface | 7cc354466f | |
Joshua Boniface | 44232fe3c6 | |
Joshua Boniface | 0a8bad3418 | |
Joshua Boniface | f10d32987b | |
Joshua Boniface | faf920ac1d | |
Joshua Boniface | a6e824a049 | |
Joshua Boniface | 624eb4e752 | |
Joshua Boniface | d060787503 | |
Joshua Boniface | 9a435fe2ae | |
Joshua Boniface | 9f47da6777 | |
Joshua Boniface | 0cf229273a | |
Joshua Boniface | 212ecaab68 | |
Joshua Boniface | f1b4593367 | |
Joshua Boniface | fc55046812 | |
Joshua Boniface | 33f905459a | |
Joshua Boniface | 174e6e08e3 | |
Joshua Boniface | 9f85c92dff | |
Joshua Boniface | 4b30d2f58a | |
Joshua Boniface | 2fcee28fed | |
Joshua Boniface | 1f18e88c06 | |
Joshua Boniface | 359191c83f | |
Joshua Boniface | 3d0d5e63f6 | |
Joshua Boniface | e6bfbb6d45 | |
Joshua Boniface | b80f9e28dc | |
Joshua Boniface | fbd5b3cca3 | |
Joshua Boniface | 2b1082590e | |
Joshua Boniface | a4ca112128 | |
Joshua Boniface | 6fc7f45027 | |
Joshua Boniface | 0c240a5129 | |
Joshua Boniface | 553c1e670e | |
Joshua Boniface | 942de9f15b | |
Joshua Boniface | 9aca8e215b | |
Joshua Boniface | 97329bb90d | |
Joshua Boniface | c186015d6f | |
Joshua Boniface | 1aa5999109 | |
Joshua Boniface | 570460e5ee | |
Joshua Boniface | 7a99e0e524 | |
Joshua Boniface | 234d6ae83b | |
Joshua Boniface | 5d0e7931d1 | |
Joshua Boniface | dcb9c0d12c | |
Joshua Boniface | f6e856bf98 | |
Joshua Boniface | f1fe0c63f5 | |
Joshua Boniface | ab944f9b95 | |
Joshua Boniface | 9714ac20b2 | |
Joshua Boniface | 79ad09ae59 | |
Joshua Boniface | 4c6aabec6a | |
Joshua Boniface | 559400ed90 | |
Joshua Boniface | 78c774b607 | |
Joshua Boniface | a461791ce8 | |
Joshua Boniface | 9fdb6d8708 | |
Joshua Boniface | 2fb7c40497 | |
Joshua Boniface | dee8d186cf | |
Joshua Boniface | 1e9871241e | |
Joshua Boniface | 9cd88ebccb | |
Joshua Boniface | 3bc500bc55 | |
Joshua Boniface | d63cc2e661 | |
Joshua Boniface | 67ec41aaf9 | |
Joshua Boniface | a95e72008e | |
Joshua Boniface | efc7434143 | |
Joshua Boniface | c473dcca81 | |
Joshua Boniface | 18f09196be | |
Joshua Boniface | 8419659e1b | |
Joshua Boniface | df40b779af | |
Joshua Boniface | db4f0881a2 | |
Joshua Boniface | 9b51fe9f10 | |
Joshua Boniface | a66449541d | |
Joshua Boniface | d28fb71f57 | |
Joshua Boniface | e5e9c7086a | |
Joshua Boniface | f29b4c2755 | |
Joshua Boniface | 0adec2be0d | |
Joshua Boniface | b994e1a26c | |
Joshua Boniface | 6d6420a695 | |
Joshua Boniface | 94e0287fc4 | |
Joshua Boniface | 2886176762 | |
Joshua Boniface | 4dc4c975f1 | |
Joshua Boniface | 8f3120baf3 | |
Joshua Boniface | 86ca363697 | |
Joshua Boniface | a5763c9d25 | |
Joshua Boniface | 39ec427c42 | |
Joshua Boniface | 1ba21312ea | |
Joshua Boniface | 09269f182c | |
Joshua Boniface | 38eeb78423 | |
Joshua Boniface | 362edeed8c | |
Joshua Boniface | 8d74ee7273 | |
Joshua Boniface | 39c8367723 |
|
@ -4,4 +4,4 @@ bbuilder:
|
||||||
published:
|
published:
|
||||||
- git submodule update --init
|
- git submodule update --init
|
||||||
- /bin/bash build-stable-deb.sh
|
- /bin/bash build-stable-deb.sh
|
||||||
- sudo /usr/local/bin/deploy-package -C pvc
|
- sudo /usr/local/bin/deploy-package -C pvc -D bookworm
|
||||||
|
|
123
CHANGELOG.md
123
CHANGELOG.md
|
@ -1,5 +1,128 @@
|
||||||
## PVC Changelog
|
## PVC Changelog
|
||||||
|
|
||||||
|
###### [v0.9.103](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.103)
|
||||||
|
|
||||||
|
* [Provisioner] Fixes a bug with the change in `storage_hosts` to FQDNs affecting the VM Builder
|
||||||
|
* [Monitoring] Fixes the Munin plugin to work properly with sudo
|
||||||
|
|
||||||
|
###### [v0.9.102](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.102)
|
||||||
|
|
||||||
|
* [API Daemon] Ensures that received config snapshots update storage hosts in addition to secret UUIDs
|
||||||
|
* [CLI Client] Fixes several bugs around local connection handling and connection listings
|
||||||
|
|
||||||
|
###### [v0.9.101](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.101)
|
||||||
|
|
||||||
|
**New Feature**: Adds VM snapshot sending (`vm snapshot send`), VM mirroring (`vm mirror create`), and (offline) mirror promotion (`vm mirror promote`). Permits transferring VM snapshots to remote clusters, individually or repeatedly, and promoting them to active status, for disaster recovery and migration between clusters.
|
||||||
|
**Breaking Change**: Migrates the API daemon into Gunicorn when in production mode. Permits more scalable and performant operation of the API. **Requires additional dependency packages on all coordinator nodes** (`gunicorn`, `python3-gunicorn`, `python3-setuptools`); upgrade via `pvc-ansible` is strongly recommended.
|
||||||
|
**Enhancement**: Provides whole cluster utilization stats in the cluster status data. Permits better observability into the overall resource utilization of the cluster.
|
||||||
|
**Enhancement**: Adds a new storage benchmark format (v2) which includes additional resource utilization statistics. This allows for better evaluation of storage performance impact on the cluster as a whole. The updated format also permits arbitrary benchmark job names for easier parsing and tracking.
|
||||||
|
|
||||||
|
* [API Daemon] Allows scanning of new volumes added manually via other commands
|
||||||
|
* [API Daemon/CLI Client] Adds whole cluster utilization statistics to cluster status
|
||||||
|
* [API Daemon] Moves production API execution into Gunicorn
|
||||||
|
* [API Daemon] Adds a new storage benchmark format (v2) with additional resource tracking
|
||||||
|
* [API Daemon] Adds support for named storage benchmark jobs
|
||||||
|
* [API Daemon] Fixes a bug in OSD creation which would create `split` OSDs if `--osd-count` was set to 1
|
||||||
|
* [API Daemon] Adds support for the `mirror` VM state used by snapshot mirrors
|
||||||
|
* [CLI Client] Fixes several output display bugs in various commands and in Worker task outputs
|
||||||
|
* [CLI Client] Improves and shrinks the status progress bar output to support longer messages
|
||||||
|
* [API Daemon] Adds support for sending snapshots to remote clusters
|
||||||
|
* [API Daemon] Adds support for updating and promoting snapshot mirrors to remote clusters
|
||||||
|
* [Node Daemon] Improves timeouts during primary/secondary coordinator transitions to avoid deadlocks
|
||||||
|
* [Node Daemon] Improves timeouts during keepalive updates to avoid deadlocks
|
||||||
|
* [Node Daemon] Refactors fencing thread structure to ensure a single fencing task per cluster and sequential node fences to avoid potential anomalies (e.g. fencing 2 nodes simultaneously)
|
||||||
|
* [Node Daemon] Fixes a bug in fencing if VM locks were already freed, leaving VMs in an invalid state
|
||||||
|
* [Node Daemon] Increases the wait time during system startup to ensure Zookeeper has more time to synchronize
|
||||||
|
|
||||||
|
###### [v0.9.100](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.100)
|
||||||
|
|
||||||
|
* [API Daemon] Improves the handling of "detect:" disk strings on newer systems by leveraging the "nvme" command
|
||||||
|
* [Client CLI] Update help text about "detect:" disk strings
|
||||||
|
* [Meta] Updates deprecation warnings and updates builder to only add this version for Debian 12 (Bookworm)
|
||||||
|
|
||||||
|
###### [v0.9.99](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.99)
|
||||||
|
|
||||||
|
**Deprecation Warning**: `pvc vm backup` commands are now deprecated and will be removed in a future version. Use `pvc vm snapshot` commands instead.
|
||||||
|
**Breaking Change**: The on-disk format of VM snapshot exports differs from backup exports, and the PVC autobackup system now leverages these. It is recommended to start fresh with a new tree of backups for `pvc autobackup` for maximum compatibility.
|
||||||
|
**Breaking Change**: VM autobackups now run in `pvcworkerd` instead of the CLI client directly, allowing them to be triggered from any node (or externally). It is important to apply the timer unit changes from the `pvc-ansible` role after upgrading to 0.9.99 to avoid duplicate runs.
|
||||||
|
**Usage Note**: VM snapshots are displayed in the `pvc vm list` and `pvc vm info` outputs, not in a unique "list" endpoint.
|
||||||
|
|
||||||
|
* [API Daemon] Adds a proper error when an invalid provisioner profile is specified
|
||||||
|
* [Node Daemon] Sorts Ceph pools properly in node keepalive to avoid incorrect ordering
|
||||||
|
* [Health Daemon] Improves handling of IPMI checks by adding multiple tries but a shorter timeout
|
||||||
|
* [API Daemon] Improves handling of XML parsing errors in VM configurations
|
||||||
|
* [ALL] Adds support for whole VM snapshots, including configuration XML details, and direct rollback to snapshots
|
||||||
|
* [ALL] Adds support for exporting and importing whole VM snapshots
|
||||||
|
* [Client CLI] Removes vCPU topology from short VM info output
|
||||||
|
* [Client CLI] Improves output format of VM info output
|
||||||
|
* [API Daemon] Adds an endpoint to get the current primary node
|
||||||
|
* [Client CLI] Fixes a bug where API requests were made 3 times
|
||||||
|
* [Other] Improves the build-and-deploy.sh script
|
||||||
|
* [API Daemon] Improves the "vm rename" command to avoid redefining VM, preserving history etc.
|
||||||
|
* [API Daemon] Adds an indication when a task is run on the primary node
|
||||||
|
* [API Daemon] Fixes a bug where the ZK schema relative path didn't work sometimes
|
||||||
|
|
||||||
|
###### [v0.9.98](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.98)
|
||||||
|
|
||||||
|
* [CLI Client] Fixed output when API call times out
|
||||||
|
* [Node Daemon] Improves the handling of fence states
|
||||||
|
* [API Daemon/CLI Client] Adds support for storage snapshot rollback
|
||||||
|
* [CLI Client] Adds additional warning messages about snapshot consistency to help output
|
||||||
|
* [API Daemon] Fixes a bug listing snapshots by pool/volume
|
||||||
|
* [Node Daemon] Adds a --version flag for information gathering by update-motd.sh
|
||||||
|
|
||||||
|
###### [v0.9.97](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.97)
|
||||||
|
|
||||||
|
* [Client CLI] Ensures --lines is always an integer value
|
||||||
|
* [Node Daemon] Fixes a bug if d_network changes during iteration
|
||||||
|
* [Node Daemon] Moves to using allocated instead of free memory for node reporting
|
||||||
|
* [API Daemon] Fixes a bug if lingering RBD snapshots exist when removing a volume (#180)
|
||||||
|
|
||||||
|
###### [v0.9.96](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.96)
|
||||||
|
|
||||||
|
* [API Daemon] Fixes a bug when reporting node stats
|
||||||
|
* [API Daemon] Fixes a bug deleting successful benchmark results
|
||||||
|
|
||||||
|
###### [v0.9.95](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.95)
|
||||||
|
|
||||||
|
* [API Daemon/CLI Client] Adds a flag to allow duplicate VNIs in network templates
|
||||||
|
* [API Daemon] Ensures that storage template disks are returned in disk ID order
|
||||||
|
* [Client CLI] Fixes a display bug showing all OSDs as split
|
||||||
|
|
||||||
|
###### [v0.9.94](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.94)
|
||||||
|
|
||||||
|
* [CLI Client] Fixes an incorrect ordering issue with autobackup summary emails
|
||||||
|
* [API Daemon/CLI Client] Adds an additional safety check for 80% cluster fullness when doing volume adds or resizes
|
||||||
|
* [API Daemon/CLI Client] Adds safety checks to volume clones as well
|
||||||
|
* [API Daemon] Fixes a few remaining memory bugs for stopped/disabled VMs
|
||||||
|
|
||||||
|
###### [v0.9.93](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.93)
|
||||||
|
|
||||||
|
* [API Daemon] Fixes a bug where stuck zkhandler threads were not cleaned up on error
|
||||||
|
|
||||||
|
###### [v0.9.92](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.92)
|
||||||
|
|
||||||
|
* [CLI Client] Adds the new restore state to the colours list for VM status
|
||||||
|
* [API Daemon] Fixes an incorrect variable assignment
|
||||||
|
* [Provisioner] Improves the error handling of various steps in the debootstrap and rinse example scripts
|
||||||
|
* [CLI Client] Fixes two bugs around missing keys that were added recently (uses get() instead of direct dictionary refs)
|
||||||
|
* [CLI Client] Improves API error handling via GET retries (x3) and better server status code handling
|
||||||
|
|
||||||
|
###### [v0.9.91](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.91)
|
||||||
|
|
||||||
|
* [Client CLI] Fixes a bug and improves output during cluster task events.
|
||||||
|
* [Client CLI] Improves the output of the task list display.
|
||||||
|
* [Provisioner] Fixes some missing cloud-init modules in the default debootstrap script.
|
||||||
|
* [Client CLI] Fixes a bug with a missing argument to the vm_define helper function.
|
||||||
|
* [All] Fixes inconsistent package find + rm commands to avoid errors in dpkg.
|
||||||
|
|
||||||
|
###### [v0.9.90](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.90)
|
||||||
|
|
||||||
|
* [Client CLI/API Daemon] Adds additional backup metainfo and an emailed report option to autobackups.
|
||||||
|
* [All] Adds a live migration maximum downtime selector to help with busy VM migrations.
|
||||||
|
* [API Daemon] Fixes a database migration bug on Debian 10/11.
|
||||||
|
* [Node Daemon] Fixes a race condition when applying Zookeeper schema changes.
|
||||||
|
|
||||||
###### [v0.9.89](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.89)
|
###### [v0.9.89](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.89)
|
||||||
|
|
||||||
* [API/Worker Daemons] Fixes a bug with the Celery result backends not being properly initialized on Debian 10/11.
|
* [API/Worker Daemons] Fixes a bug with the Celery result backends not being properly initialized on Debian 10/11.
|
||||||
|
|
35
README.md
35
README.md
|
@ -1,10 +1,11 @@
|
||||||
<p align="center">
|
<p align="center">
|
||||||
<img alt="Logo banner" src="images/pvc_logo_black.png"/>
|
<img alt="Logo banner" src="https://docs.parallelvirtualcluster.org/en/latest/images/pvc_logo_black.png"/>
|
||||||
<br/><br/>
|
<br/><br/>
|
||||||
|
<a href="https://www.parallelvirtualcluster.org"><img alt="Website" src="https://img.shields.io/badge/visit-website-blue"/></a>
|
||||||
|
<a href="https://github.com/parallelvirtualcluster/pvc/releases"><img alt="Latest Release" src="https://img.shields.io/github/release-pre/parallelvirtualcluster/pvc"/></a>
|
||||||
|
<a href="https://docs.parallelvirtualcluster.org/en/latest/?badge=latest"><img alt="Documentation Status" src="https://readthedocs.org/projects/parallelvirtualcluster/badge/?version=latest"/></a>
|
||||||
<a href="https://github.com/parallelvirtualcluster/pvc"><img alt="License" src="https://img.shields.io/github/license/parallelvirtualcluster/pvc"/></a>
|
<a href="https://github.com/parallelvirtualcluster/pvc"><img alt="License" src="https://img.shields.io/github/license/parallelvirtualcluster/pvc"/></a>
|
||||||
<a href="https://github.com/psf/black"><img alt="Code style: Black" src="https://img.shields.io/badge/code%20style-black-000000.svg"/></a>
|
<a href="https://github.com/psf/black"><img alt="Code style: Black" src="https://img.shields.io/badge/code%20style-black-000000.svg"/></a>
|
||||||
<a href="https://github.com/parallelvirtualcluster/pvc/releases"><img alt="Release" src="https://img.shields.io/github/release-pre/parallelvirtualcluster/pvc"/></a>
|
|
||||||
<a href="https://docs.parallelvirtualcluster.org/en/latest/?badge=latest"><img alt="Documentation Status" src="https://readthedocs.org/projects/parallelvirtualcluster/badge/?version=latest"/></a>
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
## What is PVC?
|
## What is PVC?
|
||||||
|
@ -23,62 +24,64 @@ Installation of PVC is accomplished by two main components: a [Node installer IS
|
||||||
|
|
||||||
Just give it physical servers, and it will run your VMs without you having to think about it, all in just an hour or two of setup time.
|
Just give it physical servers, and it will run your VMs without you having to think about it, all in just an hour or two of setup time.
|
||||||
|
|
||||||
|
More information about PVC, its motivations, the hardware requirements, and setting up and managing a cluster [can be found over at our docs page](https://docs.parallelvirtualcluster.org).
|
||||||
|
|
||||||
## Getting Started
|
## Getting Started
|
||||||
|
|
||||||
To get started with PVC, please see the [About](https://docs.parallelvirtualcluster.org/en/latest/about-pvc/) page for general information about the project, and the [Getting Started](https://docs.parallelvirtualcluster.org/en/latest/deployment/getting-started/) page for details on configuring your first cluster.
|
To get started with PVC, please see the [About](https://docs.parallelvirtualcluster.org/en/latest/about-pvc/) page for general information about the project, and the [Getting Started](https://docs.parallelvirtualcluster.org/en/latest/deployment/getting-started/) page for details on configuring your first cluster.
|
||||||
|
|
||||||
## Changelog
|
## Changelog
|
||||||
|
|
||||||
View the changelog in [CHANGELOG.md](CHANGELOG.md). **Please note that any breaking changes are announced here; ensure you read the changelog before upgrading!**
|
View the changelog in [CHANGELOG.md](https://github.com/parallelvirtualcluster/pvc/blob/master/CHANGELOG.md). **Please note that any breaking changes are announced here; ensure you read the changelog before upgrading!**
|
||||||
|
|
||||||
## Screenshots
|
## Screenshots
|
||||||
|
|
||||||
These screenshots show some of the available functionality of the PVC system and CLI as of PVC v0.9.85.
|
These screenshots show some of the available functionality of the PVC system and CLI as of PVC v0.9.85.
|
||||||
|
|
||||||
<p><img alt="0. Integrated help" src="images/0-integrated-help.png"/><br/>
|
<p><img alt="0. Integrated help" src="https://raw.githubusercontent.com/parallelvirtualcluster/pvc/refs/heads/master/images/0-integrated-help.png"/><br/>
|
||||||
<i>The CLI features an integrated, fully-featured help system to show details about every possible command.</i>
|
<i>The CLI features an integrated, fully-featured help system to show details about every possible command.</i>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p><img alt="1. Connection management" src="images/1-connection-management.png"/><br/>
|
<p><img alt="1. Connection management" src="https://raw.githubusercontent.com/parallelvirtualcluster/pvc/refs/heads/master/images/1-connection-management.png"/><br/>
|
||||||
<i>A single CLI instance can manage multiple clusters, including a quick detail view, and will default to a "local" connection if an "/etc/pvc/pvc.conf" file is found; sensitive API keys are hidden by default.</i>
|
<i>A single CLI instance can manage multiple clusters, including a quick detail view, and will default to a "local" connection if an "/etc/pvc/pvc.conf" file is found; sensitive API keys are hidden by default.</i>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p><img alt="2. Cluster details and output formats" src="images/2-cluster-details-and-output-formats.png"/><br/>
|
<p><img alt="2. Cluster details and output formats" src="https://raw.githubusercontent.com/parallelvirtualcluster/pvc/refs/heads/master/images/2-cluster-details-and-output-formats.png"/><br/>
|
||||||
<i>PVC can show the key details of your cluster at a glance, including health, persistent fault events, and key resources; the CLI can output both in pretty human format and JSON for easier machine parsing in scripts.</i>
|
<i>PVC can show the key details of your cluster at a glance, including health, persistent fault events, and key resources; the CLI can output both in pretty human format and JSON for easier machine parsing in scripts.</i>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p><img alt="3. Node information" src="images/3-node-information.png"/><br/>
|
<p><img alt="3. Node information" src="https://raw.githubusercontent.com/parallelvirtualcluster/pvc/refs/heads/master/images/3-node-information.png"/><br/>
|
||||||
<i>PVC can show details about the nodes in the cluster, including their live health and resource utilization.</i>
|
<i>PVC can show details about the nodes in the cluster, including their live health and resource utilization.</i>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p><img alt="4. VM information" src="images/4-vm-information.png"/><br/>
|
<p><img alt="4. VM information" src="https://raw.githubusercontent.com/parallelvirtualcluster/pvc/refs/heads/master/images/4-vm-information.png"/><br/>
|
||||||
<i>PVC can show details about the VMs in the cluster, including their state, resource allocations, current hosting node, and metadata.</i>
|
<i>PVC can show details about the VMs in the cluster, including their state, resource allocations, current hosting node, and metadata.</i>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p><img alt="5. VM details" src="images/5-vm-details.png"/><br/>
|
<p><img alt="5. VM details" src="https://raw.githubusercontent.com/parallelvirtualcluster/pvc/refs/heads/master/images/5-vm-details.png"/><br/>
|
||||||
<i>In addition to the above basic details, PVC can also show extensive information about a running VM's devices and other resource utilization.</i>
|
<i>In addition to the above basic details, PVC can also show extensive information about a running VM's devices and other resource utilization.</i>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p><img alt="6. Network information" src="images/6-network-information.png"/><br/>
|
<p><img alt="6. Network information" src="https://raw.githubusercontent.com/parallelvirtualcluster/pvc/refs/heads/master/images/6-network-information.png"/><br/>
|
||||||
<i>PVC has two major client network types, and ensures a consistent configuration of client networks across the entire cluster; managed networks can feature DHCP, DNS, firewall, and other functionality including DHCP reservations.</i>
|
<i>PVC has two major client network types, and ensures a consistent configuration of client networks across the entire cluster; managed networks can feature DHCP, DNS, firewall, and other functionality including DHCP reservations.</i>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p><img alt="7. Storage information" src="images/7-storage-information.png"/><br/>
|
<p><img alt="7. Storage information" src="https://raw.githubusercontent.com/parallelvirtualcluster/pvc/refs/heads/master/images/7-storage-information.png"/><br/>
|
||||||
<i>PVC provides a convenient abstracted view of the underlying Ceph system and can manage all core aspects of it.</i>
|
<i>PVC provides a convenient abstracted view of the underlying Ceph system and can manage all core aspects of it.</i>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p><img alt="8. VM and node logs" src="images/8-vm-and-node-logs.png"/><br/>
|
<p><img alt="8. VM and node logs" src="https://raw.githubusercontent.com/parallelvirtualcluster/pvc/refs/heads/master/images/8-vm-and-node-logs.png"/><br/>
|
||||||
<i>PVC can display logs from VM serial consoles (if properly configured) and nodes in-client to facilitate quick troubleshooting.</i>
|
<i>PVC can display logs from VM serial consoles (if properly configured) and nodes in-client to facilitate quick troubleshooting.</i>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p><img alt="9. VM and worker tasks" src="images/9-vm-and-worker-tasks.png"/><br/>
|
<p><img alt="9. VM and worker tasks" src="https://raw.githubusercontent.com/parallelvirtualcluster/pvc/refs/heads/master/images/9-vm-and-worker-tasks.png"/><br/>
|
||||||
<i>PVC provides full VM lifecycle management, as well as long-running worker-based commands (in this example, clearing a VM's storage locks).</i>
|
<i>PVC provides full VM lifecycle management, as well as long-running worker-based commands (in this example, clearing a VM's storage locks).</i>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p><img alt="10. Provisioner" src="images/10-provisioner.png"/><br/>
|
<p><img alt="10. Provisioner" src="https://raw.githubusercontent.com/parallelvirtualcluster/pvc/refs/heads/master/images/10-provisioner.png"/><br/>
|
||||||
<i>PVC features an extensively customizable and configurable VM provisioner system, including EC2-compatible CloudInit support, allowing you to define flexible VM profiles and provision new VMs with a single command.</i>
|
<i>PVC features an extensively customizable and configurable VM provisioner system, including EC2-compatible CloudInit support, allowing you to define flexible VM profiles and provision new VMs with a single command.</i>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p><img alt="11. Prometheus and Grafana dashboard" src="images/11-prometheus-grafana.png"/><br/>
|
<p><img alt="11. Prometheus and Grafana dashboard" src="https://raw.githubusercontent.com/parallelvirtualcluster/pvc/refs/heads/master/images/11-prometheus-grafana.png"/><br/>
|
||||||
<i>PVC features several monitoring integration examples under "node-daemon/monitoring", including CheckMK, Munin, and, most recently, Prometheus, including an example Grafana dashboard for cluster monitoring and alerting.</i>
|
<i>PVC features several monitoring integration examples under "node-daemon/monitoring", including CheckMK, Munin, and, most recently, Prometheus, including an example Grafana dashboard for cluster monitoring and alerting.</i>
|
||||||
</p>
|
</p>
|
||||||
|
|
|
@ -0,0 +1,28 @@
|
||||||
|
"""PVC version 0.9.89
|
||||||
|
|
||||||
|
Revision ID: 977e7b4d3497
|
||||||
|
Revises: 88fa0d88a9f8
|
||||||
|
Create Date: 2024-01-10 16:09:44.659027
|
||||||
|
|
||||||
|
"""
|
||||||
|
from alembic import op
|
||||||
|
import sqlalchemy as sa
|
||||||
|
|
||||||
|
|
||||||
|
# revision identifiers, used by Alembic.
|
||||||
|
revision = '977e7b4d3497'
|
||||||
|
down_revision = '88fa0d88a9f8'
|
||||||
|
branch_labels = None
|
||||||
|
depends_on = None
|
||||||
|
|
||||||
|
|
||||||
|
def upgrade():
|
||||||
|
# ### commands auto generated by Alembic - please adjust! ###
|
||||||
|
op.add_column('system_template', sa.Column('migration_max_downtime', sa.Integer(), default="300", server_default="300", nullable=True))
|
||||||
|
# ### end Alembic commands ###
|
||||||
|
|
||||||
|
|
||||||
|
def downgrade():
|
||||||
|
# ### commands auto generated by Alembic - please adjust! ###
|
||||||
|
op.drop_column('system_template', 'migration_max_downtime')
|
||||||
|
# ### end Alembic commands ###
|
|
@ -150,6 +150,10 @@
|
||||||
from daemon_lib.vmbuilder import VMBuilder
|
from daemon_lib.vmbuilder import VMBuilder
|
||||||
|
|
||||||
|
|
||||||
|
# These are some global variables used below
|
||||||
|
default_root_password = "test123"
|
||||||
|
|
||||||
|
|
||||||
# The VMBuilderScript class must be named as such, and extend VMBuilder.
|
# The VMBuilderScript class must be named as such, and extend VMBuilder.
|
||||||
class VMBuilderScript(VMBuilder):
|
class VMBuilderScript(VMBuilder):
|
||||||
def setup(self):
|
def setup(self):
|
||||||
|
@ -498,11 +502,15 @@ class VMBuilderScript(VMBuilder):
|
||||||
ret = os.system(
|
ret = os.system(
|
||||||
f"debootstrap --include={','.join(deb_packages)} {deb_release} {temp_dir} {deb_mirror}"
|
f"debootstrap --include={','.join(deb_packages)} {deb_release} {temp_dir} {deb_mirror}"
|
||||||
)
|
)
|
||||||
|
ret = int(ret >> 8)
|
||||||
if ret > 0:
|
if ret > 0:
|
||||||
self.fail("Failed to run debootstrap")
|
self.fail(f"Debootstrap failed with exit code {ret}")
|
||||||
|
|
||||||
# Bind mount the devfs so we can grub-install later
|
# Bind mount the devfs so we can grub-install later
|
||||||
os.system("mount --bind /dev {}/dev".format(temp_dir))
|
ret = os.system("mount --bind /dev {}/dev".format(temp_dir))
|
||||||
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"/dev bind mount failed with exit code {ret}")
|
||||||
|
|
||||||
# Create an fstab entry for each volume
|
# Create an fstab entry for each volume
|
||||||
fstab_file = "{}/etc/fstab".format(temp_dir)
|
fstab_file = "{}/etc/fstab".format(temp_dir)
|
||||||
|
@ -589,11 +597,13 @@ After=multi-user.target
|
||||||
- migrator
|
- migrator
|
||||||
- bootcmd
|
- bootcmd
|
||||||
- write-files
|
- write-files
|
||||||
|
- growpart
|
||||||
- resizefs
|
- resizefs
|
||||||
- set_hostname
|
- set_hostname
|
||||||
- update_hostname
|
- update_hostname
|
||||||
- update_etc_hosts
|
- update_etc_hosts
|
||||||
- ca-certs
|
- ca-certs
|
||||||
|
- users-groups
|
||||||
- ssh
|
- ssh
|
||||||
|
|
||||||
cloud_config_modules:
|
cloud_config_modules:
|
||||||
|
@ -686,23 +696,36 @@ GRUB_DISABLE_LINUX_UUID=false
|
||||||
# Do some tasks inside the chroot using the provided context manager
|
# Do some tasks inside the chroot using the provided context manager
|
||||||
with chroot(temp_dir):
|
with chroot(temp_dir):
|
||||||
# Install and update GRUB
|
# Install and update GRUB
|
||||||
os.system(
|
ret = os.system(
|
||||||
"grub-install --force /dev/rbd/{}/{}_{}".format(
|
"grub-install --force /dev/rbd/{}/{}_{}".format(
|
||||||
root_volume["pool"], vm_name, root_volume["disk_id"]
|
root_volume["pool"], vm_name, root_volume["disk_id"]
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
os.system("update-grub")
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"GRUB install failed with exit code {ret}")
|
||||||
|
|
||||||
|
ret = os.system("update-grub")
|
||||||
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"GRUB update failed with exit code {ret}")
|
||||||
|
|
||||||
# Set a really dumb root password so the VM can be debugged
|
# Set a really dumb root password so the VM can be debugged
|
||||||
# EITHER CHANGE THIS YOURSELF, here or in Userdata, or run something after install
|
# EITHER CHANGE THIS YOURSELF, here or in Userdata, or run something after install
|
||||||
# to change the root password: don't leave it like this on an Internet-facing machine!
|
# to change the root password: don't leave it like this on an Internet-facing machine!
|
||||||
os.system("echo root:test123 | chpasswd")
|
ret = os.system(f"echo root:{default_root_password} | chpasswd")
|
||||||
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"Root password change failed with exit code {ret}")
|
||||||
|
|
||||||
# Enable cloud-init target on (first) boot
|
# Enable cloud-init target on (first) boot
|
||||||
# Your user-data should handle this and disable it once done, or things get messy.
|
# Your user-data should handle this and disable it once done, or things get messy.
|
||||||
# That cloud-init won't run without this hack seems like a bug... but even the official
|
# That cloud-init won't run without this hack seems like a bug... but even the official
|
||||||
# Debian cloud images are affected, so who knows.
|
# Debian cloud images are affected, so who knows.
|
||||||
os.system("systemctl enable cloud-init.target")
|
ret = os.system("systemctl enable cloud-init.target")
|
||||||
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"Enable of cloud-init failed with exit code {ret}")
|
||||||
|
|
||||||
def cleanup(self):
|
def cleanup(self):
|
||||||
"""
|
"""
|
||||||
|
@ -727,7 +750,7 @@ GRUB_DISABLE_LINUX_UUID=false
|
||||||
temp_dir = "/tmp/target"
|
temp_dir = "/tmp/target"
|
||||||
|
|
||||||
# Unmount the bound devfs
|
# Unmount the bound devfs
|
||||||
os.system("umount {}/dev".format(temp_dir))
|
os.system("umount -f {}/dev".format(temp_dir))
|
||||||
|
|
||||||
# Use this construct for reversing the list, as the normal reverse() messes with the list
|
# Use this construct for reversing the list, as the normal reverse() messes with the list
|
||||||
for volume in list(reversed(self.vm_data["volumes"])):
|
for volume in list(reversed(self.vm_data["volumes"])):
|
||||||
|
@ -744,7 +767,7 @@ GRUB_DISABLE_LINUX_UUID=false
|
||||||
):
|
):
|
||||||
# Unmount filesystem
|
# Unmount filesystem
|
||||||
retcode, stdout, stderr = pvc_common.run_os_command(
|
retcode, stdout, stderr = pvc_common.run_os_command(
|
||||||
f"umount {mount_path}"
|
f"umount -f {mount_path}"
|
||||||
)
|
)
|
||||||
if retcode:
|
if retcode:
|
||||||
self.log_err(
|
self.log_err(
|
||||||
|
|
|
@ -150,6 +150,11 @@
|
||||||
from daemon_lib.vmbuilder import VMBuilder
|
from daemon_lib.vmbuilder import VMBuilder
|
||||||
|
|
||||||
|
|
||||||
|
# These are some global variables used below
|
||||||
|
default_root_password = "test123"
|
||||||
|
default_local_time = "UTC"
|
||||||
|
|
||||||
|
|
||||||
# The VMBuilderScript class must be named as such, and extend VMBuilder.
|
# The VMBuilderScript class must be named as such, and extend VMBuilder.
|
||||||
class VMBuilderScript(VMBuilder):
|
class VMBuilderScript(VMBuilder):
|
||||||
def setup(self):
|
def setup(self):
|
||||||
|
@ -524,13 +529,23 @@ class VMBuilderScript(VMBuilder):
|
||||||
ret = os.system(
|
ret = os.system(
|
||||||
f"rinse --arch {rinse_architecture} --directory {temporary_directory} --distribution {rinse_release} --cache-dir {rinse_cache} --add-pkg-list /tmp/addpkg --verbose {mirror_arg}"
|
f"rinse --arch {rinse_architecture} --directory {temporary_directory} --distribution {rinse_release} --cache-dir {rinse_cache} --add-pkg-list /tmp/addpkg --verbose {mirror_arg}"
|
||||||
)
|
)
|
||||||
|
ret = int(ret >> 8)
|
||||||
if ret > 0:
|
if ret > 0:
|
||||||
self.fail("Failed to run rinse")
|
self.fail(f"Rinse failed with exit code {ret}")
|
||||||
|
|
||||||
# Bind mount the devfs, sysfs, and procfs so we can grub-install later
|
# Bind mount the devfs, sysfs, and procfs so we can grub-install later
|
||||||
os.system("mount --bind /dev {}/dev".format(temporary_directory))
|
ret = os.system("mount --bind /dev {}/dev".format(temporary_directory))
|
||||||
os.system("mount --bind /sys {}/sys".format(temporary_directory))
|
ret = int(ret >> 8)
|
||||||
os.system("mount --bind /proc {}/proc".format(temporary_directory))
|
if ret > 0:
|
||||||
|
self.fail(f"/dev bind mount failed with exit code {ret}")
|
||||||
|
ret = os.system("mount --bind /sys {}/sys".format(temporary_directory))
|
||||||
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"/sys bind mount failed with exit code {ret}")
|
||||||
|
ret = os.system("mount --bind /proc {}/proc".format(temporary_directory))
|
||||||
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"/proc bind mount failed with exit code {ret}")
|
||||||
|
|
||||||
# Create an fstab entry for each volume
|
# Create an fstab entry for each volume
|
||||||
fstab_file = "{}/etc/fstab".format(temporary_directory)
|
fstab_file = "{}/etc/fstab".format(temporary_directory)
|
||||||
|
@ -642,41 +657,76 @@ GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=
|
||||||
# Do some tasks inside the chroot using the provided context manager
|
# Do some tasks inside the chroot using the provided context manager
|
||||||
with chroot(temporary_directory):
|
with chroot(temporary_directory):
|
||||||
# Fix the broken kernel from rinse by setting a systemd machine ID and running the post scripts
|
# Fix the broken kernel from rinse by setting a systemd machine ID and running the post scripts
|
||||||
os.system("systemd-machine-id-setup")
|
ret = os.system("systemd-machine-id-setup")
|
||||||
os.system(
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"Machine ID setup failed with exit code {ret}")
|
||||||
|
|
||||||
|
ret = os.system(
|
||||||
"rpm -q --scripts kernel-core | grep -A20 'posttrans scriptlet' | tail -n+2 | bash -x"
|
"rpm -q --scripts kernel-core | grep -A20 'posttrans scriptlet' | tail -n+2 | bash -x"
|
||||||
)
|
)
|
||||||
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"RPM kernel reinstall failed with exit code {ret}")
|
||||||
|
|
||||||
# Install any post packages
|
# Install any post packages
|
||||||
os.system(f"dnf install -y {' '.join(post_packages)}")
|
if len(post_packages) > 0:
|
||||||
|
ret = os.system(f"dnf install -y {' '.join(post_packages)}")
|
||||||
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"DNF install failed with exit code {ret}")
|
||||||
|
|
||||||
# Install and update GRUB config
|
# Install and update GRUB config
|
||||||
os.system(
|
ret = os.system(
|
||||||
"grub2-install --force /dev/rbd/{}/{}_{}".format(
|
"grub2-install --force /dev/rbd/{}/{}_{}".format(
|
||||||
root_volume["pool"], vm_name, root_volume["disk_id"]
|
root_volume["pool"], vm_name, root_volume["disk_id"]
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"GRUB install failed with exit code {ret}")
|
||||||
|
|
||||||
os.system("grub2-mkconfig -o /boot/grub2/grub.cfg")
|
os.system("grub2-mkconfig -o /boot/grub2/grub.cfg")
|
||||||
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"GRUB update failed with exit code {ret}")
|
||||||
|
|
||||||
# Set a really dumb root password so the VM can be debugged
|
# Set a really dumb root password so the VM can be debugged
|
||||||
# EITHER CHANGE THIS YOURSELF, here or in Userdata, or run something after install
|
# EITHER CHANGE THIS YOURSELF, here or in Userdata, or run something after install
|
||||||
# to change the root password: don't leave it like this on an Internet-facing machine!
|
# to change the root password: don't leave it like this on an Internet-facing machine!
|
||||||
os.system("echo root:test123 | chpasswd")
|
ret = os.system(f"echo root:{default_root_password} | chpasswd")
|
||||||
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"Root password change failed with exit code {ret}")
|
||||||
|
|
||||||
# Enable dbus-broker
|
# Enable dbus-broker
|
||||||
os.system("systemctl enable dbus-broker.service")
|
ret = os.system("systemctl enable dbus-broker.service")
|
||||||
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"Enable of dbus-broker failed with exit code {ret}")
|
||||||
|
|
||||||
# Enable NetworkManager
|
# Enable NetworkManager
|
||||||
os.system("systemctl enable NetworkManager.service")
|
os.system("systemctl enable NetworkManager.service")
|
||||||
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"Enable of NetworkManager failed with exit code {ret}")
|
||||||
|
|
||||||
# Enable cloud-init target on (first) boot
|
# Enable cloud-init target on (first) boot
|
||||||
# Your user-data should handle this and disable it once done, or things get messy.
|
# Your user-data should handle this and disable it once done, or things get messy.
|
||||||
# That cloud-init won't run without this hack seems like a bug... but even the official
|
# That cloud-init won't run without this hack seems like a bug... but even the official
|
||||||
# Debian cloud images are affected, so who knows.
|
# Debian cloud images are affected, so who knows.
|
||||||
os.system("systemctl enable cloud-init.target")
|
os.system("systemctl enable cloud-init.target")
|
||||||
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"Enable of cloud-init failed with exit code {ret}")
|
||||||
|
|
||||||
# Set the timezone to UTC
|
# Set the timezone to UTC
|
||||||
os.system("ln -sf ../usr/share/zoneinfo/UTC /etc/localtime")
|
ret = os.system(
|
||||||
|
f"ln -sf ../usr/share/zoneinfo/{default_local_time} /etc/localtime"
|
||||||
|
)
|
||||||
|
ret = int(ret >> 8)
|
||||||
|
if ret > 0:
|
||||||
|
self.fail(f"Localtime update failed with exit code {ret}")
|
||||||
|
|
||||||
def cleanup(self):
|
def cleanup(self):
|
||||||
"""
|
"""
|
||||||
|
|
|
@ -12,15 +12,7 @@ fi
|
||||||
|
|
||||||
pushd /usr/share/pvc
|
pushd /usr/share/pvc
|
||||||
|
|
||||||
case "$( cat /etc/debian_version )" in
|
export FLASK_APP=./pvcapid-manage-flask.py
|
||||||
10.*|11.*)
|
flask db upgrade
|
||||||
# Debian 10 & 11
|
|
||||||
./pvcapid-manage_legacy.py db upgrade
|
|
||||||
;;
|
|
||||||
*)
|
|
||||||
# Debian 12+
|
|
||||||
flask --app ./pvcapid-manage_flask.py db upgrade
|
|
||||||
;;
|
|
||||||
esac
|
|
||||||
|
|
||||||
popd
|
popd
|
||||||
|
|
|
@ -21,4 +21,5 @@
|
||||||
|
|
||||||
from daemon_lib.zkhandler import ZKSchema
|
from daemon_lib.zkhandler import ZKSchema
|
||||||
|
|
||||||
ZKSchema.write()
|
schema = ZKSchema(root_path=".")
|
||||||
|
schema.write()
|
||||||
|
|
|
@ -1,33 +0,0 @@
|
||||||
#!/usr/bin/env python3
|
|
||||||
|
|
||||||
# pvcapid-manage_legacy.py - PVC Database management tasks (Legacy)
|
|
||||||
# Part of the Parallel Virtual Cluster (PVC) system
|
|
||||||
#
|
|
||||||
# Copyright (C) 2018-2024 Joshua M. Boniface <joshua@boniface.me>
|
|
||||||
#
|
|
||||||
# This program is free software: you can redistribute it and/or modify
|
|
||||||
# it under the terms of the GNU General Public License as published by
|
|
||||||
# the Free Software Foundation, version 3.
|
|
||||||
#
|
|
||||||
# This program is distributed in the hope that it will be useful,
|
|
||||||
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
||||||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
||||||
# GNU General Public License for more details.
|
|
||||||
#
|
|
||||||
# You should have received a copy of the GNU General Public License
|
|
||||||
# along with this program. If not, see <https://www.gnu.org/licenses/>.
|
|
||||||
#
|
|
||||||
###############################################################################
|
|
||||||
|
|
||||||
from flask_migrate import Migrate, MigrateCommand, Manager
|
|
||||||
|
|
||||||
from pvcapid.flaskapi import app, db
|
|
||||||
from pvcapid.models import * # noqa F401,F403
|
|
||||||
|
|
||||||
migrate = Migrate(app, db)
|
|
||||||
manager = Manager(app)
|
|
||||||
|
|
||||||
manager.add_command("db", MigrateCommand)
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
manager.run()
|
|
|
@ -19,6 +19,13 @@
|
||||||
#
|
#
|
||||||
###############################################################################
|
###############################################################################
|
||||||
|
|
||||||
import pvcapid.Daemon # noqa: F401
|
import sys
|
||||||
|
from os import path
|
||||||
|
|
||||||
|
# Ensure current directory (/usr/share/pvc) is in the system path for Gunicorn
|
||||||
|
current_dir = path.dirname(path.abspath(__file__))
|
||||||
|
sys.path.append(current_dir)
|
||||||
|
|
||||||
|
import pvcapid.Daemon # noqa: F401, E402
|
||||||
|
|
||||||
pvcapid.Daemon.entrypoint()
|
pvcapid.Daemon.entrypoint()
|
||||||
|
|
|
@ -19,15 +19,13 @@
|
||||||
#
|
#
|
||||||
###############################################################################
|
###############################################################################
|
||||||
|
|
||||||
|
import subprocess
|
||||||
from ssl import SSLContext, TLSVersion
|
from ssl import SSLContext, TLSVersion
|
||||||
|
|
||||||
from distutils.util import strtobool as dustrtobool
|
from distutils.util import strtobool as dustrtobool
|
||||||
|
|
||||||
import daemon_lib.config as cfg
|
import daemon_lib.config as cfg
|
||||||
|
|
||||||
# Daemon version
|
# Daemon version
|
||||||
version = "0.9.89"
|
version = "0.9.100~git-73c0834f"
|
||||||
|
|
||||||
# API version
|
# API version
|
||||||
API_VERSION = 1.0
|
API_VERSION = 1.0
|
||||||
|
@ -53,7 +51,6 @@ def strtobool(stringv):
|
||||||
# Configuration Parsing
|
# Configuration Parsing
|
||||||
##########################################################
|
##########################################################
|
||||||
|
|
||||||
|
|
||||||
# Get our configuration
|
# Get our configuration
|
||||||
config = cfg.get_configuration()
|
config = cfg.get_configuration()
|
||||||
config["daemon_name"] = "pvcapid"
|
config["daemon_name"] = "pvcapid"
|
||||||
|
@ -61,22 +58,16 @@ config["daemon_version"] = version
|
||||||
|
|
||||||
|
|
||||||
##########################################################
|
##########################################################
|
||||||
# Entrypoint
|
# Flask App Creation for Gunicorn
|
||||||
##########################################################
|
##########################################################
|
||||||
|
|
||||||
|
|
||||||
def entrypoint():
|
def create_app():
|
||||||
import pvcapid.flaskapi as pvc_api # noqa: E402
|
"""
|
||||||
|
Create and return the Flask app and SSL context if necessary.
|
||||||
if config["api_ssl_enabled"]:
|
"""
|
||||||
context = SSLContext()
|
# Import the Flask app from pvcapid.flaskapi after adjusting the path
|
||||||
context.minimum_version = TLSVersion.TLSv1
|
import pvcapid.flaskapi as pvc_api
|
||||||
context.get_ca_certs()
|
|
||||||
context.load_cert_chain(
|
|
||||||
config["api_ssl_cert_file"], keyfile=config["api_ssl_key_file"]
|
|
||||||
)
|
|
||||||
else:
|
|
||||||
context = None
|
|
||||||
|
|
||||||
# Print our startup messages
|
# Print our startup messages
|
||||||
print("")
|
print("")
|
||||||
|
@ -102,9 +93,69 @@ def entrypoint():
|
||||||
print("")
|
print("")
|
||||||
|
|
||||||
pvc_api.celery_startup()
|
pvc_api.celery_startup()
|
||||||
pvc_api.app.run(
|
|
||||||
|
return pvc_api.app
|
||||||
|
|
||||||
|
|
||||||
|
##########################################################
|
||||||
|
# Entrypoint
|
||||||
|
##########################################################
|
||||||
|
|
||||||
|
|
||||||
|
def entrypoint():
|
||||||
|
if config["debug"]:
|
||||||
|
app = create_app()
|
||||||
|
|
||||||
|
if config["api_ssl_enabled"]:
|
||||||
|
ssl_context = SSLContext()
|
||||||
|
ssl_context.minimum_version = TLSVersion.TLSv1
|
||||||
|
ssl_context.get_ca_certs()
|
||||||
|
ssl_context.load_cert_chain(
|
||||||
|
config["api_ssl_cert_file"], keyfile=config["api_ssl_key_file"]
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
ssl_context = None
|
||||||
|
|
||||||
|
app.run(
|
||||||
config["api_listen_address"],
|
config["api_listen_address"],
|
||||||
config["api_listen_port"],
|
config["api_listen_port"],
|
||||||
threaded=True,
|
threaded=True,
|
||||||
ssl_context=context,
|
ssl_context=ssl_context,
|
||||||
)
|
)
|
||||||
|
else:
|
||||||
|
# Build the command to run Gunicorn
|
||||||
|
gunicorn_cmd = [
|
||||||
|
"gunicorn",
|
||||||
|
"--workers",
|
||||||
|
"1",
|
||||||
|
"--threads",
|
||||||
|
"8",
|
||||||
|
"--timeout",
|
||||||
|
"86400",
|
||||||
|
"--bind",
|
||||||
|
"{}:{}".format(config["api_listen_address"], config["api_listen_port"]),
|
||||||
|
"pvcapid.Daemon:create_app()",
|
||||||
|
"--log-level",
|
||||||
|
"info",
|
||||||
|
"--access-logfile",
|
||||||
|
"-",
|
||||||
|
"--error-logfile",
|
||||||
|
"-",
|
||||||
|
]
|
||||||
|
|
||||||
|
if config["api_ssl_enabled"]:
|
||||||
|
gunicorn_cmd += [
|
||||||
|
"--certfile",
|
||||||
|
config["api_ssl_cert_file"],
|
||||||
|
"--keyfile",
|
||||||
|
config["api_ssl_key_file"],
|
||||||
|
]
|
||||||
|
|
||||||
|
# Run Gunicorn
|
||||||
|
try:
|
||||||
|
subprocess.run(gunicorn_cmd)
|
||||||
|
except KeyboardInterrupt:
|
||||||
|
exit(0)
|
||||||
|
except Exception as e:
|
||||||
|
print(e)
|
||||||
|
exit(1)
|
||||||
|
|
File diff suppressed because it is too large
Load Diff
|
@ -21,7 +21,9 @@
|
||||||
|
|
||||||
import flask
|
import flask
|
||||||
import json
|
import json
|
||||||
|
import logging
|
||||||
import lxml.etree as etree
|
import lxml.etree as etree
|
||||||
|
import sys
|
||||||
|
|
||||||
from re import match
|
from re import match
|
||||||
from requests import get
|
from requests import get
|
||||||
|
@ -40,6 +42,15 @@ import daemon_lib.network as pvc_network
|
||||||
import daemon_lib.ceph as pvc_ceph
|
import daemon_lib.ceph as pvc_ceph
|
||||||
|
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
logger.setLevel(logging.INFO)
|
||||||
|
handler = logging.StreamHandler(sys.stdout)
|
||||||
|
handler.setLevel(logging.INFO)
|
||||||
|
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
|
||||||
|
handler.setFormatter(formatter)
|
||||||
|
logger.addHandler(handler)
|
||||||
|
|
||||||
|
|
||||||
#
|
#
|
||||||
# Cluster base functions
|
# Cluster base functions
|
||||||
#
|
#
|
||||||
|
@ -641,6 +652,7 @@ def vm_define(
|
||||||
selector,
|
selector,
|
||||||
autostart,
|
autostart,
|
||||||
migration_method,
|
migration_method,
|
||||||
|
migration_max_downtime=300,
|
||||||
user_tags=[],
|
user_tags=[],
|
||||||
protected_tags=[],
|
protected_tags=[],
|
||||||
):
|
):
|
||||||
|
@ -668,6 +680,7 @@ def vm_define(
|
||||||
selector,
|
selector,
|
||||||
autostart,
|
autostart,
|
||||||
migration_method,
|
migration_method,
|
||||||
|
migration_max_downtime,
|
||||||
profile=None,
|
profile=None,
|
||||||
tags=tags,
|
tags=tags,
|
||||||
)
|
)
|
||||||
|
@ -763,6 +776,134 @@ def vm_restore(
|
||||||
return output, retcode
|
return output, retcode
|
||||||
|
|
||||||
|
|
||||||
|
@ZKConnection(config)
|
||||||
|
def create_vm_snapshot(
|
||||||
|
zkhandler,
|
||||||
|
domain,
|
||||||
|
snapshot_name=None,
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Take a snapshot of a VM.
|
||||||
|
"""
|
||||||
|
retflag, retdata = pvc_vm.create_vm_snapshot(
|
||||||
|
zkhandler,
|
||||||
|
domain,
|
||||||
|
snapshot_name,
|
||||||
|
)
|
||||||
|
|
||||||
|
if retflag:
|
||||||
|
retcode = 200
|
||||||
|
else:
|
||||||
|
retcode = 400
|
||||||
|
|
||||||
|
output = {"message": retdata.replace('"', "'")}
|
||||||
|
return output, retcode
|
||||||
|
|
||||||
|
|
||||||
|
@ZKConnection(config)
|
||||||
|
def remove_vm_snapshot(
|
||||||
|
zkhandler,
|
||||||
|
domain,
|
||||||
|
snapshot_name,
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Take a snapshot of a VM.
|
||||||
|
"""
|
||||||
|
retflag, retdata = pvc_vm.remove_vm_snapshot(
|
||||||
|
zkhandler,
|
||||||
|
domain,
|
||||||
|
snapshot_name,
|
||||||
|
)
|
||||||
|
|
||||||
|
if retflag:
|
||||||
|
retcode = 200
|
||||||
|
else:
|
||||||
|
retcode = 400
|
||||||
|
|
||||||
|
output = {"message": retdata.replace('"', "'")}
|
||||||
|
return output, retcode
|
||||||
|
|
||||||
|
|
||||||
|
@ZKConnection(config)
|
||||||
|
def rollback_vm_snapshot(
|
||||||
|
zkhandler,
|
||||||
|
domain,
|
||||||
|
snapshot_name,
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Roll back to a snapshot of a VM.
|
||||||
|
"""
|
||||||
|
retflag, retdata = pvc_vm.rollback_vm_snapshot(
|
||||||
|
zkhandler,
|
||||||
|
domain,
|
||||||
|
snapshot_name,
|
||||||
|
)
|
||||||
|
|
||||||
|
if retflag:
|
||||||
|
retcode = 200
|
||||||
|
else:
|
||||||
|
retcode = 400
|
||||||
|
|
||||||
|
output = {"message": retdata.replace('"', "'")}
|
||||||
|
return output, retcode
|
||||||
|
|
||||||
|
|
||||||
|
@ZKConnection(config)
|
||||||
|
def export_vm_snapshot(
|
||||||
|
zkhandler,
|
||||||
|
domain,
|
||||||
|
snapshot_name,
|
||||||
|
export_path,
|
||||||
|
incremental_parent=None,
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Export a snapshot of a VM to files.
|
||||||
|
"""
|
||||||
|
retflag, retdata = pvc_vm.export_vm_snapshot(
|
||||||
|
zkhandler,
|
||||||
|
domain,
|
||||||
|
snapshot_name,
|
||||||
|
export_path,
|
||||||
|
incremental_parent,
|
||||||
|
)
|
||||||
|
|
||||||
|
if retflag:
|
||||||
|
retcode = 200
|
||||||
|
else:
|
||||||
|
retcode = 400
|
||||||
|
|
||||||
|
output = {"message": retdata.replace('"', "'")}
|
||||||
|
return output, retcode
|
||||||
|
|
||||||
|
|
||||||
|
@ZKConnection(config)
|
||||||
|
def import_vm_snapshot(
|
||||||
|
zkhandler,
|
||||||
|
domain,
|
||||||
|
snapshot_name,
|
||||||
|
export_path,
|
||||||
|
retain_snapshot=False,
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Import a snapshot of a VM from files.
|
||||||
|
"""
|
||||||
|
retflag, retdata = pvc_vm.import_vm_snapshot(
|
||||||
|
zkhandler,
|
||||||
|
domain,
|
||||||
|
snapshot_name,
|
||||||
|
export_path,
|
||||||
|
retain_snapshot,
|
||||||
|
)
|
||||||
|
|
||||||
|
if retflag:
|
||||||
|
retcode = 200
|
||||||
|
else:
|
||||||
|
retcode = 400
|
||||||
|
|
||||||
|
output = {"message": retdata.replace('"', "'")}
|
||||||
|
return output, retcode
|
||||||
|
|
||||||
|
|
||||||
@ZKConnection(config)
|
@ZKConnection(config)
|
||||||
def vm_attach_device(zkhandler, vm, device_spec_xml):
|
def vm_attach_device(zkhandler, vm, device_spec_xml):
|
||||||
"""
|
"""
|
||||||
|
@ -826,6 +967,7 @@ def get_vm_meta(zkhandler, vm):
|
||||||
domain_node_selector,
|
domain_node_selector,
|
||||||
domain_node_autostart,
|
domain_node_autostart,
|
||||||
domain_migrate_method,
|
domain_migrate_method,
|
||||||
|
domain_migrate_max_downtime,
|
||||||
) = pvc_common.getDomainMetadata(zkhandler, dom_uuid)
|
) = pvc_common.getDomainMetadata(zkhandler, dom_uuid)
|
||||||
|
|
||||||
retcode = 200
|
retcode = 200
|
||||||
|
@ -835,6 +977,7 @@ def get_vm_meta(zkhandler, vm):
|
||||||
"node_selector": domain_node_selector.lower(),
|
"node_selector": domain_node_selector.lower(),
|
||||||
"node_autostart": domain_node_autostart,
|
"node_autostart": domain_node_autostart,
|
||||||
"migration_method": domain_migrate_method.lower(),
|
"migration_method": domain_migrate_method.lower(),
|
||||||
|
"migration_max_downtime": int(domain_migrate_max_downtime),
|
||||||
}
|
}
|
||||||
|
|
||||||
return retdata, retcode
|
return retdata, retcode
|
||||||
|
@ -842,7 +985,14 @@ def get_vm_meta(zkhandler, vm):
|
||||||
|
|
||||||
@ZKConnection(config)
|
@ZKConnection(config)
|
||||||
def update_vm_meta(
|
def update_vm_meta(
|
||||||
zkhandler, vm, limit, selector, autostart, provisioner_profile, migration_method
|
zkhandler,
|
||||||
|
vm,
|
||||||
|
limit,
|
||||||
|
selector,
|
||||||
|
autostart,
|
||||||
|
provisioner_profile,
|
||||||
|
migration_method,
|
||||||
|
migration_max_downtime,
|
||||||
):
|
):
|
||||||
"""
|
"""
|
||||||
Update metadata of a VM.
|
Update metadata of a VM.
|
||||||
|
@ -858,7 +1008,14 @@ def update_vm_meta(
|
||||||
autostart = False
|
autostart = False
|
||||||
|
|
||||||
retflag, retdata = pvc_vm.modify_vm_metadata(
|
retflag, retdata = pvc_vm.modify_vm_metadata(
|
||||||
zkhandler, vm, limit, selector, autostart, provisioner_profile, migration_method
|
zkhandler,
|
||||||
|
vm,
|
||||||
|
limit,
|
||||||
|
selector,
|
||||||
|
autostart,
|
||||||
|
provisioner_profile,
|
||||||
|
migration_method,
|
||||||
|
migration_max_downtime,
|
||||||
)
|
)
|
||||||
|
|
||||||
if retflag:
|
if retflag:
|
||||||
|
@ -996,11 +1153,11 @@ def vm_remove(zkhandler, name):
|
||||||
|
|
||||||
|
|
||||||
@ZKConnection(config)
|
@ZKConnection(config)
|
||||||
def vm_start(zkhandler, name):
|
def vm_start(zkhandler, name, force=False):
|
||||||
"""
|
"""
|
||||||
Start a VM in the PVC cluster.
|
Start a VM in the PVC cluster.
|
||||||
"""
|
"""
|
||||||
retflag, retdata = pvc_vm.start_vm(zkhandler, name)
|
retflag, retdata = pvc_vm.start_vm(zkhandler, name, force=force)
|
||||||
|
|
||||||
if retflag:
|
if retflag:
|
||||||
retcode = 200
|
retcode = 200
|
||||||
|
@ -1044,11 +1201,11 @@ def vm_shutdown(zkhandler, name, wait):
|
||||||
|
|
||||||
|
|
||||||
@ZKConnection(config)
|
@ZKConnection(config)
|
||||||
def vm_stop(zkhandler, name):
|
def vm_stop(zkhandler, name, force=False):
|
||||||
"""
|
"""
|
||||||
Forcibly stop a VM in the PVC cluster.
|
Forcibly stop a VM in the PVC cluster.
|
||||||
"""
|
"""
|
||||||
retflag, retdata = pvc_vm.stop_vm(zkhandler, name)
|
retflag, retdata = pvc_vm.stop_vm(zkhandler, name, force=force)
|
||||||
|
|
||||||
if retflag:
|
if retflag:
|
||||||
retcode = 200
|
retcode = 200
|
||||||
|
@ -1062,7 +1219,7 @@ def vm_stop(zkhandler, name):
|
||||||
@ZKConnection(config)
|
@ZKConnection(config)
|
||||||
def vm_disable(zkhandler, name, force=False):
|
def vm_disable(zkhandler, name, force=False):
|
||||||
"""
|
"""
|
||||||
Disable (shutdown or force stop if required)a VM in the PVC cluster.
|
Disable (shutdown or force stop if required) a VM in the PVC cluster.
|
||||||
"""
|
"""
|
||||||
retflag, retdata = pvc_vm.disable_vm(zkhandler, name, force=force)
|
retflag, retdata = pvc_vm.disable_vm(zkhandler, name, force=force)
|
||||||
|
|
||||||
|
@ -1134,7 +1291,7 @@ def vm_flush_locks(zkhandler, vm):
|
||||||
zkhandler, None, None, None, vm, is_fuzzy=False, negate=False
|
zkhandler, None, None, None, vm, is_fuzzy=False, negate=False
|
||||||
)
|
)
|
||||||
|
|
||||||
if retdata[0].get("state") not in ["stop", "disable"]:
|
if retdata[0].get("state") not in ["stop", "disable", "mirror"]:
|
||||||
return {"message": "VM must be stopped to flush locks"}, 400
|
return {"message": "VM must be stopped to flush locks"}, 400
|
||||||
|
|
||||||
retflag, retdata = pvc_vm.flush_locks(zkhandler, vm)
|
retflag, retdata = pvc_vm.flush_locks(zkhandler, vm)
|
||||||
|
@ -1148,6 +1305,342 @@ def vm_flush_locks(zkhandler, vm):
|
||||||
return output, retcode
|
return output, retcode
|
||||||
|
|
||||||
|
|
||||||
|
@ZKConnection(config)
|
||||||
|
def vm_snapshot_receive_block_full(zkhandler, pool, volume, snapshot, size, request):
|
||||||
|
"""
|
||||||
|
Receive an RBD volume from a remote system
|
||||||
|
"""
|
||||||
|
import rados
|
||||||
|
import rbd
|
||||||
|
|
||||||
|
_, rbd_detail = pvc_ceph.get_list_volume(
|
||||||
|
zkhandler, pool, limit=volume, is_fuzzy=False
|
||||||
|
)
|
||||||
|
if len(rbd_detail) > 0:
|
||||||
|
volume_exists = True
|
||||||
|
else:
|
||||||
|
volume_exists = False
|
||||||
|
|
||||||
|
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
|
||||||
|
cluster.connect()
|
||||||
|
ioctx = cluster.open_ioctx(pool)
|
||||||
|
|
||||||
|
if not volume_exists:
|
||||||
|
rbd_inst = rbd.RBD()
|
||||||
|
rbd_inst.create(ioctx, volume, size)
|
||||||
|
retflag, retdata = pvc_ceph.add_volume(
|
||||||
|
zkhandler, pool, volume, str(size) + "B", force_flag=True, zk_only=True
|
||||||
|
)
|
||||||
|
if not retflag:
|
||||||
|
ioctx.close()
|
||||||
|
cluster.shutdown()
|
||||||
|
|
||||||
|
if retflag:
|
||||||
|
retcode = 200
|
||||||
|
else:
|
||||||
|
retcode = 400
|
||||||
|
|
||||||
|
output = {"message": retdata.replace('"', "'")}
|
||||||
|
return output, retcode
|
||||||
|
|
||||||
|
image = rbd.Image(ioctx, volume)
|
||||||
|
|
||||||
|
last_chunk = 0
|
||||||
|
chunk_size = 1024 * 1024 * 1024
|
||||||
|
|
||||||
|
logger.info(f"Importing full snapshot {pool}/{volume}@{snapshot}")
|
||||||
|
while True:
|
||||||
|
chunk = request.stream.read(chunk_size)
|
||||||
|
if not chunk:
|
||||||
|
break
|
||||||
|
image.write(chunk, last_chunk)
|
||||||
|
last_chunk += len(chunk)
|
||||||
|
|
||||||
|
image.close()
|
||||||
|
ioctx.close()
|
||||||
|
cluster.shutdown()
|
||||||
|
|
||||||
|
return {"message": "Successfully received RBD block device"}, 200
|
||||||
|
|
||||||
|
|
||||||
|
@ZKConnection(config)
|
||||||
|
def vm_snapshot_receive_block_diff(
|
||||||
|
zkhandler, pool, volume, snapshot, source_snapshot, request
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Receive an RBD volume from a remote system
|
||||||
|
"""
|
||||||
|
import rados
|
||||||
|
import rbd
|
||||||
|
|
||||||
|
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
|
||||||
|
cluster.connect()
|
||||||
|
ioctx = cluster.open_ioctx(pool)
|
||||||
|
image = rbd.Image(ioctx, volume)
|
||||||
|
|
||||||
|
if len(request.files) > 0:
|
||||||
|
logger.info(f"Applying {len(request.files)} RBD diff chunks for {snapshot}")
|
||||||
|
|
||||||
|
for i in range(len(request.files)):
|
||||||
|
object_key = f"object_{i}"
|
||||||
|
if object_key in request.files:
|
||||||
|
object_data = request.files[object_key].read()
|
||||||
|
offset = int.from_bytes(object_data[:8], "big")
|
||||||
|
length = int.from_bytes(object_data[8:16], "big")
|
||||||
|
data = object_data[16 : 16 + length]
|
||||||
|
logger.info(f"Applying RBD diff chunk at {offset} ({length} bytes)")
|
||||||
|
image.write(data, offset)
|
||||||
|
else:
|
||||||
|
return {"message": "No data received"}, 400
|
||||||
|
|
||||||
|
image.close()
|
||||||
|
ioctx.close()
|
||||||
|
cluster.shutdown()
|
||||||
|
|
||||||
|
return {
|
||||||
|
"message": f"Successfully received {len(request.files)} RBD diff chunks"
|
||||||
|
}, 200
|
||||||
|
|
||||||
|
|
||||||
|
@ZKConnection(config)
|
||||||
|
def vm_snapshot_receive_block_createsnap(zkhandler, pool, volume, snapshot):
|
||||||
|
"""
|
||||||
|
Create the snapshot of a remote volume
|
||||||
|
"""
|
||||||
|
import rados
|
||||||
|
import rbd
|
||||||
|
|
||||||
|
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
|
||||||
|
cluster.connect()
|
||||||
|
ioctx = cluster.open_ioctx(pool)
|
||||||
|
image = rbd.Image(ioctx, volume)
|
||||||
|
image.create_snap(snapshot)
|
||||||
|
image.close()
|
||||||
|
ioctx.close()
|
||||||
|
cluster.shutdown()
|
||||||
|
|
||||||
|
retflag, retdata = pvc_ceph.add_snapshot(
|
||||||
|
zkhandler, pool, volume, snapshot, zk_only=True
|
||||||
|
)
|
||||||
|
if not retflag:
|
||||||
|
|
||||||
|
if retflag:
|
||||||
|
retcode = 200
|
||||||
|
else:
|
||||||
|
retcode = 400
|
||||||
|
|
||||||
|
output = {"message": retdata.replace('"', "'")}
|
||||||
|
return output, retcode
|
||||||
|
|
||||||
|
return {"message": "Successfully received RBD snapshot"}, 200
|
||||||
|
|
||||||
|
|
||||||
|
@ZKConnection(config)
|
||||||
|
def vm_snapshot_receive_config(zkhandler, snapshot, vm_config, source_snapshot=None):
|
||||||
|
"""
|
||||||
|
Receive a VM configuration snapshot from a remote system, and modify it to work on our system
|
||||||
|
"""
|
||||||
|
|
||||||
|
def parse_unified_diff(diff_text, original_text):
|
||||||
|
"""
|
||||||
|
Take a unified diff and apply it to an original string
|
||||||
|
"""
|
||||||
|
# Split the original string into lines
|
||||||
|
original_lines = original_text.splitlines(keepends=True)
|
||||||
|
patched_lines = []
|
||||||
|
original_idx = 0 # Track position in original lines
|
||||||
|
|
||||||
|
diff_lines = diff_text.splitlines(keepends=True)
|
||||||
|
|
||||||
|
for line in diff_lines:
|
||||||
|
if line.startswith("---") or line.startswith("+++"):
|
||||||
|
# Ignore prefix lines
|
||||||
|
continue
|
||||||
|
if line.startswith("@@"):
|
||||||
|
# Extract line numbers from the diff hunk header
|
||||||
|
hunk_header = line
|
||||||
|
parts = hunk_header.split(" ")
|
||||||
|
original_range = parts[1]
|
||||||
|
|
||||||
|
# Get the starting line number and range length for the original file
|
||||||
|
original_start, _ = map(int, original_range[1:].split(","))
|
||||||
|
|
||||||
|
# Adjust for zero-based indexing
|
||||||
|
original_start -= 1
|
||||||
|
|
||||||
|
# Add any lines between the current index and the next hunk's start
|
||||||
|
while original_idx < original_start:
|
||||||
|
patched_lines.append(original_lines[original_idx])
|
||||||
|
original_idx += 1
|
||||||
|
|
||||||
|
elif line.startswith("-"):
|
||||||
|
# This line should be removed from the original, skip it
|
||||||
|
original_idx += 1
|
||||||
|
elif line.startswith("+"):
|
||||||
|
# This line should be added to the patched version, removing the '+'
|
||||||
|
patched_lines.append(line[1:])
|
||||||
|
else:
|
||||||
|
# Context line (unchanged), it has no prefix, add from the original
|
||||||
|
patched_lines.append(original_lines[original_idx])
|
||||||
|
original_idx += 1
|
||||||
|
|
||||||
|
# Add any remaining lines from the original file after the last hunk
|
||||||
|
patched_lines.extend(original_lines[original_idx:])
|
||||||
|
|
||||||
|
return "".join(patched_lines).strip()
|
||||||
|
|
||||||
|
# Get our XML configuration for this snapshot
|
||||||
|
# We take the main XML configuration, then apply the diff for this particular incremental
|
||||||
|
current_snapshot = [s for s in vm_config["snapshots"] if s["name"] == snapshot][0]
|
||||||
|
vm_xml = vm_config["xml"]
|
||||||
|
vm_xml_diff = "\n".join(current_snapshot["xml_diff_lines"])
|
||||||
|
snapshot_vm_xml = parse_unified_diff(vm_xml_diff, vm_xml)
|
||||||
|
xml_data = etree.fromstring(snapshot_vm_xml)
|
||||||
|
|
||||||
|
# Replace the Ceph storage secret UUID with this cluster's
|
||||||
|
our_ceph_secret_uuid = config["ceph_secret_uuid"]
|
||||||
|
ceph_secrets = xml_data.xpath("//secret[@type='ceph']")
|
||||||
|
for ceph_secret in ceph_secrets:
|
||||||
|
ceph_secret.set("uuid", our_ceph_secret_uuid)
|
||||||
|
|
||||||
|
# Replace the Ceph source hosts with this cluster's
|
||||||
|
our_ceph_storage_hosts = config["storage_hosts"]
|
||||||
|
our_ceph_storage_port = str(config["ceph_monitor_port"])
|
||||||
|
ceph_sources = xml_data.xpath("//source[@protocol='rbd']")
|
||||||
|
for ceph_source in ceph_sources:
|
||||||
|
for host in ceph_source.xpath("host"):
|
||||||
|
ceph_source.remove(host)
|
||||||
|
for ceph_storage_host in our_ceph_storage_hosts:
|
||||||
|
new_host = etree.Element("host")
|
||||||
|
new_host.set("name", ceph_storage_host)
|
||||||
|
new_host.set("port", our_ceph_storage_port)
|
||||||
|
ceph_source.append(new_host)
|
||||||
|
|
||||||
|
# Regenerate the VM XML
|
||||||
|
snapshot_vm_xml = etree.tostring(xml_data, pretty_print=True).decode("utf8")
|
||||||
|
|
||||||
|
if (
|
||||||
|
source_snapshot is not None
|
||||||
|
or pvc_vm.searchClusterByUUID(zkhandler, vm_config["uuid"]) is not None
|
||||||
|
):
|
||||||
|
logger.info(
|
||||||
|
f"Receiving incremental VM configuration for {vm_config['name']}@{snapshot}"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Modify the VM based on our passed detail
|
||||||
|
retcode, retmsg = pvc_vm.modify_vm(
|
||||||
|
zkhandler,
|
||||||
|
vm_config["uuid"],
|
||||||
|
False,
|
||||||
|
snapshot_vm_xml,
|
||||||
|
)
|
||||||
|
if not retcode:
|
||||||
|
retcode = 400
|
||||||
|
retdata = {"message": retmsg}
|
||||||
|
return retdata, retcode
|
||||||
|
|
||||||
|
retcode, retmsg = pvc_vm.modify_vm_metadata(
|
||||||
|
zkhandler,
|
||||||
|
vm_config["uuid"],
|
||||||
|
None, # Node limits are left unchanged
|
||||||
|
vm_config["node_selector"],
|
||||||
|
vm_config["node_autostart"],
|
||||||
|
vm_config["profile"],
|
||||||
|
vm_config["migration_method"],
|
||||||
|
vm_config["migration_max_downtime"],
|
||||||
|
)
|
||||||
|
if not retcode:
|
||||||
|
retcode = 400
|
||||||
|
retdata = {"message": retmsg}
|
||||||
|
return retdata, retcode
|
||||||
|
|
||||||
|
current_vm_tags = zkhandler.children(("domain.meta.tags", vm_config["uuid"]))
|
||||||
|
new_vm_tags = [t["name"] for t in vm_config["tags"]]
|
||||||
|
remove_tags = []
|
||||||
|
add_tags = []
|
||||||
|
for tag in vm_config["tags"]:
|
||||||
|
if tag["name"] not in current_vm_tags:
|
||||||
|
add_tags.append((tag["name"], tag["protected"]))
|
||||||
|
for tag in current_vm_tags:
|
||||||
|
if tag not in new_vm_tags:
|
||||||
|
remove_tags.append(tag)
|
||||||
|
|
||||||
|
for tag in add_tags:
|
||||||
|
name, protected = tag
|
||||||
|
pvc_vm.modify_vm_tag(
|
||||||
|
zkhandler, vm_config["uuid"], "add", name, protected=protected
|
||||||
|
)
|
||||||
|
for tag in remove_tags:
|
||||||
|
pvc_vm.modify_vm_tag(zkhandler, vm_config["uuid"], "remove", name)
|
||||||
|
else:
|
||||||
|
logger.info(
|
||||||
|
f"Receiving full VM configuration for {vm_config['name']}@{snapshot}"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Define the VM based on our passed detail
|
||||||
|
retcode, retmsg = pvc_vm.define_vm(
|
||||||
|
zkhandler,
|
||||||
|
snapshot_vm_xml,
|
||||||
|
None, # Target node is autoselected
|
||||||
|
None, # Node limits are invalid here so ignore them
|
||||||
|
vm_config["node_selector"],
|
||||||
|
vm_config["node_autostart"],
|
||||||
|
vm_config["migration_method"],
|
||||||
|
vm_config["migration_max_downtime"],
|
||||||
|
vm_config["profile"],
|
||||||
|
vm_config["tags"],
|
||||||
|
"mirror",
|
||||||
|
)
|
||||||
|
if not retcode:
|
||||||
|
retcode = 400
|
||||||
|
retdata = {"message": retmsg}
|
||||||
|
return retdata, retcode
|
||||||
|
|
||||||
|
# Add this snapshot to the VM manually in Zookeeper
|
||||||
|
zkhandler.write(
|
||||||
|
[
|
||||||
|
(
|
||||||
|
(
|
||||||
|
"domain.snapshots",
|
||||||
|
vm_config["uuid"],
|
||||||
|
"domain_snapshot.name",
|
||||||
|
snapshot,
|
||||||
|
),
|
||||||
|
snapshot,
|
||||||
|
),
|
||||||
|
(
|
||||||
|
(
|
||||||
|
"domain.snapshots",
|
||||||
|
vm_config["uuid"],
|
||||||
|
"domain_snapshot.timestamp",
|
||||||
|
snapshot,
|
||||||
|
),
|
||||||
|
current_snapshot["timestamp"],
|
||||||
|
),
|
||||||
|
(
|
||||||
|
(
|
||||||
|
"domain.snapshots",
|
||||||
|
vm_config["uuid"],
|
||||||
|
"domain_snapshot.xml",
|
||||||
|
snapshot,
|
||||||
|
),
|
||||||
|
snapshot_vm_xml,
|
||||||
|
),
|
||||||
|
(
|
||||||
|
(
|
||||||
|
"domain.snapshots",
|
||||||
|
vm_config["uuid"],
|
||||||
|
"domain_snapshot.rbd_snapshots",
|
||||||
|
snapshot,
|
||||||
|
),
|
||||||
|
",".join(current_snapshot["rbd_snapshots"]),
|
||||||
|
),
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
|
return {"message": "Successfully received VM configuration snapshot"}, 200
|
||||||
|
|
||||||
|
|
||||||
#
|
#
|
||||||
# Network functions
|
# Network functions
|
||||||
#
|
#
|
||||||
|
@ -1851,11 +2344,29 @@ def ceph_volume_list(zkhandler, pool=None, limit=None, is_fuzzy=True):
|
||||||
|
|
||||||
|
|
||||||
@ZKConnection(config)
|
@ZKConnection(config)
|
||||||
def ceph_volume_add(zkhandler, pool, name, size):
|
def ceph_volume_scan(zkhandler, pool, name):
|
||||||
|
"""
|
||||||
|
(Re)scan a Ceph RBD volume for stats in the PVC Ceph storage cluster.
|
||||||
|
"""
|
||||||
|
retflag, retdata = pvc_ceph.scan_volume(zkhandler, pool, name)
|
||||||
|
|
||||||
|
if retflag:
|
||||||
|
retcode = 200
|
||||||
|
else:
|
||||||
|
retcode = 400
|
||||||
|
|
||||||
|
output = {"message": retdata.replace('"', "'")}
|
||||||
|
return output, retcode
|
||||||
|
|
||||||
|
|
||||||
|
@ZKConnection(config)
|
||||||
|
def ceph_volume_add(zkhandler, pool, name, size, force_flag=False):
|
||||||
"""
|
"""
|
||||||
Add a Ceph RBD volume to the PVC Ceph storage cluster.
|
Add a Ceph RBD volume to the PVC Ceph storage cluster.
|
||||||
"""
|
"""
|
||||||
retflag, retdata = pvc_ceph.add_volume(zkhandler, pool, name, size)
|
retflag, retdata = pvc_ceph.add_volume(
|
||||||
|
zkhandler, pool, name, size, force_flag=force_flag
|
||||||
|
)
|
||||||
|
|
||||||
if retflag:
|
if retflag:
|
||||||
retcode = 200
|
retcode = 200
|
||||||
|
@ -1867,11 +2378,13 @@ def ceph_volume_add(zkhandler, pool, name, size):
|
||||||
|
|
||||||
|
|
||||||
@ZKConnection(config)
|
@ZKConnection(config)
|
||||||
def ceph_volume_clone(zkhandler, pool, name, source_volume):
|
def ceph_volume_clone(zkhandler, pool, name, source_volume, force_flag):
|
||||||
"""
|
"""
|
||||||
Clone a Ceph RBD volume to a new volume on the PVC Ceph storage cluster.
|
Clone a Ceph RBD volume to a new volume on the PVC Ceph storage cluster.
|
||||||
"""
|
"""
|
||||||
retflag, retdata = pvc_ceph.clone_volume(zkhandler, pool, source_volume, name)
|
retflag, retdata = pvc_ceph.clone_volume(
|
||||||
|
zkhandler, pool, source_volume, name, force_flag=force_flag
|
||||||
|
)
|
||||||
|
|
||||||
if retflag:
|
if retflag:
|
||||||
retcode = 200
|
retcode = 200
|
||||||
|
@ -1883,11 +2396,13 @@ def ceph_volume_clone(zkhandler, pool, name, source_volume):
|
||||||
|
|
||||||
|
|
||||||
@ZKConnection(config)
|
@ZKConnection(config)
|
||||||
def ceph_volume_resize(zkhandler, pool, name, size):
|
def ceph_volume_resize(zkhandler, pool, name, size, force_flag):
|
||||||
"""
|
"""
|
||||||
Resize an existing Ceph RBD volume in the PVC Ceph storage cluster.
|
Resize an existing Ceph RBD volume in the PVC Ceph storage cluster.
|
||||||
"""
|
"""
|
||||||
retflag, retdata = pvc_ceph.resize_volume(zkhandler, pool, name, size)
|
retflag, retdata = pvc_ceph.resize_volume(
|
||||||
|
zkhandler, pool, name, size, force_flag=force_flag
|
||||||
|
)
|
||||||
|
|
||||||
if retflag:
|
if retflag:
|
||||||
retcode = 200
|
retcode = 200
|
||||||
|
@ -2159,6 +2674,22 @@ def ceph_volume_snapshot_rename(zkhandler, pool, volume, name, new_name):
|
||||||
return output, retcode
|
return output, retcode
|
||||||
|
|
||||||
|
|
||||||
|
@ZKConnection(config)
|
||||||
|
def ceph_volume_snapshot_rollback(zkhandler, pool, volume, name):
|
||||||
|
"""
|
||||||
|
Roll back a Ceph RBD volume to a given snapshot in the PVC Ceph storage cluster.
|
||||||
|
"""
|
||||||
|
retflag, retdata = pvc_ceph.rollback_snapshot(zkhandler, pool, volume, name)
|
||||||
|
|
||||||
|
if retflag:
|
||||||
|
retcode = 200
|
||||||
|
else:
|
||||||
|
retcode = 400
|
||||||
|
|
||||||
|
output = {"message": retdata.replace('"', "'")}
|
||||||
|
return output, retcode
|
||||||
|
|
||||||
|
|
||||||
@ZKConnection(config)
|
@ZKConnection(config)
|
||||||
def ceph_volume_snapshot_remove(zkhandler, pool, volume, name):
|
def ceph_volume_snapshot_remove(zkhandler, pool, volume, name):
|
||||||
"""
|
"""
|
||||||
|
|
|
@ -36,6 +36,7 @@ class DBSystemTemplate(db.Model):
|
||||||
node_selector = db.Column(db.Text)
|
node_selector = db.Column(db.Text)
|
||||||
node_autostart = db.Column(db.Boolean, nullable=False)
|
node_autostart = db.Column(db.Boolean, nullable=False)
|
||||||
migration_method = db.Column(db.Text)
|
migration_method = db.Column(db.Text)
|
||||||
|
migration_max_downtime = db.Column(db.Integer, default=300, server_default="300")
|
||||||
ova = db.Column(db.Integer, db.ForeignKey("ova.id"), nullable=True)
|
ova = db.Column(db.Integer, db.ForeignKey("ova.id"), nullable=True)
|
||||||
|
|
||||||
def __init__(
|
def __init__(
|
||||||
|
@ -50,6 +51,7 @@ class DBSystemTemplate(db.Model):
|
||||||
node_selector,
|
node_selector,
|
||||||
node_autostart,
|
node_autostart,
|
||||||
migration_method,
|
migration_method,
|
||||||
|
migration_max_downtime,
|
||||||
ova=None,
|
ova=None,
|
||||||
):
|
):
|
||||||
self.name = name
|
self.name = name
|
||||||
|
@ -62,6 +64,7 @@ class DBSystemTemplate(db.Model):
|
||||||
self.node_selector = node_selector
|
self.node_selector = node_selector
|
||||||
self.node_autostart = node_autostart
|
self.node_autostart = node_autostart
|
||||||
self.migration_method = migration_method
|
self.migration_method = migration_method
|
||||||
|
self.migration_max_downtime = migration_max_downtime
|
||||||
self.ova = ova
|
self.ova = ova
|
||||||
|
|
||||||
def __repr__(self):
|
def __repr__(self):
|
||||||
|
|
|
@ -125,7 +125,7 @@ def list_template(limit, table, is_fuzzy=True):
|
||||||
args = (template_data["id"],)
|
args = (template_data["id"],)
|
||||||
cur.execute(query, args)
|
cur.execute(query, args)
|
||||||
disks = cur.fetchall()
|
disks = cur.fetchall()
|
||||||
data[template_id]["disks"] = disks
|
data[template_id]["disks"] = sorted(disks, key=lambda x: x["disk_id"])
|
||||||
|
|
||||||
close_database(conn, cur)
|
close_database(conn, cur)
|
||||||
|
|
||||||
|
@ -221,6 +221,7 @@ def create_template_system(
|
||||||
node_selector=None,
|
node_selector=None,
|
||||||
node_autostart=False,
|
node_autostart=False,
|
||||||
migration_method=None,
|
migration_method=None,
|
||||||
|
migration_max_downtime=None,
|
||||||
ova=None,
|
ova=None,
|
||||||
):
|
):
|
||||||
if list_template_system(name, is_fuzzy=False)[-1] != 404:
|
if list_template_system(name, is_fuzzy=False)[-1] != 404:
|
||||||
|
@ -231,7 +232,7 @@ def create_template_system(
|
||||||
if node_selector == "none":
|
if node_selector == "none":
|
||||||
node_selector = None
|
node_selector = None
|
||||||
|
|
||||||
query = "INSERT INTO system_template (name, vcpu_count, vram_mb, serial, vnc, vnc_bind, node_limit, node_selector, node_autostart, migration_method, ova) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s);"
|
query = "INSERT INTO system_template (name, vcpu_count, vram_mb, serial, vnc, vnc_bind, node_limit, node_selector, node_autostart, migration_method, migration_max_downtime, ova) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s);"
|
||||||
args = (
|
args = (
|
||||||
name,
|
name,
|
||||||
vcpu_count,
|
vcpu_count,
|
||||||
|
@ -243,6 +244,7 @@ def create_template_system(
|
||||||
node_selector,
|
node_selector,
|
||||||
node_autostart,
|
node_autostart,
|
||||||
migration_method,
|
migration_method,
|
||||||
|
migration_max_downtime,
|
||||||
ova,
|
ova,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
@ -282,12 +284,13 @@ def create_template_network(name, mac_template=None):
|
||||||
return retmsg, retcode
|
return retmsg, retcode
|
||||||
|
|
||||||
|
|
||||||
def create_template_network_element(name, vni):
|
def create_template_network_element(name, vni, permit_duplicate=False):
|
||||||
if list_template_network(name, is_fuzzy=False)[-1] != 200:
|
if list_template_network(name, is_fuzzy=False)[-1] != 200:
|
||||||
retmsg = {"message": 'The network template "{}" does not exist.'.format(name)}
|
retmsg = {"message": 'The network template "{}" does not exist.'.format(name)}
|
||||||
retcode = 400
|
retcode = 400
|
||||||
return retmsg, retcode
|
return retmsg, retcode
|
||||||
|
|
||||||
|
if not permit_duplicate:
|
||||||
networks, code = list_template_network_vnis(name)
|
networks, code = list_template_network_vnis(name)
|
||||||
if code != 200:
|
if code != 200:
|
||||||
networks = []
|
networks = []
|
||||||
|
@ -438,6 +441,7 @@ def modify_template_system(
|
||||||
node_selector=None,
|
node_selector=None,
|
||||||
node_autostart=None,
|
node_autostart=None,
|
||||||
migration_method=None,
|
migration_method=None,
|
||||||
|
migration_max_downtime=None,
|
||||||
):
|
):
|
||||||
if list_template_system(name, is_fuzzy=False)[-1] != 200:
|
if list_template_system(name, is_fuzzy=False)[-1] != 200:
|
||||||
retmsg = {"message": 'The system template "{}" does not exist.'.format(name)}
|
retmsg = {"message": 'The system template "{}" does not exist.'.format(name)}
|
||||||
|
@ -505,6 +509,11 @@ def modify_template_system(
|
||||||
if migration_method is not None:
|
if migration_method is not None:
|
||||||
fields.append({"field": "migration_method", "data": migration_method})
|
fields.append({"field": "migration_method", "data": migration_method})
|
||||||
|
|
||||||
|
if migration_max_downtime is not None:
|
||||||
|
fields.append(
|
||||||
|
{"field": "migration_max_downtime", "data": int(migration_max_downtime)}
|
||||||
|
)
|
||||||
|
|
||||||
conn, cur = open_database(config)
|
conn, cur = open_database(config)
|
||||||
try:
|
try:
|
||||||
for field in fields:
|
for field in fields:
|
||||||
|
|
|
@ -1,13 +0,0 @@
|
||||||
<!DOCTYPE html>
|
|
||||||
<html>
|
|
||||||
<head>
|
|
||||||
<title>PVC Client API Documentation</title>
|
|
||||||
<meta charset="utf-8"/>
|
|
||||||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
|
||||||
<style> body { margin: 0; padding: 0; } </style>
|
|
||||||
</head>
|
|
||||||
<body>
|
|
||||||
<redoc spec-url='./swagger.json' hide-loading></redoc>
|
|
||||||
<script src="https://rebilly.github.io/ReDoc/releases/latest/redoc.min.js"> </script>
|
|
||||||
</body>
|
|
||||||
</html>
|
|
File diff suppressed because it is too large
Load Diff
|
@ -13,6 +13,8 @@ else
|
||||||
fi
|
fi
|
||||||
|
|
||||||
KEEP_ARTIFACTS=""
|
KEEP_ARTIFACTS=""
|
||||||
|
API_ONLY=""
|
||||||
|
PRIMARY_NODE=""
|
||||||
if [[ -n ${1} ]]; then
|
if [[ -n ${1} ]]; then
|
||||||
for arg in ${@}; do
|
for arg in ${@}; do
|
||||||
case ${arg} in
|
case ${arg} in
|
||||||
|
@ -20,33 +22,45 @@ if [[ -n ${1} ]]; then
|
||||||
KEEP_ARTIFACTS="y"
|
KEEP_ARTIFACTS="y"
|
||||||
shift
|
shift
|
||||||
;;
|
;;
|
||||||
|
-a|--api-only)
|
||||||
|
API_ONLY="y"
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
-p=*|--become-primary=*)
|
||||||
|
PRIMARY_NODE=$( awk -F'=' '{ print $NF }' <<<"${arg}" )
|
||||||
|
shift
|
||||||
|
;;
|
||||||
esac
|
esac
|
||||||
done
|
done
|
||||||
fi
|
fi
|
||||||
|
|
||||||
HOSTS=( ${@} )
|
HOSTS=( ${@} )
|
||||||
echo "> Deploying to host(s): ${HOSTS[@]}"
|
echo "Deploying to host(s): ${HOSTS[@]}"
|
||||||
|
if [[ -n ${PRIMARY_NODE} ]]; then
|
||||||
|
echo "Will become primary on ${PRIMARY_NODE} after updating it"
|
||||||
|
fi
|
||||||
|
|
||||||
# Move to repo root if we're not
|
# Move to repo root if we're not
|
||||||
pushd $( git rev-parse --show-toplevel ) &>/dev/null
|
pushd $( git rev-parse --show-toplevel ) &>/dev/null
|
||||||
|
|
||||||
# Prepare code
|
# Prepare code
|
||||||
echo "Preparing code (format and lint)..."
|
echo "> Preparing code (format and lint)..."
|
||||||
./format || exit 1
|
./format || exit 1
|
||||||
./lint || exit 1
|
./lint || exit 1
|
||||||
|
|
||||||
# Build the packages
|
# Build the packages
|
||||||
echo -n "Building packages..."
|
echo -n "> Building packages..."
|
||||||
version="$( ./build-unstable-deb.sh 2>/dev/null )"
|
version="$( ./build-unstable-deb.sh 2>/dev/null )"
|
||||||
echo " done. Package version ${version}."
|
echo " done. Package version ${version}."
|
||||||
|
|
||||||
# Install the client(s) locally
|
# Install the client(s) locally
|
||||||
echo -n "Installing client packages locally..."
|
echo -n "> Installing client packages locally..."
|
||||||
$SUDO dpkg -i --force-all ../pvc-client*_${version}*.deb &>/dev/null
|
$SUDO dpkg -i --force-all ../pvc-client*_${version}*.deb &>/dev/null
|
||||||
echo " done".
|
echo " done".
|
||||||
|
|
||||||
|
echo "> Copying packages..."
|
||||||
for HOST in ${HOSTS[@]}; do
|
for HOST in ${HOSTS[@]}; do
|
||||||
echo -n "Copying packages to host ${HOST}..."
|
echo -n ">>> Copying packages to host ${HOST}..."
|
||||||
ssh $HOST $SUDO rm -rf /tmp/pvc &>/dev/null
|
ssh $HOST $SUDO rm -rf /tmp/pvc &>/dev/null
|
||||||
ssh $HOST mkdir /tmp/pvc &>/dev/null
|
ssh $HOST mkdir /tmp/pvc &>/dev/null
|
||||||
scp ../pvc-*_${version}*.deb $HOST:/tmp/pvc/ &>/dev/null
|
scp ../pvc-*_${version}*.deb $HOST:/tmp/pvc/ &>/dev/null
|
||||||
|
@ -57,26 +71,34 @@ if [[ -z ${KEEP_ARTIFACTS} ]]; then
|
||||||
fi
|
fi
|
||||||
|
|
||||||
for HOST in ${HOSTS[@]}; do
|
for HOST in ${HOSTS[@]}; do
|
||||||
echo "> Deploying packages to host ${HOST}"
|
echo "> Deploying packages on host ${HOST}"
|
||||||
echo -n "Installing packages..."
|
echo -n ">>> Installing packages..."
|
||||||
ssh $HOST $SUDO dpkg -i --force-all /tmp/pvc/*.deb &>/dev/null
|
ssh $HOST $SUDO dpkg -i --force-all /tmp/pvc/*.deb &>/dev/null
|
||||||
ssh $HOST rm -rf /tmp/pvc &>/dev/null
|
ssh $HOST rm -rf /tmp/pvc &>/dev/null
|
||||||
echo " done."
|
echo " done."
|
||||||
echo -n "Restarting PVC daemons..."
|
echo -n ">>> Restarting PVC daemons..."
|
||||||
ssh $HOST $SUDO systemctl restart pvcapid &>/dev/null
|
ssh $HOST $SUDO systemctl restart pvcapid &>/dev/null
|
||||||
sleep 2
|
sleep 2
|
||||||
ssh $HOST $SUDO systemctl restart pvcworkerd &>/dev/null
|
ssh $HOST $SUDO systemctl restart pvcworkerd &>/dev/null
|
||||||
|
if [[ -z ${API_ONLY} ]]; then
|
||||||
sleep 2
|
sleep 2
|
||||||
ssh $HOST $SUDO systemctl restart pvchealthd &>/dev/null
|
ssh $HOST $SUDO systemctl restart pvchealthd &>/dev/null
|
||||||
# sleep 2
|
sleep 2
|
||||||
# ssh $HOST $SUDO systemctl restart pvcnoded &>/dev/null
|
ssh $HOST $SUDO systemctl restart pvcnoded &>/dev/null
|
||||||
echo " done."
|
echo " done."
|
||||||
echo -n "Waiting for node daemon to be running..."
|
echo -n ">>> Waiting for node daemon to be running..."
|
||||||
while [[ $( ssh $HOST "pvc -q node list -f json ${HOST%%.*} | jq -r '.[].daemon_state'" 2>/dev/null ) != "run" ]]; do
|
while [[ $( ssh $HOST "pvc -q node list -f json ${HOST%%.*} | jq -r '.[].daemon_state'" 2>/dev/null ) != "run" ]]; do
|
||||||
sleep 5
|
sleep 5
|
||||||
echo -n "."
|
echo -n "."
|
||||||
done
|
done
|
||||||
|
fi
|
||||||
echo " done."
|
echo " done."
|
||||||
|
if [[ -n ${PRIMARY_NODE} && ${PRIMARY_NODE} == ${HOST} ]]; then
|
||||||
|
echo -n ">>> Setting node $HOST to primary coordinator state... "
|
||||||
|
ssh $HOST pvc -q node primary --wait &>/dev/null
|
||||||
|
ssh $HOST $SUDO systemctl restart pvcworkerd &>/dev/null
|
||||||
|
echo "done."
|
||||||
|
fi
|
||||||
done
|
done
|
||||||
|
|
||||||
popd &>/dev/null
|
popd &>/dev/null
|
||||||
|
|
File diff suppressed because it is too large
Load Diff
|
@ -83,6 +83,37 @@ def cli_cluster_status_format_pretty(CLI_CONFIG, data):
|
||||||
total_volumes = data.get("volumes", 0)
|
total_volumes = data.get("volumes", 0)
|
||||||
total_snapshots = data.get("snapshots", 0)
|
total_snapshots = data.get("snapshots", 0)
|
||||||
|
|
||||||
|
total_cpu_total = data.get("resources", {}).get("cpu", {}).get("total", 0)
|
||||||
|
total_cpu_load = data.get("resources", {}).get("cpu", {}).get("load", 0)
|
||||||
|
total_cpu_utilization = (
|
||||||
|
data.get("resources", {}).get("cpu", {}).get("utilization", 0)
|
||||||
|
)
|
||||||
|
total_cpu_string = (
|
||||||
|
f"{total_cpu_utilization:.1f}% ({total_cpu_load:.1f} / {total_cpu_total})"
|
||||||
|
)
|
||||||
|
|
||||||
|
total_memory_total = (
|
||||||
|
data.get("resources", {}).get("memory", {}).get("total", 0) / 1024
|
||||||
|
)
|
||||||
|
total_memory_used = (
|
||||||
|
data.get("resources", {}).get("memory", {}).get("used", 0) / 1024
|
||||||
|
)
|
||||||
|
total_memory_utilization = (
|
||||||
|
data.get("resources", {}).get("memory", {}).get("utilization", 0)
|
||||||
|
)
|
||||||
|
total_memory_string = f"{total_memory_utilization:.1f}% ({total_memory_used:.1f} GB / {total_memory_total:.1f} GB)"
|
||||||
|
|
||||||
|
total_disk_total = (
|
||||||
|
data.get("resources", {}).get("disk", {}).get("total", 0) / 1024 / 1024
|
||||||
|
)
|
||||||
|
total_disk_used = (
|
||||||
|
data.get("resources", {}).get("disk", {}).get("used", 0) / 1024 / 1024
|
||||||
|
)
|
||||||
|
total_disk_utilization = round(
|
||||||
|
data.get("resources", {}).get("disk", {}).get("utilization", 0)
|
||||||
|
)
|
||||||
|
total_disk_string = f"{total_disk_utilization:.1f}% ({total_disk_used:.1f} GB / {total_disk_total:.1f} GB)"
|
||||||
|
|
||||||
if maintenance == "true" or health == -1:
|
if maintenance == "true" or health == -1:
|
||||||
health_colour = ansii["blue"]
|
health_colour = ansii["blue"]
|
||||||
elif health > 90:
|
elif health > 90:
|
||||||
|
@ -94,9 +125,6 @@ def cli_cluster_status_format_pretty(CLI_CONFIG, data):
|
||||||
|
|
||||||
output = list()
|
output = list()
|
||||||
|
|
||||||
output.append(f"{ansii['bold']}PVC cluster status:{ansii['end']}")
|
|
||||||
output.append("")
|
|
||||||
|
|
||||||
output.append(f"{ansii['purple']}Primary node:{ansii['end']} {primary_node}")
|
output.append(f"{ansii['purple']}Primary node:{ansii['end']} {primary_node}")
|
||||||
output.append(f"{ansii['purple']}PVC version:{ansii['end']} {pvc_version}")
|
output.append(f"{ansii['purple']}PVC version:{ansii['end']} {pvc_version}")
|
||||||
output.append(f"{ansii['purple']}Upstream IP:{ansii['end']} {upstream_ip}")
|
output.append(f"{ansii['purple']}Upstream IP:{ansii['end']} {upstream_ip}")
|
||||||
|
@ -136,7 +164,17 @@ def cli_cluster_status_format_pretty(CLI_CONFIG, data):
|
||||||
)
|
)
|
||||||
|
|
||||||
messages = "\n ".join(message_list)
|
messages = "\n ".join(message_list)
|
||||||
output.append(f"{ansii['purple']}Active Faults:{ansii['end']} {messages}")
|
else:
|
||||||
|
messages = "None"
|
||||||
|
output.append(f"{ansii['purple']}Active faults:{ansii['end']} {messages}")
|
||||||
|
|
||||||
|
output.append(f"{ansii['purple']}Total CPU:{ansii['end']} {total_cpu_string}")
|
||||||
|
|
||||||
|
output.append(
|
||||||
|
f"{ansii['purple']}Total memory:{ansii['end']} {total_memory_string}"
|
||||||
|
)
|
||||||
|
|
||||||
|
output.append(f"{ansii['purple']}Total disk:{ansii['end']} {total_disk_string}")
|
||||||
|
|
||||||
output.append("")
|
output.append("")
|
||||||
|
|
||||||
|
@ -168,12 +206,12 @@ def cli_cluster_status_format_pretty(CLI_CONFIG, data):
|
||||||
|
|
||||||
output.append(f"{ansii['purple']}Nodes:{ansii['end']} {nodes_string}")
|
output.append(f"{ansii['purple']}Nodes:{ansii['end']} {nodes_string}")
|
||||||
|
|
||||||
vm_states = ["start", "disable"]
|
vm_states = ["start", "disable", "mirror"]
|
||||||
vm_states.extend(
|
vm_states.extend(
|
||||||
[
|
[
|
||||||
state
|
state
|
||||||
for state in data.get("vms", {}).keys()
|
for state in data.get("vms", {}).keys()
|
||||||
if state not in ["total", "start", "disable"]
|
if state not in ["total", "start", "disable", "mirror"]
|
||||||
]
|
]
|
||||||
)
|
)
|
||||||
|
|
||||||
|
@ -183,8 +221,10 @@ def cli_cluster_status_format_pretty(CLI_CONFIG, data):
|
||||||
continue
|
continue
|
||||||
if state in ["start"]:
|
if state in ["start"]:
|
||||||
state_colour = ansii["green"]
|
state_colour = ansii["green"]
|
||||||
elif state in ["migrate", "disable", "provision"]:
|
elif state in ["migrate", "disable", "provision", "mirror"]:
|
||||||
state_colour = ansii["blue"]
|
state_colour = ansii["blue"]
|
||||||
|
elif state in ["mirror"]:
|
||||||
|
state_colour = ansii["purple"]
|
||||||
elif state in ["stop", "fail"]:
|
elif state in ["stop", "fail"]:
|
||||||
state_colour = ansii["red"]
|
state_colour = ansii["red"]
|
||||||
else:
|
else:
|
||||||
|
@ -258,9 +298,6 @@ def cli_cluster_status_format_short(CLI_CONFIG, data):
|
||||||
|
|
||||||
output = list()
|
output = list()
|
||||||
|
|
||||||
output.append(f"{ansii['bold']}PVC cluster status:{ansii['end']}")
|
|
||||||
output.append("")
|
|
||||||
|
|
||||||
if health != "-1":
|
if health != "-1":
|
||||||
health = f"{health}%"
|
health = f"{health}%"
|
||||||
else:
|
else:
|
||||||
|
@ -295,7 +332,48 @@ def cli_cluster_status_format_short(CLI_CONFIG, data):
|
||||||
)
|
)
|
||||||
|
|
||||||
messages = "\n ".join(message_list)
|
messages = "\n ".join(message_list)
|
||||||
output.append(f"{ansii['purple']}Active Faults:{ansii['end']} {messages}")
|
else:
|
||||||
|
messages = "None"
|
||||||
|
output.append(f"{ansii['purple']}Active faults:{ansii['end']} {messages}")
|
||||||
|
|
||||||
|
total_cpu_total = data.get("resources", {}).get("cpu", {}).get("total", 0)
|
||||||
|
total_cpu_load = data.get("resources", {}).get("cpu", {}).get("load", 0)
|
||||||
|
total_cpu_utilization = (
|
||||||
|
data.get("resources", {}).get("cpu", {}).get("utilization", 0)
|
||||||
|
)
|
||||||
|
total_cpu_string = (
|
||||||
|
f"{total_cpu_utilization:.1f}% ({total_cpu_load:.1f} / {total_cpu_total})"
|
||||||
|
)
|
||||||
|
|
||||||
|
total_memory_total = (
|
||||||
|
data.get("resources", {}).get("memory", {}).get("total", 0) / 1024
|
||||||
|
)
|
||||||
|
total_memory_used = (
|
||||||
|
data.get("resources", {}).get("memory", {}).get("used", 0) / 1024
|
||||||
|
)
|
||||||
|
total_memory_utilization = (
|
||||||
|
data.get("resources", {}).get("memory", {}).get("utilization", 0)
|
||||||
|
)
|
||||||
|
total_memory_string = f"{total_memory_utilization:.1f}% ({total_memory_used:.1f} GB / {total_memory_total:.1f} GB)"
|
||||||
|
|
||||||
|
total_disk_total = (
|
||||||
|
data.get("resources", {}).get("disk", {}).get("total", 0) / 1024 / 1024
|
||||||
|
)
|
||||||
|
total_disk_used = (
|
||||||
|
data.get("resources", {}).get("disk", {}).get("used", 0) / 1024 / 1024
|
||||||
|
)
|
||||||
|
total_disk_utilization = round(
|
||||||
|
data.get("resources", {}).get("disk", {}).get("utilization", 0)
|
||||||
|
)
|
||||||
|
total_disk_string = f"{total_disk_utilization:.1f}% ({total_disk_used:.1f} GB / {total_disk_total:.1f} GB)"
|
||||||
|
|
||||||
|
output.append(f"{ansii['purple']}CPU usage:{ansii['end']} {total_cpu_string}")
|
||||||
|
|
||||||
|
output.append(
|
||||||
|
f"{ansii['purple']}Memory usage:{ansii['end']} {total_memory_string}"
|
||||||
|
)
|
||||||
|
|
||||||
|
output.append(f"{ansii['purple']}Disk usage:{ansii['end']} {total_disk_string}")
|
||||||
|
|
||||||
output.append("")
|
output.append("")
|
||||||
|
|
||||||
|
@ -580,9 +658,11 @@ def cli_cluster_fault_list_format_long(CLI_CONFIG, fault_data):
|
||||||
fault_id=fault["id"],
|
fault_id=fault["id"],
|
||||||
fault_status=fault["status"].title(),
|
fault_status=fault["status"].title(),
|
||||||
fault_health_delta=f"-{fault['health_delta']}%",
|
fault_health_delta=f"-{fault['health_delta']}%",
|
||||||
fault_acknowledged_at=fault["acknowledged_at"]
|
fault_acknowledged_at=(
|
||||||
|
fault["acknowledged_at"]
|
||||||
if fault["acknowledged_at"] != ""
|
if fault["acknowledged_at"] != ""
|
||||||
else "N/A",
|
else "N/A"
|
||||||
|
),
|
||||||
fault_last_reported=fault["last_reported"],
|
fault_last_reported=fault["last_reported"],
|
||||||
fault_first_reported=fault["first_reported"],
|
fault_first_reported=fault["first_reported"],
|
||||||
)
|
)
|
||||||
|
@ -645,6 +725,24 @@ def cli_cluster_task_format_pretty(CLI_CONFIG, task_data):
|
||||||
if _task_type_length > task_type_length:
|
if _task_type_length > task_type_length:
|
||||||
task_type_length = _task_type_length
|
task_type_length = _task_type_length
|
||||||
|
|
||||||
|
for arg_name, arg_data in task["kwargs"].items():
|
||||||
|
# Skip the "run_on" argument
|
||||||
|
if arg_name == "run_on":
|
||||||
|
continue
|
||||||
|
|
||||||
|
# task_arg_name column
|
||||||
|
_task_arg_name_length = len(str(arg_name)) + 1
|
||||||
|
if _task_arg_name_length > task_arg_name_length:
|
||||||
|
task_arg_name_length = _task_arg_name_length
|
||||||
|
|
||||||
|
task_header_length = (
|
||||||
|
task_id_length + task_name_length + task_type_length + task_worker_length + 3
|
||||||
|
)
|
||||||
|
max_task_data_length = (
|
||||||
|
MAX_CONTENT_WIDTH - task_header_length - task_arg_name_length - 2
|
||||||
|
)
|
||||||
|
|
||||||
|
for task in task_data:
|
||||||
updated_kwargs = list()
|
updated_kwargs = list()
|
||||||
for arg_name, arg_data in task["kwargs"].items():
|
for arg_name, arg_data in task["kwargs"].items():
|
||||||
# Skip the "run_on" argument
|
# Skip the "run_on" argument
|
||||||
|
@ -656,8 +754,22 @@ def cli_cluster_task_format_pretty(CLI_CONFIG, task_data):
|
||||||
if _task_arg_name_length > task_arg_name_length:
|
if _task_arg_name_length > task_arg_name_length:
|
||||||
task_arg_name_length = _task_arg_name_length
|
task_arg_name_length = _task_arg_name_length
|
||||||
|
|
||||||
if len(str(arg_data)) > 17:
|
if isinstance(arg_data, list):
|
||||||
arg_data = arg_data[:17] + "..."
|
for subarg_data in arg_data:
|
||||||
|
if len(subarg_data) > max_task_data_length:
|
||||||
|
subarg_data = (
|
||||||
|
str(subarg_data[: max_task_data_length - 4]) + " ..."
|
||||||
|
)
|
||||||
|
|
||||||
|
# task_arg_data column
|
||||||
|
_task_arg_data_length = len(str(subarg_data)) + 1
|
||||||
|
if _task_arg_data_length > task_arg_data_length:
|
||||||
|
task_arg_data_length = _task_arg_data_length
|
||||||
|
|
||||||
|
updated_kwargs.append({"name": arg_name, "data": subarg_data})
|
||||||
|
else:
|
||||||
|
if len(str(arg_data)) > 24:
|
||||||
|
arg_data = str(arg_data[:24]) + " ..."
|
||||||
|
|
||||||
# task_arg_data column
|
# task_arg_data column
|
||||||
_task_arg_data_length = len(str(arg_data)) + 1
|
_task_arg_data_length = len(str(arg_data)) + 1
|
||||||
|
@ -665,6 +777,7 @@ def cli_cluster_task_format_pretty(CLI_CONFIG, task_data):
|
||||||
task_arg_data_length = _task_arg_data_length
|
task_arg_data_length = _task_arg_data_length
|
||||||
|
|
||||||
updated_kwargs.append({"name": arg_name, "data": arg_data})
|
updated_kwargs.append({"name": arg_name, "data": arg_data})
|
||||||
|
|
||||||
task["kwargs"] = updated_kwargs
|
task["kwargs"] = updated_kwargs
|
||||||
tasks.append(task)
|
tasks.append(task)
|
||||||
|
|
||||||
|
@ -792,7 +905,7 @@ def cli_connection_list_format_pretty(CLI_CONFIG, data):
|
||||||
# Parse each connection and adjust field lengths
|
# Parse each connection and adjust field lengths
|
||||||
for connection in data:
|
for connection in data:
|
||||||
for field, length in [(f, fields[f]["length"]) for f in fields]:
|
for field, length in [(f, fields[f]["length"]) for f in fields]:
|
||||||
_length = len(str(connection[field]))
|
_length = len(str(connection[field])) + 1
|
||||||
if _length > length:
|
if _length > length:
|
||||||
length = len(str(connection[field])) + 1
|
length = len(str(connection[field])) + 1
|
||||||
|
|
||||||
|
@ -892,7 +1005,7 @@ def cli_connection_detail_format_pretty(CLI_CONFIG, data):
|
||||||
# Parse each connection and adjust field lengths
|
# Parse each connection and adjust field lengths
|
||||||
for connection in data:
|
for connection in data:
|
||||||
for field, length in [(f, fields[f]["length"]) for f in fields]:
|
for field, length in [(f, fields[f]["length"]) for f in fields]:
|
||||||
_length = len(str(connection[field]))
|
_length = len(str(connection[field])) + 1
|
||||||
if _length > length:
|
if _length > length:
|
||||||
length = len(str(connection[field])) + 1
|
length = len(str(connection[field])) + 1
|
||||||
|
|
||||||
|
|
|
@ -20,25 +20,16 @@
|
||||||
###############################################################################
|
###############################################################################
|
||||||
|
|
||||||
from click import echo as click_echo
|
from click import echo as click_echo
|
||||||
from click import confirm
|
|
||||||
from datetime import datetime
|
|
||||||
from distutils.util import strtobool
|
from distutils.util import strtobool
|
||||||
from getpass import getuser
|
|
||||||
from json import load as jload
|
from json import load as jload
|
||||||
from json import dump as jdump
|
from json import dump as jdump
|
||||||
from os import chmod, environ, getpid, path, makedirs, get_terminal_size
|
from os import chmod, environ, getpid, path, get_terminal_size
|
||||||
from re import findall
|
|
||||||
from socket import gethostname
|
from socket import gethostname
|
||||||
from subprocess import run, PIPE
|
|
||||||
from sys import argv
|
from sys import argv
|
||||||
from syslog import syslog, openlog, closelog, LOG_AUTH
|
from syslog import syslog, openlog, closelog, LOG_AUTH
|
||||||
from yaml import load as yload
|
from yaml import load as yload
|
||||||
from yaml import SafeLoader
|
from yaml import SafeLoader
|
||||||
|
|
||||||
import pvc.lib.provisioner
|
|
||||||
import pvc.lib.vm
|
|
||||||
import pvc.lib.node
|
|
||||||
|
|
||||||
|
|
||||||
DEFAULT_STORE_DATA = {"cfgfile": "/etc/pvc/pvc.conf"}
|
DEFAULT_STORE_DATA = {"cfgfile": "/etc/pvc/pvc.conf"}
|
||||||
DEFAULT_STORE_FILENAME = "pvc.json"
|
DEFAULT_STORE_FILENAME = "pvc.json"
|
||||||
|
@ -176,9 +167,17 @@ def get_store(store_path):
|
||||||
with open(store_file) as fh:
|
with open(store_file) as fh:
|
||||||
try:
|
try:
|
||||||
store_data = jload(fh)
|
store_data = jload(fh)
|
||||||
return store_data
|
|
||||||
except Exception:
|
except Exception:
|
||||||
return dict()
|
store_data = dict()
|
||||||
|
|
||||||
|
if path.exists(DEFAULT_STORE_DATA["cfgfile"]):
|
||||||
|
if store_data.get("local", None) != DEFAULT_STORE_DATA:
|
||||||
|
del store_data["local"]
|
||||||
|
if "local" not in store_data.keys():
|
||||||
|
store_data["local"] = DEFAULT_STORE_DATA
|
||||||
|
update_store(store_path, store_data)
|
||||||
|
|
||||||
|
return store_data
|
||||||
|
|
||||||
|
|
||||||
def update_store(store_path, store_data):
|
def update_store(store_path, store_data):
|
||||||
|
@ -195,322 +194,3 @@ def update_store(store_path, store_data):
|
||||||
|
|
||||||
with open(store_file, "w") as fh:
|
with open(store_file, "w") as fh:
|
||||||
jdump(store_data, fh, sort_keys=True, indent=4)
|
jdump(store_data, fh, sort_keys=True, indent=4)
|
||||||
|
|
||||||
|
|
||||||
def get_autobackup_config(CLI_CONFIG, cfgfile):
|
|
||||||
try:
|
|
||||||
config = dict()
|
|
||||||
with open(cfgfile) as fh:
|
|
||||||
backup_config = yload(fh, Loader=SafeLoader)["autobackup"]
|
|
||||||
|
|
||||||
config["backup_root_path"] = backup_config["backup_root_path"]
|
|
||||||
config["backup_root_suffix"] = backup_config["backup_root_suffix"]
|
|
||||||
config["backup_tags"] = backup_config["backup_tags"]
|
|
||||||
config["backup_schedule"] = backup_config["backup_schedule"]
|
|
||||||
config["auto_mount_enabled"] = backup_config["auto_mount"]["enabled"]
|
|
||||||
if config["auto_mount_enabled"]:
|
|
||||||
config["mount_cmds"] = list()
|
|
||||||
_mount_cmds = backup_config["auto_mount"]["mount_cmds"]
|
|
||||||
for _mount_cmd in _mount_cmds:
|
|
||||||
if "{backup_root_path}" in _mount_cmd:
|
|
||||||
_mount_cmd = _mount_cmd.format(
|
|
||||||
backup_root_path=backup_config["backup_root_path"]
|
|
||||||
)
|
|
||||||
config["mount_cmds"].append(_mount_cmd)
|
|
||||||
|
|
||||||
config["unmount_cmds"] = list()
|
|
||||||
_unmount_cmds = backup_config["auto_mount"]["unmount_cmds"]
|
|
||||||
for _unmount_cmd in _unmount_cmds:
|
|
||||||
if "{backup_root_path}" in _unmount_cmd:
|
|
||||||
_unmount_cmd = _unmount_cmd.format(
|
|
||||||
backup_root_path=backup_config["backup_root_path"]
|
|
||||||
)
|
|
||||||
config["unmount_cmds"].append(_unmount_cmd)
|
|
||||||
|
|
||||||
except FileNotFoundError:
|
|
||||||
echo(CLI_CONFIG, "ERROR: Specified backup configuration does not exist!")
|
|
||||||
exit(1)
|
|
||||||
except KeyError as e:
|
|
||||||
echo(CLI_CONFIG, f"ERROR: Backup configuration is invalid: {e}")
|
|
||||||
exit(1)
|
|
||||||
|
|
||||||
return config
|
|
||||||
|
|
||||||
|
|
||||||
def vm_autobackup(
|
|
||||||
CLI_CONFIG,
|
|
||||||
autobackup_cfgfile=DEFAULT_AUTOBACKUP_FILENAME,
|
|
||||||
force_full_flag=False,
|
|
||||||
cron_flag=False,
|
|
||||||
):
|
|
||||||
"""
|
|
||||||
Perform automatic backups of VMs based on an external config file.
|
|
||||||
"""
|
|
||||||
|
|
||||||
# Validate that we are running on the current primary coordinator of the 'local' cluster connection
|
|
||||||
real_connection = CLI_CONFIG["connection"]
|
|
||||||
CLI_CONFIG["connection"] = "local"
|
|
||||||
retcode, retdata = pvc.lib.node.node_info(CLI_CONFIG, DEFAULT_NODE_HOSTNAME)
|
|
||||||
if not retcode or retdata.get("coordinator_state") != "primary":
|
|
||||||
if cron_flag:
|
|
||||||
echo(
|
|
||||||
CLI_CONFIG,
|
|
||||||
"Current host is not the primary coordinator of the local cluster and running in cron mode. Exiting cleanly.",
|
|
||||||
)
|
|
||||||
exit(0)
|
|
||||||
else:
|
|
||||||
echo(
|
|
||||||
CLI_CONFIG,
|
|
||||||
f"ERROR: Current host is not the primary coordinator of the local cluster; got connection '{real_connection}', host '{DEFAULT_NODE_HOSTNAME}'.",
|
|
||||||
)
|
|
||||||
echo(
|
|
||||||
CLI_CONFIG,
|
|
||||||
"Autobackup MUST be run from the cluster active primary coordinator using the 'local' connection. See '-h'/'--help' for details.",
|
|
||||||
)
|
|
||||||
exit(1)
|
|
||||||
|
|
||||||
# Ensure we're running as root, or show a warning & confirmation
|
|
||||||
if getuser() != "root":
|
|
||||||
confirm(
|
|
||||||
"WARNING: You are not running this command as 'root'. This command should be run under the same user as the API daemon, which is usually 'root'. Are you sure you want to continue?",
|
|
||||||
prompt_suffix=" ",
|
|
||||||
abort=True,
|
|
||||||
)
|
|
||||||
|
|
||||||
# Load our YAML config
|
|
||||||
autobackup_config = get_autobackup_config(CLI_CONFIG, autobackup_cfgfile)
|
|
||||||
|
|
||||||
# Get a list of all VMs on the cluster
|
|
||||||
# We don't do tag filtering here, because we could match an arbitrary number of tags; instead, we
|
|
||||||
# parse the list after
|
|
||||||
retcode, retdata = pvc.lib.vm.vm_list(CLI_CONFIG, None, None, None, None, None)
|
|
||||||
if not retcode:
|
|
||||||
echo(CLI_CONFIG, f"ERROR: Failed to fetch VM list: {retdata}")
|
|
||||||
exit(1)
|
|
||||||
cluster_vms = retdata
|
|
||||||
|
|
||||||
# Parse the list to match tags; too complex for list comprehension alas
|
|
||||||
backup_vms = list()
|
|
||||||
for vm in cluster_vms:
|
|
||||||
vm_tag_names = [t["name"] for t in vm["tags"]]
|
|
||||||
matching_tags = (
|
|
||||||
True
|
|
||||||
if len(
|
|
||||||
set(vm_tag_names).intersection(set(autobackup_config["backup_tags"]))
|
|
||||||
)
|
|
||||||
> 0
|
|
||||||
else False
|
|
||||||
)
|
|
||||||
if matching_tags:
|
|
||||||
backup_vms.append(vm["name"])
|
|
||||||
|
|
||||||
if len(backup_vms) < 1:
|
|
||||||
echo(CLI_CONFIG, "Found no suitable VMs for autobackup.")
|
|
||||||
exit(0)
|
|
||||||
|
|
||||||
# Pretty print the names of the VMs we'll back up (to stderr)
|
|
||||||
maxnamelen = max([len(n) for n in backup_vms]) + 2
|
|
||||||
cols = 1
|
|
||||||
while (cols * maxnamelen + maxnamelen + 2) <= MAX_CONTENT_WIDTH:
|
|
||||||
cols += 1
|
|
||||||
rows = len(backup_vms) // cols
|
|
||||||
vm_list_rows = list()
|
|
||||||
for row in range(0, rows + 1):
|
|
||||||
row_start = row * cols
|
|
||||||
row_end = (row * cols) + cols
|
|
||||||
row_str = ""
|
|
||||||
for x in range(row_start, row_end):
|
|
||||||
if x < len(backup_vms):
|
|
||||||
row_str += "{:<{}}".format(backup_vms[x], maxnamelen)
|
|
||||||
vm_list_rows.append(row_str)
|
|
||||||
|
|
||||||
echo(CLI_CONFIG, f"Found {len(backup_vms)} suitable VM(s) for autobackup.")
|
|
||||||
echo(CLI_CONFIG, "Full VM list:", stderr=True)
|
|
||||||
echo(CLI_CONFIG, " {}".format("\n ".join(vm_list_rows)), stderr=True)
|
|
||||||
echo(CLI_CONFIG, "", stderr=True)
|
|
||||||
|
|
||||||
if autobackup_config["auto_mount_enabled"]:
|
|
||||||
# Execute each mount_cmds command in sequence
|
|
||||||
for cmd in autobackup_config["mount_cmds"]:
|
|
||||||
echo(
|
|
||||||
CLI_CONFIG,
|
|
||||||
f"Executing mount command '{cmd.split()[0]}'... ",
|
|
||||||
newline=False,
|
|
||||||
)
|
|
||||||
tstart = datetime.now()
|
|
||||||
ret = run(
|
|
||||||
cmd.split(),
|
|
||||||
stdout=PIPE,
|
|
||||||
stderr=PIPE,
|
|
||||||
)
|
|
||||||
tend = datetime.now()
|
|
||||||
ttot = tend - tstart
|
|
||||||
if ret.returncode != 0:
|
|
||||||
echo(
|
|
||||||
CLI_CONFIG,
|
|
||||||
f"failed. [{ttot.seconds}s]",
|
|
||||||
)
|
|
||||||
echo(
|
|
||||||
CLI_CONFIG,
|
|
||||||
f"Exiting; command reports: {ret.stderr.decode().strip()}",
|
|
||||||
)
|
|
||||||
exit(1)
|
|
||||||
else:
|
|
||||||
echo(CLI_CONFIG, f"done. [{ttot.seconds}s]")
|
|
||||||
|
|
||||||
# For each VM, perform the backup
|
|
||||||
for vm in backup_vms:
|
|
||||||
backup_suffixed_path = f"{autobackup_config['backup_root_path']}{autobackup_config['backup_root_suffix']}"
|
|
||||||
if not path.exists(backup_suffixed_path):
|
|
||||||
makedirs(backup_suffixed_path)
|
|
||||||
|
|
||||||
backup_path = f"{backup_suffixed_path}/{vm}"
|
|
||||||
autobackup_state_file = f"{backup_path}/.autobackup.json"
|
|
||||||
if not path.exists(backup_path) or not path.exists(autobackup_state_file):
|
|
||||||
# There are no new backups so the list is empty
|
|
||||||
state_data = dict()
|
|
||||||
tracked_backups = list()
|
|
||||||
else:
|
|
||||||
with open(autobackup_state_file) as fh:
|
|
||||||
state_data = jload(fh)
|
|
||||||
tracked_backups = state_data["tracked_backups"]
|
|
||||||
|
|
||||||
full_interval = autobackup_config["backup_schedule"]["full_interval"]
|
|
||||||
full_retention = autobackup_config["backup_schedule"]["full_retention"]
|
|
||||||
|
|
||||||
full_backups = [b for b in tracked_backups if b["type"] == "full"]
|
|
||||||
if len(full_backups) > 0:
|
|
||||||
last_full_backup = full_backups[0]
|
|
||||||
last_full_backup_idx = tracked_backups.index(last_full_backup)
|
|
||||||
if force_full_flag:
|
|
||||||
this_backup_type = "forced-full"
|
|
||||||
this_backup_incremental_parent = None
|
|
||||||
this_backup_retain_snapshot = True
|
|
||||||
elif last_full_backup_idx >= full_interval - 1:
|
|
||||||
this_backup_type = "full"
|
|
||||||
this_backup_incremental_parent = None
|
|
||||||
this_backup_retain_snapshot = True
|
|
||||||
else:
|
|
||||||
this_backup_type = "incremental"
|
|
||||||
this_backup_incremental_parent = last_full_backup["datestring"]
|
|
||||||
this_backup_retain_snapshot = False
|
|
||||||
else:
|
|
||||||
# The very first backup must be full to start the tree
|
|
||||||
this_backup_type = "full"
|
|
||||||
this_backup_incremental_parent = None
|
|
||||||
this_backup_retain_snapshot = True
|
|
||||||
|
|
||||||
# Perform the backup
|
|
||||||
echo(
|
|
||||||
CLI_CONFIG,
|
|
||||||
f"Backing up VM '{vm}' ({this_backup_type})... ",
|
|
||||||
newline=False,
|
|
||||||
)
|
|
||||||
tstart = datetime.now()
|
|
||||||
retcode, retdata = pvc.lib.vm.vm_backup(
|
|
||||||
CLI_CONFIG,
|
|
||||||
vm,
|
|
||||||
backup_suffixed_path,
|
|
||||||
incremental_parent=this_backup_incremental_parent,
|
|
||||||
retain_snapshot=this_backup_retain_snapshot,
|
|
||||||
)
|
|
||||||
tend = datetime.now()
|
|
||||||
ttot = tend - tstart
|
|
||||||
if not retcode:
|
|
||||||
echo(CLI_CONFIG, f"failed. [{ttot.seconds}s]")
|
|
||||||
echo(CLI_CONFIG, f"Skipping cleanups; command reports: {retdata}")
|
|
||||||
continue
|
|
||||||
else:
|
|
||||||
backup_datestring = findall(r"[0-9]{14}", retdata)[0]
|
|
||||||
echo(
|
|
||||||
CLI_CONFIG,
|
|
||||||
f"done. Backup '{backup_datestring}' created. [{ttot.seconds}s]",
|
|
||||||
)
|
|
||||||
|
|
||||||
# Read backup file to get details
|
|
||||||
backup_json_file = f"{backup_path}/{backup_datestring}/pvcbackup.json"
|
|
||||||
with open(backup_json_file) as fh:
|
|
||||||
backup_json = jload(fh)
|
|
||||||
backup = {
|
|
||||||
"datestring": backup_json["datestring"],
|
|
||||||
"type": backup_json["type"],
|
|
||||||
"parent": backup_json["incremental_parent"],
|
|
||||||
"retained_snapshot": backup_json["retained_snapshot"],
|
|
||||||
}
|
|
||||||
tracked_backups.insert(0, backup)
|
|
||||||
|
|
||||||
# Delete any full backups that are expired
|
|
||||||
marked_for_deletion = list()
|
|
||||||
found_full_count = 0
|
|
||||||
for backup in tracked_backups:
|
|
||||||
if backup["type"] == "full":
|
|
||||||
found_full_count += 1
|
|
||||||
if found_full_count > full_retention:
|
|
||||||
marked_for_deletion.append(backup)
|
|
||||||
|
|
||||||
# Depete any incremental backups that depend on marked parents
|
|
||||||
for backup in tracked_backups:
|
|
||||||
if backup["type"] == "incremental" and backup["parent"] in [
|
|
||||||
b["datestring"] for b in marked_for_deletion
|
|
||||||
]:
|
|
||||||
marked_for_deletion.append(backup)
|
|
||||||
|
|
||||||
# Execute deletes
|
|
||||||
for backup_to_delete in marked_for_deletion:
|
|
||||||
echo(
|
|
||||||
CLI_CONFIG,
|
|
||||||
f"Removing old VM '{vm}' backup '{backup_to_delete['datestring']}' ({backup_to_delete['type']})... ",
|
|
||||||
newline=False,
|
|
||||||
)
|
|
||||||
tstart = datetime.now()
|
|
||||||
retcode, retdata = pvc.lib.vm.vm_remove_backup(
|
|
||||||
CLI_CONFIG,
|
|
||||||
vm,
|
|
||||||
backup_suffixed_path,
|
|
||||||
backup_to_delete["datestring"],
|
|
||||||
)
|
|
||||||
tend = datetime.now()
|
|
||||||
ttot = tend - tstart
|
|
||||||
if not retcode:
|
|
||||||
echo(CLI_CONFIG, f"failed. [{ttot.seconds}s]")
|
|
||||||
echo(
|
|
||||||
CLI_CONFIG,
|
|
||||||
f"Skipping removal from tracked backups; command reports: {retdata}",
|
|
||||||
)
|
|
||||||
continue
|
|
||||||
else:
|
|
||||||
tracked_backups.remove(backup_to_delete)
|
|
||||||
echo(CLI_CONFIG, f"done. [{ttot.seconds}s]")
|
|
||||||
|
|
||||||
# Update tracked state information
|
|
||||||
state_data["tracked_backups"] = tracked_backups
|
|
||||||
with open(autobackup_state_file, "w") as fh:
|
|
||||||
jdump(state_data, fh)
|
|
||||||
|
|
||||||
if autobackup_config["auto_mount_enabled"]:
|
|
||||||
# Execute each unmount_cmds command in sequence
|
|
||||||
for cmd in autobackup_config["unmount_cmds"]:
|
|
||||||
echo(
|
|
||||||
CLI_CONFIG,
|
|
||||||
f"Executing unmount command '{cmd.split()[0]}'... ",
|
|
||||||
newline=False,
|
|
||||||
)
|
|
||||||
tstart = datetime.now()
|
|
||||||
ret = run(
|
|
||||||
cmd.split(),
|
|
||||||
stdout=PIPE,
|
|
||||||
stderr=PIPE,
|
|
||||||
)
|
|
||||||
tend = datetime.now()
|
|
||||||
ttot = tend - tstart
|
|
||||||
if ret.returncode != 0:
|
|
||||||
echo(
|
|
||||||
CLI_CONFIG,
|
|
||||||
f"failed. [{ttot.seconds}s]",
|
|
||||||
)
|
|
||||||
echo(
|
|
||||||
CLI_CONFIG,
|
|
||||||
f"Continuing; command reports: {ret.stderr.decode().strip()}",
|
|
||||||
)
|
|
||||||
else:
|
|
||||||
echo(CLI_CONFIG, f"done. [{ttot.seconds}s]")
|
|
||||||
|
|
|
@ -68,7 +68,8 @@ def cli_connection_list_parser(connections_config, show_keys_flag):
|
||||||
}
|
}
|
||||||
)
|
)
|
||||||
|
|
||||||
return connections_data
|
# Return, ensuring local is always first
|
||||||
|
return sorted(connections_data, key=lambda x: (x.get("name") != "local"))
|
||||||
|
|
||||||
|
|
||||||
def cli_connection_detail_parser(connections_config):
|
def cli_connection_detail_parser(connections_config):
|
||||||
|
@ -121,4 +122,5 @@ def cli_connection_detail_parser(connections_config):
|
||||||
}
|
}
|
||||||
)
|
)
|
||||||
|
|
||||||
return connections_data
|
# Return, ensuring local is always first
|
||||||
|
return sorted(connections_data, key=lambda x: (x.get("name") != "local"))
|
||||||
|
|
|
@ -19,6 +19,8 @@
|
||||||
#
|
#
|
||||||
###############################################################################
|
###############################################################################
|
||||||
|
|
||||||
|
import sys
|
||||||
|
|
||||||
from click import progressbar
|
from click import progressbar
|
||||||
from time import sleep, time
|
from time import sleep, time
|
||||||
|
|
||||||
|
@ -105,7 +107,7 @@ def wait_for_celery_task(CLI_CONFIG, task_detail, start_late=False):
|
||||||
|
|
||||||
# Start following the task state, updating progress as we go
|
# Start following the task state, updating progress as we go
|
||||||
total_task = task_status.get("total")
|
total_task = task_status.get("total")
|
||||||
with progressbar(length=total_task, show_eta=False) as bar:
|
with progressbar(length=total_task, width=20, show_eta=False) as bar:
|
||||||
last_task = 0
|
last_task = 0
|
||||||
maxlen = 21
|
maxlen = 21
|
||||||
echo(
|
echo(
|
||||||
|
@ -115,26 +117,39 @@ def wait_for_celery_task(CLI_CONFIG, task_detail, start_late=False):
|
||||||
)
|
)
|
||||||
while True:
|
while True:
|
||||||
sleep(0.5)
|
sleep(0.5)
|
||||||
|
|
||||||
|
task_status = pvc.lib.common.task_status(
|
||||||
|
CLI_CONFIG, task_id=task_id, is_watching=True
|
||||||
|
)
|
||||||
|
|
||||||
|
if isinstance(task_status, tuple):
|
||||||
|
continue
|
||||||
if task_status.get("state") != "RUNNING":
|
if task_status.get("state") != "RUNNING":
|
||||||
break
|
break
|
||||||
if task_status.get("current") > last_task:
|
if task_status.get("current") == 0:
|
||||||
|
continue
|
||||||
|
|
||||||
current_task = int(task_status.get("current"))
|
current_task = int(task_status.get("current"))
|
||||||
|
total_task = int(task_status.get("total"))
|
||||||
|
bar.length = total_task
|
||||||
|
|
||||||
|
if current_task > last_task:
|
||||||
bar.update(current_task - last_task)
|
bar.update(current_task - last_task)
|
||||||
last_task = current_task
|
last_task = current_task
|
||||||
# The extensive spaces at the end cause this to overwrite longer previous messages
|
|
||||||
curlen = len(str(task_status.get("status")))
|
curlen = len(str(task_status.get("status")))
|
||||||
if curlen > maxlen:
|
if curlen > maxlen:
|
||||||
maxlen = curlen
|
maxlen = curlen
|
||||||
lendiff = maxlen - curlen
|
lendiff = maxlen - curlen
|
||||||
overwrite_whitespace = " " * lendiff
|
overwrite_whitespace = " " * lendiff
|
||||||
echo(
|
|
||||||
CLI_CONFIG,
|
percent_complete = (current_task / total_task) * 100
|
||||||
" " + task_status.get("status") + overwrite_whitespace,
|
bar_output = f"[{bar.format_bar()}] {percent_complete:3.0f}%"
|
||||||
newline=False,
|
sys.stdout.write(
|
||||||
)
|
f"\r {bar_output} {task_status['status']}{overwrite_whitespace}"
|
||||||
task_status = pvc.lib.common.task_status(
|
|
||||||
CLI_CONFIG, task_id=task_id, is_watching=True
|
|
||||||
)
|
)
|
||||||
|
sys.stdout.flush()
|
||||||
|
|
||||||
if task_status.get("state") == "SUCCESS":
|
if task_status.get("state") == "SUCCESS":
|
||||||
bar.update(total_task - last_task)
|
bar.update(total_task - last_task)
|
||||||
|
|
||||||
|
|
|
@ -21,6 +21,8 @@
|
||||||
|
|
||||||
import json
|
import json
|
||||||
|
|
||||||
|
from time import sleep
|
||||||
|
|
||||||
from pvc.lib.common import call_api
|
from pvc.lib.common import call_api
|
||||||
|
|
||||||
|
|
||||||
|
@ -114,3 +116,22 @@ def get_info(config):
|
||||||
return True, response.json()
|
return True, response.json()
|
||||||
else:
|
else:
|
||||||
return False, response.json().get("message", "")
|
return False, response.json().get("message", "")
|
||||||
|
|
||||||
|
|
||||||
|
def get_primary_node(config):
|
||||||
|
"""
|
||||||
|
Get the current primary node of the PVC cluster
|
||||||
|
|
||||||
|
API endpoint: GET /api/v1/status/primary_node
|
||||||
|
API arguments:
|
||||||
|
API schema: {json_data_object}
|
||||||
|
"""
|
||||||
|
while True:
|
||||||
|
response = call_api(config, "get", "/status/primary_node")
|
||||||
|
resp_code = response.status_code
|
||||||
|
if resp_code == 200:
|
||||||
|
break
|
||||||
|
else:
|
||||||
|
sleep(1)
|
||||||
|
|
||||||
|
return True, response.json()["primary_node"]
|
||||||
|
|
|
@ -83,7 +83,7 @@ class UploadProgressBar(object):
|
||||||
else:
|
else:
|
||||||
self.end_suffix = ""
|
self.end_suffix = ""
|
||||||
|
|
||||||
self.bar = click.progressbar(length=self.length, show_eta=True)
|
self.bar = click.progressbar(length=self.length, width=20, show_eta=True)
|
||||||
|
|
||||||
def update(self, monitor):
|
def update(self, monitor):
|
||||||
bytes_cur = monitor.bytes_read
|
bytes_cur = monitor.bytes_read
|
||||||
|
@ -108,9 +108,10 @@ class UploadProgressBar(object):
|
||||||
|
|
||||||
|
|
||||||
class ErrorResponse(requests.Response):
|
class ErrorResponse(requests.Response):
|
||||||
def __init__(self, json_data, status_code):
|
def __init__(self, json_data, status_code, headers):
|
||||||
self.json_data = json_data
|
self.json_data = json_data
|
||||||
self.status_code = status_code
|
self.status_code = status_code
|
||||||
|
self.headers = headers
|
||||||
|
|
||||||
def json(self):
|
def json(self):
|
||||||
return self.json_data
|
return self.json_data
|
||||||
|
@ -140,7 +141,12 @@ def call_api(
|
||||||
# Determine the request type and hit the API
|
# Determine the request type and hit the API
|
||||||
disable_warnings()
|
disable_warnings()
|
||||||
try:
|
try:
|
||||||
|
response = None
|
||||||
if operation == "get":
|
if operation == "get":
|
||||||
|
retry_on_code = [429, 500, 502, 503, 504]
|
||||||
|
for i in range(3):
|
||||||
|
failed = False
|
||||||
|
try:
|
||||||
response = requests.get(
|
response = requests.get(
|
||||||
uri,
|
uri,
|
||||||
timeout=timeout,
|
timeout=timeout,
|
||||||
|
@ -149,6 +155,18 @@ def call_api(
|
||||||
data=data,
|
data=data,
|
||||||
verify=config["verify_ssl"],
|
verify=config["verify_ssl"],
|
||||||
)
|
)
|
||||||
|
if response.status_code in retry_on_code:
|
||||||
|
failed = True
|
||||||
|
continue
|
||||||
|
break
|
||||||
|
except requests.exceptions.ConnectionError:
|
||||||
|
failed = True
|
||||||
|
continue
|
||||||
|
if failed:
|
||||||
|
error = f"Code {response.status_code}" if response else "Timeout"
|
||||||
|
raise requests.exceptions.ConnectionError(
|
||||||
|
f"Failed to connect after 3 tries ({error})"
|
||||||
|
)
|
||||||
if operation == "post":
|
if operation == "post":
|
||||||
response = requests.post(
|
response = requests.post(
|
||||||
uri,
|
uri,
|
||||||
|
@ -189,7 +207,8 @@ def call_api(
|
||||||
)
|
)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
message = "Failed to connect to the API: {}".format(e)
|
message = "Failed to connect to the API: {}".format(e)
|
||||||
response = ErrorResponse({"message": message}, 500)
|
code = response.status_code if response else 504
|
||||||
|
response = ErrorResponse({"message": message}, code, None)
|
||||||
|
|
||||||
# Display debug output
|
# Display debug output
|
||||||
if config["debug"]:
|
if config["debug"]:
|
||||||
|
|
|
@ -779,7 +779,8 @@ def format_list_template_system(template_data):
|
||||||
template_node_limit_length = 6
|
template_node_limit_length = 6
|
||||||
template_node_selector_length = 9
|
template_node_selector_length = 9
|
||||||
template_node_autostart_length = 10
|
template_node_autostart_length = 10
|
||||||
template_migration_method_length = 10
|
template_migration_method_length = 12
|
||||||
|
template_migration_max_downtime_length = 13
|
||||||
|
|
||||||
for template in template_data:
|
for template in template_data:
|
||||||
# template_name column
|
# template_name column
|
||||||
|
@ -826,6 +827,17 @@ def format_list_template_system(template_data):
|
||||||
_template_migration_method_length = len(str(template["migration_method"])) + 1
|
_template_migration_method_length = len(str(template["migration_method"])) + 1
|
||||||
if _template_migration_method_length > template_migration_method_length:
|
if _template_migration_method_length > template_migration_method_length:
|
||||||
template_migration_method_length = _template_migration_method_length
|
template_migration_method_length = _template_migration_method_length
|
||||||
|
# template_migration_max_downtime column
|
||||||
|
_template_migration_max_downtime_length = (
|
||||||
|
len(str(template["migration_max_downtime"])) + 1
|
||||||
|
)
|
||||||
|
if (
|
||||||
|
_template_migration_max_downtime_length
|
||||||
|
> template_migration_max_downtime_length
|
||||||
|
):
|
||||||
|
template_migration_max_downtime_length = (
|
||||||
|
_template_migration_max_downtime_length
|
||||||
|
)
|
||||||
|
|
||||||
# Format the string (header)
|
# Format the string (header)
|
||||||
template_list_output.append(
|
template_list_output.append(
|
||||||
|
@ -842,7 +854,8 @@ def format_list_template_system(template_data):
|
||||||
+ template_node_selector_length
|
+ template_node_selector_length
|
||||||
+ template_node_autostart_length
|
+ template_node_autostart_length
|
||||||
+ template_migration_method_length
|
+ template_migration_method_length
|
||||||
+ 3,
|
+ template_migration_max_downtime_length
|
||||||
|
+ 4,
|
||||||
template_header="System Templates "
|
template_header="System Templates "
|
||||||
+ "".join(
|
+ "".join(
|
||||||
["-" for _ in range(17, template_name_length + template_id_length)]
|
["-" for _ in range(17, template_name_length + template_id_length)]
|
||||||
|
@ -874,7 +887,8 @@ def format_list_template_system(template_data):
|
||||||
+ template_node_selector_length
|
+ template_node_selector_length
|
||||||
+ template_node_autostart_length
|
+ template_node_autostart_length
|
||||||
+ template_migration_method_length
|
+ template_migration_method_length
|
||||||
+ 2,
|
+ template_migration_max_downtime_length
|
||||||
|
+ 3,
|
||||||
)
|
)
|
||||||
]
|
]
|
||||||
),
|
),
|
||||||
|
@ -891,7 +905,8 @@ def format_list_template_system(template_data):
|
||||||
{template_node_limit: <{template_node_limit_length}} \
|
{template_node_limit: <{template_node_limit_length}} \
|
||||||
{template_node_selector: <{template_node_selector_length}} \
|
{template_node_selector: <{template_node_selector_length}} \
|
||||||
{template_node_autostart: <{template_node_autostart_length}} \
|
{template_node_autostart: <{template_node_autostart_length}} \
|
||||||
{template_migration_method: <{template_migration_method_length}}{end_bold}".format(
|
{template_migration_method: <{template_migration_method_length}} \
|
||||||
|
{template_migration_max_downtime: <{template_migration_max_downtime_length}}{end_bold}".format(
|
||||||
bold=ansiprint.bold(),
|
bold=ansiprint.bold(),
|
||||||
end_bold=ansiprint.end(),
|
end_bold=ansiprint.end(),
|
||||||
template_name_length=template_name_length,
|
template_name_length=template_name_length,
|
||||||
|
@ -905,6 +920,7 @@ def format_list_template_system(template_data):
|
||||||
template_node_selector_length=template_node_selector_length,
|
template_node_selector_length=template_node_selector_length,
|
||||||
template_node_autostart_length=template_node_autostart_length,
|
template_node_autostart_length=template_node_autostart_length,
|
||||||
template_migration_method_length=template_migration_method_length,
|
template_migration_method_length=template_migration_method_length,
|
||||||
|
template_migration_max_downtime_length=template_migration_max_downtime_length,
|
||||||
template_name="Name",
|
template_name="Name",
|
||||||
template_id="ID",
|
template_id="ID",
|
||||||
template_vcpu="vCPUs",
|
template_vcpu="vCPUs",
|
||||||
|
@ -915,7 +931,8 @@ def format_list_template_system(template_data):
|
||||||
template_node_limit="Limit",
|
template_node_limit="Limit",
|
||||||
template_node_selector="Selector",
|
template_node_selector="Selector",
|
||||||
template_node_autostart="Autostart",
|
template_node_autostart="Autostart",
|
||||||
template_migration_method="Migration",
|
template_migration_method="Mig. Method",
|
||||||
|
template_migration_max_downtime="Max Downtime",
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
|
||||||
|
@ -931,7 +948,8 @@ def format_list_template_system(template_data):
|
||||||
{template_node_limit: <{template_node_limit_length}} \
|
{template_node_limit: <{template_node_limit_length}} \
|
||||||
{template_node_selector: <{template_node_selector_length}} \
|
{template_node_selector: <{template_node_selector_length}} \
|
||||||
{template_node_autostart: <{template_node_autostart_length}} \
|
{template_node_autostart: <{template_node_autostart_length}} \
|
||||||
{template_migration_method: <{template_migration_method_length}}{end_bold}".format(
|
{template_migration_method: <{template_migration_method_length}} \
|
||||||
|
{template_migration_max_downtime: <{template_migration_max_downtime_length}}{end_bold}".format(
|
||||||
template_name_length=template_name_length,
|
template_name_length=template_name_length,
|
||||||
template_id_length=template_id_length,
|
template_id_length=template_id_length,
|
||||||
template_vcpu_length=template_vcpu_length,
|
template_vcpu_length=template_vcpu_length,
|
||||||
|
@ -943,6 +961,7 @@ def format_list_template_system(template_data):
|
||||||
template_node_selector_length=template_node_selector_length,
|
template_node_selector_length=template_node_selector_length,
|
||||||
template_node_autostart_length=template_node_autostart_length,
|
template_node_autostart_length=template_node_autostart_length,
|
||||||
template_migration_method_length=template_migration_method_length,
|
template_migration_method_length=template_migration_method_length,
|
||||||
|
template_migration_max_downtime_length=template_migration_max_downtime_length,
|
||||||
bold="",
|
bold="",
|
||||||
end_bold="",
|
end_bold="",
|
||||||
template_name=str(template["name"]),
|
template_name=str(template["name"]),
|
||||||
|
@ -956,6 +975,7 @@ def format_list_template_system(template_data):
|
||||||
template_node_selector=str(template["node_selector"]),
|
template_node_selector=str(template["node_selector"]),
|
||||||
template_node_autostart=str(template["node_autostart"]),
|
template_node_autostart=str(template["node_autostart"]),
|
||||||
template_migration_method=str(template["migration_method"]),
|
template_migration_method=str(template["migration_method"]),
|
||||||
|
template_migration_max_downtime=f"{str(template['migration_max_downtime'])} ms",
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
|
@ -30,6 +30,7 @@ from requests_toolbelt.multipart.encoder import (
|
||||||
|
|
||||||
import pvc.lib.ansiprint as ansiprint
|
import pvc.lib.ansiprint as ansiprint
|
||||||
from pvc.lib.common import UploadProgressBar, call_api, get_wait_retdata
|
from pvc.lib.common import UploadProgressBar, call_api, get_wait_retdata
|
||||||
|
from pvc.cli.helpers import MAX_CONTENT_WIDTH
|
||||||
|
|
||||||
#
|
#
|
||||||
# Supplemental functions
|
# Supplemental functions
|
||||||
|
@ -430,7 +431,9 @@ def format_list_osd(config, osd_list):
|
||||||
)
|
)
|
||||||
continue
|
continue
|
||||||
|
|
||||||
if osd_information["is_split"]:
|
if osd_information.get("is_split") is not None and osd_information.get(
|
||||||
|
"is_split"
|
||||||
|
):
|
||||||
osd_information["device"] = f"{osd_information['device']} [s]"
|
osd_information["device"] = f"{osd_information['device']} [s]"
|
||||||
|
|
||||||
# Deal with the size to human readable
|
# Deal with the size to human readable
|
||||||
|
@ -1172,15 +1175,15 @@ def ceph_volume_list(config, limit, pool):
|
||||||
return False, response.json().get("message", "")
|
return False, response.json().get("message", "")
|
||||||
|
|
||||||
|
|
||||||
def ceph_volume_add(config, pool, volume, size):
|
def ceph_volume_add(config, pool, volume, size, force_flag=False):
|
||||||
"""
|
"""
|
||||||
Add new Ceph volume
|
Add new Ceph volume
|
||||||
|
|
||||||
API endpoint: POST /api/v1/storage/ceph/volume
|
API endpoint: POST /api/v1/storage/ceph/volume
|
||||||
API arguments: volume={volume}, pool={pool}, size={size}
|
API arguments: volume={volume}, pool={pool}, size={size}, force={force_flag}
|
||||||
API schema: {"message":"{data}"}
|
API schema: {"message":"{data}"}
|
||||||
"""
|
"""
|
||||||
params = {"volume": volume, "pool": pool, "size": size}
|
params = {"volume": volume, "pool": pool, "size": size, "force": force_flag}
|
||||||
response = call_api(config, "post", "/storage/ceph/volume", params=params)
|
response = call_api(config, "post", "/storage/ceph/volume", params=params)
|
||||||
|
|
||||||
if response.status_code == 200:
|
if response.status_code == 200:
|
||||||
|
@ -1261,12 +1264,14 @@ def ceph_volume_remove(config, pool, volume):
|
||||||
return retstatus, response.json().get("message", "")
|
return retstatus, response.json().get("message", "")
|
||||||
|
|
||||||
|
|
||||||
def ceph_volume_modify(config, pool, volume, new_name=None, new_size=None):
|
def ceph_volume_modify(
|
||||||
|
config, pool, volume, new_name=None, new_size=None, force_flag=False
|
||||||
|
):
|
||||||
"""
|
"""
|
||||||
Modify Ceph volume
|
Modify Ceph volume
|
||||||
|
|
||||||
API endpoint: PUT /api/v1/storage/ceph/volume/{pool}/{volume}
|
API endpoint: PUT /api/v1/storage/ceph/volume/{pool}/{volume}
|
||||||
API arguments:
|
API arguments: [new_name={new_name}], [new_size={new_size}], force_flag={force_flag}
|
||||||
API schema: {"message":"{data}"}
|
API schema: {"message":"{data}"}
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
@ -1275,6 +1280,7 @@ def ceph_volume_modify(config, pool, volume, new_name=None, new_size=None):
|
||||||
params["new_name"] = new_name
|
params["new_name"] = new_name
|
||||||
if new_size:
|
if new_size:
|
||||||
params["new_size"] = new_size
|
params["new_size"] = new_size
|
||||||
|
params["force"] = force_flag
|
||||||
|
|
||||||
response = call_api(
|
response = call_api(
|
||||||
config,
|
config,
|
||||||
|
@ -1291,15 +1297,15 @@ def ceph_volume_modify(config, pool, volume, new_name=None, new_size=None):
|
||||||
return retstatus, response.json().get("message", "")
|
return retstatus, response.json().get("message", "")
|
||||||
|
|
||||||
|
|
||||||
def ceph_volume_clone(config, pool, volume, new_volume):
|
def ceph_volume_clone(config, pool, volume, new_volume, force_flag=False):
|
||||||
"""
|
"""
|
||||||
Clone Ceph volume
|
Clone Ceph volume
|
||||||
|
|
||||||
API endpoint: POST /api/v1/storage/ceph/volume/{pool}/{volume}
|
API endpoint: POST /api/v1/storage/ceph/volume/{pool}/{volume}
|
||||||
API arguments: new_volume={new_volume
|
API arguments: new_volume={new_volume, force_flag={force_flag}
|
||||||
API schema: {"message":"{data}"}
|
API schema: {"message":"{data}"}
|
||||||
"""
|
"""
|
||||||
params = {"new_volume": new_volume}
|
params = {"new_volume": new_volume, "force_flag": force_flag}
|
||||||
response = call_api(
|
response = call_api(
|
||||||
config,
|
config,
|
||||||
"post",
|
"post",
|
||||||
|
@ -1539,6 +1545,30 @@ def ceph_snapshot_add(config, pool, volume, snapshot):
|
||||||
return retstatus, response.json().get("message", "")
|
return retstatus, response.json().get("message", "")
|
||||||
|
|
||||||
|
|
||||||
|
def ceph_snapshot_rollback(config, pool, volume, snapshot):
|
||||||
|
"""
|
||||||
|
Roll back Ceph volume to snapshot
|
||||||
|
|
||||||
|
API endpoint: POST /api/v1/storage/ceph/snapshot/{pool}/{volume}/{snapshot}/rollback
|
||||||
|
API arguments:
|
||||||
|
API schema: {"message":"{data}"}
|
||||||
|
"""
|
||||||
|
response = call_api(
|
||||||
|
config,
|
||||||
|
"post",
|
||||||
|
"/storage/ceph/snapshot/{pool}/{volume}/{snapshot}/rollback".format(
|
||||||
|
snapshot=snapshot, volume=volume, pool=pool
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
|
if response.status_code == 200:
|
||||||
|
retstatus = True
|
||||||
|
else:
|
||||||
|
retstatus = False
|
||||||
|
|
||||||
|
return retstatus, response.json().get("message", "")
|
||||||
|
|
||||||
|
|
||||||
def ceph_snapshot_remove(config, pool, volume, snapshot):
|
def ceph_snapshot_remove(config, pool, volume, snapshot):
|
||||||
"""
|
"""
|
||||||
Remove Ceph snapshot
|
Remove Ceph snapshot
|
||||||
|
@ -1695,15 +1725,17 @@ def format_list_snapshot(config, snapshot_list):
|
||||||
#
|
#
|
||||||
# Benchmark functions
|
# Benchmark functions
|
||||||
#
|
#
|
||||||
def ceph_benchmark_run(config, pool, wait_flag):
|
def ceph_benchmark_run(config, pool, name, wait_flag):
|
||||||
"""
|
"""
|
||||||
Run a storage benchmark against {pool}
|
Run a storage benchmark against {pool}
|
||||||
|
|
||||||
API endpoint: POST /api/v1/storage/ceph/benchmark
|
API endpoint: POST /api/v1/storage/ceph/benchmark
|
||||||
API arguments: pool={pool}
|
API arguments: pool={pool}, name={name}
|
||||||
API schema: {message}
|
API schema: {message}
|
||||||
"""
|
"""
|
||||||
params = {"pool": pool}
|
params = {"pool": pool}
|
||||||
|
if name:
|
||||||
|
params["name"] = name
|
||||||
response = call_api(config, "post", "/storage/ceph/benchmark", params=params)
|
response = call_api(config, "post", "/storage/ceph/benchmark", params=params)
|
||||||
|
|
||||||
return get_wait_retdata(response, wait_flag)
|
return get_wait_retdata(response, wait_flag)
|
||||||
|
@ -1775,7 +1807,7 @@ def get_benchmark_list_results(benchmark_format, benchmark_data):
|
||||||
benchmark_bandwidth, benchmark_iops = get_benchmark_list_results_legacy(
|
benchmark_bandwidth, benchmark_iops = get_benchmark_list_results_legacy(
|
||||||
benchmark_data
|
benchmark_data
|
||||||
)
|
)
|
||||||
elif benchmark_format == 1:
|
elif benchmark_format == 1 or benchmark_format == 2:
|
||||||
benchmark_bandwidth, benchmark_iops = get_benchmark_list_results_json(
|
benchmark_bandwidth, benchmark_iops = get_benchmark_list_results_json(
|
||||||
benchmark_data
|
benchmark_data
|
||||||
)
|
)
|
||||||
|
@ -1977,6 +2009,7 @@ def format_info_benchmark(config, benchmark_information):
|
||||||
benchmark_matrix = {
|
benchmark_matrix = {
|
||||||
0: format_info_benchmark_legacy,
|
0: format_info_benchmark_legacy,
|
||||||
1: format_info_benchmark_json,
|
1: format_info_benchmark_json,
|
||||||
|
2: format_info_benchmark_json,
|
||||||
}
|
}
|
||||||
|
|
||||||
benchmark_version = benchmark_information[0]["test_format"]
|
benchmark_version = benchmark_information[0]["test_format"]
|
||||||
|
@ -2311,12 +2344,15 @@ def format_info_benchmark_json(config, benchmark_information):
|
||||||
if benchmark_information["benchmark_result"] == "Running":
|
if benchmark_information["benchmark_result"] == "Running":
|
||||||
return "Benchmark test is still running."
|
return "Benchmark test is still running."
|
||||||
|
|
||||||
|
benchmark_format = benchmark_information["test_format"]
|
||||||
benchmark_details = benchmark_information["benchmark_result"]
|
benchmark_details = benchmark_information["benchmark_result"]
|
||||||
|
|
||||||
# Format a nice output; do this line-by-line then concat the elements at the end
|
# Format a nice output; do this line-by-line then concat the elements at the end
|
||||||
ainformation = []
|
ainformation = []
|
||||||
ainformation.append(
|
ainformation.append(
|
||||||
"{}Storage Benchmark details:{}".format(ansiprint.bold(), ansiprint.end())
|
"{}Storage Benchmark details (format {}):{}".format(
|
||||||
|
ansiprint.bold(), benchmark_format, ansiprint.end()
|
||||||
|
)
|
||||||
)
|
)
|
||||||
|
|
||||||
nice_test_name_map = {
|
nice_test_name_map = {
|
||||||
|
@ -2364,7 +2400,7 @@ def format_info_benchmark_json(config, benchmark_information):
|
||||||
if element[1] != 0:
|
if element[1] != 0:
|
||||||
useful_latency_tree.append(element)
|
useful_latency_tree.append(element)
|
||||||
|
|
||||||
max_rows = 9
|
max_rows = 5
|
||||||
if len(useful_latency_tree) > 9:
|
if len(useful_latency_tree) > 9:
|
||||||
max_rows = len(useful_latency_tree)
|
max_rows = len(useful_latency_tree)
|
||||||
elif len(useful_latency_tree) < 9:
|
elif len(useful_latency_tree) < 9:
|
||||||
|
@ -2373,15 +2409,10 @@ def format_info_benchmark_json(config, benchmark_information):
|
||||||
|
|
||||||
# Format the static data
|
# Format the static data
|
||||||
overall_label = [
|
overall_label = [
|
||||||
"Overall BW/s:",
|
"BW/s:",
|
||||||
"Overall IOPS:",
|
"IOPS:",
|
||||||
"Total I/O:",
|
"I/O:",
|
||||||
"Runtime (s):",
|
"Time:",
|
||||||
"User CPU %:",
|
|
||||||
"System CPU %:",
|
|
||||||
"Ctx Switches:",
|
|
||||||
"Major Faults:",
|
|
||||||
"Minor Faults:",
|
|
||||||
]
|
]
|
||||||
while len(overall_label) < max_rows:
|
while len(overall_label) < max_rows:
|
||||||
overall_label.append("")
|
overall_label.append("")
|
||||||
|
@ -2390,68 +2421,149 @@ def format_info_benchmark_json(config, benchmark_information):
|
||||||
format_bytes_tohuman(int(job_details[io_class]["bw_bytes"])),
|
format_bytes_tohuman(int(job_details[io_class]["bw_bytes"])),
|
||||||
format_ops_tohuman(int(job_details[io_class]["iops"])),
|
format_ops_tohuman(int(job_details[io_class]["iops"])),
|
||||||
format_bytes_tohuman(int(job_details[io_class]["io_bytes"])),
|
format_bytes_tohuman(int(job_details[io_class]["io_bytes"])),
|
||||||
job_details["job_runtime"] / 1000,
|
str(job_details["job_runtime"] / 1000) + "s",
|
||||||
job_details["usr_cpu"],
|
|
||||||
job_details["sys_cpu"],
|
|
||||||
job_details["ctx"],
|
|
||||||
job_details["majf"],
|
|
||||||
job_details["minf"],
|
|
||||||
]
|
]
|
||||||
while len(overall_data) < max_rows:
|
while len(overall_data) < max_rows:
|
||||||
overall_data.append("")
|
overall_data.append("")
|
||||||
|
|
||||||
|
cpu_label = [
|
||||||
|
"Total:",
|
||||||
|
"User:",
|
||||||
|
"Sys:",
|
||||||
|
"OSD:",
|
||||||
|
"MON:",
|
||||||
|
]
|
||||||
|
while len(cpu_label) < max_rows:
|
||||||
|
cpu_label.append("")
|
||||||
|
|
||||||
|
cpu_data = [
|
||||||
|
(
|
||||||
|
benchmark_details[test]["avg_cpu_util_percent"]["total"]
|
||||||
|
if benchmark_format > 1
|
||||||
|
else "N/A"
|
||||||
|
),
|
||||||
|
round(job_details["usr_cpu"], 2),
|
||||||
|
round(job_details["sys_cpu"], 2),
|
||||||
|
(
|
||||||
|
benchmark_details[test]["avg_cpu_util_percent"]["ceph-osd"]
|
||||||
|
if benchmark_format > 1
|
||||||
|
else "N/A"
|
||||||
|
),
|
||||||
|
(
|
||||||
|
benchmark_details[test]["avg_cpu_util_percent"]["ceph-mon"]
|
||||||
|
if benchmark_format > 1
|
||||||
|
else "N/A"
|
||||||
|
),
|
||||||
|
]
|
||||||
|
while len(cpu_data) < max_rows:
|
||||||
|
cpu_data.append("")
|
||||||
|
|
||||||
|
memory_label = [
|
||||||
|
"Total:",
|
||||||
|
"OSD:",
|
||||||
|
"MON:",
|
||||||
|
]
|
||||||
|
while len(memory_label) < max_rows:
|
||||||
|
memory_label.append("")
|
||||||
|
|
||||||
|
memory_data = [
|
||||||
|
(
|
||||||
|
benchmark_details[test]["avg_memory_util_percent"]["total"]
|
||||||
|
if benchmark_format > 1
|
||||||
|
else "N/A"
|
||||||
|
),
|
||||||
|
(
|
||||||
|
benchmark_details[test]["avg_memory_util_percent"]["ceph-osd"]
|
||||||
|
if benchmark_format > 1
|
||||||
|
else "N/A"
|
||||||
|
),
|
||||||
|
(
|
||||||
|
benchmark_details[test]["avg_memory_util_percent"]["ceph-mon"]
|
||||||
|
if benchmark_format > 1
|
||||||
|
else "N/A"
|
||||||
|
),
|
||||||
|
]
|
||||||
|
while len(memory_data) < max_rows:
|
||||||
|
memory_data.append("")
|
||||||
|
|
||||||
|
network_label = [
|
||||||
|
"Total:",
|
||||||
|
"Sent:",
|
||||||
|
"Recv:",
|
||||||
|
]
|
||||||
|
while len(network_label) < max_rows:
|
||||||
|
network_label.append("")
|
||||||
|
|
||||||
|
network_data = [
|
||||||
|
(
|
||||||
|
format_bytes_tohuman(
|
||||||
|
int(benchmark_details[test]["avg_network_util_bps"]["total"])
|
||||||
|
)
|
||||||
|
if benchmark_format > 1
|
||||||
|
else "N/A"
|
||||||
|
),
|
||||||
|
(
|
||||||
|
format_bytes_tohuman(
|
||||||
|
int(benchmark_details[test]["avg_network_util_bps"]["sent"])
|
||||||
|
)
|
||||||
|
if benchmark_format > 1
|
||||||
|
else "N/A"
|
||||||
|
),
|
||||||
|
(
|
||||||
|
format_bytes_tohuman(
|
||||||
|
int(benchmark_details[test]["avg_network_util_bps"]["recv"])
|
||||||
|
)
|
||||||
|
if benchmark_format > 1
|
||||||
|
else "N/A"
|
||||||
|
),
|
||||||
|
]
|
||||||
|
while len(network_data) < max_rows:
|
||||||
|
network_data.append("")
|
||||||
|
|
||||||
bandwidth_label = [
|
bandwidth_label = [
|
||||||
"Min:",
|
"Min:",
|
||||||
"Max:",
|
"Max:",
|
||||||
"Mean:",
|
"Mean:",
|
||||||
"StdDev:",
|
"StdDev:",
|
||||||
"Samples:",
|
"Samples:",
|
||||||
"",
|
|
||||||
"",
|
|
||||||
"",
|
|
||||||
"",
|
|
||||||
]
|
]
|
||||||
while len(bandwidth_label) < max_rows:
|
while len(bandwidth_label) < max_rows:
|
||||||
bandwidth_label.append("")
|
bandwidth_label.append("")
|
||||||
|
|
||||||
bandwidth_data = [
|
bandwidth_data = [
|
||||||
format_bytes_tohuman(int(job_details[io_class]["bw_min"]) * 1024),
|
format_bytes_tohuman(int(job_details[io_class]["bw_min"]) * 1024)
|
||||||
format_bytes_tohuman(int(job_details[io_class]["bw_max"]) * 1024),
|
+ " / "
|
||||||
format_bytes_tohuman(int(job_details[io_class]["bw_mean"]) * 1024),
|
+ format_ops_tohuman(int(job_details[io_class]["iops_min"])),
|
||||||
format_bytes_tohuman(int(job_details[io_class]["bw_dev"]) * 1024),
|
format_bytes_tohuman(int(job_details[io_class]["bw_max"]) * 1024)
|
||||||
job_details[io_class]["bw_samples"],
|
+ " / "
|
||||||
"",
|
+ format_ops_tohuman(int(job_details[io_class]["iops_max"])),
|
||||||
"",
|
format_bytes_tohuman(int(job_details[io_class]["bw_mean"]) * 1024)
|
||||||
"",
|
+ " / "
|
||||||
"",
|
+ format_ops_tohuman(int(job_details[io_class]["iops_mean"])),
|
||||||
|
format_bytes_tohuman(int(job_details[io_class]["bw_dev"]) * 1024)
|
||||||
|
+ " / "
|
||||||
|
+ format_ops_tohuman(int(job_details[io_class]["iops_stddev"])),
|
||||||
|
str(job_details[io_class]["bw_samples"])
|
||||||
|
+ " / "
|
||||||
|
+ str(job_details[io_class]["iops_samples"]),
|
||||||
]
|
]
|
||||||
while len(bandwidth_data) < max_rows:
|
while len(bandwidth_data) < max_rows:
|
||||||
bandwidth_data.append("")
|
bandwidth_data.append("")
|
||||||
|
|
||||||
iops_data = [
|
lat_label = [
|
||||||
format_ops_tohuman(int(job_details[io_class]["iops_min"])),
|
"Min:",
|
||||||
format_ops_tohuman(int(job_details[io_class]["iops_max"])),
|
"Max:",
|
||||||
format_ops_tohuman(int(job_details[io_class]["iops_mean"])),
|
"Mean:",
|
||||||
format_ops_tohuman(int(job_details[io_class]["iops_stddev"])),
|
"StdDev:",
|
||||||
job_details[io_class]["iops_samples"],
|
|
||||||
"",
|
|
||||||
"",
|
|
||||||
"",
|
|
||||||
"",
|
|
||||||
]
|
]
|
||||||
while len(iops_data) < max_rows:
|
while len(lat_label) < max_rows:
|
||||||
iops_data.append("")
|
lat_label.append("")
|
||||||
|
|
||||||
lat_data = [
|
lat_data = [
|
||||||
int(job_details[io_class]["lat_ns"]["min"]) / 1000,
|
int(job_details[io_class]["lat_ns"]["min"]) / 1000,
|
||||||
int(job_details[io_class]["lat_ns"]["max"]) / 1000,
|
int(job_details[io_class]["lat_ns"]["max"]) / 1000,
|
||||||
int(job_details[io_class]["lat_ns"]["mean"]) / 1000,
|
int(job_details[io_class]["lat_ns"]["mean"]) / 1000,
|
||||||
int(job_details[io_class]["lat_ns"]["stddev"]) / 1000,
|
int(job_details[io_class]["lat_ns"]["stddev"]) / 1000,
|
||||||
"",
|
|
||||||
"",
|
|
||||||
"",
|
|
||||||
"",
|
|
||||||
"",
|
|
||||||
]
|
]
|
||||||
while len(lat_data) < max_rows:
|
while len(lat_data) < max_rows:
|
||||||
lat_data.append("")
|
lat_data.append("")
|
||||||
|
@ -2460,98 +2572,119 @@ def format_info_benchmark_json(config, benchmark_information):
|
||||||
lat_bucket_label = list()
|
lat_bucket_label = list()
|
||||||
lat_bucket_data = list()
|
lat_bucket_data = list()
|
||||||
for element in useful_latency_tree:
|
for element in useful_latency_tree:
|
||||||
lat_bucket_label.append(element[0])
|
lat_bucket_label.append(element[0] + ":" if element[0] else "")
|
||||||
lat_bucket_data.append(element[1])
|
lat_bucket_data.append(round(float(element[1]), 2) if element[1] else "")
|
||||||
|
while len(lat_bucket_label) < max_rows:
|
||||||
|
lat_bucket_label.append("")
|
||||||
|
while len(lat_bucket_data) < max_rows:
|
||||||
|
lat_bucket_label.append("")
|
||||||
|
|
||||||
# Column default widths
|
# Column default widths
|
||||||
overall_label_length = 0
|
overall_label_length = 5
|
||||||
overall_column_length = 0
|
overall_column_length = 0
|
||||||
bandwidth_label_length = 0
|
cpu_label_length = 6
|
||||||
bandwidth_column_length = 11
|
cpu_column_length = 0
|
||||||
iops_column_length = 4
|
memory_label_length = 6
|
||||||
latency_column_length = 12
|
memory_column_length = 0
|
||||||
|
network_label_length = 6
|
||||||
|
network_column_length = 6
|
||||||
|
bandwidth_label_length = 8
|
||||||
|
bandwidth_column_length = 0
|
||||||
|
latency_label_length = 7
|
||||||
|
latency_column_length = 0
|
||||||
latency_bucket_label_length = 0
|
latency_bucket_label_length = 0
|
||||||
|
latency_bucket_column_length = 0
|
||||||
|
|
||||||
# Column layout:
|
# Column layout:
|
||||||
# General Bandwidth IOPS Latency Percentiles
|
# Overall CPU Memory Network Bandwidth/IOPS Latency Percentiles
|
||||||
# --------- ---------- -------- -------- ---------------
|
# --------- ----- ------- -------- -------------- -------- ---------------
|
||||||
# Size Min Min Min A
|
# BW Total Total Total Min Min A
|
||||||
# BW Max Max Max B
|
# IOPS Usr OSD Send Max Max B
|
||||||
# IOPS Mean Mean Mean ...
|
# Time Sys MON Recv Mean Mean ...
|
||||||
# Runtime StdDev StdDev StdDev Z
|
# Size OSD StdDev StdDev Z
|
||||||
# UsrCPU Samples Samples
|
# MON Samples
|
||||||
# SysCPU
|
|
||||||
# CtxSw
|
|
||||||
# MajFault
|
|
||||||
# MinFault
|
|
||||||
|
|
||||||
# Set column widths
|
# Set column widths
|
||||||
for item in overall_label:
|
|
||||||
_item_length = len(str(item))
|
|
||||||
if _item_length > overall_label_length:
|
|
||||||
overall_label_length = _item_length
|
|
||||||
|
|
||||||
for item in overall_data:
|
for item in overall_data:
|
||||||
_item_length = len(str(item))
|
_item_length = len(str(item))
|
||||||
if _item_length > overall_column_length:
|
if _item_length > overall_column_length:
|
||||||
overall_column_length = _item_length
|
overall_column_length = _item_length
|
||||||
|
|
||||||
test_name_length = len(nice_test_name_map[test])
|
for item in cpu_data:
|
||||||
if test_name_length > overall_label_length + overall_column_length:
|
|
||||||
_diff = test_name_length - (overall_label_length + overall_column_length)
|
|
||||||
overall_column_length += _diff
|
|
||||||
|
|
||||||
for item in bandwidth_label:
|
|
||||||
_item_length = len(str(item))
|
_item_length = len(str(item))
|
||||||
if _item_length > bandwidth_label_length:
|
if _item_length > cpu_column_length:
|
||||||
bandwidth_label_length = _item_length
|
cpu_column_length = _item_length
|
||||||
|
|
||||||
|
for item in memory_data:
|
||||||
|
_item_length = len(str(item))
|
||||||
|
if _item_length > memory_column_length:
|
||||||
|
memory_column_length = _item_length
|
||||||
|
|
||||||
|
for item in network_data:
|
||||||
|
_item_length = len(str(item))
|
||||||
|
if _item_length > network_column_length:
|
||||||
|
network_column_length = _item_length
|
||||||
|
|
||||||
for item in bandwidth_data:
|
for item in bandwidth_data:
|
||||||
_item_length = len(str(item))
|
_item_length = len(str(item))
|
||||||
if _item_length > bandwidth_column_length:
|
if _item_length > bandwidth_column_length:
|
||||||
bandwidth_column_length = _item_length
|
bandwidth_column_length = _item_length
|
||||||
|
|
||||||
for item in iops_data:
|
|
||||||
_item_length = len(str(item))
|
|
||||||
if _item_length > iops_column_length:
|
|
||||||
iops_column_length = _item_length
|
|
||||||
|
|
||||||
for item in lat_data:
|
for item in lat_data:
|
||||||
_item_length = len(str(item))
|
_item_length = len(str(item))
|
||||||
if _item_length > latency_column_length:
|
if _item_length > latency_column_length:
|
||||||
latency_column_length = _item_length
|
latency_column_length = _item_length
|
||||||
|
|
||||||
for item in lat_bucket_label:
|
for item in lat_bucket_data:
|
||||||
_item_length = len(str(item))
|
_item_length = len(str(item))
|
||||||
if _item_length > latency_bucket_label_length:
|
if _item_length > latency_bucket_column_length:
|
||||||
latency_bucket_label_length = _item_length
|
latency_bucket_column_length = _item_length
|
||||||
|
|
||||||
# Top row (Headers)
|
# Top row (Headers)
|
||||||
ainformation.append(
|
ainformation.append(
|
||||||
"{bold}\
|
"{bold}{overall_label: <{overall_label_length}} {header_fill}{end_bold}".format(
|
||||||
{overall_label: <{overall_label_length}} \
|
|
||||||
{bandwidth_label: <{bandwidth_label_length}} \
|
|
||||||
{bandwidth: <{bandwidth_length}} \
|
|
||||||
{iops: <{iops_length}} \
|
|
||||||
{latency: <{latency_length}} \
|
|
||||||
{latency_bucket_label: <{latency_bucket_label_length}} \
|
|
||||||
{latency_bucket} \
|
|
||||||
{end_bold}".format(
|
|
||||||
bold=ansiprint.bold(),
|
bold=ansiprint.bold(),
|
||||||
end_bold=ansiprint.end(),
|
end_bold=ansiprint.end(),
|
||||||
overall_label=nice_test_name_map[test],
|
overall_label=nice_test_name_map[test],
|
||||||
overall_label_length=overall_label_length,
|
overall_label_length=overall_label_length,
|
||||||
bandwidth_label="",
|
header_fill="-"
|
||||||
bandwidth_label_length=bandwidth_label_length,
|
* (
|
||||||
bandwidth="Bandwidth/s",
|
(MAX_CONTENT_WIDTH if MAX_CONTENT_WIDTH <= 120 else 120)
|
||||||
bandwidth_length=bandwidth_column_length,
|
- len(nice_test_name_map[test])
|
||||||
iops="IOPS",
|
- 4
|
||||||
iops_length=iops_column_length,
|
),
|
||||||
latency="Latency (μs)",
|
)
|
||||||
latency_length=latency_column_length,
|
)
|
||||||
latency_bucket_label="Latency Buckets (μs/%)",
|
|
||||||
latency_bucket_label_length=latency_bucket_label_length,
|
ainformation.append(
|
||||||
latency_bucket="",
|
"{bold}\
|
||||||
|
{overall_label: <{overall_label_length}} \
|
||||||
|
{cpu_label: <{cpu_label_length}} \
|
||||||
|
{memory_label: <{memory_label_length}} \
|
||||||
|
{network_label: <{network_label_length}} \
|
||||||
|
{bandwidth_label: <{bandwidth_label_length}} \
|
||||||
|
{latency_label: <{latency_label_length}} \
|
||||||
|
{latency_bucket_label: <{latency_bucket_label_length}}\
|
||||||
|
{end_bold}".format(
|
||||||
|
bold=ansiprint.bold(),
|
||||||
|
end_bold=ansiprint.end(),
|
||||||
|
overall_label="Overall",
|
||||||
|
overall_label_length=overall_label_length + overall_column_length + 1,
|
||||||
|
cpu_label="CPU (%)",
|
||||||
|
cpu_label_length=cpu_label_length + cpu_column_length + 1,
|
||||||
|
memory_label="Memory (%)",
|
||||||
|
memory_label_length=memory_label_length + memory_column_length + 1,
|
||||||
|
network_label="Network (bps)",
|
||||||
|
network_label_length=network_label_length + network_column_length + 1,
|
||||||
|
bandwidth_label="Bandwidth / IOPS",
|
||||||
|
bandwidth_label_length=bandwidth_label_length
|
||||||
|
+ bandwidth_column_length
|
||||||
|
+ 1,
|
||||||
|
latency_label="Latency (μs)",
|
||||||
|
latency_label_length=latency_label_length + latency_column_length + 1,
|
||||||
|
latency_bucket_label="Buckets (μs/%)",
|
||||||
|
latency_bucket_label_length=latency_bucket_label_length
|
||||||
|
+ latency_bucket_column_length,
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
|
||||||
|
@ -2559,14 +2692,20 @@ def format_info_benchmark_json(config, benchmark_information):
|
||||||
# Top row (Headers)
|
# Top row (Headers)
|
||||||
ainformation.append(
|
ainformation.append(
|
||||||
"{bold}\
|
"{bold}\
|
||||||
{overall_label: >{overall_label_length}} \
|
{overall_label: <{overall_label_length}} \
|
||||||
{overall: <{overall_length}} \
|
{overall: <{overall_length}} \
|
||||||
{bandwidth_label: >{bandwidth_label_length}} \
|
{cpu_label: <{cpu_label_length}} \
|
||||||
|
{cpu: <{cpu_length}} \
|
||||||
|
{memory_label: <{memory_label_length}} \
|
||||||
|
{memory: <{memory_length}} \
|
||||||
|
{network_label: <{network_label_length}} \
|
||||||
|
{network: <{network_length}} \
|
||||||
|
{bandwidth_label: <{bandwidth_label_length}} \
|
||||||
{bandwidth: <{bandwidth_length}} \
|
{bandwidth: <{bandwidth_length}} \
|
||||||
{iops: <{iops_length}} \
|
{latency_label: <{latency_label_length}} \
|
||||||
{latency: <{latency_length}} \
|
{latency: <{latency_length}} \
|
||||||
{latency_bucket_label: >{latency_bucket_label_length}} \
|
{latency_bucket_label: <{latency_bucket_label_length}} \
|
||||||
{latency_bucket} \
|
{latency_bucket}\
|
||||||
{end_bold}".format(
|
{end_bold}".format(
|
||||||
bold="",
|
bold="",
|
||||||
end_bold="",
|
end_bold="",
|
||||||
|
@ -2574,12 +2713,24 @@ def format_info_benchmark_json(config, benchmark_information):
|
||||||
overall_label_length=overall_label_length,
|
overall_label_length=overall_label_length,
|
||||||
overall=overall_data[idx],
|
overall=overall_data[idx],
|
||||||
overall_length=overall_column_length,
|
overall_length=overall_column_length,
|
||||||
|
cpu_label=cpu_label[idx],
|
||||||
|
cpu_label_length=cpu_label_length,
|
||||||
|
cpu=cpu_data[idx],
|
||||||
|
cpu_length=cpu_column_length,
|
||||||
|
memory_label=memory_label[idx],
|
||||||
|
memory_label_length=memory_label_length,
|
||||||
|
memory=memory_data[idx],
|
||||||
|
memory_length=memory_column_length,
|
||||||
|
network_label=network_label[idx],
|
||||||
|
network_label_length=network_label_length,
|
||||||
|
network=network_data[idx],
|
||||||
|
network_length=network_column_length,
|
||||||
bandwidth_label=bandwidth_label[idx],
|
bandwidth_label=bandwidth_label[idx],
|
||||||
bandwidth_label_length=bandwidth_label_length,
|
bandwidth_label_length=bandwidth_label_length,
|
||||||
bandwidth=bandwidth_data[idx],
|
bandwidth=bandwidth_data[idx],
|
||||||
bandwidth_length=bandwidth_column_length,
|
bandwidth_length=bandwidth_column_length,
|
||||||
iops=iops_data[idx],
|
latency_label=lat_label[idx],
|
||||||
iops_length=iops_column_length,
|
latency_label_length=latency_label_length,
|
||||||
latency=lat_data[idx],
|
latency=lat_data[idx],
|
||||||
latency_length=latency_column_length,
|
latency_length=latency_column_length,
|
||||||
latency_bucket_label=lat_bucket_label[idx],
|
latency_bucket_label=lat_bucket_label[idx],
|
||||||
|
@ -2588,4 +2739,4 @@ def format_info_benchmark_json(config, benchmark_information):
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
|
||||||
return "\n".join(ainformation)
|
return "\n".join(ainformation) + "\n"
|
||||||
|
|
|
@ -89,6 +89,7 @@ def vm_define(
|
||||||
node_selector,
|
node_selector,
|
||||||
node_autostart,
|
node_autostart,
|
||||||
migration_method,
|
migration_method,
|
||||||
|
migration_max_downtime,
|
||||||
user_tags,
|
user_tags,
|
||||||
protected_tags,
|
protected_tags,
|
||||||
):
|
):
|
||||||
|
@ -96,7 +97,7 @@ def vm_define(
|
||||||
Define a new VM on the cluster
|
Define a new VM on the cluster
|
||||||
|
|
||||||
API endpoint: POST /vm
|
API endpoint: POST /vm
|
||||||
API arguments: xml={xml}, node={node}, limit={node_limit}, selector={node_selector}, autostart={node_autostart}, migration_method={migration_method}, user_tags={user_tags}, protected_tags={protected_tags}
|
API arguments: xml={xml}, node={node}, limit={node_limit}, selector={node_selector}, autostart={node_autostart}, migration_method={migration_method}, migration_max_downtime={migration_max_downtime}, user_tags={user_tags}, protected_tags={protected_tags}
|
||||||
API schema: {"message":"{data}"}
|
API schema: {"message":"{data}"}
|
||||||
"""
|
"""
|
||||||
params = {
|
params = {
|
||||||
|
@ -105,6 +106,7 @@ def vm_define(
|
||||||
"selector": node_selector,
|
"selector": node_selector,
|
||||||
"autostart": node_autostart,
|
"autostart": node_autostart,
|
||||||
"migration_method": migration_method,
|
"migration_method": migration_method,
|
||||||
|
"migration_max_downtime": migration_max_downtime,
|
||||||
"user_tags": user_tags,
|
"user_tags": user_tags,
|
||||||
"protected_tags": protected_tags,
|
"protected_tags": protected_tags,
|
||||||
}
|
}
|
||||||
|
@ -205,6 +207,7 @@ def vm_metadata(
|
||||||
node_selector,
|
node_selector,
|
||||||
node_autostart,
|
node_autostart,
|
||||||
migration_method,
|
migration_method,
|
||||||
|
migration_max_downtime,
|
||||||
provisioner_profile,
|
provisioner_profile,
|
||||||
):
|
):
|
||||||
"""
|
"""
|
||||||
|
@ -229,6 +232,9 @@ def vm_metadata(
|
||||||
if migration_method is not None:
|
if migration_method is not None:
|
||||||
params["migration_method"] = migration_method
|
params["migration_method"] = migration_method
|
||||||
|
|
||||||
|
if migration_max_downtime is not None:
|
||||||
|
params["migration_max_downtime"] = migration_max_downtime
|
||||||
|
|
||||||
if provisioner_profile is not None:
|
if provisioner_profile is not None:
|
||||||
params["profile"] = provisioner_profile
|
params["profile"] = provisioner_profile
|
||||||
|
|
||||||
|
@ -377,8 +383,8 @@ def vm_state(config, vm, target_state, force=False, wait=False):
|
||||||
"""
|
"""
|
||||||
params = {
|
params = {
|
||||||
"state": target_state,
|
"state": target_state,
|
||||||
"force": str(force).lower(),
|
"force": force,
|
||||||
"wait": str(wait).lower(),
|
"wait": wait,
|
||||||
}
|
}
|
||||||
response = call_api(config, "post", "/vm/{vm}/state".format(vm=vm), params=params)
|
response = call_api(config, "post", "/vm/{vm}/state".format(vm=vm), params=params)
|
||||||
|
|
||||||
|
@ -415,7 +421,7 @@ def vm_node(config, vm, target_node, action, force=False, wait=False, force_live
|
||||||
return retstatus, response.json().get("message", "")
|
return retstatus, response.json().get("message", "")
|
||||||
|
|
||||||
|
|
||||||
def vm_locks(config, vm, wait_flag):
|
def vm_locks(config, vm, wait_flag=True):
|
||||||
"""
|
"""
|
||||||
Flush RBD locks of (stopped) VM
|
Flush RBD locks of (stopped) VM
|
||||||
|
|
||||||
|
@ -492,6 +498,222 @@ def vm_restore(config, vm, backup_path, backup_datestring, retain_snapshot=False
|
||||||
return True, response.json().get("message", "")
|
return True, response.json().get("message", "")
|
||||||
|
|
||||||
|
|
||||||
|
def vm_create_snapshot(config, vm, snapshot_name=None, wait_flag=True):
|
||||||
|
"""
|
||||||
|
Take a snapshot of a VM's disks and configuration
|
||||||
|
|
||||||
|
API endpoint: POST /vm/{vm}/snapshot
|
||||||
|
API arguments: snapshot_name=snapshot_name
|
||||||
|
API schema: {"message":"{data}"}
|
||||||
|
"""
|
||||||
|
params = dict()
|
||||||
|
if snapshot_name is not None:
|
||||||
|
params["snapshot_name"] = snapshot_name
|
||||||
|
response = call_api(
|
||||||
|
config, "post", "/vm/{vm}/snapshot".format(vm=vm), params=params
|
||||||
|
)
|
||||||
|
|
||||||
|
return get_wait_retdata(response, wait_flag)
|
||||||
|
|
||||||
|
|
||||||
|
def vm_remove_snapshot(config, vm, snapshot_name, wait_flag=True):
|
||||||
|
"""
|
||||||
|
Remove a snapshot of a VM's disks and configuration
|
||||||
|
|
||||||
|
API endpoint: DELETE /vm/{vm}/snapshot
|
||||||
|
API arguments: snapshot_name=snapshot_name
|
||||||
|
API schema: {"message":"{data}"}
|
||||||
|
"""
|
||||||
|
params = {"snapshot_name": snapshot_name}
|
||||||
|
response = call_api(
|
||||||
|
config, "delete", "/vm/{vm}/snapshot".format(vm=vm), params=params
|
||||||
|
)
|
||||||
|
|
||||||
|
return get_wait_retdata(response, wait_flag)
|
||||||
|
|
||||||
|
|
||||||
|
def vm_rollback_snapshot(config, vm, snapshot_name, wait_flag=True):
|
||||||
|
"""
|
||||||
|
Roll back to a snapshot of a VM's disks and configuration
|
||||||
|
|
||||||
|
API endpoint: POST /vm/{vm}/snapshot/rollback
|
||||||
|
API arguments: snapshot_name=snapshot_name
|
||||||
|
API schema: {"message":"{data}"}
|
||||||
|
"""
|
||||||
|
params = {"snapshot_name": snapshot_name}
|
||||||
|
response = call_api(
|
||||||
|
config, "post", "/vm/{vm}/snapshot/rollback".format(vm=vm), params=params
|
||||||
|
)
|
||||||
|
|
||||||
|
return get_wait_retdata(response, wait_flag)
|
||||||
|
|
||||||
|
|
||||||
|
def vm_export_snapshot(
|
||||||
|
config, vm, snapshot_name, export_path, incremental_parent=None, wait_flag=True
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Export an (existing) snapshot of a VM's disks and configuration to export_path, optionally
|
||||||
|
incremental with incremental_parent
|
||||||
|
|
||||||
|
API endpoint: POST /vm/{vm}/snapshot/export
|
||||||
|
API arguments: snapshot_name=snapshot_name, export_path=export_path, incremental_parent=incremental_parent
|
||||||
|
API schema: {"message":"{data}"}
|
||||||
|
"""
|
||||||
|
params = {
|
||||||
|
"snapshot_name": snapshot_name,
|
||||||
|
"export_path": export_path,
|
||||||
|
}
|
||||||
|
if incremental_parent is not None:
|
||||||
|
params["incremental_parent"] = incremental_parent
|
||||||
|
|
||||||
|
response = call_api(
|
||||||
|
config, "post", "/vm/{vm}/snapshot/export".format(vm=vm), params=params
|
||||||
|
)
|
||||||
|
|
||||||
|
return get_wait_retdata(response, wait_flag)
|
||||||
|
|
||||||
|
|
||||||
|
def vm_import_snapshot(
|
||||||
|
config, vm, snapshot_name, import_path, retain_snapshot=False, wait_flag=True
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Import a snapshot of {vm} and its volumes from a local primary coordinator filesystem path
|
||||||
|
|
||||||
|
API endpoint: POST /vm/{vm}/snapshot/import
|
||||||
|
API arguments: snapshot_name={snapshot_name}, import_path={import_path}, retain_snapshot={retain_snapshot}
|
||||||
|
API schema: {"message":"{data}"}
|
||||||
|
"""
|
||||||
|
params = {
|
||||||
|
"snapshot_name": snapshot_name,
|
||||||
|
"import_path": import_path,
|
||||||
|
"retain_snapshot": retain_snapshot,
|
||||||
|
}
|
||||||
|
response = call_api(
|
||||||
|
config, "post", "/vm/{vm}/snapshot/import".format(vm=vm), params=params
|
||||||
|
)
|
||||||
|
|
||||||
|
return get_wait_retdata(response, wait_flag)
|
||||||
|
|
||||||
|
|
||||||
|
def vm_send_snapshot(
|
||||||
|
config,
|
||||||
|
vm,
|
||||||
|
snapshot_name,
|
||||||
|
destination_api_uri,
|
||||||
|
destination_api_key,
|
||||||
|
destination_api_verify_ssl=True,
|
||||||
|
destination_storage_pool=None,
|
||||||
|
incremental_parent=None,
|
||||||
|
wait_flag=True,
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Send an (existing) snapshot of a VM's disks and configuration to a destination PVC cluster, optionally
|
||||||
|
incremental with incremental_parent
|
||||||
|
|
||||||
|
API endpoint: POST /vm/{vm}/snapshot/send
|
||||||
|
API arguments: snapshot_name=snapshot_name, destination_api_uri=destination_api_uri, destination_api_key=destination_api_key, destination_api_verify_ssl=destination_api_verify_ssl, incremental_parent=incremental_parent, destination_storage_pool=destination_storage_pool
|
||||||
|
API schema: {"message":"{data}"}
|
||||||
|
"""
|
||||||
|
params = {
|
||||||
|
"snapshot_name": snapshot_name,
|
||||||
|
"destination_api_uri": destination_api_uri,
|
||||||
|
"destination_api_key": destination_api_key,
|
||||||
|
"destination_api_verify_ssl": destination_api_verify_ssl,
|
||||||
|
}
|
||||||
|
if destination_storage_pool is not None:
|
||||||
|
params["destination_storage_pool"] = destination_storage_pool
|
||||||
|
if incremental_parent is not None:
|
||||||
|
params["incremental_parent"] = incremental_parent
|
||||||
|
|
||||||
|
response = call_api(
|
||||||
|
config, "post", "/vm/{vm}/snapshot/send".format(vm=vm), params=params
|
||||||
|
)
|
||||||
|
|
||||||
|
return get_wait_retdata(response, wait_flag)
|
||||||
|
|
||||||
|
|
||||||
|
def vm_create_mirror(
|
||||||
|
config,
|
||||||
|
vm,
|
||||||
|
destination_api_uri,
|
||||||
|
destination_api_key,
|
||||||
|
destination_api_verify_ssl=True,
|
||||||
|
destination_storage_pool=None,
|
||||||
|
wait_flag=True,
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Create a new snapshot and send the snapshot to a destination PVC cluster, with automatic incremental handling
|
||||||
|
|
||||||
|
API endpoint: POST /vm/{vm}/mirror/create
|
||||||
|
API arguments: destination_api_uri=destination_api_uri, destination_api_key=destination_api_key, destination_api_verify_ssl=destination_api_verify_ssl, destination_storage_pool=destination_storage_pool
|
||||||
|
API schema: {"message":"{data}"}
|
||||||
|
"""
|
||||||
|
params = {
|
||||||
|
"destination_api_uri": destination_api_uri,
|
||||||
|
"destination_api_key": destination_api_key,
|
||||||
|
"destination_api_verify_ssl": destination_api_verify_ssl,
|
||||||
|
}
|
||||||
|
if destination_storage_pool is not None:
|
||||||
|
params["destination_storage_pool"] = destination_storage_pool
|
||||||
|
|
||||||
|
response = call_api(
|
||||||
|
config, "post", "/vm/{vm}/mirror/create".format(vm=vm), params=params
|
||||||
|
)
|
||||||
|
|
||||||
|
return get_wait_retdata(response, wait_flag)
|
||||||
|
|
||||||
|
|
||||||
|
def vm_promote_mirror(
|
||||||
|
config,
|
||||||
|
vm,
|
||||||
|
destination_api_uri,
|
||||||
|
destination_api_key,
|
||||||
|
destination_api_verify_ssl=True,
|
||||||
|
destination_storage_pool=None,
|
||||||
|
remove_on_source=False,
|
||||||
|
wait_flag=True,
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Shut down a VM, create a new snapshot, send the snapshot to a destination PVC cluster, start the VM on the remote cluster, and optionally remove the local VM, with automatic incremental handling
|
||||||
|
|
||||||
|
API endpoint: POST /vm/{vm}/mirror/promote
|
||||||
|
API arguments: destination_api_uri=destination_api_uri, destination_api_key=destination_api_key, destination_api_verify_ssl=destination_api_verify_ssl, destination_storage_pool=destination_storage_pool, remove_on_source=remove_on_source
|
||||||
|
API schema: {"message":"{data}"}
|
||||||
|
"""
|
||||||
|
params = {
|
||||||
|
"destination_api_uri": destination_api_uri,
|
||||||
|
"destination_api_key": destination_api_key,
|
||||||
|
"destination_api_verify_ssl": destination_api_verify_ssl,
|
||||||
|
"remove_on_source": remove_on_source,
|
||||||
|
}
|
||||||
|
if destination_storage_pool is not None:
|
||||||
|
params["destination_storage_pool"] = destination_storage_pool
|
||||||
|
|
||||||
|
response = call_api(
|
||||||
|
config, "post", "/vm/{vm}/mirror/promote".format(vm=vm), params=params
|
||||||
|
)
|
||||||
|
|
||||||
|
return get_wait_retdata(response, wait_flag)
|
||||||
|
|
||||||
|
|
||||||
|
def vm_autobackup(config, email_recipients=None, force_full_flag=False, wait_flag=True):
|
||||||
|
"""
|
||||||
|
Perform a cluster VM autobackup
|
||||||
|
|
||||||
|
API endpoint: POST /vm//autobackup
|
||||||
|
API arguments: email_recipients=email_recipients, force_full_flag=force_full_flag
|
||||||
|
API schema: {"message":"{data}"}
|
||||||
|
"""
|
||||||
|
params = {
|
||||||
|
"email_recipients": email_recipients,
|
||||||
|
"force_full": force_full_flag,
|
||||||
|
}
|
||||||
|
|
||||||
|
response = call_api(config, "post", "/vm/autobackup", params=params)
|
||||||
|
|
||||||
|
return get_wait_retdata(response, wait_flag)
|
||||||
|
|
||||||
|
|
||||||
def vm_vcpus_set(config, vm, vcpus, topology, restart):
|
def vm_vcpus_set(config, vm, vcpus, topology, restart):
|
||||||
"""
|
"""
|
||||||
Set the vCPU count of the VM with topology
|
Set the vCPU count of the VM with topology
|
||||||
|
@ -1516,6 +1738,7 @@ def format_info(config, domain_information, long_output):
|
||||||
ansiprint.purple(), ansiprint.end(), domain_information["vcpu"]
|
ansiprint.purple(), ansiprint.end(), domain_information["vcpu"]
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
if long_output:
|
||||||
ainformation.append(
|
ainformation.append(
|
||||||
"{}Topology (S/C/T):{} {}".format(
|
"{}Topology (S/C/T):{} {}".format(
|
||||||
ansiprint.purple(), ansiprint.end(), domain_information["vcpu_topology"]
|
ansiprint.purple(), ansiprint.end(), domain_information["vcpu_topology"]
|
||||||
|
@ -1523,22 +1746,32 @@ def format_info(config, domain_information, long_output):
|
||||||
)
|
)
|
||||||
|
|
||||||
if (
|
if (
|
||||||
domain_information["vnc"].get("listen", "None") != "None"
|
domain_information["vnc"].get("listen")
|
||||||
and domain_information["vnc"].get("port", "None") != "None"
|
and domain_information["vnc"].get("port")
|
||||||
):
|
) or long_output:
|
||||||
|
listen = (
|
||||||
|
domain_information["vnc"]["listen"]
|
||||||
|
if domain_information["vnc"].get("listen")
|
||||||
|
else "N/A"
|
||||||
|
)
|
||||||
|
port = (
|
||||||
|
domain_information["vnc"]["port"]
|
||||||
|
if domain_information["vnc"].get("port")
|
||||||
|
else "N/A"
|
||||||
|
)
|
||||||
ainformation.append("")
|
ainformation.append("")
|
||||||
ainformation.append(
|
ainformation.append(
|
||||||
"{}VNC listen:{} {}".format(
|
"{}VNC listen:{} {}".format(
|
||||||
ansiprint.purple(), ansiprint.end(), domain_information["vnc"]["listen"]
|
ansiprint.purple(), ansiprint.end(), listen
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
ainformation.append(
|
ainformation.append(
|
||||||
"{}VNC port:{} {}".format(
|
"{}VNC port:{} {}".format(
|
||||||
ansiprint.purple(), ansiprint.end(), domain_information["vnc"]["port"]
|
ansiprint.purple(), ansiprint.end(), port
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
|
||||||
if long_output is True:
|
if long_output:
|
||||||
# Virtualization information
|
# Virtualization information
|
||||||
ainformation.append("")
|
ainformation.append("")
|
||||||
ainformation.append(
|
ainformation.append(
|
||||||
|
@ -1626,6 +1859,9 @@ def format_info(config, domain_information, long_output):
|
||||||
"migrate": ansiprint.blue(),
|
"migrate": ansiprint.blue(),
|
||||||
"unmigrate": ansiprint.blue(),
|
"unmigrate": ansiprint.blue(),
|
||||||
"provision": ansiprint.blue(),
|
"provision": ansiprint.blue(),
|
||||||
|
"restore": ansiprint.blue(),
|
||||||
|
"import": ansiprint.blue(),
|
||||||
|
"mirror": ansiprint.purple(),
|
||||||
}
|
}
|
||||||
ainformation.append(
|
ainformation.append(
|
||||||
"{}State:{} {}{}{}".format(
|
"{}State:{} {}{}{}".format(
|
||||||
|
@ -1637,14 +1873,14 @@ def format_info(config, domain_information, long_output):
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
ainformation.append(
|
ainformation.append(
|
||||||
"{}Current Node:{} {}".format(
|
"{}Current node:{} {}".format(
|
||||||
ansiprint.purple(), ansiprint.end(), domain_information["node"]
|
ansiprint.purple(), ansiprint.end(), domain_information["node"]
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
if not domain_information["last_node"]:
|
if not domain_information["last_node"]:
|
||||||
domain_information["last_node"] = "N/A"
|
domain_information["last_node"] = "N/A"
|
||||||
ainformation.append(
|
ainformation.append(
|
||||||
"{}Previous Node:{} {}".format(
|
"{}Previous node:{} {}".format(
|
||||||
ansiprint.purple(), ansiprint.end(), domain_information["last_node"]
|
ansiprint.purple(), ansiprint.end(), domain_information["last_node"]
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
@ -1658,12 +1894,18 @@ def format_info(config, domain_information, long_output):
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
|
||||||
if not domain_information.get("node_selector"):
|
if (
|
||||||
|
not domain_information.get("node_selector")
|
||||||
|
or domain_information.get("node_selector") == "None"
|
||||||
|
):
|
||||||
formatted_node_selector = "Default"
|
formatted_node_selector = "Default"
|
||||||
else:
|
else:
|
||||||
formatted_node_selector = str(domain_information["node_selector"]).title()
|
formatted_node_selector = str(domain_information["node_selector"]).title()
|
||||||
|
|
||||||
if not domain_information.get("node_limit"):
|
if (
|
||||||
|
not domain_information.get("node_limit")
|
||||||
|
or domain_information.get("node_limit") == "None"
|
||||||
|
):
|
||||||
formatted_node_limit = "Any"
|
formatted_node_limit = "Any"
|
||||||
else:
|
else:
|
||||||
formatted_node_limit = ", ".join(domain_information["node_limit"])
|
formatted_node_limit = ", ".join(domain_information["node_limit"])
|
||||||
|
@ -1675,16 +1917,16 @@ def format_info(config, domain_information, long_output):
|
||||||
autostart_colour = ansiprint.green()
|
autostart_colour = ansiprint.green()
|
||||||
formatted_node_autostart = "True"
|
formatted_node_autostart = "True"
|
||||||
|
|
||||||
if not domain_information.get("migration_method"):
|
if (
|
||||||
formatted_migration_method = "Any"
|
not domain_information.get("migration_method")
|
||||||
|
or domain_information.get("migration_method") == "None"
|
||||||
|
):
|
||||||
|
formatted_migration_method = "Live, Shutdown"
|
||||||
else:
|
else:
|
||||||
formatted_migration_method = str(domain_information["migration_method"]).title()
|
formatted_migration_method = (
|
||||||
|
f"{str(domain_information['migration_method']).title()} only"
|
||||||
|
)
|
||||||
|
|
||||||
ainformation.append(
|
|
||||||
"{}Migration selector:{} {}".format(
|
|
||||||
ansiprint.purple(), ansiprint.end(), formatted_node_selector
|
|
||||||
)
|
|
||||||
)
|
|
||||||
ainformation.append(
|
ainformation.append(
|
||||||
"{}Node limit:{} {}".format(
|
"{}Node limit:{} {}".format(
|
||||||
ansiprint.purple(), ansiprint.end(), formatted_node_limit
|
ansiprint.purple(), ansiprint.end(), formatted_node_limit
|
||||||
|
@ -1700,10 +1942,22 @@ def format_info(config, domain_information, long_output):
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
ainformation.append(
|
ainformation.append(
|
||||||
"{}Migration Method:{} {}".format(
|
"{}Migration method:{} {}".format(
|
||||||
ansiprint.purple(), ansiprint.end(), formatted_migration_method
|
ansiprint.purple(), ansiprint.end(), formatted_migration_method
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
ainformation.append(
|
||||||
|
"{}Migration selector:{} {}".format(
|
||||||
|
ansiprint.purple(), ansiprint.end(), formatted_node_selector
|
||||||
|
)
|
||||||
|
)
|
||||||
|
ainformation.append(
|
||||||
|
"{}Max live downtime:{} {}".format(
|
||||||
|
ansiprint.purple(),
|
||||||
|
ansiprint.end(),
|
||||||
|
f"{domain_information.get('migration_max_downtime')} ms",
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
# Tag list
|
# Tag list
|
||||||
tags_name_length = 5
|
tags_name_length = 5
|
||||||
|
@ -1749,9 +2003,9 @@ def format_info(config, domain_information, long_output):
|
||||||
tags_name=tag["name"],
|
tags_name=tag["name"],
|
||||||
tags_type=tag["type"],
|
tags_type=tag["type"],
|
||||||
tags_protected=str(tag["protected"]),
|
tags_protected=str(tag["protected"]),
|
||||||
tags_protected_colour=ansiprint.green()
|
tags_protected_colour=(
|
||||||
if tag["protected"]
|
ansiprint.green() if tag["protected"] else ansiprint.blue()
|
||||||
else ansiprint.blue(),
|
),
|
||||||
end=ansiprint.end(),
|
end=ansiprint.end(),
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
@ -1764,6 +2018,78 @@ def format_info(config, domain_information, long_output):
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# Snapshot list
|
||||||
|
snapshots_name_length = 5
|
||||||
|
snapshots_age_length = 4
|
||||||
|
snapshots_xml_changes_length = 12
|
||||||
|
for snapshot in domain_information.get("snapshots", list()):
|
||||||
|
xml_diff_plus = 0
|
||||||
|
xml_diff_minus = 0
|
||||||
|
for line in snapshot["xml_diff_lines"]:
|
||||||
|
if re.match(r"^\+ ", line):
|
||||||
|
xml_diff_plus += 1
|
||||||
|
elif re.match(r"^- ", line):
|
||||||
|
xml_diff_minus += 1
|
||||||
|
xml_diff_counts = f"+{xml_diff_plus}/-{xml_diff_minus}"
|
||||||
|
|
||||||
|
_snapshots_name_length = len(snapshot["name"]) + 1
|
||||||
|
if _snapshots_name_length > snapshots_name_length:
|
||||||
|
snapshots_name_length = _snapshots_name_length
|
||||||
|
|
||||||
|
_snapshots_age_length = len(snapshot["age"]) + 1
|
||||||
|
if _snapshots_age_length > snapshots_age_length:
|
||||||
|
snapshots_age_length = _snapshots_age_length
|
||||||
|
|
||||||
|
_snapshots_xml_changes_length = len(xml_diff_counts) + 1
|
||||||
|
if _snapshots_xml_changes_length > snapshots_xml_changes_length:
|
||||||
|
snapshots_xml_changes_length = _snapshots_xml_changes_length
|
||||||
|
|
||||||
|
if len(domain_information.get("snapshots", list())) > 0:
|
||||||
|
ainformation.append("")
|
||||||
|
ainformation.append(
|
||||||
|
"{purple}Snapshots:{end} {bold}{snapshots_name: <{snapshots_name_length}} {snapshots_age: <{snapshots_age_length}} {snapshots_xml_changes: <{snapshots_xml_changes_length}}{end}".format(
|
||||||
|
purple=ansiprint.purple(),
|
||||||
|
bold=ansiprint.bold(),
|
||||||
|
end=ansiprint.end(),
|
||||||
|
snapshots_name_length=snapshots_name_length,
|
||||||
|
snapshots_age_length=snapshots_age_length,
|
||||||
|
snapshots_xml_changes_length=snapshots_xml_changes_length,
|
||||||
|
snapshots_name="Name",
|
||||||
|
snapshots_age="Age",
|
||||||
|
snapshots_xml_changes="XML Changes",
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
for snapshot in domain_information.get("snapshots", list()):
|
||||||
|
xml_diff_plus = 0
|
||||||
|
xml_diff_minus = 0
|
||||||
|
for line in snapshot["xml_diff_lines"]:
|
||||||
|
if re.match(r"^\+ ", line):
|
||||||
|
xml_diff_plus += 1
|
||||||
|
elif re.match(r"^- ", line):
|
||||||
|
xml_diff_minus += 1
|
||||||
|
xml_diff_counts = f"{ansiprint.green()}+{xml_diff_plus}{ansiprint.end()}/{ansiprint.red()}-{xml_diff_minus}{ansiprint.end()}"
|
||||||
|
|
||||||
|
ainformation.append(
|
||||||
|
" {snapshots_name: <{snapshots_name_length}} {snapshots_age: <{snapshots_age_length}} {snapshots_xml_changes: <{snapshots_xml_changes_length}}{end}".format(
|
||||||
|
snapshots_name_length=snapshots_name_length,
|
||||||
|
snapshots_age_length=snapshots_age_length,
|
||||||
|
snapshots_xml_changes_length=snapshots_xml_changes_length,
|
||||||
|
snapshots_name=snapshot["name"],
|
||||||
|
snapshots_age=snapshot["age"],
|
||||||
|
snapshots_xml_changes=xml_diff_counts,
|
||||||
|
end=ansiprint.end(),
|
||||||
|
)
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
ainformation.append("")
|
||||||
|
ainformation.append(
|
||||||
|
"{purple}Snapshots:{end} N/A".format(
|
||||||
|
purple=ansiprint.purple(),
|
||||||
|
end=ansiprint.end(),
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
# Network list
|
# Network list
|
||||||
net_list = []
|
net_list = []
|
||||||
cluster_net_list = call_api(config, "get", "/network").json()
|
cluster_net_list = call_api(config, "get", "/network").json()
|
||||||
|
@ -1790,7 +2116,7 @@ def format_info(config, domain_information, long_output):
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
|
||||||
if long_output is True:
|
if long_output:
|
||||||
# Disk list
|
# Disk list
|
||||||
ainformation.append("")
|
ainformation.append("")
|
||||||
name_length = 0
|
name_length = 0
|
||||||
|
@ -1926,6 +2252,7 @@ def format_list(config, vm_list):
|
||||||
vm_name_length = 5
|
vm_name_length = 5
|
||||||
vm_state_length = 6
|
vm_state_length = 6
|
||||||
vm_tags_length = 5
|
vm_tags_length = 5
|
||||||
|
vm_snapshots_length = 10
|
||||||
vm_nets_length = 9
|
vm_nets_length = 9
|
||||||
vm_ram_length = 8
|
vm_ram_length = 8
|
||||||
vm_vcpu_length = 6
|
vm_vcpu_length = 6
|
||||||
|
@ -1946,6 +2273,12 @@ def format_list(config, vm_list):
|
||||||
_vm_tags_length = len(",".join(tag_list)) + 1
|
_vm_tags_length = len(",".join(tag_list)) + 1
|
||||||
if _vm_tags_length > vm_tags_length:
|
if _vm_tags_length > vm_tags_length:
|
||||||
vm_tags_length = _vm_tags_length
|
vm_tags_length = _vm_tags_length
|
||||||
|
# vm_snapshots column
|
||||||
|
_vm_snapshots_length = (
|
||||||
|
len(str(len(domain_information.get("snapshots", list())))) + 1
|
||||||
|
)
|
||||||
|
if _vm_snapshots_length > vm_snapshots_length:
|
||||||
|
vm_snapshots_length = _vm_snapshots_length
|
||||||
# vm_nets column
|
# vm_nets column
|
||||||
_vm_nets_length = len(",".join(net_list)) + 1
|
_vm_nets_length = len(",".join(net_list)) + 1
|
||||||
if _vm_nets_length > vm_nets_length:
|
if _vm_nets_length > vm_nets_length:
|
||||||
|
@ -1962,7 +2295,11 @@ def format_list(config, vm_list):
|
||||||
# Format the string (header)
|
# Format the string (header)
|
||||||
vm_list_output.append(
|
vm_list_output.append(
|
||||||
"{bold}{vm_header: <{vm_header_length}} {resource_header: <{resource_header_length}} {node_header: <{node_header_length}}{end_bold}".format(
|
"{bold}{vm_header: <{vm_header_length}} {resource_header: <{resource_header_length}} {node_header: <{node_header_length}}{end_bold}".format(
|
||||||
vm_header_length=vm_name_length + vm_state_length + vm_tags_length + 2,
|
vm_header_length=vm_name_length
|
||||||
|
+ vm_state_length
|
||||||
|
+ vm_tags_length
|
||||||
|
+ vm_snapshots_length
|
||||||
|
+ 3,
|
||||||
resource_header_length=vm_nets_length + vm_ram_length + vm_vcpu_length + 2,
|
resource_header_length=vm_nets_length + vm_ram_length + vm_vcpu_length + 2,
|
||||||
node_header_length=vm_node_length + vm_migrated_length + 1,
|
node_header_length=vm_node_length + vm_migrated_length + 1,
|
||||||
bold=ansiprint.bold(),
|
bold=ansiprint.bold(),
|
||||||
|
@ -1972,7 +2309,12 @@ def format_list(config, vm_list):
|
||||||
[
|
[
|
||||||
"-"
|
"-"
|
||||||
for _ in range(
|
for _ in range(
|
||||||
4, vm_name_length + vm_state_length + vm_tags_length + 1
|
4,
|
||||||
|
vm_name_length
|
||||||
|
+ vm_state_length
|
||||||
|
+ vm_tags_length
|
||||||
|
+ +vm_snapshots_length
|
||||||
|
+ 2,
|
||||||
)
|
)
|
||||||
]
|
]
|
||||||
),
|
),
|
||||||
|
@ -1994,6 +2336,7 @@ def format_list(config, vm_list):
|
||||||
"{bold}{vm_name: <{vm_name_length}} \
|
"{bold}{vm_name: <{vm_name_length}} \
|
||||||
{vm_state_colour}{vm_state: <{vm_state_length}}{end_colour} \
|
{vm_state_colour}{vm_state: <{vm_state_length}}{end_colour} \
|
||||||
{vm_tags: <{vm_tags_length}} \
|
{vm_tags: <{vm_tags_length}} \
|
||||||
|
{vm_snapshots: <{vm_snapshots_length}} \
|
||||||
{vm_networks: <{vm_nets_length}} \
|
{vm_networks: <{vm_nets_length}} \
|
||||||
{vm_memory: <{vm_ram_length}} {vm_vcpu: <{vm_vcpu_length}} \
|
{vm_memory: <{vm_ram_length}} {vm_vcpu: <{vm_vcpu_length}} \
|
||||||
{vm_node: <{vm_node_length}} \
|
{vm_node: <{vm_node_length}} \
|
||||||
|
@ -2001,6 +2344,7 @@ def format_list(config, vm_list):
|
||||||
vm_name_length=vm_name_length,
|
vm_name_length=vm_name_length,
|
||||||
vm_state_length=vm_state_length,
|
vm_state_length=vm_state_length,
|
||||||
vm_tags_length=vm_tags_length,
|
vm_tags_length=vm_tags_length,
|
||||||
|
vm_snapshots_length=vm_snapshots_length,
|
||||||
vm_nets_length=vm_nets_length,
|
vm_nets_length=vm_nets_length,
|
||||||
vm_ram_length=vm_ram_length,
|
vm_ram_length=vm_ram_length,
|
||||||
vm_vcpu_length=vm_vcpu_length,
|
vm_vcpu_length=vm_vcpu_length,
|
||||||
|
@ -2013,6 +2357,7 @@ def format_list(config, vm_list):
|
||||||
vm_name="Name",
|
vm_name="Name",
|
||||||
vm_state="State",
|
vm_state="State",
|
||||||
vm_tags="Tags",
|
vm_tags="Tags",
|
||||||
|
vm_snapshots="Snapshots",
|
||||||
vm_networks="Networks",
|
vm_networks="Networks",
|
||||||
vm_memory="RAM (M)",
|
vm_memory="RAM (M)",
|
||||||
vm_vcpu="vCPUs",
|
vm_vcpu="vCPUs",
|
||||||
|
@ -2026,16 +2371,14 @@ def format_list(config, vm_list):
|
||||||
|
|
||||||
# Format the string (elements)
|
# Format the string (elements)
|
||||||
for domain_information in sorted(vm_list, key=lambda v: v["name"]):
|
for domain_information in sorted(vm_list, key=lambda v: v["name"]):
|
||||||
if domain_information["state"] == "start":
|
if domain_information["state"] in ["start"]:
|
||||||
vm_state_colour = ansiprint.green()
|
vm_state_colour = ansiprint.green()
|
||||||
elif domain_information["state"] == "restart":
|
elif domain_information["state"] in ["restart", "shutdown"]:
|
||||||
vm_state_colour = ansiprint.yellow()
|
vm_state_colour = ansiprint.yellow()
|
||||||
elif domain_information["state"] == "shutdown":
|
elif domain_information["state"] in ["stop", "fail"]:
|
||||||
vm_state_colour = ansiprint.yellow()
|
|
||||||
elif domain_information["state"] == "stop":
|
|
||||||
vm_state_colour = ansiprint.red()
|
|
||||||
elif domain_information["state"] == "fail":
|
|
||||||
vm_state_colour = ansiprint.red()
|
vm_state_colour = ansiprint.red()
|
||||||
|
elif domain_information["state"] in ["mirror"]:
|
||||||
|
vm_state_colour = ansiprint.purple()
|
||||||
else:
|
else:
|
||||||
vm_state_colour = ansiprint.blue()
|
vm_state_colour = ansiprint.blue()
|
||||||
|
|
||||||
|
@ -2059,8 +2402,10 @@ def format_list(config, vm_list):
|
||||||
else:
|
else:
|
||||||
net_invalid_list.append(False)
|
net_invalid_list.append(False)
|
||||||
|
|
||||||
|
display_net_string_list = []
|
||||||
net_string_list = []
|
net_string_list = []
|
||||||
for net_idx, net_vni in enumerate(net_list):
|
for net_idx, net_vni in enumerate(net_list):
|
||||||
|
display_net_string_list.append(net_vni)
|
||||||
if net_invalid_list[net_idx]:
|
if net_invalid_list[net_idx]:
|
||||||
net_string_list.append(
|
net_string_list.append(
|
||||||
"{}{}{}".format(
|
"{}{}{}".format(
|
||||||
|
@ -2069,9 +2414,6 @@ def format_list(config, vm_list):
|
||||||
ansiprint.end(),
|
ansiprint.end(),
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
# Fix the length due to the extra fake characters
|
|
||||||
vm_nets_length -= len(net_vni)
|
|
||||||
vm_nets_length += len(net_string_list[net_idx])
|
|
||||||
else:
|
else:
|
||||||
net_string_list.append(net_vni)
|
net_string_list.append(net_vni)
|
||||||
|
|
||||||
|
@ -2079,6 +2421,7 @@ def format_list(config, vm_list):
|
||||||
"{bold}{vm_name: <{vm_name_length}} \
|
"{bold}{vm_name: <{vm_name_length}} \
|
||||||
{vm_state_colour}{vm_state: <{vm_state_length}}{end_colour} \
|
{vm_state_colour}{vm_state: <{vm_state_length}}{end_colour} \
|
||||||
{vm_tags: <{vm_tags_length}} \
|
{vm_tags: <{vm_tags_length}} \
|
||||||
|
{vm_snapshots: <{vm_snapshots_length}} \
|
||||||
{vm_networks: <{vm_nets_length}} \
|
{vm_networks: <{vm_nets_length}} \
|
||||||
{vm_memory: <{vm_ram_length}} {vm_vcpu: <{vm_vcpu_length}} \
|
{vm_memory: <{vm_ram_length}} {vm_vcpu: <{vm_vcpu_length}} \
|
||||||
{vm_node: <{vm_node_length}} \
|
{vm_node: <{vm_node_length}} \
|
||||||
|
@ -2086,7 +2429,10 @@ def format_list(config, vm_list):
|
||||||
vm_name_length=vm_name_length,
|
vm_name_length=vm_name_length,
|
||||||
vm_state_length=vm_state_length,
|
vm_state_length=vm_state_length,
|
||||||
vm_tags_length=vm_tags_length,
|
vm_tags_length=vm_tags_length,
|
||||||
vm_nets_length=vm_nets_length,
|
vm_snapshots_length=vm_snapshots_length,
|
||||||
|
vm_nets_length=vm_nets_length
|
||||||
|
+ len(",".join(net_string_list))
|
||||||
|
- len(",".join(display_net_string_list)),
|
||||||
vm_ram_length=vm_ram_length,
|
vm_ram_length=vm_ram_length,
|
||||||
vm_vcpu_length=vm_vcpu_length,
|
vm_vcpu_length=vm_vcpu_length,
|
||||||
vm_node_length=vm_node_length,
|
vm_node_length=vm_node_length,
|
||||||
|
@ -2098,7 +2444,9 @@ def format_list(config, vm_list):
|
||||||
vm_name=domain_information["name"],
|
vm_name=domain_information["name"],
|
||||||
vm_state=domain_information["state"],
|
vm_state=domain_information["state"],
|
||||||
vm_tags=",".join(tag_list),
|
vm_tags=",".join(tag_list),
|
||||||
vm_networks=",".join(net_string_list),
|
vm_snapshots=len(domain_information.get("snapshots", list())),
|
||||||
|
vm_networks=",".join(net_string_list)
|
||||||
|
+ ("" if all(net_invalid_list) else " "),
|
||||||
vm_memory=domain_information["memory"],
|
vm_memory=domain_information["memory"],
|
||||||
vm_vcpu=domain_information["vcpu"],
|
vm_vcpu=domain_information["vcpu"],
|
||||||
vm_node=domain_information["node"],
|
vm_node=domain_information["node"],
|
||||||
|
|
|
@ -2,7 +2,7 @@ from setuptools import setup
|
||||||
|
|
||||||
setup(
|
setup(
|
||||||
name="pvc",
|
name="pvc",
|
||||||
version="0.9.89",
|
version="0.9.103",
|
||||||
packages=["pvc.cli", "pvc.lib"],
|
packages=["pvc.cli", "pvc.lib"],
|
||||||
install_requires=[
|
install_requires=[
|
||||||
"Click",
|
"Click",
|
||||||
|
|
|
@ -0,0 +1,695 @@
|
||||||
|
#!/usr/bin/env python3
|
||||||
|
|
||||||
|
# autobackup.py - PVC API Autobackup functions
|
||||||
|
# Part of the Parallel Virtual Cluster (PVC) system
|
||||||
|
#
|
||||||
|
# Copyright (C) 2018-2024 Joshua M. Boniface <joshua@boniface.me>
|
||||||
|
#
|
||||||
|
# This program is free software: you can redistribute it and/or modify
|
||||||
|
# it under the terms of the GNU General Public License as published by
|
||||||
|
# the Free Software Foundation, version 3.
|
||||||
|
#
|
||||||
|
# This program is distributed in the hope that it will be useful,
|
||||||
|
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
# GNU General Public License for more details.
|
||||||
|
#
|
||||||
|
# You should have received a copy of the GNU General Public License
|
||||||
|
# along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||||
|
#
|
||||||
|
###############################################################################
|
||||||
|
|
||||||
|
from datetime import datetime
|
||||||
|
from json import load as jload
|
||||||
|
from json import dump as jdump
|
||||||
|
from os import popen, makedirs, path, scandir
|
||||||
|
from shutil import rmtree
|
||||||
|
from subprocess import run, PIPE
|
||||||
|
|
||||||
|
from daemon_lib.common import run_os_command
|
||||||
|
from daemon_lib.config import get_autobackup_configuration
|
||||||
|
from daemon_lib.celery import start, fail, log_info, log_err, update, finish
|
||||||
|
|
||||||
|
import daemon_lib.ceph as ceph
|
||||||
|
import daemon_lib.vm as vm
|
||||||
|
|
||||||
|
|
||||||
|
def send_execution_failure_report(
    celery_conf, config, recipients=None, total_time=0, error=None
):
    """
    Send an email report that the autobackup run failed with an execution error.

    celery_conf is a (celery, current_stage, total_stages) tuple used for task
    progress updates; config is the autobackup configuration dict (must contain
    'cluster'); recipients is a list of email addresses (None disables the
    report entirely); total_time is the elapsed runtime; error is the message
    to include in the body.

    Returns None. Email sending errors are logged, never raised.
    """
    if recipients is None:
        # No recipients configured; reporting is a no-op
        return

    from email.utils import formatdate
    from socket import gethostname

    log_message = f"Sending email failure report to {', '.join(recipients)}"
    log_info(celery_conf[0], log_message)
    update(
        celery_conf[0],
        log_message,
        current=celery_conf[1] + 1,
        total=celery_conf[2],
    )

    current_datetime = datetime.now()
    email_datetime = formatdate(float(current_datetime.strftime("%s")))

    email = list()
    email.append(f"Date: {email_datetime}")
    email.append(
        f"Subject: PVC Autobackup execution failure for cluster '{config['cluster']}'"
    )

    email_to = list()
    for recipient in recipients:
        email_to.append(f"<{recipient}>")

    email.append(f"To: {', '.join(email_to)}")
    email.append(f"From: PVC Autobackup System <pvc@{gethostname()}>")
    email.append("")

    email.append(
        f"A PVC autobackup has FAILED at {current_datetime} in {total_time}s due to an execution error."
    )
    email.append("")
    email.append("The reported error message is:")
    email.append(f" {error}")

    try:
        with popen("/usr/sbin/sendmail -t", "w") as p:
            p.write("\n".join(email))
    except Exception as e:
        # BUGFIX: log_err requires the celery handle as its first argument,
        # matching every other call site in this module
        log_err(celery_conf[0], f"Failed to send report email: {e}")
||||||
|
|
||||||
|
def send_execution_summary_report(
    celery_conf, config, recipients=None, total_time=0, summary=None
):
    """
    Send an email summary report of a completed autobackup run.

    celery_conf is a (celery, current_stage, total_stages) tuple used for task
    progress updates; config is the autobackup configuration dict (must contain
    'cluster'); recipients is a list of email addresses (None disables the
    report entirely); total_time is the elapsed runtime; summary maps VM names
    to their tracked-backup lists as returned by run_vm_backup().

    Returns None. Email sending errors are logged, never raised.
    """
    if recipients is None:
        # No recipients configured; reporting is a no-op
        return

    # BUGFIX: avoid a mutable default argument; behavior is unchanged
    if summary is None:
        summary = dict()

    from email.utils import formatdate
    from socket import gethostname

    log_message = f"Sending email summary report to {', '.join(recipients)}"
    log_info(celery_conf[0], log_message)
    update(
        celery_conf[0],
        log_message,
        current=celery_conf[1] + 1,
        total=celery_conf[2],
    )

    current_datetime = datetime.now()
    email_datetime = formatdate(float(current_datetime.strftime("%s")))

    email = list()
    email.append(f"Date: {email_datetime}")
    email.append(f"Subject: PVC Autobackup report for cluster '{config['cluster']}'")

    email_to = list()
    for recipient in recipients:
        email_to.append(f"<{recipient}>")

    email.append(f"To: {', '.join(email_to)}")
    email.append(f"From: PVC Autobackup System <pvc@{gethostname()}>")
    email.append("")

    email.append(
        f"A PVC autobackup has been completed at {current_datetime} in {total_time}."
    )
    email.append("")
    email.append(
        "The following is a summary of all current VM backups after cleanups, most recent first:"
    )
    email.append("")

    for vm_name in summary.keys():
        email.append(f"VM: {vm_name}:")
        for backup in summary[vm_name]:
            datestring = backup.get("datestring")
            backup_date = datetime.strptime(datestring, "%Y%m%d%H%M%S")
            if backup.get("result", False):
                email.append(
                    f" {backup_date}: Success in {backup.get('runtime_secs', 0)} seconds, ID {backup.get('snapshot_name')}, type {backup.get('type', 'unknown')}"
                )
                email.append(
                    f" Backup contains {len(backup.get('export_files'))} files totaling {ceph.format_bytes_tohuman(backup.get('export_size_bytes', 0))} ({backup.get('export_size_bytes', 0)} bytes)"
                )
            else:
                email.append(
                    f" {backup_date}: Failure in {backup.get('runtime_secs', 0)} seconds, ID {backup.get('snapshot_name')}, type {backup.get('type', 'unknown')}"
                )
                email.append(f" {backup.get('result_message')}")

    try:
        with popen("/usr/sbin/sendmail -t", "w") as p:
            p.write("\n".join(email))
    except Exception as e:
        # BUGFIX: log_err requires the celery handle as its first argument,
        # matching every other call site in this module
        log_err(celery_conf[0], f"Failed to send report email: {e}")
|
||||||
|
|
||||||
|
def run_vm_backup(zkhandler, celery, config, vm_detail, force_full=False):
    """
    Perform a single VM autobackup: take RBD snapshots of all the VM's
    volumes, export them (full or incremental), record the result in the
    per-VM .autobackup.json state file, and prune expired backups.

    zkhandler is the Zookeeper handler; celery the task handle for logging;
    config the autobackup configuration dict; vm_detail a VM information dict
    (must contain 'name' and 'uuid'); force_full forces a full export even if
    the schedule would produce an incremental one.

    Returns the updated list of tracked backups for this VM (possibly
    including a failed entry); on early errors it returns the list as-is.
    """
    vm_name = vm_detail["name"]
    dom_uuid = vm_detail["uuid"]
    backup_suffixed_path = f"{config['backup_root_path']}{config['backup_root_suffix']}"
    vm_backup_path = f"{backup_suffixed_path}/{vm_name}"
    autobackup_state_file = f"{vm_backup_path}/.autobackup.json"
    full_interval = config["backup_schedule"]["full_interval"]
    full_retention = config["backup_schedule"]["full_retention"]

    if not path.exists(vm_backup_path) or not path.exists(autobackup_state_file):
        # There are no existing backups so the list is empty
        state_data = dict()
        tracked_backups = list()
    else:
        with open(autobackup_state_file) as fh:
            state_data = jload(fh)
        tracked_backups = state_data["tracked_backups"]

    # Decide full vs. incremental: a full export is taken if forced, or if
    # full_interval backups have elapsed since the last full one (entries are
    # most-recent-first, so the index of the last full backup is its age)
    full_backups = [b for b in tracked_backups if b["type"] == "full"]
    if len(full_backups) > 0:
        last_full_backup = full_backups[0]
        last_full_backup_idx = tracked_backups.index(last_full_backup)
        if force_full:
            this_backup_incremental_parent = None
            this_backup_retain_snapshot = True
        elif last_full_backup_idx >= full_interval - 1:
            this_backup_incremental_parent = None
            this_backup_retain_snapshot = True
        else:
            this_backup_incremental_parent = last_full_backup["snapshot_name"]
            this_backup_retain_snapshot = False
    else:
        # The very first backup must be full to start the tree
        this_backup_incremental_parent = None
        this_backup_retain_snapshot = True

    export_type = (
        "incremental" if this_backup_incremental_parent is not None else "full"
    )

    now = datetime.now()
    datestring = now.strftime("%Y%m%d%H%M%S")
    snapshot_name = f"ab{datestring}"

    # Take the VM snapshot (vm.vm_worker_create_snapshot)
    snap_list = list()

    failure = False
    export_files = None
    export_files_size = 0

    def update_tracked_backups():
        # Read the just-written snapshot.json, prepend it to the tracked list,
        # and persist the state file; returns the updated tracked list
        backup_json_file = (
            f"{backup_suffixed_path}/{vm_name}/{snapshot_name}/snapshot.json"
        )
        try:
            with open(backup_json_file) as fh:
                backup_json = jload(fh)
            tracked_backups.insert(0, backup_json)
        except Exception as e:
            log_err(celery, f"Could not open export JSON: {e}")
            return list()

        state_data["tracked_backups"] = tracked_backups
        with open(autobackup_state_file, "w") as fh:
            jdump(state_data, fh)

        return tracked_backups

    def write_backup_summary(success=False, message=""):
        # Write the per-backup snapshot.json describing this run's outcome;
        # returns (True, "") on success or (False, exception) on write failure
        ttotal = (datetime.now() - now).total_seconds()
        export_details = {
            "type": export_type,
            "result": success,
            "message": message,
            "datestring": datestring,
            "runtime_secs": ttotal,
            "snapshot_name": snapshot_name,
            "incremental_parent": this_backup_incremental_parent,
            "vm_detail": vm_detail,
            "export_files": export_files,
            "export_size_bytes": export_files_size,
        }
        try:
            with open(
                f"{backup_suffixed_path}/{vm_name}/{snapshot_name}/snapshot.json",
                "w",
            ) as fh:
                jdump(export_details, fh)
        except Exception as e:
            log_err(celery, f"Error exporting snapshot details: {e}")
            return False, e

        return True, ""

    def cleanup_failure():
        # Best-effort removal of any snapshots taken so far
        for snapshot in snap_list:
            rbd, snap_name = snapshot.split("@")
            pool, volume = rbd.split("/")
            # We capture no output here, because if this fails too we're in a deep
            # error chain and will just ignore it
            ceph.remove_snapshot(zkhandler, pool, volume, snap_name)

    rbd_list = zkhandler.read(("domain.storage.volumes", dom_uuid)).split(",")

    for rbd in rbd_list:
        pool, volume = rbd.split("/")
        ret, msg = ceph.add_snapshot(
            zkhandler, pool, volume, snapshot_name, zk_only=False
        )
        if not ret:
            cleanup_failure()
            error_message = msg.replace("ERROR: ", "")
            log_err(celery, error_message)
            failure = True
            break
        else:
            snap_list.append(f"{pool}/{volume}@{snapshot_name}")

    if failure:
        # BUGFIX: error_message was assigned as a one-element tuple (stray
        # trailing comma), making the recorded message a tuple, not a string
        error_message = f"[{vm_name}] Error in snapshot export, skipping"
        write_backup_summary(message=error_message)
        tracked_backups = update_tracked_backups()
        return tracked_backups

    # Get the current domain XML
    vm_config = zkhandler.read(("domain.xml", dom_uuid))

    # Add the snapshot entry to Zookeeper
    ret = zkhandler.write(
        [
            (
                (
                    "domain.snapshots",
                    dom_uuid,
                    "domain_snapshot.name",
                    snapshot_name,
                ),
                snapshot_name,
            ),
            (
                (
                    "domain.snapshots",
                    dom_uuid,
                    "domain_snapshot.timestamp",
                    snapshot_name,
                ),
                now.strftime("%s"),
            ),
            (
                (
                    "domain.snapshots",
                    dom_uuid,
                    "domain_snapshot.xml",
                    snapshot_name,
                ),
                vm_config,
            ),
            (
                (
                    "domain.snapshots",
                    dom_uuid,
                    "domain_snapshot.rbd_snapshots",
                    snapshot_name,
                ),
                ",".join(snap_list),
            ),
        ]
    )
    if not ret:
        error_message = f"[{vm_name}] Error in snapshot export, skipping"
        log_err(celery, error_message)
        write_backup_summary(message=error_message)
        tracked_backups = update_tracked_backups()
        return tracked_backups

    # Export the snapshot (vm.vm_worker_export_snapshot)
    export_target_path = f"{backup_suffixed_path}/{vm_name}/{snapshot_name}/images"

    try:
        makedirs(export_target_path)
    except Exception as e:
        error_message = (
            f"[{vm_name}] Failed to create target directory '{export_target_path}': {e}"
        )
        log_err(celery, error_message)
        return tracked_backups

    def export_cleanup():
        # NOTE(review): defined but not called in this function as written;
        # retained for compatibility in case external code relies on it
        from shutil import rmtree

        rmtree(f"{backup_suffixed_path}/{vm_name}/{snapshot_name}")

    # Set the export filetype
    if this_backup_incremental_parent is not None:
        export_fileext = "rbddiff"
    else:
        export_fileext = "rbdimg"

    snapshot_volumes = list()
    for rbdsnap in snap_list:
        pool, _volume = rbdsnap.split("/")
        volume, name = _volume.split("@")
        ret, snapshots = ceph.get_list_snapshot(
            zkhandler, pool, volume, limit=name, is_fuzzy=False
        )
        if ret:
            snapshot_volumes += snapshots

    export_files = list()
    for snapshot_volume in snapshot_volumes:
        snap_pool = snapshot_volume["pool"]
        snap_volume = snapshot_volume["volume"]
        snap_snapshot_name = snapshot_volume["snapshot"]
        snap_size = snapshot_volume["stats"]["size"]

        # The two branches differed only in the rbd command; deduplicated here
        if this_backup_incremental_parent is not None:
            retcode, stdout, stderr = run_os_command(
                f"rbd export-diff --from-snap {this_backup_incremental_parent} {snap_pool}/{snap_volume}@{snap_snapshot_name} {export_target_path}/{snap_pool}.{snap_volume}.{export_fileext}"
            )
        else:
            retcode, stdout, stderr = run_os_command(
                f"rbd export --export-format 2 {snap_pool}/{snap_volume}@{snap_snapshot_name} {export_target_path}/{snap_pool}.{snap_volume}.{export_fileext}"
            )

        if retcode:
            # BUGFIX: was a one-element tuple due to a stray trailing comma
            error_message = f"[{vm_name}] Failed to export snapshot for volume(s) '{snap_pool}/{snap_volume}'"
            failure = True
            break
        else:
            export_files.append(
                (
                    f"images/{snap_pool}.{snap_volume}.{export_fileext}",
                    snap_size,
                )
            )

    if failure:
        log_err(celery, error_message)
        write_backup_summary(message=error_message)
        tracked_backups = update_tracked_backups()
        return tracked_backups

    def get_dir_size(pathname):
        # Recursively sum the sizes of all files under pathname
        total = 0
        with scandir(pathname) as it:
            for entry in it:
                if entry.is_file():
                    total += entry.stat().st_size
                elif entry.is_dir():
                    total += get_dir_size(entry.path)
        return total

    export_files_size = get_dir_size(export_target_path)

    ret, e = write_backup_summary(success=True)
    if not ret:
        error_message = f"[{vm_name}] Failed to export configuration snapshot: {e}"
        log_err(celery, error_message)
        write_backup_summary(message=error_message)
        tracked_backups = update_tracked_backups()
        return tracked_backups

    # Clean up the snapshot (vm.vm_worker_remove_snapshot) unless it is the
    # retained parent of future incremental backups
    if not this_backup_retain_snapshot:
        for snap in snap_list:
            rbd, name = snap.split("@")
            pool, volume = rbd.split("/")
            ret, msg = ceph.remove_snapshot(zkhandler, pool, volume, name)
            if not ret:
                error_message = msg.replace("ERROR: ", f"[{vm_name}] ")
                failure = True
                break

        if failure:
            log_err(celery, error_message)
            write_backup_summary(message=error_message)
            tracked_backups = update_tracked_backups()
            return tracked_backups

        # NOTE(review): placed inside the non-retain branch — retained
        # snapshots keep their Zookeeper entry; confirm against upstream
        ret = zkhandler.delete(
            ("domain.snapshots", dom_uuid, "domain_snapshot.name", snapshot_name)
        )
        if not ret:
            error_message = f"[{vm_name}] Failed to remove VM snapshot; continuing"
            log_err(celery, error_message)

    marked_for_deletion = list()
    # Find any full backups that are expired
    found_full_count = 0
    for backup in tracked_backups:
        if backup["type"] == "full":
            found_full_count += 1
            if found_full_count > full_retention:
                marked_for_deletion.append(backup)
    # Find any incremental backups that depend on marked parents
    for backup in tracked_backups:
        if backup["type"] == "incremental" and backup["incremental_parent"] in [
            b["snapshot_name"] for b in marked_for_deletion
        ]:
            marked_for_deletion.append(backup)

    if len(marked_for_deletion) > 0:
        for backup_to_delete in marked_for_deletion:
            ret = vm.vm_worker_remove_snapshot(
                zkhandler, None, vm_name, backup_to_delete["snapshot_name"]
            )
            if ret is False:
                error_message = f"Failed to remove obsolete backup snapshot '{backup_to_delete['snapshot_name']}', leaving in tracked backups"
                log_err(celery, error_message)
            else:
                rmtree(f"{vm_backup_path}/{backup_to_delete['snapshot_name']}")
                tracked_backups.remove(backup_to_delete)

    tracked_backups = update_tracked_backups()
    return tracked_backups
|
|
||||||
|
|
||||||
|
def worker_cluster_autobackup(
    zkhandler, celery, force_full=False, email_recipients=None
):
    """
    Celery worker entry point: run autobackups for every VM tagged for backup.

    Executes configured automount mount commands, backs up each matching VM
    via run_vm_backup(), executes the unmount commands, and optionally emails
    a summary (or failure) report.

    zkhandler is the Zookeeper handler; celery the task handle; force_full
    forces full exports for all VMs; email_recipients is an optional list of
    report recipients.

    Returns the finish() result on success, or False after fail() on error.
    """
    config = get_autobackup_configuration()

    backup_summary = dict()

    current_stage = 0
    total_stages = 1
    if email_recipients is not None:
        total_stages += 1

    start(
        celery,
        f"Starting cluster '{config['cluster']}' VM autobackup",
        current=current_stage,
        total=total_stages,
    )

    if not config["autobackup_enabled"]:
        message = "Autobackups are not configured on this cluster."
        log_info(celery, message)
        return finish(
            celery,
            message,
            current=total_stages,
            total=total_stages,
        )

    autobackup_start_time = datetime.now()

    retcode, vm_list = vm.get_list(zkhandler)
    if not retcode:
        error_message = f"Failed to fetch VM list: {vm_list}"
        log_err(celery, error_message)
        send_execution_failure_report(
            (celery, current_stage, total_stages),
            config,
            recipients=email_recipients,
            error=error_message,
        )
        fail(celery, error_message)
        return False

    backup_suffixed_path = f"{config['backup_root_path']}{config['backup_root_suffix']}"
    if not path.exists(backup_suffixed_path):
        makedirs(backup_suffixed_path)

    full_interval = config["backup_schedule"]["full_interval"]

    # Select VMs whose tags intersect the configured backup tags
    backup_vms = list()
    for vm_detail in vm_list:
        vm_tag_names = [t["name"] for t in vm_detail["tags"]]
        matching_tags = (
            True
            if len(set(vm_tag_names).intersection(set(config["backup_tags"]))) > 0
            else False
        )
        if matching_tags:
            backup_vms.append(vm_detail)

    if len(backup_vms) < 1:
        message = "Found no VMs tagged for autobackup."
        log_info(celery, message)
        return finish(
            celery,
            message,
            current=total_stages,
            total=total_stages,
        )

    if config["auto_mount_enabled"]:
        total_stages += len(config["mount_cmds"])
        total_stages += len(config["unmount_cmds"])

    total_stages += len(backup_vms)

    log_info(
        celery,
        f"Found {len(backup_vms)} suitable VM(s) for autobackup: {', '.join([b['name'] for b in backup_vms])}",
    )

    def _run_automount_commands(action, cmds):
        # Run each automount hook command of the given action ("mount" or
        # "unmount"); on the first failure, report, fail the task, and return
        # False. Extracted because the mount and unmount loops were duplicated.
        nonlocal current_stage
        for cmd in cmds:
            current_stage += 1
            update(
                celery,
                f"Executing {action} command '{cmd.split()[0]}'",
                current=current_stage,
                total=total_stages,
            )

            ret = run(
                cmd.split(),
                stdout=PIPE,
                stderr=PIPE,
            )

            if ret.returncode != 0:
                error_message = f"Failed to execute {action} command '{cmd.split()[0]}': {ret.stderr.decode().strip()}"
                log_err(celery, error_message)
                send_execution_failure_report(
                    (celery, current_stage, total_stages),
                    config,
                    recipients=email_recipients,
                    total_time=datetime.now() - autobackup_start_time,
                    error=error_message,
                )
                fail(celery, error_message)
                return False
        return True

    # Handle automount mount commands
    if config["auto_mount_enabled"]:
        if not _run_automount_commands("mount", config["mount_cmds"]):
            return False

    # Execute the backup: take a snapshot, then export the snapshot
    for vm_detail in backup_vms:
        vm_backup_path = f"{backup_suffixed_path}/{vm_detail['name']}"
        autobackup_state_file = f"{vm_backup_path}/.autobackup.json"
        if not path.exists(vm_backup_path) or not path.exists(autobackup_state_file):
            # There are no existing backups so the list is empty
            state_data = dict()
            tracked_backups = list()
        else:
            with open(autobackup_state_file) as fh:
                state_data = jload(fh)
            tracked_backups = state_data["tracked_backups"]

        # Recompute the full/incremental decision purely for the progress
        # message; run_vm_backup() makes the authoritative decision itself
        full_backups = [b for b in tracked_backups if b["type"] == "full"]
        if len(full_backups) > 0:
            last_full_backup = full_backups[0]
            last_full_backup_idx = tracked_backups.index(last_full_backup)
            if force_full:
                this_backup_incremental_parent = None
            elif last_full_backup_idx >= full_interval - 1:
                this_backup_incremental_parent = None
            else:
                this_backup_incremental_parent = last_full_backup["snapshot_name"]
        else:
            # The very first backup must be full to start the tree
            this_backup_incremental_parent = None

        export_type = (
            "incremental" if this_backup_incremental_parent is not None else "full"
        )

        current_stage += 1
        update(
            celery,
            f"Performing autobackup of VM {vm_detail['name']} ({export_type})",
            current=current_stage,
            total=total_stages,
        )

        summary = run_vm_backup(
            zkhandler,
            celery,
            config,
            vm_detail,
            force_full=force_full,
        )
        backup_summary[vm_detail["name"]] = summary

    # Handle automount unmount commands
    if config["auto_mount_enabled"]:
        if not _run_automount_commands("unmount", config["unmount_cmds"]):
            return False

    autobackup_end_time = datetime.now()
    autobackup_total_time = autobackup_end_time - autobackup_start_time

    if email_recipients is not None:
        send_execution_summary_report(
            (celery, current_stage, total_stages),
            config,
            recipients=email_recipients,
            total_time=autobackup_total_time,
            summary=backup_summary,
        )
        current_stage += 1

    current_stage += 1
    return finish(
        celery,
        f"Successfully completed cluster '{config['cluster']}' VM autobackup",
        current=current_stage,
        total=total_stages,
    )
|
|
@ -19,31 +19,34 @@
|
||||||
#
|
#
|
||||||
###############################################################################
|
###############################################################################
|
||||||
|
|
||||||
|
import os
|
||||||
|
import psutil
|
||||||
import psycopg2
|
import psycopg2
|
||||||
import psycopg2.extras
|
import psycopg2.extras
|
||||||
|
import subprocess
|
||||||
|
|
||||||
from datetime import datetime
|
from datetime import datetime
|
||||||
from json import loads, dumps
|
from json import loads, dumps
|
||||||
|
from time import sleep
|
||||||
|
|
||||||
from daemon_lib.celery import start, fail, log_info, update, finish
|
from daemon_lib.celery import start, fail, log_info, update, finish
|
||||||
|
|
||||||
import daemon_lib.common as pvc_common
|
|
||||||
import daemon_lib.ceph as pvc_ceph
|
import daemon_lib.ceph as pvc_ceph
|
||||||
|
|
||||||
|
|
||||||
# Define the current test format
|
# Define the current test format
|
||||||
TEST_FORMAT = 1
|
TEST_FORMAT = 2
|
||||||
|
|
||||||
|
|
||||||
# We run a total of 8 tests, to give a generalized idea of performance on the cluster:
|
# We run a total of 8 tests, to give a generalized idea of performance on the cluster:
|
||||||
# 1. A sequential read test of 8GB with a 4M block size
|
# 1. A sequential read test of 64GB with a 4M block size
|
||||||
# 2. A sequential write test of 8GB with a 4M block size
|
# 2. A sequential write test of 64GB with a 4M block size
|
||||||
# 3. A random read test of 8GB with a 4M block size
|
# 3. A random read test of 64GB with a 4M block size
|
||||||
# 4. A random write test of 8GB with a 4M block size
|
# 4. A random write test of 64GB with a 4M block size
|
||||||
# 5. A random read test of 8GB with a 256k block size
|
# 5. A random read test of 64GB with a 256k block size
|
||||||
# 6. A random write test of 8GB with a 256k block size
|
# 6. A random write test of 64GB with a 256k block size
|
||||||
# 7. A random read test of 8GB with a 4k block size
|
# 7. A random read test of 64GB with a 4k block size
|
||||||
# 8. A random write test of 8GB with a 4k block size
|
# 8. A random write test of 64GB with a 4k block size
|
||||||
# Taken together, these 8 results should give a very good indication of the overall storage performance
|
# Taken together, these 8 results should give a very good indication of the overall storage performance
|
||||||
# for a variety of workloads.
|
# for a variety of workloads.
|
||||||
test_matrix = {
|
test_matrix = {
|
||||||
|
@ -100,7 +103,7 @@ test_matrix = {
|
||||||
|
|
||||||
# Specify the benchmark volume name and size
|
# Specify the benchmark volume name and size
|
||||||
benchmark_volume_name = "pvcbenchmark"
|
benchmark_volume_name = "pvcbenchmark"
|
||||||
benchmark_volume_size = "8G"
|
benchmark_volume_size = "64G"
|
||||||
|
|
||||||
|
|
||||||
#
|
#
|
||||||
|
@ -115,9 +118,10 @@ class BenchmarkError(Exception):
|
||||||
#
|
#
|
||||||
|
|
||||||
|
|
||||||
def cleanup(job_name, db_conn=None, db_cur=None, zkhandler=None):
|
def cleanup(job_name, db_conn=None, db_cur=None, zkhandler=None, final=False):
|
||||||
if db_conn is not None and db_cur is not None:
|
if db_conn is not None and db_cur is not None:
|
||||||
# Clean up our dangling result
|
if not final:
|
||||||
|
# Clean up our dangling result (non-final runs only)
|
||||||
query = "DELETE FROM storage_benchmarks WHERE job = %s;"
|
query = "DELETE FROM storage_benchmarks WHERE job = %s;"
|
||||||
args = (job_name,)
|
args = (job_name,)
|
||||||
db_cur.execute(query, args)
|
db_cur.execute(query, args)
|
||||||
|
@ -225,7 +229,7 @@ def cleanup_benchmark_volume(
|
||||||
|
|
||||||
|
|
||||||
def run_benchmark_job(
|
def run_benchmark_job(
|
||||||
test, pool, job_name=None, db_conn=None, db_cur=None, zkhandler=None
|
config, test, pool, job_name=None, db_conn=None, db_cur=None, zkhandler=None
|
||||||
):
|
):
|
||||||
test_spec = test_matrix[test]
|
test_spec = test_matrix[test]
|
||||||
log_info(None, f"Running test '{test}'")
|
log_info(None, f"Running test '{test}'")
|
||||||
|
@ -255,31 +259,165 @@ def run_benchmark_job(
|
||||||
)
|
)
|
||||||
|
|
||||||
log_info(None, "Running fio job: {}".format(" ".join(fio_cmd.split())))
|
log_info(None, "Running fio job: {}".format(" ".join(fio_cmd.split())))
|
||||||
retcode, stdout, stderr = pvc_common.run_os_command(fio_cmd)
|
|
||||||
|
# Run the fio command manually instead of using our run_os_command wrapper
|
||||||
|
# This will help us gather statistics about this node while it's running
|
||||||
|
process = subprocess.Popen(
|
||||||
|
fio_cmd.split(),
|
||||||
|
stdout=subprocess.PIPE,
|
||||||
|
stderr=subprocess.PIPE,
|
||||||
|
text=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Wait 15 seconds for the test to start
|
||||||
|
log_info(None, "Waiting 15 seconds for test resource stabilization")
|
||||||
|
sleep(15)
|
||||||
|
|
||||||
|
# Set up function to get process CPU utilization by name
|
||||||
|
def get_cpu_utilization_by_name(process_name):
|
||||||
|
cpu_usage = 0
|
||||||
|
for proc in psutil.process_iter(["name", "cpu_percent"]):
|
||||||
|
if proc.info["name"] == process_name:
|
||||||
|
cpu_usage += proc.info["cpu_percent"]
|
||||||
|
return cpu_usage
|
||||||
|
|
||||||
|
# Set up function to get process memory utilization by name
|
||||||
|
def get_memory_utilization_by_name(process_name):
|
||||||
|
memory_usage = 0
|
||||||
|
for proc in psutil.process_iter(["name", "memory_percent"]):
|
||||||
|
if proc.info["name"] == process_name:
|
||||||
|
memory_usage += proc.info["memory_percent"]
|
||||||
|
return memory_usage
|
||||||
|
|
||||||
|
# Set up function to get network traffic utilization in bps
|
||||||
|
def get_network_traffic_bps(interface, duration=1):
|
||||||
|
# Get initial network counters
|
||||||
|
net_io_start = psutil.net_io_counters(pernic=True)
|
||||||
|
if interface not in net_io_start:
|
||||||
|
return None, None
|
||||||
|
|
||||||
|
stats_start = net_io_start[interface]
|
||||||
|
bytes_sent_start = stats_start.bytes_sent
|
||||||
|
bytes_recv_start = stats_start.bytes_recv
|
||||||
|
|
||||||
|
# Wait for the specified duration
|
||||||
|
sleep(duration)
|
||||||
|
|
||||||
|
# Get final network counters
|
||||||
|
net_io_end = psutil.net_io_counters(pernic=True)
|
||||||
|
stats_end = net_io_end[interface]
|
||||||
|
bytes_sent_end = stats_end.bytes_sent
|
||||||
|
bytes_recv_end = stats_end.bytes_recv
|
||||||
|
|
||||||
|
# Calculate bytes per second
|
||||||
|
bytes_sent_per_sec = (bytes_sent_end - bytes_sent_start) / duration
|
||||||
|
bytes_recv_per_sec = (bytes_recv_end - bytes_recv_start) / duration
|
||||||
|
|
||||||
|
# Convert to bits per second (bps)
|
||||||
|
bits_sent_per_sec = bytes_sent_per_sec * 8
|
||||||
|
bits_recv_per_sec = bytes_recv_per_sec * 8
|
||||||
|
bits_total_per_sec = bits_sent_per_sec + bits_recv_per_sec
|
||||||
|
|
||||||
|
return bits_sent_per_sec, bits_recv_per_sec, bits_total_per_sec
|
||||||
|
|
||||||
|
log_info(None, f"Starting system resource polling for test '{test}'")
|
||||||
|
storage_interface = config["storage_dev"]
|
||||||
|
total_cpus = psutil.cpu_count(logical=True)
|
||||||
|
ticks = 1
|
||||||
|
osd_cpu_utilization = 0
|
||||||
|
osd_memory_utilization = 0
|
||||||
|
mon_cpu_utilization = 0
|
||||||
|
mon_memory_utilization = 0
|
||||||
|
total_cpu_utilization = 0
|
||||||
|
total_memory_utilization = 0
|
||||||
|
storage_sent_bps = 0
|
||||||
|
storage_recv_bps = 0
|
||||||
|
storage_total_bps = 0
|
||||||
|
|
||||||
|
while process.poll() is None:
|
||||||
|
# Do collection of statistics like network bandwidth and cpu utilization
|
||||||
|
current_osd_cpu_utilization = get_cpu_utilization_by_name("ceph-osd")
|
||||||
|
current_osd_memory_utilization = get_memory_utilization_by_name("ceph-osd")
|
||||||
|
current_mon_cpu_utilization = get_cpu_utilization_by_name("ceph-mon")
|
||||||
|
current_mon_memory_utilization = get_memory_utilization_by_name("ceph-mon")
|
||||||
|
current_total_cpu_utilization = psutil.cpu_percent(interval=1)
|
||||||
|
current_total_memory_utilization = psutil.virtual_memory().percent
|
||||||
|
(
|
||||||
|
current_storage_sent_bps,
|
||||||
|
current_storage_recv_bps,
|
||||||
|
current_storage_total_bps,
|
||||||
|
) = get_network_traffic_bps(storage_interface)
|
||||||
|
# Recheck if the process is done yet; if it's not, we add the values and increase the ticks
|
||||||
|
# This helps ensure that if the process finishes earlier than the longer polls above,
|
||||||
|
# this particular tick isn't counted which can skew the average
|
||||||
|
if process.poll() is None:
|
||||||
|
osd_cpu_utilization += current_osd_cpu_utilization
|
||||||
|
osd_memory_utilization += current_osd_memory_utilization
|
||||||
|
mon_cpu_utilization += current_mon_cpu_utilization
|
||||||
|
mon_memory_utilization += current_mon_memory_utilization
|
||||||
|
total_cpu_utilization += current_total_cpu_utilization
|
||||||
|
total_memory_utilization += current_total_memory_utilization
|
||||||
|
storage_sent_bps += current_storage_sent_bps
|
||||||
|
storage_recv_bps += current_storage_recv_bps
|
||||||
|
storage_total_bps += current_storage_total_bps
|
||||||
|
ticks += 1
|
||||||
|
|
||||||
|
# Get the 1-minute load average and CPU utilization, which covers the test duration
|
||||||
|
load1, _, _ = os.getloadavg()
|
||||||
|
load1 = round(load1, 2)
|
||||||
|
|
||||||
|
# Calculate the average CPU utilization values over the runtime
|
||||||
|
# Divide the OSD and MON CPU utilization by the total number of CPU cores, because
|
||||||
|
# the total is divided this way
|
||||||
|
avg_osd_cpu_utilization = round(osd_cpu_utilization / ticks / total_cpus, 2)
|
||||||
|
avg_osd_memory_utilization = round(osd_memory_utilization / ticks, 2)
|
||||||
|
avg_mon_cpu_utilization = round(mon_cpu_utilization / ticks / total_cpus, 2)
|
||||||
|
avg_mon_memory_utilization = round(mon_memory_utilization / ticks, 2)
|
||||||
|
avg_total_cpu_utilization = round(total_cpu_utilization / ticks, 2)
|
||||||
|
avg_total_memory_utilization = round(total_memory_utilization / ticks, 2)
|
||||||
|
avg_storage_sent_bps = round(storage_sent_bps / ticks, 2)
|
||||||
|
avg_storage_recv_bps = round(storage_recv_bps / ticks, 2)
|
||||||
|
avg_storage_total_bps = round(storage_total_bps / ticks, 2)
|
||||||
|
|
||||||
|
stdout, stderr = process.communicate()
|
||||||
|
retcode = process.returncode
|
||||||
|
|
||||||
|
resource_data = {
|
||||||
|
"avg_cpu_util_percent": {
|
||||||
|
"total": avg_total_cpu_utilization,
|
||||||
|
"ceph-mon": avg_mon_cpu_utilization,
|
||||||
|
"ceph-osd": avg_osd_cpu_utilization,
|
||||||
|
},
|
||||||
|
"avg_memory_util_percent": {
|
||||||
|
"total": avg_total_memory_utilization,
|
||||||
|
"ceph-mon": avg_mon_memory_utilization,
|
||||||
|
"ceph-osd": avg_osd_memory_utilization,
|
||||||
|
},
|
||||||
|
"avg_network_util_bps": {
|
||||||
|
"sent": avg_storage_sent_bps,
|
||||||
|
"recv": avg_storage_recv_bps,
|
||||||
|
"total": avg_storage_total_bps,
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
try:
|
try:
|
||||||
jstdout = loads(stdout)
|
jstdout = loads(stdout)
|
||||||
if retcode:
|
if retcode:
|
||||||
raise
|
raise
|
||||||
except Exception:
|
except Exception:
|
||||||
cleanup(
|
return None, None
|
||||||
job_name,
|
|
||||||
db_conn=db_conn,
|
|
||||||
db_cur=db_cur,
|
|
||||||
zkhandler=zkhandler,
|
|
||||||
)
|
|
||||||
fail(
|
|
||||||
None,
|
|
||||||
f"Failed to run fio test '{test}': {stderr}",
|
|
||||||
)
|
|
||||||
|
|
||||||
return jstdout
|
return resource_data, jstdout
|
||||||
|
|
||||||
|
|
||||||
def worker_run_benchmark(zkhandler, celery, config, pool):
|
def worker_run_benchmark(zkhandler, celery, config, pool, name):
|
||||||
# Phase 0 - connect to databases
|
# Phase 0 - connect to databases
|
||||||
|
if not name:
|
||||||
cur_time = datetime.now().isoformat(timespec="seconds")
|
cur_time = datetime.now().isoformat(timespec="seconds")
|
||||||
cur_primary = zkhandler.read("base.config.primary_node")
|
cur_primary = zkhandler.read("base.config.primary_node")
|
||||||
job_name = f"{cur_time}_{cur_primary}"
|
job_name = f"{cur_time}_{cur_primary}"
|
||||||
|
else:
|
||||||
|
job_name = name
|
||||||
|
|
||||||
current_stage = 0
|
current_stage = 0
|
||||||
total_stages = 13
|
total_stages = 13
|
||||||
|
@ -357,7 +495,8 @@ def worker_run_benchmark(zkhandler, celery, config, pool):
|
||||||
total=total_stages,
|
total=total_stages,
|
||||||
)
|
)
|
||||||
|
|
||||||
results[test] = run_benchmark_job(
|
resource_data, fio_data = run_benchmark_job(
|
||||||
|
config,
|
||||||
test,
|
test,
|
||||||
pool,
|
pool,
|
||||||
job_name=job_name,
|
job_name=job_name,
|
||||||
|
@ -365,6 +504,25 @@ def worker_run_benchmark(zkhandler, celery, config, pool):
|
||||||
db_cur=db_cur,
|
db_cur=db_cur,
|
||||||
zkhandler=zkhandler,
|
zkhandler=zkhandler,
|
||||||
)
|
)
|
||||||
|
if resource_data is None or fio_data is None:
|
||||||
|
cleanup_benchmark_volume(
|
||||||
|
pool,
|
||||||
|
job_name=job_name,
|
||||||
|
db_conn=db_conn,
|
||||||
|
db_cur=db_cur,
|
||||||
|
zkhandler=zkhandler,
|
||||||
|
)
|
||||||
|
cleanup(
|
||||||
|
job_name,
|
||||||
|
db_conn=db_conn,
|
||||||
|
db_cur=db_cur,
|
||||||
|
zkhandler=zkhandler,
|
||||||
|
)
|
||||||
|
fail(
|
||||||
|
None,
|
||||||
|
f"Failed to run fio test '{test}'",
|
||||||
|
)
|
||||||
|
results[test] = {**resource_data, **fio_data}
|
||||||
|
|
||||||
# Phase 3 - cleanup
|
# Phase 3 - cleanup
|
||||||
current_stage += 1
|
current_stage += 1
|
||||||
|
@ -410,6 +568,7 @@ def worker_run_benchmark(zkhandler, celery, config, pool):
|
||||||
db_conn=db_conn,
|
db_conn=db_conn,
|
||||||
db_cur=db_cur,
|
db_cur=db_cur,
|
||||||
zkhandler=zkhandler,
|
zkhandler=zkhandler,
|
||||||
|
final=True,
|
||||||
)
|
)
|
||||||
|
|
||||||
current_stage += 1
|
current_stage += 1
|
||||||
|
|
|
@ -320,7 +320,11 @@ def get_list_osd(zkhandler, limit=None, is_fuzzy=True):
|
||||||
#
|
#
|
||||||
def getPoolInformation(zkhandler, pool):
|
def getPoolInformation(zkhandler, pool):
|
||||||
# Parse the stats data
|
# Parse the stats data
|
||||||
(pool_stats_raw, tier, pgs,) = zkhandler.read_many(
|
(
|
||||||
|
pool_stats_raw,
|
||||||
|
tier,
|
||||||
|
pgs,
|
||||||
|
) = zkhandler.read_many(
|
||||||
[
|
[
|
||||||
("pool.stats", pool),
|
("pool.stats", pool),
|
||||||
("pool.tier", pool),
|
("pool.tier", pool),
|
||||||
|
@ -536,7 +540,10 @@ def getCephVolumes(zkhandler, pool):
|
||||||
pool_list = [pool]
|
pool_list = [pool]
|
||||||
|
|
||||||
for pool_name in pool_list:
|
for pool_name in pool_list:
|
||||||
for volume_name in zkhandler.children(("volume", pool_name)):
|
children = zkhandler.children(("volume", pool_name))
|
||||||
|
if children is None:
|
||||||
|
continue
|
||||||
|
for volume_name in children:
|
||||||
volume_list.append("{}/{}".format(pool_name, volume_name))
|
volume_list.append("{}/{}".format(pool_name, volume_name))
|
||||||
|
|
||||||
return volume_list
|
return volume_list
|
||||||
|
@ -553,7 +560,21 @@ def getVolumeInformation(zkhandler, pool, volume):
|
||||||
return volume_information
|
return volume_information
|
||||||
|
|
||||||
|
|
||||||
def add_volume(zkhandler, pool, name, size):
|
def scan_volume(zkhandler, pool, name):
|
||||||
|
retcode, stdout, stderr = common.run_os_command(
|
||||||
|
"rbd info --format json {}/{}".format(pool, name)
|
||||||
|
)
|
||||||
|
volstats = stdout
|
||||||
|
|
||||||
|
# 3. Add the new volume to Zookeeper
|
||||||
|
zkhandler.write(
|
||||||
|
[
|
||||||
|
(("volume.stats", f"{pool}/{name}"), volstats),
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def add_volume(zkhandler, pool, name, size, force_flag=False, zk_only=False):
|
||||||
# 1. Verify the size of the volume
|
# 1. Verify the size of the volume
|
||||||
pool_information = getPoolInformation(zkhandler, pool)
|
pool_information = getPoolInformation(zkhandler, pool)
|
||||||
size_bytes = format_bytes_fromhuman(size)
|
size_bytes = format_bytes_fromhuman(size)
|
||||||
|
@ -563,46 +584,88 @@ def add_volume(zkhandler, pool, name, size):
|
||||||
f"ERROR: Requested volume size '{size}' does not have a valid SI unit",
|
f"ERROR: Requested volume size '{size}' does not have a valid SI unit",
|
||||||
)
|
)
|
||||||
|
|
||||||
if size_bytes >= int(pool_information["stats"]["free_bytes"]):
|
pool_total_free_bytes = int(pool_information["stats"]["free_bytes"])
|
||||||
|
if size_bytes >= pool_total_free_bytes:
|
||||||
return (
|
return (
|
||||||
False,
|
False,
|
||||||
f"ERROR: Requested volume size '{format_bytes_tohuman(size_bytes)}' is greater than the available free space in the pool ('{format_bytes_tohuman(pool_information['stats']['free_bytes'])}')",
|
f"ERROR: Requested volume size '{format_bytes_tohuman(size_bytes)}' is greater than the available free space in the pool ('{format_bytes_tohuman(pool_information['stats']['free_bytes'])}')",
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# Check if we're greater than 80% utilization after the create; error if so unless we have the force flag
|
||||||
|
pool_total_bytes = (
|
||||||
|
int(pool_information["stats"]["used_bytes"]) + pool_total_free_bytes
|
||||||
|
)
|
||||||
|
pool_safe_total_bytes = int(pool_total_bytes * 0.80)
|
||||||
|
pool_safe_free_bytes = pool_safe_total_bytes - int(
|
||||||
|
pool_information["stats"]["used_bytes"]
|
||||||
|
)
|
||||||
|
if size_bytes >= pool_safe_free_bytes and not force_flag:
|
||||||
|
return (
|
||||||
|
False,
|
||||||
|
f"ERROR: Requested volume size '{format_bytes_tohuman(size_bytes)}' is greater than the safe free space in the pool ('{format_bytes_tohuman(pool_safe_free_bytes)}' for 80% full); retry with force to ignore this error",
|
||||||
|
)
|
||||||
|
|
||||||
# 2. Create the volume
|
# 2. Create the volume
|
||||||
|
# zk_only flag skips actually creating the volume - this would be done by some other mechanism
|
||||||
|
if not zk_only:
|
||||||
retcode, stdout, stderr = common.run_os_command(
|
retcode, stdout, stderr = common.run_os_command(
|
||||||
"rbd create --size {}B {}/{}".format(size_bytes, pool, name)
|
"rbd create --size {}B {}/{}".format(size_bytes, pool, name)
|
||||||
)
|
)
|
||||||
if retcode:
|
if retcode:
|
||||||
return False, 'ERROR: Failed to create RBD volume "{}": {}'.format(name, stderr)
|
return False, 'ERROR: Failed to create RBD volume "{}": {}'.format(
|
||||||
|
name, stderr
|
||||||
# 2. Get volume stats
|
|
||||||
retcode, stdout, stderr = common.run_os_command(
|
|
||||||
"rbd info --format json {}/{}".format(pool, name)
|
|
||||||
)
|
)
|
||||||
volstats = stdout
|
|
||||||
|
|
||||||
# 3. Add the new volume to Zookeeper
|
# 3. Add the new volume to Zookeeper
|
||||||
zkhandler.write(
|
zkhandler.write(
|
||||||
[
|
[
|
||||||
(("volume", f"{pool}/{name}"), ""),
|
(("volume", f"{pool}/{name}"), ""),
|
||||||
(("volume.stats", f"{pool}/{name}"), volstats),
|
(("volume.stats", f"{pool}/{name}"), ""),
|
||||||
(("snapshot", f"{pool}/{name}"), ""),
|
(("snapshot", f"{pool}/{name}"), ""),
|
||||||
]
|
]
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# 4. Scan the volume stats
|
||||||
|
scan_volume(zkhandler, pool, name)
|
||||||
|
|
||||||
return True, 'Created RBD volume "{}" of size "{}" in pool "{}".'.format(
|
return True, 'Created RBD volume "{}" of size "{}" in pool "{}".'.format(
|
||||||
name, format_bytes_tohuman(size_bytes), pool
|
name, format_bytes_tohuman(size_bytes), pool
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
def clone_volume(zkhandler, pool, name_src, name_new):
|
def clone_volume(zkhandler, pool, name_src, name_new, force_flag=False):
|
||||||
|
# 1. Verify the volume
|
||||||
if not verifyVolume(zkhandler, pool, name_src):
|
if not verifyVolume(zkhandler, pool, name_src):
|
||||||
return False, 'ERROR: No volume with name "{}" is present in pool "{}".'.format(
|
return False, 'ERROR: No volume with name "{}" is present in pool "{}".'.format(
|
||||||
name_src, pool
|
name_src, pool
|
||||||
)
|
)
|
||||||
|
|
||||||
# 1. Clone the volume
|
volume_stats_raw = zkhandler.read(("volume.stats", f"{pool}/{name_src}"))
|
||||||
|
volume_stats = dict(json.loads(volume_stats_raw))
|
||||||
|
size_bytes = volume_stats["size"]
|
||||||
|
pool_information = getPoolInformation(zkhandler, pool)
|
||||||
|
pool_total_free_bytes = int(pool_information["stats"]["free_bytes"])
|
||||||
|
if size_bytes >= pool_total_free_bytes:
|
||||||
|
return (
|
||||||
|
False,
|
||||||
|
f"ERROR: Clone volume size '{format_bytes_tohuman(size_bytes)}' is greater than the available free space in the pool ('{format_bytes_tohuman(pool_information['stats']['free_bytes'])}')",
|
||||||
|
)
|
||||||
|
|
||||||
|
# Check if we're greater than 80% utilization after the create; error if so unless we have the force flag
|
||||||
|
pool_total_bytes = (
|
||||||
|
int(pool_information["stats"]["used_bytes"]) + pool_total_free_bytes
|
||||||
|
)
|
||||||
|
pool_safe_total_bytes = int(pool_total_bytes * 0.80)
|
||||||
|
pool_safe_free_bytes = pool_safe_total_bytes - int(
|
||||||
|
pool_information["stats"]["used_bytes"]
|
||||||
|
)
|
||||||
|
if size_bytes >= pool_safe_free_bytes and not force_flag:
|
||||||
|
return (
|
||||||
|
False,
|
||||||
|
f"ERROR: Clone volume size '{format_bytes_tohuman(size_bytes)}' is greater than the safe free space in the pool ('{format_bytes_tohuman(pool_safe_free_bytes)}' for 80% full); retry with force to ignore this error",
|
||||||
|
)
|
||||||
|
|
||||||
|
# 2. Clone the volume
|
||||||
retcode, stdout, stderr = common.run_os_command(
|
retcode, stdout, stderr = common.run_os_command(
|
||||||
"rbd copy {}/{} {}/{}".format(pool, name_src, pool, name_new)
|
"rbd copy {}/{} {}/{}".format(pool, name_src, pool, name_new)
|
||||||
)
|
)
|
||||||
|
@ -614,27 +677,24 @@ def clone_volume(zkhandler, pool, name_src, name_new):
|
||||||
),
|
),
|
||||||
)
|
)
|
||||||
|
|
||||||
# 2. Get volume stats
|
|
||||||
retcode, stdout, stderr = common.run_os_command(
|
|
||||||
"rbd info --format json {}/{}".format(pool, name_new)
|
|
||||||
)
|
|
||||||
volstats = stdout
|
|
||||||
|
|
||||||
# 3. Add the new volume to Zookeeper
|
# 3. Add the new volume to Zookeeper
|
||||||
zkhandler.write(
|
zkhandler.write(
|
||||||
[
|
[
|
||||||
(("volume", f"{pool}/{name_new}"), ""),
|
(("volume", f"{pool}/{name_new}"), ""),
|
||||||
(("volume.stats", f"{pool}/{name_new}"), volstats),
|
(("volume.stats", f"{pool}/{name_new}"), ""),
|
||||||
(("snapshot", f"{pool}/{name_new}"), ""),
|
(("snapshot", f"{pool}/{name_new}"), ""),
|
||||||
]
|
]
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# 4. Scan the volume stats
|
||||||
|
scan_volume(zkhandler, pool, name_new)
|
||||||
|
|
||||||
return True, 'Cloned RBD volume "{}" to "{}" in pool "{}"'.format(
|
return True, 'Cloned RBD volume "{}" to "{}" in pool "{}"'.format(
|
||||||
name_src, name_new, pool
|
name_src, name_new, pool
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
def resize_volume(zkhandler, pool, name, size):
|
def resize_volume(zkhandler, pool, name, size, force_flag=False):
|
||||||
if not verifyVolume(zkhandler, pool, name):
|
if not verifyVolume(zkhandler, pool, name):
|
||||||
return False, 'ERROR: No volume with name "{}" is present in pool "{}".'.format(
|
return False, 'ERROR: No volume with name "{}" is present in pool "{}".'.format(
|
||||||
name, pool
|
name, pool
|
||||||
|
@ -649,12 +709,27 @@ def resize_volume(zkhandler, pool, name, size):
|
||||||
f"ERROR: Requested volume size '{size}' does not have a valid SI unit",
|
f"ERROR: Requested volume size '{size}' does not have a valid SI unit",
|
||||||
)
|
)
|
||||||
|
|
||||||
if size_bytes >= int(pool_information["stats"]["free_bytes"]):
|
pool_total_free_bytes = int(pool_information["stats"]["free_bytes"])
|
||||||
|
if size_bytes >= pool_total_free_bytes:
|
||||||
return (
|
return (
|
||||||
False,
|
False,
|
||||||
f"ERROR: Requested volume size '{format_bytes_tohuman(size_bytes)}' is greater than the available free space in the pool ('{format_bytes_tohuman(pool_information['stats']['free_bytes'])}')",
|
f"ERROR: Requested volume size '{format_bytes_tohuman(size_bytes)}' is greater than the available free space in the pool ('{format_bytes_tohuman(pool_information['stats']['free_bytes'])}')",
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# Check if we're greater than 80% utilization after the create; error if so unless we have the force flag
|
||||||
|
pool_total_bytes = (
|
||||||
|
int(pool_information["stats"]["used_bytes"]) + pool_total_free_bytes
|
||||||
|
)
|
||||||
|
pool_safe_total_bytes = int(pool_total_bytes * 0.80)
|
||||||
|
pool_safe_free_bytes = pool_safe_total_bytes - int(
|
||||||
|
pool_information["stats"]["used_bytes"]
|
||||||
|
)
|
||||||
|
if size_bytes >= pool_safe_free_bytes and not force_flag:
|
||||||
|
return (
|
||||||
|
False,
|
||||||
|
f"ERROR: Requested volume size '{format_bytes_tohuman(size_bytes)}' is greater than the safe free space in the pool ('{format_bytes_tohuman(pool_safe_free_bytes)}' for 80% full); retry with force to ignore this error",
|
||||||
|
)
|
||||||
|
|
||||||
# 2. Resize the volume
|
# 2. Resize the volume
|
||||||
retcode, stdout, stderr = common.run_os_command(
|
retcode, stdout, stderr = common.run_os_command(
|
||||||
"rbd resize --size {} {}/{}".format(
|
"rbd resize --size {} {}/{}".format(
|
||||||
|
@ -698,20 +773,8 @@ def resize_volume(zkhandler, pool, name, size):
|
||||||
except Exception:
|
except Exception:
|
||||||
pass
|
pass
|
||||||
|
|
||||||
# 4. Get volume stats
|
# 4. Scan the volume stats
|
||||||
retcode, stdout, stderr = common.run_os_command(
|
scan_volume(zkhandler, pool, name)
|
||||||
"rbd info --format json {}/{}".format(pool, name)
|
|
||||||
)
|
|
||||||
volstats = stdout
|
|
||||||
|
|
||||||
# 5. Update the volume in Zookeeper
|
|
||||||
zkhandler.write(
|
|
||||||
[
|
|
||||||
(("volume", f"{pool}/{name}"), ""),
|
|
||||||
(("volume.stats", f"{pool}/{name}"), volstats),
|
|
||||||
(("snapshot", f"{pool}/{name}"), ""),
|
|
||||||
]
|
|
||||||
)
|
|
||||||
|
|
||||||
return True, 'Resized RBD volume "{}" to size "{}" in pool "{}".'.format(
|
return True, 'Resized RBD volume "{}" to size "{}" in pool "{}".'.format(
|
||||||
name, format_bytes_tohuman(size_bytes), pool
|
name, format_bytes_tohuman(size_bytes), pool
|
||||||
|
@ -744,18 +807,8 @@ def rename_volume(zkhandler, pool, name, new_name):
|
||||||
]
|
]
|
||||||
)
|
)
|
||||||
|
|
||||||
# 3. Get volume stats
|
# 3. Scan the volume stats
|
||||||
retcode, stdout, stderr = common.run_os_command(
|
scan_volume(zkhandler, pool, new_name)
|
||||||
"rbd info --format json {}/{}".format(pool, new_name)
|
|
||||||
)
|
|
||||||
volstats = stdout
|
|
||||||
|
|
||||||
# 4. Update the volume stats in Zookeeper
|
|
||||||
zkhandler.write(
|
|
||||||
[
|
|
||||||
(("volume.stats", f"{pool}/{new_name}"), volstats),
|
|
||||||
]
|
|
||||||
)
|
|
||||||
|
|
||||||
return True, 'Renamed RBD volume "{}" to "{}" in pool "{}".'.format(
|
return True, 'Renamed RBD volume "{}" to "{}" in pool "{}".'.format(
|
||||||
name, new_name, pool
|
name, new_name, pool
|
||||||
|
@ -768,10 +821,22 @@ def remove_volume(zkhandler, pool, name):
|
||||||
name, pool
|
name, pool
|
||||||
)
|
)
|
||||||
|
|
||||||
# 1. Remove volume snapshots
|
# 1a. Remove PVC-managed volume snapshots
|
||||||
for snapshot in zkhandler.children(("snapshot", f"{pool}/{name}")):
|
for snapshot in zkhandler.children(("snapshot", f"{pool}/{name}")):
|
||||||
remove_snapshot(zkhandler, pool, name, snapshot)
|
remove_snapshot(zkhandler, pool, name, snapshot)
|
||||||
|
|
||||||
|
# 1b. Purge any remaining volume snapshots
|
||||||
|
retcode, stdout, stderr = common.run_os_command(
|
||||||
|
"rbd snap purge {}/{}".format(pool, name)
|
||||||
|
)
|
||||||
|
if retcode:
|
||||||
|
return (
|
||||||
|
False,
|
||||||
|
'ERROR: Failed to purge snapshots from RBD volume "{}" in pool "{}": {}'.format(
|
||||||
|
name, pool, stderr
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
# 2. Remove the volume
|
# 2. Remove the volume
|
||||||
retcode, stdout, stderr = common.run_os_command("rbd rm {}/{}".format(pool, name))
|
retcode, stdout, stderr = common.run_os_command("rbd rm {}/{}".format(pool, name))
|
||||||
if retcode:
|
if retcode:
|
||||||
|
@ -940,23 +1005,27 @@ def add_snapshot(zkhandler, pool, volume, name, zk_only=False):
|
||||||
),
|
),
|
||||||
)
|
)
|
||||||
|
|
||||||
# 2. Add the snapshot to Zookeeper
|
# 2. Get snapshot stats
|
||||||
|
retcode, stdout, stderr = common.run_os_command(
|
||||||
|
"rbd info --format json {}/{}@{}".format(pool, volume, name)
|
||||||
|
)
|
||||||
|
snapstats = stdout
|
||||||
|
|
||||||
|
# 3. Add the snapshot to Zookeeper
|
||||||
zkhandler.write(
|
zkhandler.write(
|
||||||
[
|
[
|
||||||
(("snapshot", f"{pool}/{volume}/{name}"), ""),
|
(("snapshot", f"{pool}/{volume}/{name}"), ""),
|
||||||
(("snapshot.stats", f"{pool}/{volume}/{name}"), "{}"),
|
(("snapshot.stats", f"{pool}/{volume}/{name}"), snapstats),
|
||||||
]
|
]
|
||||||
)
|
)
|
||||||
|
|
||||||
# 3. Update the count of snapshots on this volume
|
# 4. Update the count of snapshots on this volume
|
||||||
volume_stats_raw = zkhandler.read(("volume.stats", f"{pool}/{volume}"))
|
volume_stats_raw = zkhandler.read(("volume.stats", f"{pool}/{volume}"))
|
||||||
volume_stats = dict(json.loads(volume_stats_raw))
|
volume_stats = dict(json.loads(volume_stats_raw))
|
||||||
# Format the size to something nicer
|
|
||||||
volume_stats["snapshot_count"] = volume_stats["snapshot_count"] + 1
|
volume_stats["snapshot_count"] = volume_stats["snapshot_count"] + 1
|
||||||
volume_stats_raw = json.dumps(volume_stats)
|
|
||||||
zkhandler.write(
|
zkhandler.write(
|
||||||
[
|
[
|
||||||
(("volume.stats", f"{pool}/{volume}"), volume_stats_raw),
|
(("volume.stats", f"{pool}/{volume}"), json.dumps(volume_stats)),
|
||||||
]
|
]
|
||||||
)
|
)
|
||||||
|
|
||||||
|
@ -1010,6 +1079,36 @@ def rename_snapshot(zkhandler, pool, volume, name, new_name):
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def rollback_snapshot(zkhandler, pool, volume, name):
|
||||||
|
if not verifyVolume(zkhandler, pool, volume):
|
||||||
|
return False, 'ERROR: No volume with name "{}" is present in pool "{}".'.format(
|
||||||
|
volume, pool
|
||||||
|
)
|
||||||
|
if not verifySnapshot(zkhandler, pool, volume, name):
|
||||||
|
return (
|
||||||
|
False,
|
||||||
|
'ERROR: No snapshot with name "{}" is present for volume "{}" in pool "{}".'.format(
|
||||||
|
name, volume, pool
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
|
# 1. Roll back the snapshot
|
||||||
|
retcode, stdout, stderr = common.run_os_command(
|
||||||
|
"rbd snap rollback {}/{}@{}".format(pool, volume, name)
|
||||||
|
)
|
||||||
|
if retcode:
|
||||||
|
return (
|
||||||
|
False,
|
||||||
|
'ERROR: Failed to roll back RBD volume "{}" in pool "{}" to snapshot "{}": {}'.format(
|
||||||
|
volume, pool, name, stderr
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
|
return True, 'Rolled back RBD volume "{}" in pool "{}" to snapshot "{}".'.format(
|
||||||
|
volume, pool, name
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
def remove_snapshot(zkhandler, pool, volume, name):
|
def remove_snapshot(zkhandler, pool, volume, name):
|
||||||
if not verifyVolume(zkhandler, pool, volume):
|
if not verifyVolume(zkhandler, pool, volume):
|
||||||
return False, 'ERROR: No volume with name "{}" is present in pool "{}".'.format(
|
return False, 'ERROR: No volume with name "{}" is present in pool "{}".'.format(
|
||||||
|
@ -1051,20 +1150,9 @@ def remove_snapshot(zkhandler, pool, volume, name):
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
def get_list_snapshot(zkhandler, pool, volume, limit=None, is_fuzzy=True):
|
def get_list_snapshot(zkhandler, target_pool, target_volume, limit=None, is_fuzzy=True):
|
||||||
snapshot_list = []
|
snapshot_list = []
|
||||||
if pool and not verifyPool(zkhandler, pool):
|
full_snapshot_list = getCephSnapshots(zkhandler, target_pool, target_volume)
|
||||||
return False, 'ERROR: No pool with name "{}" is present in the cluster.'.format(
|
|
||||||
pool
|
|
||||||
)
|
|
||||||
|
|
||||||
if volume and not verifyPool(zkhandler, volume):
|
|
||||||
return (
|
|
||||||
False,
|
|
||||||
'ERROR: No volume with name "{}" is present in the cluster.'.format(volume),
|
|
||||||
)
|
|
||||||
|
|
||||||
full_snapshot_list = getCephSnapshots(zkhandler, pool, volume)
|
|
||||||
|
|
||||||
if is_fuzzy and limit:
|
if is_fuzzy and limit:
|
||||||
# Implicitly assume fuzzy limits
|
# Implicitly assume fuzzy limits
|
||||||
|
@ -1076,6 +1164,18 @@ def get_list_snapshot(zkhandler, pool, volume, limit=None, is_fuzzy=True):
|
||||||
for snapshot in full_snapshot_list:
|
for snapshot in full_snapshot_list:
|
||||||
volume, snapshot_name = snapshot.split("@")
|
volume, snapshot_name = snapshot.split("@")
|
||||||
pool_name, volume_name = volume.split("/")
|
pool_name, volume_name = volume.split("/")
|
||||||
|
if target_pool and pool_name != target_pool:
|
||||||
|
continue
|
||||||
|
if target_volume and volume_name != target_volume:
|
||||||
|
continue
|
||||||
|
try:
|
||||||
|
snapshot_stats = json.loads(
|
||||||
|
zkhandler.read(
|
||||||
|
("snapshot.stats", f"{pool_name}/{volume_name}/{snapshot_name}")
|
||||||
|
)
|
||||||
|
)
|
||||||
|
except Exception:
|
||||||
|
snapshot_stats = []
|
||||||
if limit:
|
if limit:
|
||||||
try:
|
try:
|
||||||
if re.fullmatch(limit, snapshot_name):
|
if re.fullmatch(limit, snapshot_name):
|
||||||
|
@ -1084,13 +1184,19 @@ def get_list_snapshot(zkhandler, pool, volume, limit=None, is_fuzzy=True):
|
||||||
"pool": pool_name,
|
"pool": pool_name,
|
||||||
"volume": volume_name,
|
"volume": volume_name,
|
||||||
"snapshot": snapshot_name,
|
"snapshot": snapshot_name,
|
||||||
|
"stats": snapshot_stats,
|
||||||
}
|
}
|
||||||
)
|
)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
return False, "Regex Error: {}".format(e)
|
return False, "Regex Error: {}".format(e)
|
||||||
else:
|
else:
|
||||||
snapshot_list.append(
|
snapshot_list.append(
|
||||||
{"pool": pool_name, "volume": volume_name, "snapshot": snapshot_name}
|
{
|
||||||
|
"pool": pool_name,
|
||||||
|
"volume": volume_name,
|
||||||
|
"snapshot": snapshot_name,
|
||||||
|
"stats": snapshot_stats,
|
||||||
|
}
|
||||||
)
|
)
|
||||||
|
|
||||||
return True, sorted(snapshot_list, key=lambda x: str(x["snapshot"]))
|
return True, sorted(snapshot_list, key=lambda x: str(x["snapshot"]))
|
||||||
|
@ -1125,16 +1231,16 @@ def osd_worker_add_osd(
|
||||||
current_stage = 0
|
current_stage = 0
|
||||||
total_stages = 5
|
total_stages = 5
|
||||||
if split_count is None:
|
if split_count is None:
|
||||||
_split_count = 1
|
split_count = 1
|
||||||
else:
|
else:
|
||||||
_split_count = split_count
|
split_count = int(split_count)
|
||||||
total_stages = total_stages + 3 * int(_split_count)
|
total_stages = total_stages + 3 * int(split_count)
|
||||||
if ext_db_ratio is not None or ext_db_size is not None:
|
if ext_db_ratio is not None or ext_db_size is not None:
|
||||||
total_stages = total_stages + 3 * int(_split_count) + 1
|
total_stages = total_stages + 3 * int(split_count) + 1
|
||||||
|
|
||||||
start(
|
start(
|
||||||
celery,
|
celery,
|
||||||
f"Adding {_split_count} new OSD(s) on device {device} with weight {weight}",
|
f"Adding {split_count} new OSD(s) on device {device} with weight {weight}",
|
||||||
current=current_stage,
|
current=current_stage,
|
||||||
total=total_stages,
|
total=total_stages,
|
||||||
)
|
)
|
||||||
|
@ -1175,7 +1281,7 @@ def osd_worker_add_osd(
|
||||||
else:
|
else:
|
||||||
ext_db_flag = False
|
ext_db_flag = False
|
||||||
|
|
||||||
if split_count is not None:
|
if split_count > 1:
|
||||||
split_flag = f"--osds-per-device {split_count}"
|
split_flag = f"--osds-per-device {split_count}"
|
||||||
is_split = True
|
is_split = True
|
||||||
log_info(
|
log_info(
|
||||||
|
|
|
@ -262,6 +262,22 @@ def getClusterInformation(zkhandler):
|
||||||
# Get cluster maintenance state
|
# Get cluster maintenance state
|
||||||
maintenance_state = zkhandler.read("base.config.maintenance")
|
maintenance_state = zkhandler.read("base.config.maintenance")
|
||||||
|
|
||||||
|
# Prepare cluster total values
|
||||||
|
cluster_total_node_memory = 0
|
||||||
|
cluster_total_used_memory = 0
|
||||||
|
cluster_total_free_memory = 0
|
||||||
|
cluster_total_allocated_memory = 0
|
||||||
|
cluster_total_provisioned_memory = 0
|
||||||
|
cluster_total_average_memory_utilization = 0
|
||||||
|
cluster_total_cpu_cores = 0
|
||||||
|
cluster_total_cpu_load = 0
|
||||||
|
cluster_total_average_cpu_utilization = 0
|
||||||
|
cluster_total_allocated_cores = 0
|
||||||
|
cluster_total_osd_space = 0
|
||||||
|
cluster_total_used_space = 0
|
||||||
|
cluster_total_free_space = 0
|
||||||
|
cluster_total_average_osd_utilization = 0
|
||||||
|
|
||||||
# Get primary node
|
# Get primary node
|
||||||
maintenance_state, primary_node = zkhandler.read_many(
|
maintenance_state, primary_node = zkhandler.read_many(
|
||||||
[
|
[
|
||||||
|
@ -276,19 +292,36 @@ def getClusterInformation(zkhandler):
|
||||||
# Get the list of Nodes
|
# Get the list of Nodes
|
||||||
node_list = zkhandler.children("base.node")
|
node_list = zkhandler.children("base.node")
|
||||||
node_count = len(node_list)
|
node_count = len(node_list)
|
||||||
# Get the daemon and domain states of all Nodes
|
# Get the information of all Nodes
|
||||||
node_state_reads = list()
|
node_state_reads = list()
|
||||||
|
node_memory_reads = list()
|
||||||
|
node_cpu_reads = list()
|
||||||
for node in node_list:
|
for node in node_list:
|
||||||
node_state_reads += [
|
node_state_reads += [
|
||||||
("node.state.daemon", node),
|
("node.state.daemon", node),
|
||||||
("node.state.domain", node),
|
("node.state.domain", node),
|
||||||
]
|
]
|
||||||
|
node_memory_reads += [
|
||||||
|
("node.memory.total", node),
|
||||||
|
("node.memory.used", node),
|
||||||
|
("node.memory.free", node),
|
||||||
|
("node.memory.allocated", node),
|
||||||
|
("node.memory.provisioned", node),
|
||||||
|
]
|
||||||
|
node_cpu_reads += [
|
||||||
|
("node.data.static", node),
|
||||||
|
("node.vcpu.allocated", node),
|
||||||
|
("node.cpu.load", node),
|
||||||
|
]
|
||||||
all_node_states = zkhandler.read_many(node_state_reads)
|
all_node_states = zkhandler.read_many(node_state_reads)
|
||||||
|
all_node_memory = zkhandler.read_many(node_memory_reads)
|
||||||
|
all_node_cpu = zkhandler.read_many(node_cpu_reads)
|
||||||
|
|
||||||
# Parse out the Node states
|
# Parse out the Node states
|
||||||
node_data = list()
|
node_data = list()
|
||||||
formatted_node_states = {"total": node_count}
|
formatted_node_states = {"total": node_count}
|
||||||
for nidx, node in enumerate(node_list):
|
for nidx, node in enumerate(node_list):
|
||||||
# Split the large list of return values by the IDX of this node
|
# Split the large list of return values by the IDX of this node (states)
|
||||||
# Each node result is 2 fields long
|
# Each node result is 2 fields long
|
||||||
pos_start = nidx * 2
|
pos_start = nidx * 2
|
||||||
pos_end = nidx * 2 + 2
|
pos_end = nidx * 2 + 2
|
||||||
|
@ -308,6 +341,46 @@ def getClusterInformation(zkhandler):
|
||||||
else:
|
else:
|
||||||
formatted_node_states[node_state] = 1
|
formatted_node_states[node_state] = 1
|
||||||
|
|
||||||
|
# Split the large list of return values by the IDX of this node (memory)
|
||||||
|
# Each node result is 5 fields long
|
||||||
|
pos_start = nidx * 5
|
||||||
|
pos_end = nidx * 5 + 5
|
||||||
|
(
|
||||||
|
node_memory_total,
|
||||||
|
node_memory_used,
|
||||||
|
node_memory_free,
|
||||||
|
node_memory_allocated,
|
||||||
|
node_memory_provisioned,
|
||||||
|
) = tuple(all_node_memory[pos_start:pos_end])
|
||||||
|
cluster_total_node_memory += int(node_memory_total)
|
||||||
|
cluster_total_used_memory += int(node_memory_used)
|
||||||
|
cluster_total_free_memory += int(node_memory_free)
|
||||||
|
cluster_total_allocated_memory += int(node_memory_allocated)
|
||||||
|
cluster_total_provisioned_memory += int(node_memory_provisioned)
|
||||||
|
|
||||||
|
# Split the large list of return values by the IDX of this node (cpu)
|
||||||
|
# Each nod result is 3 fields long
|
||||||
|
pos_start = nidx * 3
|
||||||
|
pos_end = nidx * 3 + 3
|
||||||
|
node_static_data, node_vcpu_allocated, node_cpu_load = tuple(
|
||||||
|
all_node_cpu[pos_start:pos_end]
|
||||||
|
)
|
||||||
|
cluster_total_cpu_cores += int(node_static_data.split()[0])
|
||||||
|
cluster_total_cpu_load += round(float(node_cpu_load), 2)
|
||||||
|
cluster_total_allocated_cores += int(node_vcpu_allocated)
|
||||||
|
|
||||||
|
cluster_total_average_memory_utilization = (
|
||||||
|
(round((cluster_total_used_memory / cluster_total_node_memory) * 100, 2))
|
||||||
|
if cluster_total_node_memory > 0
|
||||||
|
else 0.00
|
||||||
|
)
|
||||||
|
|
||||||
|
cluster_total_average_cpu_utilization = (
|
||||||
|
(round((cluster_total_cpu_load / cluster_total_cpu_cores) * 100, 2))
|
||||||
|
if cluster_total_cpu_cores > 0
|
||||||
|
else 0.00
|
||||||
|
)
|
||||||
|
|
||||||
# Get the list of VMs
|
# Get the list of VMs
|
||||||
vm_list = zkhandler.children("base.domain")
|
vm_list = zkhandler.children("base.domain")
|
||||||
vm_count = len(vm_list)
|
vm_count = len(vm_list)
|
||||||
|
@ -380,6 +453,18 @@ def getClusterInformation(zkhandler):
|
||||||
else:
|
else:
|
||||||
formatted_osd_states[osd_state] = 1
|
formatted_osd_states[osd_state] = 1
|
||||||
|
|
||||||
|
# Add the OSD utilization
|
||||||
|
cluster_total_osd_space += int(osd_stats["kb"])
|
||||||
|
cluster_total_used_space += int(osd_stats["kb_used"])
|
||||||
|
cluster_total_free_space += int(osd_stats["kb_avail"])
|
||||||
|
cluster_total_average_osd_utilization += float(osd_stats["utilization"])
|
||||||
|
|
||||||
|
cluster_total_average_osd_utilization = (
|
||||||
|
(round(cluster_total_average_osd_utilization / len(ceph_osd_list), 2))
|
||||||
|
if ceph_osd_list
|
||||||
|
else 0.00
|
||||||
|
)
|
||||||
|
|
||||||
# Get the list of Networks
|
# Get the list of Networks
|
||||||
network_list = zkhandler.children("base.network")
|
network_list = zkhandler.children("base.network")
|
||||||
network_count = len(network_list)
|
network_count = len(network_list)
|
||||||
|
@ -424,6 +509,28 @@ def getClusterInformation(zkhandler):
|
||||||
"pools": ceph_pool_count,
|
"pools": ceph_pool_count,
|
||||||
"volumes": ceph_volume_count,
|
"volumes": ceph_volume_count,
|
||||||
"snapshots": ceph_snapshot_count,
|
"snapshots": ceph_snapshot_count,
|
||||||
|
"resources": {
|
||||||
|
"memory": {
|
||||||
|
"total": cluster_total_node_memory,
|
||||||
|
"free": cluster_total_free_memory,
|
||||||
|
"used": cluster_total_used_memory,
|
||||||
|
"allocated": cluster_total_allocated_memory,
|
||||||
|
"provisioned": cluster_total_provisioned_memory,
|
||||||
|
"utilization": cluster_total_average_memory_utilization,
|
||||||
|
},
|
||||||
|
"cpu": {
|
||||||
|
"total": cluster_total_cpu_cores,
|
||||||
|
"load": cluster_total_cpu_load,
|
||||||
|
"allocated": cluster_total_allocated_cores,
|
||||||
|
"utilization": cluster_total_average_cpu_utilization,
|
||||||
|
},
|
||||||
|
"disk": {
|
||||||
|
"total": cluster_total_osd_space,
|
||||||
|
"used": cluster_total_used_space,
|
||||||
|
"free": cluster_total_free_space,
|
||||||
|
"utilization": cluster_total_average_osd_utilization,
|
||||||
|
},
|
||||||
|
},
|
||||||
"detail": {
|
"detail": {
|
||||||
"node": node_data,
|
"node": node_data,
|
||||||
"vm": vm_data,
|
"vm": vm_data,
|
||||||
|
@ -1051,6 +1158,9 @@ def get_resource_metrics(zkhandler):
|
||||||
"restart": 6,
|
"restart": 6,
|
||||||
"stop": 7,
|
"stop": 7,
|
||||||
"fail": 8,
|
"fail": 8,
|
||||||
|
"import": 9,
|
||||||
|
"restore": 10,
|
||||||
|
"mirror": 99,
|
||||||
}
|
}
|
||||||
state = vm["state"]
|
state = vm["state"]
|
||||||
output_lines.append(
|
output_lines.append(
|
||||||
|
@ -1201,7 +1311,7 @@ def get_resource_metrics(zkhandler):
|
||||||
try:
|
try:
|
||||||
user_time = vm["vcpu_stats"]["user_time"] / 1000000
|
user_time = vm["vcpu_stats"]["user_time"] / 1000000
|
||||||
except Exception:
|
except Exception:
|
||||||
cpu_time = 0
|
user_time = 0
|
||||||
output_lines.append(
|
output_lines.append(
|
||||||
f"pvc_vm_vcpus_user_time{{vm=\"{vm['name']}\"}} {user_time}"
|
f"pvc_vm_vcpus_user_time{{vm=\"{vm['name']}\"}} {user_time}"
|
||||||
)
|
)
|
||||||
|
@ -1230,7 +1340,7 @@ def get_resource_metrics(zkhandler):
|
||||||
)
|
)
|
||||||
output_lines.append("# TYPE pvc_vm_memory_stats_actual gauge")
|
output_lines.append("# TYPE pvc_vm_memory_stats_actual gauge")
|
||||||
for vm in vm_data:
|
for vm in vm_data:
|
||||||
actual_memory = vm["memory_stats"]["actual"]
|
actual_memory = vm["memory_stats"].get("actual", 0)
|
||||||
output_lines.append(
|
output_lines.append(
|
||||||
f"pvc_vm_memory_stats_actual{{vm=\"{vm['name']}\"}} {actual_memory}"
|
f"pvc_vm_memory_stats_actual{{vm=\"{vm['name']}\"}} {actual_memory}"
|
||||||
)
|
)
|
||||||
|
@ -1238,7 +1348,7 @@ def get_resource_metrics(zkhandler):
|
||||||
output_lines.append("# HELP pvc_vm_memory_stats_rss PVC VM RSS memory KB")
|
output_lines.append("# HELP pvc_vm_memory_stats_rss PVC VM RSS memory KB")
|
||||||
output_lines.append("# TYPE pvc_vm_memory_stats_rss gauge")
|
output_lines.append("# TYPE pvc_vm_memory_stats_rss gauge")
|
||||||
for vm in vm_data:
|
for vm in vm_data:
|
||||||
rss_memory = vm["memory_stats"]["rss"]
|
rss_memory = vm["memory_stats"].get("rss", 0)
|
||||||
output_lines.append(
|
output_lines.append(
|
||||||
f"pvc_vm_memory_stats_rss{{vm=\"{vm['name']}\"}} {rss_memory}"
|
f"pvc_vm_memory_stats_rss{{vm=\"{vm['name']}\"}} {rss_memory}"
|
||||||
)
|
)
|
||||||
|
|
|
@ -26,8 +26,10 @@ import subprocess
|
||||||
import signal
|
import signal
|
||||||
from json import loads
|
from json import loads
|
||||||
from re import match as re_match
|
from re import match as re_match
|
||||||
|
from re import search as re_search
|
||||||
from re import split as re_split
|
from re import split as re_split
|
||||||
from re import sub as re_sub
|
from re import sub as re_sub
|
||||||
|
from difflib import unified_diff
|
||||||
from distutils.util import strtobool
|
from distutils.util import strtobool
|
||||||
from threading import Thread
|
from threading import Thread
|
||||||
from shlex import split as shlex_split
|
from shlex import split as shlex_split
|
||||||
|
@ -81,6 +83,9 @@ vm_state_combinations = [
|
||||||
"migrate",
|
"migrate",
|
||||||
"unmigrate",
|
"unmigrate",
|
||||||
"provision",
|
"provision",
|
||||||
|
"import",
|
||||||
|
"restore",
|
||||||
|
"mirror",
|
||||||
]
|
]
|
||||||
ceph_osd_state_combinations = [
|
ceph_osd_state_combinations = [
|
||||||
"up,in",
|
"up,in",
|
||||||
|
@ -427,6 +432,96 @@ def getDomainTags(zkhandler, dom_uuid):
|
||||||
return tags
|
return tags
|
||||||
|
|
||||||
|
|
||||||
|
#
|
||||||
|
# Get a list of domain snapshots
|
||||||
|
#
|
||||||
|
def getDomainSnapshots(zkhandler, dom_uuid):
|
||||||
|
"""
|
||||||
|
Get a list of snapshots for domain dom_uuid
|
||||||
|
|
||||||
|
The UUID must be validated before calling this function!
|
||||||
|
"""
|
||||||
|
snapshots = list()
|
||||||
|
|
||||||
|
all_snapshots = zkhandler.children(("domain.snapshots", dom_uuid))
|
||||||
|
|
||||||
|
current_timestamp = time.time()
|
||||||
|
current_dom_xml = zkhandler.read(("domain.xml", dom_uuid))
|
||||||
|
|
||||||
|
snapshots = list()
|
||||||
|
for snapshot in all_snapshots:
|
||||||
|
(
|
||||||
|
snap_name,
|
||||||
|
snap_timestamp,
|
||||||
|
_snap_rbd_snapshots,
|
||||||
|
snap_dom_xml,
|
||||||
|
) = zkhandler.read_many(
|
||||||
|
[
|
||||||
|
("domain.snapshots", dom_uuid, "domain_snapshot.name", snapshot),
|
||||||
|
("domain.snapshots", dom_uuid, "domain_snapshot.timestamp", snapshot),
|
||||||
|
(
|
||||||
|
"domain.snapshots",
|
||||||
|
dom_uuid,
|
||||||
|
"domain_snapshot.rbd_snapshots",
|
||||||
|
snapshot,
|
||||||
|
),
|
||||||
|
("domain.snapshots", dom_uuid, "domain_snapshot.xml", snapshot),
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
|
snap_rbd_snapshots = _snap_rbd_snapshots.split(",")
|
||||||
|
|
||||||
|
snap_dom_xml_diff = list(
|
||||||
|
unified_diff(
|
||||||
|
current_dom_xml.split("\n"),
|
||||||
|
snap_dom_xml.split("\n"),
|
||||||
|
fromfile="current",
|
||||||
|
tofile="snapshot",
|
||||||
|
fromfiledate="",
|
||||||
|
tofiledate="",
|
||||||
|
n=1,
|
||||||
|
lineterm="",
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
_snap_timestamp = float(snap_timestamp)
|
||||||
|
snap_age_secs = int(current_timestamp) - int(_snap_timestamp)
|
||||||
|
snap_age = f"{snap_age_secs} seconds"
|
||||||
|
snap_age_minutes = int(snap_age_secs / 60)
|
||||||
|
if snap_age_minutes > 0:
|
||||||
|
if snap_age_minutes > 1:
|
||||||
|
s = "s"
|
||||||
|
else:
|
||||||
|
s = ""
|
||||||
|
snap_age = f"{snap_age_minutes} minute{s}"
|
||||||
|
snap_age_hours = int(snap_age_secs / 3600)
|
||||||
|
if snap_age_hours > 0:
|
||||||
|
if snap_age_hours > 1:
|
||||||
|
s = "s"
|
||||||
|
else:
|
||||||
|
s = ""
|
||||||
|
snap_age = f"{snap_age_hours} hour{s}"
|
||||||
|
snap_age_days = int(snap_age_secs / 86400)
|
||||||
|
if snap_age_days > 0:
|
||||||
|
if snap_age_days > 1:
|
||||||
|
s = "s"
|
||||||
|
else:
|
||||||
|
s = ""
|
||||||
|
snap_age = f"{snap_age_days} day{s}"
|
||||||
|
|
||||||
|
snapshots.append(
|
||||||
|
{
|
||||||
|
"name": snap_name,
|
||||||
|
"timestamp": snap_timestamp,
|
||||||
|
"age": snap_age,
|
||||||
|
"xml_diff_lines": snap_dom_xml_diff,
|
||||||
|
"rbd_snapshots": snap_rbd_snapshots,
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
|
return sorted(snapshots, key=lambda s: s["timestamp"], reverse=True)
|
||||||
|
|
||||||
|
|
||||||
#
|
#
|
||||||
# Get a set of domain metadata
|
# Get a set of domain metadata
|
||||||
#
|
#
|
||||||
|
@ -441,12 +536,14 @@ def getDomainMetadata(zkhandler, dom_uuid):
|
||||||
domain_node_selector,
|
domain_node_selector,
|
||||||
domain_node_autostart,
|
domain_node_autostart,
|
||||||
domain_migration_method,
|
domain_migration_method,
|
||||||
|
domain_migration_max_downtime,
|
||||||
) = zkhandler.read_many(
|
) = zkhandler.read_many(
|
||||||
[
|
[
|
||||||
("domain.meta.node_limit", dom_uuid),
|
("domain.meta.node_limit", dom_uuid),
|
||||||
("domain.meta.node_selector", dom_uuid),
|
("domain.meta.node_selector", dom_uuid),
|
||||||
("domain.meta.autostart", dom_uuid),
|
("domain.meta.autostart", dom_uuid),
|
||||||
("domain.meta.migrate_method", dom_uuid),
|
("domain.meta.migrate_method", dom_uuid),
|
||||||
|
("domain.meta.migrate_max_downtime", dom_uuid),
|
||||||
]
|
]
|
||||||
)
|
)
|
||||||
|
|
||||||
|
@ -464,11 +561,15 @@ def getDomainMetadata(zkhandler, dom_uuid):
|
||||||
if not domain_migration_method or domain_migration_method == "none":
|
if not domain_migration_method or domain_migration_method == "none":
|
||||||
domain_migration_method = None
|
domain_migration_method = None
|
||||||
|
|
||||||
|
if not domain_migration_max_downtime or domain_migration_max_downtime == "none":
|
||||||
|
domain_migration_max_downtime = 300
|
||||||
|
|
||||||
return (
|
return (
|
||||||
domain_node_limit,
|
domain_node_limit,
|
||||||
domain_node_selector,
|
domain_node_selector,
|
||||||
domain_node_autostart,
|
domain_node_autostart,
|
||||||
domain_migration_method,
|
domain_migration_method,
|
||||||
|
domain_migration_max_downtime,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@ -505,9 +606,11 @@ def getInformationFromXML(zkhandler, uuid):
|
||||||
domain_node_selector,
|
domain_node_selector,
|
||||||
domain_node_autostart,
|
domain_node_autostart,
|
||||||
domain_migration_method,
|
domain_migration_method,
|
||||||
|
domain_migration_max_downtime,
|
||||||
) = getDomainMetadata(zkhandler, uuid)
|
) = getDomainMetadata(zkhandler, uuid)
|
||||||
|
|
||||||
domain_tags = getDomainTags(zkhandler, uuid)
|
domain_tags = getDomainTags(zkhandler, uuid)
|
||||||
|
domain_snapshots = getDomainSnapshots(zkhandler, uuid)
|
||||||
|
|
||||||
if domain_vnc:
|
if domain_vnc:
|
||||||
domain_vnc_listen, domain_vnc_port = domain_vnc.split(":")
|
domain_vnc_listen, domain_vnc_port = domain_vnc.split(":")
|
||||||
|
@ -565,7 +668,9 @@ def getInformationFromXML(zkhandler, uuid):
|
||||||
"node_selector": domain_node_selector,
|
"node_selector": domain_node_selector,
|
||||||
"node_autostart": bool(strtobool(domain_node_autostart)),
|
"node_autostart": bool(strtobool(domain_node_autostart)),
|
||||||
"migration_method": domain_migration_method,
|
"migration_method": domain_migration_method,
|
||||||
|
"migration_max_downtime": int(domain_migration_max_downtime),
|
||||||
"tags": domain_tags,
|
"tags": domain_tags,
|
||||||
|
"snapshots": domain_snapshots,
|
||||||
"description": domain_description,
|
"description": domain_description,
|
||||||
"profile": domain_profile,
|
"profile": domain_profile,
|
||||||
"memory": int(domain_memory),
|
"memory": int(domain_memory),
|
||||||
|
@ -970,7 +1075,7 @@ def sortInterfaceNames(interface_names):
|
||||||
#
|
#
|
||||||
# Parse a "detect" device into a real block device name
|
# Parse a "detect" device into a real block device name
|
||||||
#
|
#
|
||||||
def get_detect_device(detect_string):
|
def get_detect_device_lsscsi(detect_string):
|
||||||
"""
|
"""
|
||||||
Parses a "detect:" string into a normalized block device path using lsscsi.
|
Parses a "detect:" string into a normalized block device path using lsscsi.
|
||||||
|
|
||||||
|
@ -1037,3 +1142,96 @@ def get_detect_device(detect_string):
|
||||||
break
|
break
|
||||||
|
|
||||||
return blockdev
|
return blockdev
|
||||||
|
|
||||||
|
|
||||||
|
def get_detect_device_nvme(detect_string):
|
||||||
|
"""
|
||||||
|
Parses a "detect:" string into a normalized block device path using nvme.
|
||||||
|
|
||||||
|
A detect string is formatted "detect:<NAME>:<SIZE>:<ID>", where
|
||||||
|
NAME is some unique identifier in lsscsi, SIZE is a human-readable
|
||||||
|
size value to within +/- 3% of the real size of the device, and
|
||||||
|
ID is the Nth (0-indexed) matching entry of that NAME and SIZE.
|
||||||
|
"""
|
||||||
|
|
||||||
|
unit_map = {
|
||||||
|
"kB": 1000,
|
||||||
|
"MB": 1000 * 1000,
|
||||||
|
"GB": 1000 * 1000 * 1000,
|
||||||
|
"TB": 1000 * 1000 * 1000 * 1000,
|
||||||
|
"PB": 1000 * 1000 * 1000 * 1000 * 1000,
|
||||||
|
"EB": 1000 * 1000 * 1000 * 1000 * 1000 * 1000,
|
||||||
|
}
|
||||||
|
|
||||||
|
_, name, _size, idd = detect_string.split(":")
|
||||||
|
if _ != "detect":
|
||||||
|
return None
|
||||||
|
|
||||||
|
size_re = re_search(r"([\d.]+)([kKMGTP]B)", _size)
|
||||||
|
size_val = float(size_re.group(1))
|
||||||
|
size_unit = size_re.group(2)
|
||||||
|
size_bytes = int(size_val * unit_map[size_unit])
|
||||||
|
|
||||||
|
retcode, stdout, stderr = run_os_command("nvme list --output-format json")
|
||||||
|
if retcode:
|
||||||
|
print(f"Failed to run nvme: {stderr}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
# Parse the output with json
|
||||||
|
nvme_data = loads(stdout).get("Devices", list())
|
||||||
|
|
||||||
|
# Handle size determination (+/- 3%)
|
||||||
|
size = None
|
||||||
|
nvme_sizes = set()
|
||||||
|
for entry in nvme_data:
|
||||||
|
nvme_sizes.add(entry["PhysicalSize"])
|
||||||
|
for l_size in nvme_sizes:
|
||||||
|
plusthreepct = size_bytes * 1.03
|
||||||
|
minusthreepct = size_bytes * 0.97
|
||||||
|
|
||||||
|
if l_size > minusthreepct and l_size < plusthreepct:
|
||||||
|
size = l_size
|
||||||
|
break
|
||||||
|
if size is None:
|
||||||
|
return None
|
||||||
|
|
||||||
|
blockdev = None
|
||||||
|
matches = list()
|
||||||
|
for entry in nvme_data:
|
||||||
|
# Skip if name is not contained in the line (case-insensitive)
|
||||||
|
if name.lower() not in entry["ModelNumber"].lower():
|
||||||
|
continue
|
||||||
|
# Skip if the size does not match
|
||||||
|
if size != entry["PhysicalSize"]:
|
||||||
|
continue
|
||||||
|
# Get our blockdev and append to the list
|
||||||
|
matches.append(entry["DevicePath"])
|
||||||
|
|
||||||
|
blockdev = None
|
||||||
|
# Find the blockdev at index {idd}
|
||||||
|
for idx, _blockdev in enumerate(matches):
|
||||||
|
if int(idx) == int(idd):
|
||||||
|
blockdev = _blockdev
|
||||||
|
break
|
||||||
|
|
||||||
|
return blockdev
|
||||||
|
|
||||||
|
|
||||||
|
def get_detect_device(detect_string):
|
||||||
|
"""
|
||||||
|
Parses a "detect:" string into a normalized block device path.
|
||||||
|
|
||||||
|
First tries to parse using "lsscsi" (get_detect_device_lsscsi). If this returns an invalid
|
||||||
|
block device name, then try to parse using "nvme" (get_detect_device_nvme). This works around
|
||||||
|
issues with more recent devices (e.g. the Dell R6615 series) not properly reporting block
|
||||||
|
device paths for NVMe devices with "lsscsi".
|
||||||
|
"""
|
||||||
|
|
||||||
|
device = get_detect_device_lsscsi(detect_string)
|
||||||
|
if device is None or not re_match(r"^/dev", device):
|
||||||
|
device = get_detect_device_nvme(detect_string)
|
||||||
|
|
||||||
|
if device is not None and re_match(r"^/dev", device):
|
||||||
|
return device
|
||||||
|
else:
|
||||||
|
return None
|
||||||
|
|
|
@ -244,9 +244,9 @@ def get_parsed_configuration(config_file):
|
||||||
]
|
]
|
||||||
][0]
|
][0]
|
||||||
|
|
||||||
config_cluster_networks_specific[
|
config_cluster_networks_specific[f"{network_type}_dev_ip"] = (
|
||||||
f"{network_type}_dev_ip"
|
f"{list(network.hosts())[address_id]}/{network.prefixlen}"
|
||||||
] = f"{list(network.hosts())[address_id]}/{network.prefixlen}"
|
)
|
||||||
|
|
||||||
config = {**config, **config_cluster_networks_specific}
|
config = {**config, **config_cluster_networks_specific}
|
||||||
|
|
||||||
|
@ -375,8 +375,11 @@ def get_parsed_configuration(config_file):
|
||||||
config = {**config, **config_api_ssl}
|
config = {**config, **config_api_ssl}
|
||||||
|
|
||||||
# Use coordinators as storage hosts if not explicitly specified
|
# Use coordinators as storage hosts if not explicitly specified
|
||||||
|
# These are added as FQDNs in the storage domain
|
||||||
if not config["storage_hosts"] or len(config["storage_hosts"]) < 1:
|
if not config["storage_hosts"] or len(config["storage_hosts"]) < 1:
|
||||||
config["storage_hosts"] = config["coordinators"]
|
config["storage_hosts"] = []
|
||||||
|
for host in config["coordinators"]:
|
||||||
|
config["storage_hosts"].append(f"{host}.{config['storage_domain']}")
|
||||||
|
|
||||||
# Set up our token list if specified
|
# Set up our token list if specified
|
||||||
if config["api_auth_source"] == "token":
|
if config["api_auth_source"] == "token":
|
||||||
|
@ -406,6 +409,78 @@ def get_configuration():
|
||||||
return config
|
return config
|
||||||
|
|
||||||
|
|
||||||
|
def get_parsed_autobackup_configuration(config_file):
|
||||||
|
"""
|
||||||
|
Load the configuration; this is the same main pvc.conf that the daemons read
|
||||||
|
"""
|
||||||
|
print('Loading configuration from file "{}"'.format(config_file))
|
||||||
|
|
||||||
|
with open(config_file, "r") as cfgfh:
|
||||||
|
try:
|
||||||
|
o_config = yaml.load(cfgfh, Loader=yaml.SafeLoader)
|
||||||
|
except Exception as e:
|
||||||
|
print(f"ERROR: Failed to parse configuration file: {e}")
|
||||||
|
os._exit(1)
|
||||||
|
|
||||||
|
config = dict()
|
||||||
|
|
||||||
|
try:
|
||||||
|
o_cluster = o_config["cluster"]
|
||||||
|
config_cluster = {
|
||||||
|
"cluster": o_cluster["name"],
|
||||||
|
"autobackup_enabled": True,
|
||||||
|
}
|
||||||
|
config = {**config, **config_cluster}
|
||||||
|
|
||||||
|
o_autobackup = o_config["autobackup"]
|
||||||
|
if o_autobackup is None:
|
||||||
|
config["autobackup_enabled"] = False
|
||||||
|
return config
|
||||||
|
|
||||||
|
config_autobackup = {
|
||||||
|
"backup_root_path": o_autobackup["backup_root_path"],
|
||||||
|
"backup_root_suffix": o_autobackup["backup_root_suffix"],
|
||||||
|
"backup_tags": o_autobackup["backup_tags"],
|
||||||
|
"backup_schedule": o_autobackup["backup_schedule"],
|
||||||
|
}
|
||||||
|
config = {**config, **config_autobackup}
|
||||||
|
|
||||||
|
o_automount = o_autobackup["auto_mount"]
|
||||||
|
config_automount = {
|
||||||
|
"auto_mount_enabled": o_automount["enabled"],
|
||||||
|
}
|
||||||
|
config = {**config, **config_automount}
|
||||||
|
if config["auto_mount_enabled"]:
|
||||||
|
config["mount_cmds"] = list()
|
||||||
|
for _mount_cmd in o_automount["mount_cmds"]:
|
||||||
|
if "{backup_root_path}" in _mount_cmd:
|
||||||
|
_mount_cmd = _mount_cmd.format(
|
||||||
|
backup_root_path=config["backup_root_path"]
|
||||||
|
)
|
||||||
|
config["mount_cmds"].append(_mount_cmd)
|
||||||
|
config["unmount_cmds"] = list()
|
||||||
|
for _unmount_cmd in o_automount["unmount_cmds"]:
|
||||||
|
if "{backup_root_path}" in _unmount_cmd:
|
||||||
|
_unmount_cmd = _unmount_cmd.format(
|
||||||
|
backup_root_path=config["backup_root_path"]
|
||||||
|
)
|
||||||
|
config["unmount_cmds"].append(_unmount_cmd)
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
raise MalformedConfigurationError(e)
|
||||||
|
|
||||||
|
return config
|
||||||
|
|
||||||
|
|
||||||
|
def get_autobackup_configuration():
|
||||||
|
"""
|
||||||
|
Get the configuration.
|
||||||
|
"""
|
||||||
|
pvc_config_file = get_configuration_path()
|
||||||
|
config = get_parsed_autobackup_configuration(pvc_config_file)
|
||||||
|
return config
|
||||||
|
|
||||||
|
|
||||||
def validate_directories(config):
|
def validate_directories(config):
|
||||||
if not os.path.exists(config["dynamic_directory"]):
|
if not os.path.exists(config["dynamic_directory"]):
|
||||||
os.makedirs(config["dynamic_directory"])
|
os.makedirs(config["dynamic_directory"])
|
||||||
|
|
|
@ -0,0 +1 @@
|
||||||
|
{"version": "13", "root": "", "base": {"root": "", "schema": "/schema", "schema.version": "/schema/version", "config": "/config", "config.maintenance": "/config/maintenance", "config.primary_node": "/config/primary_node", "config.primary_node.sync_lock": "/config/primary_node/sync_lock", "config.upstream_ip": "/config/upstream_ip", "config.migration_target_selector": "/config/migration_target_selector", "logs": "/logs", "faults": "/faults", "node": "/nodes", "domain": "/domains", "network": "/networks", "storage": "/ceph", "storage.health": "/ceph/health", "storage.util": "/ceph/util", "osd": "/ceph/osds", "pool": "/ceph/pools", "volume": "/ceph/volumes", "snapshot": "/ceph/snapshots"}, "logs": {"node": "", "messages": "/messages"}, "faults": {"id": "", "last_time": "/last_time", "first_time": "/first_time", "ack_time": "/ack_time", "status": "/status", "delta": "/delta", "message": "/message"}, "node": {"name": "", "keepalive": "/keepalive", "mode": "/daemonmode", "data.active_schema": "/activeschema", "data.latest_schema": "/latestschema", "data.static": "/staticdata", "data.pvc_version": "/pvcversion", "running_domains": "/runningdomains", "count.provisioned_domains": "/domainscount", "count.networks": "/networkscount", "state.daemon": "/daemonstate", "state.router": "/routerstate", "state.domain": "/domainstate", "cpu.load": "/cpuload", "vcpu.allocated": "/vcpualloc", "memory.total": "/memtotal", "memory.used": "/memused", "memory.free": "/memfree", "memory.allocated": "/memalloc", "memory.provisioned": "/memprov", "ipmi.hostname": "/ipmihostname", "ipmi.username": "/ipmiusername", "ipmi.password": "/ipmipassword", "sriov": "/sriov", "sriov.pf": "/sriov/pf", "sriov.vf": "/sriov/vf", "monitoring.plugins": "/monitoring_plugins", "monitoring.data": "/monitoring_data", "monitoring.health": "/monitoring_health", "network.stats": "/network_stats"}, "monitoring_plugin": {"name": "", "last_run": "/last_run", "health_delta": "/health_delta", "message": "/message", 
"data": "/data", "runtime": "/runtime"}, "sriov_pf": {"phy": "", "mtu": "/mtu", "vfcount": "/vfcount"}, "sriov_vf": {"phy": "", "pf": "/pf", "mtu": "/mtu", "mac": "/mac", "phy_mac": "/phy_mac", "config": "/config", "config.vlan_id": "/config/vlan_id", "config.vlan_qos": "/config/vlan_qos", "config.tx_rate_min": "/config/tx_rate_min", "config.tx_rate_max": "/config/tx_rate_max", "config.spoof_check": "/config/spoof_check", "config.link_state": "/config/link_state", "config.trust": "/config/trust", "config.query_rss": "/config/query_rss", "pci": "/pci", "pci.domain": "/pci/domain", "pci.bus": "/pci/bus", "pci.slot": "/pci/slot", "pci.function": "/pci/function", "used": "/used", "used_by": "/used_by"}, "domain": {"name": "", "xml": "/xml", "state": "/state", "profile": "/profile", "stats": "/stats", "node": "/node", "last_node": "/lastnode", "failed_reason": "/failedreason", "storage.volumes": "/rbdlist", "console.log": "/consolelog", "console.vnc": "/vnc", "meta.autostart": "/node_autostart", "meta.migrate_method": "/migration_method", "meta.migrate_max_downtime": "/migration_max_downtime", "meta.node_selector": "/node_selector", "meta.node_limit": "/node_limit", "meta.tags": "/tags", "migrate.sync_lock": "/migrate_sync_lock"}, "tag": {"name": "", "type": "/type", "protected": "/protected"}, "network": {"vni": "", "type": "/nettype", "mtu": "/mtu", "rule": "/firewall_rules", "rule.in": "/firewall_rules/in", "rule.out": "/firewall_rules/out", "nameservers": "/name_servers", "domain": "/domain", "reservation": "/dhcp4_reservations", "lease": "/dhcp4_leases", "ip4.gateway": "/ip4_gateway", "ip4.network": "/ip4_network", "ip4.dhcp": "/dhcp4_flag", "ip4.dhcp_start": "/dhcp4_start", "ip4.dhcp_end": "/dhcp4_end", "ip6.gateway": "/ip6_gateway", "ip6.network": "/ip6_network", "ip6.dhcp": "/dhcp6_flag"}, "reservation": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname"}, "lease": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname", "expiry": "/expiry", "client_id": 
"/clientid"}, "rule": {"description": "", "rule": "/rule", "order": "/order"}, "osd": {"id": "", "node": "/node", "device": "/device", "db_device": "/db_device", "fsid": "/fsid", "ofsid": "/fsid/osd", "cfsid": "/fsid/cluster", "lvm": "/lvm", "vg": "/lvm/vg", "lv": "/lvm/lv", "is_split": "/is_split", "stats": "/stats"}, "pool": {"name": "", "pgs": "/pgs", "tier": "/tier", "stats": "/stats"}, "volume": {"name": "", "stats": "/stats"}, "snapshot": {"name": "", "stats": "/stats"}}
|
|
@ -0,0 +1 @@
|
||||||
|
{"version": "14", "root": "", "base": {"root": "", "schema": "/schema", "schema.version": "/schema/version", "config": "/config", "config.maintenance": "/config/maintenance", "config.primary_node": "/config/primary_node", "config.primary_node.sync_lock": "/config/primary_node/sync_lock", "config.upstream_ip": "/config/upstream_ip", "config.migration_target_selector": "/config/migration_target_selector", "logs": "/logs", "faults": "/faults", "node": "/nodes", "domain": "/domains", "network": "/networks", "storage": "/ceph", "storage.health": "/ceph/health", "storage.util": "/ceph/util", "osd": "/ceph/osds", "pool": "/ceph/pools", "volume": "/ceph/volumes", "snapshot": "/ceph/snapshots"}, "logs": {"node": "", "messages": "/messages"}, "faults": {"id": "", "last_time": "/last_time", "first_time": "/first_time", "ack_time": "/ack_time", "status": "/status", "delta": "/delta", "message": "/message"}, "node": {"name": "", "keepalive": "/keepalive", "mode": "/daemonmode", "data.active_schema": "/activeschema", "data.latest_schema": "/latestschema", "data.static": "/staticdata", "data.pvc_version": "/pvcversion", "running_domains": "/runningdomains", "count.provisioned_domains": "/domainscount", "count.networks": "/networkscount", "state.daemon": "/daemonstate", "state.router": "/routerstate", "state.domain": "/domainstate", "cpu.load": "/cpuload", "vcpu.allocated": "/vcpualloc", "memory.total": "/memtotal", "memory.used": "/memused", "memory.free": "/memfree", "memory.allocated": "/memalloc", "memory.provisioned": "/memprov", "ipmi.hostname": "/ipmihostname", "ipmi.username": "/ipmiusername", "ipmi.password": "/ipmipassword", "sriov": "/sriov", "sriov.pf": "/sriov/pf", "sriov.vf": "/sriov/vf", "monitoring.plugins": "/monitoring_plugins", "monitoring.data": "/monitoring_data", "monitoring.health": "/monitoring_health", "network.stats": "/network_stats"}, "monitoring_plugin": {"name": "", "last_run": "/last_run", "health_delta": "/health_delta", "message": "/message", 
"data": "/data", "runtime": "/runtime"}, "sriov_pf": {"phy": "", "mtu": "/mtu", "vfcount": "/vfcount"}, "sriov_vf": {"phy": "", "pf": "/pf", "mtu": "/mtu", "mac": "/mac", "phy_mac": "/phy_mac", "config": "/config", "config.vlan_id": "/config/vlan_id", "config.vlan_qos": "/config/vlan_qos", "config.tx_rate_min": "/config/tx_rate_min", "config.tx_rate_max": "/config/tx_rate_max", "config.spoof_check": "/config/spoof_check", "config.link_state": "/config/link_state", "config.trust": "/config/trust", "config.query_rss": "/config/query_rss", "pci": "/pci", "pci.domain": "/pci/domain", "pci.bus": "/pci/bus", "pci.slot": "/pci/slot", "pci.function": "/pci/function", "used": "/used", "used_by": "/used_by"}, "domain": {"name": "", "xml": "/xml", "state": "/state", "profile": "/profile", "stats": "/stats", "node": "/node", "last_node": "/lastnode", "failed_reason": "/failedreason", "storage.volumes": "/rbdlist", "console.log": "/consolelog", "console.vnc": "/vnc", "meta.autostart": "/node_autostart", "meta.migrate_method": "/migration_method", "meta.migrate_max_downtime": "/migration_max_downtime", "meta.node_selector": "/node_selector", "meta.node_limit": "/node_limit", "meta.tags": "/tags", "migrate.sync_lock": "/migrate_sync_lock", "snapshots": "/snapshots"}, "tag": {"name": "", "type": "/type", "protected": "/protected"}, "domain_snapshot": {"name": "", "timestamp": "/timestamp", "xml": "/xml", "rbd_snapshots": "/rbdsnaplist"}, "network": {"vni": "", "type": "/nettype", "mtu": "/mtu", "rule": "/firewall_rules", "rule.in": "/firewall_rules/in", "rule.out": "/firewall_rules/out", "nameservers": "/name_servers", "domain": "/domain", "reservation": "/dhcp4_reservations", "lease": "/dhcp4_leases", "ip4.gateway": "/ip4_gateway", "ip4.network": "/ip4_network", "ip4.dhcp": "/dhcp4_flag", "ip4.dhcp_start": "/dhcp4_start", "ip4.dhcp_end": "/dhcp4_end", "ip6.gateway": "/ip6_gateway", "ip6.network": "/ip6_network", "ip6.dhcp": "/dhcp6_flag"}, "reservation": {"mac": "", "ip": 
"/ipaddr", "hostname": "/hostname"}, "lease": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname", "expiry": "/expiry", "client_id": "/clientid"}, "rule": {"description": "", "rule": "/rule", "order": "/order"}, "osd": {"id": "", "node": "/node", "device": "/device", "db_device": "/db_device", "fsid": "/fsid", "ofsid": "/fsid/osd", "cfsid": "/fsid/cluster", "lvm": "/lvm", "vg": "/lvm/vg", "lv": "/lvm/lv", "is_split": "/is_split", "stats": "/stats"}, "pool": {"name": "", "pgs": "/pgs", "tier": "/tier", "stats": "/stats"}, "volume": {"name": "", "stats": "/stats"}, "snapshot": {"name": "", "stats": "/stats"}}
|
|
@ -0,0 +1 @@
|
||||||
|
{"version": "15", "root": "", "base": {"root": "", "schema": "/schema", "schema.version": "/schema/version", "config": "/config", "config.maintenance": "/config/maintenance", "config.fence_lock": "/config/fence_lock", "config.primary_node": "/config/primary_node", "config.primary_node.sync_lock": "/config/primary_node/sync_lock", "config.upstream_ip": "/config/upstream_ip", "config.migration_target_selector": "/config/migration_target_selector", "logs": "/logs", "faults": "/faults", "node": "/nodes", "domain": "/domains", "network": "/networks", "storage": "/ceph", "storage.health": "/ceph/health", "storage.util": "/ceph/util", "osd": "/ceph/osds", "pool": "/ceph/pools", "volume": "/ceph/volumes", "snapshot": "/ceph/snapshots"}, "logs": {"node": "", "messages": "/messages"}, "faults": {"id": "", "last_time": "/last_time", "first_time": "/first_time", "ack_time": "/ack_time", "status": "/status", "delta": "/delta", "message": "/message"}, "node": {"name": "", "keepalive": "/keepalive", "mode": "/daemonmode", "data.active_schema": "/activeschema", "data.latest_schema": "/latestschema", "data.static": "/staticdata", "data.pvc_version": "/pvcversion", "running_domains": "/runningdomains", "count.provisioned_domains": "/domainscount", "count.networks": "/networkscount", "state.daemon": "/daemonstate", "state.router": "/routerstate", "state.domain": "/domainstate", "cpu.load": "/cpuload", "vcpu.allocated": "/vcpualloc", "memory.total": "/memtotal", "memory.used": "/memused", "memory.free": "/memfree", "memory.allocated": "/memalloc", "memory.provisioned": "/memprov", "ipmi.hostname": "/ipmihostname", "ipmi.username": "/ipmiusername", "ipmi.password": "/ipmipassword", "sriov": "/sriov", "sriov.pf": "/sriov/pf", "sriov.vf": "/sriov/vf", "monitoring.plugins": "/monitoring_plugins", "monitoring.data": "/monitoring_data", "monitoring.health": "/monitoring_health", "network.stats": "/network_stats"}, "monitoring_plugin": {"name": "", "last_run": "/last_run", "health_delta": 
"/health_delta", "message": "/message", "data": "/data", "runtime": "/runtime"}, "sriov_pf": {"phy": "", "mtu": "/mtu", "vfcount": "/vfcount"}, "sriov_vf": {"phy": "", "pf": "/pf", "mtu": "/mtu", "mac": "/mac", "phy_mac": "/phy_mac", "config": "/config", "config.vlan_id": "/config/vlan_id", "config.vlan_qos": "/config/vlan_qos", "config.tx_rate_min": "/config/tx_rate_min", "config.tx_rate_max": "/config/tx_rate_max", "config.spoof_check": "/config/spoof_check", "config.link_state": "/config/link_state", "config.trust": "/config/trust", "config.query_rss": "/config/query_rss", "pci": "/pci", "pci.domain": "/pci/domain", "pci.bus": "/pci/bus", "pci.slot": "/pci/slot", "pci.function": "/pci/function", "used": "/used", "used_by": "/used_by"}, "domain": {"name": "", "xml": "/xml", "state": "/state", "profile": "/profile", "stats": "/stats", "node": "/node", "last_node": "/lastnode", "failed_reason": "/failedreason", "storage.volumes": "/rbdlist", "console.log": "/consolelog", "console.vnc": "/vnc", "meta.autostart": "/node_autostart", "meta.migrate_method": "/migration_method", "meta.migrate_max_downtime": "/migration_max_downtime", "meta.node_selector": "/node_selector", "meta.node_limit": "/node_limit", "meta.tags": "/tags", "migrate.sync_lock": "/migrate_sync_lock", "snapshots": "/snapshots"}, "tag": {"name": "", "type": "/type", "protected": "/protected"}, "domain_snapshot": {"name": "", "timestamp": "/timestamp", "xml": "/xml", "rbd_snapshots": "/rbdsnaplist"}, "network": {"vni": "", "type": "/nettype", "mtu": "/mtu", "rule": "/firewall_rules", "rule.in": "/firewall_rules/in", "rule.out": "/firewall_rules/out", "nameservers": "/name_servers", "domain": "/domain", "reservation": "/dhcp4_reservations", "lease": "/dhcp4_leases", "ip4.gateway": "/ip4_gateway", "ip4.network": "/ip4_network", "ip4.dhcp": "/dhcp4_flag", "ip4.dhcp_start": "/dhcp4_start", "ip4.dhcp_end": "/dhcp4_end", "ip6.gateway": "/ip6_gateway", "ip6.network": "/ip6_network", "ip6.dhcp": "/dhcp6_flag"}, 
"reservation": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname"}, "lease": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname", "expiry": "/expiry", "client_id": "/clientid"}, "rule": {"description": "", "rule": "/rule", "order": "/order"}, "osd": {"id": "", "node": "/node", "device": "/device", "db_device": "/db_device", "fsid": "/fsid", "ofsid": "/fsid/osd", "cfsid": "/fsid/cluster", "lvm": "/lvm", "vg": "/lvm/vg", "lv": "/lvm/lv", "is_split": "/is_split", "stats": "/stats"}, "pool": {"name": "", "pgs": "/pgs", "tier": "/tier", "stats": "/stats"}, "volume": {"name": "", "stats": "/stats"}, "snapshot": {"name": "", "stats": "/stats"}}
|
|
@ -69,6 +69,8 @@ def getNodeHealthDetails(zkhandler, node_name, node_health_plugins):
|
||||||
plugin_message,
|
plugin_message,
|
||||||
plugin_data,
|
plugin_data,
|
||||||
) = tuple(all_plugin_data[pos_start:pos_end])
|
) = tuple(all_plugin_data[pos_start:pos_end])
|
||||||
|
if plugin_data is None:
|
||||||
|
continue
|
||||||
plugin_output = {
|
plugin_output = {
|
||||||
"name": plugin,
|
"name": plugin,
|
||||||
"last_run": int(plugin_last_run) if plugin_last_run is not None else None,
|
"last_run": int(plugin_last_run) if plugin_last_run is not None else None,
|
||||||
|
@ -156,9 +158,9 @@ def getNodeInformation(zkhandler, node_name):
|
||||||
zkhandler, node_name, node_health_plugins
|
zkhandler, node_name, node_health_plugins
|
||||||
)
|
)
|
||||||
|
|
||||||
if _node_network_stats is not None:
|
try:
|
||||||
node_network_stats = json.loads(_node_network_stats)
|
node_network_stats = json.loads(_node_network_stats)
|
||||||
else:
|
except Exception:
|
||||||
node_network_stats = dict()
|
node_network_stats = dict()
|
||||||
|
|
||||||
# Construct a data structure to represent the data
|
# Construct a data structure to represent the data
|
||||||
|
|
3313
daemon-common/vm.py
3313
daemon-common/vm.py
File diff suppressed because it is too large
Load Diff
|
@ -258,6 +258,13 @@ def worker_create_vm(
|
||||||
args = (vm_profile,)
|
args = (vm_profile,)
|
||||||
db_cur.execute(query, args)
|
db_cur.execute(query, args)
|
||||||
profile_data = db_cur.fetchone()
|
profile_data = db_cur.fetchone()
|
||||||
|
if profile_data is None:
|
||||||
|
fail(
|
||||||
|
celery,
|
||||||
|
f'Provisioner profile "{vm_profile}" is not present on the cluster',
|
||||||
|
exception=ClusterError,
|
||||||
|
)
|
||||||
|
|
||||||
if profile_data.get("arguments"):
|
if profile_data.get("arguments"):
|
||||||
vm_data["script_arguments"] = profile_data.get("arguments").split("|")
|
vm_data["script_arguments"] = profile_data.get("arguments").split("|")
|
||||||
else:
|
else:
|
||||||
|
@ -329,11 +336,7 @@ def worker_create_vm(
|
||||||
retcode, stdout, stderr = pvc_common.run_os_command("uname -m")
|
retcode, stdout, stderr = pvc_common.run_os_command("uname -m")
|
||||||
vm_data["system_architecture"] = stdout.strip()
|
vm_data["system_architecture"] = stdout.strip()
|
||||||
|
|
||||||
monitor_list = list()
|
vm_data["ceph_monitor_list"] = config["storage_hosts"]
|
||||||
monitor_names = config["storage_hosts"]
|
|
||||||
for monitor in monitor_names:
|
|
||||||
monitor_list.append("{}.{}".format(monitor, config["storage_domain"]))
|
|
||||||
vm_data["ceph_monitor_list"] = monitor_list
|
|
||||||
vm_data["ceph_monitor_port"] = config["ceph_monitor_port"]
|
vm_data["ceph_monitor_port"] = config["ceph_monitor_port"]
|
||||||
vm_data["ceph_monitor_secret"] = config["ceph_secret_uuid"]
|
vm_data["ceph_monitor_secret"] = config["ceph_secret_uuid"]
|
||||||
|
|
||||||
|
@ -744,6 +747,7 @@ def worker_create_vm(
|
||||||
node_selector = vm_data["system_details"]["node_selector"]
|
node_selector = vm_data["system_details"]["node_selector"]
|
||||||
node_autostart = vm_data["system_details"]["node_autostart"]
|
node_autostart = vm_data["system_details"]["node_autostart"]
|
||||||
migration_method = vm_data["system_details"]["migration_method"]
|
migration_method = vm_data["system_details"]["migration_method"]
|
||||||
|
migration_max_downtime = vm_data["system_details"]["migration_max_downtime"]
|
||||||
with open_zk(config) as zkhandler:
|
with open_zk(config) as zkhandler:
|
||||||
retcode, retmsg = pvc_vm.define_vm(
|
retcode, retmsg = pvc_vm.define_vm(
|
||||||
zkhandler,
|
zkhandler,
|
||||||
|
@ -753,6 +757,7 @@ def worker_create_vm(
|
||||||
node_selector,
|
node_selector,
|
||||||
node_autostart,
|
node_autostart,
|
||||||
migration_method,
|
migration_method,
|
||||||
|
migration_max_downtime,
|
||||||
vm_profile,
|
vm_profile,
|
||||||
initial_state="provision",
|
initial_state="provision",
|
||||||
)
|
)
|
||||||
|
|
|
@ -30,6 +30,10 @@ from kazoo.client import KazooClient, KazooState
|
||||||
from kazoo.exceptions import NoNodeError
|
from kazoo.exceptions import NoNodeError
|
||||||
|
|
||||||
|
|
||||||
|
DEFAULT_ROOT_PATH = "/usr/share/pvc"
|
||||||
|
SCHEMA_PATH = "daemon_lib/migrations/versions"
|
||||||
|
|
||||||
|
|
||||||
#
|
#
|
||||||
# Function decorators
|
# Function decorators
|
||||||
#
|
#
|
||||||
|
@ -57,8 +61,9 @@ class ZKConnection(object):
|
||||||
schema_version = 0
|
schema_version = 0
|
||||||
zkhandler.schema.load(schema_version, quiet=True)
|
zkhandler.schema.load(schema_version, quiet=True)
|
||||||
|
|
||||||
|
try:
|
||||||
ret = function(zkhandler, *args, **kwargs)
|
ret = function(zkhandler, *args, **kwargs)
|
||||||
|
finally:
|
||||||
zkhandler.disconnect()
|
zkhandler.disconnect()
|
||||||
del zkhandler
|
del zkhandler
|
||||||
|
|
||||||
|
@ -572,7 +577,7 @@ class ZKHandler(object):
|
||||||
#
|
#
|
||||||
class ZKSchema(object):
|
class ZKSchema(object):
|
||||||
# Current version
|
# Current version
|
||||||
_version = 12
|
_version = 15
|
||||||
|
|
||||||
# Root for doing nested keys
|
# Root for doing nested keys
|
||||||
_schema_root = ""
|
_schema_root = ""
|
||||||
|
@ -588,6 +593,7 @@ class ZKSchema(object):
|
||||||
"schema.version": f"{_schema_root}/schema/version",
|
"schema.version": f"{_schema_root}/schema/version",
|
||||||
"config": f"{_schema_root}/config",
|
"config": f"{_schema_root}/config",
|
||||||
"config.maintenance": f"{_schema_root}/config/maintenance",
|
"config.maintenance": f"{_schema_root}/config/maintenance",
|
||||||
|
"config.fence_lock": f"{_schema_root}/config/fence_lock",
|
||||||
"config.primary_node": f"{_schema_root}/config/primary_node",
|
"config.primary_node": f"{_schema_root}/config/primary_node",
|
||||||
"config.primary_node.sync_lock": f"{_schema_root}/config/primary_node/sync_lock",
|
"config.primary_node.sync_lock": f"{_schema_root}/config/primary_node/sync_lock",
|
||||||
"config.upstream_ip": f"{_schema_root}/config/upstream_ip",
|
"config.upstream_ip": f"{_schema_root}/config/upstream_ip",
|
||||||
|
@ -707,17 +713,26 @@ class ZKSchema(object):
|
||||||
"console.vnc": "/vnc",
|
"console.vnc": "/vnc",
|
||||||
"meta.autostart": "/node_autostart",
|
"meta.autostart": "/node_autostart",
|
||||||
"meta.migrate_method": "/migration_method",
|
"meta.migrate_method": "/migration_method",
|
||||||
|
"meta.migrate_max_downtime": "/migration_max_downtime",
|
||||||
"meta.node_selector": "/node_selector",
|
"meta.node_selector": "/node_selector",
|
||||||
"meta.node_limit": "/node_limit",
|
"meta.node_limit": "/node_limit",
|
||||||
"meta.tags": "/tags",
|
"meta.tags": "/tags",
|
||||||
"migrate.sync_lock": "/migrate_sync_lock",
|
"migrate.sync_lock": "/migrate_sync_lock",
|
||||||
|
"snapshots": "/snapshots",
|
||||||
},
|
},
|
||||||
# The schema of an individual domain tag entry (/domains/{domain}/tags/{tag})
|
# The schema of an individual domain tag entry (/domains/{domain}/tags/{tag})
|
||||||
"tag": {
|
"tag": {
|
||||||
"name": "",
|
"name": "", # The root key
|
||||||
"type": "/type",
|
"type": "/type",
|
||||||
"protected": "/protected",
|
"protected": "/protected",
|
||||||
}, # The root key
|
},
|
||||||
|
# The schema of an individual domain snapshot entry (/domains/{domain}/snapshots/{snapshot})
|
||||||
|
"domain_snapshot": {
|
||||||
|
"name": "", # The root key
|
||||||
|
"timestamp": "/timestamp",
|
||||||
|
"xml": "/xml",
|
||||||
|
"rbd_snapshots": "/rbdsnaplist",
|
||||||
|
},
|
||||||
# The schema of an individual network entry (/networks/{vni})
|
# The schema of an individual network entry (/networks/{vni})
|
||||||
"network": {
|
"network": {
|
||||||
"vni": "", # The root key
|
"vni": "", # The root key
|
||||||
|
@ -818,8 +833,8 @@ class ZKSchema(object):
|
||||||
def schema(self, schema):
|
def schema(self, schema):
|
||||||
self._schema = schema
|
self._schema = schema
|
||||||
|
|
||||||
def __init__(self):
|
def __init__(self, root_path=DEFAULT_ROOT_PATH):
|
||||||
pass
|
self.schema_path = f"{root_path}/{SCHEMA_PATH}"
|
||||||
|
|
||||||
def __repr__(self):
|
def __repr__(self):
|
||||||
return f"ZKSchema({self.version})"
|
return f"ZKSchema({self.version})"
|
||||||
|
@ -859,7 +874,7 @@ class ZKSchema(object):
|
||||||
if not quiet:
|
if not quiet:
|
||||||
print(f"Loading schema version {version}")
|
print(f"Loading schema version {version}")
|
||||||
|
|
||||||
with open(f"daemon_lib/migrations/versions/{version}.json", "r") as sfh:
|
with open(f"{self.schema_path}/{version}.json", "r") as sfh:
|
||||||
self.schema = json.load(sfh)
|
self.schema = json.load(sfh)
|
||||||
self.version = self.schema.get("version")
|
self.version = self.schema.get("version")
|
||||||
|
|
||||||
|
@ -1026,6 +1041,8 @@ class ZKSchema(object):
|
||||||
default_data = "False"
|
default_data = "False"
|
||||||
elif elem == "pool" and ikey == "tier":
|
elif elem == "pool" and ikey == "tier":
|
||||||
default_data = "default"
|
default_data = "default"
|
||||||
|
elif elem == "domain" and ikey == "meta.migrate_max_downtime":
|
||||||
|
default_data = "300"
|
||||||
else:
|
else:
|
||||||
default_data = ""
|
default_data = ""
|
||||||
zkhandler.zk_conn.create(
|
zkhandler.zk_conn.create(
|
||||||
|
@ -1119,7 +1136,7 @@ class ZKSchema(object):
|
||||||
# Migrate from older to newer schema
|
# Migrate from older to newer schema
|
||||||
def migrate(self, zkhandler, new_version):
|
def migrate(self, zkhandler, new_version):
|
||||||
# Determine the versions in between
|
# Determine the versions in between
|
||||||
versions = ZKSchema.find_all(start=self.version, end=new_version)
|
versions = self.find_all(start=self.version, end=new_version)
|
||||||
if versions is None:
|
if versions is None:
|
||||||
return
|
return
|
||||||
|
|
||||||
|
@ -1135,7 +1152,7 @@ class ZKSchema(object):
|
||||||
# Rollback from newer to older schema
|
# Rollback from newer to older schema
|
||||||
def rollback(self, zkhandler, old_version):
|
def rollback(self, zkhandler, old_version):
|
||||||
# Determine the versions in between
|
# Determine the versions in between
|
||||||
versions = ZKSchema.find_all(start=old_version - 1, end=self.version - 1)
|
versions = self.find_all(start=old_version - 1, end=self.version - 1)
|
||||||
if versions is None:
|
if versions is None:
|
||||||
return
|
return
|
||||||
|
|
||||||
|
@ -1150,6 +1167,12 @@ class ZKSchema(object):
|
||||||
# Apply those changes
|
# Apply those changes
|
||||||
self.run_migrate(zkhandler, changes)
|
self.run_migrate(zkhandler, changes)
|
||||||
|
|
||||||
|
# Write the latest schema to a file
|
||||||
|
def write(self):
|
||||||
|
schema_file = f"{self.schema_path}/{self._version}.json"
|
||||||
|
with open(schema_file, "w") as sfh:
|
||||||
|
json.dump(self._schema, sfh)
|
||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
def key_diff(cls, schema_a, schema_b):
|
def key_diff(cls, schema_a, schema_b):
|
||||||
# schema_a = current
|
# schema_a = current
|
||||||
|
@ -1195,26 +1218,10 @@ class ZKSchema(object):
|
||||||
|
|
||||||
return {"add": diff_add, "remove": diff_remove, "rename": diff_rename}
|
return {"add": diff_add, "remove": diff_remove, "rename": diff_rename}
|
||||||
|
|
||||||
# Load in the schemal of the current cluster
|
|
||||||
@classmethod
|
|
||||||
def load_current(cls, zkhandler):
|
|
||||||
new_instance = cls()
|
|
||||||
version = new_instance.get_version(zkhandler)
|
|
||||||
new_instance.load(version)
|
|
||||||
return new_instance
|
|
||||||
|
|
||||||
# Write the latest schema to a file
|
|
||||||
@classmethod
|
|
||||||
def write(cls):
|
|
||||||
schema_file = "daemon_lib/migrations/versions/{}.json".format(cls._version)
|
|
||||||
with open(schema_file, "w") as sfh:
|
|
||||||
json.dump(cls._schema, sfh)
|
|
||||||
|
|
||||||
# Static methods for reading information from the files
|
# Static methods for reading information from the files
|
||||||
@staticmethod
|
def find_all(self, start=0, end=None):
|
||||||
def find_all(start=0, end=None):
|
|
||||||
versions = list()
|
versions = list()
|
||||||
for version in os.listdir("daemon_lib/migrations/versions"):
|
for version in os.listdir(self.schema_path):
|
||||||
sequence_id = int(version.split(".")[0])
|
sequence_id = int(version.split(".")[0])
|
||||||
if end is None:
|
if end is None:
|
||||||
if sequence_id > start:
|
if sequence_id > start:
|
||||||
|
@ -1227,11 +1234,18 @@ class ZKSchema(object):
|
||||||
else:
|
else:
|
||||||
return None
|
return None
|
||||||
|
|
||||||
@staticmethod
|
def find_latest(self):
|
||||||
def find_latest():
|
|
||||||
latest_version = 0
|
latest_version = 0
|
||||||
for version in os.listdir("daemon_lib/migrations/versions"):
|
for version in os.listdir(self.schema_path):
|
||||||
sequence_id = int(version.split(".")[0])
|
sequence_id = int(version.split(".")[0])
|
||||||
if sequence_id > latest_version:
|
if sequence_id > latest_version:
|
||||||
latest_version = sequence_id
|
latest_version = sequence_id
|
||||||
return latest_version
|
return latest_version
|
||||||
|
|
||||||
|
# Load in the schema of the current cluster
|
||||||
|
@classmethod
|
||||||
|
def load_current(cls, zkhandler):
|
||||||
|
new_instance = cls()
|
||||||
|
version = new_instance.get_version(zkhandler)
|
||||||
|
new_instance.load(version)
|
||||||
|
return new_instance
|
||||||
|
|
|
@ -1,3 +1,154 @@
|
||||||
|
pvc (0.9.103-0) unstable; urgency=high
|
||||||
|
|
||||||
|
* [Provisioner] Fixes a bug with the change in `storage_hosts` to FQDNs affecting the VM Builder
|
||||||
|
* [Monitoring] Fixes the Munin plugin to work properly with sudo
|
||||||
|
|
||||||
|
-- Joshua M. Boniface <joshua@boniface.me> Fri, 01 Nov 2024 17:19:44 -0400
|
||||||
|
|
||||||
|
pvc (0.9.102-0) unstable; urgency=high
|
||||||
|
|
||||||
|
* [API Daemon] Ensures that received config snapshots update storage hosts in addition to secret UUIDs
|
||||||
|
* [CLI Client] Fixes several bugs around local connection handling and connection listings
|
||||||
|
|
||||||
|
-- Joshua M. Boniface <joshua@boniface.me> Thu, 17 Oct 2024 10:48:31 -0400
|
||||||
|
|
||||||
|
pvc (0.9.101-0) unstable; urgency=high
|
||||||
|
|
||||||
|
**New Feature**: Adds VM snapshot sending (`vm snapshot send`), VM mirroring (`vm mirror create`), and (offline) mirror promotion (`vm mirror promote`). Permits transferring VM snapshots to remote clusters, individually or repeatedly, and promoting them to active status, for disaster recovery and migration between clusters.
|
||||||
|
**Breaking Change**: Migrates the API daemon into Gunicorn when in production mode. Permits more scalable and performant operation of the API. **Requires additional dependency packages on all coordinator nodes** (`gunicorn`, `python3-gunicorn`, `python3-setuptools`); upgrade via `pvc-ansible` is strongly recommended.
|
||||||
|
**Enhancement**: Provides whole cluster utilization stats in the cluster status data. Permits better observability into the overall resource utilization of the cluster.
|
||||||
|
**Enhancement**: Adds a new storage benchmark format (v2) which includes additional resource utilization statistics. This allows for better evaluation of storage performance impact on the cluster as a whole. The updated format also permits arbitrary benchmark job names for easier parsing and tracking.
|
||||||
|
|
||||||
|
* [API Daemon] Allows scanning of new volumes added manually via other commands
|
||||||
|
* [API Daemon/CLI Client] Adds whole cluster utilization statistics to cluster status
|
||||||
|
* [API Daemon] Moves production API execution into Gunicorn
|
||||||
|
* [API Daemon] Adds a new storage benchmark format (v2) with additional resource tracking
|
||||||
|
* [API Daemon] Adds support for named storage benchmark jobs
|
||||||
|
* [API Daemon] Fixes a bug in OSD creation which would create `split` OSDs if `--osd-count` was set to 1
|
||||||
|
* [API Daemon] Adds support for the `mirror` VM state used by snapshot mirrors
|
||||||
|
* [CLI Client] Fixes several output display bugs in various commands and in Worker task outputs
|
||||||
|
* [CLI Client] Improves and shrinks the status progress bar output to support longer messages
|
||||||
|
* [API Daemon] Adds support for sending snapshots to remote clusters
|
||||||
|
* [API Daemon] Adds support for updating and promoting snapshot mirrors to remote clusters
|
||||||
|
* [Node Daemon] Improves timeouts during primary/secondary coordinator transitions to avoid deadlocks
|
||||||
|
* [Node Daemon] Improves timeouts during keepalive updates to avoid deadlocks
|
||||||
|
* [Node Daemon] Refactors fencing thread structure to ensure a single fencing task per cluster and sequential node fences to avoid potential anomalies (e.g. fencing 2 nodes simultaneously)
|
||||||
|
* [Node Daemon] Fixes a bug in fencing if VM locks were already freed, leaving VMs in an invalid state
|
||||||
|
* [Node Daemon] Increases the wait time during system startup to ensure Zookeeper has more time to synchronize
|
||||||
|
|
||||||
|
-- Joshua M. Boniface <joshua@boniface.me> Tue, 15 Oct 2024 11:39:11 -0400
|
||||||
|
|
||||||
|
pvc (0.9.100-0) unstable; urgency=high
|
||||||
|
|
||||||
|
* [API Daemon] Improves the handling of "detect:" disk strings on newer systems by leveraging the "nvme" command
|
||||||
|
* [Client CLI] Update help text about "detect:" disk strings
|
||||||
|
* [Meta] Updates deprecation warnings and updates builder to only add this version for Debian 12 (Bookworm)
|
||||||
|
|
||||||
|
-- Joshua M. Boniface <joshua@boniface.me> Fri, 30 Aug 2024 11:03:33 -0400
|
||||||
|
|
||||||
|
pvc (0.9.99-0) unstable; urgency=high
|
||||||
|
|
||||||
|
**Deprecation Warning**: `pvc vm backup` commands are now deprecated and will be removed in **0.9.100**. Use `pvc vm snapshot` commands instead.
|
||||||
|
**Breaking Change**: The on-disk format of VM snapshot exports differs from backup exports, and the PVC autobackup system now leverages these. It is recommended to start fresh with a new tree of backups for `pvc autobackup` for maximum compatibility.
|
||||||
|
**Breaking Change**: VM autobackups now run in `pvcworkerd` instead of the CLI client directly, allowing them to be triggerd from any node (or externally). It is important to apply the timer unit changes from the `pvc-ansible` role after upgrading to 0.9.99 to avoid duplicate runs.
|
||||||
|
**Usage Note**: VM snapshots are displayed in the `pvc vm list` and `pvc vm info` outputs, not in a unique "list" endpoint.
|
||||||
|
|
||||||
|
* [API Daemon] Adds a proper error when an invalid provisioner profile is specified
|
||||||
|
* [Node Daemon] Sorts Ceph pools properly in node keepalive to avoid incorrect ordering
|
||||||
|
* [Health Daemon] Improves handling of IPMI checks by adding multiple tries but a shorter timeout
|
||||||
|
* [API Daemon] Improves handling of XML parsing errors in VM configurations
|
||||||
|
* [ALL] Adds support for whole VM snapshots, including configuration XML details, and direct rollback to snapshots
|
||||||
|
* [ALL] Adds support for exporting and importing whole VM snapshots
|
||||||
|
* [Client CLI] Removes vCPU topology from short VM info output
|
||||||
|
* [Client CLI] Improves output format of VM info output
|
||||||
|
* [API Daemon] Adds an endpoint to get the current primary node
|
||||||
|
* [Client CLI] Fixes a bug where API requests were made 3 times
|
||||||
|
* [Other] Improves the build-and-deploy.sh script
|
||||||
|
* [API Daemon] Improves the "vm rename" command to avoid redefining VM, preserving history etc.
|
||||||
|
* [API Daemon] Adds an indication when a task is run on the primary node
|
||||||
|
* [API Daemon] Fixes a bug where the ZK schema relative path didn't work sometimes
|
||||||
|
|
||||||
|
-- Joshua M. Boniface <joshua@boniface.me> Wed, 28 Aug 2024 11:15:55 -0400
|
||||||
|
|
||||||
|
pvc (0.9.98-0) unstable; urgency=high
|
||||||
|
|
||||||
|
* [CLI Client] Fixed output when API call times out
|
||||||
|
* [Node Daemon] Improves the handling of fence states
|
||||||
|
* [API Daemon/CLI Client] Adds support for storage snapshot rollback
|
||||||
|
* [CLI Client] Adds additional warning messages about snapshot consistency to help output
|
||||||
|
* [API Daemon] Fixes a bug listing snapshots by pool/volume
|
||||||
|
* [Node Daemon] Adds a --version flag for information gathering by update-motd.sh
|
||||||
|
|
||||||
|
-- Joshua M. Boniface <joshua@boniface.me> Wed, 05 Jun 2024 12:01:31 -0400
|
||||||
|
|
||||||
|
pvc (0.9.97-0) unstable; urgency=high
|
||||||
|
|
||||||
|
* [Client CLI] Ensures --lines is always an integer value
|
||||||
|
* [Node Daemon] Fixes a bug if d_network changes during iteration
|
||||||
|
* [Node Daemon] Moves to using allocated instead of free memory for node reporting
|
||||||
|
* [API Daemon] Fixes a bug if lingering RBD snapshots exist when removing a volume (#180)
|
||||||
|
|
||||||
|
-- Joshua M. Boniface <joshua@boniface.me> Fri, 19 Apr 2024 10:32:16 -0400
|
||||||
|
|
||||||
|
pvc (0.9.96-0) unstable; urgency=high
|
||||||
|
|
||||||
|
* [API Daemon] Fixes a bug when reporting node stats
|
||||||
|
* [API Daemon] Fixes a bug deleteing successful benchmark results
|
||||||
|
|
||||||
|
-- Joshua M. Boniface <joshua@boniface.me> Fri, 08 Mar 2024 14:23:06 -0500
|
||||||
|
|
||||||
|
pvc (0.9.95-0) unstable; urgency=high
|
||||||
|
|
||||||
|
* [API Daemon/CLI Client] Adds a flag to allow duplicate VNIs in network templates
|
||||||
|
* [API Daemon] Ensures that storage template disks are returned in disk ID order
|
||||||
|
* [Client CLI] Fixes a display bug showing all OSDs as split
|
||||||
|
|
||||||
|
-- Joshua M. Boniface <joshua@boniface.me> Fri, 09 Feb 2024 12:42:00 -0500
|
||||||
|
|
||||||
|
pvc (0.9.94-0) unstable; urgency=high
|
||||||
|
|
||||||
|
* [CLI Client] Fixes an incorrect ordering issue with autobackup summary emails
|
||||||
|
* [API Daemon/CLI Client] Adds an additional safety check for 80% cluster fullness when doing volume adds or resizes
|
||||||
|
* [API Daemon/CLI Client] Adds safety checks to volume clones as well
|
||||||
|
* [API Daemon] Fixes a few remaining memory bugs for stopped/disabled VMs
|
||||||
|
|
||||||
|
-- Joshua M. Boniface <joshua@boniface.me> Mon, 05 Feb 2024 09:58:07 -0500
|
||||||
|
|
||||||
|
pvc (0.9.93-0) unstable; urgency=high
|
||||||
|
|
||||||
|
* [API Daemon] Fixes a bug where stuck zkhandler threads were not cleaned up on error
|
||||||
|
|
||||||
|
-- Joshua M. Boniface <joshua@boniface.me> Tue, 30 Jan 2024 09:51:21 -0500
|
||||||
|
|
||||||
|
pvc (0.9.92-0) unstable; urgency=high
|
||||||
|
|
||||||
|
* [CLI Client] Adds the new restore state to the colours list for VM status
|
||||||
|
* [API Daemon] Fixes an incorrect variable assignment
|
||||||
|
* [Provisioner] Improves the error handling of various steps in the debootstrap and rinse example scripts
|
||||||
|
* [CLI Client] Fixes two bugs around missing keys that were added recently (uses get() instead direct dictionary refs)
|
||||||
|
* [CLI Client] Improves API error handling via GET retries (x3) and better server status code handling
|
||||||
|
|
||||||
|
-- Joshua M. Boniface <joshua@boniface.me> Mon, 29 Jan 2024 09:39:10 -0500
|
||||||
|
|
||||||
|
pvc (0.9.91-0) unstable; urgency=high
|
||||||
|
|
||||||
|
* [Client CLI] Fixes a bug and improves output during cluster task events.
|
||||||
|
* [Client CLI] Improves the output of the task list display.
|
||||||
|
* [Provisioner] Fixes some missing cloud-init modules in the default debootstrap script.
|
||||||
|
* [Client CLI] Fixes a bug with a missing argument to the vm_define helper function.
|
||||||
|
* [All] Fixes inconsistent package find + rm commands to avoid errors in dpkg.
|
||||||
|
|
||||||
|
-- Joshua M. Boniface <joshua@boniface.me> Tue, 23 Jan 2024 10:02:19 -0500
|
||||||
|
|
||||||
|
pvc (0.9.90-0) unstable; urgency=high
|
||||||
|
|
||||||
|
* [Client CLI/API Daemon] Adds additional backup metainfo and an emailed report option to autobackups.
|
||||||
|
* [All] Adds a live migration maximum downtime selector to help with busy VM migrations.
|
||||||
|
* [API Daemon] Fixes a database migration bug on Debian 10/11.
|
||||||
|
* [Node Daemon] Fixes a race condition when applying Zookeeper schema changes.
|
||||||
|
|
||||||
|
-- Joshua M. Boniface <joshua@boniface.me> Thu, 11 Jan 2024 00:14:49 -0500
|
||||||
|
|
||||||
pvc (0.9.89-0) unstable; urgency=high
|
pvc (0.9.89-0) unstable; urgency=high
|
||||||
|
|
||||||
* [API/Worker Daemons] Fixes a bug with the Celery result backends not being properly initialized on Debian 10/11.
|
* [API/Worker Daemons] Fixes a bug with the Celery result backends not being properly initialized on Debian 10/11.
|
||||||
|
|
|
@ -32,7 +32,7 @@ Description: Parallel Virtual Cluster worker daemon
|
||||||
|
|
||||||
Package: pvc-daemon-api
|
Package: pvc-daemon-api
|
||||||
Architecture: all
|
Architecture: all
|
||||||
Depends: systemd, pvc-daemon-common, python3-yaml, python3-flask, python3-flask-restful, python3-celery, python3-distutils, python3-redis, python3-lxml, python3-flask-migrate
|
Depends: systemd, pvc-daemon-common, gunicorn, python3-gunicorn, python3-yaml, python3-flask, python3-flask-restful, python3-celery, python3-distutils, python3-redis, python3-lxml, python3-flask-migrate
|
||||||
Description: Parallel Virtual Cluster API daemon
|
Description: Parallel Virtual Cluster API daemon
|
||||||
A KVM/Zookeeper/Ceph-based VM and private cloud manager
|
A KVM/Zookeeper/Ceph-based VM and private cloud manager
|
||||||
.
|
.
|
||||||
|
|
|
@ -2,7 +2,12 @@
|
||||||
|
|
||||||
# Generate the bash completion configuration
|
# Generate the bash completion configuration
|
||||||
if [ -d /etc/bash_completion.d ]; then
|
if [ -d /etc/bash_completion.d ]; then
|
||||||
|
echo "Installing BASH completion configuration"
|
||||||
_PVC_COMPLETE=source_bash pvc > /etc/bash_completion.d/pvc
|
_PVC_COMPLETE=source_bash pvc > /etc/bash_completion.d/pvc
|
||||||
fi
|
fi
|
||||||
|
|
||||||
|
# Remove any cached CPython directories or files
|
||||||
|
echo "Cleaning up CPython caches"
|
||||||
|
find /usr/lib/python3/dist-packages/pvc -type d -name "__pycache__" -exec rm -fr {} + &>/dev/null || true
|
||||||
|
|
||||||
exit 0
|
exit 0
|
||||||
|
|
|
@ -9,11 +9,6 @@ if systemctl is-active --quiet pvcapid.service; then
|
||||||
/usr/share/pvc/pvc-api-db-upgrade
|
/usr/share/pvc/pvc-api-db-upgrade
|
||||||
systemctl start pvcapid.service
|
systemctl start pvcapid.service
|
||||||
fi
|
fi
|
||||||
# Restart the worker daemon
|
|
||||||
if systemctl is-active --quiet pvcworkerd.service; then
|
|
||||||
systemctl stop pvcworkerd.service
|
|
||||||
systemctl start pvcworkerd.service
|
|
||||||
fi
|
|
||||||
|
|
||||||
if [ ! -f /etc/pvc/pvc.conf ]; then
|
if [ ! -f /etc/pvc/pvc.conf ]; then
|
||||||
echo "NOTE: The PVC client API daemon (pvcapid.service) and the PVC Worker daemon (pvcworkerd.service) have not been started; create a config file at /etc/pvc/pvc.conf, then run the database configuration (/usr/share/pvc/pvc-api-db-upgrade) and start them manually."
|
echo "NOTE: The PVC client API daemon (pvcapid.service) and the PVC Worker daemon (pvcworkerd.service) have not been started; create a config file at /etc/pvc/pvc.conf, then run the database configuration (/usr/share/pvc/pvc-api-db-upgrade) and start them manually."
|
||||||
|
|
|
@ -1,5 +1,5 @@
|
||||||
#!/bin/sh
|
#!/bin/sh
|
||||||
|
|
||||||
# Remove any cached CPython directories or files
|
# Remove any cached CPython directories or files
|
||||||
echo "Cleaning up existing CPython files"
|
echo "Cleaning up CPython caches"
|
||||||
find /usr/share/pvc/pvcapid -type d -name "__pycache__" -exec rm -rf {} \; &>/dev/null || true
|
find /usr/share/pvc/pvcapid -type d -name "__pycache__" -exec rm -fr {} + &>/dev/null || true
|
||||||
|
|
|
@ -0,0 +1,5 @@
|
||||||
|
#!/bin/sh
|
||||||
|
|
||||||
|
# Remove any cached CPython directories or files
|
||||||
|
echo "Cleaning up CPython caches"
|
||||||
|
find /usr/share/pvc/daemon_lib -type d -name "__pycache__" -exec rm -fr {} + &>/dev/null || true
|
|
@ -1,6 +1,6 @@
|
||||||
#!/bin/sh
|
#!/bin/sh
|
||||||
|
|
||||||
# Remove any cached CPython directories or files
|
# Remove any cached CPython directories or files
|
||||||
echo "Cleaning up existing CPython files"
|
echo "Cleaning up CPython caches"
|
||||||
find /usr/share/pvc/pvchealthd -type d -name "__pycache__" -exec rm -rf {} \; &>/dev/null || true
|
find /usr/share/pvc/pvchealthd -type d -name "__pycache__" -exec rm -fr {} + &>/dev/null || true
|
||||||
find /usr/share/pvc/plugins -type d -name "__pycache__" -exec rm -rf {} \; &>/dev/null || true
|
find /usr/share/pvc/plugins -type d -name "__pycache__" -exec rm -fr {} + &>/dev/null || true
|
||||||
|
|
|
@ -1,5 +1,5 @@
|
||||||
#!/bin/sh
|
#!/bin/sh
|
||||||
|
|
||||||
# Remove any cached CPython directories or files
|
# Remove any cached CPython directories or files
|
||||||
echo "Cleaning up existing CPython files"
|
echo "Cleaning up CPython caches"
|
||||||
find /usr/share/pvc/pvcnoded -type d -name "__pycache__" -exec rm -rf {} \; &>/dev/null || true
|
find /usr/share/pvc/pvcnoded -type d -name "__pycache__" -exec rm -fr {} + &>/dev/null || true
|
||||||
|
|
|
@ -1,5 +1,5 @@
|
||||||
#!/bin/sh
|
#!/bin/sh
|
||||||
|
|
||||||
# Remove any cached CPython directories or files
|
# Remove any cached CPython directories or files
|
||||||
echo "Cleaning up existing CPython files"
|
echo "Cleaning up CPython caches"
|
||||||
find /usr/share/pvc/pvcworkerd -type d -name "__pycache__" -exec rm -rf {} \; &>/dev/null || true
|
find /usr/share/pvc/pvcworkerd -type d -name "__pycache__" -exec rm -fr {} + &>/dev/null || true
|
||||||
|
|
|
@ -13,7 +13,7 @@ override_dh_python3:
|
||||||
rm -r $(CURDIR)/client-cli/.pybuild $(CURDIR)/client-cli/pvc.egg-info
|
rm -r $(CURDIR)/client-cli/.pybuild $(CURDIR)/client-cli/pvc.egg-info
|
||||||
|
|
||||||
override_dh_auto_clean:
|
override_dh_auto_clean:
|
||||||
find . -name "__pycache__" -o -name ".pybuild" -exec rm -r {} \; || true
|
find . -name "__pycache__" -o -name ".pybuild" -exec rm -fr {} + || true
|
||||||
|
|
||||||
# If you need to rebuild the Sphinx documentation
|
# If you need to rebuild the Sphinx documentation
|
||||||
# Add spinxdoc to the dh --with line
|
# Add spinxdoc to the dh --with line
|
||||||
|
|
|
@ -2,12 +2,19 @@
|
||||||
|
|
||||||
# Generate the database migration files
|
# Generate the database migration files
|
||||||
|
|
||||||
|
set -o xtrace
|
||||||
|
|
||||||
VERSION="$( head -1 debian/changelog | awk -F'[()-]' '{ print $2 }' )"
|
VERSION="$( head -1 debian/changelog | awk -F'[()-]' '{ print $2 }' )"
|
||||||
|
|
||||||
|
sudo ip addr add 10.0.1.250/32 dev lo
|
||||||
|
|
||||||
pushd $( git rev-parse --show-toplevel ) &>/dev/null
|
pushd $( git rev-parse --show-toplevel ) &>/dev/null
|
||||||
pushd api-daemon &>/dev/null
|
pushd api-daemon &>/dev/null
|
||||||
export PVC_CONFIG_FILE="../pvc.sample.conf"
|
export PVC_CONFIG_FILE="../pvc.sample.conf"
|
||||||
./pvcapid-manage_flask.py db migrate -m "PVC version ${VERSION}"
|
export FLASK_APP=./pvcapid-manage_flask.py
|
||||||
./pvcapid-manage_flask.py db upgrade
|
flask db migrate -m "PVC version ${VERSION}"
|
||||||
|
flask db upgrade
|
||||||
popd &>/dev/null
|
popd &>/dev/null
|
||||||
popd &>/dev/null
|
popd &>/dev/null
|
||||||
|
|
||||||
|
sudo ip addr del 10.0.1.250/32 dev lo
|
||||||
|
|
|
@ -69,26 +69,33 @@ class MonitoringPluginScript(MonitoringPlugin):
|
||||||
|
|
||||||
# Run any imports first
|
# Run any imports first
|
||||||
from daemon_lib.common import run_os_command
|
from daemon_lib.common import run_os_command
|
||||||
|
from time import sleep
|
||||||
|
|
||||||
# Check the node's IPMI interface
|
# Check the node's IPMI interface
|
||||||
ipmi_hostname = self.config["ipmi_hostname"]
|
ipmi_hostname = self.config["ipmi_hostname"]
|
||||||
ipmi_username = self.config["ipmi_username"]
|
ipmi_username = self.config["ipmi_username"]
|
||||||
ipmi_password = self.config["ipmi_password"]
|
ipmi_password = self.config["ipmi_password"]
|
||||||
|
retcode = 1
|
||||||
|
trycount = 0
|
||||||
|
while retcode > 0 and trycount < 3:
|
||||||
retcode, _, _ = run_os_command(
|
retcode, _, _ = run_os_command(
|
||||||
f"/usr/bin/ipmitool -I lanplus -H {ipmi_hostname} -U {ipmi_username} -P {ipmi_password} chassis power status",
|
f"/usr/bin/ipmitool -I lanplus -H {ipmi_hostname} -U {ipmi_username} -P {ipmi_password} chassis power status",
|
||||||
timeout=5
|
timeout=2
|
||||||
)
|
)
|
||||||
|
trycount += 1
|
||||||
|
if retcode > 0 and trycount < 3:
|
||||||
|
sleep(trycount)
|
||||||
|
|
||||||
if retcode > 0:
|
if retcode > 0:
|
||||||
# Set the health delta to 10 (subtract 10 from the total of 100)
|
# Set the health delta to 10 (subtract 10 from the total of 100)
|
||||||
health_delta = 10
|
health_delta = 10
|
||||||
# Craft a message that can be used by the clients
|
# Craft a message that can be used by the clients
|
||||||
message = f"IPMI via {ipmi_username}@{ipmi_hostname} is NOT responding"
|
message = f"IPMI via {ipmi_username}@{ipmi_hostname} is NOT responding after 3 attempts"
|
||||||
else:
|
else:
|
||||||
# Set the health delta to 0 (no change)
|
# Set the health delta to 0 (no change)
|
||||||
health_delta = 0
|
health_delta = 0
|
||||||
# Craft a message that can be used by the clients
|
# Craft a message that can be used by the clients
|
||||||
message = f"IPMI via {ipmi_username}@{ipmi_hostname} is responding"
|
message = f"IPMI via {ipmi_username}@{ipmi_hostname} is responding after {trycount} attempts"
|
||||||
|
|
||||||
# Set the health delta in our local PluginResult object
|
# Set the health delta in our local PluginResult object
|
||||||
self.plugin_result.set_health_delta(health_delta)
|
self.plugin_result.set_health_delta(health_delta)
|
||||||
|
|
|
@ -33,7 +33,7 @@ import os
|
||||||
import signal
|
import signal
|
||||||
|
|
||||||
# Daemon version
|
# Daemon version
|
||||||
version = "0.9.89"
|
version = "0.9.103"
|
||||||
|
|
||||||
|
|
||||||
##########################################################
|
##########################################################
|
||||||
|
|
|
@ -34,7 +34,7 @@ warning=0.99
|
||||||
critical=1.99
|
critical=1.99
|
||||||
|
|
||||||
export PVC_CLIENT_DIR="/run/shm/munin-pvc"
|
export PVC_CLIENT_DIR="/run/shm/munin-pvc"
|
||||||
PVC_CMD="/usr/bin/pvc --quiet --cluster local status --format json-pretty"
|
PVC_CMD="/usr/bin/sudo -E /usr/bin/pvc --quiet cluster status --format json-pretty"
|
||||||
JQ_CMD="/usr/bin/jq"
|
JQ_CMD="/usr/bin/jq"
|
||||||
|
|
||||||
output_usage() {
|
output_usage() {
|
||||||
|
@ -126,7 +126,7 @@ output_values() {
|
||||||
is_maintenance="$( $JQ_CMD ".maintenance" <<<"${PVC_OUTPUT}" | tr -d '"' )"
|
is_maintenance="$( $JQ_CMD ".maintenance" <<<"${PVC_OUTPUT}" | tr -d '"' )"
|
||||||
|
|
||||||
cluster_health="$( $JQ_CMD ".cluster_health.health" <<<"${PVC_OUTPUT}" | tr -d '"' )"
|
cluster_health="$( $JQ_CMD ".cluster_health.health" <<<"${PVC_OUTPUT}" | tr -d '"' )"
|
||||||
cluster_health_messages="$( $JQ_CMD -r ".cluster_health.messages | @csv" <<<"${PVC_OUTPUT}" | tr -d '"' | sed 's/,/, /g' )"
|
cluster_health_messages="$( $JQ_CMD -r ".cluster_health.messages | map(.text) | join(\", \")" <<<"${PVC_OUTPUT}" )"
|
||||||
echo 'multigraph pvc_cluster_health'
|
echo 'multigraph pvc_cluster_health'
|
||||||
echo "pvc_cluster_health.value ${cluster_health}"
|
echo "pvc_cluster_health.value ${cluster_health}"
|
||||||
echo "pvc_cluster_health.extinfo ${cluster_health_messages}"
|
echo "pvc_cluster_health.extinfo ${cluster_health_messages}"
|
||||||
|
@ -142,7 +142,7 @@ output_values() {
|
||||||
echo "pvc_cluster_alert.value ${cluster_health_alert}"
|
echo "pvc_cluster_alert.value ${cluster_health_alert}"
|
||||||
|
|
||||||
node_health="$( $JQ_CMD ".node_health.${HOST}.health" <<<"${PVC_OUTPUT}" | tr -d '"' )"
|
node_health="$( $JQ_CMD ".node_health.${HOST}.health" <<<"${PVC_OUTPUT}" | tr -d '"' )"
|
||||||
node_health_messages="$( $JQ_CMD -r ".node_health.${HOST}.messages | @csv" <<<"${PVC_OUTPUT}" | tr -d '"' | sed 's/,/, /g' )"
|
node_health_messages="$( $JQ_CMD -r ".node_health.${HOST}.messages | join(\", \")" <<<"${PVC_OUTPUT}" )"
|
||||||
echo 'multigraph pvc_node_health'
|
echo 'multigraph pvc_node_health'
|
||||||
echo "pvc_node_health.value ${node_health}"
|
echo "pvc_node_health.value ${node_health}"
|
||||||
echo "pvc_node_health.extinfo ${node_health_messages}"
|
echo "pvc_node_health.extinfo ${node_health_messages}"
|
||||||
|
|
File diff suppressed because it is too large
Load Diff
|
@ -15,7 +15,7 @@
|
||||||
"type": "grafana",
|
"type": "grafana",
|
||||||
"id": "grafana",
|
"id": "grafana",
|
||||||
"name": "Grafana",
|
"name": "Grafana",
|
||||||
"version": "10.2.2"
|
"version": "11.1.4"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"type": "datasource",
|
"type": "datasource",
|
||||||
|
@ -112,6 +112,7 @@
|
||||||
"graphMode": "area",
|
"graphMode": "area",
|
||||||
"justifyMode": "auto",
|
"justifyMode": "auto",
|
||||||
"orientation": "auto",
|
"orientation": "auto",
|
||||||
|
"percentChangeColorMode": "standard",
|
||||||
"reduceOptions": {
|
"reduceOptions": {
|
||||||
"calcs": [
|
"calcs": [
|
||||||
"lastNotNull"
|
"lastNotNull"
|
||||||
|
@ -119,10 +120,11 @@
|
||||||
"fields": "/^pvc_cluster_id$/",
|
"fields": "/^pvc_cluster_id$/",
|
||||||
"values": false
|
"values": false
|
||||||
},
|
},
|
||||||
|
"showPercentChange": false,
|
||||||
"textMode": "auto",
|
"textMode": "auto",
|
||||||
"wideLayout": true
|
"wideLayout": true
|
||||||
},
|
},
|
||||||
"pluginVersion": "10.2.2",
|
"pluginVersion": "11.1.4",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"datasource": {
|
"datasource": {
|
||||||
|
@ -144,7 +146,6 @@
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"title": "Cluster",
|
"title": "Cluster",
|
||||||
"transformations": [],
|
|
||||||
"type": "stat"
|
"type": "stat"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -187,6 +188,7 @@
|
||||||
"graphMode": "area",
|
"graphMode": "area",
|
||||||
"justifyMode": "auto",
|
"justifyMode": "auto",
|
||||||
"orientation": "auto",
|
"orientation": "auto",
|
||||||
|
"percentChangeColorMode": "standard",
|
||||||
"reduceOptions": {
|
"reduceOptions": {
|
||||||
"calcs": [
|
"calcs": [
|
||||||
"lastNotNull"
|
"lastNotNull"
|
||||||
|
@ -194,10 +196,11 @@
|
||||||
"fields": "/^vm$/",
|
"fields": "/^vm$/",
|
||||||
"values": false
|
"values": false
|
||||||
},
|
},
|
||||||
|
"showPercentChange": false,
|
||||||
"textMode": "auto",
|
"textMode": "auto",
|
||||||
"wideLayout": true
|
"wideLayout": true
|
||||||
},
|
},
|
||||||
"pluginVersion": "10.2.2",
|
"pluginVersion": "11.1.4",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"datasource": {
|
"datasource": {
|
||||||
|
@ -219,7 +222,6 @@
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"title": "VM Name",
|
"title": "VM Name",
|
||||||
"transformations": [],
|
|
||||||
"type": "stat"
|
"type": "stat"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -301,6 +303,21 @@
|
||||||
"color": "dark-red",
|
"color": "dark-red",
|
||||||
"index": 8,
|
"index": 8,
|
||||||
"text": "fail"
|
"text": "fail"
|
||||||
|
},
|
||||||
|
"9": {
|
||||||
|
"color": "dark-blue",
|
||||||
|
"index": 9,
|
||||||
|
"text": "import"
|
||||||
|
},
|
||||||
|
"10": {
|
||||||
|
"color": "dark-blue",
|
||||||
|
"index": 10,
|
||||||
|
"text": "restore"
|
||||||
|
},
|
||||||
|
"99": {
|
||||||
|
"color": "dark-purple",
|
||||||
|
"index": 11,
|
||||||
|
"text": "mirror"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"type": "value"
|
"type": "value"
|
||||||
|
@ -323,6 +340,7 @@
|
||||||
"graphMode": "none",
|
"graphMode": "none",
|
||||||
"justifyMode": "auto",
|
"justifyMode": "auto",
|
||||||
"orientation": "auto",
|
"orientation": "auto",
|
||||||
|
"percentChangeColorMode": "standard",
|
||||||
"reduceOptions": {
|
"reduceOptions": {
|
||||||
"calcs": [
|
"calcs": [
|
||||||
"lastNotNull"
|
"lastNotNull"
|
||||||
|
@ -330,10 +348,11 @@
|
||||||
"fields": "/^Value$/",
|
"fields": "/^Value$/",
|
||||||
"values": false
|
"values": false
|
||||||
},
|
},
|
||||||
|
"showPercentChange": false,
|
||||||
"textMode": "auto",
|
"textMode": "auto",
|
||||||
"wideLayout": true
|
"wideLayout": true
|
||||||
},
|
},
|
||||||
"pluginVersion": "10.2.2",
|
"pluginVersion": "11.1.4",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"datasource": {
|
"datasource": {
|
||||||
|
@ -355,7 +374,6 @@
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"title": "State",
|
"title": "State",
|
||||||
"transformations": [],
|
|
||||||
"type": "stat"
|
"type": "stat"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -398,6 +416,7 @@
|
||||||
"graphMode": "area",
|
"graphMode": "area",
|
||||||
"justifyMode": "auto",
|
"justifyMode": "auto",
|
||||||
"orientation": "auto",
|
"orientation": "auto",
|
||||||
|
"percentChangeColorMode": "standard",
|
||||||
"reduceOptions": {
|
"reduceOptions": {
|
||||||
"calcs": [
|
"calcs": [
|
||||||
"lastNotNull"
|
"lastNotNull"
|
||||||
|
@ -405,10 +424,11 @@
|
||||||
"fields": "/^uuid$/",
|
"fields": "/^uuid$/",
|
||||||
"values": false
|
"values": false
|
||||||
},
|
},
|
||||||
|
"showPercentChange": false,
|
||||||
"textMode": "auto",
|
"textMode": "auto",
|
||||||
"wideLayout": true
|
"wideLayout": true
|
||||||
},
|
},
|
||||||
"pluginVersion": "10.2.2",
|
"pluginVersion": "11.1.4",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"datasource": {
|
"datasource": {
|
||||||
|
@ -430,7 +450,6 @@
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"title": "UUID",
|
"title": "UUID",
|
||||||
"transformations": [],
|
|
||||||
"type": "stat"
|
"type": "stat"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -473,6 +492,7 @@
|
||||||
"graphMode": "none",
|
"graphMode": "none",
|
||||||
"justifyMode": "auto",
|
"justifyMode": "auto",
|
||||||
"orientation": "auto",
|
"orientation": "auto",
|
||||||
|
"percentChangeColorMode": "standard",
|
||||||
"reduceOptions": {
|
"reduceOptions": {
|
||||||
"calcs": [
|
"calcs": [
|
||||||
"lastNotNull"
|
"lastNotNull"
|
||||||
|
@ -480,10 +500,11 @@
|
||||||
"fields": "/^node$/",
|
"fields": "/^node$/",
|
||||||
"values": false
|
"values": false
|
||||||
},
|
},
|
||||||
|
"showPercentChange": false,
|
||||||
"textMode": "auto",
|
"textMode": "auto",
|
||||||
"wideLayout": true
|
"wideLayout": true
|
||||||
},
|
},
|
||||||
"pluginVersion": "10.2.2",
|
"pluginVersion": "11.1.4",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"datasource": {
|
"datasource": {
|
||||||
|
@ -505,7 +526,6 @@
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"title": "Active Node",
|
"title": "Active Node",
|
||||||
"transformations": [],
|
|
||||||
"type": "stat"
|
"type": "stat"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -545,6 +565,7 @@
|
||||||
"graphMode": "none",
|
"graphMode": "none",
|
||||||
"justifyMode": "auto",
|
"justifyMode": "auto",
|
||||||
"orientation": "auto",
|
"orientation": "auto",
|
||||||
|
"percentChangeColorMode": "standard",
|
||||||
"reduceOptions": {
|
"reduceOptions": {
|
||||||
"calcs": [
|
"calcs": [
|
||||||
"lastNotNull"
|
"lastNotNull"
|
||||||
|
@ -552,10 +573,11 @@
|
||||||
"fields": "/^last_node$/",
|
"fields": "/^last_node$/",
|
||||||
"values": false
|
"values": false
|
||||||
},
|
},
|
||||||
|
"showPercentChange": false,
|
||||||
"textMode": "auto",
|
"textMode": "auto",
|
||||||
"wideLayout": true
|
"wideLayout": true
|
||||||
},
|
},
|
||||||
"pluginVersion": "10.2.2",
|
"pluginVersion": "11.1.4",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"datasource": {
|
"datasource": {
|
||||||
|
@ -577,7 +599,6 @@
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"title": "Migrated",
|
"title": "Migrated",
|
||||||
"transformations": [],
|
|
||||||
"type": "stat"
|
"type": "stat"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -646,6 +667,7 @@
|
||||||
"graphMode": "none",
|
"graphMode": "none",
|
||||||
"justifyMode": "auto",
|
"justifyMode": "auto",
|
||||||
"orientation": "auto",
|
"orientation": "auto",
|
||||||
|
"percentChangeColorMode": "standard",
|
||||||
"reduceOptions": {
|
"reduceOptions": {
|
||||||
"calcs": [
|
"calcs": [
|
||||||
"lastNotNull"
|
"lastNotNull"
|
||||||
|
@ -653,10 +675,11 @@
|
||||||
"fields": "/^Value$/",
|
"fields": "/^Value$/",
|
||||||
"values": false
|
"values": false
|
||||||
},
|
},
|
||||||
|
"showPercentChange": false,
|
||||||
"textMode": "auto",
|
"textMode": "auto",
|
||||||
"wideLayout": true
|
"wideLayout": true
|
||||||
},
|
},
|
||||||
"pluginVersion": "10.2.2",
|
"pluginVersion": "11.1.4",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"datasource": {
|
"datasource": {
|
||||||
|
@ -678,7 +701,6 @@
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"title": "Autostart",
|
"title": "Autostart",
|
||||||
"transformations": [],
|
|
||||||
"type": "stat"
|
"type": "stat"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -721,6 +743,7 @@
|
||||||
"graphMode": "none",
|
"graphMode": "none",
|
||||||
"justifyMode": "auto",
|
"justifyMode": "auto",
|
||||||
"orientation": "auto",
|
"orientation": "auto",
|
||||||
|
"percentChangeColorMode": "standard",
|
||||||
"reduceOptions": {
|
"reduceOptions": {
|
||||||
"calcs": [
|
"calcs": [
|
||||||
"lastNotNull"
|
"lastNotNull"
|
||||||
|
@ -728,10 +751,11 @@
|
||||||
"fields": "/^description$/",
|
"fields": "/^description$/",
|
||||||
"values": false
|
"values": false
|
||||||
},
|
},
|
||||||
|
"showPercentChange": false,
|
||||||
"textMode": "auto",
|
"textMode": "auto",
|
||||||
"wideLayout": true
|
"wideLayout": true
|
||||||
},
|
},
|
||||||
"pluginVersion": "10.2.2",
|
"pluginVersion": "11.1.4",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"datasource": {
|
"datasource": {
|
||||||
|
@ -753,7 +777,6 @@
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"title": "Description",
|
"title": "Description",
|
||||||
"transformations": [],
|
|
||||||
"type": "stat"
|
"type": "stat"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -796,6 +819,7 @@
|
||||||
"graphMode": "none",
|
"graphMode": "none",
|
||||||
"justifyMode": "auto",
|
"justifyMode": "auto",
|
||||||
"orientation": "auto",
|
"orientation": "auto",
|
||||||
|
"percentChangeColorMode": "standard",
|
||||||
"reduceOptions": {
|
"reduceOptions": {
|
||||||
"calcs": [
|
"calcs": [
|
||||||
"lastNotNull"
|
"lastNotNull"
|
||||||
|
@ -803,10 +827,11 @@
|
||||||
"fields": "/^Value$/",
|
"fields": "/^Value$/",
|
||||||
"values": false
|
"values": false
|
||||||
},
|
},
|
||||||
|
"showPercentChange": false,
|
||||||
"textMode": "auto",
|
"textMode": "auto",
|
||||||
"wideLayout": true
|
"wideLayout": true
|
||||||
},
|
},
|
||||||
"pluginVersion": "10.2.2",
|
"pluginVersion": "11.1.4",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"datasource": {
|
"datasource": {
|
||||||
|
@ -828,7 +853,6 @@
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"title": "vCPUs",
|
"title": "vCPUs",
|
||||||
"transformations": [],
|
|
||||||
"type": "stat"
|
"type": "stat"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -871,6 +895,7 @@
|
||||||
"graphMode": "none",
|
"graphMode": "none",
|
||||||
"justifyMode": "auto",
|
"justifyMode": "auto",
|
||||||
"orientation": "auto",
|
"orientation": "auto",
|
||||||
|
"percentChangeColorMode": "standard",
|
||||||
"reduceOptions": {
|
"reduceOptions": {
|
||||||
"calcs": [
|
"calcs": [
|
||||||
"lastNotNull"
|
"lastNotNull"
|
||||||
|
@ -878,10 +903,11 @@
|
||||||
"fields": "/^topology$/",
|
"fields": "/^topology$/",
|
||||||
"values": false
|
"values": false
|
||||||
},
|
},
|
||||||
|
"showPercentChange": false,
|
||||||
"textMode": "auto",
|
"textMode": "auto",
|
||||||
"wideLayout": true
|
"wideLayout": true
|
||||||
},
|
},
|
||||||
"pluginVersion": "10.2.2",
|
"pluginVersion": "11.1.4",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"datasource": {
|
"datasource": {
|
||||||
|
@ -903,7 +929,6 @@
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"title": "vCPU Topology",
|
"title": "vCPU Topology",
|
||||||
"transformations": [],
|
|
||||||
"type": "stat"
|
"type": "stat"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -947,6 +972,7 @@
|
||||||
"graphMode": "none",
|
"graphMode": "none",
|
||||||
"justifyMode": "auto",
|
"justifyMode": "auto",
|
||||||
"orientation": "auto",
|
"orientation": "auto",
|
||||||
|
"percentChangeColorMode": "standard",
|
||||||
"reduceOptions": {
|
"reduceOptions": {
|
||||||
"calcs": [
|
"calcs": [
|
||||||
"lastNotNull"
|
"lastNotNull"
|
||||||
|
@ -954,10 +980,11 @@
|
||||||
"fields": "/^Value$/",
|
"fields": "/^Value$/",
|
||||||
"values": false
|
"values": false
|
||||||
},
|
},
|
||||||
|
"showPercentChange": false,
|
||||||
"textMode": "auto",
|
"textMode": "auto",
|
||||||
"wideLayout": true
|
"wideLayout": true
|
||||||
},
|
},
|
||||||
"pluginVersion": "10.2.2",
|
"pluginVersion": "11.1.4",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"datasource": {
|
"datasource": {
|
||||||
|
@ -979,7 +1006,6 @@
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"title": "vRAM",
|
"title": "vRAM",
|
||||||
"transformations": [],
|
|
||||||
"type": "stat"
|
"type": "stat"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -1022,6 +1048,7 @@
|
||||||
"graphMode": "none",
|
"graphMode": "none",
|
||||||
"justifyMode": "auto",
|
"justifyMode": "auto",
|
||||||
"orientation": "auto",
|
"orientation": "auto",
|
||||||
|
"percentChangeColorMode": "standard",
|
||||||
"reduceOptions": {
|
"reduceOptions": {
|
||||||
"calcs": [
|
"calcs": [
|
||||||
"lastNotNull"
|
"lastNotNull"
|
||||||
|
@ -1029,10 +1056,11 @@
|
||||||
"fields": "/^node_limit$/",
|
"fields": "/^node_limit$/",
|
||||||
"values": false
|
"values": false
|
||||||
},
|
},
|
||||||
|
"showPercentChange": false,
|
||||||
"textMode": "auto",
|
"textMode": "auto",
|
||||||
"wideLayout": true
|
"wideLayout": true
|
||||||
},
|
},
|
||||||
"pluginVersion": "10.2.2",
|
"pluginVersion": "11.1.4",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"datasource": {
|
"datasource": {
|
||||||
|
@ -1054,7 +1082,6 @@
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"title": "Node Limits",
|
"title": "Node Limits",
|
||||||
"transformations": [],
|
|
||||||
"type": "stat"
|
"type": "stat"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -1097,6 +1124,7 @@
|
||||||
"graphMode": "none",
|
"graphMode": "none",
|
||||||
"justifyMode": "auto",
|
"justifyMode": "auto",
|
||||||
"orientation": "auto",
|
"orientation": "auto",
|
||||||
|
"percentChangeColorMode": "standard",
|
||||||
"reduceOptions": {
|
"reduceOptions": {
|
||||||
"calcs": [
|
"calcs": [
|
||||||
"lastNotNull"
|
"lastNotNull"
|
||||||
|
@ -1104,10 +1132,11 @@
|
||||||
"fields": "failed_reason",
|
"fields": "failed_reason",
|
||||||
"values": false
|
"values": false
|
||||||
},
|
},
|
||||||
|
"showPercentChange": false,
|
||||||
"textMode": "auto",
|
"textMode": "auto",
|
||||||
"wideLayout": true
|
"wideLayout": true
|
||||||
},
|
},
|
||||||
"pluginVersion": "10.2.2",
|
"pluginVersion": "11.1.4",
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"datasource": {
|
"datasource": {
|
||||||
|
@ -1129,11 +1158,10 @@
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"title": "Failure Reason",
|
"title": "Failure Reason",
|
||||||
"transformations": [],
|
|
||||||
"type": "stat"
|
"type": "stat"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"collapsed": true,
|
"collapsed": false,
|
||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 1,
|
"h": 1,
|
||||||
"w": 24,
|
"w": 24,
|
||||||
|
@ -1141,7 +1169,10 @@
|
||||||
"y": 10
|
"y": 10
|
||||||
},
|
},
|
||||||
"id": 14,
|
"id": 14,
|
||||||
"panels": [
|
"panels": [],
|
||||||
|
"title": "CPU & Memory Stats",
|
||||||
|
"type": "row"
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"datasource": {
|
"datasource": {
|
||||||
"type": "prometheus",
|
"type": "prometheus",
|
||||||
|
@ -1664,21 +1695,20 @@
|
||||||
],
|
],
|
||||||
"title": "Swap Utilization (+ in/- out)",
|
"title": "Swap Utilization (+ in/- out)",
|
||||||
"type": "timeseries"
|
"type": "timeseries"
|
||||||
}
|
|
||||||
],
|
|
||||||
"title": "CPU & Memory Stats",
|
|
||||||
"type": "row"
|
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"collapsed": true,
|
"collapsed": false,
|
||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 1,
|
"h": 1,
|
||||||
"w": 24,
|
"w": 24,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 11
|
"y": 27
|
||||||
},
|
},
|
||||||
"id": 19,
|
"id": 19,
|
||||||
"panels": [
|
"panels": [],
|
||||||
|
"title": "NIC Stats",
|
||||||
|
"type": "row"
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"datasource": {
|
"datasource": {
|
||||||
"type": "prometheus",
|
"type": "prometheus",
|
||||||
|
@ -1727,8 +1757,7 @@
|
||||||
"mode": "absolute",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green"
|
||||||
"value": null
|
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
|
@ -1757,7 +1786,7 @@
|
||||||
"h": 10,
|
"h": 10,
|
||||||
"w": 24,
|
"w": 24,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 12
|
"y": 28
|
||||||
},
|
},
|
||||||
"id": 20,
|
"id": 20,
|
||||||
"options": {
|
"options": {
|
||||||
|
@ -1864,8 +1893,7 @@
|
||||||
"mode": "absolute",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green"
|
||||||
"value": null
|
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
|
@ -1894,7 +1922,7 @@
|
||||||
"h": 10,
|
"h": 10,
|
||||||
"w": 24,
|
"w": 24,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 22
|
"y": 38
|
||||||
},
|
},
|
||||||
"id": 21,
|
"id": 21,
|
||||||
"options": {
|
"options": {
|
||||||
|
@ -2001,8 +2029,7 @@
|
||||||
"mode": "absolute",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green"
|
||||||
"value": null
|
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
@ -2027,7 +2054,7 @@
|
||||||
"h": 8,
|
"h": 8,
|
||||||
"w": 12,
|
"w": 12,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 32
|
"y": 48
|
||||||
},
|
},
|
||||||
"id": 22,
|
"id": 22,
|
||||||
"options": {
|
"options": {
|
||||||
|
@ -2134,8 +2161,7 @@
|
||||||
"mode": "absolute",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green"
|
||||||
"value": null
|
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
@ -2160,7 +2186,7 @@
|
||||||
"h": 8,
|
"h": 8,
|
||||||
"w": 12,
|
"w": 12,
|
||||||
"x": 12,
|
"x": 12,
|
||||||
"y": 32
|
"y": 48
|
||||||
},
|
},
|
||||||
"id": 23,
|
"id": 23,
|
||||||
"options": {
|
"options": {
|
||||||
|
@ -2218,21 +2244,20 @@
|
||||||
],
|
],
|
||||||
"title": "Errors (+ RX/- TX)",
|
"title": "Errors (+ RX/- TX)",
|
||||||
"type": "timeseries"
|
"type": "timeseries"
|
||||||
}
|
|
||||||
],
|
|
||||||
"title": "NIC Stats",
|
|
||||||
"type": "row"
|
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"collapsed": true,
|
"collapsed": false,
|
||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 1,
|
"h": 1,
|
||||||
"w": 24,
|
"w": 24,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 12
|
"y": 56
|
||||||
},
|
},
|
||||||
"id": 24,
|
"id": 24,
|
||||||
"panels": [
|
"panels": [],
|
||||||
|
"title": "Disk Stats",
|
||||||
|
"type": "row"
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"datasource": {
|
"datasource": {
|
||||||
"type": "prometheus",
|
"type": "prometheus",
|
||||||
|
@ -2281,8 +2306,7 @@
|
||||||
"mode": "absolute",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green"
|
||||||
"value": null
|
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
|
@ -2311,7 +2335,7 @@
|
||||||
"h": 9,
|
"h": 9,
|
||||||
"w": 24,
|
"w": 24,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 13
|
"y": 57
|
||||||
},
|
},
|
||||||
"id": 25,
|
"id": 25,
|
||||||
"options": {
|
"options": {
|
||||||
|
@ -2368,7 +2392,6 @@
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"title": "IOPS (+ Read/- Write)",
|
"title": "IOPS (+ Read/- Write)",
|
||||||
"transformations": [],
|
|
||||||
"type": "timeseries"
|
"type": "timeseries"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -2419,8 +2442,7 @@
|
||||||
"mode": "absolute",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green"
|
||||||
"value": null
|
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
|
@ -2449,7 +2471,7 @@
|
||||||
"h": 9,
|
"h": 9,
|
||||||
"w": 24,
|
"w": 24,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 22
|
"y": 66
|
||||||
},
|
},
|
||||||
"id": 26,
|
"id": 26,
|
||||||
"options": {
|
"options": {
|
||||||
|
@ -2509,12 +2531,8 @@
|
||||||
"type": "timeseries"
|
"type": "timeseries"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"title": "Disk Stats",
|
|
||||||
"type": "row"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"refresh": "5s",
|
"refresh": "5s",
|
||||||
"schemaVersion": 38,
|
"schemaVersion": 39,
|
||||||
"tags": [
|
"tags": [
|
||||||
"pvc"
|
"pvc"
|
||||||
],
|
],
|
||||||
|
|
|
@ -19,6 +19,11 @@
|
||||||
#
|
#
|
||||||
###############################################################################
|
###############################################################################
|
||||||
|
|
||||||
|
from sys import argv
|
||||||
import pvcnoded.Daemon # noqa: F401
|
import pvcnoded.Daemon # noqa: F401
|
||||||
|
|
||||||
|
if "--version" in argv:
|
||||||
|
print(pvcnoded.Daemon.version)
|
||||||
|
exit(0)
|
||||||
|
|
||||||
pvcnoded.Daemon.entrypoint()
|
pvcnoded.Daemon.entrypoint()
|
||||||
|
|
|
@ -49,7 +49,7 @@ import re
|
||||||
import json
|
import json
|
||||||
|
|
||||||
# Daemon version
|
# Daemon version
|
||||||
version = "0.9.89"
|
version = "0.9.103"
|
||||||
|
|
||||||
|
|
||||||
##########################################################
|
##########################################################
|
||||||
|
@ -197,6 +197,8 @@ def entrypoint():
|
||||||
os.execv(sys.argv[0], sys.argv)
|
os.execv(sys.argv[0], sys.argv)
|
||||||
|
|
||||||
# Validate the schema
|
# Validate the schema
|
||||||
|
with zkhandler.writelock("base.schema.version"):
|
||||||
|
sleep(0.5)
|
||||||
pvcnoded.util.zookeeper.validate_schema(logger, zkhandler)
|
pvcnoded.util.zookeeper.validate_schema(logger, zkhandler)
|
||||||
|
|
||||||
# Define a cleanup function
|
# Define a cleanup function
|
||||||
|
|
|
@ -231,7 +231,7 @@ class NetstatsInstance(object):
|
||||||
# Get a list of all active interfaces
|
# Get a list of all active interfaces
|
||||||
net_root_path = "/sys/class/net"
|
net_root_path = "/sys/class/net"
|
||||||
all_ifaces = list()
|
all_ifaces = list()
|
||||||
for (_, dirnames, _) in walk(net_root_path):
|
for _, dirnames, _ in walk(net_root_path):
|
||||||
all_ifaces.extend(dirnames)
|
all_ifaces.extend(dirnames)
|
||||||
all_ifaces.sort()
|
all_ifaces.sort()
|
||||||
|
|
||||||
|
|
|
@ -438,8 +438,11 @@ class NodeInstance(object):
|
||||||
# Synchronize nodes B (I am reader)
|
# Synchronize nodes B (I am reader)
|
||||||
lock = self.zkhandler.readlock("base.config.primary_node.sync_lock")
|
lock = self.zkhandler.readlock("base.config.primary_node.sync_lock")
|
||||||
self.logger.out("Acquiring read lock for synchronization phase B", state="i")
|
self.logger.out("Acquiring read lock for synchronization phase B", state="i")
|
||||||
lock.acquire()
|
try:
|
||||||
self.logger.out("Acquired read lock for synchronization phase B", state="o")
|
lock.acquire(timeout=5) # Don't wait forever and completely block us
|
||||||
|
self.logger.out("Acquired read lock for synchronization phase G", state="o")
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
self.logger.out("Releasing read lock for synchronization phase B", state="i")
|
self.logger.out("Releasing read lock for synchronization phase B", state="i")
|
||||||
lock.release()
|
lock.release()
|
||||||
self.logger.out("Released read lock for synchronization phase B", state="o")
|
self.logger.out("Released read lock for synchronization phase B", state="o")
|
||||||
|
@ -521,7 +524,7 @@ class NodeInstance(object):
|
||||||
self.logger.out("Acquired write lock for synchronization phase F", state="o")
|
self.logger.out("Acquired write lock for synchronization phase F", state="o")
|
||||||
time.sleep(0.2) # Time fir reader to acquire the lock
|
time.sleep(0.2) # Time fir reader to acquire the lock
|
||||||
# 4. Add gateway IPs
|
# 4. Add gateway IPs
|
||||||
for network in self.d_network:
|
for network in self.d_network.copy():
|
||||||
self.d_network[network].createGateways()
|
self.d_network[network].createGateways()
|
||||||
self.logger.out("Releasing write lock for synchronization phase F", state="i")
|
self.logger.out("Releasing write lock for synchronization phase F", state="i")
|
||||||
self.zkhandler.write([("base.config.primary_node.sync_lock", "")])
|
self.zkhandler.write([("base.config.primary_node.sync_lock", "")])
|
||||||
|
@ -648,8 +651,11 @@ class NodeInstance(object):
|
||||||
# Synchronize nodes A (I am reader)
|
# Synchronize nodes A (I am reader)
|
||||||
lock = self.zkhandler.readlock("base.config.primary_node.sync_lock")
|
lock = self.zkhandler.readlock("base.config.primary_node.sync_lock")
|
||||||
self.logger.out("Acquiring read lock for synchronization phase A", state="i")
|
self.logger.out("Acquiring read lock for synchronization phase A", state="i")
|
||||||
lock.acquire()
|
try:
|
||||||
self.logger.out("Acquired read lock for synchronization phase A", state="o")
|
lock.acquire(timeout=5) # Don't wait forever and completely block us
|
||||||
|
self.logger.out("Acquired read lock for synchronization phase G", state="o")
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
self.logger.out("Releasing read lock for synchronization phase A", state="i")
|
self.logger.out("Releasing read lock for synchronization phase A", state="i")
|
||||||
lock.release()
|
lock.release()
|
||||||
self.logger.out("Released read lock for synchronization phase A", state="o")
|
self.logger.out("Released read lock for synchronization phase A", state="o")
|
||||||
|
@ -682,8 +688,11 @@ class NodeInstance(object):
|
||||||
# Synchronize nodes C (I am reader)
|
# Synchronize nodes C (I am reader)
|
||||||
lock = self.zkhandler.readlock("base.config.primary_node.sync_lock")
|
lock = self.zkhandler.readlock("base.config.primary_node.sync_lock")
|
||||||
self.logger.out("Acquiring read lock for synchronization phase C", state="i")
|
self.logger.out("Acquiring read lock for synchronization phase C", state="i")
|
||||||
lock.acquire()
|
try:
|
||||||
self.logger.out("Acquired read lock for synchronization phase C", state="o")
|
lock.acquire(timeout=5) # Don't wait forever and completely block us
|
||||||
|
self.logger.out("Acquired read lock for synchronization phase G", state="o")
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
# 5. Remove Upstream floating IP
|
# 5. Remove Upstream floating IP
|
||||||
self.logger.out(
|
self.logger.out(
|
||||||
"Removing floating upstream IP {}/{} from interface {}".format(
|
"Removing floating upstream IP {}/{} from interface {}".format(
|
||||||
|
@ -701,8 +710,11 @@ class NodeInstance(object):
|
||||||
# Synchronize nodes D (I am reader)
|
# Synchronize nodes D (I am reader)
|
||||||
lock = self.zkhandler.readlock("base.config.primary_node.sync_lock")
|
lock = self.zkhandler.readlock("base.config.primary_node.sync_lock")
|
||||||
self.logger.out("Acquiring read lock for synchronization phase D", state="i")
|
self.logger.out("Acquiring read lock for synchronization phase D", state="i")
|
||||||
lock.acquire()
|
try:
|
||||||
self.logger.out("Acquired read lock for synchronization phase D", state="o")
|
lock.acquire(timeout=5) # Don't wait forever and completely block us
|
||||||
|
self.logger.out("Acquired read lock for synchronization phase G", state="o")
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
# 6. Remove Cluster & Storage floating IP
|
# 6. Remove Cluster & Storage floating IP
|
||||||
self.logger.out(
|
self.logger.out(
|
||||||
"Removing floating management IP {}/{} from interface {}".format(
|
"Removing floating management IP {}/{} from interface {}".format(
|
||||||
|
@ -729,8 +741,11 @@ class NodeInstance(object):
|
||||||
# Synchronize nodes E (I am reader)
|
# Synchronize nodes E (I am reader)
|
||||||
lock = self.zkhandler.readlock("base.config.primary_node.sync_lock")
|
lock = self.zkhandler.readlock("base.config.primary_node.sync_lock")
|
||||||
self.logger.out("Acquiring read lock for synchronization phase E", state="i")
|
self.logger.out("Acquiring read lock for synchronization phase E", state="i")
|
||||||
lock.acquire()
|
try:
|
||||||
self.logger.out("Acquired read lock for synchronization phase E", state="o")
|
lock.acquire(timeout=5) # Don't wait forever and completely block us
|
||||||
|
self.logger.out("Acquired read lock for synchronization phase G", state="o")
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
# 7. Remove Metadata link-local IP
|
# 7. Remove Metadata link-local IP
|
||||||
self.logger.out(
|
self.logger.out(
|
||||||
"Removing Metadata link-local IP {}/{} from interface {}".format(
|
"Removing Metadata link-local IP {}/{} from interface {}".format(
|
||||||
|
@ -746,8 +761,11 @@ class NodeInstance(object):
|
||||||
# Synchronize nodes F (I am reader)
|
# Synchronize nodes F (I am reader)
|
||||||
lock = self.zkhandler.readlock("base.config.primary_node.sync_lock")
|
lock = self.zkhandler.readlock("base.config.primary_node.sync_lock")
|
||||||
self.logger.out("Acquiring read lock for synchronization phase F", state="i")
|
self.logger.out("Acquiring read lock for synchronization phase F", state="i")
|
||||||
lock.acquire()
|
try:
|
||||||
self.logger.out("Acquired read lock for synchronization phase F", state="o")
|
lock.acquire(timeout=5) # Don't wait forever and completely block us
|
||||||
|
self.logger.out("Acquired read lock for synchronization phase G", state="o")
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
# 8. Remove gateway IPs
|
# 8. Remove gateway IPs
|
||||||
for network in self.d_network:
|
for network in self.d_network:
|
||||||
self.d_network[network].removeGateways()
|
self.d_network[network].removeGateways()
|
||||||
|
@ -759,7 +777,7 @@ class NodeInstance(object):
|
||||||
lock = self.zkhandler.readlock("base.config.primary_node.sync_lock")
|
lock = self.zkhandler.readlock("base.config.primary_node.sync_lock")
|
||||||
self.logger.out("Acquiring read lock for synchronization phase G", state="i")
|
self.logger.out("Acquiring read lock for synchronization phase G", state="i")
|
||||||
try:
|
try:
|
||||||
lock.acquire(timeout=60) # Don't wait forever and completely block us
|
lock.acquire(timeout=5) # Don't wait forever and completely block us
|
||||||
self.logger.out("Acquired read lock for synchronization phase G", state="o")
|
self.logger.out("Acquired read lock for synchronization phase G", state="o")
|
||||||
except Exception:
|
except Exception:
|
||||||
pass
|
pass
|
||||||
|
|
|
@ -687,6 +687,29 @@ class VMInstance(object):
|
||||||
abort_migrate("Target node changed during preparation")
|
abort_migrate("Target node changed during preparation")
|
||||||
return
|
return
|
||||||
if not force_shutdown:
|
if not force_shutdown:
|
||||||
|
# Set the maxdowntime value from Zookeeper
|
||||||
|
try:
|
||||||
|
max_downtime = self.zkhandler.read(
|
||||||
|
("domain.meta.migrate_max_downtime", self.domuuid)
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.out(
|
||||||
|
f"Error fetching migrate max downtime; using default of 300s: {e}",
|
||||||
|
state="w",
|
||||||
|
)
|
||||||
|
self.max_downtime = 300
|
||||||
|
self.logger.out(
|
||||||
|
f"Running migrate-setmaxdowntime with downtime value {max_downtime}",
|
||||||
|
state="i",
|
||||||
|
prefix="Domain {}".format(self.domuuid),
|
||||||
|
)
|
||||||
|
retcode, stdout, stderr = common.run_os_command(
|
||||||
|
f"virsh migrate-setmaxdowntime --downtime {max_downtime} {self.domuuid}"
|
||||||
|
)
|
||||||
|
if retcode:
|
||||||
|
abort_migrate("Failed to set maxdowntime value on running VM")
|
||||||
|
return
|
||||||
|
|
||||||
# A live migrate is attemped 3 times in succession
|
# A live migrate is attemped 3 times in succession
|
||||||
ticks = 0
|
ticks = 0
|
||||||
while True:
|
while True:
|
||||||
|
|
|
@ -21,15 +21,72 @@
|
||||||
|
|
||||||
import time
|
import time
|
||||||
|
|
||||||
|
from kazoo.exceptions import LockTimeout
|
||||||
|
|
||||||
import daemon_lib.common as common
|
import daemon_lib.common as common
|
||||||
|
|
||||||
from daemon_lib.vm import vm_worker_flush_locks
|
from daemon_lib.vm import vm_worker_flush_locks
|
||||||
|
|
||||||
|
|
||||||
#
|
#
|
||||||
# Fence thread entry function
|
# Fence monitor thread entrypoint
|
||||||
#
|
#
|
||||||
def fence_node(node_name, zkhandler, config, logger):
|
def fence_monitor(zkhandler, config, logger):
|
||||||
|
# Attempt to acquire an exclusive lock on the fence_lock key
|
||||||
|
# If it is already held, we'll abort since another node is processing fences
|
||||||
|
lock = zkhandler.exclusivelock("base.config.fence_lock")
|
||||||
|
|
||||||
|
try:
|
||||||
|
lock.acquire(timeout=config["keepalive_interval"] - 1)
|
||||||
|
|
||||||
|
for node_name in zkhandler.children("base.node"):
|
||||||
|
try:
|
||||||
|
node_daemon_state = zkhandler.read(("node.state.daemon", node_name))
|
||||||
|
node_keepalive = int(zkhandler.read(("node.keepalive", node_name)))
|
||||||
|
except Exception:
|
||||||
|
node_daemon_state = "unknown"
|
||||||
|
node_keepalive = 0
|
||||||
|
|
||||||
|
node_deadtime = int(time.time()) - (
|
||||||
|
int(config["keepalive_interval"]) * int(config["fence_intervals"])
|
||||||
|
)
|
||||||
|
if node_keepalive < node_deadtime and node_daemon_state == "run":
|
||||||
|
logger.out(
|
||||||
|
f"Node {node_name} seems dead; starting monitor for fencing",
|
||||||
|
state="w",
|
||||||
|
)
|
||||||
|
zk_lock = zkhandler.writelock(("node.state.daemon", node_name))
|
||||||
|
with zk_lock:
|
||||||
|
# Ensures that, if we lost the lock race and come out of waiting,
|
||||||
|
# we won't try to trigger our own fence thread.
|
||||||
|
if zkhandler.read(("node.state.daemon", node_name)) != "dead":
|
||||||
|
# Write the updated data after we start the fence thread
|
||||||
|
zkhandler.write([(("node.state.daemon", node_name), "dead")])
|
||||||
|
# Start the fence monitoring task for this node
|
||||||
|
# NOTE: This is not a subthread and is designed to block this for loop
|
||||||
|
# This ensures that only one node is ever being fenced at a time
|
||||||
|
fence_node(zkhandler, config, logger, node_name)
|
||||||
|
else:
|
||||||
|
logger.out(
|
||||||
|
f"Node {node_name} is OK; last checkin is {node_deadtime - node_keepalive}s from threshold, node state is '{node_daemon_state}'",
|
||||||
|
state="d",
|
||||||
|
prefix="fence-thread",
|
||||||
|
)
|
||||||
|
except LockTimeout:
|
||||||
|
logger.out(
|
||||||
|
"Fence monitor thread failed to acquire exclusive lock; skipping", state="i"
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
logger.out(f"Fence monitor thread failed: {e}", state="w")
|
||||||
|
finally:
|
||||||
|
# We're finished, so release the global lock
|
||||||
|
lock.release()
|
||||||
|
|
||||||
|
|
||||||
|
#
|
||||||
|
# Fence action function
|
||||||
|
#
|
||||||
|
def fence_node(zkhandler, config, logger, node_name):
|
||||||
# We allow exactly 6 saving throws (30 seconds) for the host to come back online or we kill it
|
# We allow exactly 6 saving throws (30 seconds) for the host to come back online or we kill it
|
||||||
failcount_limit = 6
|
failcount_limit = 6
|
||||||
failcount = 0
|
failcount = 0
|
||||||
|
@ -190,7 +247,7 @@ def migrateFromFencedNode(zkhandler, node_name, config, logger):
|
||||||
)
|
)
|
||||||
zkhandler.write(
|
zkhandler.write(
|
||||||
{
|
{
|
||||||
(("domain.state", dom_uuid), "stopped"),
|
(("domain.state", dom_uuid), "stop"),
|
||||||
(("domain.meta.autostart", dom_uuid), "True"),
|
(("domain.meta.autostart", dom_uuid), "True"),
|
||||||
}
|
}
|
||||||
)
|
)
|
||||||
|
@ -202,6 +259,9 @@ def migrateFromFencedNode(zkhandler, node_name, config, logger):
|
||||||
|
|
||||||
# Loop through the VMs
|
# Loop through the VMs
|
||||||
for dom_uuid in dead_node_running_domains:
|
for dom_uuid in dead_node_running_domains:
|
||||||
|
if dom_uuid in ["0", 0]:
|
||||||
|
# Skip the invalid "0" UUID we sometimes get
|
||||||
|
continue
|
||||||
try:
|
try:
|
||||||
fence_migrate_vm(dom_uuid)
|
fence_migrate_vm(dom_uuid)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
|
@ -253,12 +313,16 @@ def reboot_via_ipmi(node_name, ipmi_hostname, ipmi_user, ipmi_password, logger):
|
||||||
state="i",
|
state="i",
|
||||||
prefix=f"fencing {node_name}",
|
prefix=f"fencing {node_name}",
|
||||||
)
|
)
|
||||||
ipmi_status_retcode, ipmi_status_stdout, ipmi_status_stderr = common.run_os_command(
|
(
|
||||||
|
ipmi_intermediate_status_retcode,
|
||||||
|
ipmi_intermediate_status_stdout,
|
||||||
|
ipmi_intermediate_status_stderr,
|
||||||
|
) = common.run_os_command(
|
||||||
f"/usr/bin/ipmitool -I lanplus -H {ipmi_hostname} -U {ipmi_user} -P {ipmi_password} chassis power status"
|
f"/usr/bin/ipmitool -I lanplus -H {ipmi_hostname} -U {ipmi_user} -P {ipmi_password} chassis power status"
|
||||||
)
|
)
|
||||||
if ipmi_status_retcode == 0:
|
if ipmi_intermediate_status_retcode == 0:
|
||||||
logger.out(
|
logger.out(
|
||||||
f"Current chassis power state is: {ipmi_status_stdout.strip()}",
|
f"Current chassis power state is: {ipmi_intermediate_status_stdout.strip()}",
|
||||||
state="i",
|
state="i",
|
||||||
prefix=f"fencing {node_name}",
|
prefix=f"fencing {node_name}",
|
||||||
)
|
)
|
||||||
|
@ -299,12 +363,14 @@ def reboot_via_ipmi(node_name, ipmi_hostname, ipmi_user, ipmi_password, logger):
|
||||||
state="i",
|
state="i",
|
||||||
prefix=f"fencing {node_name}",
|
prefix=f"fencing {node_name}",
|
||||||
)
|
)
|
||||||
ipmi_status_retcode, ipmi_status_stdout, ipmi_status_stderr = common.run_os_command(
|
ipmi_final_status_retcode, ipmi_final_status_stdout, ipmi_final_status_stderr = (
|
||||||
|
common.run_os_command(
|
||||||
f"/usr/bin/ipmitool -I lanplus -H {ipmi_hostname} -U {ipmi_user} -P {ipmi_password} chassis power status"
|
f"/usr/bin/ipmitool -I lanplus -H {ipmi_hostname} -U {ipmi_user} -P {ipmi_password} chassis power status"
|
||||||
)
|
)
|
||||||
|
)
|
||||||
|
|
||||||
if ipmi_stop_retcode == 0:
|
if ipmi_intermediate_status_stdout.strip() == "Chassis power is off":
|
||||||
if ipmi_status_stdout.strip() == "Chassis Power is on":
|
if ipmi_final_status_stdout.strip() == "Chassis Power is on":
|
||||||
# We successfully rebooted the node and it is powered on; this is a succeessful fence
|
# We successfully rebooted the node and it is powered on; this is a succeessful fence
|
||||||
logger.out(
|
logger.out(
|
||||||
"Successfully rebooted dead node; proceeding with fence recovery action",
|
"Successfully rebooted dead node; proceeding with fence recovery action",
|
||||||
|
@ -312,7 +378,7 @@ def reboot_via_ipmi(node_name, ipmi_hostname, ipmi_user, ipmi_password, logger):
|
||||||
prefix=f"fencing {node_name}",
|
prefix=f"fencing {node_name}",
|
||||||
)
|
)
|
||||||
return True
|
return True
|
||||||
elif ipmi_status_stdout.strip() == "Chassis Power is off":
|
elif ipmi_final_status_stdout.strip() == "Chassis Power is off":
|
||||||
# We successfully rebooted the node but it is powered off; this might be expected or not, but the node is confirmed off so we can call it a successful fence
|
# We successfully rebooted the node but it is powered off; this might be expected or not, but the node is confirmed off so we can call it a successful fence
|
||||||
logger.out(
|
logger.out(
|
||||||
"Chassis power is in confirmed off state after successfuly IPMI reboot; proceeding with fence recovery action",
|
"Chassis power is in confirmed off state after successfuly IPMI reboot; proceeding with fence recovery action",
|
||||||
|
@ -323,13 +389,13 @@ def reboot_via_ipmi(node_name, ipmi_hostname, ipmi_user, ipmi_password, logger):
|
||||||
else:
|
else:
|
||||||
# We successfully rebooted the node but it is in some unknown power state; since this might indicate a silent failure, we must call it a failed fence
|
# We successfully rebooted the node but it is in some unknown power state; since this might indicate a silent failure, we must call it a failed fence
|
||||||
logger.out(
|
logger.out(
|
||||||
f"Chassis power is in an unknown state ({ipmi_status_stdout.strip()}) after successful IPMI reboot; NOT proceeding fence recovery action",
|
f"Chassis power is in an unknown state ({ipmi_final_status_stdout.strip()}) after successful IPMI reboot; NOT proceeding fence recovery action",
|
||||||
state="e",
|
state="e",
|
||||||
prefix=f"fencing {node_name}",
|
prefix=f"fencing {node_name}",
|
||||||
)
|
)
|
||||||
return False
|
return False
|
||||||
else:
|
else:
|
||||||
if ipmi_status_stdout.strip() == "Chassis Power is off":
|
if ipmi_final_status_stdout.strip() == "Chassis Power is off":
|
||||||
# We failed to reboot the node but it is powered off; it has probably suffered a serious hardware failure, but the node is confirmed off so we can call it a successful fence
|
# We failed to reboot the node but it is powered off; it has probably suffered a serious hardware failure, but the node is confirmed off so we can call it a successful fence
|
||||||
logger.out(
|
logger.out(
|
||||||
"Chassis power is in confirmed off state after failed IPMI reboot; proceeding with fence recovery action",
|
"Chassis power is in confirmed off state after failed IPMI reboot; proceeding with fence recovery action",
|
||||||
|
|
|
@ -157,7 +157,9 @@ def collect_ceph_stats(logger, config, zkhandler, this_node, queue):
|
||||||
1
|
1
|
||||||
].decode("ascii")
|
].decode("ascii")
|
||||||
try:
|
try:
|
||||||
ceph_pool_df_raw = json.loads(ceph_df_output)["pools"]
|
ceph_pool_df_raw = sorted(
|
||||||
|
json.loads(ceph_df_output)["pools"], key=lambda x: x["name"]
|
||||||
|
)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.out("Failed to obtain Pool data (ceph df): {}".format(e), state="w")
|
logger.out("Failed to obtain Pool data (ceph df): {}".format(e), state="w")
|
||||||
ceph_pool_df_raw = []
|
ceph_pool_df_raw = []
|
||||||
|
@ -166,7 +168,9 @@ def collect_ceph_stats(logger, config, zkhandler, this_node, queue):
|
||||||
"rados df --format json", timeout=1
|
"rados df --format json", timeout=1
|
||||||
)
|
)
|
||||||
try:
|
try:
|
||||||
rados_pool_df_raw = json.loads(stdout)["pools"]
|
rados_pool_df_raw = sorted(
|
||||||
|
json.loads(stdout)["pools"], key=lambda x: x["name"]
|
||||||
|
)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.out("Failed to obtain Pool data (rados df): {}".format(e), state="w")
|
logger.out("Failed to obtain Pool data (rados df): {}".format(e), state="w")
|
||||||
rados_pool_df_raw = []
|
rados_pool_df_raw = []
|
||||||
|
@ -743,7 +747,7 @@ def node_keepalive(logger, config, zkhandler, this_node, netstats):
|
||||||
# Get node performance statistics
|
# Get node performance statistics
|
||||||
this_node.memtotal = int(psutil.virtual_memory().total / 1024 / 1024)
|
this_node.memtotal = int(psutil.virtual_memory().total / 1024 / 1024)
|
||||||
this_node.memused = int(psutil.virtual_memory().used / 1024 / 1024)
|
this_node.memused = int(psutil.virtual_memory().used / 1024 / 1024)
|
||||||
this_node.memfree = int(psutil.virtual_memory().free / 1024 / 1024)
|
this_node.memfree = int(psutil.virtual_memory().available / 1024 / 1024)
|
||||||
this_node.cpuload = round(os.getloadavg()[0], 2)
|
this_node.cpuload = round(os.getloadavg()[0], 2)
|
||||||
|
|
||||||
# Get node network statistics via netstats instance
|
# Get node network statistics via netstats instance
|
||||||
|
@ -752,29 +756,21 @@ def node_keepalive(logger, config, zkhandler, this_node, netstats):
|
||||||
|
|
||||||
# Join against running threads
|
# Join against running threads
|
||||||
if config["enable_hypervisor"]:
|
if config["enable_hypervisor"]:
|
||||||
vm_stats_thread.join(timeout=config["keepalive_interval"])
|
vm_stats_thread.join(timeout=config["keepalive_interval"] - 1)
|
||||||
if vm_stats_thread.is_alive():
|
if vm_stats_thread.is_alive():
|
||||||
logger.out("VM stats gathering exceeded timeout, continuing", state="w")
|
logger.out("VM stats gathering exceeded timeout, continuing", state="w")
|
||||||
if config["enable_storage"]:
|
if config["enable_storage"]:
|
||||||
ceph_stats_thread.join(timeout=config["keepalive_interval"])
|
ceph_stats_thread.join(timeout=config["keepalive_interval"] - 1)
|
||||||
if ceph_stats_thread.is_alive():
|
if ceph_stats_thread.is_alive():
|
||||||
logger.out("Ceph stats gathering exceeded timeout, continuing", state="w")
|
logger.out("Ceph stats gathering exceeded timeout, continuing", state="w")
|
||||||
|
|
||||||
# Get information from thread queues
|
# Get information from thread queues
|
||||||
if config["enable_hypervisor"]:
|
if config["enable_hypervisor"]:
|
||||||
try:
|
try:
|
||||||
this_node.domains_count = vm_thread_queue.get(
|
this_node.domains_count = vm_thread_queue.get(timeout=0.1)
|
||||||
timeout=config["keepalive_interval"]
|
this_node.memalloc = vm_thread_queue.get(timeout=0.1)
|
||||||
)
|
this_node.memprov = vm_thread_queue.get(timeout=0.1)
|
||||||
this_node.memalloc = vm_thread_queue.get(
|
this_node.vcpualloc = vm_thread_queue.get(timeout=0.1)
|
||||||
timeout=config["keepalive_interval"]
|
|
||||||
)
|
|
||||||
this_node.memprov = vm_thread_queue.get(
|
|
||||||
timeout=config["keepalive_interval"]
|
|
||||||
)
|
|
||||||
this_node.vcpualloc = vm_thread_queue.get(
|
|
||||||
timeout=config["keepalive_interval"]
|
|
||||||
)
|
|
||||||
except Exception:
|
except Exception:
|
||||||
logger.out("VM stats queue get exceeded timeout, continuing", state="w")
|
logger.out("VM stats queue get exceeded timeout, continuing", state="w")
|
||||||
else:
|
else:
|
||||||
|
@ -785,9 +781,7 @@ def node_keepalive(logger, config, zkhandler, this_node, netstats):
|
||||||
|
|
||||||
if config["enable_storage"]:
|
if config["enable_storage"]:
|
||||||
try:
|
try:
|
||||||
osds_this_node = ceph_thread_queue.get(
|
osds_this_node = ceph_thread_queue.get(timeout=0.1)
|
||||||
timeout=(config["keepalive_interval"] - 1)
|
|
||||||
)
|
|
||||||
except Exception:
|
except Exception:
|
||||||
logger.out("Ceph stats queue get exceeded timeout, continuing", state="w")
|
logger.out("Ceph stats queue get exceeded timeout, continuing", state="w")
|
||||||
osds_this_node = "?"
|
osds_this_node = "?"
|
||||||
|
@ -883,44 +877,12 @@ def node_keepalive(logger, config, zkhandler, this_node, netstats):
|
||||||
)
|
)
|
||||||
|
|
||||||
# Look for dead nodes and fence them
|
# Look for dead nodes and fence them
|
||||||
if not this_node.maintenance:
|
if not this_node.maintenance and config["daemon_mode"] == "coordinator":
|
||||||
logger.out(
|
logger.out(
|
||||||
"Look for dead nodes and fence them", state="d", prefix="main-thread"
|
"Look for dead nodes and fence them", state="d", prefix="main-thread"
|
||||||
)
|
)
|
||||||
if config["daemon_mode"] == "coordinator":
|
fence_monitor_thread = Thread(
|
||||||
for node_name in zkhandler.children("base.node"):
|
target=pvcnoded.util.fencing.fence_monitor,
|
||||||
try:
|
args=(zkhandler, config, logger),
|
||||||
node_daemon_state = zkhandler.read(("node.state.daemon", node_name))
|
|
||||||
node_keepalive = int(zkhandler.read(("node.keepalive", node_name)))
|
|
||||||
except Exception:
|
|
||||||
node_daemon_state = "unknown"
|
|
||||||
node_keepalive = 0
|
|
||||||
|
|
||||||
# Handle deadtime and fencng if needed
|
|
||||||
# (A node is considered dead when its keepalive timer is >6*keepalive_interval seconds
|
|
||||||
# out-of-date while in 'start' state)
|
|
||||||
node_deadtime = int(time.time()) - (
|
|
||||||
int(config["keepalive_interval"]) * int(config["fence_intervals"])
|
|
||||||
)
|
|
||||||
if node_keepalive < node_deadtime and node_daemon_state == "run":
|
|
||||||
logger.out(
|
|
||||||
"Node {} seems dead - starting monitor for fencing".format(
|
|
||||||
node_name
|
|
||||||
),
|
|
||||||
state="w",
|
|
||||||
)
|
|
||||||
zk_lock = zkhandler.writelock(("node.state.daemon", node_name))
|
|
||||||
with zk_lock:
|
|
||||||
# Ensures that, if we lost the lock race and come out of waiting,
|
|
||||||
# we won't try to trigger our own fence thread.
|
|
||||||
if zkhandler.read(("node.state.daemon", node_name)) != "dead":
|
|
||||||
fence_thread = Thread(
|
|
||||||
target=pvcnoded.util.fencing.fence_node,
|
|
||||||
args=(node_name, zkhandler, config, logger),
|
|
||||||
kwargs={},
|
|
||||||
)
|
|
||||||
fence_thread.start()
|
|
||||||
# Write the updated data after we start the fence thread
|
|
||||||
zkhandler.write(
|
|
||||||
[(("node.state.daemon", node_name), "dead")]
|
|
||||||
)
|
)
|
||||||
|
fence_monitor_thread.start()
|
||||||
|
|
|
@ -102,5 +102,5 @@ def start_system_services(logger, config):
|
||||||
start_workerd(logger, config)
|
start_workerd(logger, config)
|
||||||
start_healthd(logger, config)
|
start_healthd(logger, config)
|
||||||
|
|
||||||
logger.out("Waiting 5 seconds for daemons to start", state="s")
|
logger.out("Waiting 10 seconds for daemons to start", state="s")
|
||||||
sleep(5)
|
sleep(10)
|
||||||
|
|
|
@ -94,7 +94,10 @@ def validate_schema(logger, zkhandler):
|
||||||
# Validate our schema against the active version
|
# Validate our schema against the active version
|
||||||
if not zkhandler.schema.validate(zkhandler, logger):
|
if not zkhandler.schema.validate(zkhandler, logger):
|
||||||
logger.out("Found schema violations, applying", state="i")
|
logger.out("Found schema violations, applying", state="i")
|
||||||
|
try:
|
||||||
zkhandler.schema.apply(zkhandler)
|
zkhandler.schema.apply(zkhandler)
|
||||||
|
except Exception as e:
|
||||||
|
logger.out(f"Failed to apply schema updates: {e}", state="w")
|
||||||
else:
|
else:
|
||||||
logger.out("Schema successfully validated", state="o")
|
logger.out("Schema successfully validated", state="o")
|
||||||
|
|
||||||
|
@ -185,3 +188,6 @@ def setup_node(logger, config, zkhandler):
|
||||||
(("node.count.networks", config["node_hostname"]), "0"),
|
(("node.count.networks", config["node_hostname"]), "0"),
|
||||||
]
|
]
|
||||||
)
|
)
|
||||||
|
|
||||||
|
logger.out("Waiting 5 seconds for Zookeeper to synchronize", state="s")
|
||||||
|
time.sleep(5)
|
||||||
|
|
|
@ -168,7 +168,7 @@ database:
|
||||||
port: 6379
|
port: 6379
|
||||||
|
|
||||||
# Hostname; use `cluster` network floating IP address
|
# Hostname; use `cluster` network floating IP address
|
||||||
hostname: 10.0.1.250
|
hostname: 127.0.0.1
|
||||||
|
|
||||||
# Path, usually "/0"
|
# Path, usually "/0"
|
||||||
path: "/0"
|
path: "/0"
|
||||||
|
@ -180,7 +180,7 @@ database:
|
||||||
port: 5432
|
port: 5432
|
||||||
|
|
||||||
# Hostname; use `cluster` network floating IP address
|
# Hostname; use `cluster` network floating IP address
|
||||||
hostname: 10.0.1.250
|
hostname: 127.0.0.1
|
||||||
|
|
||||||
# Credentials
|
# Credentials
|
||||||
credentials:
|
credentials:
|
||||||
|
|
|
@ -28,6 +28,14 @@ from daemon_lib.vm import (
|
||||||
vm_worker_flush_locks,
|
vm_worker_flush_locks,
|
||||||
vm_worker_attach_device,
|
vm_worker_attach_device,
|
||||||
vm_worker_detach_device,
|
vm_worker_detach_device,
|
||||||
|
vm_worker_create_snapshot,
|
||||||
|
vm_worker_remove_snapshot,
|
||||||
|
vm_worker_rollback_snapshot,
|
||||||
|
vm_worker_export_snapshot,
|
||||||
|
vm_worker_import_snapshot,
|
||||||
|
vm_worker_send_snapshot,
|
||||||
|
vm_worker_create_mirror,
|
||||||
|
vm_worker_promote_mirror,
|
||||||
)
|
)
|
||||||
from daemon_lib.ceph import (
|
from daemon_lib.ceph import (
|
||||||
osd_worker_add_osd,
|
osd_worker_add_osd,
|
||||||
|
@ -42,9 +50,12 @@ from daemon_lib.benchmark import (
|
||||||
from daemon_lib.vmbuilder import (
|
from daemon_lib.vmbuilder import (
|
||||||
worker_create_vm,
|
worker_create_vm,
|
||||||
)
|
)
|
||||||
|
from daemon_lib.autobackup import (
|
||||||
|
worker_cluster_autobackup,
|
||||||
|
)
|
||||||
|
|
||||||
# Daemon version
|
# Daemon version
|
||||||
version = "0.9.89"
|
version = "0.9.103"
|
||||||
|
|
||||||
|
|
||||||
config = cfg.get_configuration()
|
config = cfg.get_configuration()
|
||||||
|
@ -88,12 +99,27 @@ def create_vm(
|
||||||
|
|
||||||
|
|
||||||
@celery.task(name="storage.benchmark", bind=True, routing_key="run_on")
|
@celery.task(name="storage.benchmark", bind=True, routing_key="run_on")
|
||||||
def storage_benchmark(self, pool=None, run_on="primary"):
|
def storage_benchmark(self, pool=None, name=None, run_on="primary"):
|
||||||
@ZKConnection(config)
|
@ZKConnection(config)
|
||||||
def run_storage_benchmark(zkhandler, self, pool):
|
def run_storage_benchmark(zkhandler, self, pool, name):
|
||||||
return worker_run_benchmark(zkhandler, self, config, pool)
|
return worker_run_benchmark(zkhandler, self, config, pool, name)
|
||||||
|
|
||||||
return run_storage_benchmark(self, pool)
|
return run_storage_benchmark(self, pool, name)
|
||||||
|
|
||||||
|
|
||||||
|
@celery.task(name="cluster.autobackup", bind=True, routing_key="run_on")
|
||||||
|
def cluster_autobackup(self, force_full=False, email_recipients=None, run_on="primary"):
|
||||||
|
@ZKConnection(config)
|
||||||
|
def run_cluster_autobackup(
|
||||||
|
zkhandler, self, force_full=False, email_recipients=None
|
||||||
|
):
|
||||||
|
return worker_cluster_autobackup(
|
||||||
|
zkhandler, self, force_full=force_full, email_recipients=email_recipients
|
||||||
|
)
|
||||||
|
|
||||||
|
return run_cluster_autobackup(
|
||||||
|
self, force_full=force_full, email_recipients=email_recipients
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
@celery.task(name="vm.flush_locks", bind=True, routing_key="run_on")
|
@celery.task(name="vm.flush_locks", bind=True, routing_key="run_on")
|
||||||
|
@ -123,6 +149,219 @@ def vm_device_detach(self, domain=None, xml=None, run_on=None):
|
||||||
return run_vm_device_detach(self, domain, xml)
|
return run_vm_device_detach(self, domain, xml)
|
||||||
|
|
||||||
|
|
||||||
|
@celery.task(name="vm.create_snapshot", bind=True, routing_key="run_on")
|
||||||
|
def vm_create_snapshot(self, domain=None, snapshot_name=None, run_on="primary"):
|
||||||
|
@ZKConnection(config)
|
||||||
|
def run_vm_create_snapshot(zkhandler, self, domain, snapshot_name):
|
||||||
|
return vm_worker_create_snapshot(zkhandler, self, domain, snapshot_name)
|
||||||
|
|
||||||
|
return run_vm_create_snapshot(self, domain, snapshot_name)
|
||||||
|
|
||||||
|
|
||||||
|
@celery.task(name="vm.remove_snapshot", bind=True, routing_key="run_on")
|
||||||
|
def vm_remove_snapshot(self, domain=None, snapshot_name=None, run_on="primary"):
|
||||||
|
@ZKConnection(config)
|
||||||
|
def run_vm_remove_snapshot(zkhandler, self, domain, snapshot_name):
|
||||||
|
return vm_worker_remove_snapshot(zkhandler, self, domain, snapshot_name)
|
||||||
|
|
||||||
|
return run_vm_remove_snapshot(self, domain, snapshot_name)
|
||||||
|
|
||||||
|
|
||||||
|
@celery.task(name="vm.rollback_snapshot", bind=True, routing_key="run_on")
|
||||||
|
def vm_rollback_snapshot(self, domain=None, snapshot_name=None, run_on="primary"):
|
||||||
|
@ZKConnection(config)
|
||||||
|
def run_vm_rollback_snapshot(zkhandler, self, domain, snapshot_name):
|
||||||
|
return vm_worker_rollback_snapshot(zkhandler, self, domain, snapshot_name)
|
||||||
|
|
||||||
|
return run_vm_rollback_snapshot(self, domain, snapshot_name)
|
||||||
|
|
||||||
|
|
||||||
|
@celery.task(name="vm.export_snapshot", bind=True, routing_key="run_on")
|
||||||
|
def vm_export_snapshot(
|
||||||
|
self,
|
||||||
|
domain=None,
|
||||||
|
snapshot_name=None,
|
||||||
|
export_path=None,
|
||||||
|
incremental_parent=None,
|
||||||
|
run_on="primary",
|
||||||
|
):
|
||||||
|
@ZKConnection(config)
|
||||||
|
def run_vm_export_snapshot(
|
||||||
|
zkhandler, self, domain, snapshot_name, export_path, incremental_parent=None
|
||||||
|
):
|
||||||
|
return vm_worker_export_snapshot(
|
||||||
|
zkhandler,
|
||||||
|
self,
|
||||||
|
domain,
|
||||||
|
snapshot_name,
|
||||||
|
export_path,
|
||||||
|
incremental_parent=incremental_parent,
|
||||||
|
)
|
||||||
|
|
||||||
|
return run_vm_export_snapshot(
|
||||||
|
self, domain, snapshot_name, export_path, incremental_parent=incremental_parent
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@celery.task(name="vm.import_snapshot", bind=True, routing_key="run_on")
|
||||||
|
def vm_import_snapshot(
|
||||||
|
self,
|
||||||
|
domain=None,
|
||||||
|
snapshot_name=None,
|
||||||
|
import_path=None,
|
||||||
|
retain_snapshot=True,
|
||||||
|
run_on="primary",
|
||||||
|
):
|
||||||
|
@ZKConnection(config)
|
||||||
|
def run_vm_import_snapshot(
|
||||||
|
zkhandler, self, domain, snapshot_name, import_path, retain_snapshot=True
|
||||||
|
):
|
||||||
|
return vm_worker_import_snapshot(
|
||||||
|
zkhandler,
|
||||||
|
self,
|
||||||
|
domain,
|
||||||
|
snapshot_name,
|
||||||
|
import_path,
|
||||||
|
retain_snapshot=retain_snapshot,
|
||||||
|
)
|
||||||
|
|
||||||
|
return run_vm_import_snapshot(
|
||||||
|
self, domain, snapshot_name, import_path, retain_snapshot=retain_snapshot
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@celery.task(name="vm.send_snapshot", bind=True, routing_key="run_on")
|
||||||
|
def vm_send_snapshot(
|
||||||
|
self,
|
||||||
|
domain=None,
|
||||||
|
snapshot_name=None,
|
||||||
|
destination_api_uri="",
|
||||||
|
destination_api_key="",
|
||||||
|
destination_api_verify_ssl=True,
|
||||||
|
incremental_parent=None,
|
||||||
|
destination_storage_pool=None,
|
||||||
|
run_on="primary",
|
||||||
|
):
|
||||||
|
@ZKConnection(config)
|
||||||
|
def run_vm_send_snapshot(
|
||||||
|
zkhandler,
|
||||||
|
self,
|
||||||
|
domain,
|
||||||
|
snapshot_name,
|
||||||
|
destination_api_uri,
|
||||||
|
destination_api_key,
|
||||||
|
destination_api_verify_ssl=True,
|
||||||
|
incremental_parent=None,
|
||||||
|
destination_storage_pool=None,
|
||||||
|
):
|
||||||
|
return vm_worker_send_snapshot(
|
||||||
|
zkhandler,
|
||||||
|
self,
|
||||||
|
domain,
|
||||||
|
snapshot_name,
|
||||||
|
destination_api_uri,
|
||||||
|
destination_api_key,
|
||||||
|
destination_api_verify_ssl=destination_api_verify_ssl,
|
||||||
|
incremental_parent=incremental_parent,
|
||||||
|
destination_storage_pool=destination_storage_pool,
|
||||||
|
)
|
||||||
|
|
||||||
|
return run_vm_send_snapshot(
|
||||||
|
self,
|
||||||
|
domain,
|
||||||
|
snapshot_name,
|
||||||
|
destination_api_uri,
|
||||||
|
destination_api_key,
|
||||||
|
destination_api_verify_ssl=destination_api_verify_ssl,
|
||||||
|
incremental_parent=incremental_parent,
|
||||||
|
destination_storage_pool=destination_storage_pool,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@celery.task(name="vm.create_mirror", bind=True, routing_key="run_on")
|
||||||
|
def vm_create_mirror(
|
||||||
|
self,
|
||||||
|
domain=None,
|
||||||
|
destination_api_uri="",
|
||||||
|
destination_api_key="",
|
||||||
|
destination_api_verify_ssl=True,
|
||||||
|
destination_storage_pool=None,
|
||||||
|
run_on="primary",
|
||||||
|
):
|
||||||
|
@ZKConnection(config)
|
||||||
|
def run_vm_create_mirror(
|
||||||
|
zkhandler,
|
||||||
|
self,
|
||||||
|
domain,
|
||||||
|
destination_api_uri,
|
||||||
|
destination_api_key,
|
||||||
|
destination_api_verify_ssl=True,
|
||||||
|
destination_storage_pool=None,
|
||||||
|
):
|
||||||
|
return vm_worker_create_mirror(
|
||||||
|
zkhandler,
|
||||||
|
self,
|
||||||
|
domain,
|
||||||
|
destination_api_uri,
|
||||||
|
destination_api_key,
|
||||||
|
destination_api_verify_ssl=destination_api_verify_ssl,
|
||||||
|
destination_storage_pool=destination_storage_pool,
|
||||||
|
)
|
||||||
|
|
||||||
|
return run_vm_create_mirror(
|
||||||
|
self,
|
||||||
|
domain,
|
||||||
|
destination_api_uri,
|
||||||
|
destination_api_key,
|
||||||
|
destination_api_verify_ssl=destination_api_verify_ssl,
|
||||||
|
destination_storage_pool=destination_storage_pool,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@celery.task(name="vm.promote_mirror", bind=True, routing_key="run_on")
|
||||||
|
def vm_promote_mirror(
|
||||||
|
self,
|
||||||
|
domain=None,
|
||||||
|
destination_api_uri="",
|
||||||
|
destination_api_key="",
|
||||||
|
destination_api_verify_ssl=True,
|
||||||
|
destination_storage_pool=None,
|
||||||
|
remove_on_source=False,
|
||||||
|
run_on="primary",
|
||||||
|
):
|
||||||
|
@ZKConnection(config)
|
||||||
|
def run_vm_promote_mirror(
|
||||||
|
zkhandler,
|
||||||
|
self,
|
||||||
|
domain,
|
||||||
|
destination_api_uri,
|
||||||
|
destination_api_key,
|
||||||
|
destination_api_verify_ssl=True,
|
||||||
|
destination_storage_pool=None,
|
||||||
|
remove_on_source=False,
|
||||||
|
):
|
||||||
|
return vm_worker_promote_mirror(
|
||||||
|
zkhandler,
|
||||||
|
self,
|
||||||
|
domain,
|
||||||
|
destination_api_uri,
|
||||||
|
destination_api_key,
|
||||||
|
destination_api_verify_ssl=destination_api_verify_ssl,
|
||||||
|
destination_storage_pool=destination_storage_pool,
|
||||||
|
remove_on_source=remove_on_source,
|
||||||
|
)
|
||||||
|
|
||||||
|
return run_vm_promote_mirror(
|
||||||
|
self,
|
||||||
|
domain,
|
||||||
|
destination_api_uri,
|
||||||
|
destination_api_key,
|
||||||
|
destination_api_verify_ssl=destination_api_verify_ssl,
|
||||||
|
destination_storage_pool=destination_storage_pool,
|
||||||
|
remove_on_source=remove_on_source,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
@celery.task(name="osd.add", bind=True, routing_key="run_on")
|
@celery.task(name="osd.add", bind=True, routing_key="run_on")
|
||||||
def osd_add(
|
def osd_add(
|
||||||
self,
|
self,
|
||||||
|
|
Loading…
Reference in New Issue