Compare commits

v0.9.85...ab0a1e0946 (12 commits)

Commits:
ab0a1e0946
7c116b2fbc
1023c55087
9235187c6f
0c94f1b4f8
44a4f0e1f7
5d53a3e529
35e22cb50f
a3171b666b
48e41d7b05
d6aecf195e
9329784010

README.md
@@ -1,5 +1,5 @@
 <p align="center">
-<img alt="Logo banner" src="docs/images/pvc_logo_black.png"/>
+<img alt="Logo banner" src="images/pvc_logo_black.png"/>
 <br/><br/>
 <a href="https://github.com/parallelvirtualcluster/pvc"><img alt="License" src="https://img.shields.io/github/license/parallelvirtualcluster/pvc"/></a>
 <a href="https://github.com/psf/black"><img alt="Code style: Black" src="https://img.shields.io/badge/code%20style-black-000000.svg"/></a>
@@ -23,37 +23,58 @@ Installation of PVC is accomplished by two main components: a [Node installer IS

 Just give it physical servers, and it will run your VMs without you having to think about it, all in just an hour or two of setup time.

 ## What is it based on?

 The core node and API daemons, as well as the CLI API client, are written in Python 3 and are fully Free Software (GNU GPL v3). In addition to these, PVC makes use of the following software tools to provide a holistic hyperconverged infrastructure solution:

 * Debian GNU/Linux as the base OS.
 * Linux KVM, QEMU, and Libvirt for VM management.
 * Linux `ip`, FRRouting, NFTables, DNSMasq, and PowerDNS for network management.
 * Ceph for storage management.
 * Apache Zookeeper for the primary cluster state database.
 * Patroni PostgreSQL manager for the secondary relation databases (DNS aggregation, Provisioner configuration).

 ## Getting Started

-To get started with PVC, please see the [About](https://docs.parallelvirtualcluster.org/en/latest/about/) page for general information about the project, and the [Getting Started](https://docs.parallelvirtualcluster.org/en/latest/getting-started/) page for details on configuring your first cluster.
+To get started with PVC, please see the [About](https://docs.parallelvirtualcluster.org/en/latest/about-pvc/) page for general information about the project, and the [Getting Started](https://docs.parallelvirtualcluster.org/en/latest/deployment/getting-started/) page for details on configuring your first cluster.

 ## Changelog

-View the changelog in [CHANGELOG.md](CHANGELOG.md).
+View the changelog in [CHANGELOG.md](CHANGELOG.md). **Please note that any breaking changes are announced here; ensure you read the changelog before upgrading!**

 ## Screenshots

-While PVC's API and internals aren't very screenshot-worthy, here is some example output of the CLI tool.
+These screenshots show some of the available functionality of the PVC system and CLI as of PVC v0.9.85.

-<p><img alt="Node listing" src="docs/images/pvc-nodes.png"/><br/><i>Listing the nodes in a cluster</i></p>
+<p><img alt="0. Integrated help" src="images/0-integrated-help.png"/><br/>
+<i>The CLI features an integrated, fully-featured help system to show details about every possible command.</i>
+</p>

-<p><img alt="Network listing" src="docs/images/pvc-networks.png"/><br/><i>Listing the networks in a cluster, showing 3 bridged and 1 IPv4-only managed networks</i></p>
+<p><img alt="1. Connection management" src="images/1-connection-management.png"/><br/>
+<i>A single CLI instance can manage multiple clusters, including a quick detail view, and will default to a "local" connection if an "/etc/pvc/pvc.conf" file is found; sensitive API keys are hidden by default.</i>
+</p>

-<p><img alt="VM listing and migration" src="docs/images/pvc-migration.png"/><br/><i>Listing a limited set of VMs and migrating one with status updates</i></p>
+<p><img alt="2. Cluster details and output formats" src="images/2-cluster-details-and-output-formats.png"/><br/>
+<i>PVC can show the key details of your cluster at a glance, including health, persistent fault events, and key resources; the CLI can output both in pretty human format and JSON for easier machine parsing in scripts.</i>
+</p>

-<p><img alt="Node logs" src="docs/images/pvc-nodelog.png"/><br/><i>Viewing the logs of a node (keepalives and VM [un]migration)</i></p>
+<p><img alt="3. Node information" src="images/3-node-information.png"/><br/>
+<i>PVC can show details about the nodes in the cluster, including their live health and resource utilization.</i>
+</p>

+<p><img alt="4. VM information" src="images/4-vm-information.png"/><br/>
+<i>PVC can show details about the VMs in the cluster, including their state, resource allocations</i>
+</p>

+<p><img alt="5. VM details" src="images/5-vm-details.png"/><br/>
+<i>In addition to the above basic details, PVC can also show extensive information about a running VM's devices and other resource utilization.</i>
+</p>

+<p><img alt="6. Network information" src="images/6-network-information.png"/><br/>
+<i>PVC has two major client network types, and ensures a consistent configuration of client networks across the entire cluster; managed networks can feature DHCP, DNS, firewall, and other functionality including DHCP reservations.</i>
+</p>

+<p><img alt="7. Storage information" src="images/7-storage-information.png"/><br/>
+<i>PVC provides a convenient abstracted view of the underlying Ceph system and can manage all core aspects of it.</i>
+</p>

+<p><img alt="8. VM and node logs" src="images/8-vm-and-node-logs.png"/><br/>
+<i>PVC can display logs from VM serial consoles (if properly configured) and nodes in-client to facilitate quick troubleshooting.</i>
+</p>

+<p><img alt="9. VM and worker tasks" src="images/9-vm-and-worker-tasks.png"/><br/>
+<i>PVC provides full VM lifecycle management, as well as long-running worker-based commands (in this example, clearing a VM's storage locks).</i>
+</p>

+<p><img alt="10. Provisioner" src="images/10-provisioner.png"/><br/>
+<i>PVC features an extensively customizable and configurable VM provisioner system, including EC2-compatible CloudInit support, allowing you to define flexible VM profiles and provision new VMs with a single command.</i>
+</p>
@@ -136,21 +136,10 @@ def cluster_metrics(zkhandler):
     if not status_retflag:
         return "Error: Status data threw error", 400

-    faults_retflag, faults_data = pvc_faults.get_list(zkhandler)
-    if not faults_retflag:
-        return "Error: Faults data threw error", 400
-
-    node_retflag, node_data = pvc_node.get_list(zkhandler)
-    if not node_retflag:
-        return "Error: Node data threw error", 400
-
-    vm_retflag, vm_data = pvc_vm.get_list(zkhandler)
-    if not vm_retflag:
-        return "Error: VM data threw error", 400
-
-    osd_retflag, osd_data = pvc_ceph.get_list_osd(zkhandler)
-    if not osd_retflag:
-        return "Error: OSD data threw error", 400
+    faults_data = status_data["detail"]["faults"]
+    node_data = status_data["detail"]["node"]
+    vm_data = status_data["detail"]["vm"]
+    osd_data = status_data["detail"]["osd"]

     output_lines = list()

@@ -237,7 +226,7 @@ def cluster_metrics(zkhandler):
     for state in set([s.split(",")[0] for s in pvc_common.ceph_osd_state_combinations]):
         osd_up_state_map[state] = 0
     for osd in osd_data:
-        if osd["stats"]["up"] > 0:
+        if osd["up"] == "up":
             osd_up_state_map["up"] += 1
         else:
             osd_up_state_map["down"] += 1

@@ -252,7 +241,7 @@ def cluster_metrics(zkhandler):
     for state in set([s.split(",")[1] for s in pvc_common.ceph_osd_state_combinations]):
         osd_in_state_map[state] = 0
     for osd in osd_data:
-        if osd["stats"]["in"] > 0:
+        if osd["in"] == "in":
            osd_in_state_map["in"] += 1
         else:
            osd_in_state_map["out"] += 1
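The change above drops four redundant `get_list` calls by reusing the `detail` sub-object that `getClusterInformation` now embeds in the status payload. A minimal sketch of the new counting logic, assuming only the `osd_data` shape shown in that payload (string `"up"`/`"in"` fields rather than the raw stats integers; the sample values are illustrative):

```python
# Minimal sketch of the OSD up-state counting above, assuming the new
# osd_data shape produced by getClusterInformation (illustrative values).
osd_data = [
    {"id": "0", "up": "up", "in": "in"},
    {"id": "1", "up": "down", "in": "out"},
    {"id": "2", "up": "up", "in": "in"},
]

osd_up_state_map = {"up": 0, "down": 0}
for osd in osd_data:
    if osd["up"] == "up":
        osd_up_state_map["up"] += 1
    else:
        osd_up_state_map["down"] += 1

print(osd_up_state_map)  # {'up': 2, 'down': 1}
```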
@@ -215,14 +215,26 @@ def getClusterOSDList(zkhandler):


 def getOSDInformation(zkhandler, osd_id):
     # Get the devices
-    osd_fsid = zkhandler.read(("osd.ofsid", osd_id))
-    osd_node = zkhandler.read(("osd.node", osd_id))
-    osd_device = zkhandler.read(("osd.device", osd_id))
-    osd_is_split = bool(strtobool(zkhandler.read(("osd.is_split", osd_id))))
-    osd_db_device = zkhandler.read(("osd.db_device", osd_id))
+    (
+        osd_fsid,
+        osd_node,
+        osd_device,
+        _osd_is_split,
+        osd_db_device,
+        osd_stats_raw,
+    ) = zkhandler.read_many(
+        [
+            ("osd.ofsid", osd_id),
+            ("osd.node", osd_id),
+            ("osd.device", osd_id),
+            ("osd.is_split", osd_id),
+            ("osd.db_device", osd_id),
+            ("osd.stats", osd_id),
+        ]
+    )
+
+    osd_is_split = bool(strtobool(_osd_is_split))
     # Parse the stats data
-    osd_stats_raw = zkhandler.read(("osd.stats", osd_id))
     osd_stats = dict(json.loads(osd_stats_raw))

     osd_information = {
@@ -23,10 +23,7 @@ from json import loads

 import daemon_lib.common as common
 import daemon_lib.faults as faults
-import daemon_lib.vm as pvc_vm
 import daemon_lib.node as pvc_node
-import daemon_lib.network as pvc_network
-import daemon_lib.ceph as pvc_ceph


 def set_maintenance(zkhandler, maint_state):
@@ -45,9 +42,7 @@ def set_maintenance(zkhandler, maint_state):
     return True, "Successfully set cluster in normal mode"


-def getClusterHealthFromFaults(zkhandler):
-    faults_list = faults.getAllFaults(zkhandler)
-
+def getClusterHealthFromFaults(zkhandler, faults_list):
     unacknowledged_faults = [fault for fault in faults_list if fault["status"] != "ack"]

     # Generate total cluster health numbers
@@ -217,20 +212,38 @@ def getClusterHealth(zkhandler, node_list, vm_list, ceph_osd_list):


 def getNodeHealth(zkhandler, node_list):
+    # Get the health state of all nodes
+    node_health_reads = list()
+    for node in node_list:
+        node_health_reads += [
+            ("node.monitoring.health", node),
+            ("node.monitoring.plugins", node),
+        ]
+    all_node_health_details = zkhandler.read_many(node_health_reads)
+
+    # Parse out the Node health details
     node_health = dict()
-    for index, node in enumerate(node_list):
+    for nidx, node in enumerate(node_list):
+        # Split the large list of return values by the IDX of this node
+        # Each node result is 2 fields long
+        pos_start = nidx * 2
+        pos_end = nidx * 2 + 2
+        node_health_value, node_health_plugins = tuple(
+            all_node_health_details[pos_start:pos_end]
+        )
+        node_health_details = pvc_node.getNodeHealthDetails(
+            zkhandler, node, node_health_plugins.split()
+        )
+
         node_health_messages = list()
-        node_health_value = node["health"]
-        for entry in node["health_details"]:
+        for entry in node_health_details:
             if entry["health_delta"] > 0:
                 node_health_messages.append(f"'{entry['name']}': {entry['message']}")

         node_health_entry = {
-            "health": node_health_value,
+            "health": int(node_health_value),
             "messages": node_health_messages,
         }

-        node_health[node["name"]] = node_health_entry
+        node_health[node] = node_health_entry

     return node_health
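The batched read above introduces the fixed-stride slicing convention that recurs throughout this changeset (2 fields per node here, 6 per fault and 4 per monitoring plugin below): `read_many` returns one value per requested key, in request order, so record *i* occupies the slice `[i * stride : i * stride + stride]`. A hypothetical generic helper, shown only to illustrate the pattern (not part of the diff):

```python
from typing import Iterator, Sequence, Tuple


def split_records(flat_results: Sequence, stride: int) -> Iterator[Tuple]:
    """Split a flat read_many() result into fixed-size per-record tuples.

    read_many() returns one value per requested key, in request order, so
    when each record contributed `stride` keys, record i occupies the
    slice [i * stride : i * stride + stride].
    """
    for pos_start in range(0, len(flat_results), stride):
        yield tuple(flat_results[pos_start : pos_start + stride])


# Two monitoring keys were requested per node above, so the stride is 2;
# the health values and plugin strings here are illustrative.
flat = ["100", "disk load net", "50", "disk load net"]
for node_health_value, node_health_plugins in split_records(flat, stride=2):
    print(int(node_health_value), node_health_plugins.split())
```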
@@ -239,78 +252,146 @@ def getClusterInformation(zkhandler):
     # Get cluster maintenance state
-    maintenance_state = zkhandler.read("base.config.maintenance")
-
-    # Get node information object list
-    retcode, node_list = pvc_node.get_list(zkhandler, None)
-
-    # Get primary node
-    primary_node = common.getPrimaryNode(zkhandler)
-
-    # Get PVC version of primary node
-    pvc_version = "0.0.0"
-    for node in node_list:
-        if node["name"] == primary_node:
-            pvc_version = node["pvc_version"]
-
-    # Get vm information object list
-    retcode, vm_list = pvc_vm.get_list(zkhandler, None, None, None, None)
-
-    # Get network information object list
-    retcode, network_list = pvc_network.get_list(zkhandler, None, None)
-
-    # Get storage information object list
-    retcode, ceph_osd_list = pvc_ceph.get_list_osd(zkhandler, None)
-    retcode, ceph_pool_list = pvc_ceph.get_list_pool(zkhandler, None)
-    retcode, ceph_volume_list = pvc_ceph.get_list_volume(zkhandler, None, None)
-    retcode, ceph_snapshot_list = pvc_ceph.get_list_snapshot(
-        zkhandler, None, None, None
-    )
+    maintenance_state, primary_node = zkhandler.read_many(
+        [
+            ("base.config.maintenance"),
+            ("base.config.primary_node"),
+        ]
+    )

-    # Determine, for each subsection, the total count
+    # Get PVC version of primary node
+    pvc_version = zkhandler.read(("node.data.pvc_version", primary_node))
+
+    # Get the list of Nodes
+    node_list = zkhandler.children("base.node")
     node_count = len(node_list)
-    vm_count = len(vm_list)
-    network_count = len(network_list)
-    ceph_osd_count = len(ceph_osd_list)
-    ceph_pool_count = len(ceph_pool_list)
-    ceph_volume_count = len(ceph_volume_list)
-    ceph_snapshot_count = len(ceph_snapshot_list)

-    # Format the Node states
+    # Get the daemon and domain states of all Nodes
+    node_state_reads = list()
+    for node in node_list:
+        node_state_reads += [
+            ("node.state.daemon", node),
+            ("node.state.domain", node),
+        ]
+    all_node_states = zkhandler.read_many(node_state_reads)
+
+    # Parse out the Node states
+    node_data = list()
     formatted_node_states = {"total": node_count}
-    for state in common.node_state_combinations:
-        state_count = 0
-        for node in node_list:
-            node_state = f"{node['daemon_state']},{node['domain_state']}"
-            if node_state == state:
-                state_count += 1
-        if state_count > 0:
-            formatted_node_states[state] = state_count
+    for nidx, node in enumerate(node_list):
+        # Split the large list of return values by the IDX of this node
+        # Each node result is 2 fields long
+        pos_start = nidx * 2
+        pos_end = nidx * 2 + 2
+        node_daemon_state, node_domain_state = tuple(all_node_states[pos_start:pos_end])
+        node_data.append(
+            {
+                "name": node,
+                "daemon_state": node_daemon_state,
+                "domain_state": node_domain_state,
+            }
+        )
+        node_state = f"{node_daemon_state},{node_domain_state}"
+        # Add to the count for this node's state
+        if node_state in common.node_state_combinations:
+            if formatted_node_states.get(node_state) is not None:
+                formatted_node_states[node_state] += 1
+            else:
+                formatted_node_states[node_state] = 1

-    # Format the VM states
+    # Get the list of VMs
+    vm_list = zkhandler.children("base.domain")
+    vm_count = len(vm_list)
+
+    # Get the states of all VMs
+    vm_state_reads = list()
+    for vm in vm_list:
+        vm_state_reads += [
+            ("domain", vm),
+            ("domain.state", vm),
+        ]
+    all_vm_states = zkhandler.read_many(vm_state_reads)
+
+    # Parse out the VM states
+    vm_data = list()
     formatted_vm_states = {"total": vm_count}
-    for state in common.vm_state_combinations:
-        state_count = 0
-        for vm in vm_list:
-            if vm["state"] == state:
-                state_count += 1
-        if state_count > 0:
-            formatted_vm_states[state] = state_count
+    for vidx, vm in enumerate(vm_list):
+        # Split the large list of return values by the IDX of this VM
+        # Each VM result is 2 fields long
+        pos_start = vidx * 2
+        pos_end = vidx * 2 + 2
+        vm_name, vm_state = tuple(all_vm_states[pos_start:pos_end])
+        vm_data.append(
+            {
+                "uuid": vm,
+                "name": vm_name,
+                "state": vm_state,
+            }
+        )
+        # Add to the count for this VM's state
+        if vm_state in common.vm_state_combinations:
+            if formatted_vm_states.get(vm_state) is not None:
+                formatted_vm_states[vm_state] += 1
+            else:
+                formatted_vm_states[vm_state] = 1

-    # Format the OSD states
+    # Get the list of Ceph OSDs
+    ceph_osd_list = zkhandler.children("base.osd")
+    ceph_osd_count = len(ceph_osd_list)
+
+    # Get the states of all OSDs ("stat" is not a typo since we're reading stats; states are in
+    # the stats JSON object)
+    osd_stat_reads = list()
+    for osd in ceph_osd_list:
+        osd_stat_reads += [("osd.stats", osd)]
+    all_osd_stats = zkhandler.read_many(osd_stat_reads)
+
+    # Parse out the OSD states
+    osd_data = list()
+    formatted_osd_states = {"total": ceph_osd_count}
     up_texts = {1: "up", 0: "down"}
     in_texts = {1: "in", 0: "out"}
-    formatted_osd_states = {"total": ceph_osd_count}
-    for state in common.ceph_osd_state_combinations:
-        state_count = 0
-        for ceph_osd in ceph_osd_list:
-            ceph_osd_state = f"{up_texts[ceph_osd['stats']['up']]},{in_texts[ceph_osd['stats']['in']]}"
-            if ceph_osd_state == state:
-                state_count += 1
-        if state_count > 0:
-            formatted_osd_states[state] = state_count
+    for oidx, osd in enumerate(ceph_osd_list):
+        # Split the large list of return values by the IDX of this OSD
+        # Each OSD result is 1 field long, so just use the IDX
+        _osd_stats = all_osd_stats[oidx]
+        # We have to load this JSON object and get our up/in states from it
+        osd_stats = loads(_osd_stats)
+        # Get our states
+        osd_up = up_texts[osd_stats["up"]]
+        osd_in = in_texts[osd_stats["in"]]
+        osd_data.append(
+            {
+                "id": osd,
+                "up": osd_up,
+                "in": osd_in,
+            }
+        )
+        osd_state = f"{osd_up},{osd_in}"
+        # Add to the count for this OSD's state
+        if osd_state in common.ceph_osd_state_combinations:
+            if formatted_osd_states.get(osd_state) is not None:
+                formatted_osd_states[osd_state] += 1
+            else:
+                formatted_osd_states[osd_state] = 1

+    # Get the list of Networks
+    network_list = zkhandler.children("base.network")
+    network_count = len(network_list)
+
+    # Get the list of Ceph pools
+    ceph_pool_list = zkhandler.children("base.pool")
+    ceph_pool_count = len(ceph_pool_list)
+
+    # Get the list of Ceph volumes
+    ceph_volume_list = zkhandler.children("base.volume")
+    ceph_volume_count = len(ceph_volume_list)
+
+    # Get the list of Ceph snapshots
+    ceph_snapshot_list = zkhandler.children("base.snapshot")
+    ceph_snapshot_count = len(ceph_snapshot_list)
+
+    # Get the list of faults
+    faults_data = faults.getAllFaults(zkhandler)

     # Format the status data
     cluster_information = {
-        "cluster_health": getClusterHealthFromFaults(zkhandler),
+        "cluster_health": getClusterHealthFromFaults(zkhandler, faults_data),
         "node_health": getNodeHealth(zkhandler, node_list),
         "maintenance": maintenance_state,
         "primary_node": primary_node,

@@ -323,6 +404,12 @@ def getClusterInformation(zkhandler):
         "pools": ceph_pool_count,
         "volumes": ceph_volume_count,
         "snapshots": ceph_snapshot_count,
+        "detail": {
+            "node": node_data,
+            "vm": vm_data,
+            "osd": osd_data,
+            "faults": faults_data,
+        },
     }

     return cluster_information
@@ -95,12 +95,24 @@ def getFault(zkhandler, fault_id):
         return None

     fault_id = fault_id
-    fault_last_time = zkhandler.read(("faults.last_time", fault_id))
-    fault_first_time = zkhandler.read(("faults.first_time", fault_id))
-    fault_ack_time = zkhandler.read(("faults.ack_time", fault_id))
-    fault_status = zkhandler.read(("faults.status", fault_id))
-    fault_delta = int(zkhandler.read(("faults.delta", fault_id)))
-    fault_message = zkhandler.read(("faults.message", fault_id))
+
+    (
+        fault_last_time,
+        fault_first_time,
+        fault_ack_time,
+        fault_status,
+        fault_delta,
+        fault_message,
+    ) = zkhandler.read_many(
+        [
+            ("faults.last_time", fault_id),
+            ("faults.first_time", fault_id),
+            ("faults.ack_time", fault_id),
+            ("faults.status", fault_id),
+            ("faults.delta", fault_id),
+            ("faults.message", fault_id),
+        ]
+    )

     # Acknowledged faults have a delta of 0
     if fault_ack_time != "":

@@ -112,7 +124,7 @@ def getFault(zkhandler, fault_id):
         "first_reported": fault_first_time,
         "acknowledged_at": fault_ack_time,
         "status": fault_status,
-        "health_delta": fault_delta,
+        "health_delta": int(fault_delta),
         "message": fault_message,
     }

@@ -126,11 +138,42 @@ def getAllFaults(zkhandler, sort_key="last_reported"):

     all_faults = zkhandler.children(("base.faults"))

-    faults_detail = list()
-
+    faults_reads = list()
     for fault_id in all_faults:
-        fault_detail = getFault(zkhandler, fault_id)
-        faults_detail.append(fault_detail)
+        faults_reads += [
+            ("faults.last_time", fault_id),
+            ("faults.first_time", fault_id),
+            ("faults.ack_time", fault_id),
+            ("faults.status", fault_id),
+            ("faults.delta", fault_id),
+            ("faults.message", fault_id),
+        ]
+    all_faults_data = list(zkhandler.read_many(faults_reads))
+
+    faults_detail = list()
+    for fidx, fault_id in enumerate(all_faults):
+        # Split the large list of return values by the IDX of this fault
+        # Each fault result is 6 fields long
+        pos_start = fidx * 6
+        pos_end = fidx * 6 + 6
+        (
+            fault_last_time,
+            fault_first_time,
+            fault_ack_time,
+            fault_status,
+            fault_delta,
+            fault_message,
+        ) = tuple(all_faults_data[pos_start:pos_end])
+        fault_output = {
+            "id": fault_id,
+            "last_reported": fault_last_time,
+            "first_reported": fault_first_time,
+            "acknowledged_at": fault_ack_time,
+            "status": fault_status,
+            "health_delta": int(fault_delta),
+            "message": fault_message,
+        }
+        faults_detail.append(fault_output)

     sorted_faults = sorted(faults_detail, key=lambda x: x[sort_key])
     # Sort newest-first for time-based sorts
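The hunk cuts off at the sorting comment; presumably (the following lines are not part of this diff) the time-valued keys are then ordered newest-first, which in Python means re-sorting with `reverse=True`. A hedged sketch of that step:

```python
# Hypothetical continuation of getAllFaults (not shown in the diff above):
# time-valued sort keys order newest-first, everything else ascending.
time_sort_keys = ["last_reported", "first_reported", "acknowledged_at"]
if sort_key in time_sort_keys:
    sorted_faults = sorted(faults_detail, key=lambda x: x[sort_key], reverse=True)
else:
    sorted_faults = sorted(faults_detail, key=lambda x: x[sort_key])
```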
@@ -142,19 +142,37 @@ def getNetworkACLs(zkhandler, vni, _direction):


 def getNetworkInformation(zkhandler, vni):
-    description = zkhandler.read(("network", vni))
-    nettype = zkhandler.read(("network.type", vni))
-    mtu = zkhandler.read(("network.mtu", vni))
-    domain = zkhandler.read(("network.domain", vni))
-    name_servers = zkhandler.read(("network.nameservers", vni))
-    ip6_network = zkhandler.read(("network.ip6.network", vni))
-    ip6_gateway = zkhandler.read(("network.ip6.gateway", vni))
-    dhcp6_flag = zkhandler.read(("network.ip6.dhcp", vni))
-    ip4_network = zkhandler.read(("network.ip4.network", vni))
-    ip4_gateway = zkhandler.read(("network.ip4.gateway", vni))
-    dhcp4_flag = zkhandler.read(("network.ip4.dhcp", vni))
-    dhcp4_start = zkhandler.read(("network.ip4.dhcp_start", vni))
-    dhcp4_end = zkhandler.read(("network.ip4.dhcp_end", vni))
+    (
+        description,
+        nettype,
+        mtu,
+        domain,
+        name_servers,
+        ip6_network,
+        ip6_gateway,
+        dhcp6_flag,
+        ip4_network,
+        ip4_gateway,
+        dhcp4_flag,
+        dhcp4_start,
+        dhcp4_end,
+    ) = zkhandler.read_many(
+        [
+            ("network", vni),
+            ("network.type", vni),
+            ("network.mtu", vni),
+            ("network.domain", vni),
+            ("network.nameservers", vni),
+            ("network.ip6.network", vni),
+            ("network.ip6.gateway", vni),
+            ("network.ip6.dhcp", vni),
+            ("network.ip4.network", vni),
+            ("network.ip4.gateway", vni),
+            ("network.ip4.dhcp", vni),
+            ("network.ip4.dhcp_start", vni),
+            ("network.ip4.dhcp_end", vni),
+        ]
+    )

     # Construct a data structure to represent the data
     network_information = {

@@ -818,31 +836,45 @@ def getSRIOVVFInformation(zkhandler, node, vf):
     if not zkhandler.exists(("node.sriov.vf", node, "sriov_vf", vf)):
         return []

-    pf = zkhandler.read(("node.sriov.vf", node, "sriov_vf.pf", vf))
-    mtu = zkhandler.read(("node.sriov.vf", node, "sriov_vf.mtu", vf))
-    mac = zkhandler.read(("node.sriov.vf", node, "sriov_vf.mac", vf))
-    vlan_id = zkhandler.read(("node.sriov.vf", node, "sriov_vf.config.vlan_id", vf))
-    vlan_qos = zkhandler.read(("node.sriov.vf", node, "sriov_vf.config.vlan_qos", vf))
-    tx_rate_min = zkhandler.read(
-        ("node.sriov.vf", node, "sriov_vf.config.tx_rate_min", vf)
-    )
-    tx_rate_max = zkhandler.read(
-        ("node.sriov.vf", node, "sriov_vf.config.tx_rate_max", vf)
-    )
-    link_state = zkhandler.read(
-        ("node.sriov.vf", node, "sriov_vf.config.link_state", vf)
-    )
-    spoof_check = zkhandler.read(
-        ("node.sriov.vf", node, "sriov_vf.config.spoof_check", vf)
-    )
-    trust = zkhandler.read(("node.sriov.vf", node, "sriov_vf.config.trust", vf))
-    query_rss = zkhandler.read(("node.sriov.vf", node, "sriov_vf.config.query_rss", vf))
-    pci_domain = zkhandler.read(("node.sriov.vf", node, "sriov_vf.pci.domain", vf))
-    pci_bus = zkhandler.read(("node.sriov.vf", node, "sriov_vf.pci.bus", vf))
-    pci_slot = zkhandler.read(("node.sriov.vf", node, "sriov_vf.pci.slot", vf))
-    pci_function = zkhandler.read(("node.sriov.vf", node, "sriov_vf.pci.function", vf))
-    used = zkhandler.read(("node.sriov.vf", node, "sriov_vf.used", vf))
-    used_by_domain = zkhandler.read(("node.sriov.vf", node, "sriov_vf.used_by", vf))
+    (
+        pf,
+        mtu,
+        mac,
+        vlan_id,
+        vlan_qos,
+        tx_rate_min,
+        tx_rate_max,
+        link_state,
+        spoof_check,
+        trust,
+        query_rss,
+        pci_domain,
+        pci_bus,
+        pci_slot,
+        pci_function,
+        used,
+        used_by_domain,
+    ) = zkhandler.read_many(
+        [
+            ("node.sriov.vf", node, "sriov_vf.pf", vf),
+            ("node.sriov.vf", node, "sriov_vf.mtu", vf),
+            ("node.sriov.vf", node, "sriov_vf.mac", vf),
+            ("node.sriov.vf", node, "sriov_vf.config.vlan_id", vf),
+            ("node.sriov.vf", node, "sriov_vf.config.vlan_qos", vf),
+            ("node.sriov.vf", node, "sriov_vf.config.tx_rate_min", vf),
+            ("node.sriov.vf", node, "sriov_vf.config.tx_rate_max", vf),
+            ("node.sriov.vf", node, "sriov_vf.config.link_state", vf),
+            ("node.sriov.vf", node, "sriov_vf.config.spoof_check", vf),
+            ("node.sriov.vf", node, "sriov_vf.config.trust", vf),
+            ("node.sriov.vf", node, "sriov_vf.config.query_rss", vf),
+            ("node.sriov.vf", node, "sriov_vf.pci.domain", vf),
+            ("node.sriov.vf", node, "sriov_vf.pci.bus", vf),
+            ("node.sriov.vf", node, "sriov_vf.pci.slot", vf),
+            ("node.sriov.vf", node, "sriov_vf.pci.function", vf),
+            ("node.sriov.vf", node, "sriov_vf.used", vf),
+            ("node.sriov.vf", node, "sriov_vf.used_by", vf),
+        ]
+    )

     vf_information = {
         "phy": vf,
@@ -26,60 +26,49 @@ import json

 import daemon_lib.common as common


-def getNodeInformation(zkhandler, node_name):
-    """
-    Gather information about a node from the Zookeeper database and return a dict() containing it.
-    """
-    node_daemon_state = zkhandler.read(("node.state.daemon", node_name))
-    node_coordinator_state = zkhandler.read(("node.state.router", node_name))
-    node_domain_state = zkhandler.read(("node.state.domain", node_name))
-    node_static_data = zkhandler.read(("node.data.static", node_name)).split()
-    node_pvc_version = zkhandler.read(("node.data.pvc_version", node_name))
-    node_cpu_count = int(node_static_data[0])
-    node_kernel = node_static_data[1]
-    node_os = node_static_data[2]
-    node_arch = node_static_data[3]
-    node_vcpu_allocated = int(zkhandler.read(("node.vcpu.allocated", node_name)))
-    node_mem_total = int(zkhandler.read(("node.memory.total", node_name)))
-    node_mem_allocated = int(zkhandler.read(("node.memory.allocated", node_name)))
-    node_mem_provisioned = int(zkhandler.read(("node.memory.provisioned", node_name)))
-    node_mem_used = int(zkhandler.read(("node.memory.used", node_name)))
-    node_mem_free = int(zkhandler.read(("node.memory.free", node_name)))
-    node_load = float(zkhandler.read(("node.cpu.load", node_name)))
-    node_domains_count = int(
-        zkhandler.read(("node.count.provisioned_domains", node_name))
-    )
-    node_running_domains = zkhandler.read(("node.running_domains", node_name)).split()
-    try:
-        node_health = int(zkhandler.read(("node.monitoring.health", node_name)))
-    except Exception:
-        node_health = "N/A"
-    try:
-        node_health_plugins = zkhandler.read(
-            ("node.monitoring.plugins", node_name)
-        ).split()
-    except Exception:
-        node_health_plugins = list()
-
-    node_health_details = list()
+def getNodeHealthDetails(zkhandler, node_name, node_health_plugins):
+    plugin_reads = list()
     for plugin in node_health_plugins:
-        plugin_last_run = zkhandler.read(
-            ("node.monitoring.data", node_name, "monitoring_plugin.last_run", plugin)
-        )
-        plugin_health_delta = zkhandler.read(
-            ("node.monitoring.data", node_name, "monitoring_plugin.health_delta", plugin)
-        )
-        plugin_message = zkhandler.read(
-            ("node.monitoring.data", node_name, "monitoring_plugin.message", plugin)
-        )
-        plugin_data = zkhandler.read(
-            ("node.monitoring.data", node_name, "monitoring_plugin.data", plugin)
-        )
+        plugin_reads += [
+            (
+                "node.monitoring.data",
+                node_name,
+                "monitoring_plugin.last_run",
+                plugin,
+            ),
+            (
+                "node.monitoring.data",
+                node_name,
+                "monitoring_plugin.health_delta",
+                plugin,
+            ),
+            (
+                "node.monitoring.data",
+                node_name,
+                "monitoring_plugin.message",
+                plugin,
+            ),
+            (
+                "node.monitoring.data",
+                node_name,
+                "monitoring_plugin.data",
+                plugin,
+            ),
+        ]
+    all_plugin_data = list(zkhandler.read_many(plugin_reads))
+
+    node_health_details = list()
+    for pidx, plugin in enumerate(node_health_plugins):
+        # Split the large list of return values by the IDX of this plugin
+        # Each plugin result is 4 fields long
+        pos_start = pidx * 4
+        pos_end = pidx * 4 + 4
+        (
+            plugin_last_run,
+            plugin_health_delta,
+            plugin_message,
+            plugin_data,
+        ) = tuple(all_plugin_data[pos_start:pos_end])
         plugin_output = {
             "name": plugin,
             "last_run": int(plugin_last_run),

@@ -89,6 +78,82 @@ def getNodeInformation(zkhandler, node_name):
         }
         node_health_details.append(plugin_output)

+    return node_health_details
+
+
+def getNodeInformation(zkhandler, node_name):
+    """
+    Gather information about a node from the Zookeeper database and return a dict() containing it.
+    """
+
+    (
+        node_daemon_state,
+        node_coordinator_state,
+        node_domain_state,
+        node_pvc_version,
+        _node_static_data,
+        _node_vcpu_allocated,
+        _node_mem_total,
+        _node_mem_allocated,
+        _node_mem_provisioned,
+        _node_mem_used,
+        _node_mem_free,
+        _node_load,
+        _node_domains_count,
+        _node_running_domains,
+        _node_health,
+        _node_health_plugins,
+    ) = zkhandler.read_many(
+        [
+            ("node.state.daemon", node_name),
+            ("node.state.router", node_name),
+            ("node.state.domain", node_name),
+            ("node.data.pvc_version", node_name),
+            ("node.data.static", node_name),
+            ("node.vcpu.allocated", node_name),
+            ("node.memory.total", node_name),
+            ("node.memory.allocated", node_name),
+            ("node.memory.provisioned", node_name),
+            ("node.memory.used", node_name),
+            ("node.memory.free", node_name),
+            ("node.cpu.load", node_name),
+            ("node.count.provisioned_domains", node_name),
+            ("node.running_domains", node_name),
+            ("node.monitoring.health", node_name),
+            ("node.monitoring.plugins", node_name),
+        ]
+    )
+
+    node_static_data = _node_static_data.split()
+    node_cpu_count = int(node_static_data[0])
+    node_kernel = node_static_data[1]
+    node_os = node_static_data[2]
+    node_arch = node_static_data[3]
+
+    node_vcpu_allocated = int(_node_vcpu_allocated)
+    node_mem_total = int(_node_mem_total)
+    node_mem_allocated = int(_node_mem_allocated)
+    node_mem_provisioned = int(_node_mem_provisioned)
+    node_mem_used = int(_node_mem_used)
+    node_mem_free = int(_node_mem_free)
+    node_load = float(_node_load)
+    node_domains_count = int(_node_domains_count)
+    node_running_domains = _node_running_domains.split()
+
+    try:
+        node_health = int(_node_health)
+    except Exception:
+        node_health = "N/A"
+
+    try:
+        node_health_plugins = _node_health_plugins.split()
+    except Exception:
+        node_health_plugins = list()
+
+    node_health_details = getNodeHealthDetails(
+        zkhandler, node_name, node_health_plugins
+    )
+
     # Construct a data structure to represent the data
     node_information = {
         "name": node_name,
@@ -19,6 +19,7 @@
 #
 ###############################################################################

+import asyncio
 import os
 import time
 import uuid

@@ -239,10 +240,41 @@ class ZKHandler(object):
                 # This path is invalid; this is likely due to missing schema entries, so return None
                 return None

-            return self.zk_conn.get(path)[0].decode(self.encoding)
+            res = self.zk_conn.get(path)
+            return res[0].decode(self.encoding)
         except NoNodeError:
             return None

+    async def read_async(self, key):
+        """
+        Read data from a key asynchronously
+        """
+        try:
+            path = self.get_schema_path(key)
+            if path is None:
+                # This path is invalid; this is likely due to missing schema entries, so return None
+                return None
+
+            val = self.zk_conn.get_async(path)
+            data = val.get()
+            return data[0].decode(self.encoding)
+        except NoNodeError:
+            return None
+
+    async def _read_many(self, keys):
+        """
+        Async runner for read_many
+        """
+        res = await asyncio.gather(*(self.read_async(key) for key in keys))
+        return tuple(res)
+
+    def read_many(self, keys):
+        """
+        Read data from several keys, asynchronously. Returns a tuple of all key values once all
+        reads are complete.
+        """
+        return asyncio.run(self._read_many(keys))
+
     def write(self, kvpairs):
         """
         Create or update one or more keys' data
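Taken together, `read_many` turns N per-key `read()` calls into one batched pass: each key is dispatched via kazoo's `get_async`, the per-key coroutines are gathered with `asyncio.gather`, and the results come back as a tuple in the same order as the requested keys, which is what makes the positional unpacking and stride-slicing in the callers above safe. A minimal usage sketch mirroring those call sites (the node name `hv1` is illustrative, not from the diff):

```python
# One batched call replaces three sequential zkhandler.read() round trips;
# results are returned as a tuple in the same order as the requested keys.
(
    node_daemon_state,
    node_domain_state,
    node_pvc_version,
) = zkhandler.read_many(
    [
        ("node.state.daemon", "hv1"),
        ("node.state.domain", "hv1"),
        ("node.data.pvc_version", "hv1"),
    ]
)
```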
|
Before Width: | Height: | Size: 88 KiB |
Before Width: | Height: | Size: 41 KiB |
Before Width: | Height: | Size: 300 KiB |
Before Width: | Height: | Size: 42 KiB |
BIN
images/0-integrated-help.png
Normal file
After Width: | Height: | Size: 100 KiB |
BIN
images/1-connection-management.png
Normal file
After Width: | Height: | Size: 50 KiB |
BIN
images/10-provisioner.png
Normal file
After Width: | Height: | Size: 124 KiB |
BIN
images/2-cluster-details-and-output-formats.png
Normal file
After Width: | Height: | Size: 140 KiB |
BIN
images/3-node-information.png
Normal file
After Width: | Height: | Size: 97 KiB |
BIN
images/4-vm-information.png
Normal file
After Width: | Height: | Size: 94 KiB |
BIN
images/5-vm-details.png
Normal file
After Width: | Height: | Size: 126 KiB |
BIN
images/6-network-information.png
Normal file
After Width: | Height: | Size: 118 KiB |
BIN
images/7-storage-information.png
Normal file
After Width: | Height: | Size: 166 KiB |
BIN
images/8-vm-and-node-logs.png
Normal file
After Width: | Height: | Size: 177 KiB |
BIN
images/9-vm-and-worker-tasks.png
Normal file
After Width: | Height: | Size: 67 KiB |
Before Width: | Height: | Size: 49 KiB After Width: | Height: | Size: 49 KiB |