Compare commits

v0.9.85...ab0a1e0946 (12 commits)

Commits:
ab0a1e0946
7c116b2fbc
1023c55087
9235187c6f
0c94f1b4f8
44a4f0e1f7
5d53a3e529
35e22cb50f
a3171b666b
48e41d7b05
d6aecf195e
9329784010

README.md
@@ -1,5 +1,5 @@
 <p align="center">
-<img alt="Logo banner" src="docs/images/pvc_logo_black.png"/>
+<img alt="Logo banner" src="images/pvc_logo_black.png"/>
 <br/><br/>
 <a href="https://github.com/parallelvirtualcluster/pvc"><img alt="License" src="https://img.shields.io/github/license/parallelvirtualcluster/pvc"/></a>
 <a href="https://github.com/psf/black"><img alt="Code style: Black" src="https://img.shields.io/badge/code%20style-black-000000.svg"/></a>
@@ -23,37 +23,58 @@ Installation of PVC is accomplished by two main components: a [Node installer IS

 Just give it physical servers, and it will run your VMs without you having to think about it, all in just an hour or two of setup time.

 ## What is it based on?

 The core node and API daemons, as well as the CLI API client, are written in Python 3 and are fully Free Software (GNU GPL v3). In addition to these, PVC makes use of the following software tools to provide a holistic hyperconverged infrastructure solution:

 * Debian GNU/Linux as the base OS.
 * Linux KVM, QEMU, and Libvirt for VM management.
 * Linux `ip`, FRRouting, NFTables, DNSMasq, and PowerDNS for network management.
 * Ceph for storage management.
 * Apache Zookeeper for the primary cluster state database.
 * Patroni PostgreSQL manager for the secondary relation databases (DNS aggregation, Provisioner configuration).

 ## Getting Started

-To get started with PVC, please see the [About](https://docs.parallelvirtualcluster.org/en/latest/about/) page for general information about the project, and the [Getting Started](https://docs.parallelvirtualcluster.org/en/latest/getting-started/) page for details on configuring your first cluster.
+To get started with PVC, please see the [About](https://docs.parallelvirtualcluster.org/en/latest/about-pvc/) page for general information about the project, and the [Getting Started](https://docs.parallelvirtualcluster.org/en/latest/deployment/getting-started/) page for details on configuring your first cluster.

 ## Changelog

-View the changelog in [CHANGELOG.md](CHANGELOG.md).
+View the changelog in [CHANGELOG.md](CHANGELOG.md). **Please note that any breaking changes are announced here; ensure you read the changelog before upgrading!**

 ## Screenshots

-While PVC's API and internals aren't very screenshot-worthy, here is some example output of the CLI tool.
+These screenshots show some of the available functionality of the PVC system and CLI as of PVC v0.9.85.

-<p><img alt="Node listing" src="docs/images/pvc-nodes.png"/><br/><i>Listing the nodes in a cluster</i></p>
+<p><img alt="0. Integrated help" src="images/0-integrated-help.png"/><br/>
+<i>The CLI features an integrated, fully-featured help system to show details about every possible command.</i>
+</p>

-<p><img alt="Network listing" src="docs/images/pvc-networks.png"/><br/><i>Listing the networks in a cluster, showing 3 bridged and 1 IPv4-only managed networks</i></p>
+<p><img alt="1. Connection management" src="images/1-connection-management.png"/><br/>
+<i>A single CLI instance can manage multiple clusters, including a quick detail view, and will default to a "local" connection if an "/etc/pvc/pvc.conf" file is found; sensitive API keys are hidden by default.</i>
+</p>

-<p><img alt="VM listing and migration" src="docs/images/pvc-migration.png"/><br/><i>Listing a limited set of VMs and migrating one with status updates</i></p>
+<p><img alt="2. Cluster details and output formats" src="images/2-cluster-details-and-output-formats.png"/><br/>
+<i>PVC can show the key details of your cluster at a glance, including health, persistent fault events, and key resources; the CLI can output both in pretty human format and JSON for easier machine parsing in scripts.</i>
+</p>

-<p><img alt="Node logs" src="docs/images/pvc-nodelog.png"/><br/><i>Viewing the logs of a node (keepalives and VM [un]migration)</i></p>
+<p><img alt="3. Node information" src="images/3-node-information.png"/><br/>
+<i>PVC can show details about the nodes in the cluster, including their live health and resource utilization.</i>
+</p>

+<p><img alt="4. VM information" src="images/4-vm-information.png"/><br/>
+<i>PVC can show details about the VMs in the cluster, including their state, resource allocations</i>
+</p>

+<p><img alt="5. VM details" src="images/5-vm-details.png"/><br/>
+<i>In addition to the above basic details, PVC can also show extensive information about a running VM's devices and other resource utilization.</i>
+</p>

+<p><img alt="6. Network information" src="images/6-network-information.png"/><br/>
+<i>PVC has two major client network types, and ensures a consistent configuration of client networks across the entire cluster; managed networks can feature DHCP, DNS, firewall, and other functionality including DHCP reservations.</i>
+</p>

+<p><img alt="7. Storage information" src="images/7-storage-information.png"/><br/>
+<i>PVC provides a convenient abstracted view of the underlying Ceph system and can manage all core aspects of it.</i>
+</p>

+<p><img alt="8. VM and node logs" src="images/8-vm-and-node-logs.png"/><br/>
+<i>PVC can display logs from VM serial consoles (if properly configured) and nodes in-client to facilitate quick troubleshooting.</i>
+</p>

+<p><img alt="9. VM and worker tasks" src="images/9-vm-and-worker-tasks.png"/><br/>
+<i>PVC provides full VM lifecycle management, as well as long-running worker-based commands (in this example, clearing a VM's storage locks).</i>
+</p>

+<p><img alt="10. Provisioner" src="images/10-provisioner.png"/><br/>
+<i>PVC features an extensively customizable and configurable VM provisioner system, including EC2-compatible CloudInit support, allowing you to define flexible VM profiles and provision new VMs with a single command.</i>
+</p>
@@ -136,21 +136,10 @@ def cluster_metrics(zkhandler):
     if not status_retflag:
         return "Error: Status data threw error", 400

-    faults_retflag, faults_data = pvc_faults.get_list(zkhandler)
-    if not faults_retflag:
-        return "Error: Faults data threw error", 400
-
-    node_retflag, node_data = pvc_node.get_list(zkhandler)
-    if not node_retflag:
-        return "Error: Node data threw error", 400
-
-    vm_retflag, vm_data = pvc_vm.get_list(zkhandler)
-    if not vm_retflag:
-        return "Error: VM data threw error", 400
-
-    osd_retflag, osd_data = pvc_ceph.get_list_osd(zkhandler)
-    if not osd_retflag:
-        return "Error: OSD data threw error", 400
+    faults_data = status_data["detail"]["faults"]
+    node_data = status_data["detail"]["node"]
+    vm_data = status_data["detail"]["vm"]
+    osd_data = status_data["detail"]["osd"]

     output_lines = list()

@@ -237,7 +226,7 @@ def cluster_metrics(zkhandler):
     for state in set([s.split(",")[0] for s in pvc_common.ceph_osd_state_combinations]):
         osd_up_state_map[state] = 0
     for osd in osd_data:
-        if osd["stats"]["up"] > 0:
+        if osd["up"] == "up":
             osd_up_state_map["up"] += 1
         else:
             osd_up_state_map["down"] += 1

@@ -252,7 +241,7 @@ def cluster_metrics(zkhandler):
     for state in set([s.split(",")[1] for s in pvc_common.ceph_osd_state_combinations]):
         osd_in_state_map[state] = 0
     for osd in osd_data:
-        if osd["stats"]["in"] > 0:
+        if osd["in"] == "in":
            osd_in_state_map["in"] += 1
         else:
            osd_in_state_map["out"] += 1
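The change above drops four redundant `get_list` calls by reusing the `detail` sub-object that `getClusterInformation` now embeds in the status payload. A minimal sketch of the new counting logic, assuming only the `osd_data` shape shown in that payload (string `"up"`/`"in"` fields rather than the raw stats integers; the sample values are illustrative):

```python
# Minimal sketch of the OSD up-state counting above, assuming the new
# osd_data shape produced by getClusterInformation (illustrative values).
osd_data = [
    {"id": "0", "up": "up", "in": "in"},
    {"id": "1", "up": "down", "in": "out"},
    {"id": "2", "up": "up", "in": "in"},
]

osd_up_state_map = {"up": 0, "down": 0}
for osd in osd_data:
    if osd["up"] == "up":
        osd_up_state_map["up"] += 1
    else:
        osd_up_state_map["down"] += 1

print(osd_up_state_map)  # {'up': 2, 'down': 1}
```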
@@ -215,14 +215,26 @@ def getClusterOSDList(zkhandler):


 def getOSDInformation(zkhandler, osd_id):
     # Get the devices
-    osd_fsid = zkhandler.read(("osd.ofsid", osd_id))
-    osd_node = zkhandler.read(("osd.node", osd_id))
-    osd_device = zkhandler.read(("osd.device", osd_id))
-    osd_is_split = bool(strtobool(zkhandler.read(("osd.is_split", osd_id))))
-    osd_db_device = zkhandler.read(("osd.db_device", osd_id))
+    (
+        osd_fsid,
+        osd_node,
+        osd_device,
+        _osd_is_split,
+        osd_db_device,
+        osd_stats_raw,
+    ) = zkhandler.read_many(
+        [
+            ("osd.ofsid", osd_id),
+            ("osd.node", osd_id),
+            ("osd.device", osd_id),
+            ("osd.is_split", osd_id),
+            ("osd.db_device", osd_id),
+            ("osd.stats", osd_id),
+        ]
+    )
+
+    osd_is_split = bool(strtobool(_osd_is_split))
     # Parse the stats data
-    osd_stats_raw = zkhandler.read(("osd.stats", osd_id))
     osd_stats = dict(json.loads(osd_stats_raw))

     osd_information = {
@@ -23,10 +23,7 @@ from json import loads

 import daemon_lib.common as common
 import daemon_lib.faults as faults
-import daemon_lib.vm as pvc_vm
 import daemon_lib.node as pvc_node
-import daemon_lib.network as pvc_network
-import daemon_lib.ceph as pvc_ceph


 def set_maintenance(zkhandler, maint_state):
@@ -45,9 +42,7 @@ def set_maintenance(zkhandler, maint_state):
     return True, "Successfully set cluster in normal mode"


-def getClusterHealthFromFaults(zkhandler):
-    faults_list = faults.getAllFaults(zkhandler)
-
+def getClusterHealthFromFaults(zkhandler, faults_list):
     unacknowledged_faults = [fault for fault in faults_list if fault["status"] != "ack"]

     # Generate total cluster health numbers
@@ -217,20 +212,38 @@ def getClusterHealth(zkhandler, node_list, vm_list, ceph_osd_list):


 def getNodeHealth(zkhandler, node_list):
+    # Get the health state of all nodes
+    node_health_reads = list()
+    for node in node_list:
+        node_health_reads += [
+            ("node.monitoring.health", node),
+            ("node.monitoring.plugins", node),
+        ]
+    all_node_health_details = zkhandler.read_many(node_health_reads)
+
+    # Parse out the Node health details
     node_health = dict()
-    for index, node in enumerate(node_list):
+    for nidx, node in enumerate(node_list):
+        # Split the large list of return values by the IDX of this node
+        # Each node result is 2 fields long
+        pos_start = nidx * 2
+        pos_end = nidx * 2 + 2
+        node_health_value, node_health_plugins = tuple(
+            all_node_health_details[pos_start:pos_end]
+        )
+        node_health_details = pvc_node.getNodeHealthDetails(
+            zkhandler, node, node_health_plugins.split()
+        )
+
         node_health_messages = list()
-        node_health_value = node["health"]
-        for entry in node["health_details"]:
+        for entry in node_health_details:
             if entry["health_delta"] > 0:
                 node_health_messages.append(f"'{entry['name']}': {entry['message']}")

         node_health_entry = {
-            "health": node_health_value,
+            "health": int(node_health_value),
             "messages": node_health_messages,
         }

-        node_health[node["name"]] = node_health_entry
+        node_health[node] = node_health_entry

     return node_health
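The batched read above introduces the fixed-stride slicing convention that recurs throughout this changeset (2 fields per node here, 6 per fault and 4 per monitoring plugin below): `read_many` returns one value per requested key, in request order, so record *i* occupies the slice `[i * stride : i * stride + stride]`. A hypothetical generic helper, shown only to illustrate the pattern (not part of the diff):

```python
from typing import Iterator, Sequence, Tuple


def split_records(flat_results: Sequence, stride: int) -> Iterator[Tuple]:
    """Split a flat read_many() result into fixed-size per-record tuples.

    read_many() returns one value per requested key, in request order, so
    when each record contributed `stride` keys, record i occupies the
    slice [i * stride : i * stride + stride].
    """
    for pos_start in range(0, len(flat_results), stride):
        yield tuple(flat_results[pos_start : pos_start + stride])


# Two monitoring keys were requested per node above, so the stride is 2;
# the health values and plugin strings here are illustrative.
flat = ["100", "disk load net", "50", "disk load net"]
for node_health_value, node_health_plugins in split_records(flat, stride=2):
    print(int(node_health_value), node_health_plugins.split())
```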
@@ -239,78 +252,146 @@ def getClusterInformation(zkhandler):
     # Get cluster maintenance state
-    maintenance_state = zkhandler.read("base.config.maintenance")
-
-    # Get node information object list
-    retcode, node_list = pvc_node.get_list(zkhandler, None)
-
-    # Get primary node
-    primary_node = common.getPrimaryNode(zkhandler)
-
-    # Get PVC version of primary node
-    pvc_version = "0.0.0"
-    for node in node_list:
-        if node["name"] == primary_node:
-            pvc_version = node["pvc_version"]
-
-    # Get vm information object list
-    retcode, vm_list = pvc_vm.get_list(zkhandler, None, None, None, None)
-
-    # Get network information object list
-    retcode, network_list = pvc_network.get_list(zkhandler, None, None)
-
-    # Get storage information object list
-    retcode, ceph_osd_list = pvc_ceph.get_list_osd(zkhandler, None)
-    retcode, ceph_pool_list = pvc_ceph.get_list_pool(zkhandler, None)
-    retcode, ceph_volume_list = pvc_ceph.get_list_volume(zkhandler, None, None)
-    retcode, ceph_snapshot_list = pvc_ceph.get_list_snapshot(
-        zkhandler, None, None, None
-    )
+    maintenance_state, primary_node = zkhandler.read_many(
+        [
+            ("base.config.maintenance"),
+            ("base.config.primary_node"),
+        ]
+    )

-    # Determine, for each subsection, the total count
+    # Get PVC version of primary node
+    pvc_version = zkhandler.read(("node.data.pvc_version", primary_node))
+
+    # Get the list of Nodes
+    node_list = zkhandler.children("base.node")
     node_count = len(node_list)
-    vm_count = len(vm_list)
-    network_count = len(network_list)
-    ceph_osd_count = len(ceph_osd_list)
-    ceph_pool_count = len(ceph_pool_list)
-    ceph_volume_count = len(ceph_volume_list)
-    ceph_snapshot_count = len(ceph_snapshot_list)

-    # Format the Node states
+    # Get the daemon and domain states of all Nodes
+    node_state_reads = list()
+    for node in node_list:
+        node_state_reads += [
+            ("node.state.daemon", node),
+            ("node.state.domain", node),
+        ]
+    all_node_states = zkhandler.read_many(node_state_reads)
+
+    # Parse out the Node states
+    node_data = list()
     formatted_node_states = {"total": node_count}
-    for state in common.node_state_combinations:
-        state_count = 0
-        for node in node_list:
-            node_state = f"{node['daemon_state']},{node['domain_state']}"
-            if node_state == state:
-                state_count += 1
-        if state_count > 0:
-            formatted_node_states[state] = state_count
+    for nidx, node in enumerate(node_list):
+        # Split the large list of return values by the IDX of this node
+        # Each node result is 2 fields long
+        pos_start = nidx * 2
+        pos_end = nidx * 2 + 2
+        node_daemon_state, node_domain_state = tuple(all_node_states[pos_start:pos_end])
+        node_data.append(
+            {
+                "name": node,
+                "daemon_state": node_daemon_state,
+                "domain_state": node_domain_state,
+            }
+        )
+        node_state = f"{node_daemon_state},{node_domain_state}"
+        # Add to the count for this node's state
+        if node_state in common.node_state_combinations:
+            if formatted_node_states.get(node_state) is not None:
+                formatted_node_states[node_state] += 1
+            else:
+                formatted_node_states[node_state] = 1

-    # Format the VM states
+    # Get the list of VMs
+    vm_list = zkhandler.children("base.domain")
+    vm_count = len(vm_list)
+
+    # Get the states of all VMs
+    vm_state_reads = list()
+    for vm in vm_list:
+        vm_state_reads += [
+            ("domain", vm),
+            ("domain.state", vm),
+        ]
+    all_vm_states = zkhandler.read_many(vm_state_reads)
+
+    # Parse out the VM states
+    vm_data = list()
     formatted_vm_states = {"total": vm_count}
-    for state in common.vm_state_combinations:
-        state_count = 0
-        for vm in vm_list:
-            if vm["state"] == state:
-                state_count += 1
-        if state_count > 0:
-            formatted_vm_states[state] = state_count
+    for vidx, vm in enumerate(vm_list):
+        # Split the large list of return values by the IDX of this VM
+        # Each VM result is 2 fields long
+        pos_start = vidx * 2
+        pos_end = vidx * 2 + 2
+        vm_name, vm_state = tuple(all_vm_states[pos_start:pos_end])
+        vm_data.append(
+            {
+                "uuid": vm,
+                "name": vm_name,
+                "state": vm_state,
+            }
+        )
+        # Add to the count for this VM's state
+        if vm_state in common.vm_state_combinations:
+            if formatted_vm_states.get(vm_state) is not None:
+                formatted_vm_states[vm_state] += 1
+            else:
+                formatted_vm_states[vm_state] = 1

-    # Format the OSD states
+    # Get the list of Ceph OSDs
+    ceph_osd_list = zkhandler.children("base.osd")
+    ceph_osd_count = len(ceph_osd_list)
+
+    # Get the states of all OSDs ("stat" is not a typo since we're reading stats; states are in
+    # the stats JSON object)
+    osd_stat_reads = list()
+    for osd in ceph_osd_list:
+        osd_stat_reads += [("osd.stats", osd)]
+    all_osd_stats = zkhandler.read_many(osd_stat_reads)
+
+    # Parse out the OSD states
+    osd_data = list()
+    formatted_osd_states = {"total": ceph_osd_count}
     up_texts = {1: "up", 0: "down"}
     in_texts = {1: "in", 0: "out"}
-    formatted_osd_states = {"total": ceph_osd_count}
-    for state in common.ceph_osd_state_combinations:
-        state_count = 0
-        for ceph_osd in ceph_osd_list:
-            ceph_osd_state = f"{up_texts[ceph_osd['stats']['up']]},{in_texts[ceph_osd['stats']['in']]}"
-            if ceph_osd_state == state:
-                state_count += 1
-        if state_count > 0:
-            formatted_osd_states[state] = state_count
+    for oidx, osd in enumerate(ceph_osd_list):
+        # Split the large list of return values by the IDX of this OSD
+        # Each OSD result is 1 field long, so just use the IDX
+        _osd_stats = all_osd_stats[oidx]
+        # We have to load this JSON object and get our up/in states from it
+        osd_stats = loads(_osd_stats)
+        # Get our states
+        osd_up = up_texts[osd_stats["up"]]
+        osd_in = in_texts[osd_stats["in"]]
+        osd_data.append(
+            {
+                "id": osd,
+                "up": osd_up,
+                "in": osd_in,
+            }
+        )
+        osd_state = f"{osd_up},{osd_in}"
+        # Add to the count for this OSD's state
+        if osd_state in common.ceph_osd_state_combinations:
+            if formatted_osd_states.get(osd_state) is not None:
+                formatted_osd_states[osd_state] += 1
+            else:
+                formatted_osd_states[osd_state] = 1

+    # Get the list of Networks
+    network_list = zkhandler.children("base.network")
+    network_count = len(network_list)
+
+    # Get the list of Ceph pools
+    ceph_pool_list = zkhandler.children("base.pool")
+    ceph_pool_count = len(ceph_pool_list)
+
+    # Get the list of Ceph volumes
+    ceph_volume_list = zkhandler.children("base.volume")
+    ceph_volume_count = len(ceph_volume_list)
+
+    # Get the list of Ceph snapshots
+    ceph_snapshot_list = zkhandler.children("base.snapshot")
+    ceph_snapshot_count = len(ceph_snapshot_list)
+
+    # Get the list of faults
+    faults_data = faults.getAllFaults(zkhandler)

     # Format the status data
     cluster_information = {
-        "cluster_health": getClusterHealthFromFaults(zkhandler),
+        "cluster_health": getClusterHealthFromFaults(zkhandler, faults_data),
         "node_health": getNodeHealth(zkhandler, node_list),
         "maintenance": maintenance_state,
         "primary_node": primary_node,

@@ -323,6 +404,12 @@ def getClusterInformation(zkhandler):
         "pools": ceph_pool_count,
         "volumes": ceph_volume_count,
         "snapshots": ceph_snapshot_count,
+        "detail": {
+            "node": node_data,
+            "vm": vm_data,
+            "osd": osd_data,
+            "faults": faults_data,
+        },
     }

     return cluster_information
@@ -95,12 +95,24 @@ def getFault(zkhandler, fault_id):
         return None

     fault_id = fault_id
-    fault_last_time = zkhandler.read(("faults.last_time", fault_id))
-    fault_first_time = zkhandler.read(("faults.first_time", fault_id))
-    fault_ack_time = zkhandler.read(("faults.ack_time", fault_id))
-    fault_status = zkhandler.read(("faults.status", fault_id))
-    fault_delta = int(zkhandler.read(("faults.delta", fault_id)))
-    fault_message = zkhandler.read(("faults.message", fault_id))
+
+    (
+        fault_last_time,
+        fault_first_time,
+        fault_ack_time,
+        fault_status,
+        fault_delta,
+        fault_message,
+    ) = zkhandler.read_many(
+        [
+            ("faults.last_time", fault_id),
+            ("faults.first_time", fault_id),
+            ("faults.ack_time", fault_id),
+            ("faults.status", fault_id),
+            ("faults.delta", fault_id),
+            ("faults.message", fault_id),
+        ]
+    )

     # Acknowledged faults have a delta of 0
     if fault_ack_time != "":

@@ -112,7 +124,7 @@ def getFault(zkhandler, fault_id):
         "first_reported": fault_first_time,
         "acknowledged_at": fault_ack_time,
         "status": fault_status,
-        "health_delta": fault_delta,
+        "health_delta": int(fault_delta),
         "message": fault_message,
     }

@@ -126,11 +138,42 @@ def getAllFaults(zkhandler, sort_key="last_reported"):

     all_faults = zkhandler.children(("base.faults"))

-    faults_detail = list()
-
+    faults_reads = list()
     for fault_id in all_faults:
-        fault_detail = getFault(zkhandler, fault_id)
-        faults_detail.append(fault_detail)
+        faults_reads += [
+            ("faults.last_time", fault_id),
+            ("faults.first_time", fault_id),
+            ("faults.ack_time", fault_id),
+            ("faults.status", fault_id),
+            ("faults.delta", fault_id),
+            ("faults.message", fault_id),
+        ]
+    all_faults_data = list(zkhandler.read_many(faults_reads))
+
+    faults_detail = list()
+    for fidx, fault_id in enumerate(all_faults):
+        # Split the large list of return values by the IDX of this fault
+        # Each fault result is 6 fields long
+        pos_start = fidx * 6
+        pos_end = fidx * 6 + 6
+        (
+            fault_last_time,
+            fault_first_time,
+            fault_ack_time,
+            fault_status,
+            fault_delta,
+            fault_message,
+        ) = tuple(all_faults_data[pos_start:pos_end])
+        fault_output = {
+            "id": fault_id,
+            "last_reported": fault_last_time,
+            "first_reported": fault_first_time,
+            "acknowledged_at": fault_ack_time,
+            "status": fault_status,
+            "health_delta": int(fault_delta),
+            "message": fault_message,
+        }
+        faults_detail.append(fault_output)

     sorted_faults = sorted(faults_detail, key=lambda x: x[sort_key])
     # Sort newest-first for time-based sorts
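The hunk cuts off at the sorting comment; presumably (the following lines are not part of this diff) the time-valued keys are then ordered newest-first, which in Python means re-sorting with `reverse=True`. A hedged sketch of that step:

```python
# Hypothetical continuation of getAllFaults (not shown in the diff above):
# time-valued sort keys order newest-first, everything else ascending.
time_sort_keys = ["last_reported", "first_reported", "acknowledged_at"]
if sort_key in time_sort_keys:
    sorted_faults = sorted(faults_detail, key=lambda x: x[sort_key], reverse=True)
else:
    sorted_faults = sorted(faults_detail, key=lambda x: x[sort_key])
```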
@@ -142,19 +142,37 @@ def getNetworkACLs(zkhandler, vni, _direction):


 def getNetworkInformation(zkhandler, vni):
-    description = zkhandler.read(("network", vni))
-    nettype = zkhandler.read(("network.type", vni))
-    mtu = zkhandler.read(("network.mtu", vni))
-    domain = zkhandler.read(("network.domain", vni))
-    name_servers = zkhandler.read(("network.nameservers", vni))
-    ip6_network = zkhandler.read(("network.ip6.network", vni))
-    ip6_gateway = zkhandler.read(("network.ip6.gateway", vni))
-    dhcp6_flag = zkhandler.read(("network.ip6.dhcp", vni))
-    ip4_network = zkhandler.read(("network.ip4.network", vni))
-    ip4_gateway = zkhandler.read(("network.ip4.gateway", vni))
-    dhcp4_flag = zkhandler.read(("network.ip4.dhcp", vni))
-    dhcp4_start = zkhandler.read(("network.ip4.dhcp_start", vni))
-    dhcp4_end = zkhandler.read(("network.ip4.dhcp_end", vni))
+    (
+        description,
+        nettype,
+        mtu,
+        domain,
+        name_servers,
+        ip6_network,
+        ip6_gateway,
+        dhcp6_flag,
+        ip4_network,
+        ip4_gateway,
+        dhcp4_flag,
+        dhcp4_start,
+        dhcp4_end,
+    ) = zkhandler.read_many(
+        [
+            ("network", vni),
+            ("network.type", vni),
+            ("network.mtu", vni),
+            ("network.domain", vni),
+            ("network.nameservers", vni),
+            ("network.ip6.network", vni),
+            ("network.ip6.gateway", vni),
+            ("network.ip6.dhcp", vni),
+            ("network.ip4.network", vni),
+            ("network.ip4.gateway", vni),
+            ("network.ip4.dhcp", vni),
+            ("network.ip4.dhcp_start", vni),
+            ("network.ip4.dhcp_end", vni),
+        ]
+    )

     # Construct a data structure to represent the data
     network_information = {

@@ -818,31 +836,45 @@ def getSRIOVVFInformation(zkhandler, node, vf):
     if not zkhandler.exists(("node.sriov.vf", node, "sriov_vf", vf)):
         return []

-    pf = zkhandler.read(("node.sriov.vf", node, "sriov_vf.pf", vf))
-    mtu = zkhandler.read(("node.sriov.vf", node, "sriov_vf.mtu", vf))
-    mac = zkhandler.read(("node.sriov.vf", node, "sriov_vf.mac", vf))
-    vlan_id = zkhandler.read(("node.sriov.vf", node, "sriov_vf.config.vlan_id", vf))
-    vlan_qos = zkhandler.read(("node.sriov.vf", node, "sriov_vf.config.vlan_qos", vf))
-    tx_rate_min = zkhandler.read(
-        ("node.sriov.vf", node, "sriov_vf.config.tx_rate_min", vf)
-    )
-    tx_rate_max = zkhandler.read(
-        ("node.sriov.vf", node, "sriov_vf.config.tx_rate_max", vf)
-    )
-    link_state = zkhandler.read(
-        ("node.sriov.vf", node, "sriov_vf.config.link_state", vf)
-    )
-    spoof_check = zkhandler.read(
-        ("node.sriov.vf", node, "sriov_vf.config.spoof_check", vf)
-    )
-    trust = zkhandler.read(("node.sriov.vf", node, "sriov_vf.config.trust", vf))
-    query_rss = zkhandler.read(("node.sriov.vf", node, "sriov_vf.config.query_rss", vf))
-    pci_domain = zkhandler.read(("node.sriov.vf", node, "sriov_vf.pci.domain", vf))
-    pci_bus = zkhandler.read(("node.sriov.vf", node, "sriov_vf.pci.bus", vf))
-    pci_slot = zkhandler.read(("node.sriov.vf", node, "sriov_vf.pci.slot", vf))
-    pci_function = zkhandler.read(("node.sriov.vf", node, "sriov_vf.pci.function", vf))
-    used = zkhandler.read(("node.sriov.vf", node, "sriov_vf.used", vf))
-    used_by_domain = zkhandler.read(("node.sriov.vf", node, "sriov_vf.used_by", vf))
+    (
+        pf,
+        mtu,
+        mac,
+        vlan_id,
+        vlan_qos,
+        tx_rate_min,
+        tx_rate_max,
+        link_state,
+        spoof_check,
+        trust,
+        query_rss,
+        pci_domain,
+        pci_bus,
+        pci_slot,
+        pci_function,
+        used,
+        used_by_domain,
+    ) = zkhandler.read_many(
+        [
+            ("node.sriov.vf", node, "sriov_vf.pf", vf),
+            ("node.sriov.vf", node, "sriov_vf.mtu", vf),
+            ("node.sriov.vf", node, "sriov_vf.mac", vf),
+            ("node.sriov.vf", node, "sriov_vf.config.vlan_id", vf),
+            ("node.sriov.vf", node, "sriov_vf.config.vlan_qos", vf),
+            ("node.sriov.vf", node, "sriov_vf.config.tx_rate_min", vf),
+            ("node.sriov.vf", node, "sriov_vf.config.tx_rate_max", vf),
+            ("node.sriov.vf", node, "sriov_vf.config.link_state", vf),
+            ("node.sriov.vf", node, "sriov_vf.config.spoof_check", vf),
+            ("node.sriov.vf", node, "sriov_vf.config.trust", vf),
+            ("node.sriov.vf", node, "sriov_vf.config.query_rss", vf),
+            ("node.sriov.vf", node, "sriov_vf.pci.domain", vf),
+            ("node.sriov.vf", node, "sriov_vf.pci.bus", vf),
+            ("node.sriov.vf", node, "sriov_vf.pci.slot", vf),
+            ("node.sriov.vf", node, "sriov_vf.pci.function", vf),
+            ("node.sriov.vf", node, "sriov_vf.used", vf),
+            ("node.sriov.vf", node, "sriov_vf.used_by", vf),
+        ]
+    )

     vf_information = {
         "phy": vf,
@@ -26,60 +26,49 @@ import json

 import daemon_lib.common as common


-def getNodeInformation(zkhandler, node_name):
-    """
-    Gather information about a node from the Zookeeper database and return a dict() containing it.
-    """
-    node_daemon_state = zkhandler.read(("node.state.daemon", node_name))
-    node_coordinator_state = zkhandler.read(("node.state.router", node_name))
-    node_domain_state = zkhandler.read(("node.state.domain", node_name))
-    node_static_data = zkhandler.read(("node.data.static", node_name)).split()
-    node_pvc_version = zkhandler.read(("node.data.pvc_version", node_name))
-    node_cpu_count = int(node_static_data[0])
-    node_kernel = node_static_data[1]
-    node_os = node_static_data[2]
-    node_arch = node_static_data[3]
-    node_vcpu_allocated = int(zkhandler.read(("node.vcpu.allocated", node_name)))
-    node_mem_total = int(zkhandler.read(("node.memory.total", node_name)))
-    node_mem_allocated = int(zkhandler.read(("node.memory.allocated", node_name)))
-    node_mem_provisioned = int(zkhandler.read(("node.memory.provisioned", node_name)))
-    node_mem_used = int(zkhandler.read(("node.memory.used", node_name)))
-    node_mem_free = int(zkhandler.read(("node.memory.free", node_name)))
-    node_load = float(zkhandler.read(("node.cpu.load", node_name)))
-    node_domains_count = int(
-        zkhandler.read(("node.count.provisioned_domains", node_name))
-    )
-    node_running_domains = zkhandler.read(("node.running_domains", node_name)).split()
-    try:
-        node_health = int(zkhandler.read(("node.monitoring.health", node_name)))
-    except Exception:
-        node_health = "N/A"
-    try:
-        node_health_plugins = zkhandler.read(
-            ("node.monitoring.plugins", node_name)
-        ).split()
-    except Exception:
-        node_health_plugins = list()
-
-    node_health_details = list()
+def getNodeHealthDetails(zkhandler, node_name, node_health_plugins):
+    plugin_reads = list()
     for plugin in node_health_plugins:
-        plugin_last_run = zkhandler.read(
-            ("node.monitoring.data", node_name, "monitoring_plugin.last_run", plugin)
-        )
-        plugin_health_delta = zkhandler.read(
-            ("node.monitoring.data", node_name, "monitoring_plugin.health_delta", plugin)
-        )
-        plugin_message = zkhandler.read(
-            ("node.monitoring.data", node_name, "monitoring_plugin.message", plugin)
-        )
-        plugin_data = zkhandler.read(
-            ("node.monitoring.data", node_name, "monitoring_plugin.data", plugin)
-        )
+        plugin_reads += [
+            (
+                "node.monitoring.data",
+                node_name,
+                "monitoring_plugin.last_run",
+                plugin,
+            ),
+            (
+                "node.monitoring.data",
+                node_name,
+                "monitoring_plugin.health_delta",
+                plugin,
+            ),
+            (
+                "node.monitoring.data",
+                node_name,
+                "monitoring_plugin.message",
+                plugin,
+            ),
+            (
+                "node.monitoring.data",
+                node_name,
+                "monitoring_plugin.data",
+                plugin,
+            ),
+        ]
+    all_plugin_data = list(zkhandler.read_many(plugin_reads))
+
+    node_health_details = list()
+    for pidx, plugin in enumerate(node_health_plugins):
+        # Split the large list of return values by the IDX of this plugin
+        # Each plugin result is 4 fields long
+        pos_start = pidx * 4
+        pos_end = pidx * 4 + 4
+        (
+            plugin_last_run,
+            plugin_health_delta,
+            plugin_message,
+            plugin_data,
+        ) = tuple(all_plugin_data[pos_start:pos_end])
         plugin_output = {
             "name": plugin,
             "last_run": int(plugin_last_run),

@@ -89,6 +78,82 @@ def getNodeInformation(zkhandler, node_name):
         }
         node_health_details.append(plugin_output)

+    return node_health_details
+
+
+def getNodeInformation(zkhandler, node_name):
+    """
+    Gather information about a node from the Zookeeper database and return a dict() containing it.
+    """
+
+    (
+        node_daemon_state,
+        node_coordinator_state,
+        node_domain_state,
+        node_pvc_version,
+        _node_static_data,
+        _node_vcpu_allocated,
+        _node_mem_total,
+        _node_mem_allocated,
+        _node_mem_provisioned,
+        _node_mem_used,
+        _node_mem_free,
+        _node_load,
+        _node_domains_count,
+        _node_running_domains,
+        _node_health,
+        _node_health_plugins,
+    ) = zkhandler.read_many(
+        [
+            ("node.state.daemon", node_name),
+            ("node.state.router", node_name),
+            ("node.state.domain", node_name),
+            ("node.data.pvc_version", node_name),
+            ("node.data.static", node_name),
+            ("node.vcpu.allocated", node_name),
+            ("node.memory.total", node_name),
+            ("node.memory.allocated", node_name),
+            ("node.memory.provisioned", node_name),
+            ("node.memory.used", node_name),
+            ("node.memory.free", node_name),
+            ("node.cpu.load", node_name),
+            ("node.count.provisioned_domains", node_name),
+            ("node.running_domains", node_name),
+            ("node.monitoring.health", node_name),
+            ("node.monitoring.plugins", node_name),
+        ]
+    )
+
+    node_static_data = _node_static_data.split()
+    node_cpu_count = int(node_static_data[0])
+    node_kernel = node_static_data[1]
+    node_os = node_static_data[2]
+    node_arch = node_static_data[3]
+
+    node_vcpu_allocated = int(_node_vcpu_allocated)
+    node_mem_total = int(_node_mem_total)
+    node_mem_allocated = int(_node_mem_allocated)
+    node_mem_provisioned = int(_node_mem_provisioned)
+    node_mem_used = int(_node_mem_used)
+    node_mem_free = int(_node_mem_free)
+    node_load = float(_node_load)
+    node_domains_count = int(_node_domains_count)
+    node_running_domains = _node_running_domains.split()
+
+    try:
+        node_health = int(_node_health)
+    except Exception:
+        node_health = "N/A"
+
+    try:
+        node_health_plugins = _node_health_plugins.split()
+    except Exception:
+        node_health_plugins = list()
+
+    node_health_details = getNodeHealthDetails(
+        zkhandler, node_name, node_health_plugins
+    )
+
     # Construct a data structure to represent the data
     node_information = {
         "name": node_name,
@@ -19,6 +19,7 @@
 #
 ###############################################################################

+import asyncio
 import os
 import time
 import uuid

@@ -239,10 +240,41 @@ class ZKHandler(object):
                 # This path is invalid; this is likely due to missing schema entries, so return None
                 return None

-            return self.zk_conn.get(path)[0].decode(self.encoding)
+            res = self.zk_conn.get(path)
+            return res[0].decode(self.encoding)
         except NoNodeError:
             return None

+    async def read_async(self, key):
+        """
+        Read data from a key asynchronously
+        """
+        try:
+            path = self.get_schema_path(key)
+            if path is None:
+                # This path is invalid; this is likely due to missing schema entries, so return None
+                return None
+
+            val = self.zk_conn.get_async(path)
+            data = val.get()
+            return data[0].decode(self.encoding)
+        except NoNodeError:
+            return None
+
+    async def _read_many(self, keys):
+        """
+        Async runner for read_many
+        """
+        res = await asyncio.gather(*(self.read_async(key) for key in keys))
+        return tuple(res)
+
+    def read_many(self, keys):
+        """
+        Read data from several keys, asynchronously. Returns a tuple of all key values once all
+        reads are complete.
+        """
+        return asyncio.run(self._read_many(keys))
+
     def write(self, kvpairs):
         """
         Create or update one or more keys' data
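Taken together, `read_many` turns N per-key `read()` calls into one batched pass: each key is dispatched via kazoo's `get_async`, the per-key coroutines are gathered with `asyncio.gather`, and the results come back as a tuple in the same order as the requested keys, which is what makes the positional unpacking and stride-slicing in the callers above safe. A minimal usage sketch mirroring those call sites (the node name `hv1` is illustrative, not from the diff):

```python
# One batched call replaces three sequential zkhandler.read() round trips;
# results are returned as a tuple in the same order as the requested keys.
(
    node_daemon_state,
    node_domain_state,
    node_pvc_version,
) = zkhandler.read_many(
    [
        ("node.state.daemon", "hv1"),
        ("node.state.domain", "hv1"),
        ("node.data.pvc_version", "hv1"),
    ]
)
```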
|
Before Width: | Height: | Size: 88 KiB |
Before Width: | Height: | Size: 41 KiB |
Before Width: | Height: | Size: 300 KiB |
Before Width: | Height: | Size: 42 KiB |
BIN
images/0-integrated-help.png
Normal file
After Width: | Height: | Size: 100 KiB |
BIN
images/1-connection-management.png
Normal file
After Width: | Height: | Size: 50 KiB |
BIN
images/10-provisioner.png
Normal file
After Width: | Height: | Size: 124 KiB |
BIN
images/2-cluster-details-and-output-formats.png
Normal file
After Width: | Height: | Size: 140 KiB |
BIN
images/3-node-information.png
Normal file
After Width: | Height: | Size: 97 KiB |
BIN
images/4-vm-information.png
Normal file
After Width: | Height: | Size: 94 KiB |
BIN
images/5-vm-details.png
Normal file
After Width: | Height: | Size: 126 KiB |
BIN
images/6-network-information.png
Normal file
After Width: | Height: | Size: 118 KiB |
BIN
images/7-storage-information.png
Normal file
After Width: | Height: | Size: 166 KiB |
BIN
images/8-vm-and-node-logs.png
Normal file
After Width: | Height: | Size: 177 KiB |
BIN
images/9-vm-and-worker-tasks.png
Normal file
After Width: | Height: | Size: 67 KiB |
Before Width: | Height: | Size: 49 KiB After Width: | Height: | Size: 49 KiB |