Compare commits

11 Commits

Author SHA1 Message Date
35c82b5249 Bump version to 0.9.102 2024-10-17 10:48:31 -04:00
e80b797e3a Add missing sorter for detail parser 2024-10-17 10:09:49 -04:00
7c8c71dff7 Improve handling of local connections in CLI
1. Ensure the local connection is actually always present if it exists,
and stored in the store file.

2. Remove any invalid "local" store entries if present (i.e.
pvcapid.yaml entries from legacy versions).

3. Order the connection lists such that "local" is always first.

4. Improve pretty list output format such that all fields are wider if
needed
2024-10-17 09:56:54 -04:00
861fef91e3 Add modification of Monitor hosts on XML import
Missing this means clusters with different storage hosts would fail to
start silently. Ensure these are updated like the secret UUID is as
well.
2024-10-16 16:00:54 -04:00
d1fcac1f0a Bump version to 0.9.101 2024-10-15 11:39:11 -04:00
6ace2ebf6a Set expected PVC version for mirroring 2024-10-15 11:31:50 -04:00
962fba7621 Bump up startup waits slightly
Ensures there's more time for daemons (specifically Zookeeper) to start
up and synchronize between nodes.
2024-10-15 11:10:23 -04:00
49bf51da38 Fix indentation of previous fix 2024-10-15 10:57:33 -04:00
1293e8ae7e Fix bugs in lock freeing function
1. The destination state on an error was invalid; should be "stop".

2. If a lock was listed but removing it failed (because it was already
cleared somehow), this would error. In turn this would cause the VM to
not migrate and be left in an undefined state. Fix that when unlocking
is forced.
2024-10-15 10:43:52 -04:00
ae2cf8a070 Add some time for Zookeeper to synchronize 2024-10-15 10:43:44 -04:00
ab5bd3c57d Fix handling of invalid nets in list
Ensure we add the difference in length between the visual output and the
ANSI-coded output to avoid the format handler mishandling the length.
2024-10-14 12:51:02 -04:00
17 changed files with 126 additions and 34 deletions

@@ -1 +1 @@
-0.9.100
+0.9.102

@@ -1,5 +1,34 @@
 ## PVC Changelog
+###### [v0.9.102](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.102)
+* [API Daemon] Ensures that received config snapshots update storage hosts in addition to secret UUIDs
+* [CLI Client] Fixes several bugs around local connection handling and connection listings
+###### [v0.9.101](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.101)
+**New Feature**: Adds VM snapshot sending (`vm snapshot send`), VM mirroring (`vm mirror create`), and (offline) mirror promotion (`vm mirror promote`). Permits transferring VM snapshots to remote clusters, individually or repeatedly, and promoting them to active status, for disaster recovery and migration between clusters.
+**Breaking Change**: Migrates the API daemon into Gunicorn when in production mode. Permits more scalable and performant operation of the API. **Requires additional dependency packages on all coordinator nodes** (`gunicorn`, `python3-gunicorn`, `python3-setuptools`); upgrade via `pvc-ansible` is strongly recommended.
+**Enhancement**: Provides whole cluster utilization stats in the cluster status data. Permits better observability into the overall resource utilization of the cluster.
+**Enhancement**: Adds a new storage benchmark format (v2) which includes additional resource utilization statistics. This allows for better evaluation of storage performance impact on the cluster as a whole. The updated format also permits arbitrary benchmark job names for easier parsing and tracking.
+* [API Daemon] Allows scanning of new volumes added manually via other commands
+* [API Daemon/CLI Client] Adds whole cluster utilization statistics to cluster status
+* [API Daemon] Moves production API execution into Gunicorn
+* [API Daemon] Adds a new storage benchmark format (v2) with additional resource tracking
+* [API Daemon] Adds support for named storage benchmark jobs
+* [API Daemon] Fixes a bug in OSD creation which would create `split` OSDs if `--osd-count` was set to 1
+* [API Daemon] Adds support for the `mirror` VM state used by snapshot mirrors
+* [CLI Client] Fixes several output display bugs in various commands and in Worker task outputs
+* [CLI Client] Improves and shrinks the status progress bar output to support longer messages
+* [API Daemon] Adds support for sending snapshots to remote clusters
+* [API Daemon] Adds support for updating and promoting snapshot mirrors to remote clusters
+* [Node Daemon] Improves timeouts during primary/secondary coordinator transitions to avoid deadlocks
+* [Node Daemon] Improves timeouts during keepalive updates to avoid deadlocks
+* [Node Daemon] Refactors fencing thread structure to ensure a single fencing task per cluster and sequential node fences to avoid potential anomalies (e.g. fencing 2 nodes simultaneously)
+* [Node Daemon] Fixes a bug in fencing if VM locks were already freed, leaving VMs in an invalid state
+* [Node Daemon] Increases the wait time during system startup to ensure Zookeeper has more time to synchronize
 ###### [v0.9.100](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.100)
 * [API Daemon] Improves the handling of "detect:" disk strings on newer systems by leveraging the "nvme" command

@@ -1438,15 +1438,7 @@ def vm_snapshot_receive_block_createsnap(zkhandler, pool, volume, snapshot):
 @ZKConnection(config)
 def vm_snapshot_receive_config(zkhandler, snapshot, vm_config, source_snapshot=None):
     """
-    Receive a VM configuration from a remote system
-    This function requires some explanation.
-    We get a full JSON dump of the VM configuration as provided by `pvc vm info`. This contains all the information we
-    reasonably need to replicate the VM at the given snapshot, including metainformation.
-    First, we need to determine if this is an incremental or full send. If it's full, and the VM already exists,
-    this is an issue and we have to error. But this should have already happened with the RBD volumes.
+    Receive a VM configuration snapshot from a remote system, and modify it to work on our system
     """
     def parse_unified_diff(diff_text, original_text):
@@ -1503,13 +1495,28 @@ def vm_snapshot_receive_config(zkhandler, snapshot, vm_config, source_snapshot=N
     vm_xml = vm_config["xml"]
     vm_xml_diff = "\n".join(current_snapshot["xml_diff_lines"])
     snapshot_vm_xml = parse_unified_diff(vm_xml_diff, vm_xml)
+    xml_data = etree.fromstring(snapshot_vm_xml)
     # Replace the Ceph storage secret UUID with this cluster's
     our_ceph_secret_uuid = config["ceph_secret_uuid"]
-    xml_data = etree.fromstring(snapshot_vm_xml)
     ceph_secrets = xml_data.xpath("//secret[@type='ceph']")
     for ceph_secret in ceph_secrets:
         ceph_secret.set("uuid", our_ceph_secret_uuid)
+    # Replace the Ceph source hosts with this cluster's
+    our_ceph_storage_hosts = config["storage_hosts"]
+    our_ceph_storage_port = str(config["ceph_monitor_port"])
+    ceph_sources = xml_data.xpath("//source[@protocol='rbd']")
+    for ceph_source in ceph_sources:
+        for host in ceph_source.xpath("host"):
+            ceph_source.remove(host)
+        for ceph_storage_host in our_ceph_storage_hosts:
+            new_host = etree.Element("host")
+            new_host.set("name", ceph_storage_host)
+            new_host.set("port", our_ceph_storage_port)
+            ceph_source.append(new_host)
+    # Regenerate the VM XML
     snapshot_vm_xml = etree.tostring(xml_data, pretty_print=True).decode("utf8")
     if (
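
Review note: the host replacement in this hunk can be tried in isolation. The following is a minimal sketch, assuming the standard library's `xml.etree.ElementTree` in place of the `lxml` calls used in the diff; the function name, the sample XML and the host names are hypothetical.

```python
import xml.etree.ElementTree as ET


def replace_rbd_hosts(domain_xml, storage_hosts, monitor_port):
    """Point every RBD <source> element at the given monitor hosts."""
    root = ET.fromstring(domain_xml)
    for source in root.findall(".//source[@protocol='rbd']"):
        # Drop the <host> entries inherited from the sending cluster
        for host in source.findall("host"):
            source.remove(host)
        # Add one <host> entry per local storage host
        for name in storage_hosts:
            ET.SubElement(source, "host", {"name": name, "port": str(monitor_port)})
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample input; real libvirt domain XML carries many more elements
SAMPLE_XML = """<domain><devices><disk type="network">
<source protocol="rbd" name="vms/test_disk0"><host name="old1.storage" port="6789"/></source>
</disk></devices></domain>"""

updated_xml = replace_rbd_hosts(SAMPLE_XML, ["hv1.storage.local", "hv2.storage.local"], 6789)
```

All existing `<host>` children are removed before the new ones are appended, so the result does not depend on how many monitors the source cluster listed.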

@@ -905,7 +905,7 @@ def cli_connection_list_format_pretty(CLI_CONFIG, data):
     # Parse each connection and adjust field lengths
     for connection in data:
         for field, length in [(f, fields[f]["length"]) for f in fields]:
-            _length = len(str(connection[field]))
+            _length = len(str(connection[field])) + 1
             if _length > length:
                 length = len(str(connection[field])) + 1
@@ -1005,7 +1005,7 @@ def cli_connection_detail_format_pretty(CLI_CONFIG, data):
     # Parse each connection and adjust field lengths
     for connection in data:
         for field, length in [(f, fields[f]["length"]) for f in fields]:
-            _length = len(str(connection[field]))
+            _length = len(str(connection[field])) + 1
             if _length > length:
                 length = len(str(connection[field])) + 1

@@ -167,9 +167,17 @@ def get_store(store_path):
     with open(store_file) as fh:
         try:
             store_data = jload(fh)
-            return store_data
         except Exception:
-            return dict()
+            store_data = dict()
+    if path.exists(DEFAULT_STORE_DATA["cfgfile"]):
+        if store_data.get("local", None) != DEFAULT_STORE_DATA:
+            del store_data["local"]
+        if "local" not in store_data.keys():
+            store_data["local"] = DEFAULT_STORE_DATA
+            update_store(store_path, store_data)
+    return store_data
 def update_store(store_path, store_data):
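
Review note: as this hunk reads, `del store_data["local"]` looks like it would raise `KeyError` when the store has no "local" entry at all, because `.get("local", None)` then returns `None`, which differs from `DEFAULT_STORE_DATA`. A possible way around that is sketched below, assuming a hypothetical helper `ensure_local` and placeholder default data (the real `DEFAULT_STORE_DATA` values are not shown in the diff); the `path.exists` check and the write to disk are left out.

```python
# Hypothetical stand-in for DEFAULT_STORE_DATA
DEFAULT_LOCAL = {"cfgfile": "/path/to/local.conf"}


def ensure_local(store_data, default=DEFAULT_LOCAL):
    """Return (store_data, changed) with a valid "local" entry guaranteed."""
    changed = False
    if store_data.get("local") != default:
        # pop() with a default tolerates a missing key, unlike del
        store_data.pop("local", None)
        store_data["local"] = default
        changed = True
    return store_data, changed


fresh, fresh_changed = ensure_local({})
stale, stale_changed = ensure_local({"local": {"stale": 1}, "other": {}})
valid, valid_changed = ensure_local({"local": dict(DEFAULT_LOCAL)})
```

The `changed` flag would let a caller decide whether the store file needs rewriting.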

@@ -68,7 +68,8 @@ def cli_connection_list_parser(connections_config, show_keys_flag):
             }
         )
-    return connections_data
+    # Return, ensuring local is always first
+    return sorted(connections_data, key=lambda x: (x.get("name") != "local"))
 def cli_connection_detail_parser(connections_config):
@@ -121,4 +122,5 @@ def cli_connection_detail_parser(connections_config):
             }
         )
-    return connections_data
+    # Return, ensuring local is always first
+    return sorted(connections_data, key=lambda x: (x.get("name") != "local"))
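
Review note: the sort key here is a boolean, and `False` sorts before `True`, so the "local" entry moves to the front. Python's sort is stable, so the other entries keep their original relative order. A small illustration with hypothetical connection names:

```python
# Hypothetical connection entries; only the "name" key matters for the ordering
connections = [
    {"name": "prod"},
    {"name": "local"},
    {"name": "dev"},
]

# The key is False for "local" and True for everything else
ordered = sorted(connections, key=lambda c: c.get("name") != "local")
names = [c["name"] for c in ordered]
```

Here `names` comes out as `["local", "prod", "dev"]`: "prod" still precedes "dev" because ties are left in input order.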

@@ -2402,8 +2402,10 @@ def format_list(config, vm_list):
             else:
                 net_invalid_list.append(False)
+        display_net_string_list = []
         net_string_list = []
         for net_idx, net_vni in enumerate(net_list):
+            display_net_string_list.append(net_vni)
             if net_invalid_list[net_idx]:
                 net_string_list.append(
                     "{}{}{}".format(
@@ -2428,7 +2430,9 @@ def format_list(config, vm_list):
                 vm_state_length=vm_state_length,
                 vm_tags_length=vm_tags_length,
                 vm_snapshots_length=vm_snapshots_length,
-                vm_nets_length=vm_nets_length,
+                vm_nets_length=vm_nets_length
+                + len(",".join(net_string_list))
+                - len(",".join(display_net_string_list)),
                 vm_ram_length=vm_ram_length,
                 vm_vcpu_length=vm_vcpu_length,
                 vm_node_length=vm_node_length,
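
Review note: the width correction works because ANSI escape codes count toward `len()` but take no space on screen. A minimal sketch of the same idea, assuming hypothetical `RED` and `END` constants in place of the CLI's own colour helpers:

```python
# Hypothetical colour codes standing in for the CLI's colour helpers
RED = "\033[91m"
END = "\033[0m"

plain = "100,200"                            # what the user sees
coloured = "100,{}200{}".format(RED, END)    # what is actually printed

# Invisible characters that str.format would otherwise count as column width
extra = len(coloured) - len(plain)

width = 12
padded = "{:<{w}}|".format(coloured, w=width + extra)
```

Padding to `width + extra` leaves the visible text occupying exactly `width` columns before the `|` separator.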

@@ -2,7 +2,7 @@ from setuptools import setup
 setup(
     name="pvc",
-    version="0.9.100",
+    version="0.9.102",
     packages=["pvc.cli", "pvc.lib"],
     install_requires=[
         "Click",

@@ -375,8 +375,11 @@ def get_parsed_configuration(config_file):
     config = {**config, **config_api_ssl}
     # Use coordinators as storage hosts if not explicitly specified
+    # These are added as FQDNs in the storage domain
     if not config["storage_hosts"] or len(config["storage_hosts"]) < 1:
-        config["storage_hosts"] = config["coordinators"]
+        config["storage_hosts"] = []
+        for host in config["coordinators"]:
+            config["storage_hosts"].append(f"{host}.{config['storage_domain']}")
     # Set up our token list if specified
     if config["api_auth_source"] == "token":
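
Review note: the fallback in this hunk amounts to building FQDNs from the coordinator short names. A standalone sketch, assuming a hypothetical helper name and sample values (the config keys are the ones used in the diff):

```python
def default_storage_hosts(config):
    """Return the explicit storage hosts, or coordinator FQDNs in the storage domain."""
    hosts = config.get("storage_hosts")
    if hosts:
        return list(hosts)
    return [f"{host}.{config['storage_domain']}" for host in config["coordinators"]]


# Hypothetical configuration values for illustration only
sample_config = {
    "storage_hosts": [],
    "coordinators": ["hv1", "hv2", "hv3"],
    "storage_domain": "storage.local",
}
derived_hosts = default_storage_hosts(sample_config)
```

With the sample values, `derived_hosts` holds `hv1.storage.local`, `hv2.storage.local` and `hv3.storage.local`; an explicitly configured list is returned unchanged.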

@@ -1997,11 +1997,14 @@ def vm_worker_flush_locks(zkhandler, celery, domain, force_unlock=False):
             )
             if lock_remove_retcode != 0:
-                fail(
-                    celery,
-                    f"Failed to free RBD lock {lock['id']} on volume {rbd}: {lock_remove_stderr}",
-                )
-                return False
+                if force_unlock and "No such file or directory" in lock_remove_stderr:
+                    continue
+                else:
+                    fail(
+                        celery,
+                        f"Failed to free RBD lock {lock['id']} on volume {rbd}: {lock_remove_stderr}",
+                    )
+                    return False
     current_stage += 1
     return finish(
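
Review note: the control flow of this fix can be exercised without Ceph. The sketch below assumes a hypothetical `remove_lock` callable that returns `(retcode, stderr)`; it mirrors only the decision logic of the hunk, not the worker function itself.

```python
def free_locks(locks, remove_lock, force_unlock=False):
    """Try to remove each lock; return (ok, message)."""
    for lock in locks:
        retcode, stderr = remove_lock(lock)
        if retcode != 0:
            # A lock that has already vanished is not an error when forcing
            if force_unlock and "No such file or directory" in stderr:
                continue
            return False, f"Failed to free RBD lock {lock}: {stderr}"
    return True, ""


def fake_remove(lock):
    """Hypothetical stand-in: pretend the lock named "gone" was already cleared."""
    if lock == "gone":
        return 2, "rbd: releasing lock failed: (2) No such file or directory"
    return 0, ""


forced_result = free_locks(["a", "gone", "b"], fake_remove, force_unlock=True)
unforced_result = free_locks(["a", "gone", "b"], fake_remove, force_unlock=False)
```

Without `force_unlock`, the missing lock still counts as a failure, which matches the hunk's `else` branch.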
@@ -3266,7 +3269,7 @@ def vm_worker_send_snapshot(
         )
         return False
-    expected_destination_pvc_version = "0.9.100"  # TODO: 0.9.101 when completed
+    expected_destination_pvc_version = "0.9.101"
     # Work around development versions
     current_destination_pvc_version = re.sub(
         r"~git-.*", "", current_destination_pvc_version
@@ -3810,7 +3813,7 @@ def vm_worker_create_mirror(
         )
         return False
-    expected_destination_pvc_version = "0.9.100"  # TODO: 0.9.101 when completed
+    expected_destination_pvc_version = "0.9.101"
     # Work around development versions
     current_destination_pvc_version = re.sub(
         r"~git-.*", "", current_destination_pvc_version
@@ -4406,7 +4409,7 @@ def vm_worker_promote_mirror(
         )
         return False
-    expected_destination_pvc_version = "0.9.100"  # TODO: 0.9.101 when completed
+    expected_destination_pvc_version = "0.9.101"
     # Work around development versions
     current_destination_pvc_version = re.sub(
         r"~git-.*", "", current_destination_pvc_version
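
Review note: the `re.sub` call strips a `~git-...` suffix so that development builds compare like their base release. The diff does not show how the two versions are then compared, so the comparison below (numeric, component by component, "at least the expected version") is an assumption, and both function names are hypothetical.

```python
import re


def normalize_version(version):
    """Strip the development suffix, as the diff does with re.sub."""
    return re.sub(r"~git-.*", "", version)


def destination_is_compatible(current_version, expected_version):
    """Assumed check: destination is at least the expected version."""
    current = tuple(int(part) for part in normalize_version(current_version).split("."))
    expected = tuple(int(part) for part in expected_version.split("."))
    return current >= expected


dev_build_ok = destination_is_compatible("0.9.101~git-abcdef12", "0.9.101")
old_build_ok = destination_is_compatible("0.9.100", "0.9.101")
```

Comparing integer tuples rather than raw strings avoids ordering "0.9.99" after "0.9.101".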

debian/changelog (33 changes)

@@ -1,3 +1,36 @@
+pvc (0.9.102-0) unstable; urgency=high
+  * [API Daemon] Ensures that received config snapshots update storage hosts in addition to secret UUIDs
+  * [CLI Client] Fixes several bugs around local connection handling and connection listings
+ -- Joshua M. Boniface <joshua@boniface.me>  Thu, 17 Oct 2024 10:48:31 -0400
+pvc (0.9.101-0) unstable; urgency=high
+  **New Feature**: Adds VM snapshot sending (`vm snapshot send`), VM mirroring (`vm mirror create`), and (offline) mirror promotion (`vm mirror promote`). Permits transferring VM snapshots to remote clusters, individually or repeatedly, and promoting them to active status, for disaster recovery and migration between clusters.
+  **Breaking Change**: Migrates the API daemon into Gunicorn when in production mode. Permits more scalable and performant operation of the API. **Requires additional dependency packages on all coordinator nodes** (`gunicorn`, `python3-gunicorn`, `python3-setuptools`); upgrade via `pvc-ansible` is strongly recommended.
+  **Enhancement**: Provides whole cluster utilization stats in the cluster status data. Permits better observability into the overall resource utilization of the cluster.
+  **Enhancement**: Adds a new storage benchmark format (v2) which includes additional resource utilization statistics. This allows for better evaluation of storage performance impact on the cluster as a whole. The updated format also permits arbitrary benchmark job names for easier parsing and tracking.
+  * [API Daemon] Allows scanning of new volumes added manually via other commands
+  * [API Daemon/CLI Client] Adds whole cluster utilization statistics to cluster status
+  * [API Daemon] Moves production API execution into Gunicorn
+  * [API Daemon] Adds a new storage benchmark format (v2) with additional resource tracking
+  * [API Daemon] Adds support for named storage benchmark jobs
+  * [API Daemon] Fixes a bug in OSD creation which would create `split` OSDs if `--osd-count` was set to 1
+  * [API Daemon] Adds support for the `mirror` VM state used by snapshot mirrors
+  * [CLI Client] Fixes several output display bugs in various commands and in Worker task outputs
+  * [CLI Client] Improves and shrinks the status progress bar output to support longer messages
+  * [API Daemon] Adds support for sending snapshots to remote clusters
+  * [API Daemon] Adds support for updating and promoting snapshot mirrors to remote clusters
+  * [Node Daemon] Improves timeouts during primary/secondary coordinator transitions to avoid deadlocks
+  * [Node Daemon] Improves timeouts during keepalive updates to avoid deadlocks
+  * [Node Daemon] Refactors fencing thread structure to ensure a single fencing task per cluster and sequential node fences to avoid potential anomalies (e.g. fencing 2 nodes simultaneously)
+  * [Node Daemon] Fixes a bug in fencing if VM locks were already freed, leaving VMs in an invalid state
+  * [Node Daemon] Increases the wait time during system startup to ensure Zookeeper has more time to synchronize
+ -- Joshua M. Boniface <joshua@boniface.me>  Tue, 15 Oct 2024 11:39:11 -0400
 pvc (0.9.100-0) unstable; urgency=high
   * [API Daemon] Improves the handling of "detect:" disk strings on newer systems by leveraging the "nvme" command

@@ -33,7 +33,7 @@ import os
 import signal
 # Daemon version
-version = "0.9.100"
+version = "0.9.102"
 ##########################################################

@@ -49,7 +49,7 @@ import re
 import json
 # Daemon version
-version = "0.9.100"
+version = "0.9.102"
 ##########################################################

@@ -247,7 +247,7 @@ def migrateFromFencedNode(zkhandler, node_name, config, logger):
             )
             zkhandler.write(
                 {
-                    (("domain.state", dom_uuid), "stopped"),
+                    (("domain.state", dom_uuid), "stop"),
                     (("domain.meta.autostart", dom_uuid), "True"),
                 }
             )

@@ -102,5 +102,5 @@ def start_system_services(logger, config):
     start_workerd(logger, config)
     start_healthd(logger, config)
-    logger.out("Waiting 5 seconds for daemons to start", state="s")
-    sleep(5)
+    logger.out("Waiting 10 seconds for daemons to start", state="s")
+    sleep(10)

@@ -188,3 +188,6 @@ def setup_node(logger, config, zkhandler):
                 (("node.count.networks", config["node_hostname"]), "0"),
             ]
         )
+    logger.out("Waiting 5 seconds for Zookeeper to synchronize", state="s")
+    time.sleep(5)

@@ -55,7 +55,7 @@ from daemon_lib.autobackup import (
 )
 # Daemon version
-version = "0.9.100"
+version = "0.9.102"
 config = cfg.get_configuration()