Compare commits

..

50 Commits

Author SHA1 Message Date
f486b2d3ae Fix incorrect version 2024-12-27 21:22:28 -05:00
4190b802a5 Fix link name 2024-12-27 21:19:41 -05:00
0e48068d19 Fix links 2024-12-27 21:18:01 -05:00
98aff8fb4c Add missing step 2024-12-27 21:16:06 -05:00
f891b3c501 Fix hostnames 2024-12-27 21:13:24 -05:00
d457f74d39 Fix indenting 2024-12-27 21:12:19 -05:00
c26034381b Improve formatting 2024-12-27 21:08:17 -05:00
a782113d44 Remove header title 2024-12-27 20:46:52 -05:00
1d2b9d7a99 Add TOC for getting started guide 2024-12-27 20:45:30 -05:00
6db2201c24 Improve wording 2024-12-27 20:44:42 -05:00
1f419ddc64 Be more forceful 2024-12-27 20:37:50 -05:00
8802b28034 Fix incorrect heading 2024-12-27 20:29:46 -05:00
0abbc34ba4 Improve georedundancy documentation 2024-12-27 20:25:47 -05:00
f22fae8277 Add automirror references 2024-11-18 11:00:12 -05:00
f453bd6ac5 Add automirror and adjust snapshot age 2024-11-16 13:47:48 -05:00
ab800a666e Adjust emojis consistently and add net warning 2024-11-14 12:16:39 -05:00
8d584740ea Update Docs index 2024-10-25 23:53:18 -04:00
10ed0526d1 Update README badge order 2024-10-25 23:48:11 -04:00
03aecb1a26 Update README 2024-10-25 23:45:51 -04:00
13635eb883 Update README 2024-10-25 23:38:36 -04:00
ba46e8c3ef Update README 2024-10-25 23:29:16 -04:00
043fa0da7f Fix formatting 2024-10-25 03:03:06 -04:00
6769bdb086 Unify formatting 2024-10-25 03:01:15 -04:00
9039bc0b9d Fix indentation 2024-10-25 02:57:24 -04:00
004be3de16 Add OVA mention 2024-10-25 02:56:09 -04:00
5301e47614 Rework some wording 2024-10-25 02:54:48 -04:00
8ddd9dc965 Move the provisioner guide 2024-10-25 02:52:51 -04:00
fdabdbb52d Remove mention of obsolete worksheet 2024-10-25 02:50:47 -04:00
0e10395419 Bump the base Debian version 2024-10-25 02:50:13 -04:00
1dd8e07a55 Add notes about mirrors 2024-10-25 02:48:26 -04:00
add583472a Fix typo 2024-10-25 02:43:21 -04:00
93ef4985a5 Up to 8 spaces 2024-10-25 02:39:47 -04:00
17244a4b96 Add sidebar to API reference 2024-10-25 02:38:59 -04:00
3973d5d079 Add more indentation 2024-10-25 02:37:07 -04:00
35fe7c64b6 Add trailing spaces too 2024-10-25 02:36:39 -04:00
ecffc2412c Try more spaces 2024-10-25 02:35:07 -04:00
43bc38b998 Try more newlines 2024-10-25 02:33:50 -04:00
0aaa28d5df Try backticks 2024-10-25 02:32:22 -04:00
40647b785c Try to fix formatting of hosts example 2024-10-25 02:31:05 -04:00
968855eca8 Fix formatting 2024-10-25 02:29:31 -04:00
e81211e3c6 Fix formatting 2024-10-25 02:28:26 -04:00
c6eddb6ece Fix formatting 2024-10-25 02:25:47 -04:00
e853f972f1 Fix indents 2024-10-25 02:22:37 -04:00
9c6ed63278 Update the Getting Started documentation 2024-10-25 02:13:08 -04:00
7ee0598b08 Update swagger spec 2024-10-19 11:48:52 -04:00
f809bf8166 Update API doc for 0.9.101/102 2024-10-18 01:29:17 -04:00
1b15c92e51 Update the description of VM define endpoint 2024-10-01 13:31:07 -04:00
4dc77a66f4 Add proper response schema for 202 responses 2024-10-01 13:26:23 -04:00
f940b2ff44 Update API documentation and link 2024-09-30 20:51:12 -04:00
1c3eec48ea Update spec for upcoming release 2024-09-07 12:33:09 -04:00
11 changed files with 2909 additions and 571 deletions

View File

@ -1,10 +1,11 @@
<p align="center">
<img alt="Logo banner" src="docs/images/pvc_logo_black.png"/>
<img alt="Logo banner" src="https://docs.parallelvirtualcluster.org/en/latest/images/pvc_logo_black.png"/>
<br/><br/>
<a href="https://www.parallelvirtualcluster.org"><img alt="Website" src="https://img.shields.io/badge/visit-website-blue"/></a>
<a href="https://github.com/parallelvirtualcluster/pvc/releases"><img alt="Latest Release" src="https://img.shields.io/github/release-pre/parallelvirtualcluster/pvc"/></a>
<a href="https://docs.parallelvirtualcluster.org/en/latest/?badge=latest"><img alt="Documentation Status" src="https://readthedocs.org/projects/parallelvirtualcluster/badge/?version=latest"/></a>
<a href="https://github.com/parallelvirtualcluster/pvc"><img alt="License" src="https://img.shields.io/github/license/parallelvirtualcluster/pvc"/></a>
<a href="https://github.com/psf/black"><img alt="Code style: Black" src="https://img.shields.io/badge/code%20style-black-000000.svg"/></a>
<a href="https://github.com/parallelvirtualcluster/pvc/releases"><img alt="Release" src="https://img.shields.io/github/release-pre/parallelvirtualcluster/pvc"/></a>
<a href="https://docs.parallelvirtualcluster.org/en/latest/?badge=latest"><img alt="Documentation Status" src="https://readthedocs.org/projects/parallelvirtualcluster/badge/?version=latest"/></a>
</p>
## What is PVC?
@ -19,41 +20,13 @@ As a consequence of its features, PVC makes administrating very high-uptime VMs
PVC also features an optional, fully customizable VM provisioning framework, designed to automate and simplify VM deployments using custom provisioning profiles, scripts, and CloudInit userdata API support.
Installation of PVC is accomplished by two main components: a [Node installer ISO](https://github.com/parallelvirtualcluster/pvc-installer) which creates on-demand installer ISOs, and an [Ansible role framework](https://github.com/parallelvirtualcluster/pvc-ansible) to configure, bootstrap, and administrate the nodes. Installation can also be fully automated with a companion [cluster bootstrapping system](https://github.com/parallelvirtualcluster/pvc-bootstrap). Once up, the cluster is managed via an HTTP REST API, accessible via a Python Click CLI client or WebUI.
Installation of PVC is accomplished by two main components: a [Node installer ISO](https://github.com/parallelvirtualcluster/pvc-installer) which creates on-demand installer ISOs, and an [Ansible role framework](https://github.com/parallelvirtualcluster/pvc-ansible) to configure, bootstrap, and administrate the nodes. Installation can also be fully automated with a companion [cluster bootstrapping system](https://github.com/parallelvirtualcluster/pvc-bootstrap). Once up, the cluster is managed via an HTTP REST API, accessible via a Python Click CLI client ~~or WebUI~~ (eventually).
Just give it physical servers, and it will run your VMs without you having to think about it, all in just an hour or two of setup time.
More information about PVC, its motivations, the hardware requirements, and setting up and managing a cluster [can be found over at our docs page](https://docs.parallelvirtualcluster.org).
## What is it based on?
## Documentation
The core node and API daemons, as well as the CLI API client, are written in Python 3 and are fully Free Software (GNU GPL v3). In addition to these, PVC makes use of the following software tools to provide a holistic hyperconverged infrastructure solution:
This repository contains the MkDocs configuration for the https://docs.parallelvirtualcluster.org ReadTheDocs page.
* Debian GNU/Linux as the base OS.
* Linux KVM, QEMU, and Libvirt for VM management.
* Linux `ip`, FRRouting, NFTables, DNSMasq, and PowerDNS for network management.
* Ceph for storage management.
* Apache Zookeeper for the primary cluster state database.
* Patroni PostgreSQL manager for the secondary relation databases (DNS aggregation, Provisioner configuration).
## Getting Started
To get started with PVC, please see the [About](https://docs.parallelvirtualcluster.org/en/latest/about/) page for general information about the project, and the [Getting Started](https://docs.parallelvirtualcluster.org/en/latest/getting-started/) page for details on configuring your first cluster.
## Changelog
View the changelog in [CHANGELOG.md](CHANGELOG.md).
## Screenshots
While PVC's API and internals aren't very screenshot-worthy, here is some example output of the CLI tool.
<p><img alt="Node listing" src="docs/images/pvc-nodes.png"/><br/><i>Listing the nodes in a cluster</i></p>
<p><img alt="Network listing" src="docs/images/pvc-networks.png"/><br/><i>Listing the networks in a cluster, showing 3 bridged and 1 IPv4-only managed networks</i></p>
<p><img alt="VM listing and migration" src="docs/images/pvc-migration.png"/><br/><i>Listing a limited set of VMs and migrating one with status updates</i></p>
<p><img alt="Node logs" src="docs/images/pvc-nodelog.png"/><br/><i>Viewing the logs of a node (keepalives and VM [un]migration)</i></p>

View File

@ -52,8 +52,10 @@ All features are as of the latest version: <a href="https://github.com/parallelv
* Simple resource management (vCPU/memory) w/restart
* Hot attach/detach of virtual NICs and block devices
* Tag support for organization/classification
* VM hot/online backup creation, with incremental image support, management (delete), and quick restore to external storage
* VM hot/online snapshot creation (disks + configuration), with incremental image support, management (delete), and restore
* VM autobackups with self-contained backup rotation and optional automatic mounting of remote storage resources
* VM snapshot shipping to external clusters (mirroring) and mirror promotion
* VM automirrors with self-contained snapshot rotation for regular creation of mirrors
### Network Management

View File

@ -92,7 +92,7 @@ The Zookeeper database runs on the coordinator nodes, and requires a majority qu
### Patroni/PostgreSQL
PVC uses the Patroni PostgreSQL cluster manager to store relational data for use by the [Provisioner subsystem](../manuals/provisioner.md) and managed network DNS aggregation.
PVC uses the Patroni PostgreSQL cluster manager to store relational data for use by the [Provisioner subsystem](../deployment/provisioner) and managed network DNS aggregation.
The Patroni system runs on the coordinator nodes, with the primary coordinator taking on the "leader" role (read-write) and all others taking on the "follower" role (read-only). Patroni leverages Zookeeper to handle state, and is thus dependent on Zookeeper to function.
@ -156,9 +156,9 @@ Managed client networks leverage the EBGP VXLAN subsystem to provide virtual lay
PVC can provide services to clients in this network via the DNSMasq subsystem, including IPv4 and IPv6 routing, firewalling, DHCP, DNS, and NTP. An upstream router must be configured to accept and return traffic from these network(s), either via BGP or static routing, if outside access is required.
**NOTE:** Be aware of the potential for "tromboning" when routing between managed networks. All traffic to and from a managed network will flow out the primary coordinator. Thus, if there is a large amount of inter-network traffic between two managed networks, all this traffic will traverse the primary coordinator, introducing a potential bottleneck. To avoid this, keep the amount of inter-network routing between managed networks or between managed networks and the outside world to a minimum.
📝 **NOTE** Be aware of the potential for "tromboning" when routing between managed networks. All traffic to and from a managed network will flow out the primary coordinator. Thus, if there is a large amount of inter-network traffic between two managed networks, all this traffic will traverse the primary coordinator, introducing a potential bottleneck. To avoid this, keep the amount of inter-network routing between managed networks or between managed networks and the outside world to a minimum.
One major purpose of managed networks is to provide a bootstrapping mechanism for new VMs deployed using the [PVC provisioner](../manuals/provisioner.md) with CloudInit metadata services (see that documentation for details). Such deployments will require at least one managed network to provide access to the CloudInit metadata system.
One major purpose of managed networks is to provide a bootstrapping mechanism for new VMs deployed using the [PVC provisioner](../deployment/provisioner) with CloudInit metadata services (see that documentation for details). Such deployments will require at least one managed network to provide access to the CloudInit metadata system.
#### Bridged
@ -174,13 +174,13 @@ SR-IOV provides two mechanisms for directly passing underlying network devices i
SR-IOV networks require static configuration of the hypervisor nodes, both to define the PFs and to define how many VFs can be created on each PF. These options are defined with the `sriov_device` and `vfcount` options in the `pvcnoded.yaml` configuration file.
**NOTE:** Changing the PF or VF configuration cannot be done dynamically, and requires a restart of the `pvcnoded` daemon.
📝 **NOTE** Changing the PF or VF configuration cannot be done dynamically, and requires a restart of the `pvcnoded` daemon.
**NOTE:** Some SR-IOV NICs, specifically Intel NICs, cannot have the `vfcount` modified during runtime after being set. The node must be rebooted for changes to be applied.
📝 **NOTE** Some SR-IOV NICs, specifically Intel NICs, cannot have the `vfcount` modified during runtime after being set. The node must be rebooted for changes to be applied.
Once one or more PFs are configured, VFs can then be created on individual nodes via the PVC API, which can then be mapped to VMs in a 1-to-1 relationship.
**NOTE:** The administrator must be careful to ensure the allocated VFs and PFs are identical between all nodes, otherwise migration of VMs between nodes can result in incorrect network assignments.
📝 **NOTE** The administrator must be careful to ensure the allocated VFs and PFs are identical between all nodes, otherwise migration of VMs between nodes can result in incorrect network assignments.
Once VFs are created, they may be attached to VMs using one of the two strategies mentioned above. Each strategy has trade-offs, so careful consideration is required:

View File

@ -2,56 +2,88 @@
title: Georedundancy
---
A PVC cluster can have nodes in different physical locations to provide georedundancy, that is, the ability for a cluster to suffer the loss of a physical location (and all included servers). This solution, however, has some notable caveats that must be closely considered before deploying to avoid poor performance and potential [fencing](fencing.md) failures.
This page will outline the important caveats and potential solutions if possible.
Georedundancy refers to the ability of a system to run across multiple physical geographic areas, and to help tolerate the loss of one of those areas due to a catastrophic event. With respect to PVC, there are two primary types of georedundancy: single-cluster georedundancy, which covers the distribution of the nodes of a single cluster across multiple locations; and multi-cluster georedundancy, in which individual clusters are created at multiple locations and communicate at a higher level. This page will outline the implementation, important caveats, and potential solutions (where possible) for both kinds of georedundancy.
[TOC]
## Node Placement & Physical Design
## Single-Cluster Georedundancy
In order to form [a proper quorum](cluster-architecture.md#quorum-and-node-loss), a majority of nodes must be able to communicate for the cluster to function.
In a single-cluster georedundant design, one logical cluster can have its nodes, and specifically its coordinator nodes, placed in different physical locations. This can help ensure that the cluster remains available even if one of the physical locations becomes unavailable, but it has multiple major caveats to consider.
2 site georedundancy is functionally worthless with a PVC cluster. Assuming 2 nodes at a "primary" site and 1 node at a "secondary" site, while the cluster could tolerate the loss of the secondary site and its single node, said single node would not be able to take over for both nodes should the primary site be down. In such a situation, the single node could not form a quorum with itself and the cluster would be inoperable. If the issue is a network cut, this would potentially be more impactful than allowing all nodes in the cut site to continue to function.
### Number of Locations
Since the nodes in a PVC cluster require a majority quorum to function, there must be at least 3 sites, of which any 2 must be able to communicate directly with each other should the 3rd fail. A single coordinator (for a 3 node cluster) would then be placed at each site.
2 site georedundancy is functionally worthless within a single PVC cluster: if the primary site were to go down, the secondary site will not have enough coordinator nodes to form a majority quorum, and the entire cluster would fail.
[![2 Site Caveats](images/pvc-georedundancy-2-site.png)](images/pvc-georedundancy-2-site.png)
In addition, a 3 site configuration configuration without a full mesh or ring, i.e. where a single site which functions as an anchor between the other two, would be a point of failure and would render the cluster non-functional if offline.
In addition, a 3 site configuration without a full mesh or ring, i.e. one in which a single site functions as an anchor between the other two, leaves that anchor site as a single point of failure; the cluster would be rendered non-functional if it went offline.
[![3 Site Caveats](images/pvc-georedundancy-broken-mesh.png)](images/pvc-georedundancy-broken-mesh.png)
Thus, the smallest useful, georedundant, physical design is 3 sites in full mesh or ring. The loss of any one site in this scenario will still allow the remain nodes to form quorum and function.
Thus, the smallest useful georedundant physical design is 3 sites in full mesh or ring. The loss of any one site in this scenario will still allow the remaining nodes to form quorum and function.
[![3 Site Solution](images/pvc-georedundancy-full-mesh.png)](images/pvc-georedundancy-full-mesh.png)
A larger cluster could theoretically span more sites, however with a maximum of 5 coordinators recommended, this many sites is likely to be overkill for the PVC solution.
A larger cluster could theoretically span 3 (as 2+2+1) or more sites; however, with a maximum of 5 coordinators recommended, this many sites is likely to be overkill for the PVC solution, and multi-cluster georedundancy would be a preferable solution for such a large distribution of nodes.
### Hypervisors
Since hypervisors are not affected by nor affect the quorum, any number can be placed at any site. Only compute resources would thus be affected should that site go offline. For instance, a design with one coordinator and one hypervisor at each site would provide a full 4 nodes of compute resources even if one site is offline.
Since hypervisors are not affected by nor affect the quorum, any number can be placed at any site. Only compute resources would thus be affected should that site go offline.
## Fencing
### Fencing
PVC's [fencing mechanism](fencing.md) relies entirely on network access. First, network access is required for a node to update its keepalives to the other nodes via Zookeeper. Second, IPMI out-of-band connectivity is required for the remaining nodes to fence a dead node.
Georedundancy introduces significant complications to this process. First, it makes network cuts more likely, as the cut can occur outside of a single physical location. Second, the nature of the cut means, that without backup connectivity for the IPMI functionality, any fencing attempt would fail, thus preventing automatic recovery of VMs on the cut site.
Georedundancy introduces significant complications to this process. First, it makes network cuts more likely, as the cut can now occur somewhere outside of the administrator's control (e.g. on a public utility pole, or in a provider's upstream network). Second, the nature of the cut means that without backup connectivity for the IPMI functionality, any fencing attempt would fail, thus preventing automatic recovery of VMs from the cut site onto the remaining sites. Thus, in this design, several normally-possible recovery situations become impossible to recover from automatically, up to and including any recovery at all. Situations where individual VM availability is paramount are therefore not ideally served by single-cluster georedundancy.
Thus, in this design, several normally-possible recovery situations become harder. This does not preclude georedundancy in the author's opinion, but is something that must be more carefully considered, especially in network design.
### Orphaned Site Availability
## Network Speed
It is also important to note that network cut scenarios in this case will result in the outage of the orphaned site, even if it is otherwise functional. As the single node there could no longer communicate with the majority of the storage cluster, its VMs will become unresponsive and blocked from I/O. Thus, splitting a single cluster between sites like this will **not** help ensure that the cut site remains available; on the contrary, the cut site will effectively be sacrificed to preserve the *remainder* of the cluster. For instance, office workers in that location would still not be able to access services on the cluster, even if those services happen to be running in the same physical location.
### Network Speed
PVC clusters are quite network-intensive, as outlined in the [hardware requirements](hardware-requirements.md#networking) documentation. This can pose problems with multi-site clusters with slower interconnects. At least 10Gbps is recommended between nodes, and this includes nodes in different physical locations. In addition, the traffic here is bursty and dependent on VM workloads, both in terms of storage and VM migration. Thus, the site interconnects must account for the speed required of a PVC cluster in addition to any other traffic.
## Network Latency & Distance
### Network Latency & Distance
The storage write process for PVC is heavily dependent on network latency. To explain why, one must understand the process behind writes within the Ceph storage subsystem:
The storage write performance within PVC is heavily dependent on network latency. To explain why, one must understand the process behind writes within the Ceph storage subsystem:
[![Ceph Write Process](images/pvc-ceph-write-process.png)](images/pvc-ceph-write-process.png)
As illustrated in this diagram, a write will only be accepted by the client once it has been successfully written to at least `min_copies` OSDs, as defined by the pool replication level (usually 2). Thus, the latency of network communications between the client and another node becomes a major factor in storage performance for writes, as the write cannot complete without at least 4x this latency (send, ack, receive, ack). Significant physical distances and thus latencies (more than about 3ms) begin to introduce performance degradation, and latencies above about 5-10ms can result in a significant drop in write performance.
As illustrated in this diagram, a write will only be accepted by the client once it has been successfully written to at least `min_copies` OSDs, as defined by the pool replication level (usually 2). Thus, the latency of network communications between any two nodes becomes a major factor in storage performance for writes, as the write cannot complete without at least 4x this latency (send, ack, receive, ack). Significant physical distances and thus latencies (more than about 3ms) begin to introduce performance degradation, and latencies above about 5-10ms can result in a significant drop in write performance.
To combat this, georedundant nodes should be as close as possible, both geographically and physically.
To combat this, georedundant nodes should be as close as possible, ideally within 20-30km of each other at a maximum. Thus, a ring *within* a city would work well; a ring *between* cities would likely hamper performance significantly.
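As a rough, hedged illustration of this effect (not part of PVC itself), the short calculation below shows how the minimum per-write latency, and therefore the ceiling on synchronous write throughput for a single writer, scales with inter-site one-way latency, assuming the 4x latency factor described above and ignoring disk and processing time entirely:
```
# Back-of-the-envelope sketch only: lower bound on replicated write latency,
# assuming the 4x one-way-latency factor (send, ack, receive, ack) described
# above and ignoring disk, CPU, and queueing time entirely.

def min_write_latency_ms(one_way_ms: float) -> float:
    """Network-only lower bound for a single synchronous replicated write."""
    return 4 * one_way_ms

def max_sync_iops(one_way_ms: float) -> float:
    """Ceiling on IOPS for a single writer issuing one write at a time."""
    return 1000.0 / min_write_latency_ms(one_way_ms)

for one_way_ms in (0.2, 1.0, 3.0, 10.0):  # e.g. same rack vs. metro vs. long haul
    print(
        f"{one_way_ms:5.1f} ms one-way -> >= {min_write_latency_ms(one_way_ms):5.1f} ms per write, "
        f"<= {max_sync_iops(one_way_ms):6.0f} synchronous IOPS per writer"
    )
```
At 0.2 ms one-way the network contribution is negligible, while at 10 ms a single synchronous writer is capped at roughly 25 writes per second; this is why physical proximity matters so much here.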
## Suggested Scenarios
## Overall Conclusion: Avoid Single-Cluster Georedundancy
The author cannot cover every possible option, but georedundancy must be very carefully considered. Ultimately, PVC is designed to ease several very common failure modes (maintenance outages, hardware failures, etc.); while rarer ones (fire, flood, meteor strike, etc.) might be possible to mitigate as well, these are not primary considerations in the design of PVC. It is thus up to each cluster administrator to define the correct balance between failure modes, risk, liability, and mitigations (e.g. remote backups) to best account for their particular needs.
It is the opinion of the author that the caveats of single-cluster georedundancy outweigh the benefits in almost every case. The only situation for which this multi-site, single-cluster design provides a notable benefit is in ensuring that copies of data are stored online at multiple locations, but this can be achieved at higher layers as well. Thus, we strongly recommend against this solution for most use-cases.
## Multi-Cluster Georedundancy
Starting with PVC version 0.9.104, the system supports online VM snapshot transfers between clusters. This enables a second georedundancy mode, leveraging a full cluster at each of two sites, between which important VMs are replicated. In addition, this design can be used with higher-layer abstractions like service-level redundancy to ensure the optimal operation of services even if an entire cluster becomes unavailable. Service-level redundancy between two clusters is not addressed here.
Multi-cluster redundancy eliminates most of the caveats of single-cluster georedundancy while permitting single-instance VMs to be safely replicated for hot availability, but introduces several additional caveats regarding promotion of VMs between clusters that must be considered before and during failure events.
### No Failover Automation
Georedundancy with multiple clusters offers no automation within the PVC system for transitioning VMs, unlike the fencing and recovery available within a single cluster. If a fault occurs necessitating promotion of services to the secondary cluster, this must be completed manually by the administrator. In addition, once the primary site recovers, it must be handled carefully to re-converge the clusters (see below).
### VM Automirrors
The VM automirror subsystem must be used for proper automatic redundancy on any single-instance VMs within the cluster. A "primary" site must be selected to run the service normally, while a "secondary" site receives regular mirror snapshots to update its local copy and be ready for promotion should this be necessary. Note that controlled cutovers (e.g. for maintenance events) do not present issues aside from brief VM downtime, as a final snapshot is sent during these operations.
The automirror schedule is very important to define here. Since automirrors are point-in-time snapshots, only data as of the last sent snapshot will be available on the secondary cluster. Thus, extremely frequent automirrors, on the order of hours or even minutes, are recommended. In addition, note that automirrors are run on a fixed schedule for all VMs in the cluster; it is not possible to designate some VMs to run more frequently at this time.
It is also recommended that the guest OSes of any VMs set for automirror support use atomic writes if possible, as online snapshots must be crash-consistent. Most modern operating and file systems are supported, but care must be taken when using e.g. in-memory caching of writes or other similar mechanisms to avoid data loss.
### Data Loss During Transitions
VM automirror snapshots are point-in-time; for a clean promotion without data loss, the `pvc vm mirror promote` command must be used. This affects both directions:
* When promoting a VM on the secondary after a catastrophic failure of the primary (i.e. one in which `pvc vm mirror promote` cannot be used), any data written to the primary side since the last snapshot will be lost. As mentioned above, this necessitates very frequent automirror snapshots to be viable, but even with frequent snapshots some amount of data loss will occur.
* Once the secondary is promoted to become the primary manually, both clusters will consider themselves primary for the VM, should the original primary cluster recover. At that time, there will be a split-brain between the two, and one side's changes must be discarded; there is no reconciliation possible on the PVC side between the two instances. Usually, recovery here will mean the removal of the original primary's copy of the VM and a re-synchronization from the former secondary (now primary) to the original primary cluster with `pvc vm mirror create`, followed by a graceful transition with `pvc vm mirror promote`, as sketched below. Note that the transition will also result in additional downtime for the VM.
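The following is a hedged, illustrative sketch only (not an official PVC tool) of that re-convergence sequence, driving the two CLI commands named above from a small Python wrapper; the VM name is a placeholder, any arguments identifying the peer cluster are intentionally left as an empty placeholder list, and the exact invocation should be checked against the `pvc vm mirror` help output for your version:
```
# Hypothetical re-convergence runbook sketch, for illustration only.
# Assumes the `pvc vm mirror create` and `pvc vm mirror promote` commands
# described above; the VM name and peer-cluster arguments are placeholders.
import subprocess

VM_NAME = "important-vm"    # placeholder VM name
PEER_ARGS: list[str] = []   # placeholder: arguments selecting the peer cluster

def pvc(*args: str) -> None:
    cmd = ["pvc", *args]
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. After removing the stale copy on the recovered original primary,
#    re-create the mirror from the current primary back to it.
pvc("vm", "mirror", "create", VM_NAME, *PEER_ARGS)

# 2. Gracefully transition the VM back: a final snapshot is sent and the
#    roles are swapped, at the cost of brief additional VM downtime.
pvc("vm", "mirror", "promote", VM_NAME, *PEER_ARGS)
```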
## Overall Conclusion: Proceed with Caution
Ultimately, the potential for data loss during unplanned promotions must be carefully weighed against the benefits of manually promoting the peer cluster. For short or transient outages, such a promotion is highly likely to result in more data loss and impact than is acceptable, and thus a manual promotion should only be considered in truly catastrophic situations. In such situations, the amount of acceptable data loss must inform the timing of the automirrors, and thus how frequently snapshots are taken and transmitted. When any data loss would be catastrophic, service-level redundancy is advised instead.

File diff suppressed because it is too large

View File

@ -1,8 +1,6 @@
# PVC Provisioner Manual
# PVC Provisioner Guide
The PVC provisioner is a subsection of the main PVC API. It interfaces directly with the Zookeeper database using the common client functions, and with the Patroni PostgreSQL database to store details. The provisioner also interfaces directly with the Ceph storage cluster, for mapping volumes, creating filesystems, and installing guests.
Details of the Provisioner API interface can be found in [the API manual](/manuals/api).
The PVC provisioner is a subsection of the main PVC system, designed to aid administrators in quickly deploying virtual machines (mostly major Linux flavours) according to defined templates and profiles, leveraging CloudInit and customizable provisioning scripts, or by deploying OVA images.
- [PVC Provisioner Manual](#pvc-provisioner-manual)
* [Overview](#overview)
@ -19,15 +17,13 @@ Details of the Provisioner API interface can be found in [the API manual](/manua
## Overview
The purpose of the Provisioner API is to provide a convenient way for administrators to automate the creation of new virtual machines on the PVC cluster.
The purpose of the Provisioner is to provide a convenient way for administrators to automate the creation of new virtual machines on the PVC cluster.
The Provisioner allows the administrator to construct descriptions of VMs, called profiles, which include system resource specifications, network interfaces, disks, cloud-init userdata, and installation scripts. These profiles are highly modular, allowing the administrator to specify arbitrary combinations of the mentioned VM features with which to build new VMs.
The provisioner supports creating VMs based off of installation scripts, by cloning existing volumes, and by uploading OVA image templates to the cluster.
Examples in the following sections use the CLI exclusively for demonstration purposes. For details of the underlying API calls, please see the [API interface reference](/manuals/api-reference.html).
Use of the PVC Provisioner is not required. Administrators can always perform their own installation tasks, and the provisioner is not specially integrated, calling various other API commands as though they were run from the CLI or API.
Use of the PVC Provisioner is not required. Administrators can always perform their own installation tasks, and the provisioner is not specially integrated, calling various other commands as though they were run from the CLI or API.
# PVC Provisioner concepts
@ -218,87 +214,86 @@ As mentioned above, the `VMBuilderScript` instance includes several instance var
* `self.vm_data`: A full dictionary representation of the data provided by the PVC provisioner about the VM. Includes many useful details for crafting the VM configuration and setting up disks and networks. An example, in JSON format:
```
{
"ceph_monitor_list": [
"hv1.pvcstorage.tld",
"hv2.pvcstorage.tld",
"hv3.pvcstorage.tld"
],
"ceph_monitor_port": "6789",
"ceph_monitor_secret": "96721723-8650-4a72-b8f6-a93cd1a20f0c",
"mac_template": null,
"networks": [
{
"eth_bridge": "vmbr1001",
"id": 72,
"network_template": 69,
"vni": "1001"
},
{
"eth_bridge": "vmbr101",
"id": 73,
"network_template": 69,
"vni": "101"
}
],
"script": [contents of this file]
"script_arguments": {
"deb_mirror": "http://ftp.debian.org/debian",
"deb_release": "bullseye"
},
"system_architecture": "x86_64",
"system_details": {
"id": 78,
"migration_method": "live",
"name": "small",
"node_autostart": false,
"node_limit": null,
"node_selector": null,
"ova": null,
"serial": true,
"vcpu_count": 2,
"vnc": false,
"vnc_bind": null,
"vram_mb": 2048
},
"volumes": [
{
"disk_id": "sda",
"disk_size_gb": 4,
"filesystem": "ext4",
"filesystem_args": "-L=root",
"id": 9,
"mountpoint": "/",
"pool": "vms",
"source_volume": null,
"storage_template": 67
},
{
"disk_id": "sdb",
"disk_size_gb": 4,
"filesystem": "ext4",
"filesystem_args": "-L=var",
"id": 10,
"mountpoint": "/var",
"pool": "vms",
"source_volume": null,
"storage_template": 67
},
{
"disk_id": "sdc",
"disk_size_gb": 4,
"filesystem": "ext4",
"filesystem_args": "-L=log",
"id": 11,
"mountpoint": "/var/log",
"pool": "vms",
"source_volume": null,
"storage_template": 67
}
]
}
```
Since the `VMBuilderScript` runs within its own context but within the PVC Provisioner/API system, it is possible to use many helper libraries from the PVC system itself, including both the built-in daemon libraries (used by the API itself) and several explicit provisioning script helpers. The following are commonly-used (in the examples) imports that can be leveraged:
@ -311,9 +306,9 @@ Since the `VMBuilderScript` runs within its own context but within the PVC Provi
* `daemon_lib.common`: Part of the PVC daemon libraries, provides several common functions, including, most usefully, `run_os_command` which provides a wrapped, convenient method to call arbitrary shell/OS commands while returning a POSIX returncode, stdout, and stderr (a tuple of the 3 in that order).
* `daemon_lib.ceph`: Part of the PVC daemon libraries, provides several commands for managing Ceph RBD volumes, including, but not limited to, `clone_volume`, `add_volume`, `map_volume`, and `unmap_volume`. See the `debootstrap` example for a detailed usage example.
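As a hedged illustration only (not one of the shipped example scripts), the fragment below shows how the `vm_data` structure above and the `run_os_command` helper might be combined inside a provisioning script to create filesystems on freshly-created volumes; the function, the `device_map` argument, and the assumption that `run_os_command` accepts a single shell-style command string are all hypothetical, and real scripts should follow the `debootstrap` example mentioned above:
```
# Hypothetical fragment of a provisioning script, for illustration only.
# Assumes the vm_data dictionary shown earlier; device_map is a hypothetical
# mapping of disk_id -> mapped block device path (e.g. as produced via the
# map_volume helper mentioned above).
from daemon_lib.common import run_os_command

def format_volumes(vm_data: dict, device_map: dict) -> None:
    """Create a filesystem on each newly-created volume that requests one."""
    for volume in vm_data["volumes"]:
        if volume["source_volume"] is not None or not volume["filesystem"]:
            continue  # cloned or filesystem-less volumes are left untouched
        device = device_map[volume["disk_id"]]
        fs_args = volume["filesystem_args"] or ""
        # Assumption: run_os_command takes a shell-style command string and
        # returns (returncode, stdout, stderr), as described in this section.
        retcode, stdout, stderr = run_os_command(
            f"mkfs.{volume['filesystem']} {fs_args} {device}"
        )
        if retcode != 0:
            raise RuntimeError(f"Failed to create filesystem on {device}: {stderr}")
```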
For safety reasons, the script runs in a modified chroot environment on the hypervisor. It will have full access to the entire / (root partition) of the hypervisor, but read-only. In addition it has read-write access to /dev, /sys, /run, and a fresh /tmp to write to; use /tmp/target (as convention) as the destination for any mounting of volumes and installation. Thus it is not possible to do things like `apt-get install`ing additional programs within a script; any such requirements must be set up before running the script (e.g. via `pvc-ansible`).
For safety reasons, the script runs in a modified chroot environment on the hypervisor. It will have full access to the entire `/` (root partition) of the hypervisor, but read-only. In addition it has read-write access to `/dev`, `/sys`, `/run`, and a fresh `/tmp` to write to; use `/tmp/target` (as convention) as the destination for any mounting of volumes and installation. Thus it is not possible to do things like `apt-get install`ing additional programs within a script; any such requirements must be set up before running the script (e.g. via `pvc-ansible`).
**WARNING**: Of course, despite this "safety" mechanism, it is VERY IMPORTANT to be cognizant that this script runs AS ROOT ON THE HYPERVISOR SYSTEM with FULL ACCESS to the cluster. You should NEVER allow arbitrary, untrusted users the ability to add or modify provisioning scripts. It is trivially easy to write scripts which will do destructive things - for example writing to arbitrary /dev objects, running arbitrary root-level commands, or importing PVC library functions to delete VMs, RBD volumes, or pools. Thus, ensure you vett and understand every script on the system, audit them regularly for both intentional and accidental malicious activity, and of course (to reiterate), do not allow untrusted script creation!
⚠️ **WARNING** Of course, despite this "safety" mechanism, it is **very important** to be cognizant that this script runs **as root on the hypervisor system** with **full access to the cluster**. You should **never** allow arbitrary, untrusted users the ability to add or modify provisioning scripts. It is trivially easy to write scripts which will do destructive things - for example writing to arbitrary `/dev` objects, running arbitrary root-level commands, or importing PVC library functions to delete VMs, RBD volumes, or pools. Thus, ensure you vet and understand every script on the system, audit them regularly for both intentional and accidental malicious activity, and of course (to reiterate), do not allow untrusted script creation!
## Profiles
@ -398,11 +393,9 @@ Using cluster "local" - Host: "10.0.0.1:7370" Scheme: "http" Prefix: "/api/v1"
Task ID: 39639f8c-4866-49de-8c51-4179edec0194
```
**NOTE**: A VM that is set to do so will be defined on the cluster early in the provisioning process, before creating disks or executing the provisioning script, with the special status `provision`. Once completed, if the VM is not set to start automatically, the state will remain `provision`, with the VM not running, until its state is explicitly changed with the client (or via autostart when its node returns to `ready` state).
📝 **NOTE** A VM that is set to do so will be defined on the cluster early in the provisioning process, before creating disks or executing the provisioning script, with the special status `provision`. Once completed, if the VM is not set to start automatically, the state will remain `provision`, with the VM not running, until its state is explicitly changed with the client (or via autostart when its node returns to `ready` state).
**NOTE**: Provisioning jobs are tied to the node that spawned them. If the primary node changes, provisioning jobs will continue to run against that node until they are completed, interrupted, or fail, but the active API (now on the new primary node) will not have access to any status data from these jobs, until the primary node status is returned to the original host. The CLI will warn the administrator of this if there are active jobs while running `node primary` or `node secondary` commands.
**NOTE**: Provisioning jobs cannot be cancelled, either before they start or during execution. The administrator should always let an invalid job either complete or fail out automatically, then remove the erroneous VM with the `vm remove` command.
📝 **NOTE** Provisioning jobs cannot be cancelled, either before they start or during execution. The administrator should always let an invalid job either complete or fail out automatically, then remove the erroneous VM with the `vm remove` command.
# Deploying VMs from OVA images
@ -438,4 +431,4 @@ During import, PVC splits the OVA into its constituent parts, including any disk
Because of this, OVA profiles do not include storage templates like other PVC profiles. A storage template can still be added to such a profile, and the block devices will be added after the main block devices. However, this is generally not recommended; it is far better to modify the OVA to add additional volume(s) before uploading it instead.
**WARNING**: Never adjust the sizes of the OVA VMDK-formatted storage volumes (named `ova_<NAME>_sdX`) or remove them without removing the OVA itself in the provisioner; doing so will prevent the deployment of the OVA, specifically the conversion of the images to raw format at deploy time, and render the OVA profile useless.
⚠️ **WARNING** Never adjust the sizes of the OVA VMDK-formatted storage volumes (named `ova_<NAME>_sdX`) or remove them without removing the OVA itself in the provisioner; doing so will prevent the deployment of the OVA, specifically the conversion of the images to raw format at deploy time, and render the OVA profile useless.

View File

@ -1,10 +1,11 @@
<p align="center">
<img alt="Logo banner" src="images/pvc_logo_black.png"/>
<img alt="Logo banner" src="https://docs.parallelvirtualcluster.org/en/latest/images/pvc_logo_black.png"/>
<br/><br/>
<a href="https://www.parallelvirtualcluster.org"><img alt="Website" src="https://img.shields.io/badge/visit-website-blue"/></a>
<a href="https://github.com/parallelvirtualcluster/pvc/releases"><img alt="Latest Release" src="https://img.shields.io/github/release-pre/parallelvirtualcluster/pvc"/></a>
<a href="https://docs.parallelvirtualcluster.org/en/latest/?badge=latest"><img alt="Documentation Status" src="https://readthedocs.org/projects/parallelvirtualcluster/badge/?version=latest"/></a>
<a href="https://github.com/parallelvirtualcluster/pvc"><img alt="License" src="https://img.shields.io/github/license/parallelvirtualcluster/pvc"/></a>
<a href="https://github.com/psf/black"><img alt="Code style: Black" src="https://img.shields.io/badge/code%20style-black-000000.svg"/></a>
<a href="https://github.com/parallelvirtualcluster/pvc/releases"><img alt="Release" src="https://img.shields.io/github/release-pre/parallelvirtualcluster/pvc"/></a>
<a href="https://docs.parallelvirtualcluster.org/en/latest/?badge=latest"><img alt="Documentation Status" src="https://readthedocs.org/projects/parallelvirtualcluster/badge/?version=latest"/></a>
</p>
## What is PVC?
@ -23,13 +24,15 @@ Installation of PVC is accomplished by two main components: a [Node installer IS
Just give it physical servers, and it will run your VMs without you having to think about it, all in just an hour or two of setup time.
More information about PVC, its motivations, the hardware requirements, and setting up and managing a cluster [can be found over at our docs page](https://docs.parallelvirtualcluster.org).
## Getting Started
To get started with PVC, please see the [About](https://docs.parallelvirtualcluster.org/en/latest/about-pvc/) page for general information about the project, and the [Getting Started](https://docs.parallelvirtualcluster.org/en/latest/deployment/getting-started/) page for details on configuring your first cluster.
## Changelog
View the changelog in [CHANGELOG.md](CHANGELOG.md). **Please note that any breaking changes are announced here; ensure you read the changelog before upgrading!**
View the changelog in [CHANGELOG.md](https://github.com/parallelvirtualcluster/pvc/blob/master/CHANGELOG.md). **Please note that any breaking changes are announced here; ensure you read the changelog before upgrading!**
## Screenshots

View File

@ -4,7 +4,7 @@ The PVC API is a standalone client application for PVC. It interfaces directly w
The API is built using Flask and is packaged in the Debian package `pvc-client-api`. The API depends on the common client functions of the `pvc-client-common` package as does the CLI client.
Details of the API interface can be found in [the manual](/manuals/api).
The full API endpoint and schema documentation [can be found here](/en/latest/manuals/api-reference.html).
# PVC HTTP API manual
@ -349,7 +349,3 @@ The Ceph monitor port. Should always be `6789`.
* *required*
The Libvirt storage secret UUID for the Ceph cluster.
## API Endpoint Documentation
The full API endpoint and schema documentation [can be found here](/manuals/api-reference.html).

View File

@ -332,7 +332,7 @@ The action to take regarding VMs once a node is *successfully* fenced, i.e. the
The action to take regarding VMs once a node fencing *fails*, i.e. the IPMI command to restart the node reports a failure. Can be one of `None` (the default), to perform no action, or `migrate`, to migrate and start all failed VMs on other nodes.
**WARNING:** This functionality is potentially **dangerous** and can result in data loss or corruption in the VM disks; the post-fence migration process *explicitly clears RBD locks on the disk volumes*. It is designed only for specific and advanced use-cases, such as servers that do not reliably report IPMI responses or servers without IPMI (not recommended; see the [cluster architecture documentation](/architecture/cluster)). If this is set to `migrate`, the `suicide_intervals` **must** be set to provide at least some guarantee that the VMs on the node will actually be terminated before this condition triggers. The administrator should think very carefully about their setup and potential failure modes before enabling this option.
⚠️ **WARNING** This functionality is potentially **dangerous** and can result in data loss or corruption in the VM disks; the post-fence migration process *explicitly clears RBD locks on the disk volumes*. It is designed only for specific and advanced use-cases, such as servers that do not reliably report IPMI responses or servers without IPMI (not recommended; see the [cluster architecture documentation](/architecture/cluster)). If this is set to `migrate`, the `suicide_intervals` **must** be set to provide at least some guarantee that the VMs on the node will actually be terminated before this condition triggers. The administrator should think very carefully about their setup and potential failure modes before enabling this option.
#### `system` → `fencing` → `ipmi` → `host`

File diff suppressed because it is too large

View File

@ -19,9 +19,11 @@ nav:
- 'Georedundancy': 'architecture/georedundancy.md'
- 'Deployment':
- 'Getting Started Guide': 'deployment/getting-started.md'
- 'Provisioner Guide': 'deployment/provisioner.md'
- 'Manuals':
- 'PVC CLI': 'manuals/cli.md'
- 'PVC HTTP API': 'manuals/api.md'
- 'PVC Node Daemon': 'manuals/daemon.md'
- 'PVC Provisioner': 'manuals/provisioner.md'
- 'PVC Node Health Plugins': 'manuals/health-plugins.md'
- 'API':
- 'API Reference': 'manuals/api-reference.html'