Rewrite the about page

This commit is contained in:
Joshua Boniface 2019-07-06 15:20:35 -04:00
parent 43e4718d4f
commit ee0abf880a
1 changed files with 31 additions and 100 deletions

View File

@ -1,132 +1,63 @@
# About the Parallel Virtual Cluster suite
## Changelog
## Project Goals and Philosophy
#### v0.4
Server management and system administration have changed significantly in the last decade. Computing as a resource is here, and software-defined is the norm. Gone are the days of pet servers, of tweaking configuration files by hand, and of painstakingly installing from ISO images in 52x CD-ROM drives. This is a brave new world.
Full implementation of virtual management and virtual networking functionality. Partial implementation of storage functionality.
As part of this trend, the rise of IaaS (Infrastructure as a Service) has created an entirely new way for administrators and, increasingly, developers, to interact with servers. They need to be able to provision virtual machines easily and quickly, to ensure those virtual machines are reliable and consistent, and to avoid downtime wherever possible.
#### v0.3
However, the state of the Free Software, virtual management ecosystem in 2019 is quite dissapointing. On the one hand are the giant, IaaS products like OpenStack and CloudStack. These are massive pieces of software, featuring dozens of interlocking parts, designed for massive clusters and public cloud deployments. They're great for a "hyperscale" provider, a large-scale SaaS/IaaS provider, or an enterprise. But they're not designed for small teams or small clusters. On the other hand, tools like Proxmox, oVirt, and even good old fashioned shell scripts are barely scalable, are showing their age, and have become increasily unweildy for advanced usecases - great for one server, not so great for 9 in a highly-available cluster. Not to mention the constant attempts to monitize by throwing features behind Enterprise subscriptions. In short, there is a massive gap between the old-style, pet-based virtualization and the modern, large-scale, IaaS-type virtualization.
Basic implementation of virtual management functionality.
PVC aims to bridge this gap. As a Python 3-based, fully-Free Software, scalable, and redundant private "cloud" that isn't afraid to say it's for small clusters, PVC is able to provide the simple, easy-to-use, small cluster you need today, with minimal administrator work, while being able to scale as your system grows, supporting hundreds or thousands of VMs across dozens of nodes. High availability is baked right into the core software, giving you piece of mind about your cluster, and ensuring that your systems keep running no matter what happens. And the interface couldn't be easier - a straightforward Click-based CLI and a Flask-based HTTP API provide access to the cluster for you to manage, either directly or though scripts or WebUIs. And since everything is Free Software, you can always inspect it, customize it to your usecase, add features, and contribute back to the community if you so choose.
## Philosophical Overview
PVC provides all the features you'd expect of a "cloud" system - easy management of VMs, including live migration between nodes for maximum uptime; virtual networking support using either vLANs or EVPN-based VXLAN; shared, redundant, object-based storage using Ceph, and a convenient API interface for building your own interfaces. It is able to do this without being excessively complex, and without making sacrifices for legacy ideas.
The current state of the private cloud as of 2019 is very weak. On the one hand are the traditional tools, which let you manage a KVM cluster using scripts but requiring large amounts of administrator work and manual configuration based off very rough best practices. On the other hand are the "cloud infrastructure" tools, which are either massive and unwieldy, complex, and in some cases costly, or simply don't fit the traditional niche of virtualized servers.
If you need to run virtual machines, and don't have the time to learn the Stacks or the money to spend on VMWare or Nutanix, PVC might be just what you're looking for.
PVC aims to be a middle option - all the features of a modern cloud, such as software-defined storage and networking, full high-availability at every layer, and software-based management via APIs, combined with a very shallow learning curve and minimal complexity for the administrator on the CLI or WebUI, all while being completely Free Software-based. It adheres to four main principles, which we will outline in some detail below.
## Cluster Architecture
#### Be Free Software Forever (or Bust)
A PVC cluster is based around "nodes", which are physical servers on which the various daemons, storage, networks, and virtual machines run. Each node is self-contained; it is able to perform any and all cluster functions if needed, and there are no segmentations of function between different types of physical hosts.
Free Software is important. Without the ability to study, modify, and change our software, we are beholden to other, often corporate, actors and their motives. PVC commits to being Free Software in the strictest sense, licensed under the GNU GPL 3.0 (or later), and promising now and forever to not charge a cent for a single feature - no "open core paid addons" philosophy, and no tricks. What you see is always what you get, and you are always free to modify PVC to fit your needs and contribute back to the community.
A limited number of nodes, called "coordinators", are statically configured to provide additional services for the cluster. All databases for instance run on the coordinators, but not other nodes. This prevents any issues with scaling database clusters across dozens of hosts, while still retaining maximum redundancy. In a standard configuration, 3 or 5 nodes are designated as coordinators, and additional nodes connect to the coordinators for database access where required. For quorum purposes, there should always be an odd number of coordinators, and exceeding 5 is likely not required even for large clusters.
#### Be Opinionated and Efficient and Pick The Best Software
The primary database for PVC is Zookeeper, which is a highly-available key-value store designed with consistency in mind. Each node connects to the Zookeeper cluster running on the coordinators to send and receive data from the rest of the cluster. The clients also interface with this Zookeeper cluster to configure and obtain state about the various objects in the cluster. This database is the central authority for all nodes.
Choice is good, until we become paralyzed by it, so don't try to be everything for everyone at the expense of good design. PVC aims to make a lot of decisions on software components, as well as how components interact and the overall architecture, for you, based on current best practices, so you can get on with your life. PVC aims to make good design choices based on solid, modern technologies, not held back by legacy debt, while still providing enough configuration flexibility to support almost any administrators' workload.
Nodes are networked together via at least 3 different networks, set during bootstrap. The first is the "upstream" network, which provides upstream access for the nodes, for instance Internet connectivity, sending routes to client networks to upstream routers, etc. This should usually be a private/firewalled network to prevent unauthorized access to the cluster. The second is the "cluster" network, which is a private RFC1918 network that is unrouted and that nodes use to communicate between one another for Zookeeper access, Libvirt migrations, EVPN VXLAN tunnels, etc. The third is the "storage" network, which is used by the Ceph storage cluster for inter-OSD communication, allowing it to be separate from the main cluster network for maximum performance flexibility.
#### Be Scalable and Redundant but Not Hyperscale
Further information about the general cluster architecture can be found at the [cluster architecture page](/architecture/cluster).
A cluster that can only scale to two nodes is not scalable. PVC aims to be scalable to any reasonable size from 1 to 100+ hypervisors and thousands of VMs. However, PVC is *not* "hyperscale" - it knows where it stands, and isn't afraid to have an upper bound after which another more complex cluster suite is better suited. But it also isn't limited to a 2- or 3- node cluster forcing you to make such a decision just a short time in the future. PVC aims to bridge the gap between "unscalable" and "hyperscale", and provide administrators with a cluster they can use for years to come and grow beyond what they expect, and is perfect for anything from a homelab to a small datacenter without compromising today.
## Node Architecture
#### Be Simple To Use, Configure, and Maintain
Within each node, the PVC daemon is a single Python 3 program which handles all node functionality, including networking, starting cluster services, managing creation/removal of VMs, networks, and storage, and providing utilization statistics and information to the cluster.
Administrator time is valuable, and every minute you spend babysitting pets or learning the intricacies of an overly-complex "Cloud" platform is time you could better spend learning, growing, and evolving your other systems. PVC aims to always be simple for an administrator to use, at every stage from bootstrapping, scaling, and day-to-day administration. At the same time, it aims to be extremely powerful, backed by solid infrastructure design that gives you the best possible system without compromising flexibility. Set up your cluster in a day; grow it for years; manage it in seconds.
The daemon uses an object-oriented approach, with most cluster objects being represented by class objects of a specific type. Each node has a full view of all cluster objects and can interact with them based on events from the cluster as needed.
## Architecture overview
Further information about the node daemon architecture can be found at the [daemon architecture page](/architecture/daemon).
PVC is based on a semi-decentralized design with a dynamic number of fully-functional nodes. Each node in the cluster is capable, based on the configuration, of handling any cluster tasks if needed. However in a normal deployment, the first 3 or 5 servers act as cluster "coordinators", taking on a number of management roles, while other nodes connect to the coordinators for state and information. One coordinator provides additional "primary" functionality, such as DHCP services, DNS aggregation, and client network gateways/routing, and this role can pass dynamically between coordinators based on administrator intervention or automated cluster events.
## Client Architecture
The coordinator nodes host a number of services, configured at bootstrap time, that are not infinitely scalable across all nodes. These include a Zookeeper cluster for state management, MariaDB+Galera SQL cluster for DNS, and various processes supporting the primary node. Coordinators can be replaced, added, or removed by the administrator, though by default any additional nodes are configured as non-coordinators, allowing the cluster to scale out to 100 or more hypervisors while still keeping the databases manageable. Noticeably compared to other cloud cluster products, these functions do not require ever more additional servers to support, and are all built in to the main PVC daemon functionality or a small set of "cluster" VMs which are installed by default at bootstrap.
### CLI client
The primary database is Zookeeper, which is used to provide the distributed and coordinated state used by the PVC cluster to determine what resources exist, where they live, and when they should run. The Zookeeper cluster is created on the initial coordinators at bootstrap time, and can scale out onto more coordinators later as required.
The CLI client is the most basic interface to the PVC cluster. It is a Python 3 Click application which interfaces directly with the Zookeeper cluster and provides a self-documenting CLI interface for PVC.
The secondary database is MariaDB with the Galera multi-master functionality. This database primarily supports DNS aggregation services, providing a unified view of the cluster and its clients in DNS without additional administrator intervention. Some additional information about the provisioning state is kept in the database as an intermediate to being stored in Zookeeper.
Further information about the CLI client architecture can be found at the [CLI client architecture page](/architecture/cli).
PVC handles both storage and networking as software configurations defined dynamically based on data in the Zookeeper database. It makes use of BGP EVPN to provide limitless, virtual layer 2 networks for clients in the cluster, and networks are isolated by NFT firewalls, with optional DHCP and IPv6 support in client networks. Storage is provided by Ceph for redundant, replicated block devices, which scales along with the cluster in both performance and size.
The CLI client manual can be found at the [CLI manual page](/manuals/cli).
### Physical Infrastructure
### API client
PVC requires only a very simple physical infrastructure: 1, 3, or more physical servers connected via Ethernet on two flat L2 networks. More complicated topologies are supported during the bootstrapping phase but the simplest configuration should be sufficient for most simple, basic clusters or for learning.
The HTTP API client is a more advanced interface to the PVC cluster, suitable for creating custom interfaces for PVC and providing better access control than the CLI. It is a Python 3 Flask application which also interfaces directly with the Zookeper cluster, and provides services on port 7370 by default. The API features a basic, key-based authentication mechanism to prevent unauthorized access, though this is optional, and can also provide HTTPS support if required for maximum security over public networks. With the exception of cluster initialization, the API can perform all functions that the CLI client can using a similar syntax and layout. Requests return JSON, and POST requests generally expect HTTP form responses.
Each node requires a single L2 network which provides the client and storage interconnections for the cluster. These roles are separated into two distinct L3 networks, allowing them to be split onto different L2 networks if desired. These networks live entirely within the cluster and must not be shared outside the cluster or with other systems. The standard configuration is an RFC1918 /24 network for each role to provide plenty of room for nodes and the supporting cluster VMs, while being scalable up to ~100 hypervisors. A special floating IP is designated in the cluster network to provide a single point of interface to the primary coordinator.
Further information about the API client architecture can be found at the [API client architecture page](/architecture/api).
Each coordinator node, but optionally all nodes, requires a second L2 network which provides upstream routing into the cluster. In the simplest configuration, only the coordinators are present in this network and share routes to client networks and receive outside traffic to the client networks through it. PVC provides no NAT support and no explicit firewalling in from this network, so any external gateway interfaces should connect into the PVC cluster via this intermediate network for security purposes. A special floating IP is designated in the upstream network to provide a single point of interface to the primary coordinator, most importantly for static routing.
The API client manual can be found at the [API manual page](/manuals/api).
The physical hardware of the nodes depends on the target workload. Generally, at least 32GB RAM and 8 CPU cores (excluding SMT threads) is the minimum for a single node, but extremely small configurations are possible, if very limited. Note that the Ceph storage disks, PVC daemons, and, on coordinator nodes, databases and Ceph monitors, all require additional RAM and CPU power on top of the requirements of virtualized guests, so ensure that each node is tall enough for your workload and then scale out for redundancy.
### Direct bindings
All Ceph storage disks should be SSDs for optimal performance and scalability. Even small clusters generate a large amount of storage activity, so consider baseline performance carefully when selecting drives.
Both the CLI and API clients use a common set of functions to perform the actual communication with the cluster, which is packaged separately as the `pvc-client-common` package. These functions can be used directly by 3rd-party Python interfaces for PVC if desired.
Network traffic can be small to extremely high depending especially on storage requirements and cluster size. As mentioned above, the storage network is separate from the cluster network in L3, allowing it to be isolated onto a faster L2 network if required. 1GbE is the bare minimum for small clusters, or the client network on midsize clusters, but 10GbE is recommended for any clusters larger than 3 nodes for at least the storage network. Larger clusters may require separate 10GbE client and storage networks, LCAP, or more advanced configurations to provide optimal throughput for storage, all of which can be configured at bootstrap or reconfigured as the cluster grows with minimal interruption.
## Deployment architecture
PVC features fencing of nodes, and they should be accessible via an IPMI lights-out management interface to allow broken nodes to be removed from service automatically by the cluster. Running without fencing is not inherently dangerous, but has a higher chance of stalling services if nodes fail and do not properly take themselves out of service. The IPMI interfaces can be in the cluster network, or in another network reachable by the nodes via the upstream network.
The overall management, deployment, bootstrapping, and configuring of nodes is accomplished via a set of Ansible roles, found in the [`pvc-ansible` repository](https://github.com/parallelvirtualcluster/pvc-ansible), and nodes are installed via a custom installer ISO generated by the [`pvc-installer` repository](https://github.com/parallelvirtualcluster/pvc-installer). Once the cluster is set up, nodes can be added, replaced, or updated using this Ansible framework.
### Software infrastructure
The PVC server-side infrastructure consists of a single daemon, `pvcd`, which manages each node based on state information from the Zookeeper database. All nodes are capable of running virtual machines, Ceph storage OSDs, and passing traffic to virtual machines via client L2 networks.
A subset of the nodes are designated to act as "coordinator" hosts for the cluster. Usually, 3 or 5 nodes are designated as coordinators; 3 is ideal for small deployments (<30 hypervisors) while 5 allow for much larger scaling, and larger odd numbers of coordinators are possible for very large clusters. These coordinators run additional functions for the cluster beyond VMs and storage, mainly:
* running Zookeeper itself, acting as the central database for the cluster.
* running FRRouting in BGP server mode, performing route reflector and upstream routing functionality.
* running Ceph monitor and manager daemons for the storage cluster.
* acting as cluster network gateways, DHCP, and DNS servers.
* acting as provisioning servers for nodes and VMs.
A single coordinator elects itself "primary" to perform this duty at startup, and passes it off on shutdown; this can be modified manually by the administrator. The primary coordinator handles provisioning and cluster network functionality (gateway, DHCP, DNS) for the whole cluster, which the "secondary" coordinators can take over automatically if needed. While this architecture can suffer from tromboning when there is a large inter-network traffic flow, it preserves a consistent and simple layer-2 model inside each client network for administrative simplicity.
New nodes can be added dynamically; once running, the cluster supports the PXE booting of additional hypervisors which are then self-configured and added to the cluster via the provisioning framework. This framework also allows for the quick deployment of VMs based off Ceph-stored images and templates.
The core external components are:
#### Zookeeper
Zookeeper is the primary database of the cluster, running on the coordinator nodes. All activity in the cluster is mediated by Zookeeper: clients read and write data to it, and daemons determine and update object configuration and state from it. The bootstrap tool initializes the cluster on the initial set of coordinator hosts, and once configured requires manual administrative action to modify; future version using Zookeeper 3.5 may offer self-managing functionality.
Coordinator hosts automatically attempt to start the Zookeeper daemon when they start up, if it has been shut down. If the Zookeeper cluster connection is lost, all clients will pause state update operations while waiting to reconnect. Note that fencing may be triggered if only one node loses Zookeeper connectivity, as the paused operations will prevent keepalives from being sent to the cluster. Take care when rebooting coordinator nodes so that the Zookeeper cluster continues to function normally.
#### FRRouting
FRRouting is used to provide BGP for management of cluster networks. It makes use of BGP EVPN to allow dynamic, software-defined VXLAN client networks presenting as simple layer-2 networks. VMs inside a particular client network can communicate directly as if they shared a switch. FRRouting also provides upstream BGP, allowing routes to the dynamic client networks to be learned by upstream routers.
#### dnsmasq
dnsmasq is used by the coordinator nodes to provide DHCP and DNS support for cluster networks. An individual instance is started on the primary coordinator for each network, handling that network specifically.
#### PowerDNS
PowerDNS is used by the coordinator nodes to aggregate client DNS records from the dnsmasq instances and present a complete picture of the cluster DNS to clients and the outside world. An instance runs on the primary coordinator aggregating dnsmasq entries, which can then be sent to other DNS servers via AXFR, including the in-cluster DNS servers usable by clients, which also make use of PowerDNS.
#### Libvirt
Libvirt is used to manage virtual machines in the cluster. It uses the TCP communication mode to perform live migrations between nodes and must be listening on daemon startup.
#### Ceph
Ceph provides the storage infrastructure to the cluster using RBD block devices. OSDs live in each node and VM disks are stored in copies of 3 across the cluster, ensuring a high degree of resiliency. The monitor and manager functions run on the coordinator nodes for scalability.
## Client interfaces
PVC provides three main administrator interfaces and a supplemental option:
* CLI
* HTTP API
* WebUI
* Direct Python bindings
### CLI
The CLI interface (`pvc`, package `pvc-cli-client`) is used to bootstrap the cluster and is able to perform all administrative tasks. The client requires direct access to the Zookeeper cluster to operate, but is usable on any client machine; initialization however requires a Debian-based GNU/Linux system for optimal administrative ease.
Once the other administrative interfaces are provisioned, the CLI is not required, but is installed by default on all nodes in the cluster to facilitate on-machine troubleshooting and maintenance.
### HTTP API
The HTTP API interface (`pvcapi`, package `pvc-api-client`) is configured by default on a special set of cluster-aware VMs, and provides a feature-complete implementation of the CLI interface via standard HTTP commands. The API allows building advanced configuration utilities integrating PVC without the overhead of the CLI. The HTTP API is optional and installation can be disabled during clutter initialization.
### WebUI
The HTTP Web user interface (`pvcweb`, package `pvc-web-client`) is configured by default on the cluster-aware VMs running the HTTP API, and provides a stripped-down web interface for a number of common administrative tasks, as well as reporting and monitoring functionality. Like the HTTP API, the WebUI is optional and installation can be disabled during cluster initialization.
### Direct Python bindings
While not specifically an interface, the Python functions used by the above interfaces are available via the package `pvc-client-common`, and can be used in custom scripts or programs directly to bypass the CLI or API interfaces.
Further information about the Ansible deployment architecture can be found at the [Ansible architecture page](/architecture/ansible).