parallelvirtualcluster/pvc - pvc

Commit Graph

Author	SHA1	Message	Date
Joshua Boniface	a6f8500309	Improve fence handling to prevent anomalies 1. Move fence monitoring to its own thread rather than doing the listing and triggering within the main keepalive thread. 2. Add a global lock key at /config/fence_lock and use this lock key to prevent multiple nodes from trying to run fences simultaneously. 3. Run the fencing monitor for each node sequentially within the context of the main fence monitoring thread, to ensure that fences of multiple nodes happen sequentially rather than in parallel. All of these should help to prevent any anomalies where one node can try to fence multiple nodes at once without recourse.	2024-10-10 16:42:57 -04:00
Joshua Boniface	ebec1332e9	Return to relative paths for SCHEMA_ROOT_PATH	2024-10-10 16:20:02 -04:00
Joshua Boniface	8177d5f8b7	Use absolute path for ZK schema	2024-08-27 09:40:24 -04:00
Joshua Boniface	fbd5b3cca3	Remove is_backup flag for snapshots This won't be needed for anything.	2024-08-16 10:46:25 -04:00
Joshua Boniface	6fc7f45027	Add snapshot lists and timestamp Adds snapshots to the list of data in VM objects	2024-08-16 10:46:25 -04:00
Joshua Boniface	553c1e670e	Add VM snapshots functionality Adds the ability to create snapshots of an entire VM, including all its RBD disks and the VM XML config, though not any PVC metadata.	2024-08-16 10:46:25 -04:00
Joshua Boniface	8419659e1b	Ensure zkhandler is always cleaned up Even if the subfunction of an API @ZKConnection call fails, the zkhandler needs to terminate and clean up, or it leaves stuck threads around.	2024-01-30 09:48:17 -05:00
Joshua Boniface	09269f182c	Add live migrate max downtime selector meta field Adds a new flag to VM metadata to allow setting the VM live migration max downtime. This will enable very busy VMs that hang live migration to have this value changed.	2024-01-11 00:05:50 -05:00
Joshua Boniface	123c7ce857	Update copyright header on all files for 2024 Last release of 2023 is probably the best time to do this.	2023-12-29 11:16:59 -05:00
Joshua Boniface	6ed4efad33	Add new network.stats key to nodes	2023-12-21 12:48:48 -05:00
Joshua Boniface	9329784010	Implement async ZK read function Adds a function, "read_many", which can take in multiple ZK keys and return the values from all of them, using asyncio to avoid reading sequentially. Initial tests show a marked improvement in read performance of multiple read()-heavy functions (e.g. "get_list()" functions) with this method.	2023-12-10 15:35:40 -05:00
Joshua Boniface	9c2b1b29ee	Add node health to fault states Adjusts ordering and ensures that node health states are included in faults if they are less than 50%. Also adjusts fault ID generation and runs fault checks only on coordinator nodes to avoid too many runs.	2023-12-01 17:38:28 -05:00
Joshua Boniface	8594eb697f	Add initial fault generation in pvchealthd References: #164	2023-12-01 17:38:27 -05:00
Joshua Boniface	08411708f6	Clean up dangling references to cmd pipes Also removes the schema references for these CMD pipes as they are no longer required.	2023-11-09 23:28:14 -05:00
Joshua Boniface	526a5f4a74	Add support for split OSD adds Allows creating multiple OSDs on a single (NVMe) block device, leveraging the "ceph-volume lvm batch" command. Replaces the previous method of creating OSDs. Also adds a new ZK item for each OSD indicating if it is split or not.	2023-11-01 21:31:35 -04:00
Joshua Boniface	f4eef30770	Add JSON health to cluster data	2023-02-15 15:26:57 -05:00
Joshua Boniface	9c14d84bfc	Add node health value and send out API	2023-02-13 15:53:39 -05:00
Joshua Boniface	3c742a827b	Initial implementation of monitoring plugin system	2023-02-13 12:06:26 -05:00
Joshua Boniface	726d0a562b	Update copyright header year	2022-10-06 11:55:27 -04:00
Joshua Boniface	464f0e0356	Store additional OSD information in ZK Ensures that information like the FSIDs and the OSD LVM volume are stored in Zookeeper at creation time and updated at daemon start time (to ensure the data is populated at least once, or if the /dev/sdX path changes). This will allow safer operation of OSD removals and the potential implementation of re-activation after node replacements.	2022-05-02 12:11:39 -04:00
Joshua Boniface	25fe45dd28	Add device class tiers to Ceph pools Allows specifying a particular device class ("tier") for a given pool, for instance SSD-only or NVMe-only. This is implemented with Crush rules on the Ceph side, and via an additional new key in the pool Zookeeper schema which is defaulted to "default".	2021-12-28 20:58:15 -05:00
Joshua Boniface	c41664d2da	Reformat code with Black code formatter Unify the code style along PEP and Black principles using the tool.	2021-11-06 03:02:43 -04:00
Joshua Boniface	50d8aa0586	Add handlers for client network MTUs Refactors some of the code in VXNetworkInterface to handle MTUs in a more streamlined fashion. Also fixes a bug whereby bridge client networks were being explicitly given the cluster dev MTU which might not be correct. Now adds support for this option explicitly in the configs, and defaults to 1500 for safety (the standard Ethernet MTU). Addresses #144	2021-10-09 17:02:27 -04:00
Joshua Boniface	adc8a5a3bc	Add separate OSD DB device support Adds in three parts: 1. Create an API endpoint to create OSD DB volume groups on a device. Passed through to the node via the same command pipeline as creating/removing OSDs, and creates a volume group with a fixed name (osd-db). 2. Adds API support for specifying whether or not to use this DB volume group when creating a new OSD via the "ext_db" flag. Naming and sizing is fixed for simplicity and based on Ceph recommendations (5% of OSD size). The Zookeeper schema tracks the block device to use during removal. 3. Adds CLI support for the new and modified API endpoints, as well as displaying the block device and DB block device in the OSD list. While I debated supporting adding a DB device to an existing OSD, in practice this ended up being a very complex operation involving stopping the OSD and setting some options, so this is not supported; this can be specified during OSD creation only. Closes #142	2021-09-23 13:59:49 -04:00
Joshua Boniface	45f23c12ea	Remove logs from schema validation These are managed entirely by the logging subsystem, not by the schema handler, due to catch-22s.	2021-07-20 00:00:37 -04:00
Joshua Boniface	323c7c41ae	Implement node logging into Zookeeper Adds the ability to send node daemon logs to Zookeeper to facilitate a command like "pvc node log", similar to "pvc vm log". Each node stores its logs in a separate tree under "/logs" which can then be combined or queried. By default, set by config, only 2000 lines are kept.	2021-07-18 17:11:43 -04:00
Joshua Boniface	9ea9ac3b8a	Revamp tag handling and display Add an additional protected class, limit manipulation to one at a time, and ensure future flexibility. Also makes display consistent with other VM elements.	2021-07-13 22:39:52 -04:00
Joshua Boniface	9a199992a1	Add functions for manipulating VM tags Adds tags to schema (v3), to VM definition, adds function to modify tags, adds function to get tags, and adds tags to VM data output. Tags will enable more granular classification of VMs based either on administrator configuration or from automated system events.	2021-07-13 19:05:33 -04:00
Joshua Boniface	c76149141f	Only log ZK connections when persistent Prevents spam in the API logs.	2021-07-10 23:35:49 -04:00
Joshua Boniface	80fe96b24d	Add some additional docstrings	2021-07-07 12:28:08 -04:00
Joshua Boniface	80f04ce8ee	Remove connection renewal in state handler Regenerating the ZK connection was fraught with issues, including duplicate connections, strange failures to reconnect, and various other wonkiness. Instead let Kazoo handle states sensibly. Kazoo moves to SUSPENDED state when it loses connectivity, and stays there indefinitely (based on cursory tests). And Kazoo seems to always resume from this just fine on its own. Thus all that hackery did nothing but complicate reconnection. This therefore turns the listener into a purely informational function, providing logs of when/why it failed, and we also add some additional output messages during initial connection and final disconnection.	2021-07-07 11:55:12 -04:00
Joshua Boniface	a8c28786dd	Better handle empty ipaths in schema When trying to write to sub-item paths that don't yet exist, the previous method would just blindly write to whatever the root key is, which is never what we actually want. Instead, check explicitly for a "base path" situation, and handle that. Then, if we try to get a subpath that isn't valid, return None. Finally in the various functions, if the path is None, just continue (or return false/None) and (try to) chug along.	2021-07-05 23:35:03 -04:00
Joshua Boniface	c45804e8c1	Revert "Return none if a schema path is not found" This reverts commit `b1fcf6a4a5`.	2021-07-05 23:16:39 -04:00
Joshua Boniface	b1fcf6a4a5	Return none if a schema path is not found This can cause overwriting of unintended keys, so should not be happening. Will have to find the bugs this causes.	2021-07-05 17:15:55 -04:00
Joshua Boniface	a69105569f	Add node PVC version data to Node information Allows API client to see the currently-active version of the node daemon.	2021-07-05 09:57:38 -04:00
Joshua Boniface	e093efceb1	Add NoNodeError handlers in ZK locks Instead of looping 5+ times acquiring an impossible lock on a nonexistent key, just fail on a different error and return failure immediately. This is likely a major corner case that shouldn't happen, but better to be safe than 500.	2021-07-01 01:17:38 -04:00
Joshua Boniface	a080598781	Avoid superfluous ZK exists calls These cause a major (2x) slowdown in read calls since Zookeeper connections are expensive/slow. Instead, just try the thing and return None if there's no key there. Also wrap the children command in similar error handling since that did not exist and could likely cause some bugs at some point.	2021-07-01 01:15:51 -04:00
Joshua Boniface	e623909a43	Store PHY MAC for VFs and restore after free	2021-06-22 00:56:47 -04:00
Joshua Boniface	64d1a37b3c	Add PCIe device paths to SR-IOV VF information This will be used when adding VM network interfaces of type hostdev.	2021-06-21 21:08:46 -04:00
Joshua Boniface	e7b6a3eac1	Implement SR-IOV PF and VF instances Adds support for the node daemon managing SR-IOV PF and VF instances. PFs are added to Zookeeper automatically based on the config at startup during network configuration, and are otherwise completely static. PFs are automatically removed from Zookeeper, along with all corresponding VFs, should the PF phy device be removed from the configuration. VFs are configured based on the (autocreated) VFs of each PF device, added to Zookeeper, and then a new class instance, SRIOVVFInstance, is used to watch them for configuration changes. This will enable the runtime management of VF settings by the API. The set of keys ensures that both configuration and details of the NIC can be tracked. Most keys are self-explanatory, especially for PFs and the basic keys for VFs. The configuration tree is also self-explanatory, being based entirely on the options available in the `ip link set {dev} vf` command. Two additional keys are also present: `used` and `used_by`, which will be able to track the (boolean) state of usage, as well as the VM that uses a given VIF. Since the VM side implementation will support both macvtap and direct "hostdev" assignments, this will ensure that this state can be tracked on both the VF and the VM side.	2021-06-17 01:33:03 -04:00
Joshua Boniface	23318524b9	Ensure validate writes a valid schema version	2021-06-14 21:27:37 -04:00
Joshua Boniface	5f11b3198b	Fix base schema None issue in handler too	2021-06-14 21:13:40 -04:00
Joshua Boniface	49f4feb482	Fix typo bug in key rename	2021-06-14 00:51:45 -04:00
Joshua Boniface	3bad3de720	Verify if key exists before reading	2021-06-13 15:39:43 -04:00
Joshua Boniface	7110a42e5f	Add final schema elements after refactoring	2021-06-13 14:26:17 -04:00
Joshua Boniface	f071343333	Add DHCP lease schema and temp workaround	2021-06-12 18:22:43 -04:00
Joshua Boniface	b1c13c9fc1	Fix another bug with read call	2021-06-10 01:08:18 -04:00
Joshua Boniface	75fc40a1e8	Fix bug with nkipath	2021-06-10 01:00:40 -04:00
Joshua Boniface	2aa7f87ca9	Fix bug in creating child path keys	2021-06-10 00:55:54 -04:00
Joshua Boniface	5273c4ebfa	Fix bug with encoding raw creates	2021-06-10 00:52:07 -04:00
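
Commit 9329784010 above describes a "read_many" function that takes multiple ZK keys and uses asyncio to avoid reading them sequentially, and commit a080598781 describes returning None when a key is absent. The following is a minimal sketch of that idea only, not the project's actual code: it assumes an in-memory dictionary in place of Zookeeper, and the names `FAKE_STORE` and `read` as well as the thread-pool approach are hypothetical choices made for illustration.

```python
import asyncio

# Hypothetical in-memory stand-in for a Zookeeper tree (assumption, not PVC data).
FAKE_STORE = {"/nodes/hv1/state": "run", "/nodes/hv2/state": "stop"}


def read(key):
    """Blocking single-key read; returns None when the key does not exist."""
    return FAKE_STORE.get(key)


async def read_many(keys):
    """Read all keys concurrently; results come back in the same order as keys."""
    loop = asyncio.get_running_loop()
    # Each blocking read is handed to the default thread pool so the reads overlap
    # instead of running one after another.
    tasks = [loop.run_in_executor(None, read, key) for key in keys]
    return await asyncio.gather(*tasks)


values = asyncio.run(
    read_many(["/nodes/hv1/state", "/nodes/hv2/state", "/nodes/hv3/state"])
)
```

With a real Zookeeper client the blocking `read` would be a network round trip, which is where overlapping the calls would be expected to help; the dictionary lookup here only demonstrates the calling pattern.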

81 Commits