parallelvirtualcluster/pvc

Author	SHA1	Message	Date
Joshua M. Boniface	02138974fa	Add device class tiers to Ceph pools Allows specifying a particular device class ("tier") for a given pool, for instance SSD-only or NVMe-only. This is implemented with CRUSH rules on the Ceph side, and via a new key in the pool Zookeeper schema which defaults to "default".	2021-12-28 20:58:15 -05:00
Joshua M. Boniface	c776aba8b3	Standardize fuzzy matching and use fullmatch Solves two problems: 1. How match fuzziness was used was very inconsistent; make them all the same, i.e. "if is_fuzzy and limit, apply .* to both sides". 2. Use re.fullmatch instead of re.match to ensure exact matching of the regex to the value. Without fuzziness, re.match would sometimes cause inconsistent behavior: for instance, a non-fuzzy limit of "vm", expected to match only the actual "vm", would also match "vm1".	2021-12-06 16:35:29 -05:00
Joshua M. Boniface	dd8f07526f	Use positive check rather than negative Ensure the VM is started before doing shutdown/stop, rather than checking that it is not stopped. Prevents overwrite of existing disable state and other weirdness.	2021-11-06 04:08:33 -04:00
Joshua M. Boniface	739b60b91e	Perform automatic shutdown/stop on VM disable Instead of requiring the VM to already be stopped, allow disable state changes to perform a shutdown first. Also add a force option which will do a hard stop instead of a shutdown. References #148	2021-11-06 03:57:24 -04:00
Joshua M. Boniface	2083fd824a	Reformat code with Black code formatter Unify the code style along PEP and Black principles using the tool.	2021-11-06 03:02:43 -04:00
Joshua M. Boniface	40e7e04aad	Fix invalid schema key Addresses #144	2021-10-09 18:42:33 -04:00
Joshua M. Boniface	89f62318bd	Add MTU to network creation/modification Addresses #144	2021-10-09 17:51:32 -04:00
Joshua M. Boniface	f7a826bf52	Add handlers for client network MTUs Refactors some of the code in VXNetworkInterface to handle MTUs in a more streamlined fashion. Also fixes a bug whereby bridge client networks were being explicitly given the cluster dev MTU which might not be correct. Now adds support for this option explicitly in the configs, and defaults to 1500 for safety (the standard Ethernet MTU). Addresses #144	2021-10-09 17:02:27 -04:00
Joshua M. Boniface	5501586a47	Add limit negation to VM list When using the "state", "node", or "tag" arguments to a VM list, add support for a "negate" flag to look for all VMs not matching the given state, node, or tag.	2021-10-07 11:50:52 -04:00
Joshua M. Boniface	44491dd988	Add support for configurable OSD DB ratios The default of 0.05 (5%) is likely ideal in the initial implementation, but allow this to be set explicitly for maximum flexibility in space-constrained or performance-critical use-cases.	2021-09-24 01:06:39 -04:00
Joshua M. Boniface	6cef68d157	Add separate OSD DB device support Adds in three parts: 1. Create an API endpoint to create OSD DB volume groups on a device. Passed through to the node via the same command pipeline as creating/removing OSDs, and creates a volume group with a fixed name (osd-db). 2. Adds API support for specifying whether or not to use this DB volume group when creating a new OSD via the "ext_db" flag. Naming and sizing are fixed for simplicity and based on Ceph recommendations (5% of OSD size). The Zookeeper schema tracks the block device to use during removal. 3. Adds CLI support for the new and modified API endpoints, as well as displaying the block device and DB block device in the OSD list. While I debated supporting adding a DB device to an existing OSD, in practice this ended up being a very complex operation involving stopping the OSD and setting some options, so this is not supported; this can be specified during OSD creation only. Closes #142	2021-09-23 13:59:49 -04:00
Joshua M. Boniface	6e0d0e264e	Add memory and vCPU checks to VM define/modify Ensures that a VM won't: (a) Have provisioned more RAM than there is available on a given node. Due to memory overprovisioning, this is simply an "is the VM memory count more than the node count" check, and doesn't factor in free or used memory on a node, total cluster usage, etc. So if a node has 64GB total RAM, the VM limit is 64GB. It is up to an administrator to ensure sanity below that value. (b) Have provisioned more vCPUs than there are CPU cores on the node, minus 2 to account for hypervisor/storage processes. Will ensure there is no severe CPU contention caused by a single VM having more vCPUs than there are actual execution threads available. Closes #139	2021-09-13 01:51:21 -04:00
Joshua M. Boniface	1855d03a36	Add pool size check when resizing volumes Closes #140	2021-09-12 19:54:51 -04:00
Joshua M. Boniface	73c96d1e93	Add VM device hot attach/detach support Adds a new API endpoint to support hot attach/detach of devices, and the corresponding client-side logic to use this endpoint when doing VM network/storage add/remove actions. The live attach is now the default behaviour for these types of additions and removals, and can be disabled if needed. Closes #141	2021-09-12 19:33:00 -04:00
Joshua M. Boniface	73e8149cb0	Remove explicit image-features from rbd cmd This should be managed in ceph.conf with the `rbd default features` configuration option instead, and thus can be tailored to the underlying OS version.	2021-07-30 11:33:59 -04:00
Joshua M. Boniface	4a7246b8c0	Ensure RBD resize has bytes appended If it isn't, the size will be interpreted as a MB value and result in an absurdly big volume instead. This is the same consistency validation that occurs on add.	2021-07-30 11:25:13 -04:00
Joshua M. Boniface	c49351469b	Revert "Ensure consistent sizing of volumes" This reverts commit `dc03e95bbf`.	2021-07-29 15:30:00 -04:00
Joshua M. Boniface	dc03e95bbf	Ensure consistent sizing of volumes Convert from human to bytes, then to megabytes and always pass this to the RBD command. This ensures consistency regardless of what is actually passed by the user.	2021-07-29 15:14:25 -04:00
Joshua M. Boniface	45f23c12ea	Remove logs from schema validation These are managed entirely by the logging subsystem, not by the schema handler, due to catch-22s.	2021-07-20 00:00:37 -04:00
Joshua M. Boniface	b14bc7e3a3	Add retry to log writes	2021-07-19 13:11:28 -04:00
Joshua M. Boniface	4d6842f942	Don't bail out if write fails, keep retrying	2021-07-19 13:09:36 -04:00
Joshua M. Boniface	e9df043c0a	Ensure ZK logging does not block startup	2021-07-19 12:19:59 -04:00
Joshua M. Boniface	5be968123f	Readd 1 second queue get timeout Otherwise daemon stops will sometimes inexplicably block.	2021-07-18 22:17:57 -04:00
Joshua M. Boniface	99fd7ebe63	Fix excessive CPU due to looping	2021-07-18 22:06:50 -04:00
Joshua M. Boniface	cffc96d156	Fix failure in creating base keys	2021-07-18 21:00:23 -04:00
Joshua M. Boniface	b770e15a91	Fix final termination of logger We need to do a bit more finagling with the logger on termination to ensure that all messages are written and the queue drained before actually terminating.	2021-07-18 19:53:00 -04:00
Joshua M. Boniface	982dfd52c6	Adjust date output format	2021-07-18 19:00:54 -04:00
Joshua M. Boniface	a088aa4484	Add node log functions to API and CLI	2021-07-18 18:54:28 -04:00
Joshua M. Boniface	323c7c41ae	Implement node logging into Zookeeper Adds the ability to send node daemon logs to Zookeeper to facilitate a command like "pvc node log", similar to "pvc vm log". Each node stores its logs in a separate tree under "/logs" which can then be combined or queried. By default, set by config, only 2000 lines are kept.	2021-07-18 17:11:43 -04:00
Joshua M. Boniface	75fb60b1b4	Add VM list filtering by tag Uses same method as state or node filtering, rather than altering how the main LIMIT field works.	2021-07-14 00:59:20 -04:00
Joshua M. Boniface	9ea9ac3b8a	Revamp tag handling and display Add an additional protected class, limit manipulation to one at a time, and ensure future flexibility. Also makes display consistent with other VM elements.	2021-07-13 22:39:52 -04:00
Joshua M. Boniface	c0a3467b70	Simplify VM metadata reads Directly call the new common getDomainMetadata function to avoid excessive Zookeeper calls for this information.	2021-07-13 19:05:33 -04:00
Joshua M. Boniface	9a199992a1	Add functions for manipulating VM tags Adds tags to schema (v3), to VM definition, adds function to modify tags, adds function to get tags, and adds tags to VM data output. Tags will enable more granular classification of VMs based either on administrator configuration or from automated system events.	2021-07-13 19:05:33 -04:00
Joshua M. Boniface	c76149141f	Only log ZK connections when persistent Prevents spam in the API logs.	2021-07-10 23:35:49 -04:00
Joshua M. Boniface	0699c48d10	Fix bad schema path name	2021-07-09 16:47:09 -04:00
Joshua M. Boniface	4832245d9c	Handle non-RBD disks and non-RBD errors better	2021-07-09 15:48:57 -04:00
Joshua M. Boniface	2138f2f59f	Fail VM removal on disk removal failures Prevents bad states where the VM is "removed" but some of its disks remain due to e.g. stuck watchers. Rearrange the sequence so it goes stop, delete disks, then delete VM, and then return a failure if any of the disk(s) fail to remove, allowing the task to be rerun after fixing the problem.	2021-07-09 15:39:06 -04:00
Joshua M. Boniface	d1d355a96b	Avoid errors if stats data is None	2021-07-09 13:13:54 -04:00
Joshua M. Boniface	c0c9327a7d	Return an empty log if the value is None	2021-07-09 13:08:00 -04:00
Joshua M. Boniface	5ffabcfef5	Avoid failing if we can't get the future data	2021-07-09 13:05:37 -04:00
Joshua M. Boniface	80fe96b24d	Add some additional docstrings	2021-07-07 12:28:08 -04:00
Joshua M. Boniface	80f04ce8ee	Remove connection renewal in state handler Regenerating the ZK connection was fraught with issues, including duplicate connections, strange failures to reconnect, and various other wonkiness. Instead let Kazoo handle states sensibly. Kazoo moves to SUSPENDED state when it loses connectivity, and stays there indefinitely (based on cursory tests). And Kazoo seems to always resume from this just fine on its own. Thus all that hackery did nothing but complicate reconnection. This therefore turns the listener into a purely informational function, providing logs of when/why it failed, and we also add some additional output messages during initial connection and final disconnection.	2021-07-07 11:55:12 -04:00
Joshua M. Boniface	a8c28786dd	Better handle empty ipaths in schema When trying to write to sub-item paths that don't yet exist, the previous method would just blindly write to whatever the root key is, which is never what we actually want. Instead, check explicitly for a "base path" situation, and handle that. Then, if we try to get a subpath that isn't valid, return None. Finally in the various functions, if the path is None, just continue (or return false/None) and (try to) chug along.	2021-07-05 23:35:03 -04:00
Joshua M. Boniface	c45804e8c1	Revert "Return none if a schema path is not found" This reverts commit `b1fcf6a4a5`.	2021-07-05 23:16:39 -04:00
Joshua M. Boniface	b1fcf6a4a5	Return none if a schema path is not found This can cause overwriting of unintended keys, so should not be happening. Will have to find the bugs this causes.	2021-07-05 17:15:55 -04:00
Joshua M. Boniface	a69105569f	Add node PVC version data to Node information Allows API client to see the currently-active version of the node daemon.	2021-07-05 09:57:38 -04:00
Joshua M. Boniface	e44f3d623e	Remove unnecessary try/except blocks from VM reads The zkhandler read() function takes care of ensuring there is a None value returned if these fail, so these aren't required. Makes the code a fair bit more readable here.	2021-07-02 12:01:58 -04:00
Joshua M. Boniface	43009486ae	Move Ceph pool/volume list assembly to thread pool Same reasons as the VM list, though less impactful.	2021-07-01 17:33:13 -04:00
Joshua M. Boniface	58789f1db4	Move VM list assembly to thread pool This helps parallelize the numerous Zookeeper calls a little bit, at least within the bounds of the GIL, to improve performance when getting a large list of VMs. The max_workers value is capped at 32 to avoid causing too many threads during concurrent executions, but still provides a noticeable speedup (on the order of 0.2-0.4 seconds with 75 VMs, scaling up further as counts grow).	2021-07-01 17:32:47 -04:00
Joshua M. Boniface	baf4c3fbc7	Add performance profiler function Usable anywhere that the global daemon "config" parameter can be passed in (e.g. pvcapid/helper.py, pvcnoded/Daemon.py, etc.). Stores results in a subdirectory of the PVC logdir called "profiler" if this directory can be created, or prints results. The debug config parameter ensures that the profiler can be added to functions and not run unless the server is explicitly in debug mode. Might not be useful as I don't initially plan to add this to every function (only when investigating performance problems), but this flexibility allows that to change later.	2021-07-01 14:01:33 -04:00
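
The matching rule described in commit c776aba8b3 ("if is_fuzzy and limit, apply .* to both sides", then `re.fullmatch`) can be sketched roughly as follows. This is an illustration only, not the code from the repository: the helper names `build_pattern` and `limit_matches` are hypothetical, and treating an empty limit as "match everything" is an assumption.

```python
import re


def build_pattern(limit, is_fuzzy):
    # Hypothetical helper: wrap the limit in ".*" on both sides when fuzzy.
    if is_fuzzy and limit:
        return ".*" + limit + ".*"
    return limit


def limit_matches(limit, value, is_fuzzy=False):
    # Assumption: an empty or missing limit places no restriction on the value.
    if not limit:
        return True
    # re.fullmatch requires the whole value to match the pattern;
    # re.match only anchors at the start, so "vm" would also match "vm1".
    return re.fullmatch(build_pattern(limit, is_fuzzy), value) is not None


if __name__ == "__main__":
    print(limit_matches("vm", "vm"))                  # exact, non-fuzzy
    print(limit_matches("vm", "vm1"))                 # rejected by fullmatch
    print(limit_matches("vm", "vm1", is_fuzzy=True))  # fuzzy, wrapped in .*
    print(re.match("vm", "vm1") is not None)          # the older, looser behaviour
```

With `re.match`, a non-fuzzy limit of "vm" accepts "vm1" because only the start of the string is anchored; `re.fullmatch` closes that gap while the fuzzy wrapping keeps substring-style searches working.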

246 Commits