parallelvirtualcluster/pvc

Author	SHA1	Message	Date
Joshua M. Boniface	c0f7ba0125	Add limit negation to VM list When using the "state", "node", or "tag" arguments to a VM list, add support for a "negate" flag to look for all VMs not in the given state, not on the given node, or without the given tag.	2021-10-07 11:50:52 -04:00
Joshua M. Boniface	65df807b09	Add support for configurable OSD DB ratios The default of 0.05 (5%) is likely ideal in the initial implementation, but allow this to be set explicitly for maximum flexibility in space-constrained or performance-critical use-cases.	2021-09-24 01:06:39 -04:00
Joshua M. Boniface	adc8a5a3bc	Add separate OSD DB device support Added in three parts: 1. Create an API endpoint to create OSD DB volume groups on a device. Passed through to the node via the same command pipeline as creating/removing OSDs, and creates a volume group with a fixed name (osd-db). 2. Adds API support for specifying whether or not to use this DB volume group when creating a new OSD via the "ext_db" flag. Naming and sizing are fixed for simplicity and based on Ceph recommendations (5% of OSD size). The Zookeeper schema tracks the block device to use during removal. 3. Adds CLI support for the new and modified API endpoints, as well as displaying the block device and DB block device in the OSD list. While I debated supporting adding a DB device to an existing OSD, in practice this ended up being a very complex operation involving stopping the OSD and setting some options, so this is not supported; this can be specified during OSD creation only. Closes #142	2021-09-23 13:59:49 -04:00
Joshua M. Boniface	58db537093	Add memory and vCPU checks to VM define/modify Ensures that a VM won't: (a) Have provisioned more RAM than there is available on a given node. Due to memory overprovisioning, this is simply an "is the VM memory count more than the node count" check, and doesn't factor in free or used memory on a node, total cluster usage, etc. So if a node has 64GB total RAM, the VM limit is 64GB. It is up to an administrator to ensure sanity below that value. (b) Have provisioned more vCPUs than there are CPU cores on the node, minus 2 to account for hypervisor/storage processes. This ensures there is no severe CPU contention caused by a single VM having more vCPUs than there are actual execution threads available. Closes #139	2021-09-13 01:51:21 -04:00
Joshua M. Boniface	e71a6c90bf	Add pool size check when resizing volumes Closes #140	2021-09-12 19:54:51 -04:00
Joshua M. Boniface	e962743e51	Add VM device hot attach/detach support Adds a new API endpoint to support hot attach/detach of devices, and the corresponding client-side logic to use this endpoint when doing VM network/storage add/remove actions. The live attach is now the default behaviour for these types of additions and removals, and can be disabled if needed. Closes #141	2021-09-12 19:33:00 -04:00
Joshua M. Boniface	73e8149cb0	Remove explicit image-features from rbd cmd This should be managed in ceph.conf with the `rbd default features` configuration option instead, and thus can be tailored to the underlying OS version.	2021-07-30 11:33:59 -04:00
Joshua M. Boniface	4a7246b8c0	Ensure RBD resize has bytes appended If it doesn't, the size will be interpreted as a MB value and result in an absurdly big volume instead. This is the same consistency validation that occurs on add.	2021-07-30 11:25:13 -04:00
Joshua M. Boniface	c49351469b	Revert "Ensure consistent sizing of volumes" This reverts commit `dc03e95bbf`.	2021-07-29 15:30:00 -04:00
Joshua M. Boniface	dc03e95bbf	Ensure consistent sizing of volumes Convert from human to bytes, then to megabytes and always pass this to the RBD command. This ensures consistency regardless of what is actually passed by the user.	2021-07-29 15:14:25 -04:00
Joshua M. Boniface	45f23c12ea	Remove logs from schema validation These are managed entirely by the logging subsystem, not by the schema handler, due to catch-22s.	2021-07-20 00:00:37 -04:00
Joshua M. Boniface	b14bc7e3a3	Add retry to log writes	2021-07-19 13:11:28 -04:00
Joshua M. Boniface	4d6842f942	Don't bail out if write fails, keep retrying	2021-07-19 13:09:36 -04:00
Joshua M. Boniface	e9df043c0a	Ensure ZK logging does not block startup	2021-07-19 12:19:59 -04:00
Joshua M. Boniface	5be968123f	Readd 1 second queue get timeout Otherwise daemon stops will sometimes inexplicably block.	2021-07-18 22:17:57 -04:00
Joshua M. Boniface	99fd7ebe63	Fix excessive CPU due to looping	2021-07-18 22:06:50 -04:00
Joshua M. Boniface	cffc96d156	Fix failure in creating base keys	2021-07-18 21:00:23 -04:00
Joshua M. Boniface	b770e15a91	Fix final termination of logger We need to do a bit more finagling with the logger on termination to ensure that all messages are written and the queue drained before actually terminating.	2021-07-18 19:53:00 -04:00
Joshua M. Boniface	982dfd52c6	Adjust date output format	2021-07-18 19:00:54 -04:00
Joshua M. Boniface	a088aa4484	Add node log functions to API and CLI	2021-07-18 18:54:28 -04:00
Joshua M. Boniface	323c7c41ae	Implement node logging into Zookeeper Adds the ability to send node daemon logs to Zookeeper to facilitate a command like "pvc node log", similar to "pvc vm log". Each node stores its logs in a separate tree under "/logs" which can then be combined or queried. By default, set by config, only 2000 lines are kept.	2021-07-18 17:11:43 -04:00
Joshua M. Boniface	75fb60b1b4	Add VM list filtering by tag Uses same method as state or node filtering, rather than altering how the main LIMIT field works.	2021-07-14 00:59:20 -04:00
Joshua M. Boniface	9ea9ac3b8a	Revamp tag handling and display Add an additional protected class, limit manipulation to one at a time, and ensure future flexibility. Also makes display consistent with other VM elements.	2021-07-13 22:39:52 -04:00
Joshua M. Boniface	c0a3467b70	Simplify VM metadata reads Directly call the new common getDomainMetadata function to avoid excessive Zookeeper calls for this information.	2021-07-13 19:05:33 -04:00
Joshua M. Boniface	9a199992a1	Add functions for manipulating VM tags Adds tags to schema (v3), to VM definition, adds function to modify tags, adds function to get tags, and adds tags to VM data output. Tags will enable more granular classification of VMs based either on administrator configuration or from automated system events.	2021-07-13 19:05:33 -04:00
Joshua M. Boniface	c76149141f	Only log ZK connections when persistent Prevents spam in the API logs.	2021-07-10 23:35:49 -04:00
Joshua M. Boniface	0699c48d10	Fix bad schema path name	2021-07-09 16:47:09 -04:00
Joshua M. Boniface	4832245d9c	Handle non-RBD disks and non-RBD errors better	2021-07-09 15:48:57 -04:00
Joshua M. Boniface	2138f2f59f	Fail VM removal on disk removal failures Prevents bad states where the VM is "removed" but some of its disks remain due to e.g. stuck watchers. Rearrange the sequence so it goes stop, delete disks, then delete VM, and then return a failure if any of the disk(s) fail to remove, allowing the task to be rerun after fixing the problem.	2021-07-09 15:39:06 -04:00
Joshua M. Boniface	d1d355a96b	Avoid errors if stats data is None	2021-07-09 13:13:54 -04:00
Joshua M. Boniface	c0c9327a7d	Return an empty log if the value is None	2021-07-09 13:08:00 -04:00
Joshua M. Boniface	5ffabcfef5	Avoid failing if we can't get the future data	2021-07-09 13:05:37 -04:00
Joshua M. Boniface	80fe96b24d	Add some additional docstrings	2021-07-07 12:28:08 -04:00
Joshua M. Boniface	80f04ce8ee	Remove connection renewal in state handler Regenerating the ZK connection was fraught with issues, including duplicate connections, strange failures to reconnect, and various other wonkiness. Instead let Kazoo handle states sensibly. Kazoo moves to SUSPENDED state when it loses connectivity, and stays there indefinitely (based on cursory tests). And Kazoo seems to always resume from this just fine on its own. Thus all that hackery did nothing but complicate reconnection. This therefore turns the listener into a purely informational function, providing logs of when/why it failed, and we also add some additional output messages during initial connection and final disconnection.	2021-07-07 11:55:12 -04:00
Joshua M. Boniface	a8c28786dd	Better handle empty ipaths in schema When trying to write to sub-item paths that don't yet exist, the previous method would just blindly write to whatever the root key is, which is never what we actually want. Instead, check explicitly for a "base path" situation, and handle that. Then, if we try to get a subpath that isn't valid, return None. Finally in the various functions, if the path is None, just continue (or return false/None) and (try to) chug along.	2021-07-05 23:35:03 -04:00
Joshua M. Boniface	c45804e8c1	Revert "Return none if a schema path is not found" This reverts commit `b1fcf6a4a5`.	2021-07-05 23:16:39 -04:00
Joshua M. Boniface	b1fcf6a4a5	Return none if a schema path is not found This can cause overwriting of unintended keys, so should not be happening. Will have to find the bugs this causes.	2021-07-05 17:15:55 -04:00
Joshua M. Boniface	a69105569f	Add node PVC version data to Node information Allows API client to see the currently-active version of the node daemon.	2021-07-05 09:57:38 -04:00
Joshua M. Boniface	e44f3d623e	Remove unnecessary try/except blocks from VM reads The zkhandler read() function takes care of ensuring there is a None value returned if these fail, so these aren't required. Makes the code a fair bit more readable here.	2021-07-02 12:01:58 -04:00
Joshua M. Boniface	43009486ae	Move Ceph pool/volume list assembly to thread pool Same reasons as the VM list, though less impactful.	2021-07-01 17:33:13 -04:00
Joshua M. Boniface	58789f1db4	Move VM list assembly to thread pool This helps parallelize the numerous Zookeeper calls a little bit, at least within the bounds of the GIL, to improve performance when getting a large list of VMs. The max_workers value is capped at 32 to avoid causing too many threads during concurrent executions, but still provides a noticeable speedup (on the order of 0.2-0.4 seconds with 75 VMs, scaling up further as counts grow).	2021-07-01 17:32:47 -04:00
Joshua M. Boniface	baf4c3fbc7	Add performance profiler function Usable anywhere that the global daemon "config" parameter can be passed in (e.g. pvcapid/helper.py, pvcnoded/Daemon.py, etc.). Stores results in a subdirectory of the PVC logdir called "profiler" if this directory can be created, or prints results. The debug config parameter ensures that the profiler can be added to functions and not run unless the server is explicitly in debug mode. Might not be useful as I don't initially plan to add this to every function (only when investigating performance problems), but this flexibility allows that to change later.	2021-07-01 14:01:33 -04:00
Joshua M. Boniface	e093efceb1	Add NoNodeError handlers in ZK locks Instead of looping 5+ times acquiring an impossible lock on a nonexistent key, just fail on a different error and return failure immediately. This is likely a major corner case that shouldn't happen, but better to be safe than 500.	2021-07-01 01:17:38 -04:00
Joshua M. Boniface	a080598781	Avoid superfluous ZK exists calls These cause a major (2x) slowdown in read calls since Zookeeper connections are expensive/slow. Instead, just try the thing and return None if there's no key there. Also wrap the children command in similar error handling since that did not exist and could likely cause some bugs at some point.	2021-07-01 01:15:51 -04:00
Joshua M. Boniface	6adaf1f669	Fix incorrect handling of deletions in init	2021-06-29 18:41:02 -04:00
Joshua M. Boniface	f91c07fdcf	Re-add UUID limit matching for full UUIDs This was valuable when passing a full UUID in, so go back to that. Verify first that the limit string is an actual UUID, and then compare against it if applicable.	2021-06-28 12:27:43 -04:00
Joshua M. Boniface	c54f66efa8	Limit match only on VM name I can see no possible reason to want to do limits against UUIDs, but supporting that means match is not what one would expect since a random UUID could match the limit. So only limit based on the name.	2021-06-23 19:17:35 -04:00
Joshua M. Boniface	cd860bae6b	Optimize VM list in API With many VMs this slows down linearly. Rework it a bit so there are fewer calls to getInformationFromXML and so the processing could happen in parallel at some point.	2021-06-23 19:14:26 -04:00
Joshua M. Boniface	07dbd55f03	Use list comprehension to compare against source	2021-06-22 02:31:14 -04:00
Joshua M. Boniface	6cd0ccf0ad	Fix network check on VM config modification	2021-06-22 02:21:55 -04:00

338 Commits