parallelvirtualcluster/pvc - pvc

Commit Graph

Author	SHA1	Message	Date
Joshua Boniface	79eb54d5da	Move fault generation to common library	2023-12-06 13:17:10 -05:00
Joshua Boniface	2267a9c85d	Improve output formatting for simplicity	2023-12-05 10:37:35 -05:00
Joshua Boniface	672e58133f	Implement interfaces to faults	2023-12-04 01:37:54 -05:00
Joshua Boniface	3dc48c1783	Lower default monitoring interval to 15s Faults are also reported on the monitoring interval, so 60s seems like too long. Lower this to 15 seconds by default instead.	2023-12-01 17:38:28 -05:00
Joshua Boniface	9c2b1b29ee	Add node health to fault states Adjusts ordering and ensures that node health states are included in faults if they are less than 50%. Also adjusts fault ID generation and runs fault checks only on coordinator nodes to avoid too many runs.	2023-12-01 17:38:28 -05:00
Joshua Boniface	8594eb697f	Add initial fault generation in pvchealthd References: #164	2023-12-01 17:38:27 -05:00
Joshua Boniface	7cb9ebae6b	Remove legacy configuration handler This is not going to be needed.	2023-12-01 01:25:40 -05:00
Joshua Boniface	102c3c3106	Port all Celery worker functions to discrete pkg Moves all tasks run by the Celery worker into a discrete package/module for easier installation. Also adjusts several parameters throughout to accomplish this.	2023-11-30 02:24:54 -05:00
Joshua Boniface	03a738f878	Move config parser into daemon_lib And reformat/add config values for API.	2023-11-30 00:05:37 -05:00
Joshua Boniface	11db3c5b20	Fix ordering during termination	2023-11-29 21:21:51 -05:00
Joshua Boniface	fa12a3c9b1	Permit buffered log appending	2023-11-29 21:21:51 -05:00
Joshua Boniface	787f4216b3	Expand Zookeeper log daemon prefix to match	2023-11-29 21:21:51 -05:00
Joshua Boniface	83ceb41138	Add daemon name to Logger entries	2023-11-29 15:18:37 -05:00
Joshua Boniface	2e5958640a	Remove erroneous time from message	2023-11-29 15:12:41 -05:00
Joshua Boniface	7abc697c8a	Improve Zookeeper log handling Ensures that messages are fully read before each append. Adds more Zookeeper hits, but ensures logs won't be overwritten by multiple daemons. Also don't use a set on the client side, to avoid "removing duplicate" entries erroneously.	2023-11-29 15:12:41 -05:00
Joshua Boniface	dd6a38d5ea	Properly pass the name of the exception	2023-11-16 18:05:52 -05:00
Joshua Boniface	f50f170d4e	Convert vmbuilder to use new Celery step structure	2023-11-16 16:08:49 -05:00
Joshua Boniface	83c4c6633d	Readd RBD lock detection and clearing on startup This is still needed due to the nature of the locks and freeing them on startup, and to preserve lock=fail behaviour on VM startup. Also fixes the fencing lock flush to directly use the client library outside of Celery. I don't like this hack but it seems prudent until we move fencing to the workers as well.	2023-11-10 01:33:48 -05:00
Joshua Boniface	b522306f87	Increase Celery wait times It's a bit inefficient, but provides nicer output and a bit of settling time between each stage.	2023-11-09 23:54:05 -05:00
Joshua Boniface	07026efb63	Ensure OSD checks in before completing Avoids issues where the new OSD doesn't check in; at least the administrator will know. Also fixes some issues with osd_db in removal.	2023-11-09 23:51:05 -05:00
Joshua Boniface	08411708f6	Clean up dangling references to cmd pipes Also removes the schema references for these CMD pipes as they are no longer required.	2023-11-09 23:28:14 -05:00
Joshua Boniface	ce17c60a20	Port OSD on-node tasks to Celery worker system Adds Celery versions of the osd_add, osd_replace, osd_refresh, osd_remove, and osd_db_vg_add functions.	2023-11-09 23:28:08 -05:00
Joshua Boniface	89681d54b9	Port VM on-node tasks to Celery worker system Adds Celery versions of the flush_locks, device_attach, and device_detach functions.	2023-11-06 20:40:46 -05:00
Joshua Boniface	a016337f57	Remove block verify in API This doesn't work right and is handled by the node anyways.	2023-11-04 02:45:10 -04:00
Joshua Boniface	7f5dd385b5	Use right key for FSID elsewhere	2023-11-03 23:51:01 -04:00
Joshua Boniface	ec42b19d0e	Send FSID to clients too	2023-11-03 16:37:55 -04:00
Joshua Boniface	64e37ae963	Update OSD replacement functionality 1. Simplify this by leveraging the existing remove_osd/add_osd functions, since its task was functionally identical to those two in sequential order. 2. Add support for split OSDs within the command (replacing all OSDs on the block device(s) as required). 3. Add additional configurability and flexibility around the old device, weight, and external DB LVs.	2023-11-03 01:45:49 -04:00
Joshua Boniface	980ea6a9e9	Adjust handling of ext_db and _count options Avoid the use of superfluous flag options, default them to none, and add support for fixed-size DB LVs.	2023-11-02 13:29:47 -04:00
Joshua Boniface	526a5f4a74	Add support for split OSD adds Allows creating multiple OSDs on a single (NVMe) block device, leveraging the "ceph-volume lvm batch" command. Replaces the previous method of creating OSDs. Also adds a new ZK item for each OSD indicating if it is split or not.	2023-11-01 21:31:35 -04:00
Joshua Boniface	35f80e544c	Use more hierarchical backup path structure	2023-10-24 02:04:16 -04:00
Joshua Boniface	83b937654c	Avoid removing nonexistent snapshots Store retain_snapshot in JSON and use that to check during delete.	2023-10-24 01:35:00 -04:00
Joshua Boniface	714bde89e6	Fix incorrect variable ref	2023-10-24 01:25:01 -04:00
Joshua Boniface	c87736eb0a	Use consistent path name and format	2023-10-24 01:20:44 -04:00
Joshua Boniface	63d0a85e29	Add backup deletion command	2023-10-24 01:18:27 -04:00
Joshua Boniface	55ca131c2c	Handle snapshots on restore and provide options Also rename the retain option to remove superfluous plural.	2023-10-24 00:25:06 -04:00
Joshua Boniface	8d256a1737	Complete VM restore functionality	2023-10-23 22:23:17 -04:00
Joshua Boniface	d3b3fdfc80	Revert "Export backup images to a tar archive" This reverts commit `38abd078af`.	2023-10-23 11:01:16 -04:00
Joshua Boniface	f1b29ea94e	Initial VM restore work	2023-10-23 11:00:54 -04:00
Joshua Boniface	38abd078af	Export backup images to a tar archive This helps ensure an easier restore as the tar archive(s) can be sent directly to the API via the normal process of image uploading, instead of individual disks.	2023-10-23 09:56:50 -04:00
Joshua Boniface	fabb97cf48	Only split a command_string if it's not a list	2023-10-23 09:50:58 -04:00
Joshua Boniface	68124db323	Remove extra spaces	2023-10-17 13:01:38 -04:00
Joshua Boniface	8921efd269	Fix incorrect tuple construct	2023-10-17 12:55:44 -04:00
Joshua Boniface	3d12915989	Further improve return messages	2023-10-17 12:53:08 -04:00
Joshua Boniface	67b0b19bca	Use better time functionality	2023-10-17 12:39:37 -04:00
Joshua Boniface	5d0c674d1d	Add runtime and adjust ordering	2023-10-17 12:32:40 -04:00
Joshua Boniface	f441b0d823	Improve missing parent message	2023-10-17 12:17:29 -04:00
Joshua Boniface	a5d0f219e4	Improve return messages	2023-10-17 12:10:55 -04:00
Joshua Boniface	0169510df0	Fix up datestring generation	2023-10-17 12:05:45 -04:00
Joshua Boniface	a58c1d5a8c	Fix bad snapshot removals	2023-10-17 12:02:24 -04:00
Joshua Boniface	a8e4b01b67	Handle return data even better	2023-10-17 11:51:03 -04:00
Joshua Boniface	45c4c86911	Handle extra return variable	2023-10-17 11:47:01 -04:00
Joshua Boniface	6448b31d2c	Improve VM list arguments Use kwargs here instead of fixed args to allow default None values.	2023-10-17 11:01:38 -04:00
Joshua Boniface	b997c6f31e	Add support for full VM backups Adds support for exporting full VM backups, including configuration, metainfo, and RBD disk images, with incremental support.	2023-10-17 10:15:06 -04:00
Joshua Boniface	a0b45a2bcd	Always create RBDs with bytes value Converting into human results in imprecise values when specifying bytes directly, which in turn breaks VMDK image uploads. Instead, just use the raw bytes value when creating the volume instead of converting it back.	2023-09-30 12:37:43 -04:00
Joshua Boniface	c4397219da	Ensure fencing states are properly reflected	2023-09-18 09:59:18 -04:00
Joshua Boniface	311bb69785	Format based on updated Black	2023-09-12 16:41:02 -04:00
Joshua Boniface	653b95ee25	Normalize return messages for node commands	2023-05-04 17:02:46 -04:00
Joshua Boniface	78322f4de4	Improve size handling during volume add/resize	2023-04-28 12:16:16 -04:00
Joshua Boniface	c1782c5004	Add full/nearfull OSD health detection	2023-04-28 11:33:39 -04:00
Joshua Boniface	e773211293	Add PVC version to cluster status output	2023-02-22 16:09:24 -05:00
Joshua Boniface	70ba364f1d	Flip VM state condition to remove shutdown Don't cause health degradation for shutdown state, and flip the list around to make it clearer.	2023-02-16 20:32:33 -05:00
Joshua Boniface	1f8561d59a	Format cluster health like node healths Make a cleaner construct here.	2023-02-16 12:33:36 -05:00
Joshua Boniface	1093ca6264	Disallow health less than 0	2023-02-15 16:50:24 -05:00
Joshua Boniface	29584e5636	Add per-node health entries for 3rd party checks	2023-02-15 16:44:49 -05:00
Joshua Boniface	f4e8449356	Fix bugs and formatting of health messages	2023-02-15 16:28:56 -05:00
Joshua Boniface	ec79acf061	Fix linting of cluster.py file	2023-02-15 15:48:31 -05:00
Joshua Boniface	00586074cf	Modify cluster health to use new values	2023-02-15 15:45:43 -05:00
Joshua Boniface	f4eef30770	Add JSON health to cluster data	2023-02-15 15:26:57 -05:00
Joshua Boniface	b07396c39a	Fix bugs if plugins fail to load	2023-02-13 21:51:48 -05:00
Joshua Boniface	e6f9e6e0e8	Fix several bugs and optimize output	2023-02-13 16:36:15 -05:00
Joshua Boniface	9c14d84bfc	Add node health value and send out API	2023-02-13 15:53:39 -05:00
Joshua Boniface	3c742a827b	Initial implementation of monitoring plugin system	2023-02-13 12:06:26 -05:00
Joshua Boniface	671a907236	Allow rename in disable state	2023-01-30 11:48:43 -05:00
Joshua Boniface	38d63d9837	Flip behaviour of memory selectors It didn't make any sense to me for mem(prov) to be the default selector, since this has too many caveats versus mem(free). Switch to using mem(free) as the default (i.e. "mem") and make memprov the alternative.	2022-11-15 15:45:59 -05:00
Joshua Boniface	79eb994a5e	Ensure equality of none and None for selector	2022-11-07 11:59:53 -05:00
Joshua Boniface	8af7189dd0	Add module tag for daemon lib	2022-11-04 03:47:18 -04:00
Joshua Boniface	726d0a562b	Update copyright header year	2022-10-06 11:55:27 -04:00
Joshua Boniface	881550b610	Actually fix VM sorting Due to the executor the previous attempt did not work.	2022-08-12 17:46:29 -04:00
Joshua Boniface	bcabd7d079	Always sort VM list Same justification as previous commit.	2022-08-09 12:05:40 -04:00
Joshua Boniface	05a316cdd6	Ensure the node list is sorted Otherwise the node entries could come back in an arbitrary order; since this is an ordered list of dictionaries that might not be expected by the API consumers, so ensure it's always sorted.	2022-08-09 12:03:49 -04:00
Joshua Boniface	d8d3feee22	Add selector help and adjust flag name 1. Add documentation on the node selector flags. In the API, reference the daemon configuration manual which now includes details in this section; in the CLI, provide the help in "pvc vm define" in detail and then reference that command's help in the other commands that use this field. 2. Ensure the naming is consistent in the CLI, using the flag name "--node-selector" everywhere (was "--selector" for "pvc vm" commands and "--node-selector" for "pvc provisioner" commands).	2022-06-10 02:42:06 -04:00
Joshua Boniface	f8cdcb30ba	Add migration selector via free memory Closes #152	2022-05-18 03:47:16 -04:00
Joshua Boniface	c401a1f655	Use consistent language for primary mode I didn't call it "router" anywhere else, but the state in the list is called "coordinator" so, call it "coordinator mode".	2022-05-06 15:40:52 -04:00
Joshua Boniface	7a40c7a55b	Add support for replacing/refreshing OSDs Adds commands to both replace an OSD disk, and refresh (reimport) an existing OSD disk on a new node. This handles the cases where an OSD disk should be replaced (either due to upgrades or failures) or where a node is rebuilt in-place and an existing OSD must be re-imported to it. This should avoid the need to do a full remove/add sequence for either case. Also cleans up some aspects of OSD removal that are identical between methods (e.g. using safe-to-destroy and sleeping after stopping) and fixes a bug if an OSD does not truly exist when the daemon starts up.	2022-05-06 15:32:06 -04:00
Joshua Boniface	464f0e0356	Store additional OSD information in ZK Ensures that information like the FSIDs and the OSD LVM volume are stored in Zookeeper at creation time and updated at daemon start time (to ensure the data is populated at least once, or if the /dev/sdX path changes). This will allow safer operation of OSD removals and the potential implementation of re-activation after node replacements.	2022-05-02 12:11:39 -04:00
Joshua Boniface	d6ca74376a	Fix bugs with forced removal	2022-04-29 14:03:07 -04:00
Joshua Boniface	4d698be34b	Add OSD removal force option Ensures a removal can continue even in situations where some step(s) might fail, for instance removing an obsolete OSD from a replaced node.	2022-04-29 11:16:33 -04:00
Joshua Boniface	1142454934	Add pool PGs count modification Allows an administrator to adjust the PG count of a given pool. This can be used to increase the PGs (for example after adding more OSDs) or decrease it (to remove OSDs, reduce CPU load, etc.).	2021-12-28 21:53:29 -05:00
Joshua Boniface	bbfad340a1	Add PGs count to pool list	2021-12-28 21:12:02 -05:00
Joshua Boniface	c73939e1c5	Fix issue if pool stats have not updated yet	2021-12-28 21:03:10 -05:00
Joshua Boniface	25fe45dd28	Add device class tiers to Ceph pools Allows specifying a particular device class ("tier") for a given pool, for instance SSD-only or NVMe-only. This is implemented with Crush rules on the Ceph side, and via an additional new key in the pool Zookeeper schema which is defaulted to "default".	2021-12-28 20:58:15 -05:00
Joshua Boniface	6ccd19e636	Standardize fuzzy matching and use fullmatch Solves two problems: 1. How match fuzziness was used was very inconsistent; make them all the same, i.e. "if is_fuzzy and limit, apply .* to both sides". 2. Use re.fullmatch instead of re.match to ensure exact matching of the regex to the value. Without fuzziness, this would sometimes cause inconsistent behavior, for instance if a limit was non-fuzzy "vm", expecting to match the actual "vm", but also matching "vm1" too.	2021-12-06 16:35:29 -05:00
Joshua Boniface	0d857d5ab8	Use positive check rather than negative Ensure the VM is started before doing shutdown/stop, rather than being stopped. Prevents overwrite of existing disable state and other weirdness.	2021-11-06 04:08:33 -04:00
Joshua Boniface	5f193a6134	Perform automatic shutdown/stop on VM disable Instead of requiring the VM to already be stopped, instead allow disable state changes to perform a shutdown first. Also add a force option which will do a hard stop instead of a shutdown. References #148	2021-11-06 03:57:24 -04:00
Joshua Boniface	c41664d2da	Reformat code with Black code formatter Unify the code style along PEP and Black principles using the tool.	2021-11-06 03:02:43 -04:00
Joshua Boniface	87cda72ca9	Fix invalid schema key Addresses #144	2021-10-09 18:42:33 -04:00
Joshua Boniface	24de0f4189	Add MTU to network creation/modification Addresses #144	2021-10-09 17:51:32 -04:00
Joshua Boniface	50d8aa0586	Add handlers for client network MTUs Refactors some of the code in VXNetworkInterface to handle MTUs in a more streamlined fashion. Also fixes a bug whereby bridge client networks were being explicitly given the cluster dev MTU which might not be correct. Now adds support for this option explicitly in the configs, and defaults to 1500 for safety (the standard Ethernet MTU). Addresses #144	2021-10-09 17:02:27 -04:00
Joshua Boniface	c0f7ba0125	Add limit negation to VM list When using the "state", "node", or "tag" arguments to a VM list, add support for a "negate" flag to look for all VMs not in the state, node, or tag state.	2021-10-07 11:50:52 -04:00
Joshua Boniface	65df807b09	Add support for configurable OSD DB ratios The default of 0.05 (5%) is likely ideal in the initial implementation, but allow this to be set explicitly for maximum flexibility in space-constrained or performance-critical use-cases.	2021-09-24 01:06:39 -04:00

386 Commits
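
Commit 6ccd19e636 describes two matching rules: wrap the limit in `.*` on both sides when fuzzy matching is requested, and use `re.fullmatch` rather than `re.match` so that a non-fuzzy limit of "vm" no longer matches "vm1". The following is a minimal sketch of that behaviour, not the project's actual code. The function name `matches_limit` and its parameters are hypothetical, chosen only for illustration.

```python
import re


def matches_limit(value, limit, is_fuzzy=True):
    """Hypothetical helper illustrating the rule from the commit message.

    If a limit is given and fuzzy matching is on, pad the limit with ".*"
    on both sides; then require the whole value to match the pattern.
    """
    if not limit:
        # No limit supplied: everything matches.
        return True
    if is_fuzzy:
        limit = ".*" + limit + ".*"
    # fullmatch anchors the pattern at both ends of the value.
    return re.fullmatch(limit, value) is not None


# re.match only anchors at the start, so "vm" also matches "vm1".
prefix_only = re.match("vm", "vm1") is not None

# With fullmatch, a non-fuzzy "vm" matches only the exact name.
exact = matches_limit("vm", "vm", is_fuzzy=False)
not_exact = matches_limit("vm1", "vm", is_fuzzy=False)

# A fuzzy limit still matches anywhere inside the value.
fuzzy = matches_limit("test-vm1", "vm", is_fuzzy=True)
```

With `re.match`, the pattern is anchored only at the start of the string, which is the source of the "vm" versus "vm1" inconsistency that the commit message mentions.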