parallelvirtualcluster/pvc

Author	SHA1	Message	Date
Joshua M. Boniface	44a4f0e1f7	Use new info detail output instead of new lists. Avoids multiple additional ZK calls by using data that is now in the status detail output.	2023-12-10 22:19:09 -05:00
Joshua M. Boniface	5d53a3e529	Add state and faults detail to cluster information. We already parse this information out anyway, so we might as well add it to the API output JSON. This can be leveraged by the Prometheus endpoint as well to avoid duplicate listings.	2023-12-10 17:29:32 -05:00
Joshua M. Boniface	35e22cb50f	Simplify cluster status handling. This significantly simplifies cluster state handling by removing most of the superfluous get_list() calls, replacing them with basic child reads, since most of them are just for a count anyway. The ones that require states simplify this down to a child read plus direct reads for the exact items required, leveraging the new read_many() function.	2023-12-10 17:05:46 -05:00
Joshua M. Boniface	a3171b666b	Split node health into separate function	2023-12-10 16:52:10 -05:00
Joshua M. Boniface	48e41d7b05	Port Faults getFault and getAllFaults to read_many	2023-12-10 16:05:16 -05:00
Joshua M. Boniface	d6aecf195e	Port Node getNodeInformation to read_many	2023-12-10 15:53:28 -05:00
Joshua M. Boniface	9329784010	Implement async ZK read function. Adds a function, "read_many", which can take in multiple ZK keys and return the values from all of them, using asyncio to avoid reading sequentially. Initial tests show a marked improvement in read performance of multiple read()-heavy functions (e.g. "get_list()" functions) with this method.	2023-12-10 15:35:40 -05:00
Joshua M. Boniface	b9fbfe2ed5	Improve fault ID format. Instead of using random hex characters from an md5sum, use a nice name in all-caps similar to how Ceph does. This further helps prevent dupes but also permits a changing health delta within a single event (which would really only ever apply to plugin faults).	2023-12-09 16:48:14 -05:00
Joshua M. Boniface	7e6d922877	Improve fault detail handling further. Since we already had a "details" field, simply move the point where it gets appended to the message to later in generate_fault, after the main message value has been used to generate the ID.	2023-12-09 16:13:36 -05:00
Joshua M. Boniface	4003204f14	Remove bracketed text from fault_str. This ensures that certain faults (e.g. Ceph status faults) will be combined despite the added text in brackets, while still keeping them mostly separate. Also ensure the health text is updated each time to assist with this, as this health text may now change independently of the fault ID.	2023-12-09 15:34:18 -05:00
Joshua M. Boniface	2bea78d25e	Make all remaining limits optional	2023-12-09 13:43:58 -05:00
Joshua M. Boniface	fd717b702d	Use external list of fault states	2023-12-09 12:51:41 -05:00
Joshua M. Boniface	317ca4b98c	Move defined state combinations into common	2023-12-09 12:36:32 -05:00
Joshua M. Boniface	0bda095571	Move libvirt_schema and fix other imports	2023-12-09 12:20:29 -05:00
Joshua M. Boniface	813aef1463	Fix incorrect UUID key name	2023-12-09 12:14:57 -05:00
Joshua M. Boniface	5a7ea25266	Fix incorrect database name entries	2023-12-09 12:12:00 -05:00
Joshua M. Boniface	61b39d0739	Fix incorrect cluster health calculation	2023-12-07 11:13:36 -05:00
Joshua M. Boniface	4bf80a5913	Fix missing datetime shrink	2023-12-06 17:15:36 -05:00
Joshua M. Boniface	e0bf7f7d1a	Fix bad ID values in acknowledge	2023-12-06 14:18:31 -05:00
Joshua M. Boniface	20acf3295f	Add mass ack/delete of faults	2023-12-06 13:59:39 -05:00
Joshua M. Boniface	d1e34e7333	Store fault times only to the second. Any more precision is unnecessary and saves 6 chars when displaying these times elsewhere.	2023-12-06 13:20:18 -05:00
Joshua M. Boniface	79eb54d5da	Move fault generation to common library	2023-12-06 13:17:10 -05:00
Joshua M. Boniface	2267a9c85d	Improve output formatting for simplicity	2023-12-05 10:37:35 -05:00
Joshua M. Boniface	672e58133f	Implement interfaces to faults	2023-12-04 01:37:54 -05:00
Joshua M. Boniface	3dc48c1783	Lower default monitoring interval to 15s. Faults are also reported on the monitoring interval, so 60s seems like too long. Lower this to 15 seconds by default instead.	2023-12-01 17:38:28 -05:00
Joshua M. Boniface	9c2b1b29ee	Add node health to fault states. Adjusts ordering and ensures that node health states are included in faults if they are less than 50%. Also adjusts fault ID generation and runs fault checks only on coordinator nodes to avoid too many runs.	2023-12-01 17:38:28 -05:00
Joshua M. Boniface	8594eb697f	Add initial fault generation in pvchealthd. References: #164	2023-12-01 17:38:27 -05:00
Joshua M. Boniface	7cb9ebae6b	Remove legacy configuration handler. This is not going to be needed.	2023-12-01 01:25:40 -05:00
Joshua M. Boniface	102c3c3106	Port all Celery worker functions to discrete pkg. Moves all tasks run by the Celery worker into a discrete package/module for easier installation. Also adjusts several parameters throughout to accomplish this.	2023-11-30 02:24:54 -05:00
Joshua M. Boniface	03a738f878	Move config parser into daemon_lib and reformat/add config values for API.	2023-11-30 00:05:37 -05:00
Joshua M. Boniface	11db3c5b20	Fix ordering during termination	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	fa12a3c9b1	Permit buffered log appending	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	787f4216b3	Expand Zookeeper log daemon prefix to match	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	83ceb41138	Add daemon name to Logger entries	2023-11-29 15:18:37 -05:00
Joshua M. Boniface	2e5958640a	Remove erroneous time from message	2023-11-29 15:12:41 -05:00
Joshua M. Boniface	7abc697c8a	Improve Zookeeper log handling. Ensures that messages are fully read before each append. Adds more Zookeeper hits, but ensures logs won't be overwritten by multiple daemons. Also don't use a set on the client side, to avoid "removing duplicate" entries erroneously.	2023-11-29 15:12:41 -05:00
Joshua M. Boniface	dd6a38d5ea	Properly pass the name of the exception	2023-11-16 18:05:52 -05:00
Joshua M. Boniface	f50f170d4e	Convert vmbuilder to use new Celery step structure	2023-11-16 16:08:49 -05:00
Joshua M. Boniface	83c4c6633d	Re-add RBD lock detection and clearing on startup. This is still needed due to the nature of the locks and freeing them on startup, and to preserve lock=fail behaviour on VM startup. Also fixes the fencing lock flush to directly use the client library outside of Celery. I don't like this hack, but it seems prudent until we move fencing to the workers as well.	2023-11-10 01:33:48 -05:00
Joshua M. Boniface	b522306f87	Increase Celery wait times. It's a bit inefficient, but provides nicer output and a bit of settling time between each stage.	2023-11-09 23:54:05 -05:00
Joshua M. Boniface	07026efb63	Ensure OSD checks in before completing. Avoids issues where the new OSD doesn't check in; at least the administrator will know. Also fixes some issues with osd_db in removal.	2023-11-09 23:51:05 -05:00
Joshua M. Boniface	08411708f6	Clean up dangling references to cmd pipes. Also removes the schema references for these CMD pipes as they are no longer required.	2023-11-09 23:28:14 -05:00
Joshua M. Boniface	ce17c60a20	Port OSD on-node tasks to Celery worker system. Adds Celery versions of the osd_add, osd_replace, osd_refresh, osd_remove, and osd_db_vg_add functions.	2023-11-09 23:28:08 -05:00
Joshua M. Boniface	89681d54b9	Port VM on-node tasks to Celery worker system. Adds Celery versions of the flush_locks, device_attach, and device_detach functions.	2023-11-06 20:40:46 -05:00
Joshua M. Boniface	a016337f57	Remove block verify in API. This doesn't work right and is handled by the node anyway.	2023-11-04 02:45:10 -04:00
Joshua M. Boniface	7f5dd385b5	Use right key for FSID elsewhere	2023-11-03 23:51:01 -04:00
Joshua M. Boniface	ec42b19d0e	Send FSID to clients too	2023-11-03 16:37:55 -04:00
Joshua M. Boniface	64e37ae963	Update OSD replacement functionality: 1. Simplify this by leveraging the existing remove_osd/add_osd functions, since its task was functionally identical to those two in sequential order. 2. Add support for split OSDs within the command (replacing all OSDs on the block device(s) as required). 3. Add additional configurability and flexibility around the old device, weight, and external DB LVs.	2023-11-03 01:45:49 -04:00
Joshua M. Boniface	980ea6a9e9	Adjust handling of ext_db and _count options. Avoid the use of superfluous flag options, default them to none, and add support for fixed-size DB LVs.	2023-11-02 13:29:47 -04:00
Joshua M. Boniface	526a5f4a74	Add support for split OSD adds. Allows creating multiple OSDs on a single (NVMe) block device, leveraging the "ceph-volume lvm batch" command. Replaces the previous method of creating OSDs. Also adds a new ZK item for each OSD indicating if it is split or not.	2023-11-01 21:31:35 -04:00

357 Commits
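Commit 35e22cb50f replaces most get_list() calls in the cluster status code with plain child reads, since a count only needs the number of children, not each item's full details. A minimal sketch of that idea follows; the in-memory tree, the paths, and the `get_children`/`read` helpers are hypothetical stand-ins for ZooKeeper, not PVC's actual schema or handler API.

```python
from collections import Counter

# Hypothetical in-memory stand-in for a ZooKeeper tree: path -> child names.
CHILDREN = {
    "/nodes": ["hv1", "hv2", "hv3"],
    "/domains": ["vm1", "vm2", "vm3", "vm4"],
}
# Hypothetical leaf values: path -> stored data.
DATA = {
    "/nodes/hv1/daemon_state": "run",
    "/nodes/hv2/daemon_state": "run",
    "/nodes/hv3/daemon_state": "stop",
}


def get_children(path):
    """Return the child names under a path (one round trip in real ZooKeeper)."""
    return CHILDREN.get(path, [])


def read(path):
    """Return the value stored at a path, or None if it does not exist."""
    return DATA.get(path)


def cluster_counts():
    # Counts only need the child list; no per-item detail reads are made.
    node_names = get_children("/nodes")
    vm_count = len(get_children("/domains"))
    # Where states matter, read only the single key needed per item.
    node_states = Counter(read(f"/nodes/{name}/daemon_state") for name in node_names)
    return {"nodes": len(node_names), "vms": vm_count, "node_states": dict(node_states)}
```

The saving comes from not fetching every field of every object just to call `len()` on the result.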
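Commit 9329784010 adds read_many(), which issues several ZooKeeper reads concurrently with asyncio rather than one after another. The sketch below shows only the general pattern, assuming a blocking single-key `read()` (simulated here with a dictionary and `time.sleep`) and using `asyncio.to_thread` (Python 3.9+) to overlap the calls; PVC's real implementation and its ZooKeeper client calls may differ.

```python
import asyncio
import time

# Hypothetical stand-in for ZooKeeper data; each read sleeps to mimic a round trip.
FAKE_ZK = {
    "/nodes/hv1/daemon_state": "run",
    "/nodes/hv2/daemon_state": "run",
    "/nodes/hv3/daemon_state": "stop",
}


def read(key, delay=0.05):
    """Blocking single-key read (hypothetical); returns None if the key is missing."""
    time.sleep(delay)
    return FAKE_ZK.get(key)


def read_many(keys, delay=0.05):
    """Read several keys concurrently and return (key, value) pairs in input order."""

    async def read_all():
        # Each blocking read runs in the default thread pool, so they overlap.
        tasks = [asyncio.to_thread(read, key, delay) for key in keys]
        # gather() returns results in the same order as the tasks were given.
        values = await asyncio.gather(*tasks)
        return list(zip(keys, values))

    return asyncio.run(read_all())
```

With three simulated 50 ms reads, the concurrent version should finish in roughly one delay rather than three, which is the kind of gain the commit message reports for read()-heavy functions.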
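Commit b9fbfe2ed5 switches fault IDs from md5-derived hex strings to readable all-caps names, in the style of Ceph health check codes. Because such an ID does not depend on the health delta, a repeated report of the same event can update the existing entry instead of creating a duplicate. A minimal sketch under that assumption; the `Fault` fields, `make_fault_id`, and `report_fault` are hypothetical names, not PVC's code.

```python
from dataclasses import dataclass


@dataclass
class Fault:
    fault_id: str
    delta: int      # health delta (assumed here to be a percentage)
    message: str
    count: int = 1  # number of times this fault has been reported


def make_fault_id(name):
    """Hypothetical: turn a short descriptive name into an all-caps ID."""
    return "_".join(name.upper().split())


def report_fault(faults, name, delta, message):
    """Record a fault, merging repeats of the same event into one entry."""
    fault_id = make_fault_id(name)
    existing = faults.get(fault_id)
    if existing is not None:
        # Same ID: keep a single entry but let the delta and text change.
        existing.delta = delta
        existing.message = message
        existing.count += 1
        return existing
    fault = Fault(fault_id, delta, message)
    faults[fault_id] = fault
    return fault
```

Keying on the name rather than on a hash of the full message is what lets the delta vary within one event without spawning a new fault.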
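Commit 4003204f14 strips bracketed text from the fault string before it is used for identification, so that two Ceph status faults differing only in their bracketed detail are combined into one. The sketch below assumes "bracketed" means `[...]` or `(...)` spans and derives the ID from an md5 digest, as the pre-b9fbfe2ed5 scheme did according to that commit's message; the regex, the 8-character truncation, and the function names are illustrative assumptions.

```python
import hashlib
import re

# Assumption: "bracketed text" is any [...] or (...) span plus leading whitespace.
BRACKETED = re.compile(r"\s*(\[[^\]]*\]|\([^)]*\))")


def strip_bracketed(fault_str):
    """Remove bracketed detail so that variants of one fault compare equal."""
    return BRACKETED.sub("", fault_str).strip()


def fault_id(fault_str):
    """Hypothetical md5-based ID computed from the de-bracketed message."""
    cleaned = strip_bracketed(fault_str)
    return hashlib.md5(cleaned.encode("utf-8")).hexdigest()[:8]
```

For example, the made-up messages `HEALTH_WARN [1 osds down]` and `HEALTH_WARN [2 osds down]` both reduce to `HEALTH_WARN` and so share an ID, while the full text can still be stored and refreshed separately.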
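Commit 7abc697c8a stops using a set for log lines on the client side. A set treats two identical lines (for example, the same message logged twice in a row) as one entry and discards their order, so genuine repeats silently disappear. A minimal illustration with made-up log lines; the helper names are hypothetical.

```python
# Made-up log lines; the second and third are genuine, identical repeats.
lines = [
    "hv1 Node health check passed",
    "hv1 VM test1 started",
    "hv1 VM test1 started",
]


def unique_with_set(entries):
    # A set drops the repeated line and does not preserve order.
    return set(entries)


def keep_all(entries):
    # A list keeps every entry, in the order it was logged.
    return list(entries)
```

Here `unique_with_set(lines)` holds two entries while `keep_all(lines)` holds all three, which is the erroneous "removing duplicate" behaviour the commit avoids.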
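Commit 526a5f4a74 creates split OSDs with `ceph-volume lvm batch`, which can place several OSDs on one block device. A minimal sketch that only builds the command line and does not run it; the helper name and the exact option set PVC passes are assumptions, though `--osds-per-device` and `--yes` are standard `ceph-volume lvm batch` options.

```python
import shlex


def build_split_osd_command(device, osds_per_device):
    """Hypothetical helper: build a ceph-volume command splitting one device into N OSDs."""
    if osds_per_device < 1:
        raise ValueError("osds_per_device must be at least 1")
    return [
        "ceph-volume", "lvm", "batch",
        "--yes",  # do not prompt for confirmation
        "--osds-per-device", str(osds_per_device),
        device,
    ]


cmd = build_split_osd_command("/dev/nvme0n1", 4)
print(shlex.join(cmd))
```

Returning an argument list rather than a single string keeps it safe to pass straight to `subprocess.run` without shell quoting issues.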