Joshua Boniface
e781d742e6
Fix bug with volume and snapshot listing
2023-12-11 10:21:46 -05:00
Joshua Boniface
741dafb26b
Port VM functions to read_many
2023-12-11 03:34:36 -05:00
Joshua Boniface
5d9e83e8ed
Fix output bugs in VM information
2023-12-11 03:04:46 -05:00
Joshua Boniface
7c116b2fbc
Ensure node health value is an int
2023-12-10 23:56:50 -05:00
Joshua Boniface
1023c55087
Fix bug in VM state list
2023-12-10 23:44:01 -05:00
Joshua Boniface
9235187c6f
Port Ceph functions to read_many
...
Only ports getOSDInformation, as all the others feature 3 or fewer reads,
which is acceptable sequentially.
2023-12-10 22:24:38 -05:00
Joshua Boniface
0c94f1b4f8
Port Network functions to read_many
2023-12-10 22:19:21 -05:00
Joshua Boniface
44a4f0e1f7
Use new info detail output instead of new lists
...
Avoids multiple additional ZK calls by using data that is now in the
status detail output.
2023-12-10 22:19:09 -05:00
Joshua Boniface
5d53a3e529
Add state and faults detail to cluster information
...
We already parse this information out anyways, so might as well add it
to the API output JSON. This can be leveraged by the Prometheus endpoint
as well to avoid duplicate listings.
2023-12-10 17:29:32 -05:00
Joshua Boniface
35e22cb50f
Simplify cluster status handling
...
This significantly simplifies cluster state handling by removing most of
the superfluous get_list() calls, replacing them with basic child reads
since most of them are just for a count anyways. The ones that require
states simplify this down to a child read plus direct reads for the
exact items required while leveraging the new read_many() function.
2023-12-10 17:05:46 -05:00
Joshua Boniface
a3171b666b
Split node health into separate function
2023-12-10 16:52:10 -05:00
Joshua Boniface
48e41d7b05
Port Faults getFault and getAllFaults to read_many
2023-12-10 16:05:16 -05:00
Joshua Boniface
d6aecf195e
Port Node getNodeInformation to read_many
2023-12-10 15:53:28 -05:00
Joshua Boniface
9329784010
Implement async ZK read function
...
Adds a function, "read_many", which can take in multiple ZK keys and
return the values from all of them, using asyncio to avoid reading
sequentially.
Initial tests show a marked improvement in read performance of multiple
read()-heavy functions (e.g. "get_list()" functions) with this method.
2023-12-10 15:35:40 -05:00
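The idea behind "read_many" in commit 9329784010 can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the dict-backed `FAKE_ZK` store, the key names, and the `read()` helper are all hypothetical stand-ins for a real Zookeeper connection, and the use of a thread-pool executor is an assumption about how blocking reads might be overlapped with asyncio.

```python
import asyncio

# Hypothetical stand-in for Zookeeper: a plain dict of key -> value.
FAKE_ZK = {
    "/nodes/hv1/daemonstate": "run",
    "/nodes/hv1/domainstate": "ready",
    "/nodes/hv1/health": "100",
}


def read(key):
    """Blocking single-key read (stand-in for a real ZK get)."""
    return FAKE_ZK[key]


async def _read_many_async(keys):
    # Schedule every blocking read on the default executor so the reads
    # overlap instead of running one after another.
    loop = asyncio.get_running_loop()
    tasks = [loop.run_in_executor(None, read, key) for key in keys]
    # gather() returns results in the same order as the input keys.
    return await asyncio.gather(*tasks)


def read_many(keys):
    """Synchronous wrapper: read several keys and return their values in order."""
    return asyncio.run(_read_many_async(keys))


if __name__ == "__main__":
    print(read_many(["/nodes/hv1/daemonstate", "/nodes/hv1/health"]))
```

A caller that previously issued one read() per key could pass the full key list once and unpack the results positionally.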
Joshua Boniface
b9fbfe2ed5
Improve fault ID format
...
Instead of using random hex characters from an md5sum, use a nice name
in all-caps similar to how Ceph does. This further helps prevent dupes
but also permits a changing health delta within a single event (which
would really only ever apply to plugin faults).
2023-12-09 16:48:14 -05:00
Joshua Boniface
7e6d922877
Improve fault detail handling further
...
Since we already had a "details" field, simply move where it gets added
to the message later, in generate_fault, after the main message value
was used to generate the ID.
2023-12-09 16:13:36 -05:00
Joshua Boniface
4003204f14
Remove bracketed text from fault_str
...
This ensures that certain faults, e.g. Ceph status faults, will be
combined despite the added text in brackets, while still keeping them
mostly separate.
Also ensure the health text is updated each time to assist with this, as
this health text may now change independent of the fault ID.
2023-12-09 15:34:18 -05:00
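The effect described in commit 4003204f14 can be illustrated with a small sketch. It assumes the bracketed text uses square brackets and that the ID is a shortened md5 digest of the message; the function names, the 8-character length, and the sample messages are hypothetical, not taken from the project.

```python
import re
from hashlib import md5


def strip_bracketed(message):
    """Remove any [bracketed] segments (and the whitespace before them)."""
    return re.sub(r"\s*\[[^\]]*\]", "", message).strip()


def fault_id(message):
    """Derive a short ID from the message with bracketed detail removed."""
    base = strip_bracketed(message)
    return md5(base.encode("utf-8")).hexdigest()[:8]


if __name__ == "__main__":
    # Two hypothetical messages that differ only in their bracketed detail.
    print(fault_id("Ceph health is degraded [1 OSD down]"))
    print(fault_id("Ceph health is degraded [2 OSDs down]"))
```

Because the bracketed detail is dropped before hashing, both sample messages map to the same ID, while messages with different base text still get different IDs.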
Joshua Boniface
2bea78d25e
Make all remaining limits optional
2023-12-09 13:43:58 -05:00
Joshua Boniface
fd717b702d
Use external list of fault states
2023-12-09 12:51:41 -05:00
Joshua Boniface
317ca4b98c
Move defined state combinations into common
2023-12-09 12:36:32 -05:00
Joshua Boniface
0bda095571
Move libvirt_schema and fix other imports
2023-12-09 12:20:29 -05:00
Joshua Boniface
813aef1463
Fix incorrect UUID key name
2023-12-09 12:14:57 -05:00
Joshua Boniface
5a7ea25266
Fix incorrect database name entries
2023-12-09 12:12:00 -05:00
Joshua Boniface
61b39d0739
Fix incorrect cluster health calculation
2023-12-07 11:13:36 -05:00
Joshua Boniface
4bf80a5913
Fix missing datetime shrink
2023-12-06 17:15:36 -05:00
Joshua Boniface
e0bf7f7d1a
Fix bad ID values in acknowledge
2023-12-06 14:18:31 -05:00
Joshua Boniface
20acf3295f
Add mass ack/delete of faults
2023-12-06 13:59:39 -05:00
Joshua Boniface
d1e34e7333
Store fault times only to the second
...
Any more precision is unnecessary, and dropping it saves 6 chars when
displaying these times elsewhere.
2023-12-06 13:20:18 -05:00
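As a rough illustration of commit d1e34e7333, one way to keep timestamps only to the second in Python is to zero out the microseconds before converting to a string. The helper name below is hypothetical, and the project may truncate differently (for example by slicing the string).

```python
from datetime import datetime


def shrink_to_second(timestamp):
    """Return a copy of the timestamp with sub-second precision dropped."""
    return timestamp.replace(microsecond=0)


# A fixed example value so the output is reproducible.
full = datetime(2023, 12, 6, 13, 20, 18, 123456)
short = shrink_to_second(full)

if __name__ == "__main__":
    print(str(full))   # includes the fractional seconds
    print(str(short))  # whole seconds only
```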
Joshua Boniface
79eb54d5da
Move fault generation to common library
2023-12-06 13:17:10 -05:00
Joshua Boniface
2267a9c85d
Improve output formatting for simplicity
2023-12-05 10:37:35 -05:00
Joshua Boniface
672e58133f
Implement interfaces to faults
2023-12-04 01:37:54 -05:00
Joshua Boniface
3dc48c1783
Lower default monitoring interval to 15s
...
Faults are also reported on the monitoring interval, so 60s seems like
too long. Lower this to 15 seconds by default instead.
2023-12-01 17:38:28 -05:00
Joshua Boniface
9c2b1b29ee
Add node health to fault states
...
Adjusts ordering and ensures that node health states are included in
faults if they are less than 50%.
Also adjusts fault ID generation and runs fault checks only on coordinator
nodes to avoid too many runs.
2023-12-01 17:38:28 -05:00
Joshua Boniface
8594eb697f
Add initial fault generation in pvchealthd
...
References: #164
2023-12-01 17:38:27 -05:00
Joshua Boniface
7cb9ebae6b
Remove legacy configuration handler
...
This is not going to be needed.
2023-12-01 01:25:40 -05:00
Joshua Boniface
102c3c3106
Port all Celery worker functions to discrete pkg
...
Moves all tasks run by the Celery worker into a discrete package/module
for easier installation. Also adjusts several parameters throughout to
accomplish this.
2023-11-30 02:24:54 -05:00
Joshua Boniface
03a738f878
Move config parser into daemon_lib
...
And reformat/add config values for API.
2023-11-30 00:05:37 -05:00
Joshua Boniface
11db3c5b20
Fix ordering during termination
2023-11-29 21:21:51 -05:00
Joshua Boniface
fa12a3c9b1
Permit buffered log appending
2023-11-29 21:21:51 -05:00
Joshua Boniface
787f4216b3
Expand Zookeeper log daemon prefix to match
2023-11-29 21:21:51 -05:00
Joshua Boniface
83ceb41138
Add daemon name to Logger entries
2023-11-29 15:18:37 -05:00
Joshua Boniface
2e5958640a
Remove erroneous time from message
2023-11-29 15:12:41 -05:00
Joshua Boniface
7abc697c8a
Improve Zookeeper log handling
...
Ensures that messages are fully read before each append. Adds more
Zookeeper hits, but ensures logs won't be overwritten by multiple
daemons.
Also don't use a set on the client side, to avoid "removing duplicate"
entries erroneously.
2023-11-29 15:12:41 -05:00
Joshua Boniface
dd6a38d5ea
Properly pass the name of the exception
2023-11-16 18:05:52 -05:00
Joshua Boniface
f50f170d4e
Convert vmbuilder to use new Celery step structure
2023-11-16 16:08:49 -05:00
Joshua Boniface
83c4c6633d
Readd RBD lock detection and clearing on startup
...
This is still needed due to the nature of the locks and freeing them on
startup, and to preserve lock=fail behaviour on VM startup.
Also fixes the fencing lock flush to directly use the client library
outside of Celery. I don't like this hack but it seems prudent until we
move fencing to the workers as well.
2023-11-10 01:33:48 -05:00
Joshua Boniface
b522306f87
Increase Celery wait times
...
It's a bit inefficient, but provides nicer output and a bit of settling
time between each stage.
2023-11-09 23:54:05 -05:00
Joshua Boniface
07026efb63
Ensure OSD checks in before completing
...
Avoids issues where the new OSD doesn't check in; at least the
administrator will know.
Also fixes some issues with osd_db in removal.
2023-11-09 23:51:05 -05:00
Joshua Boniface
08411708f6
Clean up dangling references to cmd pipes
...
Also removes the schema references for these CMD pipes as they are no
longer required.
2023-11-09 23:28:14 -05:00
Joshua Boniface
ce17c60a20
Port OSD on-node tasks to Celery worker system
...
Adds Celery versions of the osd_add, osd_replace, osd_refresh,
osd_remove, and osd_db_vg_add functions.
2023-11-09 23:28:08 -05:00
Joshua Boniface
89681d54b9
Port VM on-node tasks to Celery worker system
...
Adds Celery versions of the flush_locks, device_attach, and
device_detach functions.
2023-11-06 20:40:46 -05:00
Joshua Boniface
a016337f57
Remove block verify in API
...
This doesn't work right and is handled by the node anyways.
2023-11-04 02:45:10 -04:00
Joshua Boniface
7f5dd385b5
Use right key for FSID elsewhere
2023-11-03 23:51:01 -04:00
Joshua Boniface
ec42b19d0e
Send FSID to clients too
2023-11-03 16:37:55 -04:00
Joshua Boniface
64e37ae963
Update OSD replacement functionality
...
1. Simplify this by leveraging the existing remove_osd/add_osd
functions, since its task was functionally identical to those two in
sequential order.
2. Add support for split OSDs within the command (replacing all OSDs on
the block device(s) as required).
3. Add additional configurability and flexibility around the old device,
weight, and external DB LVs.
2023-11-03 01:45:49 -04:00
Joshua Boniface
980ea6a9e9
Adjust handling of ext_db and _count options
...
Avoid the use of superfluous flag options, default them to none, and add
support for fixed-size DB LVs.
2023-11-02 13:29:47 -04:00
Joshua Boniface
526a5f4a74
Add support for split OSD adds
...
Allows creating multiple OSDs on a single (NVMe) block device,
leveraging the "ceph-volume lvm batch" command. Replaces the previous
method of creating OSDs.
Also adds a new ZK item for each OSD indicating if it is split or not.
2023-11-01 21:31:35 -04:00
Joshua Boniface
35f80e544c
Use more hierarchical backup path structure
2023-10-24 02:04:16 -04:00
Joshua Boniface
83b937654c
Avoid removing nonexistent snapshots
...
Store retain_snapshot in JSON and use that to check during delete.
2023-10-24 01:35:00 -04:00
Joshua Boniface
714bde89e6
Fix incorrect variable ref
2023-10-24 01:25:01 -04:00
Joshua Boniface
c87736eb0a
Use consistent path name and format
2023-10-24 01:20:44 -04:00
Joshua Boniface
63d0a85e29
Add backup deletion command
2023-10-24 01:18:27 -04:00
Joshua Boniface
55ca131c2c
Handle snapshots on restore and provide options
...
Also rename the retain option to remove superfluous plural.
2023-10-24 00:25:06 -04:00
Joshua Boniface
8d256a1737
Complete VM restore functionality
2023-10-23 22:23:17 -04:00
Joshua Boniface
d3b3fdfc80
Revert "Export backup images to a tar archive"
...
This reverts commit 38abd078af.
2023-10-23 11:01:16 -04:00
Joshua Boniface
f1b29ea94e
Initial VM restore work
2023-10-23 11:00:54 -04:00
Joshua Boniface
38abd078af
Export backup images to a tar archive
...
This helps ensure an easier restore as the tar archive(s) can be sent
directly to the API via the normal process of image uploading, instead
of individual disks.
2023-10-23 09:56:50 -04:00
Joshua Boniface
fabb97cf48
Only split a command_string if it's not a list
2023-10-23 09:50:58 -04:00
Joshua Boniface
68124db323
Remove extra spaces
2023-10-17 13:01:38 -04:00
Joshua Boniface
8921efd269
Fix incorrect tuple construct
2023-10-17 12:55:44 -04:00
Joshua Boniface
3d12915989
Further improve return messages
2023-10-17 12:53:08 -04:00
Joshua Boniface
67b0b19bca
Use better time functionality
2023-10-17 12:39:37 -04:00
Joshua Boniface
5d0c674d1d
Add runtime and adjust ordering
2023-10-17 12:32:40 -04:00
Joshua Boniface
f441b0d823
Improve missing parent message
2023-10-17 12:17:29 -04:00
Joshua Boniface
a5d0f219e4
Improve return messages
2023-10-17 12:10:55 -04:00
Joshua Boniface
0169510df0
Fix up datestring generation
2023-10-17 12:05:45 -04:00
Joshua Boniface
a58c1d5a8c
Fix bad snapshot removals
2023-10-17 12:02:24 -04:00
Joshua Boniface
a8e4b01b67
Handle return data even better
2023-10-17 11:51:03 -04:00
Joshua Boniface
45c4c86911
Handle extra return variable
2023-10-17 11:47:01 -04:00
Joshua Boniface
6448b31d2c
Improve VM list arguments
...
Use kwargs here instead of fixed args to allow default None values.
2023-10-17 11:01:38 -04:00
Joshua Boniface
b997c6f31e
Add support for full VM backups
...
Adds support for exporting full VM backups, including configuration,
metainfo, and RBD disk images, with incremental support.
2023-10-17 10:15:06 -04:00
Joshua Boniface
a0b45a2bcd
Always create RBDs with bytes value
...
Converting into human results in imprecise values when specifying bytes
directly, which in turn breaks VMDK image uploads. Instead, just use the
raw bytes value when creating the volume instead of converting it back.
2023-09-30 12:37:43 -04:00
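The precision problem described in commit a0b45a2bcd can be shown with a toy round trip. The two helpers below are hypothetical and deliberately simple (integer division into binary units); they are not the project's conversion functions, but they show how converting a byte count to a human-readable size and back can change the value.

```python
UNITS = ["B", "K", "M", "G", "T"]


def to_human(size_bytes):
    """Convert bytes to a whole-number size in the largest fitting binary unit."""
    value = size_bytes
    unit = 0
    while value >= 1024 and unit < len(UNITS) - 1:
        value //= 1024  # integer division discards the remainder
        unit += 1
    return f"{value}{UNITS[unit]}"


def from_human(text):
    """Convert a string such as '4G' back into bytes."""
    return int(text[:-1]) * 1024 ** UNITS.index(text[-1])


if __name__ == "__main__":
    requested = 5_000_000_000  # not a whole number of GiB
    print(to_human(requested), from_human(to_human(requested)))
```

A size that is an exact multiple of the unit survives the round trip, but an arbitrary byte count does not, which is why passing the raw byte value straight through avoids the mismatch.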
Joshua Boniface
c4397219da
Ensure fencing states are properly reflected
2023-09-18 09:59:18 -04:00
Joshua Boniface
311bb69785
Format based on updated Black
2023-09-12 16:41:02 -04:00
Joshua Boniface
653b95ee25
Normalize return messages for node commands
2023-05-04 17:02:46 -04:00
Joshua Boniface
78322f4de4
Improve size handling during volume add/resize
2023-04-28 12:16:16 -04:00
Joshua Boniface
c1782c5004
Add full/nearfull OSD health detection
2023-04-28 11:33:39 -04:00
Joshua Boniface
e773211293
Add PVC version to cluster status output
2023-02-22 16:09:24 -05:00
Joshua Boniface
70ba364f1d
Flip VM state condition to remove shutdown
...
Don't cause health degradation for shutdown state, and flip the list
around to make it clearer.
2023-02-16 20:32:33 -05:00
Joshua Boniface
1f8561d59a
Format cluster health like node healths
...
Make a cleaner construct here.
2023-02-16 12:33:36 -05:00
Joshua Boniface
1093ca6264
Disallow health less than 0
2023-02-15 16:50:24 -05:00
Joshua Boniface
29584e5636
Add per-node health entries for 3rd party checks
2023-02-15 16:44:49 -05:00
Joshua Boniface
f4e8449356
Fix bugs and formatting of health messages
2023-02-15 16:28:56 -05:00
Joshua Boniface
ec79acf061
Fix linting of cluster.py file
2023-02-15 15:48:31 -05:00
Joshua Boniface
00586074cf
Modify cluster health to use new values
2023-02-15 15:45:43 -05:00
Joshua Boniface
f4eef30770
Add JSON health to cluster data
2023-02-15 15:26:57 -05:00
Joshua Boniface
b07396c39a
Fix bugs if plugins fail to load
2023-02-13 21:51:48 -05:00
Joshua Boniface
e6f9e6e0e8
Fix several bugs and optimize output
2023-02-13 16:36:15 -05:00
Joshua Boniface
9c14d84bfc
Add node health value and send out API
2023-02-13 15:53:39 -05:00
Joshua Boniface
3c742a827b
Initial implementation of monitoring plugin system
2023-02-13 12:06:26 -05:00