Joshua Boniface
d727764ebc
Remove obsolete status and add cluster task
Removes the obsoleted "pvc provisioner status" command and replaces it
with a generalized "pvc cluster task" command to show all
currently-active or pending tasks on the cluster workers.
2023-11-16 02:13:26 -05:00
Joshua Boniface
484e6542c2
Port remaining tasks to new task handler
Move the create_vm and run_benchmark tasks to use the new Celery
subsystem, handlers, and wait command. Remove the obsolete, dedicated
API endpoints.
Standardize the CLI client and move the repeated handler code into a
separate common function.
2023-11-16 02:00:23 -05:00
Joshua Boniface
aef38639cf
Rename pvcapid-worker to pvcworkerd
2023-11-15 20:31:39 -05:00
Joshua Boniface
5f1432ccdd
Fix memory allocation updates and add more debug
Previously, we were assigning memalloc/memprov/vcpualloc during an
earlier phase using the main d_domain list. I'm not sure exactly why,
but this was throwing off stats after a fence. Instead, set these values
later on while parsing the actually-active VMs.
2023-11-10 10:29:32 -05:00
Joshua Boniface
d6b8808448
Clean up fencing handler
1. Remove all format strings in favour of f-strings
2. Ensure all logger messages have a prefix
3. Add a few more logger messages for clarity
2023-11-10 10:09:54 -05:00
Joshua Boniface
83c4c6633d
Readd RBD lock detection and clearing on startup
This is still needed due to the nature of the locks and freeing them on
startup, and to preserve lock=fail behaviour on VM startup.
Also fixes the fencing lock flush to directly use the client library
outside of Celery. I don't like this hack but it seems prudent until we
move fencing to the workers as well.
2023-11-10 01:33:48 -05:00
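As a rough illustration of the lock-clearing step, the sketch below turns the locks already held on an RBD image into `rbd lock remove <image-spec> <lock-id> <locker>` commands. It assumes the locks have been read beforehand (for example from `rbd lock list`) into dicts with hypothetical `id` and `locker` keys; the helper name `build_lock_flush_commands` is also hypothetical and not PVC's actual code.

```python
from typing import Dict, List


def build_lock_flush_commands(pool: str, image: str, locks: List[Dict[str, str]]) -> List[List[str]]:
    """Build one `rbd lock remove` argv per existing lock on pool/image.

    `locks` is assumed (hypothetically) to be a list of dicts with
    "id" (the lock ID) and "locker" (e.g. "client.4123") keys.
    """
    image_spec = f"{pool}/{image}"
    commands = []
    for lock in locks:
        # Syntax: rbd lock remove <image-spec> <lock-id> <locker>
        commands.append(["rbd", "lock", "remove", image_spec, lock["id"], lock["locker"]])
    return commands
```

Each command could then be run with `subprocess.run(cmd, check=True)`. Keeping each command as an argv list rather than a shell string avoids quoting problems, since RBD lock IDs can contain spaces.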
Joshua Boniface
2a9bc632fa
Add node monitoring plugin for KeyDB/Redis
2023-11-10 00:56:46 -05:00
Joshua Boniface
b5e4c52387
Increase worker concurrency to 3
2023-11-10 00:39:42 -05:00
Joshua Boniface
b522306f87
Increase Celery wait times
It's a bit inefficient, but provides nicer output and a bit of settling
time between each stage.
2023-11-09 23:54:05 -05:00
Joshua Boniface
07026efb63
Ensure OSD checks in before completing
Avoids issues where the new OSD doesn't check in; at least the
administrator will know.
Also fixes some issues with osd_db in removal.
2023-11-09 23:51:05 -05:00
Joshua Boniface
d7ea705e31
Improve waiter output
Add an extra newline, show the name of the task (from start()), and show
the first step as a "Gathering information" message on the progressbar.
2023-11-09 23:28:18 -05:00
Joshua Boniface
08411708f6
Clean up dangling references to cmd pipes
Also removes the schema references for these CMD pipes as they are no
longer required.
2023-11-09 23:28:14 -05:00
Joshua Boniface
ce17c60a20
Port OSD on-node tasks to Celery worker system
Adds Celery versions of the osd_add, osd_replace, osd_refresh,
osd_remove, and osd_db_vg_add functions.
2023-11-09 23:28:08 -05:00
Joshua Boniface
89681d54b9
Port VM on-node tasks to Celery worker system
Adds Celery versions of the flush_locks, device_attach, and
device_detach functions.
2023-11-06 20:40:46 -05:00
Joshua Boniface
f0c2e9d295
Don't start pvcapid-worker on primary
It will be running anyway.
2023-11-05 19:44:00 -05:00
Joshua Boniface
2c15036f86
Add KeyDB to node startup services
Also ensure API worker starts on all nodes, not just coordinators.
2023-11-05 19:26:38 -05:00
Joshua Boniface
42ed6f6420
Remove redis as a dependency
2023-11-05 18:23:34 -05:00
Joshua Boniface
3dc1f57de2
Revert "Switch to ZK+PG over Redis for Celery queue"
This reverts commit 54215bab6c.
2023-11-05 17:10:46 -05:00
Joshua Boniface
b99b4e64b2
Ensure store path is passed properly
2023-11-05 16:48:47 -05:00
Joshua Boniface
91af1175ef
Fix missing CLI_CONFIG in echo()
2023-11-04 15:17:50 -04:00
Joshua Boniface
af8a8d969e
Ensure queues are set up for non-coordinator nodes
Allows a runner to operate on every possible node, not just
coordinators, as OSDs or other things could be on any node.
Also add more comments.
2023-11-04 15:05:07 -04:00
Joshua Boniface
a6caac1b78
Add Celery queue routing for tasks
By default, tasks will continue to run as they did, on the primary
coordinator's task runner. However this opens the possibility for
defining more tasks that will run on other nodes or coordinators.
2023-11-04 14:29:59 -04:00
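A minimal sketch of the routing decision, assuming each worker consumes a queue named after its own hostname. The `select_queue` helper and the `"primary"` sentinel are hypothetical, not PVC's actual implementation. The chosen name would then be handed to Celery via `task.apply_async(..., queue=...)`, with each worker started on its own queue through the `celery worker --queues` option.

```python
from typing import Optional

# Hypothetical sentinel meaning "whichever node is currently primary coordinator".
PRIMARY = "primary"


def select_queue(primary_node: str, run_on: Optional[str] = None) -> str:
    """Pick the per-host queue a task should be sent to.

    With no explicit target (or the PRIMARY sentinel), the task goes to the
    primary coordinator's queue, preserving the previous default behaviour.
    Otherwise it goes to the named node's queue, e.g. the node holding an OSD.
    """
    if run_on is None or run_on == PRIMARY:
        return primary_node
    return run_on
```

Resolving the primary at dispatch time, rather than hard-coding a queue, means the default keeps following the primary coordinator across failovers.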
Joshua Boniface
30d7e49401
Start API worker with node daemon on coordinators
2023-11-04 13:08:16 -04:00
Joshua Boniface
ab629f6b51
Use per-host hostname and queues in worker
Opens up the ability to direct tasks to specific workers.
2023-11-04 13:02:30 -04:00
Joshua Boniface
54215bab6c
Switch to ZK+PG over Redis for Celery queue
Redis did not provide a distributed solution for the worker, which
precluded several important planned functions. So instead, move to using
Zookeeper + PostgreSQL as the broker and result backend respectively.
This should be a seamless drop-in change, but for future uses it requires
the database host to be the primary coordinator IP rather than localhost,
so that writes to the database can occur from non-primary hosts.
2023-11-04 12:46:34 -04:00
Joshua Boniface
7490f13b7c
Check for partition tables on new devices
2023-11-04 03:13:58 -04:00
Joshua Boniface
d1602f35de
Adjust split indicator
2023-11-04 02:56:21 -04:00
Joshua Boniface
7cdedde2fb
Adjust wording about extdb
2023-11-04 02:54:25 -04:00
Joshua Boniface
ab156b14b7
Update help messages for OSD refresh
2023-11-04 02:47:04 -04:00
Joshua Boniface
a016337f57
Remove block verify in API
This doesn't work right and is handled by the node anyway.
2023-11-04 02:45:10 -04:00
Joshua Boniface
e32054be81
Refactor refresh as well
2023-11-04 02:44:52 -04:00
Joshua Boniface
18d32fede3
Fix wording of detect strings
2023-11-04 01:37:07 -04:00
Joshua Boniface
b3d13fe9be
Add log message for zap
2023-11-04 01:02:51 -04:00
Joshua Boniface
48b2ccbd95
Add timeout for safe-to-destroy
Continuously take the OSD down and out while doing so.
2023-11-04 00:55:05 -04:00
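The timeout behaviour can be sketched as a polling loop around `ceph osd safe-to-destroy`, which exits non-zero while destroying the OSD would still put data at risk. On every pass the OSD is marked down and out again, as the commit describes; presumably this keeps a still-running OSD from reasserting itself and blocking the check. The function name and the injected `run`, `sleep`, and `clock` hooks are hypothetical, added so the loop can be exercised without a cluster.

```python
import time
from typing import Callable, List


def wait_safe_to_destroy(
    osd_id: str,
    run: Callable[[List[str]], int],
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `ceph osd safe-to-destroy` until it succeeds or `timeout` expires.

    `run` executes an argv and returns its exit code (hypothetical hook,
    e.g. lambda cmd: subprocess.run(cmd).returncode).
    Returns True if the OSD became safe to destroy, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        # Re-mark the OSD down and out on every pass.
        run(["ceph", "osd", "down", osd_id])
        run(["ceph", "osd", "out", osd_id])
        if run(["ceph", "osd", "safe-to-destroy", f"osd.{osd_id}"]) == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A `force` option, as in the related "skip safe_to_destroy on force" change, would simply bypass this wait.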
Joshua Boniface
1535078842
Fix lvremove, lvcreate, and update ZK details
2023-11-04 00:30:14 -04:00
Joshua Boniface
0e45613634
Use right key with correct data
2023-11-04 00:02:00 -04:00
Joshua Boniface
75135f6d5f
Avoid broken output format for new OSDs
2023-11-03 23:54:10 -04:00
Joshua Boniface
7f5dd385b5
Use right key for FSID elsewhere
2023-11-03 23:51:01 -04:00
Joshua Boniface
befce62925
Add OSD destroy before purge
2023-11-03 23:44:27 -04:00
Joshua Boniface
b0909aed61
Get proper FSID value
2023-11-03 23:38:24 -04:00
Joshua Boniface
f418b40527
Use proper FSID instead of hack
2023-11-03 16:38:19 -04:00
Joshua Boniface
ec42b19d0e
Send FSID to clients too
2023-11-03 16:37:55 -04:00
Joshua Boniface
dd0177ce10
Rework replacement procedure again
Avoid calling other functions; replicate the actual process from Ceph
docs (https://docs.ceph.com/en/pacific/rados/operations/add-or-rm-osds/)
to ensure things work out well (e.g. preserving OSD IDs).
2023-11-03 16:31:56 -04:00
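For reference, the "Replacing an OSD" procedure in the linked Ceph docs boils down to destroying the old OSD (which keeps its ID and CRUSH entry), zapping the new device, and creating a new OSD that reuses the destroyed ID. The sketch below only builds that command sequence; the helper name is hypothetical, it uses the single-step `ceph-volume lvm create` form rather than separate prepare/activate, and it assumes `ceph osd safe-to-destroy` has already succeeded.

```python
from typing import List


def osd_replace_commands(osd_id: int, new_device: str) -> List[List[str]]:
    """Return the argv sequence for replacing an OSD while keeping its ID.

    Follows the Ceph "Replacing an OSD" steps; the safe-to-destroy wait is
    assumed to have been done beforehand.
    """
    return [
        # Mark the OSD destroyed; its ID and CRUSH map entry are preserved.
        ["ceph", "osd", "destroy", str(osd_id), "--yes-i-really-mean-it"],
        # Wipe the replacement device so ceph-volume can use it.
        ["ceph-volume", "lvm", "zap", new_device],
        # Prepare and activate a new OSD that reuses the destroyed ID.
        ["ceph-volume", "lvm", "create", "--osd-id", str(osd_id), "--data", new_device],
    ]
```

Reusing the ID this way avoids the CRUSH churn that a full purge followed by a fresh add would cause.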
Joshua Boniface
ed5bc9fb43
Fix numerous formatting and function bugs
2023-11-03 14:00:05 -04:00
Joshua Boniface
94d8d2cf75
Fix skip_zap_flag anomaly and add crush rm
2023-11-03 02:35:12 -04:00
Joshua Boniface
20497cf89d
Fix bugs and skip safe_to_destroy on force
2023-11-03 02:29:50 -04:00
Joshua Boniface
64e37ae963
Update OSD replacement functionality
1. Simplify this by leveraging the existing remove_osd/add_osd
functions, since its task was functionally identical to those two in
sequential order.
2. Add support for split OSDs within the command (replacing all OSDs on
the block device(s) as required).
3. Add additional configurability and flexibility around the old device,
weight, and external DB LVs.
2023-11-03 01:45:49 -04:00
Joshua Boniface
3cb8a70f04
Add forcing to OSD purge
2023-11-02 23:20:48 -04:00
Joshua Boniface
44d2f98e75
Remove Var field from OSDs
Not super duper useful and increases length
2023-11-02 22:55:39 -04:00
Joshua Boniface
cb91bf18a7
Fix incorrect variables
2023-11-02 22:39:32 -04:00