Move to Celery jobs for per-node tasks #166

Closed
opened 2023-11-04 15:15:41 -04:00 by joshuaboniface · 3 comments

Currently, at least two broad categories of tasks use a custom, hacky Zookeeper-as-message-queue system to deliver tasks from the API to a particular node:

  • OSD tasks, specifically using base.cmd.ceph, in daemon-common/ceph.py
  • VM lock, hotattach, and hotdetach tasks, specifically using base.cmd.domain, in daemon-common/vm.py

The current implementation is suboptimal for a few reasons:

  1. Execution is handled by the node daemon, rather than the API, which contrasts with all other commands. This makes the API code harder to understand and manage as it is split into two places.

  2. Execution is flaky: a node may ignore a command depending on the node daemon state or other variables.

  3. There is no possibility of callbacks to the API for status, especially on errors, leading to suboptimal "see node logs" messages.

With the work done in #165, it is now possible to use the API worker to execute these tasks instead.

Move the called tasks from the node daemon into the daemon lib, and execute them as Celery tasks through the API instead. The newly created @celery.task(..., routing_key='blah') decorator and the corresponding blah=<node> argument let the API determine where each task runs, similar to how the provisioner create and storage benchmark commands currently work. The routing key will likely just be run_on for both categories, so the target node can be passed as a custom run_on kwarg in the calling function. All of these tasks can then have task status commands, which may require a client refactor but will allow for consistent status updates.
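As a rough illustration of the run_on idea, the sketch below shows a router function with the signature Celery accepts in its task_routes setting. It is not the code from #165: the task names, the node name hv1, and the "primary" fallback queue are all hypothetical, and it assumes each node's worker consumes from a queue named after that node.

```python
"""Sketch: route a task to a node-specific queue based on a run_on kwarg."""

# Hypothetical fallback queue used when the caller gives no target node.
DEFAULT_QUEUE = "primary"


def route_task(name, args, kwargs, options, task=None, **kw):
    """Router in the shape Celery expects for entries in task_routes.

    Reads the (assumed) run_on kwarg from the task call and returns
    routing options that target the queue named after that node.
    """
    kwargs = kwargs or {}
    run_on = kwargs.get("run_on") or DEFAULT_QUEUE
    return {"queue": run_on, "routing_key": run_on}


if __name__ == "__main__":
    # "vm.flush_locks" and "hv1" are made-up names for illustration only.
    print(route_task("vm.flush_locks", ("testvm",), {"run_on": "hv1"}, {}))
    print(route_task("osd.add", (), {}, {}))
```

With something like this registered as a router (for example via the task_routes setting), a call such as some_task.delay(..., run_on="hv1") would be delivered only to the worker listening on the hv1 queue. Whether the real implementation uses a router function or per-call options is not stated here.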

We can then deprecate the old base.cmd endpoints in Zookeeper as they will no longer be used.

joshuaboniface added the API, Daemon, and improvement labels 2023-11-04 15:15:52 -04:00
Author
Owner

VM tasks are easier to test, so going to do those first.

Author
Owner

VM tasks are implemented; next up are OSD tasks.

Author
Owner

Completed with version 0.9.81 release.

Reference: parallelvirtualcluster/pvc#166