Move to Zookeeper as API worker broker #165

Closed
opened 2023-11-04 03:26:52 -04:00 by joshuaboniface · 3 comments

The Redis broker has served well, but it has some big limitations:

  1. No replication between coordinators. While replication is possible, it is statically configured and not multi-master (a dealbreaker).

  2. The lack of replication makes each node separate, causing issues.

  3. It prevents using the Celery worker for anything that isn't on "this node", i.e. the primary coordinator.

Investigate the experimental Zookeeper broker instead. If this works, it can open up several interesting avenues (a sketch of the broker change follows the list):

  1. The provisioner can work against any arbitrary node, and can continue to report status if the primary node changes.

  2. Other tasks can be delegated to the API worker, e.g. Ceph OSD operations or other similar node-specific operations.
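In Celery the broker is just a transport URL, so trying the experimental Zookeeper transport is essentially a one-line change. A minimal sketch, using Kombu's experimental `zookeeper://` transport (which requires the `kazoo` package) and a hypothetical coordinator address; this is not the actual PVC code:

```python
from celery import Celery

# Minimal sketch: point Celery at Kombu's experimental Zookeeper transport
# instead of Redis. The address below is a placeholder coordinator IP.
celery = Celery(
    "pvcapid",
    broker="zookeeper://10.0.0.254:2181",
)
```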

joshuaboniface added the API and improvement labels 2023-11-04 03:27:11 -04:00
joshuaboniface (Author, Owner)

Well, that's impossible, because the Redis devs are lazy. They support every random-ass thing under the sun, but not etcd or Zookeeper (i.e. sensible KV stores).

I guess the only option is to hack in Redis multi-master using Dynomite.

https://fatihmalakci.com/how-to-setup-redis-master-master-replication/

joshuaboniface (Author, Owner)

I was able to get this to work with Zookeeper as the message transport and PostgreSQL as the results backend, and it seems to work fine. I was just confused because Zookeeper is only usable as a message broker, so I needed something else as the results backend; PostgreSQL works fine for that. A sketch of the combined configuration follows.
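A minimal sketch of that split, with placeholder addresses and credentials (not the actual PVC configuration):

```python
from celery import Celery

# Zookeeper (via Kombu's experimental transport) carries the task messages,
# while PostgreSQL (via Celery's SQLAlchemy "db+" result backend) stores the
# task results. Host, database name, and credentials are placeholders.
celery = Celery(
    "pvcapid",
    broker="zookeeper://10.0.0.254:2181",
    backend="db+postgresql://pvcapid:hunter2@10.0.0.254/pvcapi",
)
```

This works because Celery splits the two roles: the broker only needs queue semantics, which Zookeeper can provide, while the results backend needs durable keyed storage, which any SQLAlchemy-supported database can provide.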

joshuaboniface (Author, Owner)

Code written and merged. A custom kwarg in the function calls (likely always `run_on`, in combination with `@celery.task(..., routing_key='run_on')`) can be used to set the worker delegation for tasks that require it. All nodes will have workers, started alongside the node daemon, so these tasks can be properly delegated to them leveraging the above queues. A sketch of this pattern follows.
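A minimal sketch of that delegation pattern, with assumed task and node names (the merged PVC code may differ):

```python
from celery import Celery

celery = Celery(
    "pvcapid",
    broker="zookeeper://10.0.0.254:2181",  # placeholder coordinator address
    backend="db+postgresql://pvcapid:hunter2@10.0.0.254/pvcapi",  # placeholder
)

# Hypothetical node-specific task: the "run_on" kwarg names the node whose
# worker should execute it; routing_key on the decorator sets the default.
@celery.task(name="osd.add", bind=True, routing_key="run_on")
def osd_add(self, device, run_on=None):
    ...  # perform the node-local Ceph OSD operation here

# Caller side: override the routing key so the task lands on a specific
# node's queue. Each node's worker consumes a queue named after the node,
# so the direct-exchange routing key delivers the task to that node.
task = osd_add.apply_async(
    args=("/dev/sdb",),
    kwargs={"run_on": "node1"},
    routing_key="node1",
)
```

Each node's worker would be started alongside the node daemon with something like `celery -A pvcapid worker -Q node1`, so a routing key matching the node name lands on that node's queue.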

This necessitated a couple of config changes:

  1. In `pvcapid.yaml`, remove the `queue:` subsection, as it is no longer required.
  2. In `pvcapid.yaml`, adjust the default PostgreSQL connection to use the cluster floating IP address instead of localhost.
  3. The logic for this is in https://git.bonifacelabs.ca/parallelvirtualcluster/pvc-ansible