Use queuing for VM migrations #108

Closed
opened 2020-10-18 01:36:36 -04:00 by JoshuaBoniface · 5 comments
JoshuaBoniface commented 2020-10-18 01:36:36 -04:00 (Migrated from git.bonifacelabs.ca)

The current migration code is quite old and prone to strange failures. Rewrite it using some sort of queuing system, so that jobs are added to a queue and executed by a worker process/thread instead of by the main threads. Also include better handling of failures so that they can be retried by re-adding the VM to the queue.
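In outline, something like the following (a minimal sketch of the proposed design, not PVC code; `migrate_vm()` and the `(vm, target_node, attempts)` job format are illustrative assumptions):

```python
# Minimal sketch of a worker-queue design with retry-by-requeue;
# migrate_vm() and the job tuple format are hypothetical.
import queue
import threading

MAX_RETRIES = 3
migration_queue = queue.Queue()

def migrate_vm(vm, target_node):
    # Placeholder for the real migration logic.
    print(f'migrating {vm} to {target_node}')

def migration_worker():
    while True:
        vm, target_node, attempts = migration_queue.get()
        try:
            migrate_vm(vm, target_node)
        except Exception:
            if attempts + 1 < MAX_RETRIES:
                # On failure, retry by re-adding the VM to the queue.
                migration_queue.put((vm, target_node, attempts + 1))
        finally:
            migration_queue.task_done()

threading.Thread(target=migration_worker, daemon=True).start()
migration_queue.put(('vm1', 'node2', 0))
migration_queue.join()  # wait until all queued jobs have been processed
```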

JoshuaBoniface commented 2020-10-18 01:36:36 -04:00 (Migrated from git.bonifacelabs.ca)

changed milestone to %4

JoshuaBoniface commented 2020-10-20 13:08:33 -04:00 (Migrated from git.bonifacelabs.ca)

mentioned in commit f9e7e9884f9c98c4b5e1f8a3b2a8b5526991599f
JoshuaBoniface commented 2020-10-21 03:37:20 -04:00 (Migrated from git.bonifacelabs.ca)

I've implemented a much saner migration algorithm based on the methods used during primary/secondary node transitions. This includes lockstepping via Zookeeper key locks, and after initial testing on my main cluster it provides a much more robust, and less spaghettified, code path for these operations.

Ultimately, building some sort of queue would still be interesting, but it is not a priority: with this locking mechanism, many migrations can occur in parallel without conflicting, though automatic migrations (node flush/unflush) are still performed sequentially.
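For illustration, the per-VM key-lock approach looks roughly like this (a sketch using the kazoo Zookeeper client; the `/locks` and `/domains` paths and state values are assumptions, not the exact PVC schema):

```python
# Rough sketch of per-VM lockstepping via a Zookeeper key lock, using
# the kazoo client; paths and key names here are illustrative only.
from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()

def locked_migrate(vm, target_node):
    # One lock per VM: migrations of different VMs proceed in parallel,
    # while operations against the same VM serialize behind the lock.
    lock = zk.Lock(f'/locks/domains/{vm}', identifier=target_node)
    with lock:
        # With the lock held, write the state and node keys in lockstep;
        # the node daemons act on these key changes.
        for key, value in (('state', b'migrate'),
                           ('node', target_node.encode())):
            path = f'/domains/{vm}/{key}'
            zk.ensure_path(path)
            zk.set(path, value)

locked_migrate('vm1', 'node2')
```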

Leaving this open for now, but feel free to close it in the future.

JoshuaBoniface commented 2020-11-08 14:31:05 -05:00 (Migrated from git.bonifacelabs.ca)

Reviewing further, I conclude that this is unnecessary. Because of the locking done in the newer migration code, state-change operations for each VM can "queue" themselves while the previous operation proceeds. It is not proper queueing, since only the last operation will "succeed", but it does take care of 99.9% of situations (e.g. clobbering a migrate with another migrate/unmigrate), and I think that's sufficient.
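As a toy illustration of that effect (plain Python threading as a stand-in for the Zookeeper key lock, not the actual PVC code): two concurrent state changes against the same VM serialize behind the lock, and the last writer wins:

```python
# Toy illustration of the "implicit queueing" effect using a plain
# threading.Lock as a stand-in for the per-VM Zookeeper key lock.
import threading

state_lock = threading.Lock()
vm = {'state': 'start'}

def change_state(new_state):
    with state_lock:  # later callers wait here, effectively "queued"
        vm['state'] = new_state

a = threading.Thread(target=change_state, args=('migrate',))
b = threading.Thread(target=change_state, args=('unmigrate',))
a.start(); b.start()
a.join(); b.join()
# Both operations ran, but only the last acquirer's state survives.
print(vm['state'])
```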

JoshuaBoniface commented 2020-11-08 14:31:05 -05:00 (Migrated from git.bonifacelabs.ca)

closed

Reference: parallelvirtualcluster/pvc#108