Use queuing for VM migrations #108

JoshuaBoniface commented

2020-10-18 01:36:36 -04:00

(Migrated from git.bonifacelabs.ca)

The current migration code is quite old and is prone to strange failures. Rewrite it using some sort of queuing system, so that migration jobs are added to a queue and executed by a worker process/thread instead of the main threads. Also include better handling of failures, so that a failed migration can be retried by re-adding the VM to the queue.
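To illustrate the idea, here is a minimal sketch of a queue drained by a single worker thread, with failed jobs re-added for retry. It is not PVC code: `migrate_vm`, `migration_worker`, `MAX_ATTEMPTS` and the job tuple layout are all hypothetical names chosen for the example.

```python
import queue
import threading

MAX_ATTEMPTS = 3  # hypothetical retry limit, not a PVC setting


def migrate_vm(vm_name, target_node):
    """Hypothetical placeholder for the real migration routine."""


def migration_worker(jobs, results, migrate=migrate_vm):
    """Drain the job queue; re-add a failed VM until MAX_ATTEMPTS is reached."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: stop the worker
            jobs.task_done()
            break
        vm_name, target_node, attempt = job
        try:
            migrate(vm_name, target_node)
            results.append((vm_name, "ok", attempt))
        except Exception:
            if attempt < MAX_ATTEMPTS:
                # Retry by putting the VM back on the queue.
                jobs.put((vm_name, target_node, attempt + 1))
            else:
                results.append((vm_name, "failed", attempt))
        finally:
            jobs.task_done()
```

A caller would start `migration_worker` in a `threading.Thread`, put `(vm_name, target_node, 1)` tuples on the queue, and put `None` to shut the worker down.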

JoshuaBoniface commented

2020-10-18 01:36:36 -04:00

(Migrated from git.bonifacelabs.ca)

changed milestone to %4

JoshuaBoniface commented

2020-10-20 13:08:33 -04:00

(Migrated from git.bonifacelabs.ca)

mentioned in commit f9e7e9884f9c98c4b5e1f8a3b2a8b5526991599f

JoshuaBoniface commented

2020-10-21 03:37:20 -04:00

(Migrated from git.bonifacelabs.ca)

I've implemented a much saner migration algorithm based on the methods used during primary/secondary node transitions. This includes lockstepping via Zookeeper key locks and, after initial testing on my main cluster, provides a much more robust, and less spaghettified, code path for migrations.

Ultimately, building some sort of queue would be interesting, but is not a priority as, with this locking mechanism, many migrations could occur in parallel without conflicting, though automatic migrations (node flush/unflush) are still done sequentially.

Leaving for now but free to close in future.
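As a rough illustration of why per-VM locking lets migrations of different VMs run in parallel without conflicting, here is a minimal sketch. It uses in-process `threading.Lock` objects as a stand-in for the Zookeeper key locks, and `lock_for` and `migrate_locked` are hypothetical names, not functions from the PVC codebase.

```python
import threading
from collections import defaultdict

# One lock per VM name; a stand-in for a per-VM Zookeeper key lock.
_vm_locks = defaultdict(threading.Lock)
_registry_lock = threading.Lock()


def lock_for(vm_name):
    """Return the lock for a VM, creating it on first use."""
    with _registry_lock:
        return _vm_locks[vm_name]


def migrate_locked(vm_name, target_node, log):
    """Run a (placeholder) migration while holding that VM's lock."""
    with lock_for(vm_name):
        log.append((vm_name, target_node, "start"))
        # ... the actual migration steps would go here ...
        log.append((vm_name, target_node, "done"))
```

Two calls for the same VM are serialized by the shared lock, while calls for different VMs never wait on each other.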

JoshuaBoniface commented

2020-11-08 14:31:05 -05:00

(Migrated from git.bonifacelabs.ca)

Reviewing further, I conclude that this is unnecessary. Because of the per-VM locking done in the newer migration code, state change operations can "queue" themselves while the previous operation proceeds. That said, it is not proper queueing, as only the last operation will "succeed". Still, this takes care of 99.9% of situations (e.g. clobbering a migrate with another migrate/unmigrate), and I think that's sufficient.
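One way to picture this "only the last operation succeeds" behaviour is the sketch below. It is an assumption about the general pattern, not the actual PVC implementation: each request takes a ticket, waits on the VM's lock, and gives up if a newer request arrived while it was waiting. `VMState` and `request_migration` are hypothetical names.

```python
import threading


class VMState:
    """Hypothetical per-VM state: an operation lock plus a request counter."""

    def __init__(self):
        self.lock = threading.Lock()  # held for the duration of an operation
        self.meta = threading.Lock()  # protects latest_request
        self.latest_request = 0
        self.node = None


def request_migration(state, target_node):
    """Wait for the VM lock; apply the change only if no newer request exists."""
    with state.meta:
        state.latest_request += 1
        ticket = state.latest_request
    with state.lock:
        with state.meta:
            superseded = ticket != state.latest_request
        if superseded:
            return False  # a later request clobbered this one
        state.node = target_node
        return True
```

Requests that pile up behind a running operation are not replayed in order, as they would be with a real queue; only the most recent one takes effect.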

JoshuaBoniface commented

2020-11-08 14:31:05 -05:00

(Migrated from git.bonifacelabs.ca)

closed

Reference: parallelvirtualcluster/pvc#108