Use queuing for VM migrations #108
The current migration code is quite old and is prone to strange failures. Rewrite it using some sort of queuing system, so that jobs are added to a queue and executed by a worker process/thread rather than by the main threads. Also include better handling of failures, so that a failed migration can be retried by re-adding the VM to the queue.
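As a rough illustration of the proposal above, here is a minimal sketch of a migration queue drained by a single worker thread, with failed jobs re-added for retry. It uses only the Python standard library; `run_worker`, `migrate_all`, `MAX_ATTEMPTS`, and the `migrate` callable are hypothetical names made up for this example, not part of the existing migration code.

```python
import queue
import threading

MAX_ATTEMPTS = 3  # hypothetical retry limit, not taken from the project


def run_worker(jobs, migrate, results):
    """Pull (vm, target, attempt) jobs off the queue until a None sentinel."""
    while True:
        item = jobs.get()
        if item is None:  # sentinel: stop the worker
            jobs.task_done()
            break
        vm, target, attempt = item
        try:
            migrate(vm, target)
            results.append((vm, target, "ok", attempt))
        except Exception as exc:
            if attempt < MAX_ATTEMPTS:
                # Retry by re-adding the VM to the queue.
                jobs.put((vm, target, attempt + 1))
            else:
                results.append((vm, target, f"failed: {exc}", attempt))
        finally:
            # Runs after any re-queue, so jobs.join() cannot return early.
            jobs.task_done()


def migrate_all(requests, migrate):
    """Queue every (vm, target) request and wait for the worker to finish."""
    jobs = queue.Queue()
    results = []
    worker = threading.Thread(target=run_worker, args=(jobs, migrate, results))
    worker.start()
    for vm, target in requests:
        jobs.put((vm, target, 1))
    jobs.join()     # wait for all jobs, including retries
    jobs.put(None)  # then tell the worker to exit
    worker.join()
    return results
```

The caller only enqueues work and waits; the `migrate` callable passed in would be whatever actually moves the VM, and any exception it raises is treated as a retryable failure up to `MAX_ATTEMPTS`.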
changed milestone to %4
mentioned in commit f9e7e9884f
I've implemented a much saner migration algorithm based on the methods used during primary/secondary node transitions. This includes lockstepping via Zookeeper key locks and after initial testing on my main cluster provides a much more robust, and less spaghettified, code path for these.
Ultimately, building some sort of queue would be interesting, but is not a priority as, with this locking mechanism, many migrations could occur in parallel without conflicting, though automatic migrations (node flush/unflush) are still done sequentially.
Leaving this open for now, but feel free to close it in the future.
Reviewing further, I conclude that this is unnecessary. Because of the per-VM locking done in the newer migration code, state change operations can "queue" themselves while the previous operation proceeds. This is not proper queueing, as only the last operation will "succeed". Still, it takes care of 99.9% of situations (e.g. clobbering a migrate with another migrate/unmigrate), and I think that's sufficient.
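To make the "last operation wins" behaviour concrete, here is a small sketch under stated assumptions: a plain `threading.Lock` per VM stands in for the Zookeeper key locks, and `VMStateStore` and its attributes are hypothetical names for illustration only. It is not the actual migration code.

```python
import threading


class VMStateStore:
    """Toy per-VM locking; threading.Lock stands in for a Zookeeper key lock."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the dictionaries below
        self._locks = {}                # vm -> per-VM lock
        self.history = {}               # vm -> targets in the order applied
        self.node = {}                  # vm -> node the VM ended up on

    def _lock_for(self, vm):
        # Create the per-VM lock and history list exactly once.
        with self._guard:
            if vm not in self._locks:
                self._locks[vm] = threading.Lock()
                self.history[vm] = []
            return self._locks[vm]

    def migrate(self, vm, target):
        # A second request for the same VM blocks here until the first is
        # done, then overwrites its result: the last operation "succeeds".
        with self._lock_for(vm):
            self.history[vm].append(target)
            self.node[vm] = target


store = VMStateStore()
threads = [
    threading.Thread(target=store.migrate, args=("vm1", f"node{i}"))
    for i in range(1, 6)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Operations on the same VM are serialized by its lock, so the final location always equals the last entry applied, while operations on different VMs never wait on each other.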
closed