Work around synchronization lock issues
Make the lock acquisition in phase C wait at most 900 seconds (15 minutes) so that it cannot block indefinitely. The problem occurs when a VM is being received and the current unflush is cancelled in favour of a flush. When this happens, the lock acquisition seems to block for no obvious reason, and no other change seems to affect it. This is certainly some sort of locking bug within Kazoo, but I can't diagnose it as-is. Leave a TODO to look into this again in the future.
parent bf7823deb5
commit 5355f6ff48
@@ -555,9 +555,16 @@ class VMInstance(object):
 time.sleep(0.5)
 
 self.logger.out('Acquiring lock for phase C', state='i', prefix='Domain {}'.format(self.domuuid))
-lock.acquire()
-# This is strictly a synchronizng step
-lock.release()
+try:
+    # Wait for only 900 seconds on this step since we don't do anything and it can fail
+    # if a flush or unflush is cancelled. 900 seconds should be plenty for real long
+    # migations while still avoiding an indefinite blocking here.
+    # TODO: Really dig into why
+    lock.acquire(timeout=900)
+    # This is strictly a synchronizng step
+    lock.release()
+except Exception:
+    self.logger.out('Failed to acquire lock for phase C within 15 minutes, continuing', state='w', prefix='Domain {}'.format(self.domuuid))
 
 time.sleep(0.5)
 
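The pattern in this change (acquire a lock purely as a synchronization point, give up after a bounded wait, and carry on either way) can be sketched in isolation. This is a minimal illustration, not the code from this repository: `wait_on_lock` is a hypothetical helper, and the standard library's `threading.Lock` stands in for the Kazoo lock. One difference to keep in mind is that `threading.Lock.acquire` returns `False` on timeout, whereas the Kazoo lock in the diff above is expected to raise on timeout, which is why the committed code uses `try`/`except` instead of checking a return value.

```python
import threading


def wait_on_lock(lock, timeout=900.0):
    """Use a lock purely as a synchronization point.

    Waits until the lock is free, or until `timeout` seconds have passed,
    and releases it immediately if it was acquired. Returns True if the
    lock was acquired and released, False if the wait timed out.
    (Hypothetical helper, for illustration only.)
    """
    # threading.Lock.acquire returns False on timeout instead of raising
    acquired = lock.acquire(timeout=timeout)
    if acquired:
        # Nothing is done while holding the lock; it only orders the steps
        lock.release()
    return acquired


# A lock that is still held elsewhere makes the wait give up after the timeout
held = threading.Lock()
held.acquire()
timed_out_result = wait_on_lock(held, timeout=0.1)

# Once the holder releases it, the same wait succeeds immediately
held.release()
free_result = wait_on_lock(held, timeout=0.1)
```

Returning a boolean lets the caller log a warning and continue, which mirrors the "continuing" behaviour of the committed code without catching a broad `Exception`.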