Commit Graph

1869 Commits

Author SHA1 Message Date
Joshua Boniface fbbdb209c3 Remove Python OpenSSL dependency
Not actually required for the SSL configuration.
2020-10-26 02:02:15 -04:00
Joshua Boniface f85c2c2a75 Remove PyWSGI and move to Flask server
Gevent was a complete failure. The API would block during large file
uploads with no obvious solutions beyond "use gunicorn", which is not
suited to this. I originally had this working with the Flask "debug"
server, so just move to using that all the time. SSL is added using a
custom context with the OpenSSL library, so include that as a
dependency.
2020-10-26 01:58:43 -04:00
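Taken together, these two commits suggest that Python's standard-library `ssl` module is enough to serve HTTPS from Flask's built-in server without pyOpenSSL. A minimal sketch, assuming a PEM certificate and key on disk; the `build_ssl_context` helper and the file paths are hypothetical:

```python
import ssl


def build_ssl_context(cert_file: str, key_file: str) -> ssl.SSLContext:
    """Build a server-side TLS context using only the standard library.

    cert_file and key_file are hypothetical paths to PEM-encoded files.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse legacy protocol versions.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # Raises an OSError subclass if the files are missing or invalid.
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context
```

The resulting context can be passed to Flask's built-in server via `app.run(ssl_context=...)`, which hands it on to Werkzeug.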
Joshua Boniface adfe302f71 Move monkeypatch before all imports 2020-10-24 20:53:44 -04:00
Joshua Boniface 890023cbfc Make sender wait dynamic based on receiver 2020-10-21 14:43:54 -04:00
Joshua Boniface 28abb018e3 Improve some timeouts and conditionals 2020-10-21 12:00:10 -04:00
Joshua Boniface d42bb74dc9 Use explicit acquire/release instead of with
The with blocks did not seem to work as expected. Go back to exclusive
locks as well since these are more consistent.
2020-10-21 11:38:23 -04:00
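Explicit acquire/release, combined with the multiple lock attempts from commit 42c5f84ba7, looks roughly like the sketch below. It uses `threading.Lock` as a local stand-in for the distributed lock the project actually uses; the `with_lock_retries` helper, the attempt count and the timeout are hypothetical.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


def with_lock_retries(lock: threading.Lock, action: Callable[[], T],
                      attempts: int = 3, timeout: float = 1.0) -> T:
    """Run action while holding lock, trying to acquire it up to `attempts` times."""
    for _ in range(attempts):
        # acquire() with a timeout returns False instead of blocking forever.
        if lock.acquire(timeout=timeout):
            try:
                return action()
            finally:
                # Always release explicitly, even if action() raises.
                lock.release()
    raise TimeoutError(f"could not acquire lock after {attempts} attempts")
```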
Joshua Boniface 42c5f84ba7 Do multiple lock attempts 2020-10-21 11:21:37 -04:00
Joshua Boniface 88556f4a33 Convert from exclusive to write lock 2020-10-21 11:12:36 -04:00
Joshua Boniface 017953c2e6 Move lock release to phase D 2020-10-21 11:07:01 -04:00
Joshua Boniface 82b4d3ed1b Add missing prefix statements to loggers 2020-10-21 10:52:53 -04:00
Joshua Boniface 3839040092 Add exclusive lock function 2020-10-21 10:46:41 -04:00
Joshua Boniface bae366a316 Add waits and only receive check on send 2020-10-21 10:43:42 -04:00
Joshua Boniface 84ade53fae Add locks for VM state changes
Use exclusive locks during API events which change VM state. This is
fairly critical to avoid potential duplicate updates. Only implemented
for these specifically required functions to avoid major performance
hits elsewhere.
2020-10-21 10:40:00 -04:00
Joshua Boniface 72f47f216a Revert "Add locking in common zkhander"
This reverts commit 53c0d2b4f6.

This resulted in a massive performance hit and some inconsistent
behaviour. Revert for now and re-investigate later.
2020-10-21 03:49:13 -04:00
Joshua Boniface 9bfcab5e2b Improve documentation around n-1 situations
Closes #104
2020-10-21 03:30:33 -04:00
Joshua Boniface 53c0d2b4f6 Add locking in common zkhander
Ensures that every change made here is locked, thus preventing
duplicate updates, etc.
2020-10-21 03:17:18 -04:00
Joshua Boniface 351076c15e Check if node changed during final check
Avoids situations where two migrates, to different nodes, happen in
rapid succession. Aborts the migration if the current target node no
longer matches what was set at the start of the execution.
2020-10-21 02:52:36 -04:00
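The final check can be sketched as comparing the target recorded when the migration started against the current target just before committing. `cluster_state`, the `node`/`target_node` keys, `finish_migration` and `MigrationAborted` are hypothetical; the real code reads these values from the cluster's shared state.

```python
class MigrationAborted(Exception):
    """Raised when the migration target changed while the migration was running."""


def finish_migration(cluster_state: dict, vm: str, target_at_start: str) -> str:
    # Re-read the target: another migrate request may have changed it meanwhile.
    current_target = cluster_state[vm]["target_node"]
    if current_target != target_at_start:
        raise MigrationAborted(
            f"{vm}: target changed from {target_at_start} to {current_target}"
        )
    cluster_state[vm]["node"] = target_at_start
    return target_at_start
```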
Joshua Boniface 42514b9a50 Improve messages further 2020-10-21 02:41:42 -04:00
Joshua Boniface 611e47f338 Add messages to migration aborts
Results in some information duplication, but ensures logging of the
reason a migration was aborted separate from the error(s) this may
generate.
2020-10-21 02:38:42 -04:00
Joshua Boniface d96a23276b Mention recommendations about system disks
Advise SSDs always, mention the situations where slower media can be
acceptable and the risks therein.
2020-10-21 02:31:09 -04:00
Joshua Boniface 1523959074 Move where setting last_ vars happens 2020-10-21 02:24:00 -04:00
Joshua Boniface ef762359f4 Adjust timing to avoid migrating to self quickly
Add another separate state lock, release it earlier, and ensure timings
are good to avoid double-migrating one VM.
2020-10-21 02:17:55 -04:00
Joshua Boniface 398d33778f Avoid stopping duplicates, just lock our own key 2020-10-20 16:10:39 -04:00
Joshua Boniface a6d492ed9f Remove spurious writes and adjust sleep 2020-10-20 16:04:26 -04:00
Joshua Boniface 11fa3b0df3 Remove additional wait and add last_node entries
These allow an aborted migration to restore the previous settings,
overriding what the client set.
2020-10-20 15:58:55 -04:00
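Recording the previous placement before a migration begins lets an abort put it back, overriding whatever target the client requested. A minimal sketch with hypothetical keys (`node`, `last_node`, `target_node`) in a plain dictionary:

```python
def begin_migration(vm_record: dict, target: str) -> None:
    # Remember where the VM was running before anything changes.
    vm_record["last_node"] = vm_record["node"]
    vm_record["target_node"] = target


def abort_migration(vm_record: dict) -> None:
    # Restore the pre-migration placement, discarding the client's requested target.
    vm_record["node"] = vm_record["last_node"]
    vm_record["target_node"] = vm_record["last_node"]
```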
Joshua Boniface 442aa4e420 Tweak timers further 2020-10-20 15:43:59 -04:00
Joshua Boniface 3910843660 Add missing break 2020-10-20 15:39:29 -04:00
Joshua Boniface 70f3fdbfb9 Tweak the delays slightly on receive 2020-10-20 15:38:07 -04:00
Joshua Boniface 7cb0241a12 Attempt live migrates 3 times before proceeding 2020-10-20 15:33:41 -04:00
Joshua Boniface 9fb33ed7a7 Increase peer lock acquiring timers 2020-10-20 15:26:59 -04:00
Joshua Boniface abfe0108ab Better handle aborting migrations 2020-10-20 15:22:16 -04:00
Joshua Boniface 567fe8f36b Wait for existing migrations before proceeding 2020-10-20 15:12:32 -04:00
Joshua Boniface ec7b78b9b8 Add additional short sleep in receive 2020-10-20 13:29:17 -04:00
Joshua Boniface 224c8082ef Alter text of synchronization messages 2020-10-20 13:08:18 -04:00
Joshua Boniface f9e7e9884f Improve handling of VM migrations
The VM migration code was very old, very spaghettified, and prone to
strange failures.

Improve this by taking cues from the node primary migration. Use
synchronization between the nodes to ensure lockstep completion of the
migration in discrete steps.

A proper queue can be built later to integrate with this code more
cleanly.

References #108
2020-10-20 13:01:55 -04:00
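Lockstep completion in discrete steps can be illustrated locally with `threading.Barrier`: no node starts a phase until every node has finished the previous one. The threads stand in for the sending and receiving nodes, and the phase names A to D are only borrowed from the later "phase D" commit; the real code synchronizes separate hosts through shared locks, not an in-process barrier.

```python
import threading
from typing import List, Tuple

PHASES = ["A", "B", "C", "D"]


def run_lockstep(node_names: List[str]) -> List[Tuple[str, str]]:
    """Run every node through PHASES, keeping all nodes on the same phase."""
    barrier = threading.Barrier(len(node_names))
    log = []
    log_lock = threading.Lock()

    def node_worker(name: str) -> None:
        for phase in PHASES:
            with log_lock:
                log.append((phase, name))  # this node's work for the phase
            # Block until every node has finished this phase.
            barrier.wait()

    threads = [threading.Thread(target=node_worker, args=(n,)) for n in node_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because of the barrier, the log always lists both nodes' phase A entries before any phase B entry, and so on.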
Joshua Boniface 726501f4d4 Add additional logging to flush selector
Adds additional debug logging to the flush selector to determine how and
why any given node is selected. Useful for troubleshooting strange
choices.
2020-10-20 12:34:18 -04:00
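Debug logging in a node selector mainly needs to record each candidate's metric alongside the final choice. A sketch of a "most free memory" selector; the node dictionaries, their `mem_free` field, `select_node` and the logger name are hypothetical:

```python
import logging

logger = logging.getLogger("flush_selector")


def select_node(nodes: dict, exclude: str) -> str:
    """Pick the node with the most free memory, logging each candidate considered."""
    best_node, best_free = None, -1
    for name, info in nodes.items():
        if name == exclude:
            logger.debug("Skipping %s: it is the node being flushed", name)
            continue
        logger.debug("Candidate %s has %d MB free", name, info["mem_free"])
        if info["mem_free"] > best_free:
            best_node, best_free = name, info["mem_free"]
    if best_node is None:
        raise ValueError("no candidate nodes available")
    logger.debug("Selected %s with %d MB free", best_node, best_free)
    return best_node
```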
Joshua Boniface 7cc33451b9 Improve Munin check with extinfo 2020-10-19 11:01:00 -04:00
Joshua Boniface ffaa4c033f Improve handling of large file uploads
By default, Werkzeug would require the entire file (be it an OVA or
image file) to be uploaded and saved to a temporary, fake file under
`/tmp`, before any further processing could occur. This blocked most of
the execution of these functions until the upload was completed.

This entirely defeated the purpose of what I was trying to do, which was
to save the uploads directly to the temporary blockdev in each case,
thus avoiding any sort of memory or (host) disk usage.

The solution is two-fold:

  1. First, ensure that the `location='args'` value is set in
  RequestParser; without this, the `files` portion would be parsed
  during the argument parsing, which was the original source of this
  blocking behaviour.

  2. Rather than the convoluted request handling originally done here,
  entirely defer the parsing of the `files` arguments until the point in
  the code where they are ready to be saved. Then, using an override
  stream_factory that simply opens the temporary blockdev, the upload can
  commence while being written directly out to it, rather than using
  `/tmp` space.

This does alter the error handling slightly; it is impossible to check
if the argument was passed until this point in the code, so it may take
longer to fail if the API consumer does not specify a file as they
should. This is a minor trade-off and I would expect my API consumers to
be sane here.
2020-10-19 01:00:34 -04:00
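Step 2 relies on Werkzeug's `stream_factory` hook: a callable that returns the writable file object each uploaded file is streamed into. The sketch below builds such a factory for a given target path; `make_blockdev_stream_factory` and the device path are hypothetical, and the factory's parameter list mirrors Werkzeug's `default_stream_factory`.

```python
from typing import BinaryIO, Callable, Optional


def make_blockdev_stream_factory(device_path: str) -> Callable[..., BinaryIO]:
    """Return a stream_factory that writes upload data straight to device_path."""

    def stream_factory(total_content_length: Optional[int],
                       content_type: Optional[str],
                       filename: Optional[str],
                       content_length: Optional[int] = None) -> BinaryIO:
        # Open the target for binary writing; the form parser writes each chunk
        # here as it arrives instead of spooling the whole upload to /tmp.
        return open(device_path, "wb")

    return stream_factory
```

With Werkzeug installed, such a factory can be passed as `werkzeug.formparser.parse_form_data(environ, stream_factory=...)`, which returns a `(stream, form, files)` tuple.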
Joshua Boniface 7a27503f1b Allow network-less managed networks
Allows the specification of network-less managed networks, acting like
bridged networks but over the VXLAN system instead.

Closes #107
2020-10-18 23:13:12 -04:00
Joshua Boniface e7ab1bfddd Add cluster overprovision determination
Adds a check of (n-1) memory overprovisioning. (n-1) is considered to be
the configuration that excludes the "largest" node. The cluster will
report degraded when in this state.
2020-10-18 14:57:22 -04:00
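The (n-1) check amounts to dropping the node with the most memory, then comparing the remaining capacity with the total provisioned memory. A sketch assuming per-node `mem_total` and `mem_provisioned` values in MB (hypothetical field names) and taking "largest" to mean most total memory:

```python
def n_minus_1_overprovisioned(nodes: dict) -> bool:
    """Return True if provisioned memory exceeds capacity without the largest node."""
    # Raises ValueError on an empty cluster (max() of an empty sequence).
    largest = max(nodes, key=lambda n: nodes[n]["mem_total"])
    capacity = sum(info["mem_total"] for name, info in nodes.items() if name != largest)
    provisioned = sum(info["mem_provisioned"] for info in nodes.values())
    return provisioned > capacity
```

For example, with nodes of 64000, 64000 and 128000 MB, the (n-1) capacity is 128000 MB, so 130000 MB of provisioned memory would report degraded while 80000 MB would not.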
Joshua Boniface c6e34c7dc6 Bump base version to 0.9 2020-10-18 14:31:19 -04:00
Joshua Boniface f749633f7c Use provisioned memory for mem migration selector
Use the new "provisioned" memory field, instead of the "allocated"
memory field, to determine the optimal node when using the "mem"
migration selector. This will take into account non-running VMs in the
calculation as well as running VMs.
2020-10-18 14:17:15 -04:00
Joshua Boniface a4b80be5ed Add provisioned memory to node info
Adds a separate field to the node memory, "provisioned", which totals
the amount of memory provisioned to all VMs on the node, regardless of
state, and in contrast to "allocated" which only counts running VMs.

Allows for the detection of potential overprovisioned states when
factoring in non-running VMs.

Includes the supporting code to get this data, since the original
implementation of VM memory selection was dependent on the VM being
running and getting this from libvirt. Now, if the VM is not active, it
gets this from the domain XML instead.
2020-10-18 14:17:15 -04:00
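The difference between the two totals, and reading memory from the domain XML, can be sketched as follows. The `<memory unit='KiB'>` element is part of libvirt's domain XML format (KiB is its default unit); the `vms` list structure and both helpers are hypothetical. For simplicity this sketch reads every VM's size from its XML, whereas the commit describes querying libvirt for running VMs and falling back to the XML only for inactive ones.

```python
import xml.etree.ElementTree as ET

# Multipliers to KiB for the unit spellings handled here (libvirt accepts more).
_TO_KIB = {"k": 1, "KiB": 1, "M": 1024, "MiB": 1024, "G": 1024**2, "GiB": 1024**2}


def memory_mib_from_xml(domain_xml: str) -> int:
    """Read the <memory> element of a libvirt domain XML document, in MiB."""
    mem = ET.fromstring(domain_xml).find("memory")
    if mem is None or mem.text is None:
        raise ValueError("domain XML has no <memory> element")
    unit = mem.get("unit", "KiB")  # libvirt's default unit is KiB
    if unit not in _TO_KIB:
        raise ValueError(f"unsupported memory unit: {unit}")
    return int(mem.text) * _TO_KIB[unit] // 1024


def memory_totals(vms: list) -> dict:
    """Sum 'allocated' (running VMs only) and 'provisioned' (all VMs) memory in MiB."""
    allocated = provisioned = 0
    for vm in vms:
        mib = memory_mib_from_xml(vm["xml"])
        provisioned += mib
        if vm["state"] == "running":
            allocated += mib
    return {"allocated": allocated, "provisioned": provisioned}
```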
Joshua Boniface 9d7067469a Correct proper type of uploads 2020-10-16 11:47:09 -04:00
Joshua Boniface 891aeca388 Bump Debian changelog version 2020-10-15 11:02:41 -04:00
Joshua Boniface aa5f8c93fd Entirely disable IPv6 on bridged interfaces
Prevents any potential leakage due to autoconfigured IPv6 on bridged
interfaces. These are exclusively VM-side bridges, and the PVC host
should not have any IPv6 configuration on them, ever.
2020-10-15 11:00:59 -04:00
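On Linux this is normally controlled by the per-interface sysctl `net.ipv6.conf.<interface>.disable_ipv6`, for example `sysctl -w net.ipv6.conf.<interface>.disable_ipv6=1`. A sketch that writes the same setting through `/proc/sys`; the `disable_ipv6` helper, its `proc_root` parameter (added so it can be exercised against a scratch directory) and the interface name are hypothetical, and writing the real file requires root.

```python
import os


def disable_ipv6(interface: str, proc_root: str = "/proc/sys") -> str:
    """Set net.ipv6.conf.<interface>.disable_ipv6 = 1 and return the path written."""
    path = os.path.join(proc_root, "net", "ipv6", "conf", interface, "disable_ipv6")
    with open(path, "w") as f:
        f.write("1\n")
    return path
```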
Joshua Boniface 9366977fe6 Copy d_domain before iterating
Prevents a bug where the thread can crash due to a change in the
d_domain object while running the for loop. By copying and iterating
over the copy, this becomes safer.
2020-09-16 15:12:37 -04:00
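In Python, changing a dict's size while a `for` loop iterates over it raises `RuntimeError: dictionary changed size during iteration`. Iterating over a snapshot avoids this. In the sketch below, `d_domain` is a plain dict mapping hypothetical domain names to states, mutated inside the loop to mimic the object changing underneath it; `remove_stopped` is likewise hypothetical.

```python
def remove_stopped(d_domain: dict) -> list:
    """Delete stopped domains while iterating, and return their names."""
    removed = []
    # Iterate over a shallow copy so deleting from d_domain cannot break the loop.
    for name, state in d_domain.copy().items():
        if state == "stopped":
            del d_domain[name]
            removed.append(name)
    return removed
```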
Joshua Boniface 973c78b8e0 Use monkeypatch to allow multithreaded prod flask
Without this, tasks would block while other tasks were active (for
instance, any task run with --wait). With the monkeypatch, these no longer
block.
2020-08-28 02:09:31 -04:00
Joshua Boniface 65b44f2955 Avoid breaking keepalive during incoming migration
The keepalive was getting stuck gathering memoryStats from the
non-running VM, since it was in a paused state. Avoid this by just
skipping past the rest of the stats gathering if the VM isn't running.
2020-08-28 01:47:36 -04:00
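With the libvirt Python bindings, a domain's state comes from `virDomain.state()`, which returns a `[state, reason]` pair, and memory statistics come from `virDomain.memoryStats()`. A sketch of the early exit described above; `VIR_DOMAIN_RUNNING` is redefined locally with libvirt's value so the sketch runs without the bindings, and `gather_vm_stats` is hypothetical.

```python
VIR_DOMAIN_RUNNING = 1  # same value as libvirt.VIR_DOMAIN_RUNNING


def gather_vm_stats(dom) -> dict:
    """Collect memory stats only if the domain is running; otherwise skip them."""
    state, _reason = dom.state()
    if state != VIR_DOMAIN_RUNNING:
        # Paused (e.g. during an incoming migration) or stopped: skip the
        # stats calls that were getting the keepalive stuck.
        return {}
    return {"memory": dom.memoryStats()}
```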
Joshua Boniface 7ce1bfd930 Fix bad integer/string in base convert 2020-08-28 01:08:48 -04:00