Joshua Boniface
89c7e225a0
Move OSD stats uploading to primary only
...
Instead of each node uploading its own OSD stats, which would not work
if the PVC daemon wasn't running, have the primary upload stats for all
OSDs in the cluster.
2020-12-09 02:46:09 -05:00
Joshua Boniface
b36ec43a2d
Bump version to 0.9.9
2020-12-09 02:20:20 -05:00
Joshua Boniface
ce5ee11841
Bump version to 0.9.8
2020-11-24 12:26:57 -05:00
Joshua Boniface
d4a28d7a58
Bump version to 0.9.7
2020-11-19 10:48:28 -05:00
Joshua Boniface
e69eb93cb3
Bump version to 0.9.6
2020-11-17 13:01:54 -05:00
Joshua Boniface
70dfcd434f
Ensure inmigrate is cleared on failure
2020-11-17 12:57:37 -05:00
Joshua Boniface
a4e5323e81
Bump version to 0.9.5
2020-11-17 12:34:04 -05:00
Joshua Boniface
9053edacd8
Bump version to 0.9.4
2020-11-10 15:33:50 -05:00
Joshua Boniface
baac8f24fd
Bump version to 0.9.3
2020-11-09 10:28:15 -05:00
Joshua Boniface
11702f4bc8
Bump version to 0.9.2
2020-11-08 02:03:29 -05:00
Joshua Boniface
6f66b77a00
Lint: E121/E126 continuation line under/over-indented for hanging indent
2020-11-07 15:06:21 -05:00
Joshua Boniface
9135c5e3e4
Lint: E241 multiple spaces after ','
2020-11-07 14:52:39 -05:00
Joshua Boniface
260b39ebf2
Lint: E302 expected 2 blank lines, found X
2020-11-07 14:45:24 -05:00
Joshua Boniface
ab0b932fe3
Lint: E125 continuation line with same indent as next logical line
2020-11-07 13:49:54 -05:00
Joshua Boniface
f5988ad53d
Lint: F821 undefined name 'pool'/'volume'
...
This class is actually entirely unused but is kept for consistency with
the others. It may be used someday for something.
2020-11-07 13:34:18 -05:00
Joshua Boniface
c3dfe2e381
Lint: F821 undefined name 'myshorthostname'
2020-11-07 13:31:19 -05:00
Joshua Boniface
961ebb4c01
Lint: E305 expected 2 blank lines after class or function definition, found X
2020-11-07 13:17:49 -05:00
Joshua Boniface
e553c5d42a
Lint: E122 continuation line missing indentation or outdented
2020-11-07 13:12:26 -05:00
Joshua Boniface
7932be3948
Lint: E261 at least two spaces before inline comment
2020-11-07 13:11:03 -05:00
Joshua Boniface
d2490419c5
Lint: E202 whitespace before ']'
2020-11-07 13:02:54 -05:00
Joshua Boniface
d2e5ede399
Lint: E202 whitespace before ')'
2020-11-07 12:58:54 -05:00
Joshua Boniface
3f242cd437
Lint: E202 whitespace before '}'
2020-11-07 12:57:42 -05:00
Joshua Boniface
b7daa8e1f6
Lint: E201 whitespace after '['
2020-11-07 12:39:59 -05:00
Joshua Boniface
c88965e898
Lint: E201 whitespace after '('
2020-11-07 12:39:27 -05:00
Joshua Boniface
e333f2b935
Lint: E201 whitespace after '{'
2020-11-07 12:38:31 -05:00
Joshua Boniface
3cb92fed75
Lint: E401 multiple imports on one line
2020-11-07 12:29:32 -05:00
Joshua Boniface
27c6ac2b66
Lint: W605 invalid escape sequence '\d'
...
This is the only one where forcing an `r` (raw) prefix on the string was
required; the remaining W605 instances were replaced with character
class enclosures.
2020-11-07 12:22:20 -05:00
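The two W605 fixes named in 27c6ac2b66 can be illustrated with a minimal sketch. The pattern and the `vmbr` name below are invented for illustration; the patterns actually changed in the codebase are not shown in this log.

```python
import re

# Fix 1: raw string. The backslash reaches the regex engine unchanged, so
# '\d' is no longer treated as an (invalid) string escape sequence.
raw_pattern = re.compile(r'^vmbr\d+$')

# Fix 2: character class enclosure. No backslash is needed at all.
# Note: for str patterns, \d also matches non-ASCII digits, while [0-9]
# matches ASCII digits only; for plain ASCII input they behave the same.
class_pattern = re.compile('^vmbr[0-9]+$')


def matches_both(name):
    """Return True only if both (hypothetical) patterns accept the name."""
    return bool(raw_pattern.match(name)) and bool(class_pattern.match(name))
```

Either form silences the warning; the character class is the option that works without changing the string literal type.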
Joshua Boniface
8ba267a59e
Lint: E211 whitespace before '['/'('
2020-11-07 12:20:01 -05:00
Joshua Boniface
39cc992e9b
Lint: E306 expected 1 blank line before a nested definition, found 0
2020-11-07 12:17:38 -05:00
Joshua Boniface
8c623023d5
Lint: F811 redefinition of unused '<function>'
2020-11-07 12:14:29 -05:00
Joshua Boniface
5b3ee363b2
Lint: E222 multiple spaces after operator
2020-11-07 12:10:24 -05:00
Joshua Boniface
fad27a7f4d
Lint: E131 continuation line unaligned for hanging indent
2020-11-06 22:29:49 -05:00
Joshua Boniface
2eef6a1c21
Lint: E265 block comment should start with '# '
2020-11-06 21:32:17 -05:00
Joshua Boniface
4b47a2424c
Lint: E303 too many blank lines (2)
2020-11-06 21:16:52 -05:00
Joshua Boniface
cb2defbde9
Lint: W391 blank line at end of file
2020-11-06 21:14:19 -05:00
Joshua Boniface
5da314902f
Lint: F841 local variable '<variable>' is assigned to but never used
2020-11-06 21:13:13 -05:00
Joshua Boniface
98a573bbc7
Lint: E402 module level import not at top of file
2020-11-06 20:40:32 -05:00
Joshua Boniface
aecb845d6a
Lint: E713 test for membership should be 'not in'
2020-11-06 20:37:52 -05:00
Joshua Boniface
fde8ea2fea
Lint: W291 trailing whitespace
2020-11-06 19:44:14 -05:00
Joshua Boniface
57c51d3234
Lint: E711 comparison to None should be 'if cond is not None:'
2020-11-06 19:37:13 -05:00
Joshua Boniface
ce01b41d81
Lint: E711 comparison to None should be 'if cond is None:'
2020-11-06 19:36:36 -05:00
Joshua Boniface
4d6f36aca0
Lint: E712 comparison to False should be 'if cond is False:' or 'if not cond:'
2020-11-06 19:35:51 -05:00
Joshua Boniface
fb4aafcea9
Lint: E111 indentation is not a multiple of four
2020-11-06 19:26:22 -05:00
Joshua Boniface
d9e7b7ec15
Lint: F401 <library> imported but unused
2020-11-06 19:22:49 -05:00
Joshua Boniface
ebf254f62d
Lint: W293 blank line contains whitespace
2020-11-06 19:11:07 -05:00
Joshua Boniface
63f4f9aed7
Lint: E722 do not use bare 'except'
2020-11-06 18:55:10 -05:00
Joshua Boniface
56ba7b1457
Bump version to 0.9.1
2020-10-29 12:16:38 -04:00
Joshua Boniface
ec0b8acf90
Support per-VM migration type selectors
...
Allow a VM to specify its migration type as a default choice. The valid
options are "default" (i.e. behave as now), "live" which forces a live
migration only, and "shutdown" which forces a shutdown migration only.
The new option is treated as a VM meta option and is set to default if
not found.
2020-10-29 12:01:29 -04:00
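A rough sketch of the defaulting behaviour described in ec0b8acf90, assuming the VM meta options are available as a dictionary. The key name `migration_method` and the handling of unknown values are assumptions made for this example, not taken from the commit.

```python
# The three values come from the commit message; everything else is assumed.
VALID_MIGRATION_TYPES = ('default', 'live', 'shutdown')


def get_migration_type(vm_meta):
    """Return the VM's migration type, falling back to 'default' if unset."""
    value = vm_meta.get('migration_method')  # hypothetical key name
    if value is None:
        # Option not found in the VM meta: behave as before the change.
        return 'default'
    if value not in VALID_MIGRATION_TYPES:
        # Assumption: reject unknown values rather than guessing.
        raise ValueError('invalid migration type: {}'.format(value))
    return value
```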
Joshua Boniface
5d08ad9573
Fix incorrect keepalive interval setting
2020-10-26 11:44:45 -04:00
Joshua Boniface
0f299777f1
Modify version to 3-digit numbering
...
I expect 0.9 will be fairly long-lived, so add another decimal place so
I may continue adding tweaks to it.
THIS IS NOT SEMVER.
2020-10-26 02:13:11 -04:00
Joshua Boniface
890023cbfc
Make sender wait dynamic based on receiver
2020-10-21 14:43:54 -04:00
Joshua Boniface
28abb018e3
Improve some timeouts and conditionals
2020-10-21 12:00:10 -04:00
Joshua Boniface
017953c2e6
Move lock release to phase D
2020-10-21 11:07:01 -04:00
Joshua Boniface
82b4d3ed1b
Add missing prefix statements to loggers
2020-10-21 10:52:53 -04:00
Joshua Boniface
bae366a316
Add waits and only receive check on send
2020-10-21 10:43:42 -04:00
Joshua Boniface
351076c15e
Check if node changed during final check
...
Avoids situations where two migrates, to different nodes, happen in
rapid succession. Aborts the migration if the current target node no
longer matches what was set at the start of the execution.
2020-10-21 02:52:36 -04:00
Joshua Boniface
42514b9a50
Improve messages further
2020-10-21 02:41:42 -04:00
Joshua Boniface
611e47f338
Add messages to migration aborts
...
Results in some information duplication, but ensures logging of the
reason a migration was aborted separate from the error(s) this may
generate.
2020-10-21 02:38:42 -04:00
Joshua Boniface
1523959074
Move where setting last_ vars happens
2020-10-21 02:24:00 -04:00
Joshua Boniface
ef762359f4
Adjust timing to avoid migrating to self quickly
...
Add another separate state lock, release it earlier, and ensure timings
are good to avoid double-migrating one VM.
2020-10-21 02:17:55 -04:00
Joshua Boniface
398d33778f
Avoid stopping duplicates, just lock our own key
2020-10-20 16:10:39 -04:00
Joshua Boniface
a6d492ed9f
Remove spurious writes and adjust sleep
2020-10-20 16:04:26 -04:00
Joshua Boniface
11fa3b0df3
Remove additional wait and add last_node entries
...
These allow an aborted migration to retain the previous settings and
override what the client set.
2020-10-20 15:58:55 -04:00
Joshua Boniface
442aa4e420
Tweak timers further
2020-10-20 15:43:59 -04:00
Joshua Boniface
3910843660
Add missing break
2020-10-20 15:39:29 -04:00
Joshua Boniface
70f3fdbfb9
Tweak the delays slightly on receive
2020-10-20 15:38:07 -04:00
Joshua Boniface
7cb0241a12
Attempt live migrates 3 times before proceeding
2020-10-20 15:33:41 -04:00
Joshua Boniface
9fb33ed7a7
Increase peer lock acquiring timers
2020-10-20 15:26:59 -04:00
Joshua Boniface
abfe0108ab
Better handle aborting migrations
2020-10-20 15:22:16 -04:00
Joshua Boniface
567fe8f36b
Wait for existing migrations before proceeding
2020-10-20 15:12:32 -04:00
Joshua Boniface
ec7b78b9b8
Add additional short sleep in receive
2020-10-20 13:29:17 -04:00
Joshua Boniface
224c8082ef
Alter text of synchronization messages
2020-10-20 13:08:18 -04:00
Joshua Boniface
f9e7e9884f
Improve handling of VM migrations
...
The VM migration code was very old, very spaghettified, and prone to
strange failures.
Improve this by taking cues from the node primary migration. Use
synchronization between the nodes to ensure lockstep completion of the
migration in discrete steps.
A proper queue can be built later to integrate with this code more
cleanly.
References #108
2020-10-20 13:01:55 -04:00
Joshua Boniface
726501f4d4
Add additional logging to flush selector
...
Adds additional debug logging to the flush selector to determine how and
why any given node is selected. Useful for troubleshooting strange
choices.
2020-10-20 12:34:18 -04:00
Joshua Boniface
7cc33451b9
Improve Munin check with extinfo
2020-10-19 11:01:00 -04:00
Joshua Boniface
c6e34c7dc6
Bump base version to 0.9
2020-10-18 14:31:19 -04:00
Joshua Boniface
f749633f7c
Use provisioned memory for mem migration selector
...
Use the new "provisioned" memory field, instead of the "allocated"
memory field, to determine the optimal node when using the "mem"
migration selector. This will take into account non-running VMs in the
calculation as well as running VMs.
2020-10-18 14:17:15 -04:00
Joshua Boniface
a4b80be5ed
Add provisioned memory to node info
...
Adds a separate field to the node memory, "provisioned", which totals
the amount of memory provisioned to all VMs on the node, regardless of
state, and in contrast to "allocated" which only counts running VMs.
Allows for the detection of potential overprovisioned states when
factoring in non-running VMs.
Includes the supporting code to get this data, since the original
implementation of VM memory selection was dependent on the VM being
running and getting this from libvirt. Now, if the VM is not active, it
gets this from the domain XML instead.
2020-10-18 14:17:15 -04:00
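The distinction between "allocated" and "provisioned" memory in a4b80be5ed and f749633f7c can be sketched as follows. The VM record layout (`running`, `memory_mb`) and the overprovisioning test against total node memory are assumptions for illustration only; the real daemon reads these values from libvirt or the domain XML.

```python
def summarize_node_memory(vms, total_mb):
    """Summarize memory for one node from a list of hypothetical VM records.

    Each record is a dict with 'running' (bool) and 'memory_mb' (int).
    """
    # "allocated" only counts VMs that are currently running.
    allocated = sum(vm['memory_mb'] for vm in vms if vm['running'])
    # "provisioned" counts every VM on the node, regardless of state.
    provisioned = sum(vm['memory_mb'] for vm in vms)
    return {
        'allocated': allocated,
        'provisioned': provisioned,
        # Assumption: a node is potentially overprovisioned when the
        # provisioned total exceeds its physical memory.
        'overprovisioned': provisioned > total_mb,
    }
```

With one running 2048 MB VM and one stopped 4096 MB VM on a 4096 MB node, the allocated figure looks safe while the provisioned figure shows that starting the stopped VM would not fit.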
Joshua Boniface
aa5f8c93fd
Entirely disable IPv6 on bridged interfaces
...
Prevents any potential leakage due to autoconfigured IPv6 on bridged
interfaces. These are exclusively VM-side bridges, and the PVC host
should not have any IPv6 configuration on them, ever.
2020-10-15 11:00:59 -04:00
Joshua Boniface
9366977fe6
Copy d_domain before iterating
...
Prevents a bug where the thread can crash due to a change in the
d_domain object while running the for loop. By copying and iterating
over the copy, this becomes safer.
2020-09-16 15:12:37 -04:00
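The fix in 9366977fe6 follows a common Python pattern, sketched here under the assumption that `d_domain` is a plain dictionary; the `visit` callback is a stand-in for whatever the loop body does in the daemon.

```python
def iterate_snapshot(d_domain, visit):
    """Call visit(name, instance) for each entry present when the loop starts.

    Iterating over a shallow copy means that entries added to or removed
    from d_domain while the loop runs (for example by another thread)
    cannot raise "dictionary changed size during iteration".
    """
    for name, instance in d_domain.copy().items():
        visit(name, instance)
```

The copy is shallow, so the instances themselves are still shared; only the set of keys is frozen for the duration of the loop.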
Joshua Boniface
65b44f2955
Avoid breaking keepalive during incoming migration
...
The keepalive was getting stuck gathering memoryStats from the
non-running VM, since it was in a paused state. Avoid this by just
skipping past the rest of the stats gathering if the VM isn't running.
2020-08-28 01:47:36 -04:00
Joshua Boniface
78dec77987
Bump version to 0.8
2020-08-26 10:24:44 -04:00
Joshua Boniface
1dcc1f6d55
Rename sample database for API
...
From pvcprov to pvcapi to facilitate the changing nature of this
database and its expansion to benchmark results.
2020-08-25 01:59:35 -04:00
Joshua Boniface
921e57ca78
Fix syntax error
2020-08-20 23:05:56 -04:00
Joshua Boniface
3cc7df63f2
Add configurable VM shutdown timeout
...
Closes #102
2020-08-20 21:26:12 -04:00
Joshua Boniface
7e2114b536
Add initial monitoring configurations to daemon
...
Initial work to support multiple monitoring agents including Munin,
Check_MK, and NRPE at the least.
2020-08-17 17:05:55 -04:00
Joshua Boniface
e8e65934e3
Use logger prefix for thread debug logs
2020-08-17 14:30:21 -04:00
Joshua Boniface
24fda8a73f
Use new debug logger for DNS Aggregator
2020-08-17 14:26:43 -04:00
Joshua Boniface
9b3ef6d610
Add connect timeout to Ceph
...
This doesn't seem to actually do anything (like most of these
timeouts...) but add it just for posterity.
2020-08-17 13:58:14 -04:00
Joshua Boniface
b451c0e8e3
Add additional start/finish debug messages
2020-08-17 13:11:03 -04:00
Joshua Boniface
f9b126a106
Make zkhandler accept failures more robustly
...
Most of these would silently fail if there was e.g. an issue with the ZK
connection. Instead, encase things in try blocks and handle the
exceptions in a more graceful way, returning None or False if
applicable. Except for locks, which should retry 5 times before
aborting.
2020-08-17 13:03:36 -04:00
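The error handling described in f9b126a106 might look roughly like the sketch below. It uses plain callables in place of the real ZooKeeper client, so `read_func`, `acquire_func`, and the `delay` parameter are hypothetical; only the "return None on failure" and "retry locks 5 times" behaviour comes from the commit message.

```python
import time


def read_or_none(read_func, key):
    """Run a read and return None instead of raising if it fails."""
    try:
        return read_func(key)
    except Exception:
        # e.g. a dropped connection; the caller must handle None.
        return None


def acquire_with_retries(acquire_func, attempts=5, delay=0.0):
    """Try to take a lock up to `attempts` times; return True on success."""
    for attempt in range(attempts):
        try:
            if acquire_func():
                return True
        except Exception:
            pass  # treat an exception like a failed attempt
        if attempt < attempts - 1:
            time.sleep(delay)
    # All attempts failed: the caller should abort.
    return False
```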
Joshua Boniface
553f96e7ef
Use logger for debug output
...
Using simple print statements was annoying (lack of timing info and
formatting), so move to using the debug logger for these instead with a
custom state ('d') with white text to differentiate them. Also indicate
which subthread of the keepalive each task is being executed in for
easier tracing of issues.
2020-08-17 12:46:52 -04:00
Joshua Boniface
65add58c9a
Properly properly handle issue
2020-08-16 11:38:39 -04:00
Joshua Boniface
0a01d84290
Tie fence timers to keepalive_interval
...
Also wait 2 full keepalive intervals after fencing before doing anything
else, to give the Ceph cluster a chance to recover.
2020-08-15 12:38:03 -04:00
Joshua Boniface
4afb288429
Properly handle missing domain_name fail
2020-08-15 12:07:23 -04:00
Joshua Boniface
985ad5edc0
Warn if fencing will fail
...
Verify our IPMI state on startup, and then warn if fencing will fail.
For now, this is sufficient, but in future (requires refactoring) we
might want to adjust how fencing occurs based on this information.
2020-08-13 14:42:18 -04:00
Joshua Boniface
0587bcbd67
Go back to manual command for OSD stats
...
Using the Ceph library was a disaster here; it had no timeout or way to
force it to continue, so keepalives would become stuck and trigger fence
storms. Go back to the manual osd dump command with a 2s timeout which
is far more reliable and can be adequately terminated if it runs long.
2020-08-12 22:31:25 -04:00
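The approach in 0587bcbd67, running an external command with a hard timeout so a hung call cannot stall the keepalive, can be sketched with the standard library. The helper name and return shape are assumptions, and the exact command line the daemon runs is not shown in this log (something like `['ceph', 'osd', 'dump']` is assumed).

```python
import subprocess


def run_with_timeout(command, timeout=2):
    """Run a command list; return (returncode, stdout, stderr).

    If the command exceeds `timeout` seconds, subprocess.run() kills the
    child and raises TimeoutExpired; that is reported as returncode None.
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None, '', 'command timed out'
    return result.returncode, result.stdout.decode(), result.stderr.decode()
```

A caller can then skip the stats update for that keepalive cycle when the return code is `None`, rather than blocking indefinitely.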
Joshua Boniface
09c1bb6a46
Increase start delay of flush service
2020-08-11 14:17:35 -04:00
Joshua Boniface
e0cb4a58c3
Ensure zk_listener is readded after reconnect
2020-08-11 12:46:15 -04:00
Joshua Boniface
099c58ead8
Fix missing char in log message
2020-08-11 12:40:35 -04:00