Commit Graph

656 Commits

Author SHA1 Message Date
Joshua Boniface 8a4a41e092 Convert NodeInstance to new zkhandler 2021-06-01 11:27:35 -04:00
Joshua Boniface a48bf2d71e More gracefully handle none selectors
Allow selection of "none" as the node selector, and handle this by
always using the cluster default instead of writing it in.
2021-06-01 11:13:13 -04:00
Joshua Boniface a0b9087167 Set Daemon migration selector in zookeeper 2021-06-01 10:52:41 -04:00
Joshua Boniface 33a54cf7f2 Move configuration keys to /config tree 2021-06-01 10:48:55 -04:00
Joshua Boniface d6a8cf9780 Convert MetadataAPIInstance to new zkhandler 2021-05-31 19:55:09 -04:00
Joshua Boniface abd619a3c1 Convert DNSAggregatorInstance to new zkhandler 2021-05-31 19:55:01 -04:00
Joshua Boniface ef5fe78125 Convert CepnInstance to new zkhandler 2021-05-31 19:51:27 -04:00
Joshua Boniface f6d0e89568 Properly add absent node type 2021-05-31 19:26:27 -04:00
Joshua Boniface ede3e88cd7 Modify node daemon root to use updated zkhandler 2021-05-31 03:14:09 -04:00
Joshua Boniface c23a53d082 Add daemon_lib symlink to pvcnoded 2021-05-30 00:00:07 -04:00
Joshua Boniface 0c75a127b2 Bump version to 0.9.18 2021-05-23 17:23:10 -04:00
Joshua Boniface 9de14c46fb Bump version to 0.9.17 2021-05-19 17:06:29 -04:00
Joshua Boniface fe15bdb854 Bump version to 0.9.16 2021-05-10 01:13:21 -04:00
Joshua Boniface b851a6209c Catch all other exceptions in subprocess run
Found a rare glitch where the subprocess pipes would not engage, causing
a daemon crash. Catch these exceptions with a retcode of 255 instead of
bailing out.

Closes #124
2021-05-10 01:07:25 -04:00
Joshua Boniface 5ceb57e540 Handle emptying corrupted console log files
Libvirt will someones write junk out to console log files, which breaks
the log parser deque with a UnicodeDecodeError.

If this happens, clear the log and re-open the deque again for newer
updates.

Closes #123
2021-05-10 01:03:04 -04:00
Joshua Boniface 669338c22b Bump version to 0.9.15 2021-04-08 13:37:47 -04:00
Joshua Boniface c4ac75b973 Bump version to 0.9.14 2021-03-30 10:27:37 -04:00
Joshua Boniface 0bf276fd51 Update copyright year in headers 2021-03-25 17:01:55 -04:00
Joshua Boniface f4ec161aa2 Update file copyright header.
Remove the option to select a later version of the GPL.
2021-03-25 16:58:02 -04:00
Joshua Boniface 0ccfc41398 Bump version to 0.9.13 2021-02-17 11:37:59 -05:00
Joshua Boniface 9100c63e99 Add stored_bytes to pool stats information 2021-02-09 01:46:01 -05:00
Joshua Boniface aba567d6c9 Add nice startup banners to both daemons
Add nicer easy-to-find (yay ASCII art) banners for the startup printouts
of both the node and API daemons. Also adds the safe loader to pvcnoded
to prevent hassle messages and a version string in the API daemon file.
2021-02-08 02:51:43 -05:00
Joshua Boniface 0db8fd9da6 Bump version to 0.9.12 2021-01-28 16:29:58 -05:00
Joshua Boniface a44f134230 Remove systemd deps on zookeeper and libvirt
This caused a serious race condition, since the IPs managed by PVC had
not yet come up, but Zookeeper was trying to start and bind to them,
which of course failed.

Remove these dependencies entirely - the daemon itself starts these
services during initialization and they do not need to be started by
systemd first.
2021-01-28 16:25:02 -05:00
Joshua Boniface 9fbe35fd24 Bump version to 0.9.11 2021-01-05 15:58:26 -05:00
Joshua Boniface a24724d9f0 Use external ceph cmd for ceph df 2020-12-26 14:04:21 -05:00
Joshua Boniface 78c017d51d Remove erroneous extra colon in log output 2020-12-20 16:06:35 -05:00
Joshua Boniface 1b6613c280 Add live VNC information to domain output
Sets in the node daemon, returns via the API, and shows in the CLI,
information about the live VNC listen address and port for VNC-enabled
VMs.

Closes #115
2020-12-20 16:00:55 -05:00
Joshua Boniface d6ef722997 Fix bad log message 2020-12-15 10:51:52 -05:00
Joshua Boniface 518d699c15 Bump version to 0.9.10 2020-12-15 10:45:15 -05:00
Joshua Boniface ac3ef3d792 Revamp fencing order
Prevents unnecessarily excessive timeouts if IPMI connections time out;
before, would have to go through 3 timed out commands at ~20s each
before failure was registered; reduced to 1 if the first times out.
2020-12-15 02:48:25 -05:00
Joshua Boniface 3705daff43 Better handle failing RBD lock frees
If the VM is not in a stop state, failing to free the lock is now
considered a fatal error and will put the domain into fail state,
aborting the start. This is better than being unsafe or trying to start
a VM which will fail to boot due to read-only volumes.
2020-12-14 16:04:38 -05:00
Joshua Boniface 7c99a7bda7 Safely reset RBD locks on failed VMs
Should correct issues on cold start as well as if a VM crashes
uncleanly, which would prevent the VM from starting due to stale RBD
locks.

This implementation has four parts:
  1. Update how IP addresses are handled, specifically by replacing all
  previous instances of "vni_ipaddr" with "vni_floatingipaddr", and then
  adding the "vni_ipaddr" with the real data for this node's IPs. Also
  include the storage IPs in this where they weren't before, so each
  this_node actually has the local IPs plus floating IPs. This enables
  the next two steps.
  2. Modify flush_locks to take this_node as an argument, and update the
  run_command function to only operate against this node, rather than on
  the primary coordinator.
  3. Have the flush_locks check each lock against the current node, to
  verify that the lock is actually held by the current node. This is the
  only way to do this safely. During fencing, we override this by not
  passing a this_node which bypasses this check.
  4. Have the VM start do the check for VM failure/startup and execute a
  flush_locks before actually starting the VM.
2020-12-14 15:53:18 -05:00
Joshua Boniface 89c7e225a0 Move OSD stats uploading to primary only
Instead of each node uploading its own OSD stats, which would not work
if the PVC daemon wasn't running, instead have the primary upload stats
for all OSDs in the cluster.
2020-12-09 02:46:09 -05:00
Joshua Boniface b36ec43a2d Bump version to 0.9.9 2020-12-09 02:20:20 -05:00
Joshua Boniface ce5ee11841 Bump version to 0.9.8 2020-11-24 12:26:57 -05:00
Joshua Boniface d4a28d7a58 Bump version to 0.9.7 2020-11-19 10:48:28 -05:00
Joshua Boniface e69eb93cb3 Bump version to 0.9.6 2020-11-17 13:01:54 -05:00
Joshua Boniface 70dfcd434f Ensure inmigrate is cleared on failure 2020-11-17 12:57:37 -05:00
Joshua Boniface a4e5323e81 Bump version to 0.9.5 2020-11-17 12:34:04 -05:00
Joshua Boniface 9053edacd8 Bump version to 0.9.4 2020-11-10 15:33:50 -05:00
Joshua Boniface baac8f24fd Bump version to 0.9.3 2020-11-09 10:28:15 -05:00
Joshua Boniface 11702f4bc8 Bump version to 0.9.2 2020-11-08 02:03:29 -05:00
Joshua Boniface 6f66b77a00 Lint: E121/E126 continuation line under/over-indented for hanging indent 2020-11-07 15:06:21 -05:00
Joshua Boniface 9135c5e3e4 Lint: E241 multiple spaces after ',' 2020-11-07 14:52:39 -05:00
Joshua Boniface 260b39ebf2 Lint: E302 expected 2 blank lines, found X 2020-11-07 14:45:24 -05:00
Joshua Boniface ab0b932fe3 Lint: E125 continuation line with same indent as next logical line 2020-11-07 13:49:54 -05:00
Joshua Boniface f5988ad53d Lint: F821 undefined name 'pool'/'volume'
This class is actually entirely unused but is kept for consistency with
the others. It may be used someday for something.
2020-11-07 13:34:18 -05:00
Joshua Boniface c3dfe2e381 Lint: F821 undefined name 'myshorthostname' 2020-11-07 13:31:19 -05:00
Joshua Boniface 961ebb4c01 Lint: E305 expected 2 blank lines after class or function definition, found X 2020-11-07 13:17:49 -05:00
Joshua Boniface e553c5d42a Lint: E122 continuation line missing indentation or outdented 2020-11-07 13:12:26 -05:00
Joshua Boniface 7932be3948 Lint: E261 at least two spaces before inline comment 2020-11-07 13:11:03 -05:00
Joshua Boniface d2490419c5 Lint: E202 whitespace before ']' 2020-11-07 13:02:54 -05:00
Joshua Boniface d2e5ede399 Lint: E202 whitespace before ')' 2020-11-07 12:58:54 -05:00
Joshua Boniface 3f242cd437 Lint: E202 whitespace before '}' 2020-11-07 12:57:42 -05:00
Joshua Boniface b7daa8e1f6 E201 whitespace after '[' 2020-11-07 12:39:59 -05:00
Joshua Boniface c88965e898 Lint: E201 whitespace after '(' 2020-11-07 12:39:27 -05:00
Joshua Boniface e333f2b935 Lint: E201 whitespace after '{' 2020-11-07 12:38:31 -05:00
Joshua Boniface 3cb92fed75 Lint: E401 multiple imports on one line 2020-11-07 12:29:32 -05:00
Joshua Boniface 27c6ac2b66 Lint: W605 invalid escape sequence '\d'
This is the only one where forcing an `r` type to the string was
required; the remainder of W605 were replaced with character class
enclosures.
2020-11-07 12:22:20 -05:00
Joshua Boniface 8ba267a59e Lint: E211 whitespace before '['/'(' 2020-11-07 12:20:01 -05:00
Joshua Boniface 39cc992e9b Lint: E306 expected 1 blank line before a nested definition, found 0 2020-11-07 12:17:38 -05:00
Joshua Boniface 8c623023d5 Lint: F811 redefinition of unused '<function>' 2020-11-07 12:14:29 -05:00
Joshua Boniface 5b3ee363b2 Lint: E222 multiple spaces after operator 2020-11-07 12:10:24 -05:00
Joshua Boniface fad27a7f4d Lint: E131 continuation line unaligned for hanging indent 2020-11-06 22:29:49 -05:00
Joshua Boniface 2eef6a1c21 Lint: E265 block comment should start with '# ' 2020-11-06 21:32:17 -05:00
Joshua Boniface 4b47a2424c Lint: E303 too many blank lines (2) 2020-11-06 21:16:52 -05:00
Joshua Boniface cb2defbde9 Lint: W391 blank line at end of file 2020-11-06 21:14:19 -05:00
Joshua Boniface 5da314902f Lint: F841 local variable '<variable>' is assigned to but never used 2020-11-06 21:13:13 -05:00
Joshua Boniface 98a573bbc7 Lint: E402 module level import not at top of file 2020-11-06 20:40:32 -05:00
Joshua Boniface aecb845d6a Lint: E713 test for membership should be 'not in' 2020-11-06 20:37:52 -05:00
Joshua Boniface fde8ea2fea Lint: W291 trailing whitespace 2020-11-06 19:44:14 -05:00
Joshua Boniface 57c51d3234 Lint: E711 comparison to None should be 'if cond is not None:' 2020-11-06 19:37:13 -05:00
Joshua Boniface ce01b41d81 Lint: E711 comparison to None should be 'if cond is None:' 2020-11-06 19:36:36 -05:00
Joshua Boniface 4d6f36aca0 Lint: E712 comparison to False should be 'if cond is False:' or 'if not cond:' 2020-11-06 19:35:51 -05:00
Joshua Boniface fb4aafcea9 Lint: E111 indentation is not a multiple of four 2020-11-06 19:26:22 -05:00
Joshua Boniface d9e7b7ec15 Lint: F401 <library> imported but unused 2020-11-06 19:22:49 -05:00
Joshua Boniface ebf254f62d Lint: W293 blank line contains whitespace 2020-11-06 19:11:07 -05:00
Joshua Boniface 63f4f9aed7 Lint: E722 do not use bare 'except' 2020-11-06 18:55:10 -05:00
Joshua Boniface 56ba7b1457 Bump version to 0.9.1 2020-10-29 12:16:38 -04:00
Joshua Boniface ec0b8acf90 Support per-VM migration type selectors
Allow a VM to specify its migration type as a default choice. The valid
options are "default" (i.e. behave as now), "live" which forces a live
migration only, and "shutdown" which forces a shutdown migration only.
The new option is treated as a VM meta option and is set to default if
not found.
2020-10-29 12:01:29 -04:00
Joshua Boniface 5d08ad9573 Fix incorrect keepalive interval setting 2020-10-26 11:44:45 -04:00
Joshua Boniface 0f299777f1 Modify version to 3-digit numbering
I expect 0.9 will be fairly long-lived, so add another decimal place so
I may continue adding tweaks to it.

THIS IS NOT SEMVER.
2020-10-26 02:13:11 -04:00
Joshua Boniface 890023cbfc Make sender wait dynamic based on receiver 2020-10-21 14:43:54 -04:00
Joshua Boniface 28abb018e3 Improve some timeouts and conditionals 2020-10-21 12:00:10 -04:00
Joshua Boniface 017953c2e6 Move lock release to phase D 2020-10-21 11:07:01 -04:00
Joshua Boniface 82b4d3ed1b Add missing prefix statements to loggers 2020-10-21 10:52:53 -04:00
Joshua Boniface bae366a316 Add waits and only receive check on send 2020-10-21 10:43:42 -04:00
Joshua Boniface 351076c15e Check if node changed during final check
Avoids situations where two migrates, to different nodes, happen in
rapid succession. Aborts the migration if the current target node no
longer matches what was set at the start of the execution.
2020-10-21 02:52:36 -04:00
Joshua Boniface 42514b9a50 Improve messages further 2020-10-21 02:41:42 -04:00
Joshua Boniface 611e47f338 Add messages to migration aborts
Results in some information duplication, but ensures logging of the
reason a migration was aborted separate from the error(s) this may
generate.
2020-10-21 02:38:42 -04:00
Joshua Boniface 1523959074 Move where setting last_ vars happens 2020-10-21 02:24:00 -04:00
Joshua Boniface ef762359f4 Adjust timing to avoid migrating to self quickly
Add another separate state lock, release it earlier, and ensure timings
are good to avoid double-migrating one VM.
2020-10-21 02:17:55 -04:00
Joshua Boniface 398d33778f Avoid stopping duplicates, just lock our own key 2020-10-20 16:10:39 -04:00
Joshua Boniface a6d492ed9f Remove spurious writes and adjust sleep 2020-10-20 16:04:26 -04:00
Joshua Boniface 11fa3b0df3 Remove additional wait and add last_node entries
These allow for aborting a migration to retain the previous settings and
override what the client set.
2020-10-20 15:58:55 -04:00
Joshua Boniface 442aa4e420 Tweak timers further 2020-10-20 15:43:59 -04:00
Joshua Boniface 3910843660 Add missing break 2020-10-20 15:39:29 -04:00
Joshua Boniface 70f3fdbfb9 Tweak the delays slightly on receive 2020-10-20 15:38:07 -04:00
Joshua Boniface 7cb0241a12 Attempt live migrates 3 times before proceeding 2020-10-20 15:33:41 -04:00
Joshua Boniface 9fb33ed7a7 Increase peer lock acquiring timers 2020-10-20 15:26:59 -04:00
Joshua Boniface abfe0108ab Better handle aborting migrations 2020-10-20 15:22:16 -04:00
Joshua Boniface 567fe8f36b Wait for existing migrations before proceeding 2020-10-20 15:12:32 -04:00
Joshua Boniface ec7b78b9b8 Add additional short sleep in receive 2020-10-20 13:29:17 -04:00
Joshua Boniface 224c8082ef Alter text of synchronization messages 2020-10-20 13:08:18 -04:00
Joshua Boniface f9e7e9884f Improve handling of VM migrations
The VM migration code was very old, very spaghettified, and prone to
strange failures.

Improve this by taking cues from the node primary migration. Use
synchronization between the nodes to ensure lockstep completion of the
migration in discrete steps.

A proper queue can be built later to integrate with this code more
cleanly.

References #108
2020-10-20 13:01:55 -04:00
Joshua Boniface 726501f4d4 Add additional logging to flush selector
Adds additional debug logging to the flush selector to determine how any
why any given node is selected. Useful for troubleshooting strange
choices.
2020-10-20 12:34:18 -04:00
Joshua Boniface 7cc33451b9 Improve Munin check with extinfo 2020-10-19 11:01:00 -04:00
Joshua Boniface c6e34c7dc6 Bump base version to 0.9 2020-10-18 14:31:19 -04:00
Joshua Boniface f749633f7c Use provisioned memory for mem migration selector
Use the new "provisioned" memory field, instead of the "allocated"
memory field, to determine the optimal node when using the "mem"
migration selector. This will take into account non-running VMs in the
calculation as well as running VMs.
2020-10-18 14:17:15 -04:00
Joshua Boniface a4b80be5ed Add provisioned memory to node info
Adds a separate field to the node memory, "provisioned", which totals
the amount of memory provisioned to all VMs on the node, regardless of
state, and in contrast to "allocated" which only counts running VMs.

Allows for the detection of potential overprovisioned states when
factoring in non-running VMs.

Includes the supporting code to get this data, since the original
implementation of VM memory selection was dependent on the VM being
running and getting this from libvirt. Now, if the VM is not active, it
gets this from the domain XML instead.
2020-10-18 14:17:15 -04:00
Joshua Boniface aa5f8c93fd Entirely disable IPv6 on bridged interfaces
Prevents any potential leakage due to autoconfigured IPv6 on bridged
interfaces. These are exclusively VM-side bridges, and the PVC host
should not have any IPv6 configuration on them, ever.
2020-10-15 11:00:59 -04:00
Joshua Boniface 9366977fe6 Copy d_domain before iterating
Prevents a bug where the thread can crash due to a change in the
d_domain object while running the for loop. By copying and iterating
over the copy, this becomes safer.
2020-09-16 15:12:37 -04:00
Joshua Boniface 65b44f2955 Avoid breaking keepalive during incoming migration
The keepalive was getting stuck gathering memoryStats from the
non-running VM, since it was in a paused state. Avoid this by just
skipping past the rest of the stats gathering if the VM isn't running.
2020-08-28 01:47:36 -04:00
Joshua Boniface 78dec77987 Bump version to 0.8 2020-08-26 10:24:44 -04:00
Joshua Boniface 1dcc1f6d55 Rename sample database for API
From pvcprov to pvcapi to facilitate the changing nature of this
database and its expansion to benchmark results.
2020-08-25 01:59:35 -04:00
Joshua Boniface 921e57ca78 Fix syntax error 2020-08-20 23:05:56 -04:00
Joshua Boniface 3cc7df63f2 Add configurable VM shutdown timeout
Closes #102
2020-08-20 21:26:12 -04:00
Joshua Boniface 7e2114b536 Add initial monitoring configurations to daemon
Initial work to support multiple monitoring agents including Munin,
Check_MK, and NRPE at the least.
2020-08-17 17:05:55 -04:00
Joshua Boniface e8e65934e3 Use logger prefix for thread debug logs 2020-08-17 14:30:21 -04:00
Joshua Boniface 24fda8a73f Use new debug logger for DNS Aggregator 2020-08-17 14:26:43 -04:00
Joshua Boniface 9b3ef6d610 Add connect timeout to Ceph
This doesn't seem to actually do anything (like most of these
timeouts...) but add it just for posterity.
2020-08-17 13:58:14 -04:00
Joshua Boniface b451c0e8e3 Add additional start/finish debug messages 2020-08-17 13:11:03 -04:00
Joshua Boniface f9b126a106 Make zkhandler accept failures more robustly
Most of these would silently fail if there was e.g. an issue with the ZK
connection. Instead, encase things in try blocks and handle the
exceptions in a more graceful way, returning None or False if
applicable. Except for locks, which should retry 5 times before
aborting.
2020-08-17 13:03:36 -04:00
Joshua Boniface 553f96e7ef Use logger for debug output
Using simple print statements was annoying (lack of timing info and
formatting), so move to using the debug logger for these instead with a
custom state ('d') with white text to differentiate them. Also indicate
which subthread of the keepalive each task is being executed in for
easier tracing of issues.
2020-08-17 12:46:52 -04:00
Joshua Boniface 65add58c9a Properly properly handle issue 2020-08-16 11:38:39 -04:00
Joshua Boniface 0a01d84290 Tie fence timers to keepalive_interval
Also wait 2 full keepalive intervals after fencing before doing anything
else, to give the Ceph cluster a chance to recover.
2020-08-15 12:38:03 -04:00
Joshua Boniface 4afb288429 Properly handle missing domain_name fail 2020-08-15 12:07:23 -04:00
Joshua Boniface 985ad5edc0 Warn if fencing will fail
Verify our IPMI state on startup, and then warn if fencing will fail.
For now, this is sufficient, but in future (requires refactoring) we
might want to adjust how fencing occurs based on this information.
2020-08-13 14:42:18 -04:00
Joshua Boniface 0587bcbd67 Go back to manual command for OSD stats
Using the Ceph library was a disaster here; it had no timeout or way to
force it to continue, so keepalives would become stuck and trigger fence
storms. Go back to the manual osd dump command with a 2s timeout which
is far more reliable and can be adequately terminated if it runs long.
2020-08-12 22:31:25 -04:00
Joshua Boniface 09c1bb6a46 Increase start delay of flush service 2020-08-11 14:17:35 -04:00
Joshua Boniface e0cb4a58c3 Ensure zk_listener is readded after reconnect 2020-08-11 12:46:15 -04:00
Joshua Boniface 099c58ead8 Fix missing char in log message 2020-08-11 12:40:35 -04:00
Joshua Boniface 0e5c681ada Clean up imports
Make several imports more specific to reduce redundant code imports and
improve memory utilization.
2020-08-11 12:09:10 -04:00
Joshua Boniface 46ffe352e3 Better handle subthread timeouts in keepalive
Prevent the main keepalive thread from getting stuck due to a subthread
taking an enormous time. If this happens, the rest of the main keepalive
will continue onward, thus ensuring that the main keepalive does not
fail for a significant number of cycles, which would cause a fence.
2020-08-11 11:37:26 -04:00
Joshua Boniface ccee124c8b Adjust fence failcount limit to 6 (30s)
The previous saving throw limit (3/15s) seems to have been too low. I
was observing bizarre failures where a node would be fenced while it was
still starting up. Some of this may have been related to Zookeeper
connections taking too long, but this was inconsistent.

Increase this to 6 saving throws (30s). This provides significantly more
time for a node to properly check in on startup before another node
fences it. In the real world, 15s vs 30s isn't that big of a downtime
change, but prevents false-positive fences.
2020-08-05 22:40:07 -04:00
Joshua Boniface 02343079c0 Improve fencing migrate layout
Open the option to do this in parallel with some threads
2020-08-05 22:26:01 -04:00
Joshua Boniface 37b83aad6a Add logging and use better conditional 2020-08-05 21:57:36 -04:00
Joshua Boniface 876f2424e0 Ensure dead state isn't written erroneously 2020-08-05 21:57:11 -04:00
Joshua Boniface 5871380e1b Avoid crashing VM stats thread if domain migrated 2020-06-10 17:10:46 -04:00
Joshua Boniface 654a3cb7fa Improve debug output and use ceph df util data 2020-06-06 22:52:49 -04:00
Joshua Boniface 9b65d3271a Improve handling of Ceph status gathering
Use the Rados library instead of random OS commands, which massively
improves the performance of these tasks.

Closes #97
2020-06-06 22:30:25 -04:00
Joshua Boniface 598b2025e8 Use Rados and add Ceph entries to pvcnoded.yaml 2020-06-06 21:12:51 -04:00
Joshua Boniface 70b787d1fd Move all VM functions into thread 2020-06-06 15:44:05 -04:00
Joshua Boniface e1310a05f2 Implement recording of VM stats during keepalive 2020-06-06 15:34:03 -04:00
Joshua Boniface 2ad6860dfe Move Ceph statistics gathering into thread 2020-06-06 13:25:02 -04:00
Joshua Boniface cebb4bbc1a Comment cleanup 2020-06-06 13:20:40 -04:00
Joshua Boniface a672e06dd2 Move fencing to end of keepalive function 2020-06-06 13:19:11 -04:00
Joshua Boniface 1db73bb892 Move libvirt closure into previous section 2020-06-06 13:18:37 -04:00
Joshua Boniface c1956072f0 Rename update_zookeeper function to node_keepalive 2020-06-06 12:49:50 -04:00
Joshua Boniface ce60836c34 Allow enforcement of live migration
Provides a CLI and API argument to force live migration, which triggers
a new VM state "migrate-live". The node daemon VMInstance during migrate
will read this flag from the state and, if enforced, will not trigger a
shutdown migration.

Closes #95
2020-06-06 12:00:44 -04:00
Joshua Boniface b5434ba744 Fix typo in variable name 2020-06-06 11:29:48 -04:00
Joshua Boniface b9e5b14f94 Update lastnode too if a self-migrate is aborted
References #92
2020-06-04 10:28:04 -04:00
Joshua Boniface 5d2031d99e Prevent a VM migrating to the same node
Prevents a rare edge case where a node can end up "migrating" to itself.
Quick hack to fix this, though like most of the VM management should
probably be rethought/rewritten later.

Fixes #92
2020-06-04 10:26:47 -04:00
Joshua Boniface 5f9836f96d Add error message to OSD parse fail 2020-05-12 11:04:38 -04:00
Joshua Boniface 95c59ba629 Improve flush handling slightly 2020-05-12 11:04:38 -04:00
Joshua Boniface 72a38fd437 Correct changed dhcp_reservations key name 2020-05-09 10:00:53 -04:00
Joshua Boniface b580760537 Add missing fmt_cyan variable 2020-05-08 18:15:02 -04:00
Joshua Boniface 331027d124 Add further tweaks to takeover state checks
Just ensure that everything is proper state before proceeding
2020-04-22 11:16:19 -04:00
Joshua Boniface ae4f36b881 Hook flush into more services
Trying to ensure that pvc-flush completes before anything tries to shut
down.
2020-04-14 19:58:53 -04:00
Joshua Boniface 611e0edd80 Reorder last keepalive during cleanup
Make sure the stopping of the keepalive timer and final keepalive update
are done as the last step before complete shutdown. The previous setup
could conceivably result in a node being fenced should the cleanup
operations take longer than ~45 seconds, for instance if primary node
switchover took too long or blocked, or log watchers failed to stop
quickly enough. Ensures that keepalives will continue to be run during
the shutdown process until the last possible moment.
2020-04-12 03:49:29 -04:00
Joshua Boniface b413e042a6 Improve handling of primary contention
Previously, contention could occasionally cause a flap/dual primary
contention state due to the lack of checking within this function. This
could cause a state where a node transitions to primary than is almost
immediately shifted away, which could cause undefined behaviour in the
cluster.

The solution includes several elements:
    * Implement an exclusive lock operation in zkhandler
    * Switch the become_primary function to use this exclusive lock
    * Implement exclusive locking during the contention process
    * As a failsafe, check stat versions before setting the node as the
      primary node, in case another node already has
    * Delay the start of takeover/relinquish operations by slightly
      longer than the lock timeout
    * Make the current router_state conditions more explicit (positive
      conditionals rather than negative conditionals)

The new scenario ensures that during contention, only one secondary will
ever succeed at acquiring the lock. Ideally, the other would then grab
the lock and pass, but in testing this does not seem to be the case -
the lock always times out, so the failsafe check is technically not
needed but has been left as an added safety mechanism. With this setup,
the node that fails the contention will never block the switchover nor
will it try to force itself onto the cluster after another node has
successfully won contention.

Timeouts may need to be adjusted in the future, but the base timeout of
0.4 seconds (and transition delay of 0.5 seconds) seems to work reliably
during preliminary tests.
2020-04-12 03:40:17 -04:00
Joshua Boniface e672d799a6 Set flush after pvcapid.service
This may or may not help, but should in theory prevent the flush from
trying to run after a (locally-running) API daemon is terminated, which
could cause an API failure and a failure to flush.
2020-04-12 01:48:50 -04:00
Joshua Boniface a130f19a19 Depend pvcnoded on Zookeeper (harder) and libvirtd 2020-04-09 09:57:53 -04:00
Joshua Boniface a671d9d457 Use consistent tense in messages 2020-04-08 22:00:51 -04:00
Joshua Boniface fee1c7dd6c Reorder cleanup and gracefully wait for flushes 2020-04-08 22:00:08 -04:00
Joshua Boniface 5d58bee34f Add some time around noded startup/shutdown
Otherwise, systemd kills networking before the node daemon fully stops
and it goes into "dead" status, which is super annoying.
2020-04-01 23:59:14 -04:00
Joshua Boniface f668412941 Don't use Requires as the dep is too hard
Requires seems to flush on every service restart which is NOT what we
want. Use Wants instead.
2020-04-01 15:15:37 -04:00
Joshua Boniface a0ebc0d3a7 Add more robust requirements to pvc-flush service 2020-04-01 15:09:44 -04:00
Joshua Boniface 98a7005c1b Add significant TimeoutSec to pvc-flush service
This will stop systemd from killing the service in the middle of a flush
or unflush operation, which completely defeats the purpose. 30 minutes
was chosen as this is a very large but still somewhat manageable value,
which should cover even a very large very loaded cluster with room to
spare.
2020-04-01 01:24:09 -04:00
Joshua Boniface 0a367898a0 Don't trigger aggregator fail if fine 2020-03-12 13:22:12 -04:00
Joshua Boniface c02bc0b46a Correct issues with VM lock freeing
Code was bad and using a depricated feature.
2020-03-02 12:45:12 -05:00
Joshua Boniface 1e4350ca6f Properly handle takeover state in VXNetworks
Most of these actions/conditionals were looking for primary state, but
were failing during node takeover. Update the conditionals to look for
both router states instead.

Also add a wait to lock flushing until a takeover is completed.
2020-03-02 10:41:00 -05:00
Joshua Boniface 57768f2583 Remove an obsolete script 2020-02-19 21:40:23 -05:00
Joshua Boniface e4e4e336b4 Handle invalid cursor setup cleanly
This seems to happen only during termination, so catch it and continue
so the loop terminates.
2020-02-19 16:29:59 -05:00
Joshua Boniface d2a5fe59c0 Use transitional takeover states for migration
Use a pair of transitional states, "takeover" and "relinquish", when
transitioning between primary and secondary coordinator states. This
provides a clsuter-wide record that the nodes are still working during
their synchronous transition states, and should allow clients to
determine when the node(s) have fully switched over. Also add an
additional 2 seconds of wait at the end of the transition jobs to ensure
everything has had a chance to start before proceeding.

References #72
2020-02-19 14:06:54 -05:00
Joshua Boniface 9c7041f12c Update package version to 0.7 2020-02-15 23:25:47 -05:00
Joshua Boniface 7ace5b5056 Remove /ceph/cmd pipe for (most) Ceph commands
Addresses #80
2020-02-08 23:40:02 -05:00
Joshua Boniface 37310e5455 Correct name of systemd target 2020-02-08 20:39:07 -05:00
Joshua Boniface ce985234c3 Use consistent naming of components
Rename "pvcd" to "pvcnoded", and "pvc-api" to "pvcapid" so names for the
daemons are fully consistent. Update the names of the configuration
files as well to match this new formatting.

References #79
2020-02-08 19:34:07 -05:00
Joshua Boniface 4505b239eb Rename API and common Debian packages
Closes #79
2020-02-08 18:50:38 -05:00
Joshua Boniface 74228eb063 Bump version to 0.6 2020-02-08 18:27:39 -05:00
Joshua Boniface 90e42683c6 Reduce sleep time during VM migrations 2020-02-04 17:52:37 -05:00
Joshua Boniface 20c8466296 Handle invalid search fields better 2020-02-04 17:35:24 -05:00
Joshua Boniface ab28bf40d1 Change ordering of services during primary switch
Fixes #77
2020-01-30 09:18:56 -05:00
Joshua Boniface 5d73974e95 Fix several bugs around load-based migrations 2020-01-29 17:35:10 -05:00
Joshua Boniface 0b31bab797 Add more helpful config parse error message 2020-01-22 12:09:31 -05:00
Joshua Boniface 4c1b78d7a4 Use dictionary get() to prevent crashes
Use the get() function throughout to prevent crashes in various
scenarios if the profile data isn't present or consistent.
2020-01-13 09:21:57 -05:00
Joshua Boniface 4ad29f669d Update default configuration samples 2020-01-12 21:33:15 -05:00
Joshua Boniface 0d2e22a111 Normalize all static networks with bridges
Modifies the storage and upstream networks to mirror the cluster
network, with a bridge on top of the underlying specified dev, and all
IPs bound to the bridge.

Allows creating VMs in the storage or upstream networks, as well as the
cluster network, should the administrator choose to do so (manually).
2020-01-12 19:04:31 -05:00
Joshua Boniface 1671a87dd4 Fix the flush service 2020-01-11 17:04:12 -05:00
Joshua Boniface b6474198a4 Implement cluster maintenance mode
Implements a "maintenance mode" for PVC clusters. For now, the only
thing this mode does is disable node fencing while the state is true.
This allows the administrator to tell PVC that network connectivity,
etc. might be interrupted and to avoid fencing nodes.

Closes #70
2020-01-09 10:53:27 -05:00
Joshua Boniface 4e5bce4975 Update copyright header year to 2020 2020-01-08 19:38:02 -05:00
Joshua Boniface c515d63340 Add provision state for VMs 2020-01-08 17:40:02 -05:00
Joshua Boniface 21d87f5e51 Add v6 configurations to dnsmasq
These options were only applied with v4 networks; now, use the v6
address in a dual-stack or v6-only network.
2020-01-06 23:48:04 -05:00
Joshua Boniface f326fd99e2 Properly fix IPv4 no-DHCP networking 2020-01-06 22:31:37 -05:00
Joshua Boniface 38dae8b32f Change name of cluster in patronictl command 2020-01-06 16:37:17 -05:00
Joshua Boniface 2d2bdb879e Use get() instead of direct dict reference 2020-01-06 16:34:39 -05:00
Joshua Boniface 30d4470c8f Only print AXFR errors in debug mode 2020-01-06 16:04:37 -05:00
Joshua Boniface bbfadac5e1 Fix dnsmasq options for DHCP-disabled networks 2020-01-06 16:04:26 -05:00
Joshua Boniface 7b3e267f7a Implement bridge_device for bridged VNIs
Required due to #64. Bridged networks were being created on top of a
vLAN if the Cluster network was a vLAN device, rather than being created
on the underlying device. This came from a previous revision of the
cluster architecture guidelines where Cluster was supposed to be a raw
device rather than a vLAN. This fixed the problem by implementing a
configuration field for a "bridge_device", a NIC device that can then
have the bridged vLANs created on top of it.

Fixes #64
2020-01-06 14:44:56 -05:00
Joshua Boniface 094ac8c3a8 Ensure stdout is used 2020-01-06 12:34:35 -05:00
Joshua Boniface 13548b791d Add additional debugging and fix pool_idx loop var 2020-01-06 11:31:22 -05:00
Joshua Boniface e7bc4f7328 Handle empty None-type hostname 2020-01-05 22:46:56 -05:00
Joshua Boniface be20ba02a7 Handle VM states in flush more accurately
We don't want to block forever on a failure, so limit valid waiting
states to just those we know it should be in during a migration.
2020-01-05 15:21:16 -05:00
Joshua Boniface 7311fa561b Fix bad join with new table name 2020-01-04 15:17:27 -05:00
Joshua Boniface bf89050e8b Update userdata table name 2020-01-04 15:10:37 -05:00
Joshua Boniface 20ae2186f9 Run VM state actions in a thread
Prevents blocking the main thread(s) while a VM is changing state. In
particular, this caused some issues with nodes not responding to
cancellation/reversal of a flush/ready state until the previous
migration was finished, which could cause issues. This entire subset of
actions is now threaded and so can run on its own in the background.
2019-12-26 11:08:16 -05:00
Joshua Boniface b3483fa810 Add explicit returns from flush/ready threads 2019-12-26 11:08:00 -05:00
Joshua Boniface 47cf0a8006 Ensure migration out occurs 2019-12-25 21:11:02 -05:00
Joshua Boniface 77db36a891 Ensure migration out occurs 2019-12-25 21:02:46 -05:00
Joshua Boniface 9a39d739e8 Ensure we empty of flush_thread 2019-12-25 20:29:17 -05:00
Joshua Boniface a66b834ae4 Fix several small bugs 2019-12-19 18:58:53 -05:00
Joshua Boniface b17b7bf22b Add black magic to minimize ping losses
This particular arping interval/count, along with forcing it to run in
the foreground, seems to minimize the packet loss when the primary
coordinator transitions. Through extensive testing, this value results
in the, consistently, least amount of loss: 1-2 pings, at an 0.025s ping
interval, return "TTL exceeded", with no other loss, and only when the
node the test VM is on is the one switching to secondary state. No other
combination of values here, nor tweaks to other parts of the code, seem
able to reduce this further, therefore this is likely the best
configuration possible.
2019-12-19 18:57:32 -05:00
Joshua Boniface 8c252aeecc Implemented coordinated locked node transitions
The previous method was a "throw it in the sea"-type migration with some
(very arbitrary) sleep statements thrown in for good measure.
Reimplement this with some hard locking. During each phase of the
transition, the nodes acquire read/write shared locks to a Zookeeper key
so that they can tightly coordinate the actions of transferring each
part of the primary state between them. This is done in a subthread to
prevent strange blocking issues that were encountered, likely due to
business in the existing main thread.
2019-12-19 10:56:34 -05:00
Joshua Boniface 0841ddf8b0 Handle integrity errors in DNS aggregator 2019-12-19 10:45:06 -05:00
Joshua Boniface 98764f1edd Clean up some aspects of node switchover 2019-12-18 21:39:40 -05:00
Joshua Boniface 23188199cb Handle failing Patroni events more gracefully 2019-12-18 21:12:22 -05:00
Joshua Boniface 2b1b78622e Fix invalid arping option
It made little difference and didn't error, but was incorrect.
2019-12-18 12:06:40 -05:00
Joshua Boniface 364ab10673 Add slight delay when stopping the metadata API 2019-12-18 11:56:04 -05:00
Joshua Boniface 39c9f911cc Increase arping interval to 0.2s 2019-12-15 14:55:34 -05:00
Joshua Boniface 686af31c08 Reduce arping interval to 0.1s 2019-12-15 12:30:45 -05:00
Joshua Boniface 0a94fac407 Fix bugs around passing master
Was not passing properly and getting stuck sometimes, so modify the
checking and route creation a bit to prevent it. Seems to work.
2019-12-15 00:08:18 -05:00
Joshua Boniface b3e21a5bf8 Integrate metadata API into node daemon 2019-12-14 16:41:01 -05:00
Joshua Boniface 8c36e7618a Modify node daemon to follow API 2019-12-14 14:13:26 -05:00
Joshua Boniface 78f053d81f Recreate network in aggregator if DNS changes 2019-12-13 00:03:47 -05:00
Joshua Boniface 0a8dd30a48 Restart dnsmasq when network details change 2019-12-12 23:51:22 -05:00
Joshua Boniface 6fa828e721 Don't stop the provisioner worker
It should probably just be running on all nodes all the time already,
but is started when a node first becomes primary.
2019-12-12 23:08:02 -05:00
Joshua Boniface c1b6ce0ff7 Reorder starting clients 2019-12-12 23:03:34 -05:00
Joshua Boniface b854d53fab Add API management to node daemon 2019-12-12 22:59:07 -05:00
Joshua Boniface 88a181b20d Allow metadata API in nft rules 2019-12-11 17:04:29 -05:00
Joshua Boniface 1fb560e996 Add DNS nameservers to networks 2019-12-08 23:55:45 -05:00
Joshua Boniface 9cb5561e77 Move default NS record to upstream_domain 2019-12-08 23:05:32 -05:00
Joshua Boniface 3471f4e57a Remove obsolete pvc-nsX and add pvc-ns name
Should point towards the floating IP.
2019-12-08 20:20:20 -05:00
Joshua Boniface 356c12db2e Add ceph df output to pool data
Allows additional information visible in the `ceph df` command,
including pool free space and used percentage.
2019-12-06 00:47:27 -05:00
Joshua Boniface 531578fd28 Use consistent tense for VM states
Replace "failed" with "fail" and "disabled" with "disable" for
consistency with the remaining states.
2019-10-23 23:57:59 -04:00
Joshua Boniface 040ca33683 Clean up handling of OSD dump command 2019-10-22 12:51:29 -04:00
Joshua Boniface 190623bdd9 Use empty string for node limit 2019-10-22 12:32:14 -04:00
Joshua Boniface f0e0a38a20 Fix bug in config element retrieval 2019-10-22 12:30:23 -04:00
Joshua Boniface 237a37015d Set upstream IP in key if changed 2019-10-21 16:50:41 -04:00
Joshua Boniface 10ae260b92 Properly handle empty node limit 2019-10-17 13:34:11 -04:00
Joshua Boniface 03447d3374 Update copyright string year to include 2019 2019-10-13 12:09:51 -04:00
Joshua Boniface 116013695f Fix bugs with bad strings 2019-10-12 18:43:29 -04:00
Joshua Boniface 18fc49fc6c Use node instead of hypervisor consistently 2019-10-12 01:59:08 -04:00
Joshua Boniface 8dc0c8f0ac Fix minor bugs 2019-10-12 01:36:50 -04:00
Joshua Boniface 5995353597 Implement VM metadata and use it
Implements the storing of three VM metadata attributes:
1. Node limits - allows specifying a list of hosts on which the VM must
run. This limit influences the migration behaviour of VMs.
2. Per-VM node selectors - allows each VM to have its migration
autoselection method specified, to automatically allow different methods
per VM based on the administrator's preferences.
3. VM autorestart - allows a VM to be automatically restarted from a
stopped state, presumably due to a failure to find a target node (either
due to limits or otherwise) during a flush/fence recovery, on the next
node unflush/ready state of its home hypervisor. Useful mostly in
conjunction with limits to ensure that VMs which were shut down due to
there being no valid migration targets are started back up when their
node becomes ready again.

Includes the full client interaction with these metadata options,
including printing, as well as defining a new function to modify this
metadata. For the CLI it is set/modified either on `vm define` or via the
`vm meta` command. For the API it is set/modified either on a POST to
the `/vm` endpoint (during VM definition) or on POST to the `/vm/<vm>`
endpoint. For the API this replaces the previous reserved word for VM
creation from scratch as this will no longer be implemented in-daemon
(see #22).

Closes #52
2019-10-12 01:17:39 -04:00
Joshua Boniface 76e6b42389 Add clone_volume backend command 2019-10-10 14:09:07 -04:00
Joshua Boniface 983daceaed Fix shutdown abort during restart
Restart state, being different from shutdown, would trigger an abort of
the shutdown. Fix this by including restart in the valid states to
continue.
2019-09-07 12:08:31 -04:00
Joshua Boniface 7c4d18691a Implement configurable replcfg (node-side)
Implements administrator-selectable replication configurations for new
pools in PVC clusters, overriding the default of copies=3,mincopies=2.
2019-08-23 21:58:54 -04:00
Joshua Boniface 267a3d16e5 Bump version to 0.5 2019-08-08 20:56:27 -04:00