Joshua Boniface
ef1701b4c8
Handle an additional exception case
2021-06-14 17:15:40 -04:00
Joshua Boniface
08dc756549
Actually disable the pvcapid service
Prevents it from trying to start itself during updates or reboots on
non-primary coordinators.
2021-06-14 17:13:22 -04:00
Joshua Boniface
0a9c0c1ccb
Use a nicer reload method on hot schema update
Instead of exiting and trusting systemd to restart us, leverage the
os.execv() call to reload the process in the current PID context.
Also improves the log messages so it's very clear what's going on.
2021-06-14 17:10:21 -04:00
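A minimal sketch of the in-place reload described above, assuming the daemon was launched as `python <script> [args]`, so that `sys.executable` plus `sys.argv` recreates the original command line. `reload_in_place` is a hypothetical name, and its `execv` parameter exists only so the call can be stubbed out in tests.

```python
import os
import sys


def reload_in_place(execv=os.execv):
    """Replace the running process image with a fresh copy of itself.

    On success os.execv() never returns: the PID stays the same, but the
    interpreter starts over, so every module is imported again and the
    new schema is picked up during normal startup.
    """
    python = sys.executable
    args = [python] + sys.argv
    # Flush first: buffered output is lost when the process image is replaced.
    print(f"Reloading in place with: {' '.join(args)}", flush=True)
    execv(python, args)
```

Flushing before the call matters because `os.execv()` does not flush Python's file buffers; anything still buffered would simply vanish.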
Joshua Boniface
e34a7d4d2a
Handle hot reloads properly
A hot reload isn't possible due to DataWatch and ChildrenWatch
constructs, so we instead need to terminate the daemon to "apply" the
schema update. Thus we use exit code 150 (reserved for application use by the LSB)
and reorder some of the elements of the schema validation to ensure
things happen in the right order.
2021-06-14 12:52:43 -04:00
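As a rough illustration of the exit-code approach: the daemon exits with 150, and a supervisor restarts it only for that code. With systemd this would typically be expressed as `RestartForceExitStatus=150` in the unit file; the toy `supervise` loop below (hypothetical, for illustration only) mimics that behaviour in-process.

```python
import sys

# 150 lies in the 150-199 range the LSB reserves for application use,
# so it cannot be mistaken for a generic failure status.
SCHEMA_UPDATE_EXIT_CODE = 150


def exit_for_schema_update():
    """Terminate so that the supervisor restarts us on the new schema."""
    print("Schema updated; exiting with code 150 to be restarted", flush=True)
    sys.exit(SCHEMA_UPDATE_EXIT_CODE)


def supervise(run_daemon, max_restarts=5):
    """Restart run_daemon only while it exits with code 150.

    Any other exit code (or a normal return, treated as 0) is handed
    back to the caller unchanged.
    """
    code = 0
    for _ in range(max_restarts + 1):
        try:
            run_daemon()
            code = 0
        except SystemExit as e:
            code = e.code if isinstance(e.code, int) else 1
        if code != SCHEMA_UPDATE_EXIT_CODE:
            return code
    return code
```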
Joshua Boniface
1f49bfa1b2
Fix name of schema element
2021-06-13 20:56:17 -04:00
Joshua Boniface
647bce2a22
Ensure we don't grab None data
2021-06-13 16:43:25 -04:00
Joshua Boniface
26b1f531e9
Fix bad variable interpolation
2021-06-13 14:37:23 -04:00
Joshua Boniface
be9f1e8636
Use more compatible is_alive in thread
2021-06-13 14:36:27 -04:00
Joshua Boniface
b694945010
Fix incorrect name bug
2021-06-10 01:11:14 -04:00
Joshua Boniface
058c2ceef3
Convert VXNetworkInstance to new ZK schema handler
2021-06-10 00:36:18 -04:00
Joshua Boniface
e7d60260a0
Fix typo in CephInstance path
2021-06-10 00:36:02 -04:00
Joshua Boniface
85aba7cc18
Convert VMInstance to new ZK schema handler
2021-06-09 23:15:08 -04:00
Joshua Boniface
7e42118e6f
Adjust lock schema in NodeInstance and VMInstance
Removes a superfluous lock and puts the sync_lock keys in more usable
places.
2021-06-09 22:51:00 -04:00
Joshua Boniface
2704badfbe
Convert VMConsole... to new ZK schema handler
2021-06-09 22:08:32 -04:00
Joshua Boniface
450bf6b153
Convert NodeInstance to new ZK schema handler
2021-06-09 22:07:32 -04:00
Joshua Boniface
b94fe88405
Convert fencing to new ZK schema handler
2021-06-09 21:29:01 -04:00
Joshua Boniface
610f6e8f2c
Convert CephInstance to new ZK schema handler
2021-06-09 21:17:09 -04:00
Joshua Boniface
f913f42a6d
Replace schema paths with updated zkhandler
2021-06-09 20:29:42 -04:00
Joshua Boniface
e475552391
Fix some bugs with hot reload
2021-06-09 00:03:26 -04:00
Joshua Boniface
5540bdc86b
Add automatic schema upgrade to nodes
Performs an automatic schema upgrade when all nodes are updated to the
latest version.
Addresses #129
2021-06-08 23:35:39 -04:00
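One way to express the upgrade rule from these commits, assuming each node publishes the newest schema version its installed software supports (the mapping layout and the `pick_schema_upgrade` name are hypothetical): the cluster only moves forward once every node can handle the target version.

```python
def pick_schema_upgrade(current, supported_by_node):
    """Return the schema version to upgrade to, or None.

    current: schema version the cluster is using now.
    supported_by_node: node name -> newest schema version that node's
    installed software understands.
    """
    if not supported_by_node:
        return None
    # The slowest (least updated) node decides how far we can go.
    target = min(supported_by_node.values())
    return target if target > current else None
```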
Joshua Boniface
3c102b3769
Add per-node schema tracking
This will allow nodes to start with their own schema versions, and then
be updated simultaneously by the API.
References #129
2021-06-08 23:35:39 -04:00
Joshua Boniface
a4aaf89681
Add ZKSchema loading and validation to Daemon
Also removes some previous hack migrations from pre-0.9.19.
Addresses #129
2021-06-08 23:35:39 -04:00
Joshua Boniface
5843d8aff4
Fix fence call to findTargetNode
2021-06-08 23:34:49 -04:00
Joshua Boniface
cf96bb009f
Bump version to 0.9.19
2021-06-06 01:47:41 -04:00
Joshua Boniface
719954b70b
Fix missing list comma
2021-06-06 01:39:43 -04:00
Joshua Boniface
7dea5d2fac
Move logger to common, fix buffering
2021-06-01 18:50:26 -04:00
Joshua Boniface
3a5226b893
Add missing flushed output
2021-06-01 18:30:18 -04:00
Joshua Boniface
de2ff2e01b
Fix removed function args
2021-06-01 17:02:36 -04:00
Joshua Boniface
cd75413667
Increase initial lock timer
With the new library the reader seems to be a little too quick, so hold
the write lock for 1 second instead of 1/2 second to ensure it is
caught.
2021-06-01 17:00:11 -04:00
Joshua Boniface
9764090d6d
Merge node common with daemon common
2021-06-01 12:22:11 -04:00
Joshua Boniface
12ac3686de
Convert missed elements to new zkhandler
2021-06-01 11:57:21 -04:00
Joshua Boniface
5740d0f2d5
Remove obsolete zkhandler.py
2021-06-01 11:55:44 -04:00
Joshua Boniface
889f4cdf47
Convert common to new zkhandler
2021-06-01 11:55:32 -04:00
Joshua Boniface
8f66a8d00e
Fix missed zkhandler conversion
2021-06-01 11:53:33 -04:00
Joshua Boniface
6beea0693c
Convert fencing to new zkhandler
2021-06-01 11:53:21 -04:00
Joshua Boniface
1c9a7a6479
Convert VXNetworkInstance to new zkhandler
2021-06-01 11:49:39 -04:00
Joshua Boniface
790098f181
Convert VMInstance to new zkhandler
2021-06-01 11:46:27 -04:00
Joshua Boniface
8a4a41e092
Convert NodeInstance to new zkhandler
2021-06-01 11:27:35 -04:00
Joshua Boniface
a48bf2d71e
More gracefully handle none selectors
Allow selection of "none" as the node selector, and handle this by
always using the cluster default instead of writing it in.
2021-06-01 11:13:13 -04:00
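A small sketch of the "none" handling described above; `effective_selector` is a hypothetical helper, and the selector values in the test are only examples.

```python
def effective_selector(vm_selector, cluster_default):
    """Resolve the node selector actually used for a VM.

    "none" (or a missing value) is never written into the VM's metadata;
    it is resolved against the cluster default every time, so changing
    the default later also applies to these VMs.
    """
    if vm_selector is None or vm_selector.lower() == "none":
        return cluster_default
    return vm_selector
```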
Joshua Boniface
a0b9087167
Set Daemon migration selector in zookeeper
2021-06-01 10:52:41 -04:00
Joshua Boniface
33a54cf7f2
Move configuration keys to /config tree
2021-06-01 10:48:55 -04:00
Joshua Boniface
d6a8cf9780
Convert MetadataAPIInstance to new zkhandler
2021-05-31 19:55:09 -04:00
Joshua Boniface
abd619a3c1
Convert DNSAggregatorInstance to new zkhandler
2021-05-31 19:55:01 -04:00
Joshua Boniface
ef5fe78125
Convert CephInstance to new zkhandler
2021-05-31 19:51:27 -04:00
Joshua Boniface
f6d0e89568
Properly add absent node type
2021-05-31 19:26:27 -04:00
Joshua Boniface
ede3e88cd7
Modify node daemon root to use updated zkhandler
2021-05-31 03:14:09 -04:00
Joshua Boniface
0c75a127b2
Bump version to 0.9.18
2021-05-23 17:23:10 -04:00
Joshua Boniface
9de14c46fb
Bump version to 0.9.17
2021-05-19 17:06:29 -04:00
Joshua Boniface
fe15bdb854
Bump version to 0.9.16
2021-05-10 01:13:21 -04:00
Joshua Boniface
b851a6209c
Catch all other exceptions in subprocess run
Found a rare glitch where the subprocess pipes would not engage, causing
a daemon crash. Catch these exceptions with a retcode of 255 instead of
bailing out.
Closes #124
2021-05-10 01:07:25 -04:00
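A sketch of a command runner that never raises, assuming a `(retcode, stdout, stderr)` return convention; the function body is illustrative rather than the daemon's actual code.

```python
import shlex
import subprocess


def run_os_command(command, timeout=None):
    """Run a command and return (retcode, stdout, stderr) without raising.

    Any failure to launch or communicate with the child (missing binary,
    broken pipes, timeouts, ...) is reported as retcode 255 instead of
    propagating up and crashing the daemon.
    """
    args = shlex.split(command) if isinstance(command, str) else list(command)
    try:
        result = subprocess.run(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
        retcode = result.returncode
        stdout = result.stdout.decode("utf-8", errors="replace")
        stderr = result.stderr.decode("utf-8", errors="replace")
    except Exception as e:
        retcode, stdout, stderr = 255, "", str(e)
    return retcode, stdout, stderr
```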
Joshua Boniface
5ceb57e540
Handle emptying corrupted console log files
Libvirt will sometimes write junk out to console log files, which breaks
the log parser deque with a UnicodeDecodeError.
If this happens, clear the log and re-open the deque for newer
updates.
Closes #123
2021-05-10 01:03:04 -04:00
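A minimal sketch of the recovery described above, assuming the log is read as UTF-8 text into a bounded `collections.deque`; `read_console_tail` is a hypothetical name.

```python
from collections import deque


def read_console_tail(path, maxlen=1000):
    """Return the last `maxlen` lines of a console log as a deque.

    If the file contains bytes that are not valid UTF-8, decoding raises
    UnicodeDecodeError; in that case truncate the file and start over with
    an empty deque so that newer output can still be collected.
    """
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return deque(fh, maxlen=maxlen)
    except UnicodeDecodeError:
        # Empty the corrupted log, then re-open it for future updates.
        open(path, "w").close()
        with open(path, "r", encoding="utf-8") as fh:
            return deque(fh, maxlen=maxlen)
```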
Joshua Boniface
669338c22b
Bump version to 0.9.15
2021-04-08 13:37:47 -04:00
Joshua Boniface
c4ac75b973
Bump version to 0.9.14
2021-03-30 10:27:37 -04:00
Joshua Boniface
0bf276fd51
Update copyright year in headers
2021-03-25 17:01:55 -04:00
Joshua Boniface
f4ec161aa2
Update file copyright header.
Remove the option to select a later version of the GPL.
2021-03-25 16:58:02 -04:00
Joshua Boniface
0ccfc41398
Bump version to 0.9.13
2021-02-17 11:37:59 -05:00
Joshua Boniface
9100c63e99
Add stored_bytes to pool stats information
2021-02-09 01:46:01 -05:00
Joshua Boniface
aba567d6c9
Add nice startup banners to both daemons
Add nicer easy-to-find (yay ASCII art) banners for the startup printouts
of both the node and API daemons. Also adds the safe loader to pvcnoded
to prevent nuisance warning messages, and adds a version string to the
API daemon file.
2021-02-08 02:51:43 -05:00
Joshua Boniface
0db8fd9da6
Bump version to 0.9.12
2021-01-28 16:29:58 -05:00
Joshua Boniface
9fbe35fd24
Bump version to 0.9.11
2021-01-05 15:58:26 -05:00
Joshua Boniface
a24724d9f0
Use external ceph cmd for ceph df
2020-12-26 14:04:21 -05:00
Joshua Boniface
78c017d51d
Remove erroneous extra colon in log output
2020-12-20 16:06:35 -05:00
Joshua Boniface
1b6613c280
Add live VNC information to domain output
Sets information about the live VNC listen address and port for
VNC-enabled VMs in the node daemon, returns it via the API, and shows it
in the CLI.
Closes #115
2020-12-20 16:00:55 -05:00
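A sketch of pulling the VNC listen address and port out of a libvirt domain XML with the standard library; `vnc_info` is a hypothetical helper, not the node daemon's actual code. For autoport consoles the real port only appears in the live XML of a running domain (inactive XML typically shows `port='-1'`).

```python
import xml.etree.ElementTree as ET


def vnc_info(domain_xml):
    """Return (listen, port) for a VM's VNC console, or None if absent.

    Reads the <graphics type='vnc' .../> element under <devices>.
    Values are returned as strings, exactly as they appear in the XML.
    """
    root = ET.fromstring(domain_xml)
    gfx = root.find("./devices/graphics[@type='vnc']")
    if gfx is None:
        return None
    return gfx.get("listen"), gfx.get("port")
```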
Joshua Boniface
d6ef722997
Fix bad log message
2020-12-15 10:51:52 -05:00
Joshua Boniface
518d699c15
Bump version to 0.9.10
2020-12-15 10:45:15 -05:00
Joshua Boniface
ac3ef3d792
Revamp fencing order
Prevents unnecessarily long delays when IPMI connections time out.
Previously, 3 commands had to time out at ~20s each before the failure
was registered; now, if the first command times out, the failure is
registered immediately.
2020-12-15 02:48:25 -05:00
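The fail-fast ordering can be sketched as follows, assuming a runner that returns `None` on timeout and a numeric return code otherwise; the step names and `run_fence_sequence` are hypothetical. With a ~20s timeout per command, the worst case drops from roughly 60s to roughly 20s.

```python
def run_fence_sequence(steps, run):
    """Run fencing commands in order, stopping at the first failure.

    steps: ordered list of commands (e.g. status, power off, power on).
    run: callable(cmd) -> retcode, or None if the command timed out.
    Returns (success, number_of_commands_attempted).
    """
    attempted = 0
    for cmd in steps:
        attempted += 1
        retcode = run(cmd)
        # None (timeout) or non-zero: the BMC is unreachable or refused,
        # so further commands would most likely time out too. Stop now.
        if retcode != 0:
            return False, attempted
    return True, attempted
```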
Joshua Boniface
3705daff43
Better handle failing RBD lock frees
...
If the VM is not in a stop state, failing to free the lock is now
considered a fatal error and will put the domain into fail state,
aborting the start. This is better than being unsafe or trying to start
a VM which will fail to boot due to read-only volumes.
2020-12-14 16:04:38 -05:00
Joshua Boniface
7c99a7bda7
Safely reset RBD locks on failed VMs
Should correct issues on cold start as well as if a VM crashes
uncleanly, which would prevent the VM from starting due to stale RBD
locks.
This implementation has four parts:
1. Update how IP addresses are handled, specifically by replacing all
previous instances of "vni_ipaddr" with "vni_floatingipaddr", and then
adding a new "vni_ipaddr" holding this node's real IPs. Also include the
storage IPs, which were previously missing, so that each this_node
holds both its local IPs and the floating IPs. This enables the next two
steps.
2. Modify flush_locks to take this_node as an argument, and update the
run_command function to only operate against this node, rather than on
the primary coordinator.
3. Have flush_locks check each lock against the current node, to
verify that the lock is actually held by the current node. This is the
only way to do this safely. During fencing, we skip this check by not
passing a this_node.
4. Have the VM start routine check for VM failure/startup and execute
flush_locks before actually starting the VM.
2020-12-14 15:53:18 -05:00
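The lock filter from step 3 can be sketched like this, assuming lock entries shaped roughly like the JSON output of `rbd lock list --format json` (a list of objects with `id`, `locker` and `address`) and IPv4 client addresses of the form `IP:PORT/NONCE`. The exact shape varies between Ceph releases, and `locks_to_flush` is a hypothetical name.

```python
def locks_to_flush(locks, this_node_ips=None):
    """Select which RBD locks are safe to free.

    locks: list of dicts with at least an "address" key.
    this_node_ips: set of local plus floating IPs of the flushing node.
    When None (the fencing case), every lock is returned and the
    ownership check is bypassed.
    """
    if this_node_ips is None:
        return list(locks)
    selected = []
    for lock in locks:
        # "10.0.0.1:0/123456" -> "10.0.0.1" (IPv4 only in this sketch).
        ip = lock["address"].split(":")[0]
        if ip in this_node_ips:
            selected.append(lock)
    return selected
```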
Joshua Boniface
89c7e225a0
Move OSD stats uploading to primary only
Instead of each node uploading its own OSD stats, which would not work
if the PVC daemon wasn't running, have the primary upload stats
for all OSDs in the cluster.
2020-12-09 02:46:09 -05:00
Joshua Boniface
b36ec43a2d
Bump version to 0.9.9
2020-12-09 02:20:20 -05:00
Joshua Boniface
ce5ee11841
Bump version to 0.9.8
2020-11-24 12:26:57 -05:00
Joshua Boniface
d4a28d7a58
Bump version to 0.9.7
2020-11-19 10:48:28 -05:00
Joshua Boniface
e69eb93cb3
Bump version to 0.9.6
2020-11-17 13:01:54 -05:00
Joshua Boniface
70dfcd434f
Ensure inmigrate is cleared on failure
2020-11-17 12:57:37 -05:00
Joshua Boniface
a4e5323e81
Bump version to 0.9.5
2020-11-17 12:34:04 -05:00
Joshua Boniface
9053edacd8
Bump version to 0.9.4
2020-11-10 15:33:50 -05:00
Joshua Boniface
baac8f24fd
Bump version to 0.9.3
2020-11-09 10:28:15 -05:00
Joshua Boniface
11702f4bc8
Bump version to 0.9.2
2020-11-08 02:03:29 -05:00
Joshua Boniface
6f66b77a00
Lint: E121/E126 continuation line under/over-indented for hanging indent
2020-11-07 15:06:21 -05:00
Joshua Boniface
9135c5e3e4
Lint: E241 multiple spaces after ','
2020-11-07 14:52:39 -05:00
Joshua Boniface
260b39ebf2
Lint: E302 expected 2 blank lines, found X
2020-11-07 14:45:24 -05:00
Joshua Boniface
ab0b932fe3
Lint: E125 continuation line with same indent as next logical line
2020-11-07 13:49:54 -05:00
Joshua Boniface
f5988ad53d
Lint: F821 undefined name 'pool'/'volume'
This class is actually entirely unused but is kept for consistency with
the others. It may be used someday for something.
2020-11-07 13:34:18 -05:00
Joshua Boniface
c3dfe2e381
Lint: F821 undefined name 'myshorthostname'
2020-11-07 13:31:19 -05:00
Joshua Boniface
961ebb4c01
Lint: E305 expected 2 blank lines after class or function definition, found X
2020-11-07 13:17:49 -05:00
Joshua Boniface
e553c5d42a
Lint: E122 continuation line missing indentation or outdented
2020-11-07 13:12:26 -05:00
Joshua Boniface
7932be3948
Lint: E261 at least two spaces before inline comment
2020-11-07 13:11:03 -05:00
Joshua Boniface
d2490419c5
Lint: E202 whitespace before ']'
2020-11-07 13:02:54 -05:00
Joshua Boniface
d2e5ede399
Lint: E202 whitespace before ')'
2020-11-07 12:58:54 -05:00
Joshua Boniface
3f242cd437
Lint: E202 whitespace before '}'
2020-11-07 12:57:42 -05:00
Joshua Boniface
b7daa8e1f6
Lint: E201 whitespace after '['
2020-11-07 12:39:59 -05:00
Joshua Boniface
c88965e898
Lint: E201 whitespace after '('
2020-11-07 12:39:27 -05:00
Joshua Boniface
e333f2b935
Lint: E201 whitespace after '{'
2020-11-07 12:38:31 -05:00
Joshua Boniface
3cb92fed75
Lint: E401 multiple imports on one line
2020-11-07 12:29:32 -05:00
Joshua Boniface
27c6ac2b66
Lint: W605 invalid escape sequence '\d'
This is the only case where making the string a raw string (`r` prefix)
was required; the remaining W605 instances were replaced with character
class enclosures.
2020-11-07 12:22:20 -05:00
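The two W605 fixes mentioned above look like this on illustrative patterns (not taken from the codebase). A backslash escape such as `'\d'` in a normal string literal is invalid and triggers a DeprecationWarning (a SyntaxWarning since Python 3.12); either make the literal raw, or avoid the backslash by enclosing the character in a class.

```python
import re

# Fix 1: raw string, so the regex engine (not Python) sees the backslashes.
VERSION_RAW = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")

# Fix 2: character class enclosure, '\.' rewritten as '[.]' (no backslash).
DOT_CLASS = re.compile("[.]")
```

Note that `[0-9]` is not an exact substitute for `\d`: in Python 3 string patterns `\d` also matches non-ASCII decimal digits, which may be one reason the raw string was kept where `\d` was used.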
Joshua Boniface
8ba267a59e
Lint: E211 whitespace before '['/'('
2020-11-07 12:20:01 -05:00
Joshua Boniface
39cc992e9b
Lint: E306 expected 1 blank line before a nested definition, found 0
2020-11-07 12:17:38 -05:00
Joshua Boniface
8c623023d5
Lint: F811 redefinition of unused '<function>'
2020-11-07 12:14:29 -05:00
Joshua Boniface
5b3ee363b2
Lint: E222 multiple spaces after operator
2020-11-07 12:10:24 -05:00
Joshua Boniface
fad27a7f4d
Lint: E131 continuation line unaligned for hanging indent
2020-11-06 22:29:49 -05:00
Joshua Boniface
2eef6a1c21
Lint: E265 block comment should start with '# '
2020-11-06 21:32:17 -05:00
Joshua Boniface
4b47a2424c
Lint: E303 too many blank lines (2)
2020-11-06 21:16:52 -05:00
Joshua Boniface
cb2defbde9
Lint: W391 blank line at end of file
2020-11-06 21:14:19 -05:00
Joshua Boniface
5da314902f
Lint: F841 local variable '<variable>' is assigned to but never used
2020-11-06 21:13:13 -05:00
Joshua Boniface
98a573bbc7
Lint: E402 module level import not at top of file
2020-11-06 20:40:32 -05:00
Joshua Boniface
aecb845d6a
Lint: E713 test for membership should be 'not in'
2020-11-06 20:37:52 -05:00
Joshua Boniface
fde8ea2fea
Lint: W291 trailing whitespace
2020-11-06 19:44:14 -05:00
Joshua Boniface
57c51d3234
Lint: E711 comparison to None should be 'if cond is not None:'
2020-11-06 19:37:13 -05:00
Joshua Boniface
ce01b41d81
Lint: E711 comparison to None should be 'if cond is None:'
2020-11-06 19:36:36 -05:00
Joshua Boniface
4d6f36aca0
Lint: E712 comparison to False should be 'if cond is False:' or 'if not cond:'
2020-11-06 19:35:51 -05:00
Joshua Boniface
fb4aafcea9
Lint: E111 indentation is not a multiple of four
2020-11-06 19:26:22 -05:00
Joshua Boniface
d9e7b7ec15
Lint: F401 <library> imported but unused
2020-11-06 19:22:49 -05:00
Joshua Boniface
ebf254f62d
Lint: W293 blank line contains whitespace
2020-11-06 19:11:07 -05:00
Joshua Boniface
63f4f9aed7
Lint: E722 do not use bare 'except'
2020-11-06 18:55:10 -05:00
Joshua Boniface
56ba7b1457
Bump version to 0.9.1
2020-10-29 12:16:38 -04:00
Joshua Boniface
ec0b8acf90
Support per-VM migration type selectors
Allow a VM to specify its migration type as a default choice. The valid
options are "default" (i.e. behave as now), "live" which forces a live
migration only, and "shutdown" which forces a shutdown migration only.
The new option is treated as a VM meta option and is set to default if
not found.
2020-10-29 12:01:29 -04:00
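A sketch of how the per-VM setting could map to migration strategies. It assumes that "default" means "try a live migration first, then fall back to a shutdown migration"; that is an assumption about the pre-existing behaviour, and `migration_plan` is a hypothetical name.

```python
def migration_plan(migration_method):
    """Return the ordered list of migration strategies to try for a VM.

    "live": live migration only; "shutdown": shutdown migration only;
    "default", a missing value or anything unrecognised: live first,
    then shutdown (mirroring "set to default if not found").
    """
    plans = {
        "live": ["live"],
        "shutdown": ["shutdown"],
        "default": ["live", "shutdown"],
    }
    return list(plans.get(migration_method or "default", plans["default"]))
```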
Joshua Boniface
5d08ad9573
Fix incorrect keepalive interval setting
2020-10-26 11:44:45 -04:00
Joshua Boniface
0f299777f1
Modify version to 3-digit numbering
I expect 0.9 will be fairly long-lived, so add another decimal place so
I may continue adding tweaks to it.
THIS IS NOT SEMVER.
2020-10-26 02:13:11 -04:00
Joshua Boniface
890023cbfc
Make sender wait dynamic based on receiver
2020-10-21 14:43:54 -04:00
Joshua Boniface
28abb018e3
Improve some timeouts and conditionals
2020-10-21 12:00:10 -04:00
Joshua Boniface
017953c2e6
Move lock release to phase D
2020-10-21 11:07:01 -04:00
Joshua Boniface
82b4d3ed1b
Add missing prefix statements to loggers
2020-10-21 10:52:53 -04:00
Joshua Boniface
bae366a316
Add waits and only receive check on send
2020-10-21 10:43:42 -04:00
Joshua Boniface
351076c15e
Check if node changed during final check
Avoids situations where two migrates, to different nodes, happen in
rapid succession. Aborts the migration if the current target node no
longer matches what was set at the start of the execution.
2020-10-21 02:52:36 -04:00
Joshua Boniface
42514b9a50
Improve messages further
2020-10-21 02:41:42 -04:00
Joshua Boniface
611e47f338
Add messages to migration aborts
Results in some information duplication, but ensures logging of the
reason a migration was aborted separate from the error(s) this may
generate.
2020-10-21 02:38:42 -04:00
Joshua Boniface
1523959074
Move where setting last_ vars happens
2020-10-21 02:24:00 -04:00
Joshua Boniface
ef762359f4
Adjust timing to avoid migrating to self quickly
Add another separate state lock, release it earlier, and ensure timings
are good to avoid double-migrating one VM.
2020-10-21 02:17:55 -04:00
Joshua Boniface
398d33778f
Avoid stopping duplicates, just lock our own key
2020-10-20 16:10:39 -04:00
Joshua Boniface
a6d492ed9f
Remove spurious writes and adjust sleep
2020-10-20 16:04:26 -04:00
Joshua Boniface
11fa3b0df3
Remove additional wait and add last_node entries
These allow an aborted migration to restore the previous settings,
overriding what the client set.
2020-10-20 15:58:55 -04:00
Joshua Boniface
442aa4e420
Tweak timers further
2020-10-20 15:43:59 -04:00
Joshua Boniface
3910843660
Add missing break
2020-10-20 15:39:29 -04:00
Joshua Boniface
70f3fdbfb9
Tweak the delays slightly on receive
2020-10-20 15:38:07 -04:00
Joshua Boniface
7cb0241a12
Attempt live migrates 3 times before proceeding
2020-10-20 15:33:41 -04:00
Joshua Boniface
9fb33ed7a7
Increase peer lock acquiring timers
2020-10-20 15:26:59 -04:00
Joshua Boniface
abfe0108ab
Better handle aborting migrations
2020-10-20 15:22:16 -04:00
Joshua Boniface
567fe8f36b
Wait for existing migrations before proceeding
2020-10-20 15:12:32 -04:00
Joshua Boniface
ec7b78b9b8
Add additional short sleep in receive
2020-10-20 13:29:17 -04:00
Joshua Boniface
224c8082ef
Alter text of synchronization messages
2020-10-20 13:08:18 -04:00
Joshua Boniface
f9e7e9884f
Improve handling of VM migrations
The VM migration code was very old, very spaghettified, and prone to
strange failures.
Improve this by taking cues from the node primary migration. Use
synchronization between the nodes to ensure lockstep completion of the
migration in discrete steps.
A proper queue can be built later to integrate with this code more
cleanly.
References #108
2020-10-20 13:01:55 -04:00
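The lockstep idea can be illustrated in a single process with `threading.Barrier`: neither side starts phase N+1 until both have finished phase N. This is only an analogy; the real synchronization happens between nodes (via shared locks), and the phase names and `run_in_lockstep` are hypothetical.

```python
import threading

# Hypothetical phase names; each phase is one discrete migration step.
PHASES = ["A", "B", "C", "D"]


def run_in_lockstep(actions_by_role, timeout=30):
    """Run each role's per-phase actions in its own thread, in lockstep.

    actions_by_role maps a role (e.g. "sender", "receiver") to a dict of
    phase name -> callable. Returns the (phase, role) completion log.
    The barrier timeout turns a stuck peer into BrokenBarrierError (an
    "abort") instead of an endless hang.
    """
    barrier = threading.Barrier(len(actions_by_role), timeout=timeout)
    log = []
    log_lock = threading.Lock()

    def worker(role, actions):
        for phase in PHASES:
            actions[phase]()
            with log_lock:
                log.append((phase, role))
            # Block until the peer has also completed this phase.
            barrier.wait()

    threads = [
        threading.Thread(target=worker, args=(role, actions))
        for role, actions in actions_by_role.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```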
Joshua Boniface
726501f4d4
Add additional logging to flush selector
...
Adds additional debug logging to the flush selector to determine how and
why any given node is selected. Useful for troubleshooting strange
choices.
2020-10-20 12:34:18 -04:00
Joshua Boniface
c6e34c7dc6
Bump base version to 0.9
2020-10-18 14:31:19 -04:00
Joshua Boniface
f749633f7c
Use provisioned memory for mem migration selector
Use the new "provisioned" memory field, instead of the "allocated"
memory field, to determine the optimal node when using the "mem"
migration selector. This will take into account non-running VMs in the
calculation as well as running VMs.
2020-10-18 14:17:15 -04:00
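A sketch of the "mem" selector after this change, assuming "optimal" means the node with the most memory left once provisioned memory is subtracted. The field names follow the commit text; the dictionary layout and `select_node_by_mem` are hypothetical.

```python
def select_node_by_mem(nodes):
    """Return the node with the most memory not yet provisioned to VMs.

    nodes maps node name -> {"total": MiB, "provisioned": MiB, ...}.
    "provisioned" counts every VM on the node, running or not, so stopped
    VMs that may start later are no longer ignored.
    """
    return max(
        nodes, key=lambda name: nodes[name]["total"] - nodes[name]["provisioned"]
    )
```

In the test below, ranking by "allocated" would pick hv1 (fewer running VMs), while ranking by "provisioned" picks hv2, because hv1 already has a lot of memory committed to stopped VMs.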
Joshua Boniface
a4b80be5ed
Add provisioned memory to node info
Adds a separate field to the node memory, "provisioned", which totals
the amount of memory provisioned to all VMs on the node, regardless of
state, and in contrast to "allocated" which only counts running VMs.
Allows for the detection of potential overprovisioned states when
factoring in non-running VMs.
Includes the supporting code to get this data, since the original
implementation of VM memory selection depended on the VM running and
on getting this from libvirt. Now, if the VM is not active, it
gets this from the domain XML instead.
2020-10-18 14:17:15 -04:00
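A sketch of the provisioned/allocated split, reading each VM's configured memory from the `<memory>` element of its domain XML. The unit table covers only common libvirt units, "start" is assumed to be the running state, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Conversion factors to KiB for the units handled in this sketch.
_UNIT_TO_KIB = {"KiB": 1, "MiB": 1024, "GiB": 1024 * 1024}


def domain_memory_mib(domain_xml):
    """Memory configured for a domain, in MiB, from its <memory> element."""
    mem = ET.fromstring(domain_xml).find("memory")
    unit = mem.get("unit", "KiB")  # libvirt's default unit is KiB
    return int(mem.text) * _UNIT_TO_KIB[unit] // 1024


def node_memory_totals(vms):
    """vms: list of (state, domain_xml). Returns (provisioned, allocated)."""
    provisioned = allocated = 0
    for state, xml in vms:
        mib = domain_memory_mib(xml)
        provisioned += mib  # every VM, whatever its state
        if state == "start":
            allocated += mib  # running VMs only
    return provisioned, allocated
```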
Joshua Boniface
aa5f8c93fd
Entirely disable IPv6 on bridged interfaces
Prevents any potential leakage due to autoconfigured IPv6 on bridged
interfaces. These are exclusively VM-side bridges, and the PVC host
should not have any IPv6 configuration on them, ever.
2020-10-15 11:00:59 -04:00
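Per-interface IPv6 can be switched off through the `disable_ipv6` sysctl, which is the same knob as `sysctl net.ipv6.conf.<iface>.disable_ipv6=1`. The sketch writes the `/proc/sys` file directly. That needs root on a real system, and it also sidesteps the dotted sysctl notation for interface names that contain dots. `disable_ipv6` and its `proc_root` parameter are hypothetical; `proc_root` exists only for testing.

```python
import os


def disable_ipv6(iface, proc_root="/proc/sys"):
    """Turn off IPv6 on one interface via its sysctl file.

    Writing "1" to net/ipv6/conf/<iface>/disable_ipv6 removes the
    interface's IPv6 addresses and stops autoconfiguration on it.
    Returns the path that was written.
    """
    path = os.path.join(proc_root, "net", "ipv6", "conf", iface, "disable_ipv6")
    with open(path, "w") as fh:
        fh.write("1\n")
    return path
```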
Joshua Boniface
9366977fe6
Copy d_domain before iterating
Prevents a bug where the thread can crash due to a change in the
d_domain object while running the for loop. By copying and iterating
over the copy, this becomes safer.
2020-09-16 15:12:37 -04:00
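The failure mode here is the familiar `RuntimeError: dictionary changed size during iteration`. A minimal illustration follows, in which a callback that removes an entry mid-loop stands in for another thread (e.g. a watch callback) changing `d_domain`. The `collect_stats` helper is hypothetical.

```python
def collect_stats(d_domain, gather):
    """Gather stats for every domain, iterating over a snapshot.

    Iterating d_domain.copy() means entries added or removed from the
    original dict during the loop cannot break the iteration.
    """
    results = {}
    for uuid, domain in d_domain.copy().items():
        results[uuid] = gather(domain)
    return results
```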
Joshua Boniface
65b44f2955
Avoid breaking keepalive during incoming migration
The keepalive was getting stuck gathering memoryStats from the
non-running VM, since it was in a paused state. Avoid this by just
skipping past the rest of the stats gathering if the VM isn't running.
2020-08-28 01:47:36 -04:00
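A sketch of the guard using the libvirt-python domain API: `virDomain.state()` returns `[state, reason]`, and `VIR_DOMAIN_RUNNING` is 1. The constant is inlined so the example runs without libvirt installed, and `gather_vm_stats` is a hypothetical name.

```python
VIR_DOMAIN_RUNNING = 1  # same value as libvirt.VIR_DOMAIN_RUNNING


def gather_vm_stats(dom):
    """Return memory stats for a running domain, or None otherwise.

    A domain receiving an incoming migration sits paused on the target;
    querying memoryStats() there could stall the keepalive, so skip it.
    """
    state, _reason = dom.state()
    if state != VIR_DOMAIN_RUNNING:
        return None
    return {"memory": dom.memoryStats()}
```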
Joshua Boniface
78dec77987
Bump version to 0.8
2020-08-26 10:24:44 -04:00
Joshua Boniface
921e57ca78
Fix syntax error
2020-08-20 23:05:56 -04:00