Joshua Boniface
ad284b13bc
Fix bugs with fencing
2019-07-09 19:17:53 -04:00
Joshua Boniface
7df200ac44
Improve ZK connection loss handling
2019-07-09 19:17:32 -04:00
Joshua Boniface
47f86475f8
Handle failures of Ceph commands gracefully
...
If these commands fail, catch the error, print a message, and set up
empty lists. Also handle later data parsing in this case.
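A minimal sketch of the pattern described above, not the daemon's actual code: run a command, and on any failure print a message and fall back to an empty list so later parsing still works. The helper name `get_ceph_list` and its parameters are hypothetical.

```python
import json
import subprocess


def get_ceph_list(command, timeout=2):
    """Return the JSON list printed by `command`, or [] on any failure.

    Hypothetical helper for illustration only.
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        data = json.loads(result.stdout.decode("utf-8"))
    except (OSError, subprocess.SubprocessError, ValueError) as err:
        # OSError: command not found; SubprocessError: non-zero exit or timeout;
        # ValueError: output was not valid UTF-8 or not valid JSON.
        print("Command {} failed: {}".format(" ".join(command), err))
        return []
    # Later parsing expects a list, so anything else is treated as empty.
    return data if isinstance(data, list) else []
```

A caller can then iterate over the result unconditionally; a failed command simply yields nothing to process.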
2019-07-09 16:43:38 -04:00
Joshua Boniface
1a8e7509f7
Support run_os_command timeout; use timeouts
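As a rough illustration of a command runner with a timeout, assuming a list-style command and a `(retcode, stdout, stderr)` return value; the real `run_os_command` signature and its timeout return code are not shown in this log, so both are assumptions here:

```python
import subprocess


def run_os_command(command, timeout=None):
    """Run `command` (a list of arguments) and return (retcode, stdout, stderr).

    Assumed behaviour for illustration: on timeout the child is killed and
    a retcode of 128 is returned with empty output.
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,  # seconds; None means wait indefinitely
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child at this point.
        return 128, "", ""
    stdout = result.stdout.decode("utf-8", "replace")
    stderr = result.stderr.decode("utf-8", "replace")
    return result.returncode, stdout, stderr
```

Returning a distinct retcode rather than raising lets callers treat a hung command like any other failed command.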
2019-07-09 15:09:13 -04:00
Joshua Boniface
83a4140703
Allow enabling debug mode in config
...
Makes debugging easier without modifying code.
2019-07-09 14:59:00 -04:00
Joshua Boniface
8eeba9bc9b
Make Ceph commands time out if needed
2019-07-09 14:35:53 -04:00
Joshua Boniface
19701c66e4
Move fencing to after keepalive output
...
Just makes the messages a little easier to read when triggered.
2019-07-09 14:24:31 -04:00
Joshua Boniface
17dfaf43c5
Move hypervisor selection out to common
2019-07-09 14:20:58 -04:00
Joshua Boniface
b551b54642
Rename message when contending
2019-07-09 14:03:48 -04:00
Joshua Boniface
4249d5d982
Always load and store IPMI on daemon start
...
Without this, the IPMI information set during initial node creation can
never be changed, which can cause issues later. Instead, always set it
fresh on each node boot.
2019-07-09 14:00:31 -04:00
Joshua Boniface
7f828a27a5
Free RBD locks when fencing node
2019-07-09 10:59:31 -04:00
Joshua Boniface
bc54ea2449
Log message when starting or stopping API client
2019-07-08 19:29:49 -04:00
Joshua Boniface
cda690e94f
Set RADOS df information in ZK
2019-07-08 10:19:56 -04:00
Joshua Boniface
d9ebd04264
Fix missing dom_uuid values in data reads
2019-07-07 15:30:28 -04:00
Joshua Boniface
b82ccaa84d
Improve flush handling
...
Similar to recent client changes, don't replace the previous node record
of an already-migrated VM. Wait for shutdown if required. Use a
continue statement instead of a needless else block.
2019-07-07 15:27:37 -04:00
Joshua Boniface
0d398f663b
Rename "Domain" to "VM" in various class names
...
The name "Domain", though technically correct from a Libvirt
perspective, was unnecessarily confusing. Call the class instances what
they are, VMs.
2019-07-07 15:20:37 -04:00
Joshua Boniface
8216125b02
Enable autostart of API client on Primary
...
Adds a config flag that turns on the API client following the Primary
coordinator. The retcode of the start/stop commands is ignored so this
can fail gracefully if e.g. the client isn't installed.
2019-07-06 02:42:56 -04:00
Joshua Boniface
3e591bd09e
Remove extra whitespaces on blank lines
2019-06-25 22:33:23 -04:00
Joshua Boniface
08cb16bfbc
Revamp VM migration handling
...
This was very old code that was hard to follow and quite fragile, with
failures and infinite loops occurring fairly frequently. These changes
make the code more robust, including the addition of timeouts, some code
cleanup, and some improvements to the logical flow.
Also forces the libvirt migration to occur on the cluster network, which
couples to changes in the libvirtd listen (via pvc-ansible) and in
Daemon.py via the previous commit.
2019-06-25 22:23:48 -04:00
Joshua Boniface
d336fce253
Connect to actual IP not localhost for Libvirt
2019-06-25 22:09:32 -04:00
Joshua Boniface
75d0e7f989
Revert "Only perform fencing duties on primary"
...
This reverts commit 464c69aac6.
Actually, yea, this made sense - if the primary fails, it can't
fence itself.
2019-06-25 12:36:48 -04:00
Joshua Boniface
85a5a8a0c9
Disable tx offloading on bridge interfaces
...
Reference: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=717215#68
Without this, DHCP fails when traversing only the local bridge, for
Debian Jessie or earlier (and possibly other OSes as well), due to the
missing UDP checksums. This disables the offload and hence reenables
the checksums even on the software-only bridge.
Also rearranged the steps and added comments around this section to
better clarify what each command is doing.
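For reference, TX checksum offload is typically disabled with `ethtool -K <interface> tx off`. The sketch below wraps that call; the function names and the interface name in the usage note are hypothetical, and the daemon's own invocation may differ.

```python
import subprocess


def tx_offload_off_command(bridge_name):
    """Build the ethtool argument list that turns TX offload off."""
    return ["ethtool", "-K", bridge_name, "tx", "off"]


def disable_tx_offload(bridge_name, timeout=5):
    """Disable TX offload on `bridge_name`; return True on success.

    Hypothetical helper: needs root and an existing interface to succeed.
    """
    try:
        subprocess.run(
            tx_offload_off_command(bridge_name),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # ethtool missing, interface missing, permission denied, or timeout.
        return False
    return True
```

For example, `disable_tx_offload("vmbr100")` would be called once the bridge exists and before guests start requesting DHCP leases over it.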
2019-06-25 12:36:37 -04:00
Joshua Boniface
464c69aac6
Only perform fencing duties on primary
...
There was really no need for this to be shared among all the
coordinators, which seemed more fragile. This way only the primary will
try to fence dead nodes.
2019-06-24 20:17:51 -04:00
Joshua Boniface
249611b161
Remove duplicate import
2019-06-24 20:14:43 -04:00
Joshua Boniface
ef272b0b7d
Add removal confirmations and zap disk before add
2019-06-21 15:52:28 -04:00
Joshua Boniface
867ad1fc1b
Support human-readable biconversion and in volumes
2019-06-21 09:23:52 -04:00
Joshua Boniface
ddedb1a992
Set image features to supported values
2019-06-19 15:19:36 -04:00
Joshua Boniface
0f15e7cda5
Set shutdown state after final keepalive
2019-06-19 14:52:47 -04:00
Joshua Boniface
0060c0313b
Put daemonstate to shutdown when stopping
...
This way it isn't "run" all the way until it shuts down.
2019-06-19 14:23:07 -04:00
Joshua Boniface
9a0554fdbe
Remove all volumes from pool on removal
...
Technically not needed, but otherwise random errors may be thrown,
so best to be explicit.
2019-06-19 12:49:03 -04:00
Joshua Boniface
87907d4ce8
Remove size field from volume objects
...
This data is just in the stats anyways.
2019-06-19 10:45:14 -04:00
Joshua Boniface
09562fdc06
Output in json format instead
2019-06-19 10:32:01 -04:00
Joshua Boniface
a940d03959
Fix some bugs and add RBD volume stats
2019-06-19 10:25:22 -04:00
Joshua Boniface
db0b382b3d
Don't bother with snapshot management by Daemon
...
This is *definitely* not needed in the end, and just uses RAM for
no conceivable purpose. Snapshots are fully client-managed.
2019-06-19 09:43:04 -04:00
Joshua Boniface
1c9f606480
Implement volume and snapshot handling by daemon
...
This seems like a super-gross way to do this, but at the moment
I don't have a better way. Maybe just remove this component since
none of the volume/snapshot stuff is dynamic; will see as this
progresses.
2019-06-19 09:40:32 -04:00
Joshua Boniface
784b428ed0
Add creation of volume and snapshot lists
2019-06-19 09:29:36 -04:00
Joshua Boniface
064e6455bc
Correct some more bugs
2019-06-19 00:29:21 -04:00
Joshua Boniface
a4ab3075ab
Correct some bugs around new code
2019-06-19 00:23:25 -04:00
Joshua Boniface
01959cb9e3
Implementation of RBD volumes and snapshots
...
Adds the ability to manage RBD volumes (add/remove) and RBD
snapshots (add/remove). (Working) list functions to come.
2019-06-19 00:12:44 -04:00
Joshua Boniface
2bbbda3da5
Only trigger pool updates on primary
2019-06-18 21:26:05 -04:00
Joshua Boniface
612f5ab52c
Strip pv_block from stdout
2019-06-18 20:34:25 -04:00
Joshua Boniface
1622226c32
Add more logging during OSD creation/deletion
2019-06-18 20:31:04 -04:00
Joshua Boniface
3adeef6fdd
Use the fsid to activate new OSDs
2019-06-18 20:22:28 -04:00
Joshua Boniface
443108f53d
Add support for enable/disable keepalive detail
2019-06-18 19:54:42 -04:00
Joshua Boniface
79f284a0a9
Pass logger into run_command
2019-06-18 13:45:59 -04:00
Joshua Boniface
080ca3201c
Correct actual problem with this_node
2019-06-18 13:43:54 -04:00
Joshua Boniface
d076f9f4eb
Use self.this_node everywhere
2019-06-18 13:25:16 -04:00
Joshua Boniface
aee078f3eb
Support disabling keepalive logging
2019-06-18 12:44:07 -04:00
Joshua Boniface
b0411e8e1a
Remove "error" message from Ceph commands
...
This triggers at every node start and isn't useful.
2019-06-18 12:41:38 -04:00
Joshua Boniface
8d9007f697
Remove OSD stat collection if count is zero
...
Otherwise, ceph osd df will hang indefinitely trying to get data
for the zero OSDs.
2019-06-18 12:36:53 -04:00