Joshua Boniface
0ff2d7d537
Use shlex for command splitting
...
This will preserve quoted strings, required for the rbd lock commands.
2019-08-07 14:02:57 -04:00
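A minimal sketch of why shlex matters for quoted arguments, assuming a command string of the kind the rbd lock commands need. The image name, lock ID, and locker below are hypothetical values for illustration, not taken from the code base.

```python
import shlex

# Hypothetical command string; the lock ID contains a space, so it is quoted.
command = 'rbd lock remove vms/disk0 "auto 12345" client.4567'

# Naive whitespace splitting breaks the quoted lock ID into two arguments
# and leaves the quote characters attached.
naive_args = command.split()

# shlex.split follows POSIX shell quoting rules, so the quoted lock ID
# stays a single argument with the quotes removed.
shlex_args = shlex.split(command)

print(naive_args)
print(shlex_args)
```

The resulting list can be passed directly to subprocess without a shell, which is why preserving the quoted string as one element matters.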
Joshua Boniface
a2a630f6a0
Add pipeline for VM lock flush cmd
2019-08-07 13:49:33 -04:00
Joshua Boniface
496216321e
Move lock flushing to VMInstance
...
Prepares for reuse of this function via client commands.
2019-08-07 13:36:56 -04:00
Joshua Boniface
0446b2db02
Catch exceptions if Patroni is not up
2019-08-07 11:46:58 -04:00
Joshua Boniface
7e77752ce5
Add limit to Patroni switchover attempts
2019-08-07 11:46:42 -04:00
Joshua Boniface
33a963c2af
Improve fence output on failure and increase delay
2019-08-07 11:35:49 -04:00
Joshua Boniface
e92a57606d
Use better forceful arping command
...
Send ARP responses containing the source IP to force an update even if the
old primary did not terminate cleanly (during fencing, for instance).
2019-08-07 11:29:38 -04:00
Joshua Boniface
ef3b6b3723
Arping 3 times instead of 2
...
During a fence, 2 is not always enough for the network to recognize the
change in primary coordinator.
2019-08-07 11:15:36 -04:00
Joshua Boniface
3b27a88128
Allow abort of shutdown state
...
Adds some logic to allow an active shutdown state to be aborted by
changing the VM to another state. Useful mostly if a VM is doing funky
things and not responding to the shutdown, but the administrator either
doesn't want to wait for the timer to expire (forcing an immediate
termination) or wishes to abort the shutdown attempt.
Fixes #49
2019-08-07 10:58:18 -04:00
Joshua Boniface
e2ae58b62c
Add the missing newline to the string compare
2019-08-04 17:00:33 -04:00
Joshua Boniface
d0d5ab4425
Fix bug if the switchover target is the same
2019-08-04 16:51:11 -04:00
Joshua Boniface
a329376d33
Lock primary_node key during primary switchover
...
Also implements a loop to switch over the Patroni leader, ensuring it
always follows the primary, and cleans up the surrounding code a bit.
2019-08-04 16:42:06 -04:00
Joshua Boniface
710d2cf9c2
Fix record duplication bug and general cleanup
...
Fixes #47
2019-08-01 13:11:45 -04:00
Joshua Boniface
8bdec03cf1
Properly support debug logging via config
2019-08-01 11:22:27 -04:00
Joshua Boniface
c6e58796ba
Clean up redundant return section
2019-07-31 23:57:31 -04:00
Joshua Boniface
7380f45b1b
Improve dnsmasq interface handling
...
listen-address is enough; adding interface too causes weird issues where
dnsmasq is listening on an IPv6 global wildcard too which conflicts with
the PowerDNS instance.
2019-07-31 10:03:56 -04:00
Joshua Boniface
324990739e
Make DNS aggregator listen on port 53
...
Using the non-standard port was a pain. Now that all the DNSMasq stuff
works, move back to the default port.
2019-07-30 09:20:01 -04:00
Joshua Boniface
717d00cfcf
Implement snapshot rename in node daemon
...
[4/2] Implements #44
2019-07-28 23:06:12 -04:00
Joshua Boniface
83b806d0b5
Move intervals config one level up
...
Makes for a slightly-better-organized configuration and explanation.
2019-07-28 19:33:23 -04:00
Joshua Boniface
68ca493b3b
Fix bad error code
2019-07-26 20:53:01 -04:00
Joshua Boniface
837666a15e
Revamp renamekey function
...
The function had numerous bugs and didn't work. Fix them up.
2019-07-26 16:38:05 -04:00
Joshua Boniface
35363671a0
Implement Ceph volume resize and rename
...
Includes a simple implementation of a zookeeper "rename" facility,
allowing a key and all data to be replaced by a new key with a different
name but containing all the same child elements and data.
[2/2] Implements #44
2019-07-26 15:13:21 -04:00
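The "rename" facility described above can be sketched as follows. This is an illustration only: a plain dict keyed by path stands in for the Zookeeper tree, and the function name rename_key and the example paths are hypothetical, not the daemon's actual implementation.

```python
def rename_key(store, old_key, new_key):
    """Move old_key and every child under it to new_key, keeping all data."""
    if old_key not in store:
        raise KeyError(old_key)
    if new_key in store:
        raise ValueError("target key already exists: " + new_key)

    prefix = old_key + "/"
    moved = {}
    # Iterate over a copy of the keys because entries are removed in the loop.
    for path in list(store):
        if path == old_key or path.startswith(prefix):
            # Swap the leading old_key for new_key, keep the child suffix.
            moved[new_key + path[len(old_key):]] = store.pop(path)
    store.update(moved)


# Hypothetical tree: one volume key with two children, plus an unrelated key.
tree = {
    "/volumes/old": "volume-data",
    "/volumes/old/size": "10G",
    "/volumes/old/stats": "{}",
    "/volumes/older": "unrelated",
}
rename_key(tree, "/volumes/old", "/volumes/new")
print(sorted(tree))
```

Matching on the key itself or on the key plus a trailing slash avoids touching siblings that merely share a name prefix, such as /volumes/older here.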
Joshua Boniface
50367c9190
Improve OSD create messages
2019-07-26 11:41:51 -04:00
Joshua Boniface
96bc181877
Set the routerstate on daemon startup
...
Allows switching from coordinator to not coordinator with a service
restart.
2019-07-12 09:51:56 -04:00
Joshua Boniface
2a220cd16e
Nicer colour output for coordinator state client
2019-07-12 09:31:42 -04:00
Joshua Boniface
439c5f18c3
Add router_state to output of keepalives
2019-07-11 20:11:05 -04:00
Joshua Boniface
f30be555c1
Improve message output for logging
...
Improve some formatting of the messages being printed to make it nicer
for long-term logging.
2019-07-10 22:38:32 -04:00
Joshua Boniface
ac36870a86
Implement hup for log rotation
...
This function was long-existent, but never used; implement it.
2019-07-10 22:22:02 -04:00
Joshua Boniface
58f4222ee7
Support disabling log colours and dates
...
For use cases such as pure syslog, allow disabling dates or colours
in the log messages (separately).
2019-07-10 22:17:23 -04:00
Joshua Boniface
32a6369de2
Add nicer message when live migrate fails
2019-07-10 17:42:24 -04:00
Joshua Boniface
8a28738bff
Use consistent terminology in fence message
2019-07-10 11:54:56 -04:00
Joshua Boniface
8f160abf90
Handle cancelling flushes when new ones run
...
Store the flush_thread of a node as a class object. Before starting a
new flush thread (either flush or unflush), stop the existing one if it
exists to prevent further migrations, then start the new thread. Set the
object to None on init and again once the task actually finishes. Remove
the inflush flag as this is not required when using these threads and
functionally does nothing any longer, but add the flush_stopper flag to
trigger cancellation of the current job.
2019-07-10 11:54:34 -04:00
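The cancellation pattern described above can be sketched roughly as below. Only the names flush_thread and flush_stopper come from the commit message; the Node class, the start method, the log list, and the sleep standing in for a migration are assumptions made for illustration, and the sketch assumes a single caller starts jobs.

```python
import threading
import time


class Node:
    def __init__(self):
        self.flush_thread = None    # no job running at init
        self.flush_stopper = False  # set True to cancel the current job
        self.log = []               # records (action, vm) per finished step

    def _run(self, action, vms):
        for vm in vms:
            if self.flush_stopper:
                break               # cancelled: stop further migrations
            time.sleep(0.01)        # stand-in for a real migration
            self.log.append((action, vm))
        self.flush_thread = None    # task finished (or was cancelled)

    def start(self, action, vms):
        # Stop the existing job, if any, before starting the new one.
        old = self.flush_thread
        if old is not None:
            self.flush_stopper = True
            old.join()
            self.flush_stopper = False
        worker = threading.Thread(target=self._run, args=(action, vms))
        self.flush_thread = worker
        worker.start()


node = Node()
node.start("flush", ["vm%d" % i for i in range(100)])
node.start("unflush", ["vm0", "vm1"])  # cancels the running flush first

# Wait for the second job to complete.
worker = node.flush_thread
if worker is not None:
    worker.join()
print(node.flush_thread, node.log[-2:])
```

Keeping a local reference to the old thread before joining avoids racing with the worker resetting flush_thread to None as it exits.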
Joshua Boniface
c7c8c8bcbb
Fix bug with flush
2019-07-10 00:43:55 -04:00
Joshua Boniface
7a8aee9fe7
Remove flush locking functionality
...
This just seemed like more trouble than it was worth. Flush locks were
originally intended as a way to counteract the weird issues around
flushing that were mostly fixed by the code refactoring, so this will
help test if those issues are truly gone. If not, will look into a
cleaner solution that doesn't result in unchangeable states.
2019-07-09 23:59:17 -04:00
Joshua Boniface
ad284b13bc
Fix bugs with fencing
2019-07-09 19:17:53 -04:00
Joshua Boniface
7df200ac44
Improve ZK connection loss handling
2019-07-09 19:17:32 -04:00
Joshua Boniface
47f86475f8
Handle failures of Ceph commands gracefully
...
If these commands fail, catch the error, print a message, and set up
empty lists. Also handle later data parsing in this case.
2019-07-09 16:43:38 -04:00
Joshua Boniface
1a8e7509f7
Support run_os_command timeout; use timeouts
2019-07-09 15:09:13 -04:00
Joshua Boniface
83a4140703
Allow enabling debug mode in config
...
Makes debugging easier without modifying code.
2019-07-09 14:59:00 -04:00
Joshua Boniface
8eeba9bc9b
Make Ceph commands time out if needed
2019-07-09 14:35:53 -04:00
Joshua Boniface
19701c66e4
Move fencing to after keepalive output
...
Just makes the messages a little easier to read when triggered.
2019-07-09 14:24:31 -04:00
Joshua Boniface
17dfaf43c5
Move hypervisor selection out to common
2019-07-09 14:20:58 -04:00
Joshua Boniface
b551b54642
Rename message when contending
2019-07-09 14:03:48 -04:00
Joshua Boniface
4249d5d982
Always load and store IPMI on daemon start
...
Without this, the IPMI information set during initial node creation can
never be changed, which can cause issues later. Instead, always set it
fresh on each node boot.
2019-07-09 14:00:31 -04:00
Joshua Boniface
7f828a27a5
Free RBD locks when fencing node
2019-07-09 10:59:31 -04:00
Joshua Boniface
bc54ea2449
Log message when starting or stopping API client
2019-07-08 19:29:49 -04:00
Joshua Boniface
cda690e94f
Set RADOS df information in ZK
2019-07-08 10:19:56 -04:00
Joshua Boniface
d9ebd04264
Fix missing dom_uuid values in data reads
2019-07-07 15:30:28 -04:00
Joshua Boniface
b82ccaa84d
Improve flush handling
...
Similar to recent client changes, don't replace the previous node record
of an already-migrated VM. Wait for shutdown if required. Use a
continue statement instead of a needless else block.
2019-07-07 15:27:37 -04:00
Joshua Boniface
0d398f663b
Rename "Domain" to "VM" in various class names
...
The name "Domain", though technically correct from a Libvirt
perspective, was unnecessarily confusing. Call the class instances what
they are, VMs.
2019-07-07 15:20:37 -04:00