Commit Graph

200 Commits

Author SHA1 Message Date
Joshua Boniface 18fc49fc6c Use node instead of hypervisor consistently 2019-10-12 01:59:08 -04:00
Joshua Boniface 8dc0c8f0ac Fix minor bugs 2019-10-12 01:36:50 -04:00
Joshua Boniface 5995353597 Implement VM metadata and use it
Implements the storing of three VM metadata attributes:
1. Node limits - allows specifying a list of hosts on which the VM must
run. This limit influences the migration behaviour of VMs.
2. Per-VM node selectors - allows each VM to have its migration
autoselection method specified, to automatically allow different methods
per VM based on the administrator's preferences.
3. VM autorestart - allows a VM to be automatically restarted from a
stopped state, presumably due to a failure to find a target node (either
due to limits or otherwise) during a flush/fence recovery, on the next
node unflush/ready state of its home hypervisor. Useful mostly in
conjunction with limits to ensure that VMs which were shut down due to
there being no valid migration targets are started back up when their
node becomes ready again.

Includes the full client interaction with these metadata options,
including printing, as well as defining a new function to modify this
metadata. For the CLI it is set/modified either on `vm define` or via the
`vm meta` command. For the API it is set/modified either on a POST to
the `/vm` endpoint (during VM definition) or on POST to the `/vm/<vm>`
endpoint. For the API this replaces the previous reserved word for VM
creation from scratch as this will no longer be implemented in-daemon
(see #22).

Closes #52
2019-10-12 01:17:39 -04:00
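
As a rough illustration of how the node limit and per-VM selector described in the commit above might interact when picking a migration target, here is a minimal sketch. The node fields (`name`, `ready`, `mem_free`, `load`, `vm_count`), the selector names, and the `select_target` function are all hypothetical and are not taken from the PVC code.

```python
# Hypothetical selector names mapped to sort keys; the lowest key wins.
SELECTORS = {
    "mem": lambda node: -node["mem_free"],  # prefer the most free memory
    "load": lambda node: node["load"],      # prefer the lowest load
    "vms": lambda node: node["vm_count"],   # prefer the fewest running VMs
}


def select_target(nodes, node_limit, selector, current_node=None):
    """Return the name of the best target node, or None if none is valid.

    An empty node_limit means the VM may run on any ready node.
    """
    candidates = [
        node for node in nodes
        if node["ready"]
        and node["name"] != current_node
        and (not node_limit or node["name"] in node_limit)
    ]
    if not candidates:
        # No valid target: per the commit message, the VM ends up stopped
        # and autorestart may start it again once its node is ready.
        return None
    return min(candidates, key=SELECTORS[selector])["name"]
```

A `None` result corresponds to the "failure to find a target node" case that the autorestart attribute is meant to recover from.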
Joshua Boniface 76e6b42389 Add clone_volume backend command 2019-10-10 14:09:07 -04:00
Joshua Boniface 983daceaed Fix shutdown abort during restart
Restart state, being different from shutdown, would trigger an abort of
the shutdown. Fix this by including restart in the valid states to
continue.
2019-09-07 12:08:31 -04:00
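
The fix in the commit above can be pictured as a polling loop that keeps waiting only while the VM state is one of a set of "continue" states. This is a minimal sketch under assumptions: the state names, the `wait_for_shutdown` function, and its callback parameters are illustrative and not the daemon's actual code.

```python
# States in which the shutdown wait keeps going. Including "restart" is the
# fix described above; the exact state names are assumptions.
CONTINUE_STATES = ("shutdown", "restart")


def wait_for_shutdown(get_state, is_running, max_ticks):
    """Poll until the VM stops, the state changes, or the timer expires."""
    for _ in range(max_ticks):
        if get_state() not in CONTINUE_STATES:
            return "aborted"    # state changed to something else: abort
        if not is_running():
            return "stopped"    # the VM shut down cleanly
        # A real daemon would sleep between polls here.
    return "timed_out"          # the caller would then force termination
```

Without `"restart"` in `CONTINUE_STATES`, a VM in the restart state would hit the "aborted" branch on the first poll.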
Joshua Boniface 7c4d18691a Implement configurable replcfg (node-side)
Implements administrator-selectable replication configurations for new
pools in PVC clusters, overriding the default of copies=3,mincopies=2.
2019-08-23 21:58:54 -04:00
Joshua Boniface 267a3d16e5 Bump version to 0.5 2019-08-08 20:56:27 -04:00
Joshua Boniface 2880a761c0 Move Ceph command pipe to new location
Matching the new /cmd/domain pipe, move Ceph pipe to /cmd/ceph.
2019-08-07 14:47:27 -04:00
Joshua Boniface b7546e3711 Fix bugs in command pipeline for VMs 2019-08-07 14:13:01 -04:00
Joshua Boniface 0ff2d7d537 Use shlex for command splitting
This will preserve quoted strings, required for the rbd lock commands.
2019-08-07 14:02:57 -04:00
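
To show why `shlex` matters for the commit above, the sketch below compares plain whitespace splitting with `shlex.split` on a command whose lock ID contains a space. The image name, lock ID, and locker in the command string are made up for illustration.

```python
import shlex

# Hypothetical command string; the quoted lock ID contains a space.
command = 'rbd lock remove vms/disk0 "auto 12345" client.4567'

naive = command.split()       # splits inside the quotes
quoted = shlex.split(command) # keeps the quoted string as one argument

print(naive)
print(quoted)
```

With `str.split`, the lock ID is broken into two arguments that still carry their quote characters; `shlex.split` yields it as the single argument `auto 12345`.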
Joshua Boniface a2a630f6a0 Add pipeline for VM lock flush cmd 2019-08-07 13:49:33 -04:00
Joshua Boniface 496216321e Move lock flushing to VMInstance
Prepares for reuse of this function via client commands.
2019-08-07 13:36:56 -04:00
Joshua Boniface 0446b2db02 Catch exceptions if Patroni is not up 2019-08-07 11:46:58 -04:00
Joshua Boniface 7e77752ce5 Add limit to Patroni switchover attempts 2019-08-07 11:46:42 -04:00
Joshua Boniface 33a963c2af Improve fence output on failure and increase delay 2019-08-07 11:35:49 -04:00
Joshua Boniface e92a57606d Use better forceful arping command
Send ARP responses with the source IP in them to force an update even if the
old primary did not cleanly terminate (during fencing, for instance).
2019-08-07 11:29:38 -04:00
Joshua Boniface ef3b6b3723 Arping 3 times instead of 2
During a fence, 2 is not always enough for the network to recognize the
change in primary coordinator.
2019-08-07 11:15:36 -04:00
Joshua Boniface 3b27a88128 Allow abort of shutdown state
Adds some logic to allow an active shutdown state to be aborted by
changing the VM to another state. Useful mostly if a VM is doing funky
things and not responding to the shutdown, but the administrator either
doesn't want to wait for the timer to expire (forcing an immediate
termination) or wishes to abort the shutdown attempt.

Fixes #49
2019-08-07 10:58:18 -04:00
Joshua Boniface e2ae58b62c Add the missing newline to the string compare 2019-08-04 17:00:33 -04:00
Joshua Boniface d0d5ab4425 Fix bug if the switchover target is the same 2019-08-04 16:51:11 -04:00
Joshua Boniface a329376d33 Lock primary_node key during primary switchover
Also implements a loop to switch over the Patroni leader, ensuring it
always follows the primary, and cleans up the surrounding code a bit.
2019-08-04 16:42:06 -04:00
Joshua Boniface 710d2cf9c2 Fix record duplication bug and general cleanup
Fixes #47
2019-08-01 13:11:45 -04:00
Joshua Boniface 8bdec03cf1 Properly support debug logging via config 2019-08-01 11:22:27 -04:00
Joshua Boniface c6e58796ba Clean up redundant return section 2019-07-31 23:57:31 -04:00
Joshua Boniface 7380f45b1b Improve dnsmasq interface handling
listen-address is enough; adding interface too causes weird issues where
dnsmasq is listening on an IPv6 global wildcard too which conflicts with
the PowerDNS instance.
2019-07-31 10:03:56 -04:00
Joshua Boniface 324990739e Make DNS aggregator listen on port 53
Using the non-standard port was a pain. Now that all the DNSMasq stuff
works, move back to the default port.
2019-07-30 09:20:01 -04:00
Joshua Boniface 717d00cfcf Implement snapshot rename in node daemon
[4/2] Implements #44
2019-07-28 23:06:12 -04:00
Joshua Boniface 83b806d0b5 Move intervals config one level up
Makes for a slightly-better-organized configuration and explanation.
2019-07-28 19:33:23 -04:00
Joshua Boniface 68ca493b3b Fix bad error code 2019-07-26 20:53:01 -04:00
Joshua Boniface 837666a15e Revamp renamekey function
The function had numerous bugs and didn't work. Fix them up.
2019-07-26 16:38:05 -04:00
Joshua Boniface 35363671a0 Implement Ceph volume resize and rename
Includes a simple implementation of a zookeeper "rename" facility,
allowing a key and all data to be replaced by a new key with a different
name but containing all the same child elements and data.

[2/2] Implements #44
2019-07-26 15:13:21 -04:00
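
The "rename" facility described in the commit above amounts to copying a key and all of its children under a new name and then removing the old ones. The sketch below illustrates that idea on a plain dictionary keyed by path; it does not use a Zookeeper client, and `rename_key` and the example paths are hypothetical.

```python
def rename_key(store, old, new):
    """Move `old` and every key beneath it to `new` in a path-keyed dict."""
    prefix = old + "/"
    if old not in store:
        raise KeyError(old)
    if new in store or new.startswith(prefix):
        raise ValueError("invalid rename target: " + new)

    # Collect the key itself plus all of its descendants.
    moved = {
        key: value for key, value in store.items()
        if key == old or key.startswith(prefix)
    }
    # Recreate each entry under the new name, then drop the old entries.
    for key, value in moved.items():
        store[new + key[len(old):]] = value
    for key in moved:
        del store[key]
```

Matching on `old + "/"` rather than on `old` alone avoids touching siblings that merely share a prefix, such as `vol10` when renaming `vol1`.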
Joshua Boniface 50367c9190 Improve OSD create messages 2019-07-26 11:41:51 -04:00
Joshua Boniface 96bc181877 Set the routerstate on daemon startup
Allows switching from coordinator to not coordinator with a service
restart.
2019-07-12 09:51:56 -04:00
Joshua Boniface 2a220cd16e Nicer colour output for coordinator state client 2019-07-12 09:31:42 -04:00
Joshua Boniface 439c5f18c3 Add router_state to output of keepalives 2019-07-11 20:11:05 -04:00
Joshua Boniface f30be555c1 Improve message output for logging
Improve some formatting of the messages being printed to make it nicer
for long-term logging.
2019-07-10 22:38:32 -04:00
Joshua Boniface ac36870a86 Implement hup for log rotation
This function was long-existent, but never used; implement it.
2019-07-10 22:22:02 -04:00
Joshua Boniface 58f4222ee7 Support disabling log colours and dates
For use cases such as pure syslog, allow disabling of dates or colours
in the log messages (separately).
2019-07-10 22:17:23 -04:00
Joshua Boniface 32a6369de2 Add nicer message when live migrate fails 2019-07-10 17:42:24 -04:00
Joshua Boniface 8a28738bff Use consistent terminology in fence message 2019-07-10 11:54:56 -04:00
Joshua Boniface 8f160abf90 Handle cancelling flushes when new ones run
Store the flush_thread of a node as a class object. Before starting a
new flush thread (either flush or unflush), stop the existing one if it
exists to prevent further migrations, then start the new thread. Set the
object to None on init and again once the task actually finishes. Remove
the inflush flag as this is not required when using these threads and
functionally does nothing any longer, but add the flush_stopper flag to
trigger cancellation of the current job.
2019-07-10 11:54:34 -04:00
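
The cancellation pattern described in the commit above can be sketched roughly as follows. The attribute names `flush_thread` and `flush_stopper` come from the commit message; the `Node` class, its methods, and the sleep standing in for a migration are assumptions made for this example.

```python
import threading
import time


class Node:
    def __init__(self):
        self.flush_thread = None    # no flush or unflush in progress
        self.flush_stopper = False  # set to True to cancel the current job
        self.migrated = []

    def _run(self, vms):
        for vm in vms:
            if self.flush_stopper:
                break               # cancelled: do not migrate further VMs
            time.sleep(0.01)        # stand-in for one VM migration
            self.migrated.append(vm)
        self.flush_thread = None    # the task has finished or was cancelled

    def start(self, vms):
        """Stop any running flush/unflush job, then start a new one."""
        old = self.flush_thread
        if old is not None:
            self.flush_stopper = True
            old.join()
        self.flush_stopper = False
        thread = threading.Thread(target=self._run, args=(vms,))
        self.flush_thread = thread
        thread.start()

    def wait(self):
        """Block until the current job, if any, has finished."""
        thread = self.flush_thread
        if thread is not None:
            thread.join()
```

Joining the old thread before clearing `flush_stopper` ensures that only one job migrates VMs at a time.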
Joshua Boniface c7c8c8bcbb Fix bug with flush 2019-07-10 00:43:55 -04:00
Joshua Boniface 7a8aee9fe7 Remove flush locking functionality
This just seemed like more trouble than it was worth. Flush locks were
originally intended as a way to counteract the weird issues around
flushing that were mostly fixed by the code refactoring, so this will
help test if those issues are truly gone. If not, will look into a
cleaner solution that doesn't result in unchangeable states.
2019-07-09 23:59:17 -04:00
Joshua Boniface ad284b13bc Fix bugs with fencing 2019-07-09 19:17:53 -04:00
Joshua Boniface 7df200ac44 Improve ZK connection loss handling 2019-07-09 19:17:32 -04:00
Joshua Boniface 47f86475f8 Handle failures of Ceph commands gracefully
If these commands fail, catch the error, print a message, and set up
empty lists. Also handle later data parsing in this case.
2019-07-09 16:43:38 -04:00
Joshua Boniface 1a8e7509f7 Support run_os_command timeout; use timeouts 2019-07-09 15:09:13 -04:00
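
One way a command runner with a timeout, as named in the commit above, could look is sketched below using the standard library's `subprocess.run`. Only the name `run_os_command` appears in the commit; the signature, the return tuple, and the return code used on timeout are assumptions for this example.

```python
import shlex
import subprocess


def run_os_command(command, timeout=None):
    """Run a command string; return (returncode, stdout, stderr)."""
    try:
        result = subprocess.run(
            shlex.split(command),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising. The code 128 is
        # an arbitrary value chosen for this sketch.
        return 128, "", "command timed out"
    return result.returncode, result.stdout.decode(), result.stderr.decode()
```

A caller can then treat a non-zero return code, including the timeout case, as a failure and fall back to empty data instead of blocking indefinitely.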
Joshua Boniface 83a4140703 Allow enabling debug mode in config
Makes debugging easier without modifying code.
2019-07-09 14:59:00 -04:00
Joshua Boniface 8eeba9bc9b Make Ceph commands time out if needed 2019-07-09 14:35:53 -04:00
Joshua Boniface 19701c66e4 Move fencing to after keepalive output
Just makes the messages a little easier to read when triggered.
2019-07-09 14:24:31 -04:00