parallelvirtualcluster/pvc

Author	SHA1	Message	Date
Joshua M. Boniface	83b806d0b5	Move intervals config one level up Makes for a slightly-better-organized configuration and explanation.	2019-07-28 19:33:23 -04:00
Joshua M. Boniface	96bc181877	Set the routerstate on daemon startup Allows switching from coordinator to not coordinator with a service restart.	2019-07-12 09:51:56 -04:00
Joshua M. Boniface	2a220cd16e	Nicer colour output for coordinator state client	2019-07-12 09:31:42 -04:00
Joshua M. Boniface	439c5f18c3	Add router_state to output of keepalives	2019-07-11 20:11:05 -04:00
Joshua M. Boniface	f30be555c1	Improve message output for logging Improve some formatting of the messages being printed to make it nicer for long-term logging.	2019-07-10 22:38:32 -04:00
Joshua M. Boniface	ac36870a86	Implement hup for log rotation This function was long-existent, but never used; implement it.	2019-07-10 22:22:02 -04:00
Joshua M. Boniface	58f4222ee7	Support disabling log colours and dates For use cases such as pure syslog, allow disabling of dates or colours in the log messages (separately).	2019-07-10 22:17:23 -04:00
Joshua M. Boniface	7df200ac44	Improve ZK connection loss handling	2019-07-09 19:17:32 -04:00
Joshua M. Boniface	47f86475f8	Handle failures of Ceph commands gracefully If these commands fail, catch the error, print a message, and set up empty lists. Also handle later data parsing in this case.	2019-07-09 16:43:38 -04:00
Joshua M. Boniface	1a8e7509f7	Support run_os_command timeout; use timeouts	2019-07-09 15:09:13 -04:00
Joshua M. Boniface	83a4140703	Allow enabling debug mode in config Makes debugging easier without modifying code.	2019-07-09 14:59:00 -04:00
Joshua M. Boniface	8eeba9bc9b	Make Ceph commands time out if needed	2019-07-09 14:35:53 -04:00
Joshua M. Boniface	19701c66e4	Move fencing to after keepalive output Just makes the messages a little easier to read when triggered.	2019-07-09 14:24:31 -04:00
Joshua M. Boniface	b551b54642	Rename message when contending	2019-07-09 14:03:48 -04:00
Joshua M. Boniface	4249d5d982	Always load and store IPMI on daemon start Without this, the IPMI information set during initial node creation can never be changed, which can cause issues later. Instead, always set it fresh on each node boot.	2019-07-09 14:00:31 -04:00
Joshua M. Boniface	cda690e94f	Set RADOS df information in ZK	2019-07-08 10:19:56 -04:00
Joshua M. Boniface	0d398f663b	Rename "Domain" to "VM" in various class names The name "Domain", though technically correct from a Libvirt perspective, was unnecessarily confusing. Call the class instances what they are, VMs.	2019-07-07 15:20:37 -04:00
Joshua M. Boniface	8216125b02	Enable autostart of API client on Primary Adds a config flag that turns on the API client following the Primary coordinator. The retcode of the start/stop commands is ignored, so this can fail gracefully if e.g. the client isn't installed.	2019-07-06 02:42:56 -04:00
Joshua M. Boniface	3e591bd09e	Remove extra whitespaces on blank lines	2019-06-25 22:33:23 -04:00
Joshua M. Boniface	d336fce253	Connect to actual IP not localhost for Libvirt	2019-06-25 22:09:32 -04:00
Joshua M. Boniface	75d0e7f989	Revert "Only perform fencing duties on primary" This reverts commit `464c69aac6`. Actually, yea, this made sense - if the primary fails, it can't fence itself.	2019-06-25 12:36:48 -04:00
Joshua M. Boniface	464c69aac6	Only perform fencing duties on primary There was really no need for this to be shared among all the coordinators, which seemed more fragile. This way only the primary will try to fence dead nodes.	2019-06-24 20:17:51 -04:00
Joshua M. Boniface	0f15e7cda5	Set shutdown state after final keepalive	2019-06-19 14:52:47 -04:00
Joshua M. Boniface	0060c0313b	Put daemonstate to shutdown when stopping This way it isn't "run" all the way until it shuts down.	2019-06-19 14:23:07 -04:00
Joshua M. Boniface	a940d03959	Fix some bugs and add RBD volume stats	2019-06-19 10:25:22 -04:00
Joshua M. Boniface	db0b382b3d	Don't bother with snapshot management by Daemon This is definitely not needed in the end, and just uses RAM for no conceivable purpose. Snapshots are fully client-managed.	2019-06-19 09:43:04 -04:00
Joshua M. Boniface	1c9f606480	Implement volume and snapshot handling by daemon This seems like a super-gross way to do this, but at the moment I don't have a better way. Maybe just remove this component since none of the volume/snapshot stuff is dynamic; will see as this progresses.	2019-06-19 09:40:32 -04:00
Joshua M. Boniface	784b428ed0	Add creation of volume and snapshot lists	2019-06-19 09:29:36 -04:00
Joshua M. Boniface	2bbbda3da5	Only trigger pool updates on primary	2019-06-18 21:26:05 -04:00
Joshua M. Boniface	443108f53d	Add support for enable/disable keepalive detail	2019-06-18 19:54:42 -04:00
Joshua M. Boniface	79f284a0a9	Pass logger into run_command	2019-06-18 13:45:59 -04:00
Joshua M. Boniface	080ca3201c	Correct actual problem with this_node	2019-06-18 13:43:54 -04:00
Joshua M. Boniface	aee078f3eb	Support disabling keepalive logging	2019-06-18 12:44:07 -04:00
Joshua M. Boniface	b0411e8e1a	Remove "error" message from Ceph commands This triggers at every node start and isn't useful.	2019-06-18 12:41:38 -04:00
Joshua M. Boniface	8d9007f697	Remove OSD stat collection if count is zero Otherwise, ceph osd df will hang indefinitely trying to get data for the zero OSDs.	2019-06-18 12:36:53 -04:00
Joshua M. Boniface	5a327dc41a	Clean up Ceph pipeline and add more debug logs	2019-06-18 11:19:03 -04:00
Joshua M. Boniface	1f92b90a3e	Don't encode initial data as we're using zkhandler	2019-06-17 23:53:16 -04:00
Joshua M. Boniface	d4ebe63d9b	Rename network device field It seems much nicer and more consistent as "device" rather than as "name".	2019-06-17 23:44:41 -04:00
Joshua M. Boniface	1d3f868206	Unify network devices and addresses in config The old way of doing this was a little cumbersome, with an upper YAML tree split between "devices" (name and MTU) and addresses. This commit unifies these under the root "networking" section to make this section clearer.	2019-06-17 23:41:07 -04:00
Joshua M. Boniface	e70255dbd6	Support configurable interface MTUs MTUs were hardcoded at 9000, which breaks if the underlying interface or network switch does not support jumbo frames, a possible deployment limitation. This has non-obvious consequences due to MTU mismatches for certain services (Ceph, Zookeeper, etc.). This commit adds support for configurable MTUs for each interface, set in pvcd.yaml. The example has been updated to reflect this, with a default of 1500 (the Ethernet standard). This commit also adds autoconfiguration of the VNI device MTU based on the `vni_mtu` value, the same for bridge networks and minus 50 (rather than 200 from the hardcoded value, based on the following resource [1]) for VXLAN networks. [1] http://ipengineer.net/2014/06/vxlan-mtu-vs-ip-mtu-consideration/	2019-06-17 23:34:48 -04:00
Joshua M. Boniface	c583ee1709	Revert "Wait a little longer" This reverts commit `bd7a55e9e1`. This is not really needed, but do keep the 5s wait	2019-06-17 21:56:06 -04:00
Joshua M. Boniface	bd7a55e9e1	Wait a little longer	2019-06-17 12:14:13 -04:00
Joshua M. Boniface	23994f8a11	Increase wait time for daemons and log message	2019-06-17 10:30:46 -04:00
Joshua M. Boniface	fe654aa5a2	Correct typo in daemon	2019-06-16 19:27:20 -04:00
Joshua M. Boniface	e8b666708c	Add one final keepalive update before exiting	2019-05-23 23:23:03 -04:00
Joshua M. Boniface	8881b97e8b	Correct a missing capitalization	2019-05-21 23:19:19 -04:00
Joshua M. Boniface	595cf1782c	Switch DNS aggregator to PostgreSQL MariaDB+Galera was terribly unstable, with the cluster failing to start or dying randomly, and generally seemed incredibly unsuitable for an HA solution. This commit switches the DNS aggregator SQL backend to PostgreSQL, implemented via Patroni HA. It also manages the Patroni state, forcing the primary instance to follow the PVC coordinator, such that the active DNS Aggregator instance is always able to communicate read+write with the local system. This required some logic changes to how the DNS Aggregator worked, specifically ensuring that database changes aren't attempted while the instance isn't actively running - to be honest this was a bug anyways that had just never been noticed. Closes #34	2019-05-21 01:07:41 -04:00
Joshua Boniface	2151566b74	Send total memory via ZK so it's accurate	2019-05-10 23:26:59 -04:00
Joshua Boniface	7416d440d5	Use zkhandler when writing initial node config	2019-05-10 23:26:59 -04:00
Joshua Boniface	b6ecd36588	Implement domain log watching Implements the ability for a client to watch almost-live domain console logs from the hypervisors. It does this using a deque-based "tail -f" mechanism (with a configurable buffer per-VM) that watches the domain console logfile in the (configurable) directory every half-second. It then stores the current buffer in Zookeeper when changed, where a client can then request it, either as a static piece of text in the `less` pager, or via a similar "tail -f" functionality implemented using fixed line splitting and comparison to provide a generally-seamless output. Enabling this feature requires each guest VM to implement a Libvirt serial log and write its (text) console to it, for example using the default logging directory: ``` <serial type='pty'> <log file='/var/log/libvirt/vmname.log' append='off'/> </serial> ``` The append mode can be either on or off; on grows files unbounded, off causes the log (and hence the PVC log data) to be truncated on initial VM startup from offline. The administrator must choose how they best want to handle this until Libvirt implements their own clog-type logging format.	2019-05-10 23:26:59 -04:00
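
The log-watching commit (`b6ecd36588`) describes a deque-based "tail -f" on the daemon side and a line-comparison follow mode on the client side. The following is only a minimal sketch of that idea, not the code from the commit: the function names `tail_lines` and `new_lines` are hypothetical, and the Zookeeper storage and half-second polling loop are left out.

```python
from collections import deque


def tail_lines(path, max_lines):
    """Return the last max_lines lines of a text file as a single string.

    Hypothetical helper: a bounded deque drops the oldest lines as newer
    ones are read, so only the configured buffer size is kept in memory.
    """
    with open(path, "r", errors="replace") as fh:
        return "".join(deque(fh, maxlen=max_lines))


def new_lines(previous, current):
    """Return the lines of current that were not already shown in previous.

    Hypothetical helper: finds the longest run of lines that ends the old
    buffer and starts the new one, then returns whatever follows it.
    """
    prev = previous.splitlines()
    curr = current.splitlines()
    for overlap in range(min(len(prev), len(curr)), 0, -1):
        if prev[-overlap:] == curr[:overlap]:
            return curr[overlap:]
    # No overlap found (e.g. the log was truncated): show everything.
    return curr
```

A daemon loop could call `tail_lines` on each poll and store the result only when it differs from the last stored buffer; a following client could pass successive buffers to `new_lines` and print only the returned lines. Repeated identical lines can make the overlap ambiguous, which is consistent with the commit describing the output as "generally-seamless" rather than exact.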

99 Commits