Commit Graph

487 Commits

Author SHA1 Message Date
Joshua Boniface a940d03959 Fix some bugs and add RBD volume stats 2019-06-19 10:25:22 -04:00
Joshua Boniface db0b382b3d Don't bother with snapshot management by Daemon
This is *definitely* not needed in the end, and just uses RAM for
no conceivable purpose. Snapshots are fully client-managed.
2019-06-19 09:43:04 -04:00
Joshua Boniface 1c9f606480 Implement volume and snapshot handling by daemon
This seems like a super-gross way to do this, but at the moment
I don't have a better way. Maybe just remove this component since
none of the volume/snapshot stuff is dynamic; will see as this
progresses.
2019-06-19 09:40:32 -04:00
Joshua Boniface 784b428ed0 Add creation of volume and snapshot lists 2019-06-19 09:29:36 -04:00
Joshua Boniface 064e6455bc Correct some more bugs 2019-06-19 00:29:21 -04:00
Joshua Boniface a4ab3075ab Correct some bugs around new code 2019-06-19 00:23:25 -04:00
Joshua Boniface 01959cb9e3 Implementation of RBD volumes and snapshots
Adds the ability to manage RBD volumes (add/remove) and RBD
snapshots (add/remove). (Working) list functions to come.
2019-06-19 00:12:44 -04:00
Joshua Boniface 2bbbda3da5 Only trigger pool updates on primary 2019-06-18 21:26:05 -04:00
Joshua Boniface 612f5ab52c Strip pv_block from stdout 2019-06-18 20:34:25 -04:00
Joshua Boniface 1622226c32 Add more logging during OSD creation/deletion 2019-06-18 20:31:04 -04:00
Joshua Boniface 3adeef6fdd Use the fsid to activate new OSDs 2019-06-18 20:22:28 -04:00
Joshua Boniface 443108f53d Add support for enable/disable keepalive detail 2019-06-18 19:54:42 -04:00
Joshua Boniface 79f284a0a9 Pass logger into run_command 2019-06-18 13:45:59 -04:00
Joshua Boniface 080ca3201c Correct actual problem with this_node 2019-06-18 13:43:54 -04:00
Joshua Boniface d076f9f4eb Use self.this_node everywhere 2019-06-18 13:25:16 -04:00
Joshua Boniface aee078f3eb Support disabling keepalive logging 2019-06-18 12:44:07 -04:00
Joshua Boniface b0411e8e1a Remove "error" message from Ceph commands
This triggers at every node start and isn't useful.
2019-06-18 12:41:38 -04:00
Joshua Boniface 8d9007f697 Remove OSD stat collection if count is zero
Otherwise, ceph osd df will hang indefinitely trying to get data
for the zero OSDs.
2019-06-18 12:36:53 -04:00
Joshua Boniface 5a327dc41a Clean up Ceph pipeline and add more debug logs 2019-06-18 11:19:03 -04:00
Joshua Boniface 46a416bc78 Use a proper variable for vni_mtu 2019-06-18 00:01:12 -04:00
Joshua Boniface 1f92b90a3e Don't encode initial data as we're using zkhandler 2019-06-17 23:53:16 -04:00
Joshua Boniface d4ebe63d9b Rename network device field
It seems much nicer and more consistent as "device" rather than as
"name".
2019-06-17 23:44:41 -04:00
Joshua Boniface 1d3f868206 Unify network devices and addresses in config
The old way of doing this was a little cumbersome, with an upper YAML
tree split between "devices" (name and MTU) and addresses. This commit
unifies these under the root "networking" section to make this section
clearer.
2019-06-17 23:41:07 -04:00
Joshua Boniface e70255dbd6 Support configurable interface MTUs
MTUs were hardcoded at 9000, which breaks if the underlying interface
or network switch does not support jumbo frames, a possible deployment
limitation. This has non-obvious consequences due to MTU mismatches
for certain services (Ceph, Zookeeper, etc.).

This commit adds support for configurable MTUs for each interface,
set in pvcd.yaml. The example has been updated to reflect this, with
a default of 1500 (the Ethernet standard).

This commit also adds autoconfiguration of the VNI device MTU based
on the `vni_mtu` value, the same for bridge networks and minus 50
(rather than 200 from the hardcoded value, based on the following
resource [1]) for VXLAN networks.

[1] http://ipengineer.net/2014/06/vxlan-mtu-vs-ip-mtu-consideration/
2019-06-17 23:34:48 -04:00
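
The MTU rule described in commit e70255dbd6 above can be sketched roughly as follows. This is only an illustration, not the daemon's actual code: the function name `vni_device_mtu` and the network type strings `"bridged"` and `"vxlan"` are assumptions; only the `vni_mtu` value and the 50-byte VXLAN reduction come from the commit message.

```python
# Hypothetical sketch of the MTU rule from commit e70255dbd6.
# Names and type strings are illustrative assumptions.

VXLAN_OVERHEAD = 50  # bytes subtracted for VXLAN networks, per the commit message


def vni_device_mtu(vni_mtu: int, net_type: str) -> int:
    """Return the MTU for a client network device built on the VNI device."""
    if net_type == "bridged":
        # Bridge networks use the same MTU as the VNI device.
        return vni_mtu
    if net_type == "vxlan":
        # VXLAN networks lose 50 bytes to encapsulation.
        return vni_mtu - VXLAN_OVERHEAD
    raise ValueError(f"unknown network type: {net_type!r}")


if __name__ == "__main__":
    print(vni_device_mtu(1500, "vxlan"))
    print(vni_device_mtu(9000, "bridged"))
```

With the example default of 1500, a VXLAN network would end up with an MTU of 1450 under this rule.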
Joshua Boniface c583ee1709 Revert "Wait a little longer"
This reverts commit bd7a55e9e1.

This is not really needed, but do keep the 5s wait
2019-06-17 21:56:06 -04:00
Joshua Boniface bd7a55e9e1 Wait a little longer 2019-06-17 12:14:13 -04:00
Joshua Boniface 23994f8a11 Increase wait time for daemons and log message 2019-06-17 10:30:46 -04:00
Joshua Boniface fe654aa5a2 Correct typo in daemon 2019-06-16 19:27:20 -04:00
Joshua Boniface 14e9ba892c Wait on both sides for 30s
Still finding issues with the flush
2019-05-24 01:23:18 -04:00
Joshua Boniface ae37afcf75 Wait 10 seconds when starting pvc-flush
Without waiting, the unflush will trigger too soon, before the
daemon is fully ready, and as such it fails in odd ways.
2019-05-23 23:35:01 -04:00
Joshua Boniface e8b666708c Add one final keepalive update before exiting 2019-05-23 23:23:03 -04:00
Joshua Boniface 4c5ce9b995 Perform additional tweaks to units
Use RemainAfterExit to prevent pvc-flush from auto-stopping immediately.

Use PartOf to tie services to the target itself.

Use --wait on flush to avoid daemon stopping before flush is complete.
2019-05-23 23:18:28 -04:00
Joshua Boniface e46aa22989 Remove invalid Restart in pvc-flush.service 2019-05-23 22:51:36 -04:00
Joshua Boniface 7c6132f7dd Add node autoflush service and target
Add a systemd service to manage node flush/unflush, useful during
system startup and shutdown to avoid requiring administrator
intervention for this to occur. This is optional and the service is
not enabled by default, and the postinst script informs the
administrator of this.

Also adds a systemd target to collect the two service units together
and provide an easy way to flush+shutdown or startup+unflush the
entire PVC system.

Closes #28
2019-05-23 22:42:51 -04:00
Joshua Boniface 8ef21cf9f2 Sleep longer before removing gateways
1 second was just slightly too little time to wait and packets would
occasionally be lost on primary switchover. Increase this to 2
seconds to provide more time for arping to run on the new primary.
2019-05-23 22:20:38 -04:00
Joshua Boniface 8881b97e8b Correct a missing capitalization 2019-05-21 23:19:19 -04:00
Joshua Boniface 3893666507 Improve performance by removing spurious actions
1. Remove a number of time.sleep commands which don't really seem
necessary any longer and which significantly increased the startup
time while parsing the VM list.
2. Handle some variable sets during initialization of the object,
rather than waiting for a management command, enabling...
3. Know when a state change, and the corresponding Libvirt lookup,
is unnecessary due to the target node not matching the current node.
This also removes a number of unremovable errors from Libvirt on the
console which were annoying.

This reduces the total time taken by the VM startup segment (lines
760-762 of Daemon.py) from 17.117s down to 0.976s for 82 VMs.
2019-05-21 22:56:40 -04:00
Joshua Boniface 595cf1782c Switch DNS aggregator to PostgreSQL
MariaDB+Galera was terribly unstable, with the cluster failing to
start or dying randomly, and generally seemed incredibly unsuitable
for an HA solution. This commit switches the DNS aggregator SQL
backend to PostgreSQL, implemented via Patroni HA.

It also manages the Patroni state, forcing the primary instance to
follow the PVC coordinator, such that the active DNS Aggregator
instance is always able to communicate read+write with the local
system.

This required some logic changes to how the DNS Aggregator worked,
specifically ensuring that database changes aren't attempted while
the instance isn't actively running - to be honest this was a bug
anyways that had just never been noticed.

Closes #34
2019-05-21 01:07:41 -04:00
Joshua Boniface 9e806d30f9 Only stop log parser if it's actually running 2019-05-11 12:09:42 -04:00
Joshua Boniface 3cf573baf6 Update domainstate after unflush is complete 2019-05-11 00:55:15 -04:00
Joshua Boniface 18a122c772 Remove redundant try block 2019-05-11 00:47:50 -04:00
Joshua Boniface 516ea1b57c Handle unflushes like flushes sequentially
Makes an unflush a controlled event like flushing, rather than a
free-for-all. This does slow down unflushing somewhat (disallowing
parallelism from multiple hosts to the current host), but allows
the locking to actually be effective.
2019-05-11 00:30:47 -04:00
Joshua Boniface 62a71af46e Implement locking for unflush as well
References #32
2019-05-11 00:13:03 -04:00
Joshua Boniface 9d8c886811 Correct typo in flush_lock write 2019-05-11 00:08:07 -04:00
Joshua Boniface c19902d952 Implement flush locking for nodes
Implements a locking mechanism to prevent clobbering of node
flushes. When a flush begins, a global cluster lock is placed
which is freed once the flush completes. While the lock is in place,
other flush events queue waiting for the lock to free before
proceeding.

Modifies the CLI output flow when the `--wait` option is specified.
First, if a lock exists when running the command, the message is
tweaked to indicate this, and the client will wait first for the
lock to free, and then for the flush as normal. Second, the wait
depends on the active lock rather than the domain_status for
consistency purposes.

Closes #32
2019-05-10 23:52:24 -04:00
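
The queueing behaviour described in commit c19902d952 above can be illustrated with a process-local stand-in. This sketch uses `threading.Lock` in place of the cluster-wide lock, which in PVC lives in Zookeeper; the names `flush_lock`, `flush_node`, and `flush_all` are hypothetical and do not come from the PVC source.

```python
import threading
import time

# Stand-in for the global cluster lock; an assumption for illustration only.
flush_lock = threading.Lock()


def flush_node(node: str, log: list, work: float = 0.01) -> None:
    """Flush one node, waiting for any flush already in progress to finish."""
    with flush_lock:  # other flushes block here until the lock frees
        log.append(("start", node))
        time.sleep(work)  # placeholder for the actual migration work
        log.append(("end", node))


def flush_all(nodes: list) -> list:
    """Request flushes of several nodes at once and return the event log."""
    log = []
    threads = [threading.Thread(target=flush_node, args=(n, log)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    for event in flush_all(["node1", "node2", "node3"]):
        print(event)
```

Each "start" event is immediately followed by the matching "end" event, so no two flushes overlap even though all were requested at the same time.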
Joshua Boniface 2151566b74 Send total memory via ZK so it's accurate 2019-05-10 23:26:59 -04:00
Joshua Boniface 7416d440d5 Use zkhandler when writing initial node config 2019-05-10 23:26:59 -04:00
Joshua Boniface 41d3e79187 Add pause between stop/start on restart 2019-05-10 23:26:59 -04:00
Joshua Boniface b6ecd36588 Implement domain log watching
Implements the ability for a client to watch almost-live domain
console logs from the hypervisors. It does this using a deque-based
"tail -f" mechanism (with a configurable buffer per-VM) that watches
the domain console logfile in the (configurable) directory every
half-second. It then stores the current buffer in Zookeeper when
changed, where a client can then request it, either as a static piece
of text in the `less` pager, or via a similar "tail -f" functionality
implemented using fixed line splitting and comparison to provide a
generally-seamless output.

Enabling this feature requires each guest VM to implement a Libvirt
serial log and write its (text) console to it, for example using the
default logging directory:

```
<serial type='pty'>
    <log file='/var/log/libvirt/vmname.log' append='off'/>
</serial>
```

The append mode can be either on or off; on grows files unbounded,
off causes the log (and hence the PVC log data) to be truncated on
initial VM startup from offline. The administrator must choose how
they best want to handle this until Libvirt implements their own
clog-type logging format.
2019-05-10 23:26:59 -04:00
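
The deque-based "tail -f" mechanism described in commit b6ecd36588 above might look roughly like the following. It is a minimal sketch under stated assumptions, not the daemon's implementation: the function names `tail_lines` and `poll_once` are hypothetical, and the step that stores the buffer in Zookeeper is left out.

```python
from collections import deque


def tail_lines(path: str, max_lines: int) -> list:
    """Return the last max_lines lines of the file at path."""
    with open(path, "r", errors="replace") as f:
        # A bounded deque discards older lines as newer ones are read.
        return list(deque(f, maxlen=max_lines))


def poll_once(path: str, max_lines: int, previous: list) -> tuple:
    """Read the current buffer and report whether it differs from the last one."""
    current = tail_lines(path, max_lines)
    return current, current != previous


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
        tmp.write("boot line 1\nboot line 2\nlogin:\n")
    buffer, changed = poll_once(tmp.name, 2, [])
    print(buffer, changed)
    os.unlink(tmp.name)
```

A watcher would call `poll_once` on an interval (every half-second, per the commit message) and only publish the buffer when `changed` is true.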
Joshua Boniface 989c5f6bed Don't depend start on mariadb 2019-05-10 23:26:59 -04:00
Joshua Boniface d5ea38732a Disable RP filtering only on VNI and Upstream devs 2019-03-20 12:01:26 -04:00
Joshua Boniface 0dbd1c41a9 Create floating VNI address on brcluster 2019-03-18 20:17:26 -04:00
Joshua Boniface 013f75111a Rearrange sysctl for rp_filtering off on bridge 2019-03-17 20:05:58 -04:00
Joshua Boniface 4050c452d6 Update dnsmasq script to use YAML config 2019-03-17 13:59:05 -04:00
Joshua Boniface deb4247e25 Only remove gateways when managed 2019-03-17 13:19:44 -04:00
Joshua Boniface 3924586eb5 Update zookeeper inside keepalive start
If nodes reconnect to ZK, this way they update immediately too.
2019-03-17 12:52:23 -04:00
Joshua Boniface 3df8365851 Only manage DHCP on managed networks 2019-03-17 12:36:39 -04:00
Joshua Boniface c52a1845e3 Don't create gateways or rules unless managed 2019-03-17 12:33:54 -04:00
Joshua Boniface aee130f65f Handle the starting of all daemons better 2019-03-17 01:45:17 -04:00
Joshua Boniface f38ab856c2 Move config of local networks before ZK init
Otherwise, ZK will fail to start properly
2019-03-17 00:53:11 -04:00
Joshua Boniface 33070ba4c5 Correct another typo 2019-03-17 00:40:23 -04:00
Joshua Boniface 7a1a29c3fd Correct typo in gateways 2019-03-17 00:39:08 -04:00
Joshua Boniface 3aa8223504 Add support for upstream default gateway 2019-03-17 00:36:19 -04:00
Joshua Boniface 12bc3acf85 Use vmbr name for Bridge interfaces 2019-03-17 00:19:01 -04:00
Joshua Boniface 2782120f94 Correct missing netmask with by-id 2019-03-16 23:27:51 -04:00
Joshua Boniface 946442ae38 Add support for bridge-only VNIs 2019-03-15 13:54:11 -04:00
Joshua Boniface 6eab87a2a8 Fix bad split on list 2019-03-13 19:26:08 -04:00
Joshua Boniface 19445205d7 Go back to on-failure restart 2019-03-12 23:18:28 -04:00
Joshua Boniface d90fb07240 Move to YAML config and allow split functions
1. Move to a YAML-based configuration format instead of the original
   INI-based configuration to facilitate better organization and
   readability.
2. Modify the daemon to be able to operate in several modes based
   on configuration flags. Either networking or storage functions
   can be disabled using the configuration, allowing the PVC system
   to be used only for hypervisor management if required.
2019-03-11 01:47:40 -04:00
Joshua Boniface 994315afa3 Add example YAML file 2019-03-10 20:40:45 -04:00
Joshua Boniface cbc70e2ef8 Use correct IPMItool command to start server 2018-12-07 12:36:53 -05:00
Joshua Boniface be37dd954b Fix output message inconsistency 2018-12-05 23:56:20 -05:00
Joshua Boniface 42f380e339 Only copy over A/AAAA records to aggregator 2018-12-05 23:54:54 -05:00
Joshua Boniface 411dc22384 Add newly-required auth-server directive in dnsmasq 2018-12-05 23:54:16 -05:00
Joshua Boniface d2e9433322 Nicer layout 2018-12-05 21:38:28 -05:00
Joshua Boniface f172574d3a Disable debug mode 2018-11-27 22:19:42 -05:00
Joshua Boniface 397c61f6bf Disable DAD on bridge NICs 2018-11-27 22:19:14 -05:00
Joshua Boniface 1da98a4497 Print better information when AXFR fails 2018-11-27 22:18:59 -05:00
Joshua Boniface a270770ec2 Add debug mode and fix bug 2018-11-27 22:15:19 -05:00
Joshua Boniface 4eaf3f7de3 Correct bug in write locking 2018-11-27 21:30:30 -05:00
Joshua Boniface 0c7705e70f Fix missing variable 2018-11-27 21:26:12 -05:00
Joshua Boniface b8a5073a35 Move OSD handling to CephInstance file 2018-11-23 20:05:07 -05:00
Joshua Boniface 790ed16a42 Make IPMI handling a bit better 2018-11-23 20:05:07 -05:00
Joshua Boniface a911d71644 Add proper header to leases.py 2018-11-23 20:05:07 -05:00
Joshua Boniface 52a9a0e075 Improve fence locking; use consistent ZK lock names 2018-11-20 21:21:23 -05:00
Joshua Boniface 3ff4e9da29 Remove some cruft 2018-11-20 21:11:23 -05:00
Joshua Boniface 6add44936a Clean up some commented code 2018-11-20 21:07:31 -05:00
Joshua Boniface 38c9e71144 Fix last few options for DHCPv6
Closes #26
2018-11-20 20:59:48 -05:00
Joshua Boniface 8737124b36 Add cluster bridge interface 2018-11-18 18:31:02 -05:00
Joshua Boniface 766893f4c6 Remove obsolete schema file 2018-11-18 18:30:55 -05:00
Joshua Boniface e2a1d9ad60 Set pvcd after mariadb.service 2018-11-18 18:30:35 -05:00
Joshua Boniface 37a0432281 Add cluster bridge on startup 2018-11-18 17:58:06 -05:00
Joshua Boniface 84b57ccc87 Add better output when AXFR fails 2018-11-18 17:34:51 -05:00
Joshua Boniface a421bde679 Fix up a few more bugs 2018-11-18 17:29:35 -05:00
Joshua Boniface e71ba42be0 Clean up some unneeded prints 2018-11-18 17:09:52 -05:00
Joshua Boniface 1f58d61cb0 Rewrite DNSAggregatorInstance to handle DNS well
Trying to directly AXFR from dnsmasq is a mess, since their zone is
barely compliant with spec, it doesn't support notifies, and it is
generally really messy.

This implements an advanced "AXFR parser" system, which looks at the
results of an AXFR from the local dnsmasq instances per-network, and
updates the real replicated MariaDB pdns backend cluster with the
changed data. This allows a sensible, transferable zone with its own
SOA that is dynamically reconfigured as hosts come and go from the
dnsmasq zone.
2018-11-18 16:45:52 -05:00
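
The "AXFR parser" idea in commit 1f58d61cb0 above comes down to comparing the records seen in a zone transfer with the records already in the database and applying only the difference. The sketch below shows that comparison with plain sets. It is an assumption-laden illustration: the function name `diff_records` and the `(name, type, content)` tuple layout are hypothetical, and no real AXFR or SQL calls are made.

```python
def diff_records(axfr_records, db_records):
    """Return (to_add, to_remove) so the database matches the AXFR result.

    Each record is a hypothetical (name, type, content) tuple.
    """
    axfr = set(axfr_records)
    db = set(db_records)
    to_add = sorted(axfr - db)      # present in dnsmasq, missing from the database
    to_remove = sorted(db - axfr)   # present in the database, gone from dnsmasq
    return to_add, to_remove


if __name__ == "__main__":
    axfr = [
        ("vm1.example.tld", "A", "10.0.0.11"),
        ("vm2.example.tld", "A", "10.0.0.12"),
    ]
    db = [
        ("vm1.example.tld", "A", "10.0.0.11"),
        ("old.example.tld", "A", "10.0.0.99"),
    ]
    add, remove = diff_records(axfr, db)
    print("add:", add)
    print("remove:", remove)
```

Applying only the difference keeps the aggregated zone in step with hosts coming and going without rewriting unchanged records.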
Joshua Boniface b1d0b6e62f Fix up the remaining DHCPv6 setup 2018-11-18 00:55:34 -05:00
Joshua Boniface 4c1e1b4622 Make everything work with dual-stack 2018-11-14 00:26:52 -05:00
Joshua Boniface a2f4102cb5 Add crush weight and reweight output 2018-11-01 23:17:38 -04:00
Joshua Boniface 9fcce4b09a Support setting a CRUSH weight on new OSDs 2018-11-01 23:03:27 -04:00
Joshua Boniface 2ea8a14ba4 Support OSD out/in and commands 2018-11-01 22:08:11 -04:00
Joshua Boniface 99fcb21e3b Support adding and removing Ceph pools 2018-10-31 23:38:17 -04:00
Joshua Boniface 3e4a6086d5 Finish up Ceph OSD removal, add locking to commands 2018-10-30 22:41:44 -04:00
Joshua Boniface 89a3e0c7ee Rename some entries for consistency 2018-10-30 09:17:41 -04:00
Joshua Boniface bfbe9188ce Finish setup of Ceph OSD addition and basic management 2018-10-29 17:51:25 -04:00
Joshua Boniface 59472ae374 Fix up bad restart 2018-10-28 22:16:06 -04:00
Joshua Boniface 939532c293 Show ceph health status in keepalive message 2018-10-27 18:24:27 -04:00
Joshua Boniface 4422eb8941 Write Ceph status data to ZK 2018-10-27 18:04:55 -04:00
Joshua Boniface 0c67812fc2 Fix shutdown secondary bug 2018-10-27 16:33:29 -04:00
Joshua Boniface d8796fd6d6 Move IP creation/removal to common function 2018-10-27 16:31:31 -04:00
Joshua Boniface d727f91c06 Fix typo 2018-10-25 23:38:49 -04:00
Joshua Boniface 35eee2c498 Fix comment 2018-10-25 23:28:30 -04:00
Joshua Boniface 3e2a6b8e80 Better handle termination; remove cluster info from keepalive printout 2018-10-25 22:21:40 -04:00
Joshua Boniface 62b2718d7a Remove kill signal 2018-10-25 22:09:42 -04:00
Joshua Boniface fd27d3f544 Add and remove dnsaggregator nets on primary change 2018-10-25 22:09:32 -04:00
Joshua Boniface 5740df3a04 Set aggregator zones back to SLAVE 2018-10-25 21:40:21 -04:00
Joshua Boniface 73755ae4a9 Allow NTP in to the router in NFT 2018-10-25 11:43:38 -04:00
Joshua Boniface 12c55d6b7a Just push out the gateway for NTP since mcast won't work 2018-10-24 01:13:47 -04:00
Joshua Boniface 7d9426dd65 Add NTP to dnsmasq DHCP; move mkdir of dnsmasq_hostsdir to init 2018-10-24 01:04:04 -04:00
Joshua Boniface 94398a7847 Remove spurious ipmi_command definition 2018-10-22 23:49:56 -04:00
Joshua Boniface 2cdd98d0f1 I do have to restart Kazoo during the SUSPENDED fail 2018-10-22 23:11:04 -04:00
Joshua Boniface 6b5fa3d50b Move Zookeeper update out of NodeInstance and into the main Daemon 2018-10-22 21:01:59 -04:00
Joshua Boniface bfd42b5a7b Make primary watching happen in the daemon not the Node object 2018-10-21 22:08:23 -04:00
Joshua Boniface 187a572c13 Make a whole bunch of things work 2018-10-17 20:05:22 -04:00
Joshua Boniface 87d1c7513e Add floating IPs and better termination of daemons 2018-10-17 00:23:43 -04:00
Joshua Boniface 1b49f70b3c Tweaks to the daemon operation 2018-10-15 22:22:34 -04:00
Joshua Boniface c13a4e84af Add DNS aggregator via PowerDNS and sqlite3 2018-10-15 21:09:40 -04:00
Joshua Boniface a2a7a1d790 Support logging of daemon directly to a file 2018-10-15 21:07:00 -04:00
Joshua Boniface a5c76c5d41 Use new-style class definitions 2018-10-14 22:14:29 -04:00
Joshua Boniface a3b1445bf1 Support configuring upstream interface on coordinators 2018-10-14 21:58:19 -04:00
Joshua Boniface 709be9fbba Explicitly stop dnsmasq service on startup 2018-10-14 18:38:23 -04:00
Joshua Boniface 2e2459c63c Some cleanups and fix bridge interface bug 2018-10-14 18:35:57 -04:00
Joshua Boniface 3bbff271a0 Reorganize sysctl commands; fix bug with rp_filtering on vni_dev 2018-10-14 11:00:31 -04:00
Joshua Boniface d4e5015db4 Shorten this string 2018-10-14 03:08:11 -04:00
Joshua Boniface b0a4ca97bf Typo in this_node reference 2018-10-14 03:02:47 -04:00
Joshua Boniface 5337e8242d Add sysctl tweaks on daemon startup 2018-10-14 02:58:02 -04:00
Joshua Boniface f198f62563 Massive rejigger into single daemon
Completely restructure the daemon code to move the 4 discrete daemons
into a single daemon that can be run on every hypervisor. Introduce the
idea of a static list of "coordinator" nodes which are configured at
install time to run Zookeeper and FRR in router mode, and which are
allowed to take on client network management duties (gateway, DHCP, DNS,
etc.) while also allowing them to run VMs (i.e. no dedicated "router"
nodes required).
2018-10-14 02:40:54 -04:00