111 Commits

SHA1 Message Date
1d3f868206 Unify network devices and addresses in config
The old layout was cumbersome: the YAML tree was split between a
"devices" section (name and MTU) and a separate addresses section.
This commit unifies both under the root "networking" section to make
the configuration clearer.
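As a rough illustration of consuming such a unified section, the Python sketch below assumes the YAML has already been parsed into a dict. It also assumes that each network entry carries `device`, `mtu` and `address` keys. These key names, the network names, and the `network_settings()` helper are hypothetical and are not the actual `pvcd.yaml` schema.

```python
import ipaddress

# Hypothetical parsed form of a unified "networking" section; the key and
# network names are illustrative assumptions, not the real pvcd.yaml schema.
CONFIG = {
    "networking": {
        "upstream": {"device": "ens4", "mtu": 1500, "address": "10.0.0.10/24"},
        "cluster": {"device": "ens5", "mtu": 9000, "address": "10.0.1.10/24"},
        "storage": {"device": "ens6", "mtu": 9000, "address": "10.0.2.10/24"},
    }
}


def network_settings(config, name):
    """Return (device, mtu, ip, prefixlen) for one network, read from a single subtree."""
    net = config["networking"][name]
    iface = ipaddress.ip_interface(net["address"])
    return net["device"], int(net["mtu"]), str(iface.ip), iface.network.prefixlen


for name in CONFIG["networking"]:
    print(name, network_settings(CONFIG, name))
```

Because the device, MTU and address for a network now live side by side, a single lookup yields everything needed to bring the interface up.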
2019-06-17 23:41:07 -04:00
e70255dbd6 Support configurable interface MTUs
MTUs were hardcoded at 9000, which breaks deployments where the
underlying interface or network switch does not support jumbo frames.
The resulting MTU mismatches have non-obvious consequences for certain
services (Ceph, Zookeeper, etc.).

This commit adds support for configurable MTUs for each interface,
set in pvcd.yaml. The example has been updated to reflect this, with
a default of 1500 (the Ethernet standard).

This commit also autoconfigures the MTU of each VNI device from the
`vni_mtu` value. Bridge networks use the value unchanged. VXLAN networks
use the value minus 50, rather than the 200 subtracted from the old
hardcoded value, based on the following resource [1].
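A minimal sketch of the MTU rule above. It assumes the network type arrives as the string `"bridge"` or `"vxlan"`; these labels and the function name are hypothetical, and the daemon's own naming may differ. The 50-byte figure matches the usual VXLAN-over-IPv4 encapsulation cost.

```python
# Bytes of VXLAN-over-IPv4 encapsulation overhead:
# outer IPv4 (20) + UDP (8) + VXLAN header (8) + inner Ethernet header (14).
VXLAN_OVERHEAD = 50


def vni_device_mtu(vni_mtu: int, network_type: str) -> int:
    """Return the MTU to set on a VNI device for the configured vni_mtu.

    network_type is a hypothetical label: "bridge" or "vxlan".
    """
    if network_type == "bridge":
        # Bridged traffic is not encapsulated, so it keeps the full MTU.
        return vni_mtu
    if network_type == "vxlan":
        # Leave room for the encapsulation headers on the underlying link.
        return vni_mtu - VXLAN_OVERHEAD
    raise ValueError(f"unknown network type: {network_type!r}")


print(vni_device_mtu(1500, "vxlan"))   # 1450
print(vni_device_mtu(9000, "bridge"))  # 9000
```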

[1] http://ipengineer.net/2014/06/vxlan-mtu-vs-ip-mtu-consideration/
2019-06-17 23:34:48 -04:00
c583ee1709 Revert "Wait a little longer"
This reverts commit bd7a55e9e1de08f00208e641b237b1bbe7ab420f.

This is not really needed, but keep the 5s wait.
2019-06-17 21:56:06 -04:00
bd7a55e9e1 Wait a little longer 2019-06-17 12:14:13 -04:00
23994f8a11 Increase wait time for daemons and log message 2019-06-17 10:30:46 -04:00
fe654aa5a2 Correct typo in daemon 2019-06-16 19:27:20 -04:00
e8b666708c Add one final keepalive update before exiting 2019-05-23 23:23:03 -04:00
8881b97e8b Correct a missing capitalization 2019-05-21 23:19:19 -04:00
595cf1782c Switch DNS aggregator to PostgreSQL
MariaDB+Galera was very unstable: the cluster would fail to start or
die randomly, and it generally proved unsuitable as an HA solution.
This commit switches the DNS aggregator SQL backend to PostgreSQL,
implemented via Patroni HA.

It also manages the Patroni state, forcing the primary instance to
follow the PVC coordinator, so that the active DNS Aggregator
instance always has read-write access to the database on the local
system.
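One way to express "the primary follows the coordinator" is through Patroni's REST API. Its `POST /switchover` endpoint accepts `leader` and `candidate` member names. The sketch assumes three things:

- Patroni member names match PVC node hostnames.
- The API listens on `localhost:8008`.
- No REST API authentication is configured.

The helper name is hypothetical, and this is not necessarily how the daemon performs the handover.

```python
import json
import urllib.request
from typing import Optional

# Assumed Patroni REST API address; adjust for the real deployment.
PATRONI_API = "http://localhost:8008"


def switchover_request(patroni_leader: str, coordinator: str,
                       api: str = PATRONI_API) -> Optional[urllib.request.Request]:
    """Build a Patroni switchover request if the leader is not the coordinator.

    Returns None when the PVC coordinator already holds the Patroni leader
    role. The request is only constructed here, not sent; passing it to
    urllib.request.urlopen() would ask Patroni to perform the switchover.
    """
    if patroni_leader == coordinator:
        return None
    body = json.dumps({"leader": patroni_leader, "candidate": coordinator}).encode()
    return urllib.request.Request(
        f"{api}/switchover",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A check like this would run whenever the coordinator changes. When it returns `None`, nothing needs to be done.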

This required some changes to the DNS Aggregator logic, specifically
ensuring that database changes are not attempted while the instance
is not actively running; this was in fact a latent bug that had
simply never been noticed.

Closes #34
2019-05-21 01:07:41 -04:00
2151566b74 Send total memory via ZK so it's accurate 2019-05-10 23:26:59 -04:00
7416d440d5 Use zkhandler when writing initial node config 2019-05-10 23:26:59 -04:00
b6ecd36588 Implement domain log watching
Implements the ability for a client to watch almost-live domain
console logs from the hypervisors. It does this using a deque-based
"tail -f" mechanism (with a configurable buffer per-VM) that watches
the domain console logfile in the (configurable) directory every
half-second. It then stores the current buffer in Zookeeper when
changed, where a client can then request it, either as a static piece
of text in the `less` pager, or via a similar "tail -f" functionality
implemented using fixed line splitting and comparison to provide a
generally-seamless output.
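The deque-based buffer and the client-side line comparison might look roughly like the following Python sketch. Several parts are assumptions:

- The class and function names are hypothetical.
- The default buffer size stands in for the configurable per-VM value.
- `publish` is a placeholder for the Zookeeper write.

On the client side, `new_lines()` finds the longest overlap between the end of the previous buffer and the start of the new one, then returns only what follows. Runs of identical lines can make that overlap ambiguous.

```python
import os
import time
from collections import deque


class ConsoleLogWatcher:
    """Keep the last `buffer_lines` lines of a console log, "tail -f" style."""

    def __init__(self, path, buffer_lines=1000):
        self.path = path
        self.buffer = deque(maxlen=buffer_lines)  # oldest lines drop off automatically
        self.offset = 0  # byte position of the first unread byte

    def poll(self):
        """Read any new complete lines; return True if the buffer changed."""
        try:
            size = os.path.getsize(self.path)
        except FileNotFoundError:
            return False
        changed = False
        if size < self.offset:
            # The file shrank (e.g. truncated on VM start): start over.
            self.offset = 0
            self.buffer.clear()
            changed = True
        if size > self.offset:
            with open(self.path, "rb") as f:
                f.seek(self.offset)
                data = f.read(size - self.offset)
            end = data.rfind(b"\n")
            if end != -1:
                # Only consume complete lines; a partial last line is re-read later.
                self.offset += end + 1
                text = data[: end + 1].decode("utf-8", errors="replace")
                self.buffer.extend(text.splitlines())
                changed = True
        return changed

    def text(self):
        return "\n".join(self.buffer)


def watch_forever(watcher, publish, interval=0.5):
    """Poll every `interval` seconds; call publish() only when the buffer changes."""
    while True:
        if watcher.poll():
            publish(watcher.text())
        time.sleep(interval)


def new_lines(old_text, new_text):
    """Client side: return the lines of new_text that follow the old buffer."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    # Find the longest tail of the old buffer that is also a head of the new one.
    for overlap in range(min(len(old), len(new)), 0, -1):
        if old[-overlap:] == new[:overlap]:
            return new[overlap:]
    return new
```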

Enabling this feature requires each guest VM to implement a Libvirt
serial log and write its (text) console to it, for example using the
default logging directory:

```
<serial type='pty'>
    <log file='/var/log/libvirt/vmname.log' append='off'/>
</serial>
```

The append mode can be either on or off: on lets the file grow without
bound, while off truncates the log (and hence the PVC log data) whenever
the VM is started from an offline state. Administrators must choose how
best to handle this until Libvirt implements its own clog-type logging
format.
2019-05-10 23:26:59 -04:00
d5ea38732a Disable RP filtering only on VNI and Upstream devs 2019-03-20 12:01:26 -04:00
013f75111a Rearrange sysctl for rp_filtering off on bridge 2019-03-17 20:05:58 -04:00
3924586eb5 Update zookeeper inside keepalive start
If nodes reconnect to ZK, this way they update immediately too.
2019-03-17 12:52:23 -04:00
aee130f65f Handle the starting of all daemons better 2019-03-17 01:45:17 -04:00
f38ab856c2 Move config of local networks before ZK init
Otherwise, ZK will fail to start properly.
2019-03-17 00:53:11 -04:00
33070ba4c5 Correct another typo 2019-03-17 00:40:23 -04:00
7a1a29c3fd Correct typo in gateways 2019-03-17 00:39:08 -04:00
3aa8223504 Add support for upstream default gateway 2019-03-17 00:36:19 -04:00
2782120f94 Correct missing netmask with by-id 2019-03-16 23:27:51 -04:00
946442ae38 Add support for bridge-only VNIs 2019-03-15 13:54:11 -04:00
d90fb07240 Move to YAML config and allow split functions
1. Move to a YAML-based configuration format instead of the original
   INI-based configuration to facilitate better organization and
   readability.
2. Modify the daemon to be able to operate in several modes based
   on configuration flags. Either networking or storage functions
   can be disabled using the configuration, allowing the PVC system
   to be used only for hypervisor management if required.
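The mode selection in point 2 can be pictured as a small function over the parsed configuration. In this Python sketch, a plain dict stands in for the loaded YAML. The `enable_networking` and `enable_storage` flag names and the subsystem labels are hypothetical. Hypervisor management is treated as always on, matching the description above.

```python
# Stand-in for the parsed YAML; the flag names are hypothetical.
EXAMPLE_FUNCTIONS = {"enable_networking": False, "enable_storage": True}


def subsystems_to_start(functions: dict) -> list:
    """Return the daemon subsystems to start for the given function flags.

    Hypervisor (VM) management is always on; networking and storage can be
    switched off, and default to enabled when a flag is absent.
    """
    subsystems = ["hypervisor"]
    if functions.get("enable_networking", True):
        subsystems.append("networking")
    if functions.get("enable_storage", True):
        subsystems.append("storage")
    return subsystems


print(subsystems_to_start(EXAMPLE_FUNCTIONS))  # ['hypervisor', 'storage']
```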
2019-03-11 01:47:40 -04:00
f172574d3a Disable debug mode 2018-11-27 22:19:42 -05:00
a270770ec2 Add debug mode and fix bug 2018-11-27 22:15:19 -05:00
4eaf3f7de3 Correct bug in write locking 2018-11-27 21:30:30 -05:00
0c7705e70f Fix missing variable 2018-11-27 21:26:12 -05:00
b8a5073a35 Move OSD handling to CephInstance file 2018-11-23 20:05:07 -05:00
52a9a0e075 Improve fence locking; use consistent ZK lock names 2018-11-20 21:21:23 -05:00
6add44936a Clean up some commented code 2018-11-20 21:07:31 -05:00
8737124b36 Add cluster bridge interface 2018-11-18 18:31:02 -05:00
37a0432281 Add cluster bridge on startup 2018-11-18 17:58:06 -05:00
a421bde679 Fix up a few more bugs 2018-11-18 17:29:35 -05:00
1f58d61cb0 Rewrite DNSAggregatorInstance to handle DNS well
Trying to AXFR directly from dnsmasq is a mess: its zone is barely
spec-compliant, it doesn't support NOTIFY, and it is generally awkward
to work with.

This implements an advanced "AXFR parser" system, which inspects the
results of an AXFR from each network's local dnsmasq instance and
updates the real replicated MariaDB pdns backend cluster with any
changed data. This yields a sensible, transferable zone with its own
SOA that is dynamically updated as hosts come and go from the dnsmasq
zone.
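At its core, such a parser takes a set difference between what dnsmasq currently serves and what the backend holds. The sketch makes a few assumptions:

- Records have already been fetched and normalized into hashable tuples. Fetching the AXFR itself could be done with a library such as dnspython and is not shown.
- `Record`, `diff_zone()` and `next_serial()` are hypothetical names.

SOA records are skipped because the aggregated zone keeps its own SOA. Its serial would be bumped, with 32-bit wraparound, whenever anything changes.

```python
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    name: str
    rtype: str
    content: str
    ttl: int


def diff_zone(axfr_records, backend_records) -> Tuple[List[Record], List[Record]]:
    """Return (to_add, to_remove) so the backend matches the dnsmasq AXFR.

    SOA records are ignored on both sides: the aggregated zone keeps its own SOA.
    """
    wanted = {r for r in axfr_records if r.rtype != "SOA"}
    current = {r for r in backend_records if r.rtype != "SOA"}
    return sorted(wanted - current), sorted(current - wanted)


def next_serial(serial: int) -> int:
    """Increment an SOA serial, wrapping at 2**32 as SOA serials are 32-bit."""
    return (serial + 1) % 2**32
```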
2018-11-18 16:45:52 -05:00
b1d0b6e62f Fix up the remaining DHCPv6 setup 2018-11-18 00:55:34 -05:00
4c1e1b4622 Make everything work with dual-stack 2018-11-14 00:26:52 -05:00
a2f4102cb5 Add crush weight and reweight output 2018-11-01 23:17:38 -04:00
9fcce4b09a Support setting a CRUSH weight on new OSDs 2018-11-01 23:03:27 -04:00
2ea8a14ba4 Support OSD out/in and commands 2018-11-01 22:08:11 -04:00
99fcb21e3b Support adding and removing Ceph pools 2018-10-31 23:38:17 -04:00
3e4a6086d5 Finish up Ceph OSD removal, add locking to commands 2018-10-30 22:41:44 -04:00
89a3e0c7ee Rename some entries for consistency 2018-10-30 09:17:41 -04:00
bfbe9188ce Finish setup of Ceph OSD addition and basic management 2018-10-29 17:51:25 -04:00
939532c293 Show ceph health status in keepalive message 2018-10-27 18:24:27 -04:00
4422eb8941 Write Ceph status data to ZK 2018-10-27 18:04:55 -04:00
0c67812fc2 Fix shutdown secondary bug 2018-10-27 16:33:29 -04:00
d727f91c06 Fix typo 2018-10-25 23:38:49 -04:00
3e2a6b8e80 Better handle termination; remove cluster info from keepalive printout 2018-10-25 22:21:40 -04:00
fd27d3f544 Add and remove dnsaggregator nets on primary change 2018-10-25 22:09:32 -04:00
2cdd98d0f1 I do have to restart Kazoo during the SUSPENDED fail 2018-10-22 23:11:04 -04:00