This reverts commit 65d14ccd92.
This was actually a bad idea. For inexplicable reasons, running these
Ceph commands manually (not even via Python, but in a normal shell)
takes 7 * two orders of magnitude longer than running them with the
Rados module, so long in fact that some basic commands like "ceph
health" would sometimes take longer than the 1 second timeout to
complete. The Rados commands would however take about 1ms instead.
Despite the occasional issues when monitors drop out, the Rados module
is clearly far superior to the shell commands for any moderately-loaded
Ceph cluster. We can look into solving timeouts another way (perhaps
with Processes instead of Threads) at a later time.
Rados module "ceph health":
b'{"checks":{},"status":"HEALTH_OK"}'
0.001204 (s)
b'{"checks":{},"status":"HEALTH_OK"}'
0.001258 (s)
Command "ceph health":
joshua@hv1.c.bonilan.net ~ $ time ceph health >/dev/null
real 0m0.772s
user 0m0.707s
sys 0m0.046s
joshua@hv1.c.bonilan.net ~ $ time ceph health >/dev/null
real 0m0.796s
user 0m0.728s
sys 0m0.054s
Using the Rados module was very problematic, specifically because it had
no sensible timeout parameters and thus would hang for many seconds.
This has poor implications since it blocks further keepalives.
Instead, remove the Rados usage entirely and go back completely to using
manual OS commands to gather this information. While this may cause PID
exhaustion more quickly it's worthwhile to avoid failure scenarios when
Ceph stats time out.
Closes#137
Also fixes up the Debian packaging such that this works how I would
want, with proper module installation while leaving everything else
untouched. Finally implements automatic installation and removal of the
BASH completion for the PVC command.
Gevent was completely failure. The API would block during large file
uploads with no obvious solutions beyond "use gunicorn", which is not
suited to this. I originally had this working with the Flask "debug"
server, so just move to using that all the time. SSL is added using a
custom context with the OpenSSL library, so include that as a
dependency.
Add management of the pvcprov database with SQLAlchemy, to allow
seamless management of the database. Add automatic tasks to the postinst
of the API to execute these migrations.
Rename "pvcd" to "pvcnoded", and "pvc-api" to "pvcapid" so names for the
daemons are fully consistent. Update the names of the configuration
files as well to match this new formatting.
References #79
Implements a "maintenance mode" for PVC clusters. For now, the only
thing this mode does is disable node fencing while the state is true.
This allows the administrator to tell PVC that network connectivity,
etc. might be interrupted and to avoid fencing nodes.
Closes#70
Add a systemd service to manage node flush/unflush, useful during
system startup and shutdown to avoid requiring administrator
intervention for this to occur. This is optional and the service is
not enabled by default, and the postinst script informs the
administrator of this.
Also adds a systemd target to collect the two service units together
and provide an easy way to flush+shutdown or startup+unflush the
entire PVC system.
Closes#28
MariaDB+Galera was terribly unstable, with the cluster failing to
start or dying randomly, and generally seemed incredibly unsuitable
for an HA solution. This commit switches the DNS aggregator SQL
backend to PostgreSQL, implemented via Patroni HA.
It also manages the Patroni state, forcing the primary instance to
follow the PVC coordinator, such that the active DNS Aggregator
instance is always able to communicate read+write with the local
system.
This required some logic changes to how the DNS Aggregator worked,
specifically ensuring that database changes aren't attempted while
the instance isn't actively running - to be honest this was a bug
anyways that had just never been noticed.
Closes#34
1. Move to a YAML-based configuration format instead of the original
INI-based configuration to facilitate better organization and
readability.
2. Modify the daemon to be able to operate in several modes based
on configuration flags. Either networking or storage functions
can be disabled using the configuration, allowing the PVC system
to be used only for hypervisor management if required.