55 Commits

Author SHA1 Message Date
Joshua Boniface 102c3c3106 Port all Celery worker functions to discrete pkg
Moves all tasks run by the Celery worker into a discrete package/module
for easier installation. Also adjusts several parameters throughout to
accomplish this.
2023-11-30 02:24:54 -05:00
Joshua Boniface 41f4e4fb2f Split health monitoring into discrete daemon/pkg 2023-11-29 21:21:51 -05:00
Joshua Boniface bd811408f9 Remove "Python 3" from package descriptions 2023-11-29 15:12:09 -05:00
Joshua Boniface 42ed6f6420 Remove redis as a dependency 2023-11-05 18:23:34 -05:00
Joshua Boniface 3dc1f57de2 Revert "Switch to ZK+PG over Redis for Celery queue"
This reverts commit 54215bab6c.
2023-11-05 17:10:46 -05:00
Joshua Boniface 54215bab6c Switch to ZK+PG over Redis for Celery queue
Redis did not provide a distributed solution for the worker, which
precluded several important planned functions. So instead, move to using
Zookeeper + PostgreSQL as the broker and result backend respectively.

Should be a seamless drop-in change but for future uses requires the
database host to be the primary coordinator IP rather than localhost, so
that writes can occur to the database from non-primary hosts.
2023-11-04 12:46:34 -04:00
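
As a rough illustration of the broker/result-backend split described in 54215bab6c, the sketch below builds the two connection URLs a Celery app would be given. The `zookeeper://` and `db+postgresql://` schemes are, to my understanding, the forms Celery/Kombu accept for a ZooKeeper broker and an SQLAlchemy database result backend. The function name, IP address, ports, credentials, and database name are hypothetical placeholders, not values taken from PVC.

```python
from urllib.parse import quote


def build_celery_urls(coordinator_ip, pg_user, pg_password, pg_db,
                      zk_port=2181, pg_port=5432):
    """Return hypothetical broker and result-backend URLs for a Celery app.

    Both URLs point at the primary coordinator IP rather than localhost,
    so that non-primary hosts can also reach the queue and the database.
    """
    broker_url = f"zookeeper://{coordinator_ip}:{zk_port}/"
    # Percent-encode the credentials so characters like "/" or "@" are safe.
    user = quote(pg_user, safe="")
    password = quote(pg_password, safe="")
    result_backend = (
        f"db+postgresql://{user}:{password}@{coordinator_ip}:{pg_port}/{pg_db}"
    )
    return {"broker_url": broker_url, "result_backend": result_backend}


# Placeholder values for illustration only.
urls = build_celery_urls("10.0.0.1", "pvc", "s3cr3t/pw", "pvcapi")
print(urls["broker_url"])
print(urls["result_backend"])
```

The returned dictionary uses Celery's `broker_url` and `result_backend` setting names, so it could be passed to an app's configuration; whether PVC wired it up this way is not stated in the commit.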
Joshua Boniface 9ba7aa5b08 [Bookworm] Remove obsolete package 2023-08-31 14:13:05 -04:00
Joshua Boniface 32613ff119 Remove obsolete Suggests lines from control 2021-07-20 00:35:21 -04:00
Joshua Boniface cfeba50b17 Revert "Return to all command-based Ceph gathering"
This reverts commit 65d14ccd92.

This was actually a bad idea. For inexplicable reasons, running these
Ceph commands manually (not even via Python, but in a normal shell)
takes roughly 700 times longer (between two and three orders of
magnitude) than running them with the Rados module, so long in fact
that some basic commands like "ceph health" would sometimes take longer
than the 1 second timeout to complete. The Rados commands, however,
would take about 1ms instead.

Despite the occasional issues when monitors drop out, the Rados module
is clearly far superior to the shell commands for any moderately-loaded
Ceph cluster. We can look into solving timeouts another way (perhaps
with Processes instead of Threads) at a later time.

Rados module "ceph health":
    b'{"checks":{},"status":"HEALTH_OK"}'
    0.001204 (s)
    b'{"checks":{},"status":"HEALTH_OK"}'
    0.001258 (s)
Command "ceph health":
    joshua@hv1.c.bonilan.net ~ $ time ceph health >/dev/null
    real    0m0.772s
    user    0m0.707s
    sys     0m0.046s
    joshua@hv1.c.bonilan.net ~ $ time ceph health >/dev/null
    real    0m0.796s
    user    0m0.728s
    sys     0m0.054s
2021-07-10 03:47:45 -04:00
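
The slowdown can be checked directly from the timings quoted in cfeba50b17. The short sketch below only does that arithmetic: the four numbers are copied from the commit message (the Rados call durations and the `real` wall-clock times of the shell command), and the variable names are made up for illustration.

```python
from statistics import mean

# Timings copied from the commit message, in seconds.
rados_seconds = [0.001204, 0.001258]   # Rados module "ceph health"
command_seconds = [0.772, 0.796]       # "real" time of the shell command

# Ratio of the average shell-command time to the average Rados time.
slowdown = mean(command_seconds) / mean(rados_seconds)
print(f"shell command is about {slowdown:.0f}x slower than the Rados call")
```

The result lands between two and three orders of magnitude, which is in line with the commit's description.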
Joshua Boniface 65d14ccd92 Return to all command-based Ceph gathering
Using the Rados module was very problematic, specifically because it had
no sensible timeout parameters and thus would hang for many seconds.
This has poor implications since it blocks further keepalives.

Instead, remove the Rados usage entirely and go back completely to using
manual OS commands to gather this information. While this may cause PID
exhaustion more quickly, it's worthwhile to avoid failure scenarios when
Ceph stats time out.

Closes #137
2021-07-06 11:30:45 -04:00
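
One reason OS commands are easier to bound than a library call is that the standard library can enforce a timeout on a child process. The following is a minimal sketch of that idea, not PVC's actual code: `run_with_timeout` is a hypothetical helper, and small Python one-liners stand in for `ceph health` so the example runs without a Ceph cluster.

```python
import subprocess
import sys


def run_with_timeout(command, timeout=1.0):
    """Run a command; return (returncode, stdout), or (None, "") on timeout."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so nothing is
        # left hanging to block later keepalives.
        return None, ""
    return result.returncode, result.stdout


# Stand-ins for a fast and a hung "ceph health" invocation.
fast = run_with_timeout([sys.executable, "-c", "print('HEALTH_OK')"], timeout=10)
slow = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5
)
print(fast, slow)
```

Each call still forks a new process, which is the PID-consumption cost the commit message accepts as a trade-off.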
Joshua Boniface d2c0d868c4 Add gevent to node daemon
Required for the Metadata API instance.
2020-10-27 02:42:49 -04:00
Joshua Boniface fbbdb209c3 Remove Python OpenSSL dependency
Not actually required for the SSL configuration.
2020-10-26 02:02:15 -04:00
Joshua Boniface f85c2c2a75 Remove PyWSGI and move to Flask server
Gevent was a complete failure. The API would block during large file
uploads with no obvious solutions beyond "use gunicorn", which is not
suited to this. I originally had this working with the Flask "debug"
server, so just move to using that all the time. SSL is added using a
custom context with the OpenSSL library, so include that as a
dependency.
2020-10-26 01:58:43 -04:00
Joshua Boniface 4fbec63bf4 Add missing dependency for CLI 2020-08-27 13:14:46 -04:00
Joshua Boniface 887e14a4e2 Add storage benchmarking to API 2020-08-25 01:57:21 -04:00
Joshua Boniface 598b2025e8 Use Rados and add Ceph entries to pvcnoded.yaml 2020-06-06 21:12:51 -04:00
Joshua Boniface 4417bd374b Add Python requests toolbelt to CLI deps 2020-02-20 23:27:07 -05:00
Joshua Boniface 560cb609ba Add database management with SQLAlchemy
Add management of the pvcprov database with SQLAlchemy, to allow
seamless management of the database. Add automatic tasks to the postinst
of the API to execute these migrations.
2020-02-15 22:51:27 -05:00
Joshua Boniface bd8536d9d1 Add OVA upload to API (initial)
Initial, very barebones OVA parsing and image creation.

References #71
2020-02-15 02:10:14 -05:00
Joshua Boniface 83704d8677 Adjust package descriptions
References #79
2020-02-08 19:01:01 -05:00
Joshua Boniface 97e318a2ca Align naming of Debian packages
Rename pvc-daemon to pvc-daemon-node and pvc-api to pvc-daemon-api.

Closes #79
2020-02-08 18:58:56 -05:00
Joshua Boniface 4505b239eb Rename API and common Debian packages
Closes #79
2020-02-08 18:50:38 -05:00
Joshua Boniface b6474198a4 Implement cluster maintenance mode
Implements a "maintenance mode" for PVC clusters. For now, the only
thing this mode does is disable node fencing while the state is true.
This allows the administrator to tell PVC that network connectivity,
etc. might be interrupted and to avoid fencing nodes.

Closes #70
2020-01-09 10:53:27 -05:00
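
The behaviour described in b6474198a4 amounts to a single guard in front of the fencing decision. A minimal sketch, assuming a boolean cluster-wide maintenance flag; the function and parameter names are hypothetical and do not come from the PVC source:

```python
def should_fence(node_is_dead, maintenance_mode):
    """Decide whether a node may be fenced.

    A node is fenced only when it is considered dead AND the cluster is
    not in maintenance mode; maintenance mode suppresses all fencing.
    """
    return node_is_dead and not maintenance_mode


# A dead node is fenced normally, but left alone during maintenance.
print(should_fence(node_is_dead=True, maintenance_mode=False))
print(should_fence(node_is_dead=True, maintenance_mode=True))
```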
Joshua Boniface f4ef08df49 Add lxml dependency for pretty parsing of VM XML 2019-12-29 16:33:50 -05:00
Joshua Boniface f5436ed8a9 Change dependencies for CLI client 2019-12-29 16:33:50 -05:00
Joshua Boniface e4c96ee43d Add flask-restful dependency 2019-12-24 10:48:15 -05:00
Joshua Boniface eecc07b731 Depend daemons on systemd
Numerous parts of PVC call systemctl commands or otherwise require a
functioning systemd-based system. Make the dependencies explicitly
reflect this.
2019-12-19 19:04:25 -05:00
Joshua Boniface 355e16e23a Add missing dependencies 2019-12-18 11:56:22 -05:00
Joshua Boniface 2b5c134970 Add missing distutils dep 2019-12-15 13:53:22 -05:00
Joshua Boniface b3e21a5bf8 Integrate metadata API into node daemon 2019-12-14 16:41:01 -05:00
Joshua Boniface 0727a7f6ed Move all provisioner API functionality into main 2019-12-14 14:12:55 -05:00
Joshua Boniface 57e8fba602 Add provisioner to Debian packages 2019-12-09 10:40:27 -05:00
Joshua Boniface c638bdeaee Add configuration file, authentication, pywsgi 2019-07-06 02:04:26 -04:00
Joshua Boniface a480048d36 Add flask dependency to API client 2019-07-05 23:24:27 -04:00
Joshua Boniface 0a96e26bc6 Clean up Debian control and add API package 2019-07-05 22:22:28 -04:00
Joshua Boniface d59280d829 Update dependencies for Postgres 2019-05-22 21:57:06 -04:00
Joshua Boniface 595cf1782c Switch DNS aggregator to PostgreSQL
MariaDB+Galera was terribly unstable, with the cluster failing to
start or dying randomly, and generally seemed incredibly unsuitable
for an HA solution. This commit switches the DNS aggregator SQL
backend to PostgreSQL, implemented via Patroni HA.

It also manages the Patroni state, forcing the primary instance to
follow the PVC coordinator, such that the active DNS Aggregator
instance is always able to communicate read+write with the local
system.

This required some logic changes to how the DNS Aggregator worked,
specifically ensuring that database changes aren't attempted while
the instance isn't actively running. To be honest, this was a bug
anyway that had just never been noticed.

Closes #34
2019-05-21 01:07:41 -04:00
Joshua Boniface 2459c3e475 Add dependency for vlan support 2019-03-15 21:15:17 -04:00
Joshua Boniface fb3cf827a2 Add further client deps 2019-03-12 23:10:52 -04:00
Joshua Boniface 5be1cdc40a Support YAML in the client and update configfile 2019-03-12 22:55:29 -04:00
Joshua Boniface ebd28ecef0 Add YAML to server dependencies 2019-03-12 22:54:42 -04:00
Joshua Boniface 1f58d61cb0 Rewrite DNSAggregatorInstance to handle DNS well
Trying to directly AXFR from dnsmasq is a mess, since its zone is
barely compliant with spec, it doesn't support notifies, and it is
generally really messy.

This implements an advanced "AXFR parser" system, which looks at the
results of an AXFR from the local dnsmasq instances per-network, and
updates the real replicated MariaDB pdns backend cluster with the
changed data. This allows a sensible, transferable zone with its own
SOA that is dynamically reconfigured as hosts come and go from the
dnsmasq zone.
2018-11-18 16:45:52 -05:00
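
The core of such an "AXFR parser" is a comparison between what dnsmasq currently reports and what the database holds. The sketch below shows one way that comparison could look, under stated assumptions: records are modelled as `(name, type, content)` tuples, SOA records are skipped on both sides so the database keeps its own SOA, and the function name and sample data are invented for illustration. It does not perform a real AXFR or touch a database.

```python
def diff_records(axfr_records, database_records):
    """Return (to_add, to_remove) so the database matches the AXFR result.

    SOA records are ignored on both sides (an assumption of this sketch),
    since the replicated zone is meant to carry its own SOA.
    """
    axfr = {r for r in axfr_records if r[1] != "SOA"}
    stored = {r for r in database_records if r[1] != "SOA"}
    to_add = sorted(axfr - stored)      # hosts that appeared in dnsmasq
    to_remove = sorted(stored - axfr)   # hosts that disappeared
    return to_add, to_remove


# Invented sample data: host3 has appeared, host2 has gone away.
axfr_result = [
    ("example.tld", "SOA", "dnsmasq-generated"),
    ("host1.example.tld", "A", "10.0.0.11"),
    ("host3.example.tld", "A", "10.0.0.13"),
]
database_rows = [
    ("example.tld", "SOA", "database-managed"),
    ("host1.example.tld", "A", "10.0.0.11"),
    ("host2.example.tld", "A", "10.0.0.12"),
]
to_add, to_remove = diff_records(axfr_result, database_rows)
print(to_add, to_remove)
```

Using set differences keeps each pass limited to the changed records, so unchanged hosts generate no database writes.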
Joshua Boniface 8dca8c1cd2 Add paramiko to dependencies 2018-10-27 15:54:05 -04:00
Joshua Boniface 605889f59f Add some additional dependencies 2018-10-14 18:36:02 -04:00
Joshua Boniface f198f62563 Massive rejigger into single daemon
Completely restructure the daemon code to move the 4 discrete daemons
into a single daemon that can be run on every hypervisor. Introduce the
idea of a static list of "coordinator" nodes which are configured at
install time to run Zookeeper and FRR in router mode, and which are
allowed to take on client network management duties (gateway, DHCP, DNS,
etc.) while also allowing them to run VMs (i.e. no dedicated "router"
nodes required).
2018-10-14 02:40:54 -04:00
Joshua Boniface 0f9637cb69 Make the IP failover work including threaded background os commands 2018-09-24 04:08:35 -04:00
Joshua Boniface 4ba2eea4ed Add router daemon 2018-09-23 15:26:41 -04:00
Joshua Boniface a111eff9cd Update suggested client packages 2018-09-23 01:17:55 -04:00
Joshua Boniface 513de96626 Major refactor to separate out and standardize libraries 2018-09-20 03:43:34 -04:00
Joshua Boniface c3d37701e7 Add network daemon to manage VXLAN VNIs on hypervisors 2018-09-14 12:07:41 -04:00