Commit Graph

257 Commits

Author SHA1 Message Date
Joshua Boniface 83b8ce7b62 Bump version to 0.9.69 (nice) 2023-08-29 22:02:13 -04:00
Joshua Boniface 83d475bd15 Bump version to 0.9.68 2023-08-27 20:59:23 -04:00
Joshua Boniface 705ec802a3 Bump version to 0.9.67 2023-08-27 14:47:20 -04:00
Joshua Boniface 0b90f37518 Bump version to 0.9.66 2023-08-27 11:41:22 -04:00
Joshua Boniface 1e083d7652 Bump version to 0.9.65 2023-08-23 01:56:57 -04:00
Joshua Boniface 075dbe7cc9 Bump version to 0.9.64 2023-08-18 12:34:27 -04:00
Joshua Boniface 3a90fda109 Bump version to 0.9.63 2023-04-28 14:47:04 -04:00
Joshua Boniface 45ad3b9a17 Bump version to 0.9.62 2023-02-22 18:13:45 -05:00
Joshua Boniface 3c742a827b Initial implementation of monitoring plugin system 2023-02-13 12:06:26 -05:00
Joshua Boniface aeb238f43c Bump version to 0.9.61 2023-02-08 10:08:05 -05:00
Joshua Boniface a49510ecc8 Bump version to 0.9.60 2022-12-06 15:42:55 -05:00
Joshua Boniface 92feeefd26 Bump version to 0.9.59 2022-11-15 15:50:15 -05:00
Joshua Boniface 095bcb2373 Bump version to 0.9.58 2022-11-07 12:27:48 -05:00
Joshua Boniface d65f512897 Bump version to 0.9.57 2022-11-06 01:39:50 -04:00
Joshua Boniface c3bc55eff8 Bump version to 0.9.56 2022-10-27 14:21:04 -04:00
Joshua Boniface 726d0a562b Update copyright header year 2022-10-06 11:55:27 -04:00
Joshua Boniface f1df1cfe93 Bump version to 0.9.55 2022-10-04 13:21:40 -04:00
Joshua Boniface 239c392892 Bump version to 0.9.54 2022-08-23 11:01:05 -04:00
Joshua Boniface 9b499b9f48 Bump version to 0.9.53 2022-08-12 17:47:11 -04:00
Joshua Boniface 2a21d48128 Bump version to 0.9.52 2022-08-12 11:09:25 -04:00
Joshua Boniface 645b525ad7 Bump version to 0.9.51 2022-07-25 23:25:41 -04:00
Joshua Boniface 932b3c55a3 Bump version to 0.9.50 2022-07-06 16:01:14 -04:00
Joshua Boniface 51ad2058ed Bump version to 0.9.49 2022-05-06 15:49:39 -04:00
Joshua Boniface 464f0e0356 Store additional OSD information in ZK
Ensures that information like the FSIDs and the OSD LVM volume are
stored in Zookeeper at creation time and updated at daemon start time
(to ensure the data is populated at least once, or if the /dev/sdX
path changes).

This will allow safer operation of OSD removals and the potential
implementation of re-activation after node replacements.
2022-05-02 12:11:39 -04:00
Joshua Boniface 5807351405 Bump version to 0.9.48 2022-04-29 15:03:52 -04:00
Joshua Boniface ea709f573f Bump version to 0.9.47 2021-12-28 22:03:08 -05:00
Joshua Boniface 58d57d7037 Bump version to 0.9.46 2021-12-28 15:02:14 -05:00
Joshua Boniface 00d2c67c41 Allow single-node clusters to restart and timeout
Prevents a daemon from waiting forever to terminate if it is primary,
and avoids this entirely if there is only a single node in the cluster.
2021-12-28 03:06:03 -05:00
Joshua Boniface f164d898c1 Bump version to 0.9.45 2021-11-25 09:34:20 -05:00
Joshua Boniface 817dffcf30 Bump version to 0.9.44 2021-11-11 16:20:38 -05:00
Joshua Boniface 6e9fcd38a3 Bump version to 0.9.43 2021-11-08 02:29:17 -05:00
Joshua Boniface c41664d2da Reformat code with Black code formatter
Unify the code style along PEP and Black principles using the tool.
2021-11-06 03:02:43 -04:00
Joshua Boniface e88147db4a Bump version to 0.9.42 2021-10-12 15:25:42 -04:00
Joshua Boniface f13cc04b89 Bump version to 0.9.41 2021-10-09 19:39:21 -04:00
Joshua Boniface 95e01f38d5 Adjust log type of object setup message 2021-10-09 19:23:12 -04:00
Joshua Boniface c27359c4bf Bump version to 0.9.40 2021-10-07 14:42:04 -04:00
Joshua Boniface 46078932c3 Correct bad stop_keepalive_timer call 2021-10-07 14:41:12 -04:00
Joshua Boniface bdb9db8375 Bump version to 0.9.39 2021-10-07 11:52:38 -04:00
Joshua Boniface da9248cfa2 Bump version to 0.9.38 2021-10-03 22:32:41 -04:00
Joshua Boniface 23977b04fc Bump version to 0.9.37 2021-09-30 02:08:14 -04:00
Joshua Boniface 1de069298c Fix missing character in log message 2021-09-27 00:49:43 -04:00
Joshua Boniface d0f3e9e285 Bump version to 0.9.36 2021-09-23 14:01:38 -04:00
Joshua Boniface df277edf1c Move console watcher stop try up
Could cause an exception if d_domain is not defined yet.
2021-09-22 16:02:04 -04:00
Joshua Boniface 772807deb3 Bump version to 0.9.35 2021-09-13 02:20:46 -04:00
Joshua Boniface be954c1625 Don't crash cleanup if no this_node 2021-08-29 03:52:18 -04:00
Joshua Boniface 694b8e85a0 Bump version to 0.9.34 2021-08-24 16:15:25 -04:00
Joshua Boniface a18cef5f25 Bump version to 0.9.33 2021-08-21 03:28:48 -04:00
Joshua Boniface afb0359c20 Refactor pvcnoded to reduce Daemon.py size
This branch commit refactors the pvcnoded component to better adhere to
good programming practices. The previous Daemon.py was a massive file
which contained almost 2000 lines of direct, root-level code which was
directly imported. Not only was this poor practice, but this resulted
in a nigh-unmaintainable file which was hard even for me to understand.

This refactoring splits a large section of the code from Daemon.py into
separate small modules and functions in the `util/` directory. This will
hopefully make most of the functionality easy to find and modify without
having to dig through a single large file.

Further the existing subcomponents have been moved to the `objects/`
directory which clearly separates them.

Finally, the Daemon.py code has mostly been moved into a function,
`entrypoint()`, which is then called from the `pvcnoded.py` stub.

An additional item is that most format strings have been replaced by
f-strings to make use of the Python 3.6 features in Daemon.py and the
utility files.
2021-08-21 03:14:22 -04:00
Joshua Boniface afdf254297 Bump version to 0.9.32 2021-08-19 12:37:58 -04:00
Joshua Boniface 42e776fac1 Properly handle exceptions getting VM stats 2021-08-19 12:36:31 -04:00
Joshua Boniface 7ecc6a2635 Bump version to 0.9.31 2021-07-30 12:08:12 -04:00
Joshua Boniface 2a99a27feb Bump version to 0.9.30 2021-07-20 00:01:45 -04:00
Joshua Boniface fa1d93e933 Bump version to 0.9.29 2021-07-19 16:55:41 -04:00
Joshua Boniface 6ead21a308 Handle cleanup from a failure properly 2021-07-19 12:39:13 -04:00
Joshua Boniface b7c8c2ee3d Fix handling of this_node and d_domain in cleanup 2021-07-19 12:36:35 -04:00
Joshua Boniface d48f58930b Use harder exits and add cleanup termination 2021-07-19 12:27:16 -04:00
Joshua Boniface 7c36388c8f Add post-networking delay and adjust daemon delay 2021-07-19 12:23:45 -04:00
Joshua Boniface 71e4d0b32a Bump version to 0.9.28 2021-07-19 09:29:34 -04:00
Joshua Boniface 15d92c483f Bump version to 0.9.27 2021-07-19 00:03:40 -04:00
Joshua Boniface 602093029c Bump version to 0.9.26 2021-07-18 20:49:52 -04:00
Joshua Boniface b770e15a91 Fix final termination of logger
We need to do a bit more finagling with the logger on termination to
ensure that all messages are written and the queue drained before
actually terminating.
2021-07-18 19:53:00 -04:00
Joshua Boniface e23a65128a Remove del of logger item 2021-07-18 19:03:47 -04:00
Joshua Boniface 3a2478ee0c Cleanly terminate logger on cleanup 2021-07-18 18:57:44 -04:00
Joshua Boniface 323c7c41ae Implement node logging into Zookeeper
Adds the ability to send node daemon logs to Zookeeper to facilitate a
command like "pvc node log", similar to "pvc vm log". Each node stores
its logs in a separate tree under "/logs" which can then be combined or
queried. By default, set by config, only 2000 lines are kept.
2021-07-18 17:11:43 -04:00
Joshua Boniface cd1db3d587 Ensure node name is part of confing 2021-07-18 16:38:58 -04:00
Joshua Boniface 2e9f6ac201 Bump version to 0.9.25 2021-07-11 23:19:09 -04:00
Joshua Boniface f09849bedf Don't overwrite shutdown state on termination
Just a minor quibble and not really impactful.
2021-07-11 23:18:14 -04:00
Joshua Boniface c76149141f Only log ZK connections when persistent
Prevents spam in the API logs.
2021-07-10 23:35:49 -04:00
Joshua Boniface f00c4d07f4 Add date output to keepalive
Helps track when there is a log follow in "-o cat" mode.
2021-07-10 23:24:59 -04:00
Joshua Boniface 20b66c10e1 Move two more commands to Rados library 2021-07-10 17:28:42 -04:00
Joshua Boniface cfeba50b17 Revert "Return to all command-based Ceph gathering"
This reverts commit 65d14ccd92.

This was actually a bad idea. For inexplicable reasons, running these
Ceph commands manually (not even via Python, but in a normal shell)
takes 7 * two orders of magnitude longer than running them with the
Rados module, so long in fact that some basic commands like "ceph
health" would sometimes take longer than the 1 second timeout to
complete. The Rados commands would however take about 1ms instead.

Despite the occasional issues when monitors drop out, the Rados module
is clearly far superior to the shell commands for any moderately-loaded
Ceph cluster. We can look into solving timeouts another way (perhaps
with Processes instead of Threads) at a later time.

Rados module "ceph health":
    b'{"checks":{},"status":"HEALTH_OK"}'
    0.001204 (s)
    b'{"checks":{},"status":"HEALTH_OK"}'
    0.001258 (s)
Command "ceph health":
    joshua@hv1.c.bonilan.net ~ $ time ceph health >/dev/null
    real    0m0.772s
    user    0m0.707s
    sys     0m0.046s
    joshua@hv1.c.bonilan.net ~ $ time ceph health >/dev/null
    real    0m0.796s
    user    0m0.728s
    sys     0m0.054s
2021-07-10 03:47:45 -04:00
Joshua Boniface 551bae2518 Bump version to 0.9.24 2021-07-09 15:58:36 -04:00
Joshua Boniface 2b5dc286ab Correct failure to get ceph_health data 2021-07-09 13:10:28 -04:00
Joshua Boniface 330cf14638 Remove return statements in keepalive collectors
These seem to bork the keepalive timer process, so just remove them and
let it continue to press on.
2021-07-09 13:04:17 -04:00
Joshua Boniface 65d14ccd92 Return to all command-based Ceph gathering
Using the Rados module was very problematic, specifically because it had
no sensible timeout parameters and thus would hang for many seconds.
This has poor implications since it blocks further keepalives.

Instead, remove the Rados usage entirely and go back completely to using
manual OS commands to gather this information. While this may cause PID
exhaustion more quickly it's worthwhile to avoid failure scenarios when
Ceph stats time out.

Closes #137
2021-07-06 11:30:45 -04:00
Joshua Boniface 7082982a33 Bump version to 0.9.23 2021-07-05 23:40:32 -04:00
Joshua Boniface 5b6ef71909 Ensure daemon mode is updated on startup
Fixes the side effect of the previous bug during deploys of 0.9.22.
2021-07-05 23:39:23 -04:00
Joshua Boniface 37cd278bc2 Bump version to 0.9.22 2021-07-05 14:18:51 -04:00
Joshua Boniface a69105569f Add node PVC version data to Node information
Allows API client to see the currently-active version of the node
daemon.
2021-07-05 09:57:38 -04:00
Joshua Boniface f12de6727d Adjust logo slightly and add debug state 2021-07-02 02:32:08 -04:00
Joshua Boniface e94f5354e6 Update startup messages with new ASCII logo 2021-07-02 02:21:30 -04:00
Joshua Boniface c51023ba81 Add profiler to keepalive function 2021-07-02 01:55:15 -04:00
Joshua Boniface 39e82ee426 Cast base schema version to int
Or all our comparisons will fail later and nodes can't start.
2021-06-30 09:40:33 -04:00
Joshua Boniface fe0a1d582a Bump version to 0.9.21 2021-06-29 19:21:31 -04:00
Joshua Boniface e623909a43 Store PHY MAC for VFs and restore after free 2021-06-22 00:56:47 -04:00
Joshua Boniface eeb83da97d Add support for SR-IOV NICs to VMs 2021-06-21 23:18:22 -04:00
Joshua Boniface 64d1a37b3c Add PCIe device paths to SR-IOV VF information
This will be used when adding VM network interfaces of type hostdev.
2021-06-21 21:08:46 -04:00
Joshua Boniface 13cc0f986f Implement SR-IOV VF config set
Also fixes some random bugs, adds proper interface sorting, and assorted
tweaks.
2021-06-21 18:40:11 -04:00
Joshua Boniface ca11dbf491 Sort the list of VFs for easier parsing 2021-06-21 01:40:05 -04:00
Joshua Boniface e8bd1bf2c4 Ensure used/used_by are set on creation 2021-06-21 01:25:38 -04:00
Joshua Boniface 57b041dc62 Ensure default for vLAN and QOS is 0 not empty 2021-06-17 01:54:37 -04:00
Joshua Boniface 5607a6bb62 Avoid overwriting VF data
Ensures that the configuration of a VF is not overwritten in Zookeeper
on a node restart. The SRIOVVFInstance handlers were modified to start
with None values, so that the DataWatch statements will always trigger
updates to the live system interfaces on daemon startup, thus ensuring
that the config stored in Zookeeper is applied to the system on startup
(mostly relevant after a cold boot or if the API changes them during a
daemon restart).
2021-06-17 01:45:22 -04:00
Joshua Boniface 8f1af2a642 Ignore hostdev interfaces in VM net stat gathering
Prevents errors if a SR-IOV hostdev interface is configured until this
is more defined.
2021-06-17 01:33:11 -04:00
Joshua Boniface e7b6a3eac1 Implement SR-IOV PF and VF instances
Adds support for the node daemon managing SR-IOV PF and VF instances.

PFs are added to Zookeeper automatically based on the config at startup
during network configuration, and are otherwise completely static. PFs
are automatically removed from Zookeeper, along with all coresponding
VFs, should the PF phy device be removed from the configuration.

VFs are configured based on the (autocreated) VFs of each PF device,
added to Zookeeper, and then a new class instance, SRIOVVFInstance, is
used to watch them for configuration changes. This will enable the
runtime management of VF settings by the API. The set of keys ensures
that both configuration and details of the NIC can be tracked.

Most keys are self-explanatory, especially for PFs and the basic keys
for VFs. The configuration tree is also self-explanatory, being based
entirely on the options available in the `ip link set {dev} vf` command.

Two additional keys are also present: `used` and `used_by`, which will
be able to track the (boolean) state of usage, as well as the VM that
uses a given VIF. Since the VM side implementation will support both
macvtap and direct "hostdev" assignments, this will ensure that this
state can be tracked on both the VF and the VM side.
2021-06-17 01:33:03 -04:00
Joshua Boniface 0ad6d55dff Add initial SR-IOV support to node daemon
Adds configuration values for enabled flag and SR-IOV devices to the
configuration and sets up the initial SR-IOV configuration on daemon
startup (inserting the module, configuring the VF count, etc.).
2021-06-15 22:56:09 -04:00
Joshua Boniface 953e46055a Fix issue with loading None version schema 2021-06-14 21:09:55 -04:00
Joshua Boniface d2bcfe5cf7 Bump version to 0.9.20 2021-06-14 18:06:27 -04:00
Joshua Boniface 0a9c0c1ccb Use a nicer reload method on hot schema update
Instead of exiting and trusting systemd to restart us, instead leverage
the os.execv() call to reload the process in the current PID context.

Also improves the log messages so it's very clear what's going on.
2021-06-14 17:10:21 -04:00
Joshua Boniface e34a7d4d2a Handle hot reloads properly
A hot reload isn't possible due to DataWatch and ChildrenWatch
constructs, so we instead need to terminate the daemon to "apply" the
schema update. Thus we use exit code 150 (Application defined in LSB)
and reorder some of the elements of the schema validation to ensure
things happen in the right order.
2021-06-14 12:52:43 -04:00
Joshua Boniface b694945010 Fix incorrect name bug 2021-06-10 01:11:14 -04:00