Commit Graph

578 Commits

Author SHA1 Message Date
Joshua Boniface c8134d3a1c Fix several bugs in fence handling
1. Output from ipmitool was not being stripped, and stray newlines were
throwing off the comparisons. Fixes this.

2. Several stages were lacking meaningful messages. Adds these in so the
output is more clear about what is going on.

3. Reduce the sleep time after a fence to just 1x the
keepalive_interval, rather than 2x, because this seemed like excessively
long even for slow IPMI interfaces, especially since we're checking the
power state now anyways.

4. Set the node daemon state to an explicit 'fenced' state after a
successful fence to indicate to users that the node was indeed fenced
successfully and not still 'dead'.
2021-09-26 20:07:30 -04:00
Joshua Boniface 9f41373324 Ensure pvc-flush is after network-online 2021-09-26 17:40:42 -04:00
Joshua Boniface 8e62d5b30b Fix typo in log message 2021-09-26 03:35:30 -04:00
Joshua Boniface 7df5b8e52e Fix typo in sgdisk command options 2021-09-26 00:59:05 -04:00
Joshua Boniface 6f96219023 Use re.search instead of re.match
Required since we're not matching the start of the string.
2021-09-26 00:55:29 -04:00
Joshua Boniface 51967e164b Raise basic exceptions in CephInstance
Avoids no exception to reraise errors on failures.
2021-09-26 00:50:10 -04:00
Joshua Boniface 7a3a44d47c Fix OSD creation for partition paths and fix gdisk
The previous implementation did not work with /dev/nvme devices or any
/dev/disk/by-* devices due to some logical failures in the partition
naming scheme, so fix these, and be explicit about what is supported in
the PVC CLI command output.

The 'echo | gdisk' implementation of partition creation also did not
work due to limitations of subprocess.run; instead, use sgdisk which
allows these commands to be written out explicitly and is included in
the same package as gdisk.
2021-09-26 00:12:28 -04:00
Joshua Boniface 44491dd988 Add support for configurable OSD DB ratios
The default of 0.05 (5%) is likely ideal in the initial implementation,
but allow this to be set explicitly for maximum flexibility in
space-constrained or performance-critical use-cases.
2021-09-24 01:06:39 -04:00
Joshua Boniface eba142f470 Bump version to 0.9.36 2021-09-23 14:01:38 -04:00
Joshua Boniface 6cef68d157 Add separate OSD DB device support
Adds in three parts:

1. Create an API endpoint to create OSD DB volume groups on a device.
Passed through to the node via the same command pipeline as
creating/removing OSDs, and creates a volume group with a fixed name
(osd-db).

2. Adds API support for specifying whether or not to use this DB volume
group when creating a new OSD via the "ext_db" flag. Naming and sizing
is fixed for simplicity and based on Ceph recommendations (5% of OSD
size). The Zookeeper schema tracks the block device to use during
removal.

3. Adds CLI support for the new and modified API endpoints, as well as
displaying the block device and DB block device in the OSD list.

While I debated supporting adding a DB device to an existing OSD, in
practice this ended up being a very complex operation involving stopping
the OSD and setting some options, so this is not supported; this can be
specified during OSD creation only.

Closes #142
2021-09-23 13:59:49 -04:00
Joshua Boniface e8caf3369e Move console watcher stop try up
Could cause an exception if d_domain is not defined yet.
2021-09-22 16:02:04 -04:00
Joshua Boniface 3e3776a25b Bump version to 0.9.35 2021-09-13 02:20:46 -04:00
Joshua Boniface 1b6d10e03a Handle VM disk/network stats gathering exceptions 2021-09-12 19:41:07 -04:00
Joshua Boniface 73c96d1e93 Add VM device hot attach/detach support
Adds a new API endpoint to support hot attach/detach of devices, and the
corresponding client-side logic to use this endpoint when doing VM
network/storage add/remove actions.

The live attach is now the default behaviour for these types of
additions and removals, and can be disabled if needed.

Closes #141
2021-09-12 19:33:00 -04:00
Joshua Boniface bc6395c959 Don't crash cleanup if no this_node 2021-08-29 03:52:18 -04:00
Joshua Boniface d582f87472 Change default node object state to flushed 2021-08-29 03:34:08 -04:00
Joshua Boniface e9735113af Bump version to 0.9.34 2021-08-24 16:15:25 -04:00
Joshua Boniface d3392c0282 Fix typo in output message 2021-08-23 00:39:19 -04:00
Joshua Boniface 560c013e95 Bump version to 0.9.33 2021-08-21 03:28:48 -04:00
Joshua Boniface 534c7cd7f0 Refactor pvcnoded to reduce Daemon.py size
This branch commit refactors the pvcnoded component to better adhere to
good programming practices. The previous Daemon.py was a massive file
which contained almost 2000 lines of direct, root-level code which was
directly imported. Not only was this poor practice, but this resulted
in a nigh-unmaintainable file which was hard even for me to understand.

This refactoring splits a large section of the code from Daemon.py into
separate small modules and functions in the `util/` directory. This will
hopefully make most of the functionality easy to find and modify without
having to dig through a single large file.

Further the existing subcomponents have been moved to the `objects/`
directory which clearly separates them.

Finally, the Daemon.py code has mostly been moved into a function,
`entrypoint()`, which is then called from the `pvcnoded.py` stub.

An additional item is that most format strings have been replaced by
f-strings to make use of the Python 3.6 features in Daemon.py and the
utility files.
2021-08-21 03:14:22 -04:00
Joshua Boniface 4014ef7714 Bump version to 0.9.32 2021-08-19 12:37:58 -04:00
Joshua Boniface 180f0445ac Properly handle exceptions getting VM stats 2021-08-19 12:36:31 -04:00
Joshua Boniface 7ecc6a2635 Bump version to 0.9.31 2021-07-30 12:08:12 -04:00
Joshua Boniface 3ab6365a53 Adjust receive output to show proper source 2021-07-22 15:43:08 -04:00
Joshua Boniface 2a99a27feb Bump version to 0.9.30 2021-07-20 00:01:45 -04:00
Joshua Boniface fa1d93e933 Bump version to 0.9.29 2021-07-19 16:55:41 -04:00
Joshua Boniface 6ead21a308 Handle cleanup from a failure properly 2021-07-19 12:39:13 -04:00
Joshua Boniface b7c8c2ee3d Fix handling of this_node and d_domain in cleanup 2021-07-19 12:36:35 -04:00
Joshua Boniface d48f58930b Use harder exits and add cleanup termination 2021-07-19 12:27:16 -04:00
Joshua Boniface 7c36388c8f Add post-networking delay and adjust daemon delay 2021-07-19 12:23:45 -04:00
Joshua Boniface 71e4d0b32a Bump version to 0.9.28 2021-07-19 09:29:34 -04:00
Joshua Boniface 15d92c483f Bump version to 0.9.27 2021-07-19 00:03:40 -04:00
Joshua Boniface 602093029c Bump version to 0.9.26 2021-07-18 20:49:52 -04:00
Joshua Boniface b770e15a91 Fix final termination of logger
We need to do a bit more finagling with the logger on termination to
ensure that all messages are written and the queue drained before
actually terminating.
2021-07-18 19:53:00 -04:00
Joshua Boniface e23a65128a Remove del of logger item 2021-07-18 19:03:47 -04:00
Joshua Boniface 3a2478ee0c Cleanly terminate logger on cleanup 2021-07-18 18:57:44 -04:00
Joshua Boniface 323c7c41ae Implement node logging into Zookeeper
Adds the ability to send node daemon logs to Zookeeper to facilitate a
command like "pvc node log", similar to "pvc vm log". Each node stores
its logs in a separate tree under "/logs" which can then be combined or
queried. By default, set by config, only 2000 lines are kept.
2021-07-18 17:11:43 -04:00
Joshua Boniface cd1db3d587 Ensure node name is part of confing 2021-07-18 16:38:58 -04:00
Joshua Boniface 75fb60b1b4 Add VM list filtering by tag
Uses same method as state or node filtering, rather than altering how
the main LIMIT field works.
2021-07-14 00:59:20 -04:00
Joshua Boniface c6d552ae57 Rework success checks for IPMI fencing
Previously, if the node failed to restart, it was declared a "bad fence"
and no further action would be taken. However, there are some
situations, for instance critical hardware failures, where intelligent
systems will not attempt (or succeed at) starting up the node in such a
case, which would result in dead, known-offline nodes without recovery.

Tweak this behaviour somewhat. The main path of Reboot -> Check On ->
Success + fence-flush is retained, but some additional side-paths are
now defined:

1. We attempt to power "on" the chassis 1 second after the reboot, just
in case it is off and can be recovered. We then wait another 2 seconds
and check the power status (as we did before).

2. If the reboot succeeded, follow this series of choices:

    a. If the chassis is on, the fence succeeded.

    b. If the chassis is off, the fence "succeeded" as well.

    c. If the chassis is in some other state, the fence failed.

3. If the reboot failed, follow this series of choices:

    a. If the chassis is off, the fence itself failed, but we can treat
    it as "succeeded"" since the chassis is in a known-offline state.
    This is the most likely situation when there is a critical hardware
    failure, and the server's IPMI does not allow itself to start back
    up again.

    b. If the chassis is in any other state ("on" or unknown), the fence
    itself failed and we must treat this as a fence failure.

Overall, this should alleviate the aforementioned issue of a critical
failure rendering the node persistently "off" not triggering a
fence-flush and ensure fencing is more robust.
2021-07-13 17:54:41 -04:00
Joshua Boniface 2e9f6ac201 Bump version to 0.9.25 2021-07-11 23:19:09 -04:00
Joshua Boniface f09849bedf Don't overwrite shutdown state on termination
Just a minor quibble and not really impactful.
2021-07-11 23:18:14 -04:00
Joshua Boniface c76149141f Only log ZK connections when persistent
Prevents spam in the API logs.
2021-07-10 23:35:49 -04:00
Joshua Boniface f00c4d07f4 Add date output to keepalive
Helps track when there is a log follow in "-o cat" mode.
2021-07-10 23:24:59 -04:00
Joshua Boniface 20b66c10e1 Move two more commands to Rados library 2021-07-10 17:28:42 -04:00
Joshua Boniface cfeba50b17 Revert "Return to all command-based Ceph gathering"
This reverts commit 65d14ccd92.

This was actually a bad idea. For inexplicable reasons, running these
Ceph commands manually (not even via Python, but in a normal shell)
takes 7 * two orders of magnitude longer than running them with the
Rados module, so long in fact that some basic commands like "ceph
health" would sometimes take longer than the 1 second timeout to
complete. The Rados commands would however take about 1ms instead.

Despite the occasional issues when monitors drop out, the Rados module
is clearly far superior to the shell commands for any moderately-loaded
Ceph cluster. We can look into solving timeouts another way (perhaps
with Processes instead of Threads) at a later time.

Rados module "ceph health":
    b'{"checks":{},"status":"HEALTH_OK"}'
    0.001204 (s)
    b'{"checks":{},"status":"HEALTH_OK"}'
    0.001258 (s)
Command "ceph health":
    joshua@hv1.c.bonilan.net ~ $ time ceph health >/dev/null
    real    0m0.772s
    user    0m0.707s
    sys     0m0.046s
    joshua@hv1.c.bonilan.net ~ $ time ceph health >/dev/null
    real    0m0.796s
    user    0m0.728s
    sys     0m0.054s
2021-07-10 03:47:45 -04:00
Joshua Boniface 551bae2518 Bump version to 0.9.24 2021-07-09 15:58:36 -04:00
Joshua Boniface 2b5dc286ab Correct failure to get ceph_health data 2021-07-09 13:10:28 -04:00
Joshua Boniface 330cf14638 Remove return statements in keepalive collectors
These seem to bork the keepalive timer process, so just remove them and
let it continue to press on.
2021-07-09 13:04:17 -04:00
Joshua Boniface 65d14ccd92 Return to all command-based Ceph gathering
Using the Rados module was very problematic, specifically because it had
no sensible timeout parameters and thus would hang for many seconds.
This has poor implications since it blocks further keepalives.

Instead, remove the Rados usage entirely and go back completely to using
manual OS commands to gather this information. While this may cause PID
exhaustion more quickly it's worthwhile to avoid failure scenarios when
Ceph stats time out.

Closes #137
2021-07-06 11:30:45 -04:00