2560 Commits

Author SHA1 Message Date
584cb95b8d Use consistent language for primary mode
I didn't call it "router" anywhere else, but the state in the list is
called "coordinator" so, call it "coordinator mode".
2022-05-06 15:40:52 -04:00
21bbb0393f Add support for replacing/refreshing OSDs
Adds commands to both replace an OSD disk, and refresh (reimport) an
existing OSD disk on a new node. This handles the cases where an OSD
disk should be replaced (either due to upgrades or failures) or where a
node is rebuilt in-place and an existing OSD must be re-imported to it.

This should avoid the need to do a full remove/add sequence for either
case.

Also cleans up some aspects of OSD removal that are identical between
methods (e.g. using safe-to-destroy and sleeping after stopping) and
fixes a bug if an OSD does not truly exist when the daemon starts up.
2022-05-06 15:32:06 -04:00
d18e009b00 Improve handling of rounded values 2022-05-02 15:29:30 -04:00
1f8f3252a6 Fix bug with initial JSON for stats 2022-05-02 13:28:19 -04:00
b47c9832b7 Refactor OSD removal to use new ZK data
With the OSD LVM information stored in Zookeeper, we can use this to
determine the actual block device to zap rather than relying on runtime
determination and guesstimation.
2022-05-02 12:52:22 -04:00
d2757004db Store additional OSD information in ZK
Ensures that information like the FSIDs and the OSD LVM volume are
stored in Zookeeper at creation time and updated at daemon start time
(to ensure the data is populated at least once, or if the /dev/sdX
path changes).

This will allow safer operation of OSD removals and the potential
implementation of re-activation after node replacements.
2022-05-02 12:11:39 -04:00
7323269775 Ensure initial OSD stats are populated
Values are all invalid but this ensures the client won't error out when
trying to show an OSD that has never checked in yet.
2022-04-29 16:50:30 -04:00
85463f9aec Bump version to 0.9.48 2022-04-29 15:03:52 -04:00
19c37c3ed5 Fix bugs with forced removal 2022-04-29 14:03:07 -04:00
7d2ea494e7 Ensure unresponsive OSDs still display in list
It is still useful to see such dead OSDs even if they've never checked
in or have not checked in for quite some time.
2022-04-29 12:11:52 -04:00
cb50eee2a9 Add OSD removal force option
Ensures a removal can continue even in situations where some step(s)
might fail, for instance removing an obsolete OSD from a replaced node.
2022-04-29 11:16:33 -04:00
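
The force behaviour described in cb50eee2a9 can be sketched roughly as follows. This is only an illustration, not the project's actual code: the helper `run_removal_steps` and the step names are hypothetical, and the real removal sequence may differ.

```python
from typing import Callable, List, Tuple


def run_removal_steps(
    steps: List[Tuple[str, Callable[[], None]]], force: bool = False
) -> List[str]:
    """Run each named step in order (hypothetical helper).

    Without force, the first failing step aborts the whole removal.
    With force, failures are recorded and the remaining steps still run.
    Returns the names of the steps that failed.
    """
    failed = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            if not force:
                raise RuntimeError(f"step '{name}' failed: {exc}") from exc
            failed.append(name)
    return failed
```

Called with `force=True`, a step that cannot succeed (for instance stopping a daemon on a node that no longer exists) is noted and skipped rather than blocking the rest of the removal.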
f3f4eaadf1 Use a singular configured cluster by default
If there is...
  1. No '--cluster' passed, and
  2. No 'local' cluster, and
  3. There is exactly one cluster configured
...then use that cluster by default in the CLI.
2022-01-13 18:36:20 -05:00
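
The three conditions in f3f4eaadf1 amount to a small fallback chain. A minimal sketch, assuming the configured clusters are held in a dict keyed by name and that an explicit value or a 'local' entry takes precedence; `select_cluster` is a hypothetical name, not the CLI's real function.

```python
from typing import Optional


def select_cluster(requested: Optional[str], configured: dict) -> Optional[str]:
    """Pick the cluster the CLI should use (hypothetical helper)."""
    # An explicit '--cluster' value always wins.
    if requested is not None:
        return requested
    # Assumed: a cluster named 'local' is preferred when present.
    if "local" in configured:
        return "local"
    # Exactly one configured cluster: use it by default.
    if len(configured) == 1:
        return next(iter(configured))
    # Zero or several clusters: no safe default exists.
    return None
```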
313a5d1c7d Bump version to 0.9.47 2021-12-28 22:03:08 -05:00
b6d689b769 Add pool PGs count modification
Allows an administrator to adjust the PG count of a given pool. This can
be used to increase the PGs (for example after adding more OSDs) or
decrease it (to remove OSDs, reduce CPU load, etc.).
2021-12-28 21:53:29 -05:00
a0fccf83f7 Add PGs count to pool list 2021-12-28 21:12:02 -05:00
46896c593e Fix issue if pool stats have not updated yet 2021-12-28 21:03:10 -05:00
02138974fa Add device class tiers to Ceph pools
Allows specifying a particular device class ("tier") for a given pool,
for instance SSD-only or NVMe-only. This is implemented with Crush
rules on the Ceph side, and via an additional new key in the pool
Zookeeper schema which is defaulted to "default".
2021-12-28 20:58:15 -05:00
c3d255be65 Bump version to 0.9.46 2021-12-28 15:02:14 -05:00
45fc8a47a3 Allow single-node clusters to restart and timeout
Prevents a daemon from waiting forever to terminate if it is primary,
and avoids this entirely if there is only a single node in the cluster.
2021-12-28 03:06:03 -05:00
07f2006f68 Fix bug when removing OSDs
Ensure the OSD is down as well as out or purge might fail.
2021-12-28 03:05:34 -05:00
f4c7fdffb8 Handle detect strings as arguments for blockdevs
Allows specifying blockdevs in the OSD and OSD-DB addition commands as
detect strings rather than actual block device paths. This provides
greater flexibility for automation with pvcbootstrapd (which originates
the concept of detect strings) and in general usage as well.
2021-12-28 02:53:02 -05:00
be1b67b8f0 Allow bypassing confirm message for benchmarks 2021-12-23 21:00:42 -05:00
d68f6a945e Add auditing to local syslog from PVC client
This ensures that any client command is logged by the local system.
Helps ensure Accounting for users of the CLI. Currently logs the full
command executed along with the $USER environment variable contents.
2021-12-10 16:17:33 -05:00
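
As an illustration of the auditing described in d68f6a945e, the sketch below logs a command line and the `$USER` value through Python's standard `syslog` module (Unix only). The message format, the `pvc-example` ident and both function names are assumptions made for this example, not the client's actual output.

```python
import os
import shlex
import syslog


def format_audit_message(argv, user=None):
    """Build the audit line from the full command and the invoking user."""
    if user is None:
        # Mirrors the commit's use of the $USER environment variable.
        user = os.environ.get("USER", "unknown")
    command = " ".join(shlex.quote(arg) for arg in argv)
    return f"client audit: command '{command}' by user '{user}'"


def audit_command(argv):
    """Send the audit line to the local syslog (ident is hypothetical)."""
    syslog.openlog(ident="pvc-example", facility=syslog.LOG_AUTH)
    syslog.syslog(syslog.LOG_INFO, format_audit_message(argv))
    syslog.closelog()
```

Note that `$USER` is set by the caller's environment, so on its own it is a convenience record rather than a tamper-proof identity.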
c776aba8b3 Standardize fuzzy matching and use fullmatch
Solves two problems:

1. Match fuzziness was applied very inconsistently; make every use the
same, i.e. "if is_fuzzy and limit, apply .* to both sides".

2. Use re.fullmatch instead of re.match to ensure exact matching of the
regex to the value. Without fuzziness, re.match would sometimes cause
inconsistent behavior: for instance, a non-fuzzy limit of "vm", expected
to match only the actual "vm", would also match "vm1".
2021-12-06 16:35:29 -05:00
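
The difference between the two regex calls in c776aba8b3 is easy to see in isolation. A minimal sketch of the rule quoted in the commit; `matches_limit` is a hypothetical helper, and treating an empty limit as "match everything" is an assumption.

```python
import re


def matches_limit(value: str, limit: str, is_fuzzy: bool = True) -> bool:
    """Apply the 'if is_fuzzy and limit' rule, then require a full match."""
    if not limit:
        # Assumed: no limit means nothing is filtered out.
        return True
    if is_fuzzy:
        limit = ".*" + limit + ".*"
    # fullmatch requires the regex to cover the entire value.
    return re.fullmatch(limit, value) is not None


# re.match only anchors at the start, so "vm" also matches "vm1":
assert re.match("vm", "vm1") is not None
# With fullmatch, a non-fuzzy "vm" matches only "vm":
assert matches_limit("vm", "vm", is_fuzzy=False)
assert not matches_limit("vm1", "vm", is_fuzzy=False)
# A fuzzy limit still matches substrings:
assert matches_limit("vm1", "vm", is_fuzzy=True)
```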
2461941421 Remove "and started" from message text
This is not necessarily the case.
2021-11-29 16:42:26 -05:00
68954a79ec Fix bug with cloned image sizes 2021-11-29 14:56:50 -05:00
a2fa6ed450 Fix bugs with legacy benchmark format 2021-11-26 11:42:35 -05:00
02a2f6a27a Bump version to 0.9.45 2021-11-25 09:34:20 -05:00
a75b951605 Ensure echo always has an argument 2021-11-25 09:33:26 -05:00
658e80350f Fix ordering of pvcnoded unit
We want to be after network.target and want network-online.target
2021-11-18 16:56:49 -05:00
3aa20fbaa3 Bump version to 0.9.44 2021-11-11 16:20:38 -05:00
6d101df1ff Add Munin plugin for Ceph utilization 2021-11-08 15:21:09 -05:00
be6a3992c1 Add 0.05s to connection timeout
This is recommended by the Python Requests documentation:

> It’s a good practice to set connect timeouts to slightly larger than a
  multiple of 3, which is the default TCP packet retransmission window.
2021-11-08 03:11:41 -05:00
d76da0f25a Use separate connect and data timeouts
This allows us to keep a very low connect timeout of 3 seconds, but also
ensure that long commands (e.g. --wait or VM disable) can take as long
as the API requires to complete.

Avoids having to explicitly set very long single-instance timeouts for
other functions which would block forever on an unreachable API.
2021-11-08 03:10:09 -05:00
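
Python Requests accepts `timeout` as a `(connect, read)` tuple, where a read value of `None` means wait indefinitely for the response. The sketch below shows how the split in d76da0f25a might look, using the 3.05 second connect value from be6a3992c1; `build_timeout` and the constant name are hypothetical.

```python
from typing import Optional, Tuple

# Short connect timeout: fail fast when the API is unreachable.
CONNECT_TIMEOUT = 3.05


def build_timeout(read_timeout: Optional[float] = None) -> Tuple[float, Optional[float]]:
    """Return a (connect, read) pair suitable for Requests' timeout argument.

    A read timeout of None lets long-running commands take as long as the
    API needs once the connection has been established.
    """
    return (CONNECT_TIMEOUT, read_timeout)


# Usage, assuming the requests library is installed:
#   requests.get(url, timeout=build_timeout())
```

Because the connect phase is bounded separately, an unreachable API still fails after about three seconds even when the read timeout is unlimited.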
bc722ce9b8 Fix quote in sed for unstable deb build 2021-11-08 02:54:27 -05:00
7890c32c59 Add sudo to deploy-package task 2021-11-08 02:41:10 -05:00
6febcfdd97 Bump version to 0.9.43 2021-11-08 02:29:17 -05:00
11d8ce70cd Fix sed commands after Black formatting change 2021-11-08 02:29:05 -05:00
a17d9439c0 Remove references to Ansible manual 2021-11-08 00:29:47 -05:00
9cd02eb148 Remove Ansible and Testing manuals
The Ansible manual can't keep up with the other repo, so it should live
there instead (eventually, after significant rewrites).

The Testing page is obsoleted by the "test-cluster" script.
2021-11-08 00:25:27 -05:00
459485c202 Allow American spelling for compatibility 2021-11-08 00:09:59 -05:00
9f92d5d822 Shorten help messages slightly to fit 2021-11-08 00:07:21 -05:00
947ac561c8 Add forced colour support
Allows preserving colour within e.g. watch, where Click would normally
determine that it is "not a terminal". This is done via the wrapper echo
which filters via the local config.
2021-11-08 00:04:20 -05:00
ca143c1968 Add funding configuration 2021-11-06 18:05:17 -04:00
6e110b178c Add start delineators to command output 2021-11-06 13:35:30 -04:00
d07d37d08e Revamp formatting and linting on commit
Remove the prepare script, and run the two stages manually. Better
handle Black reformatting by doing a check (for the errcode) then
reformat and abort commit to review.
2021-11-06 13:34:33 -04:00
0639b16c86 Apply more granular timeout formatting
We don't need to wait forever unless the state change is wait-flagged or a
disable (which does a shutdown before returning).
2021-11-06 13:34:03 -04:00
1cf8706a52 Up timeout when setting VM state
Ensures the API won't time out immediately especially during a
wait-flagged or disable action.
2021-11-06 04:15:10 -04:00
dd8f07526f Use positive check rather than negative
Ensure the VM is started before doing shutdown/stop, rather than checking
that it is not stopped. Prevents overwrite of existing disable state and
other weirdness.
2021-11-06 04:08:33 -04:00
5a5e5da663 Add disable forcing to CLI
References #148
2021-11-06 04:02:50 -04:00