2576 Commits

Author SHA1 Message Date
58dd5830eb Add additional kb_ values to OSD stats
Allows for easier parsing later, e.g. to calculate percentage values and
to get more detail on the used amounts.
2022-08-11 11:06:36 -04:00
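A minimal sketch of the kind of parsing these values allow. It assumes the stats use key names like those in Ceph's `ceph osd df` JSON output (`kb`, `kb_used`, `kb_avail`); the key names and the function are illustrative assumptions, not PVC's own code.

```python
# Hypothetical sketch: derive percentages from kb_* OSD stats.
# The key names (kb, kb_used, kb_avail) are assumptions modelled on the
# JSON output of "ceph osd df"; they are not taken from this commit.

def osd_usage(stats):
    """Return (used %, available %) rounded to two decimal places."""
    total = stats["kb"]
    if total == 0:
        # Guard against division by zero, e.g. for placeholder stats
        return 0.0, 0.0
    used_pct = round(stats["kb_used"] / total * 100, 2)
    avail_pct = round(stats["kb_avail"] / total * 100, 2)
    return used_pct, avail_pct


print(osd_usage({"kb": 1000, "kb_used": 250, "kb_avail": 750}))  # (25.0, 75.0)
```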
90e515c46f Always sort VM list
Same justification as previous commit.
2022-08-09 12:05:40 -04:00
a6a5f71226 Ensure the node list is sorted
Otherwise the node entries could come back in an arbitrary order, which
API consumers might not expect from an ordered list of dictionaries, so
ensure it's always sorted.
2022-08-09 12:03:49 -04:00
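A minimal sketch of sorting a list of dictionaries before returning it, as both this and the VM list commit describe. The `name` key is an assumption about the entry layout.

```python
# Hypothetical sketch: return list entries in a stable, predictable order.
# The "name" key is an assumption about the entry dictionaries.

def sort_by_name(entries):
    """Return a new list of dictionaries sorted by their "name" field."""
    return sorted(entries, key=lambda entry: entry["name"])


nodes = [{"name": "hv3"}, {"name": "hv1"}, {"name": "hv2"}]
print([node["name"] for node in sort_by_name(nodes)])  # ['hv1', 'hv2', 'hv3']
```

Note that a plain string sort places "hv10" before "hv2"; a natural-sort key would be needed if that ordering matters.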
60a3ef1604 Add reference to bootstrap in index 2022-08-03 20:22:16 -04:00
95807b23eb Add missing cluster_req for vm modify 2022-08-02 10:02:26 -04:00
5ae430e1c5 Bump version to 0.9.51 2022-07-25 23:25:41 -04:00
4731faa2f0 Remove pvc-flush service
This service caused more headaches than it was worth, so remove it.

The original goal was to cleanly flush nodes on shutdown and unflush
them on startup, but this is tightly controlled by Ansible playbooks at
this point, and this is something best left to the Administrator and
their particular situation anyway.
2022-07-25 23:21:34 -04:00
42f4907dec Add confirmation to disable command 2022-07-21 16:43:37 -04:00
02168a5ecf Remove faulty literal_eval 2022-07-18 13:35:15 -04:00
8cfcd02ac2 Fix bad changelog entries 2022-07-06 16:57:55 -04:00
e464dcb483 Bump version to 0.9.50 2022-07-06 16:01:14 -04:00
27214c8190 Fix bug with space-containing detect strings 2022-07-06 15:58:57 -04:00
f78669a175 Add selector help and adjust flag name
1. Add documentation on the node selector flags. In the API, reference
the daemon configuration manual which now includes details in this
section; in the CLI, provide the help in "pvc vm define" in detail and
then reference that command's help in the other commands that use this
field.

2. Ensure the naming is consistent in the CLI, using the flag name
"--node-selector" everywhere (was "--selector" for "pvc vm" commands and
"--node-selector" for "pvc provisioner" commands).
2022-06-10 02:42:06 -04:00
00a4a01517 Add memfree to selector and use proper defaults 2022-06-10 02:03:12 -04:00
a40a69816d Add migration selector via free memory
Closes #152
2022-05-18 03:47:16 -04:00
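A minimal sketch of a free-memory ("memfree") migration selector: choose the candidate node with the most free memory, excluding the node the VM is currently on. The node dictionary layout and the function name are hypothetical.

```python
# Hypothetical sketch of a "memfree" node selector: pick the candidate node
# with the most free memory. The node dictionary layout is an assumption.

def select_node_memfree(nodes, current_node=None):
    """Return the name of the node with the most free memory, or None."""
    candidates = {
        name: info for name, info in nodes.items() if name != current_node
    }
    if not candidates:
        return None
    return max(candidates, key=lambda name: candidates[name]["memfree"])


nodes = {
    "hv1": {"memfree": 8192},
    "hv2": {"memfree": 32768},
    "hv3": {"memfree": 16384},
}
print(select_node_memfree(nodes, current_node="hv2"))  # hv3
```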
baf5a132ff Bump version to 0.9.49 2022-05-06 15:49:39 -04:00
584cb95b8d Use consistent language for primary mode
I didn't call it "router" anywhere else, but the state in the list is
called "coordinator", so call it "coordinator mode".
2022-05-06 15:40:52 -04:00
21bbb0393f Add support for replacing/refreshing OSDs
Adds commands to both replace an OSD disk, and refresh (reimport) an
existing OSD disk on a new node. This handles the cases where an OSD
disk should be replaced (either due to upgrades or failures) or where a
node is rebuilt in-place and an existing OSD must be re-imported to it.

This should avoid the need to do a full remove/add sequence for either
case.

Also cleans up some aspects of OSD removal that are identical between
methods (e.g. using safe-to-destroy and sleeping after stopping) and
fixes a bug if an OSD does not truly exist when the daemon starts up.
2022-05-06 15:32:06 -04:00
d18e009b00 Improve handling of rounded values 2022-05-02 15:29:30 -04:00
1f8f3252a6 Fix bug with initial JSON for stats 2022-05-02 13:28:19 -04:00
b47c9832b7 Refactor OSD removal to use new ZK data
With the OSD LVM information stored in Zookeeper, we can use this to
determine the actual block device to zap rather than relying on runtime
determination and guesswork.
2022-05-02 12:52:22 -04:00
d2757004db Store additional OSD information in ZK
Ensures that information like the FSIDs and the OSD LVM volume are
stored in Zookeeper at creation time and updated at daemon start time
(to ensure the data is populated at least once, or if the /dev/sdX
path changes).

This will allow safer operation of OSD removals and the potential
implementation of re-activation after node replacements.
2022-05-02 12:11:39 -04:00
7323269775 Ensure initial OSD stats is populated
The values are all invalid, but this ensures the client won't error out when
trying to show an OSD that has never checked in yet.
2022-04-29 16:50:30 -04:00
85463f9aec Bump version to 0.9.48 2022-04-29 15:03:52 -04:00
19c37c3ed5 Fix bugs with forced removal 2022-04-29 14:03:07 -04:00
7d2ea494e7 Ensure unresponsive OSDs still display in list
It is still useful to see such dead OSDs even if they've never checked
in or have not checked in for quite some time.
2022-04-29 12:11:52 -04:00
cb50eee2a9 Add OSD removal force option
Ensures a removal can continue even in situations where some step(s)
might fail, for instance removing an obsolete OSD from a replaced node.
2022-04-29 11:16:33 -04:00
f3f4eaadf1 Use a singular configured cluster by default
If there is...
  1. No '--cluster' passed, and
  2. No 'local' cluster, and
  3. There is exactly one cluster configured
...then use that cluster by default in the CLI.
2022-01-13 18:36:20 -05:00
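A minimal sketch of the default-cluster rule above. It assumes configured clusters are held in a mapping keyed by name, and that an existing "local" cluster is used when no `--cluster` is given (the commit implies this but does not state it); the function name is hypothetical.

```python
# Hypothetical sketch of the default-cluster rule described above.
# "configured" maps cluster names to their settings; its shape is assumed.

def default_cluster(cli_cluster, configured):
    """Return the cluster name to use, or None if none can be chosen."""
    if cli_cluster is not None:
        return cli_cluster              # an explicit --cluster always wins
    if "local" in configured:
        return "local"                  # assumed: a local cluster is used next
    if len(configured) == 1:
        return next(iter(configured))   # exactly one cluster: use it
    return None                         # ambiguous: the user must choose


print(default_cluster(None, {"prod": {}}))  # prod
```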
313a5d1c7d Bump version to 0.9.47 2021-12-28 22:03:08 -05:00
b6d689b769 Add pool PGs count modification
Allows an administrator to adjust the PG count of a given pool. This can
be used to increase the PGs (for example after adding more OSDs) or
decrease it (to remove OSDs, reduce CPU load, etc.).
2021-12-28 21:53:29 -05:00
a0fccf83f7 Add PGs count to pool list 2021-12-28 21:12:02 -05:00
46896c593e Fix issue if pool stats have not updated yet 2021-12-28 21:03:10 -05:00
02138974fa Add device class tiers to Ceph pools
Allows specifying a particular device class ("tier") for a given pool,
for instance SSD-only or NVMe-only. This is implemented with CRUSH
rules on the Ceph side, and via an additional new key in the pool
Zookeeper schema, which defaults to "default".
2021-12-28 20:58:15 -05:00
c3d255be65 Bump version to 0.9.46 2021-12-28 15:02:14 -05:00
45fc8a47a3 Allow single-node clusters to restart and timeout
Prevents a daemon from waiting forever to terminate if it is primary,
and avoids this entirely if there is only a single node in the cluster.
2021-12-28 03:06:03 -05:00
07f2006f68 Fix bug when removing OSDs
Ensure the OSD is down as well as out, or the purge might fail.
2021-12-28 03:05:34 -05:00
f4c7fdffb8 Handle detect strings as arguments for blockdevs
Allows specifying blockdevs in the OSD and OSD-DB addition commands as
detect strings rather than actual block device paths. This provides
greater flexibility for automation with pvcbootstrapd (which originates
the concept of detect strings) and in general usage as well.
2021-12-28 02:53:02 -05:00
be1b67b8f0 Allow bypassing confirm message for benchmarks 2021-12-23 21:00:42 -05:00
d68f6a945e Add auditing to local syslog from PVC client
This ensures that any client command is logged by the local system.
Helps ensure accounting for users of the CLI. Currently logs the full
command executed along with the $USER environment variable contents.
2021-12-10 16:17:33 -05:00
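A minimal sketch of logging each CLI invocation, with the `$USER` value, to the local syslog using Python's standard `syslog` module (Unix only). The message format, the "pvc" ident, and the function names are assumptions for illustration.

```python
import os
import shlex
import sys
import syslog

# Hypothetical sketch: log each CLI invocation and $USER to the local syslog.
# The message format and the "pvc" ident are assumptions, not PVC's own.

def audit_message(argv, user):
    """Build the audit line from the argument vector and user name."""
    return f'client audit: command "{shlex.join(argv)}" by user "{user}"'


def audit(argv=None):
    """Send the audit line for this invocation to the local syslog."""
    argv = sys.argv if argv is None else argv
    user = os.environ.get("USER", "unknown")
    syslog.openlog(ident="pvc")
    syslog.syslog(syslog.LOG_INFO, audit_message(argv, user))
    syslog.closelog()
```

Using `shlex.join` keeps arguments containing spaces quoted, so the logged command can be read back unambiguously.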
c776aba8b3 Standardize fuzzy matching and use fullmatch
Solves two problems:

1. Match fuzziness was applied very inconsistently; make all matches
behave the same way, i.e. "if is_fuzzy and limit, apply .* to both sides".

2. Use re.fullmatch instead of re.match to ensure exact matching of the
regex to the value. Without fuzziness, re.match would sometimes cause
inconsistent behavior; for instance, a non-fuzzy limit of "vm", expected
to match only the actual "vm", would also match "vm1".
2021-12-06 16:35:29 -05:00
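A minimal sketch of the matching rule described in this commit: wrap the limit in `.*` when fuzzy, and always use `re.fullmatch` so the whole value must match. The function name and signature are hypothetical.

```python
import re

# Sketch of the matching rule described above; the function name and
# signature are hypothetical.

def limit_matches(limit, value, is_fuzzy=True):
    """Return True if value matches the limit regex."""
    if not limit:
        return True                      # no limit: everything matches
    pattern = f".*{limit}.*" if is_fuzzy else limit
    # fullmatch anchors both ends, so a non-fuzzy "vm" cannot match "vm1"
    return re.fullmatch(pattern, value) is not None


print(bool(re.match("vm", "vm1")))                 # True: re.match anchors only the start
print(limit_matches("vm", "vm1", is_fuzzy=False))  # False
print(limit_matches("vm", "myvm1"))                # True
```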
2461941421 Remove "and started" from message text
This is not necessarily the case.
2021-11-29 16:42:26 -05:00
68954a79ec Fix bug with cloned image sizes 2021-11-29 14:56:50 -05:00
a2fa6ed450 Fix bugs with legacy benchmark format 2021-11-26 11:42:35 -05:00
02a2f6a27a Bump version to 0.9.45 2021-11-25 09:34:20 -05:00
a75b951605 Ensure echo always has an argument 2021-11-25 09:33:26 -05:00
658e80350f Fix ordering of pvcnoded unit
We want to be ordered after network.target and to want network-online.target.
2021-11-18 16:56:49 -05:00
3aa20fbaa3 Bump version to 0.9.44 2021-11-11 16:20:38 -05:00
6d101df1ff Add Munin plugin for Ceph utilization 2021-11-08 15:21:09 -05:00
be6a3992c1 Add 0.05s to connection timeout
This is recommended by the Python Requests documentation:

> It’s a good practice to set connect timeouts to slightly larger than a
  multiple of 3, which is the default TCP packet retransmission window.
2021-11-08 03:11:41 -05:00
d76da0f25a Use separate connect and data timeouts
This allows us to keep a very low connect timeout of 3 seconds, but also
ensure that long commands (e.g. --wait or VM disable) can take as long
as the API requires to complete.

Avoids having to explicitly set very long single-instance timeouts for
other functions which would block forever on an unreachable API.
2021-11-08 03:10:09 -05:00
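A minimal sketch combining this commit and the one above it: Python Requests accepts a `(connect, read)` timeout tuple, and a read timeout of `None` means waiting as long as the server takes to respond. The constant and function names are hypothetical.

```python
# Sketch of separate connect and read timeouts for Requests, which accepts
# a (connect, read) tuple. The constant and function names are hypothetical.

CONNECT_TIMEOUT = 3.05  # just over 3 s, the default TCP retransmission window
READ_TIMEOUT = None     # None: wait as long as the API needs to respond


def api_timeout():
    """Return the (connect, read) timeout tuple passed to Requests."""
    return (CONNECT_TIMEOUT, READ_TIMEOUT)


def api_get(url, **kwargs):
    import requests  # third-party; imported lazily so the sketch loads without it
    # An unreachable API fails after ~3 s, while slow commands can still finish
    return requests.get(url, timeout=api_timeout(), **kwargs)
```

Keeping the read timeout unbounded lets long operations such as `--wait` complete, while the short connect timeout still fails fast when the API is unreachable.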