Commit Graph

3365 Commits

Author SHA1 Message Date
Joshua Boniface af8a8d969e Ensure queues are set up for non-coordinator nodes
Allows a runner to operate on every possible node, not just
coordinators, as OSDs or other things could be on any node.

Also add more comments.
2023-11-04 15:05:07 -04:00
Joshua Boniface a6caac1b78 Add Celery queue routing for tasks
By default, tasks will continue to run as they did, on the primary
coordinator's task runner. However this opens the possibility for
defining more tasks that will run on other nodes or coordinators.
2023-11-04 14:29:59 -04:00
Joshua Boniface 30d7e49401 Start API worker with node daemon on coordinators 2023-11-04 13:08:16 -04:00
Joshua Boniface ab629f6b51 Use per-host hostname and queues in worker
Opens up the ability to direct tasks to specific workers.
2023-11-04 13:02:30 -04:00
Joshua Boniface 54215bab6c Switch to ZK+PG over Redis for Celery queue
Redis did not provide a distributed solution for the worker, which
precluded several important planned functions. So instead, move to using
Zookeeper + PostgreSQL as the broker and result backend respectively.

Should be a seamless drop-in change but for future uses requires the
database host to be the primary coordinator IP rather than localhost, so
that writes can occur to the database from non-primary hosts.
2023-11-04 12:46:34 -04:00
Joshua Boniface 7490f13b7c Check for partition tables on new devices 2023-11-04 03:13:58 -04:00
Joshua Boniface d1602f35de Adjust split indicator 2023-11-04 02:56:21 -04:00
Joshua Boniface 7cdedde2fb Adjust wording about extdb 2023-11-04 02:54:25 -04:00
Joshua Boniface ab156b14b7 Update help messages for OSD refresh 2023-11-04 02:47:04 -04:00
Joshua Boniface a016337f57 Remove block verify in API
This doesn't work right and is handled by the node anyways.
2023-11-04 02:45:10 -04:00
Joshua Boniface e32054be81 Refactor refresh as well 2023-11-04 02:44:52 -04:00
Joshua Boniface 18d32fede3 Fix wording of detect strings 2023-11-04 01:37:07 -04:00
Joshua Boniface b3d13fe9be Add log message for zap 2023-11-04 01:02:51 -04:00
Joshua Boniface 48b2ccbd95 Add timeout for safe-to-destroy
Continuously take the OSD down and out while doing so.
2023-11-04 00:55:05 -04:00
Joshua Boniface 1535078842 Fix lvremove, lvcreate, and update ZK details 2023-11-04 00:30:14 -04:00
Joshua Boniface 0e45613634 Use right key with correct data 2023-11-04 00:02:00 -04:00
Joshua Boniface 75135f6d5f Avoid broken output format for new OSDs 2023-11-03 23:54:10 -04:00
Joshua Boniface 7f5dd385b5 Use right key for FSID elsewhere 2023-11-03 23:51:01 -04:00
Joshua Boniface befce62925 Add OSD destroy before purge 2023-11-03 23:44:27 -04:00
Joshua Boniface b0909aed61 Get proper FSID value 2023-11-03 23:38:24 -04:00
Joshua Boniface f418b40527 Use proper FSID instead of hack 2023-11-03 16:38:19 -04:00
Joshua Boniface ec42b19d0e Send FSID to clients too 2023-11-03 16:37:55 -04:00
Joshua Boniface dd0177ce10 Rework replacement procedure again
Avoid calling other functions; replicate the actual process from Ceph
docs (https://docs.ceph.com/en/pacific/rados/operations/add-or-rm-osds/)
to ensure things work out well (e.g. preserving OSD IDs).
2023-11-03 16:31:56 -04:00
Joshua Boniface ed5bc9fb43 Fix numerous formatting and function bugs 2023-11-03 14:00:05 -04:00
Joshua Boniface 94d8d2cf75 Fix skip_zap_flag anomaly and add crush rm 2023-11-03 02:35:12 -04:00
Joshua Boniface 20497cf89d Fix bugs and skip safe_to_destroy on force 2023-11-03 02:29:50 -04:00
Joshua Boniface 64e37ae963 Update OSD replacement functionality
1. Simplify this by leveraging the existing remove_osd/add_osd
functions, since its task was functionally identical to those two in
sequential order.
2. Add support for split OSDs within the command (replacing all OSDs on
the block device(s) as required).
3. Add additional configurability and flexibility around the old device,
weight, and external DB LVs.
2023-11-03 01:45:49 -04:00
Joshua Boniface 3cb8a70f04 Add forcing to OSD purge 2023-11-02 23:20:48 -04:00
Joshua Boniface 44d2f98e75 Remove Var field from OSDs
Not super duper useful and increases length
2023-11-02 22:55:39 -04:00
Joshua Boniface cb91bf18a7 Fix incorrect variables 2023-11-02 22:39:32 -04:00
Joshua Boniface a3e3fe829a Adjust helptext for osd add 2023-11-02 22:34:58 -04:00
Joshua Boniface f53af510c1 Avoid startup failures if OSD removed 2023-11-02 22:24:39 -04:00
Joshua Boniface d5d783fad3 Set proper split flag 2023-11-02 22:20:22 -04:00
Joshua Boniface 8b8957547a Adjust helptext for create-db-vg command 2023-11-02 22:14:25 -04:00
Joshua Boniface 980ea6a9e9 Adjust handling of ext_db and _count options
Avoid the use of superfluous flag options, default them to none, and add
support for fixed-size DB LVs.
2023-11-02 13:29:47 -04:00
Joshua Boniface 0f433bd5eb Add wait messages for OSD commands 2023-11-02 09:31:41 -04:00
Joshua Boniface 8780044be6 Ensure db_device is an empty string 2023-11-02 00:52:18 -04:00
Joshua Boniface f08c654f22 Fix missing f-string 2023-11-01 21:41:06 -04:00
Joshua Boniface 80a7fd6195 Improve help text messages 2023-11-01 21:38:55 -04:00
Joshua Boniface 8b93f9a80e Handle OSD index errors during stats collection 2023-11-01 21:33:40 -04:00
Joshua Boniface 526a5f4a74 Add support for split OSD adds
Allows creating multiple OSDs on a single (NVMe) block device,
leveraging the "ceph-volume lvm batch" command. Replaces the previous
method of creating OSDs.

Also adds a new ZK item for each OSD indicating if it is split or not.
2023-11-01 21:31:35 -04:00
Joshua Boniface aa0b1f504f Fix output bug 2023-11-01 15:46:38 -04:00
Joshua Boniface bc425b9224 Avoid duplicate confirmations in a safer way
This version instead still requires --yes with --restart to avoid the
confirmation option, but avoids duplicate prompts.

This might be slightly more cumbersome, but ensures consistency: every
situation that could cause a restart is confirmed even if --restart is
given.
2023-11-01 12:05:52 -04:00
Joshua Boniface 79e5c098cd Revert "Remove duplicate confirmation for VM restart"
This reverts commit 3c61a3ac03.
2023-11-01 12:04:34 -04:00
Joshua Boniface 3c61a3ac03 Remove duplicate confirmation for VM restart
Having both restart_opt and confirm_opt resulted in a duplicate
confirmation message, at least if neither --restart/--no-restart is
specified. This is not necessary as the confirmation is already given by
the restart_opt or the relevant --restart/--no-restart flag.
2023-11-01 12:02:34 -04:00
Joshua Boniface 988c777912 Properly handle live state with restart confirm
If "--live" is passed (the default), we shouldn't confirm to restart the
VM as this is not required. Instead only confirm if "--no-live" was
passed or if the flag doesn't exist.
2023-11-01 11:46:59 -04:00
Joshua Boniface 5b4dd61754 Bump version to 0.9.80 2023-10-27 09:56:31 -04:00
Joshua Boniface 2fccbcda89 Add enhancements to autobackup
1. Add a cron mode to avoid exit(1) during cronjobs/timers
2. Revamp the remote_mount settings into auto_mount
   This removes a lot of unnecessary complexity while giving the
   administrator more flexibility in what they want to execute to mount
   a filesystem and how. The naming reflects the goal but the possible
   commands are arbitrary.
2023-10-27 02:07:24 -04:00
Joshua Boniface 6ad51ea4bb Handle store exceptions in cli() function
Avoids having an unsuppressable error message in some contexts, and
provides a cleaner module.
2023-10-26 23:30:22 -04:00
Joshua Boniface 5954feaa31 Add autobackup functionality to CLI
Adds autobackup (integrated, managed VM backups with automatic remote
filesystem mounting, included backup expiry/removal and automatic
full/incremental selection, independent from the manual "pvc vm backup"
commands) to the CLI client.

This is a bit of a large command to handle only inside the CLI client,
but this was chosen as it's the only real place for it aside from an
external script.

There are several major restrictions on this command, mainly that it
must be run from the primary coordinator using the "local" connection,
and that it must be run as "root".

The command is designed to run in a cron/systemd timer installed by
pvc-ansible when the appropriate group_vars are enabled, and otherwise
not touched.
2023-10-26 21:25:23 -04:00