Compare commits

156 Commits

Author SHA1 Message Date
d07d37d08e Revamp formatting and linting on commit
Remove the prepare script, and run the two stages manually. Better
handle Black reformatting by doing a check (for the errcode) then
reformat and abort commit to review.
2021-11-06 13:34:33 -04:00
0639b16c86 Apply more granular timeout formatting
We don't need to wait forever if state changes aren't waiting or disable
(which does a shutdown before returning).
2021-11-06 13:34:03 -04:00
1cf8706a52 Up timeout when setting VM state
Ensures the API won't time out immediately especially during a
wait-flagged or disable action.
2021-11-06 04:15:10 -04:00
dd8f07526f Use positive check rather than negative
Ensure the VM is started before doing a shutdown/stop, rather than
checking that it is not stopped. Prevents overwrite of existing disable
state and other weirdness.
2021-11-06 04:08:33 -04:00
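The difference between a positive and a negative state check can be illustrated with a small sketch. The state names and both helper functions below are assumptions made for illustration; they are not taken from the repository.

```python
# Hypothetical VM states, assumed for illustration only
STATES = ["start", "stop", "disable", "migrate"]


def should_shut_down_negative(state):
    # Negative check: anything that is not "stop" is treated as running
    return state != "stop"


def should_shut_down_positive(state):
    # Positive check: only a VM that is actually started gets shut down
    return state == "start"


# States for which the two checks disagree
disagreements = [
    s for s in STATES if should_shut_down_negative(s) != should_shut_down_positive(s)
]
```

With these assumed states, the negative check would also act on a VM that is already disabled, which is the kind of overwrite the commit message warns about.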
5a5e5da663 Add disable forcing to CLI
References #148
2021-11-06 04:02:50 -04:00
739b60b91e Perform automatic shutdown/stop on VM disable
Instead of requiring the VM to already be stopped, allow disable
state changes to perform a shutdown first. Also add a force option which
will do a hard stop instead of a shutdown.

References #148
2021-11-06 03:57:24 -04:00
16544227eb Reformat recent changes with Black 2021-11-06 03:27:07 -04:00
73e3746885 Fix linting error F541 f-string placeholders 2021-11-06 03:26:03 -04:00
66230ce971 Fix linting errors F522/F523 unused args 2021-11-06 03:24:50 -04:00
fbfbd70461 Rename build-deb.sh to build-stable-deb.sh
Unifies the naming with the other build-unstable-deb.sh script.
2021-11-06 03:18:58 -04:00
2506098223 Remove obsolete gitlab-ci config 2021-11-06 03:18:22 -04:00
83e887c4ee Ensure all helper scripts pushd/popd
Make sure all of these move to the root of the repository first, then
return to where they were afterwards, using pushd/popd. This allows them
to be executed from anywhere in the repo.
2021-11-06 03:17:47 -04:00
4eb0f3bb8a Unify formatting and linting
Ensures optimal formatting in addition to linting during manual deploys
and during pre-commit actions.
2021-11-06 03:10:17 -04:00
adc767e32f Add newline to start of lint 2021-11-06 03:04:14 -04:00
2083fd824a Reformat code with Black code formatter
Unify the code style along PEP and Black principles using the tool.
2021-11-06 03:02:43 -04:00
3aa74a3940 Add safe mode to Black 2021-11-06 02:59:54 -04:00
71d94bbeab Move Flake configuration into dedicated file
Avoid passing arguments in the script.
2021-11-06 02:55:37 -04:00
718f689df9 Clean up linter after Black add (pass two) 2021-11-06 02:51:14 -04:00
268b5c0b86 Exclude Alembic migrations from Black
These files are autogenerated with their own formats, so we don't want
to override that.
2021-11-06 02:46:06 -04:00
b016b9bf3d Clean up linter after Black add (pass one) 2021-11-06 02:44:24 -04:00
7604b9611f Add black formatter to project root 2021-11-06 02:44:05 -04:00
b21278fd80 Add Basic Builder configuration
Configuration for my new CI system under Gitea.
2021-10-31 00:09:55 -04:00
3b02034b70 Add some delay and additional tries to fencing 2021-10-27 16:24:17 -04:00
c7a5b41b1e Fix ordering to show correct message 2021-10-27 13:37:52 -04:00
48b0091d3e Support adding the same network to a VM again
This is a supported configuration for some edge cases and should be
allowed.
2021-10-27 13:33:27 -04:00
2e94516ee2 Reorder linting on build-and-deploy 2021-10-27 13:25:14 -04:00
d7f26b27ea More gracefully handle restart + live
Instead of erroring, just use the implication that restarting a VM does
not want a live modification, and proceed from there. Update the help
text to match.
2021-10-27 13:23:39 -04:00
872f35a7ee Support removing VM interfaces by MAC
Provides a way to handle multiple interfaces in the same network
gracefully, while making the previous behaviour explicit.
2021-10-27 13:20:05 -04:00
52c3e8ced3 Fix bad test in postinst 2021-10-19 00:27:12 -04:00
1d7acf62bf Fix bad location of config sets 2021-10-12 17:23:04 -04:00
c790c331a7 Also validate on failures 2021-10-12 17:11:03 -04:00
23165482df Bump version to 0.9.42 2021-10-12 15:25:42 -04:00
057071a7b7 Go back to passing if exception
Validation already happened and the set happens again later.
2021-10-12 14:21:52 -04:00
554fa9f412 Use current live value for bridge_mtu
This will ensure that upgrading without the bridge_mtu config key set
will keep things as they are.
2021-10-12 12:24:03 -04:00
5a5f924268 Use power off in fence instead of reset
Use a power off (and then make the power on a requirement) during a node
fence. Removes some potential ambiguity in the power state, since we
will know for certain if it is off.
2021-10-12 11:04:27 -04:00
cc309fc021 Validate network MTU after initial read 2021-10-12 10:53:17 -04:00
5f783f1663 Make cluster example images clickable 2021-10-12 03:15:04 -04:00
bc89bb5b68 Mention fencing only in run state 2021-10-12 03:05:01 -04:00
eb233ef588 Adjust more wording and fix typos 2021-10-12 03:00:21 -04:00
d3efb54cb4 Adjust some wording 2021-10-12 02:54:16 -04:00
da15357c8a Remove codeql setup
I don't use this for anything useful, so disable it since a run takes
ages.
2021-10-12 02:51:19 -04:00
b6939a28c0 Fix formatting of subsection 2021-10-12 02:49:40 -04:00
a1da479a4c Add reference to Ansible manual 2021-10-12 02:48:47 -04:00
ace4082820 Fix spelling errors 2021-10-12 02:47:31 -04:00
4036af6045 Fix link to cluster architecture docs 2021-10-12 02:41:22 -04:00
f96de97861 Adjust getting started docs
Update the docs with the current information on setting up a cluster,
including simplifying the Ansible configuration to use the new
create-local-repo.sh script, and simplifying some other sections.
2021-10-12 02:39:25 -04:00
04cad46305 Default to removing build artifacts in b-a-d.sh 2021-10-11 16:41:00 -04:00
e9dea4d2d1 Add explicit 3 second timeout to requests 2021-10-11 16:31:18 -04:00
39fd85fcc3 Add version function support to CLI 2021-10-11 15:34:41 -04:00
cbbab46b55 Add new configs for Ansible 2021-10-11 14:44:18 -04:00
d1f2ce0b0a Bump version to 0.9.41 2021-10-09 19:39:21 -04:00
2f01edca14 Add bridge_mtu config to docs 2021-10-09 19:28:50 -04:00
12a3a3a6a6 Adjust log type of object setup message 2021-10-09 19:23:12 -04:00
c44732be83 Avoid duplicate runs of MTU set
It wasn't the validator duplicating, but the update duplicating, so
avoid that happening properly this time.
2021-10-09 19:21:47 -04:00
a8b68e0968 Revert "Avoid duplicate runs of MTU validator"
This reverts commit 56021c443a.
2021-10-09 19:11:42 -04:00
e59152afee Set all log messages to information state
None of these were "success" messages and thus shouldn't have been ok
state.
2021-10-09 19:09:38 -04:00
56021c443a Avoid duplicate runs of MTU validator 2021-10-09 19:07:41 -04:00
ebdea165f1 Use correct isinstance instead of type 2021-10-09 19:03:31 -04:00
fb0651fb05 Move MTU validation to function
Prevents code duplication and ensures validation runs when an MTU is
updated, not just on network creation.
2021-10-09 19:01:45 -04:00
35e7e11403 Add logger message when setting MTU 2021-10-09 18:56:18 -04:00
b7555468eb Ensure vx_mtu is always an int() 2021-10-09 18:52:50 -04:00
f1b4ee02ba Fix bad header length in network list 2021-10-09 18:50:32 -04:00
4698edc98e Add MTU value checking and log messages
Ensures that if a specified MTU is more than the maximum it is set to
the maximum instead, and adds warning messages for both situations.
2021-10-09 18:48:56 -04:00
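The clamping behaviour described in this commit could look roughly like the sketch below. It only covers the "larger than the maximum" case; the function name `validate_mtu`, the `max_mtu` parameter and the use of the `logging` module are assumptions for the example, not the project's actual code.

```python
import logging

logger = logging.getLogger("mtu-example")


def validate_mtu(requested_mtu, max_mtu):
    """Return an MTU no larger than max_mtu, warning if it had to be lowered."""
    # Coerce to int first; values read from a config store may arrive as strings
    requested_mtu = int(requested_mtu)
    if requested_mtu > max_mtu:
        logger.warning(
            "Requested MTU %d exceeds maximum %d; using %d instead",
            requested_mtu,
            max_mtu,
            max_mtu,
        )
        return max_mtu
    return requested_mtu
```

For example, `validate_mtu(9000, 1500)` returns the maximum and logs a warning, while a value at or below the maximum is returned unchanged.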
40e7e04aad Fix invalid schema key
Addresses #144
2021-10-09 18:42:33 -04:00
7f074847c4 Add MTU support to network add/modify commands
Addresses #144
2021-10-09 18:06:21 -04:00
b0b0b75605 Have VXNetworkInstance set MTU if unset
Makes this explicit in Zookeeper if a network is unset, post-migration
(schema version 6).

Addresses #144
2021-10-09 17:52:57 -04:00
89f62318bd Add MTU to network creation/modification
Addresses #144
2021-10-09 17:51:32 -04:00
925141ed65 Fix migration bugs and invalid vx_mtu
Addresses #144
2021-10-09 17:35:10 -04:00
f7a826bf52 Add handlers for client network MTUs
Refactors some of the code in VXNetworkInterface to handle MTUs in a
more streamlined fashion. Also fixes a bug whereby bridge client
networks were being explicitly given the cluster dev MTU which might not
be correct. Now adds support for this option explicitly in the configs,
and defaults to 1500 for safety (the standard Ethernet MTU).

Addresses #144
2021-10-09 17:02:27 -04:00
e176f3b2f6 Make n-1 values clearer 2021-10-07 18:11:15 -04:00
b339d5e641 Correct levels in TOC 2021-10-07 18:08:28 -04:00
d476b13cc0 Correct spelling errors 2021-10-07 18:07:06 -04:00
ce8b2c22cc Add documentation sections on IPMI and fencing 2021-10-07 18:05:47 -04:00
feab5d3479 Correct flawed conditional in verify_ipmi 2021-10-07 15:11:19 -04:00
ee348593c9 Bump version to 0.9.40 2021-10-07 14:42:04 -04:00
e403146bcf Correct bad stop_keepalive_timer call 2021-10-07 14:41:12 -04:00
bde684dd3a Remove redundant wording from header 2021-10-07 12:20:04 -04:00
992e003500 Replace headers with links in CHANGELOG.md 2021-10-07 12:17:44 -04:00
eaeb860a83 Add missing period to changelog sentence 2021-10-07 12:10:35 -04:00
1198ca9f5c Move changelog into dedicated file
The changelog was getting far too long for the README/docs index to
support, so move it into CHANGELOG.md and link to it instead.
2021-10-07 12:09:26 -04:00
e79d200244 Bump version to 0.9.39 2021-10-07 11:52:38 -04:00
5b3bb9f306 Add linting to build-and-deploy
Ensures that bad code isn't deployed during testing.
2021-10-07 11:51:05 -04:00
5501586a47 Add limit negation to VM list
When using the "state", "node", or "tag" arguments to a VM list, add
support for a "negate" flag to look for all VMs *not in* the state,
node, or tag state.
2021-10-07 11:50:52 -04:00
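A negate flag of this kind can be sketched as a single comparison that is inverted when the flag is set. The function `filter_vms` and the dictionary layout of a VM are hypothetical and exist only for this example.

```python
def filter_vms(vms, state=None, negate=False):
    """Keep VMs whose state matches, or does not match when negate is set."""
    if state is None:
        return list(vms)
    # (match) != negate keeps matches normally and non-matches when negated
    return [vm for vm in vms if (vm["state"] == state) != negate]


vms = [
    {"name": "web1", "state": "start"},
    {"name": "db1", "state": "stop"},
]
running = [vm["name"] for vm in filter_vms(vms, state="start")]
not_running = [vm["name"] for vm in filter_vms(vms, state="start", negate=True)]
```

The same pattern would apply to the node and tag limits mentioned in the commit message.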
c160648c5c Add note about fencing at remote sites 2021-10-04 19:58:08 -04:00
fa37227127 Correct TOC in architecture page 2021-10-04 01:54:22 -04:00
2cac98963c Correct spelling errors 2021-10-04 01:51:58 -04:00
8e50428707 Double image sizes for example clusters 2021-10-04 01:47:35 -04:00
a4953bc6ef Adjust toc_depth for RTD theme 2021-10-04 01:45:05 -04:00
3c10d57148 Revamp about and architecture docs
Makes these a little simpler to follow and provides some more up-to-date
information based on recent tests and developments.
2021-10-04 01:42:08 -04:00
26d8551388 Adjust bump-version changelog heading level 2021-10-04 01:41:48 -04:00
57342541dd Move changelog headers down one more level 2021-10-04 01:41:22 -04:00
50f8afd749 Adjust indent of index/README versions 2021-10-04 00:33:24 -04:00
3449069e3d Bump version to 0.9.38 2021-10-03 22:32:41 -04:00
cb66b16045 Correct latency units and format name 2021-10-03 17:06:34 -04:00
8edce74b85 Revamp test result display
Instead of showing CLAT percentiles, which are very hard to interpret
and understand, use the main latency buckets.
2021-10-03 15:49:01 -04:00
e9b69c4124 Revamp postinst for the API daemon
Ensures that the worker is always restarted and makes the NOTE
conditional more specific.
2021-10-03 15:15:26 -04:00
3948206225 Tweak fio tests for benchmarks
1. Remove ramp_time as this was giving very strange results.

2. Up the runtime to 75 seconds to compensate.

3. Print the fio command to the console to validate.
2021-10-03 15:06:18 -04:00
a09578fcf5 Add benchmark format to list 2021-10-03 15:05:58 -04:00
73be807b84 Adjust ETA for benchmarks 2021-10-02 04:51:01 -04:00
4a9805578e Add format parsing for format 1 storage benchmarks 2021-10-02 04:46:44 -04:00
f70f052df1 Add version 2 benchmark list formatting 2021-10-02 02:47:17 -04:00
1e8841ce69 Handle benchmark running state properly 2021-10-02 01:54:51 -04:00
9c7d39d523 Fix missing argument in database insert 2021-10-02 01:49:47 -04:00
011490bcca Update to storage benchmark format 1
1. Runs `fio` with the `--format=json` option and removes all terse
format parsing from the results.

2. Adds a 15-second ramp time to minimize wonky ramp-up results.

3. Sets group_reporting, which isn't necessary with only a single job,
but is here for consistency.
2021-10-02 01:41:08 -04:00
8de63b2785 Fix handling of array of information
With a benchmark info we only ever want test one, so pass only that to
the formatter. Simplifies the format function.
2021-10-02 01:28:39 -04:00
8f8f00b2e9 Avoid versioning benchmark lists
This wouldn't work since each individual test is versioned. Instead add
a placeholder for later once additional format(s) are defined.
2021-10-02 01:25:18 -04:00
1daab49b50 Add format option to benchmark info
Allows specifying of raw json or json-pretty formats in addition to the
"pretty" formatted option.
2021-10-02 01:13:50 -04:00
9f6041b9cf Add benchmark format function support
Allows choosing different list and info functions based on the benchmark
version found. Currently only implements "legacy" version 0 with more to
be added.
2021-10-02 01:07:25 -04:00
5b27e438a9 Add test format versioning to storage benchmarks
Adds a test_format database column and a value in the API return for the
test format version, starting at 0 for the existing format as of 0.9.37.

References #143
2021-10-02 00:55:27 -04:00
3e8a85b029 Load benchmark results as JSON
Load the JSON at the API side instead of client side, because that's
what the API doc says it is and it just makes more sense.
2021-09-30 23:40:24 -04:00
19ac1e17c3 Bump version to 0.9.37 2021-09-30 02:08:14 -04:00
252175fb6f Revamp benchmark tests
1. Move to a time-based (60s) benchmark to avoid these taking an absurd
amount of time to show the same information.

2. Eliminate the 256k random benchmarks, since they don't really add
anything.

3. Add in a 4k single-queue benchmark as this might provide valuable
insight into latency.

4. Adjust the output to reflect the above changes.

While this does change the benchmarking, this should not invalidate any
existing benchmarks since most of the test suite is unchanged (especially
the most important 4M sequential and 4K random tests). It simply removes
an unused entry and adds a more helpful one. The time-based change
should not significantly affect the results either; it just reduces the
total runtime for long tests and increases the runtime for quick tests to
provide a better picture.
2021-09-29 20:51:30 -04:00
f39b041471 Add primary node to benchmark job name
Ensures tracking of the current primary node the job was run on, since
this may be relevant for performance reasons.
2021-09-28 09:58:22 -04:00
3b41759262 Add timeouts to queue gets and adjust
Ensure that all keepalive timeouts are set (prevent the queue.get()
actions from blocking forever) and set the thread timeouts to line up as
well. Everything here is thus limited to keepalive_interval seconds
(default 5s) to keep it uniform.
2021-09-27 16:10:27 -04:00
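The effect of adding a timeout to `queue.get()` can be shown with the standard library alone. The helper `get_result` is a hypothetical name; the 5 second value mirrors the default `keepalive_interval` mentioned in the commit message.

```python
import queue

keepalive_interval = 5  # seconds; the default mentioned in the commit message


def get_result(result_queue, timeout=keepalive_interval):
    """Fetch one item, returning None instead of blocking forever."""
    try:
        return result_queue.get(timeout=timeout)
    except queue.Empty:
        # Nothing arrived within the timeout; let the caller carry on
        return None
```

Without the `timeout` argument, `get()` on an empty queue blocks until another thread puts an item, which is the indefinite wait the commit avoids.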
e514eed414 Re-add success log output during migration 2021-09-27 11:50:55 -04:00
b81e70ec18 Fix missing character in log message 2021-09-27 00:49:43 -04:00
c2a473ed8b Simplify VM migration down to 3 steps
Remove two superfluous synchronization steps which are not needed here,
since the exclusive lock handles that situation anyways.

Still does not fix the weird flush->unflush lock timeout bug, but is
better worked-around now due to the cancelling of the other wait freeing
this up and continuing.
2021-09-27 00:03:20 -04:00
5355f6ff48 Work around synchronization lock issues
Make the block on stage C only wait for 900 seconds (15 minutes) to
prevent indefinite blocking.

The issue comes if a VM is being received, and the current unflush is
cancelled for a flush. When this happens, this lock acquisition seems to
block for no obvious reason, and no other changes seem to affect it.
This is certainly some sort of locking bug within Kazoo but I can't
diagnose it as-is. Leave a TODO to look into this again in the future.
2021-09-26 23:26:21 -04:00
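A bounded lock wait of this sort can be sketched as follows. This uses `threading.Lock` as a stand-in for the Zookeeper lock and a very short timeout in place of the 900 seconds from the commit; both substitutions are assumptions made to keep the example self-contained.

```python
import threading

LOCK_TIMEOUT = 0.05  # stand-in for the 900 seconds used in the commit

lock = threading.Lock()
lock.acquire()  # simulate the other side still holding the lock


def wait_for_stage(stage_lock, timeout):
    """Try to take the lock, giving up after `timeout` seconds."""
    acquired = stage_lock.acquire(timeout=timeout)
    if acquired:
        stage_lock.release()
    return acquired


timed_out = not wait_for_stage(lock, LOCK_TIMEOUT)
lock.release()  # the holder finally lets go
acquired_later = wait_for_stage(lock, LOCK_TIMEOUT)
```

The first call gives up after the timeout instead of hanging, and the second succeeds once the lock is free.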
bf7823deb5 Improve log messages during VM migration 2021-09-26 23:15:38 -04:00
8ba371723e Use event to non-block wait and fix inf wait 2021-09-26 22:55:39 -04:00
e10ac52116 Track status of VM state thread 2021-09-26 22:55:21 -04:00
341073521b Simplify locking process for VM migration
Rather than using a cumbersome and overly complex ping-pong of read and
write locks, instead move to a much simpler process using exclusive
locks.

Describing the process in ASCII or narrative is cumbersome, but the
process ping-pongs via a set of exclusive locks and wait timers, so that
the two sides are able to synchronize via blocking the exclusive lock.
The end result is a much more streamlined migration (takes about half
the time all things considered) which should be less error-prone.
2021-09-26 22:08:07 -04:00
16c38da5ef Fix failure to connect to libvirt in keepalive
This should be caught and abort the thread rather than failing and
holding up keepalives.
2021-09-26 20:42:01 -04:00
c8134d3a1c Fix several bugs in fence handling
1. Output from ipmitool was not being stripped, and stray newlines were
throwing off the comparisons. Fixes this.

2. Several stages were lacking meaningful messages. Adds these in so the
output is more clear about what is going on.

3. Reduce the sleep time after a fence to just 1x the
keepalive_interval, rather than 2x, because this seemed excessively
long even for slow IPMI interfaces, especially since we're checking the
power state now anyways.

4. Set the node daemon state to an explicit 'fenced' state after a
successful fence to indicate to users that the node was indeed fenced
successfully and not still 'dead'.
2021-09-26 20:07:30 -04:00
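Item 1 above, the stray-newline problem, can be reproduced in a few lines. The output string and the helper `power_is_off` are illustrative assumptions rather than the daemon's actual code.

```python
def power_is_off(ipmitool_stdout):
    """Compare stripped output so trailing newlines do not break the match."""
    return ipmitool_stdout.strip() == "Chassis Power is off"


raw_output = "Chassis Power is off\n"  # illustrative output with trailing newline
naive_match = raw_output == "Chassis Power is off"
stripped_match = power_is_off(raw_output)
```

The naive comparison fails only because of the trailing newline, while the stripped comparison succeeds.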
9f41373324 Ensure pvc-flush is after network-online 2021-09-26 17:40:42 -04:00
8e62d5b30b Fix typo in log message 2021-09-26 03:35:30 -04:00
7a8eee244a Tweak CLI helptext around OSD actions
Adds some more detail about OSD commands and their values.
2021-09-26 01:29:23 -04:00
7df5b8e52e Fix typo in sgdisk command options 2021-09-26 00:59:05 -04:00
6f96219023 Use re.search instead of re.match
Required since we're not matching the start of the string.
2021-09-26 00:55:29 -04:00
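The distinction matters whenever the pattern can occur in the middle of a string. The device path and pattern below are made up for the example and are not taken from the commit.

```python
import re

device = "/dev/nvme0n1p1"  # hypothetical device path
pattern = r"nvme\d+n\d+"

# re.match only succeeds if the pattern matches at the start of the string
match_result = re.match(pattern, device)

# re.search scans the whole string for the first match
search_result = re.search(pattern, device)
```

Here `match_result` is `None` because the string starts with `/dev/`, while `search_result` finds the pattern further along.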
51967e164b Raise basic exceptions in CephInstance
Avoids "no exception to reraise" errors on failures.
2021-09-26 00:50:10 -04:00
7a3a44d47c Fix OSD creation for partition paths and fix gdisk
The previous implementation did not work with /dev/nvme devices or any
/dev/disk/by-* devices due to some logical failures in the partition
naming scheme, so fix these, and be explicit about what is supported in
the PVC CLI command output.

The 'echo | gdisk' implementation of partition creation also did not
work due to limitations of subprocess.run; instead, use sgdisk which
allows these commands to be written out explicitly and is included in
the same package as gdisk.
2021-09-26 00:12:28 -04:00
44491dd988 Add support for configurable OSD DB ratios
The default of 0.05 (5%) is likely ideal in the initial implementation,
but allow this to be set explicitly for maximum flexibility in
space-constrained or performance-critical use-cases.
2021-09-24 01:06:39 -04:00
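The sizing arithmetic implied by a configurable ratio is simple enough to sketch. The function name `osd_db_size_bytes` and the validation of the ratio are assumptions for illustration; only the 0.05 default comes from the commit message.

```python
def osd_db_size_bytes(osd_size_bytes, ratio=0.05):
    """Return the DB volume size for an OSD of the given size."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be greater than 0 and at most 1")
    # Round to a whole number of bytes
    return round(osd_size_bytes * ratio)
```

With the default ratio, an OSD of 1000 bytes would get a 50 byte DB volume; a larger ratio trades data capacity for DB space.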
eba142f470 Bump version to 0.9.36 2021-09-23 14:01:38 -04:00
6cef68d157 Add separate OSD DB device support
Adds in three parts:

1. Create an API endpoint to create OSD DB volume groups on a device.
Passed through to the node via the same command pipeline as
creating/removing OSDs, and creates a volume group with a fixed name
(osd-db).

2. Adds API support for specifying whether or not to use this DB volume
group when creating a new OSD via the "ext_db" flag. Naming and sizing
is fixed for simplicity and based on Ceph recommendations (5% of OSD
size). The Zookeeper schema tracks the block device to use during
removal.

3. Adds CLI support for the new and modified API endpoints, as well as
displaying the block device and DB block device in the OSD list.

While I debated supporting adding a DB device to an existing OSD, in
practice this ended up being a very complex operation involving stopping
the OSD and setting some options, so this is not supported; this can be
specified during OSD creation only.

Closes #142
2021-09-23 13:59:49 -04:00
e8caf3369e Move console watcher stop try up
Could cause an exception if d_domain is not defined yet.
2021-09-22 16:02:04 -04:00
3e3776a25b Bump version to 0.9.35 2021-09-13 02:20:46 -04:00
6e0d0e264e Add memory and vCPU checks to VM define/modify
Ensures that a VM won't:

(a) Have provisioned more RAM than there is available on a given node.
Due to memory overprovisioning, this is simply a "is the VM memory count
more than the node count", and doesn't factor in free or used memory on
a node, total cluster usage, etc. So if a node has 64GB total RAM, the
VM limit is 64GB. It is up to an administrator to ensure sanity *below*
that value.

(b) Have provisioned more vCPUs than there are CPU cores on the node,
minus 2 to account for hypervisor/storage processes. Will ensure there
is no severe CPU contention caused by a single VM having more vCPUs than
there are actual execution threads available.

Closes #139
2021-09-13 01:51:21 -04:00
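Checks (a) and (b) above amount to two comparisons against the node's totals. The sketch below is a minimal illustration; the function name, parameter names and message strings are hypothetical.

```python
def check_vm_limits(vm_memory_mb, vm_vcpus, node_memory_mb, node_cpu_cores):
    """Return a list of reasons the VM cannot fit on the node (empty if it fits)."""
    problems = []
    # (a) Compare against total node memory only; free or used memory is ignored
    if vm_memory_mb > node_memory_mb:
        problems.append("memory exceeds node total")
    # (b) Reserve 2 cores for hypervisor/storage processes
    if vm_vcpus > node_cpu_cores - 2:
        problems.append("vCPUs exceed node cores minus 2")
    return problems
```

On a node with 65536 MB of RAM and 16 cores, a VM with 65536 MB and 14 vCPUs passes both checks, while one more megabyte or one more vCPU would be rejected.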
1855d03a36 Add pool size check when resizing volumes
Closes #140
2021-09-12 19:54:51 -04:00
1a286dc8dd Increase build-and-deploy sleep 2021-09-12 19:50:58 -04:00
1b6d10e03a Handle VM disk/network stats gathering exceptions 2021-09-12 19:41:07 -04:00
73c96d1e93 Add VM device hot attach/detach support
Adds a new API endpoint to support hot attach/detach of devices, and the
corresponding client-side logic to use this endpoint when doing VM
network/storage add/remove actions.

The live attach is now the default behaviour for these types of
additions and removals, and can be disabled if needed.

Closes #141
2021-09-12 19:33:00 -04:00
5841c98a59 Adjust lint script for newer linter 2021-09-12 15:40:38 -04:00
bc6395c959 Don't crash cleanup if no this_node 2021-08-29 03:52:18 -04:00
d582f87472 Change default node object state to flushed 2021-08-29 03:34:08 -04:00
e9735113af Bump version to 0.9.34 2021-08-24 16:15:25 -04:00
722fd0a65d Properly handle =-separated fsargs 2021-08-24 11:40:22 -04:00
3b41beb0f3 Convert argument elements of task status to types 2021-08-23 14:28:12 -04:00
d3392c0282 Fix typo in output message 2021-08-23 00:39:19 -04:00
560c013e95 Bump version to 0.9.33 2021-08-21 03:28:48 -04:00
384c6320ef Avoid failing if no provisioner tasks 2021-08-21 03:25:16 -04:00
445dec1c38 Ensure pycache files are removed on deb creation 2021-08-21 03:19:18 -04:00
534c7cd7f0 Refactor pvcnoded to reduce Daemon.py size
This branch commit refactors the pvcnoded component to better adhere to
good programming practices. The previous Daemon.py was a massive file
which contained almost 2000 lines of direct, root-level code which was
directly imported. Not only was this poor practice, but this resulted
in a nigh-unmaintainable file which was hard even for me to understand.

This refactoring splits a large section of the code from Daemon.py into
separate small modules and functions in the `util/` directory. This will
hopefully make most of the functionality easy to find and modify without
having to dig through a single large file.

Further the existing subcomponents have been moved to the `objects/`
directory which clearly separates them.

Finally, the Daemon.py code has mostly been moved into a function,
`entrypoint()`, which is then called from the `pvcnoded.py` stub.

An additional item is that most format strings have been replaced by
f-strings to make use of the Python 3.6 features in Daemon.py and the
utility files.
2021-08-21 03:14:22 -04:00
4014ef7714 Bump version to 0.9.32 2021-08-19 12:37:58 -04:00
180f0445ac Properly handle exceptions getting VM stats 2021-08-19 12:36:31 -04:00
Joshua Boniface
074664d4c1 Fix image dimensions and size 2021-08-18 19:51:55 -04:00
Joshua Boniface
418ac23d40 Add screenshots to docs 2021-08-18 19:49:53 -04:00
18 changed files with 1105 additions and 136 deletions


@@ -4,4 +4,4 @@ bbuilder:
 published:
 - git submodule update --init
 - /bin/bash build-stable-deb.sh
-- sudo /usr/local/bin/deploy-package -C pvc
+- /usr/local/bin/deploy-package -C pvc

.github/FUNDING.yml (vendored, 2 changes)

@@ -1,2 +0,0 @@
-github: [joshuaboniface]
-patreon: [joshuaboniface]


@@ -1 +1 @@
-0.9.43
+0.9.42


@@ -1,18 +1,5 @@
 ## PVC Changelog
-###### [v0.9.43](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.43)
-* [Packaging] Fixes a bad test in postinst
-* [CLI] Adds support for removing VM interfaces by MAC address
-* [CLI] Modifies the default restart + live behaviour to prefer the explicit restart
-* [CLI] Adds support for adding additional VM interfaces in the same network
-* [CLI] Various ordering and message fixes
-* [Node Daemon] Adds additional delays and retries to fencing actions
-* [All] Adds Black formatting for Python code and various script/hook cleanups
-* [CLI/API] Adds automatic shutdown or stop when disabling a VM
-* [CLI] Adds support for forcing colourized output
-* [Docs] Remove obsolete Ansible and Testing manuals
 ###### [v0.9.42](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.42)
 * [Documentation] Reworks and updates various documentation sections


@@ -25,7 +25,7 @@ import yaml
 from distutils.util import strtobool as dustrtobool
 # Daemon version
-version = "0.9.43"
+version = "0.9.42"
 # API version
 API_VERSION = 1.0


@@ -19,9 +19,9 @@ $EDITOR ${changelog_file}
 changelog="$( cat ${changelog_file} | grep -v '^#' | sed 's/^*/ */' )"
 rm ${changelog_file}
-sed -i "s,version = \"${current_version}\",version = \"${new_version}\"," node-daemon/pvcnoded/Daemon.py
-sed -i "s,version = \"${current_version}\",version = \"${new_version}\"," api-daemon/pvcapid/Daemon.py
-sed -i "s,version=\"${current_version}\",version=\"${new_version}\"," client-cli/setup.py
+sed -i "s,version = '${current_version}',version = '${new_version}'," node-daemon/pvcnoded/Daemon.py
+sed -i "s,version = '${current_version}',version = '${new_version}'," api-daemon/pvcapid/Daemon.py
+sed -i "s,version='${current_version}',version='${new_version}'," client-cli/setup.py
 echo ${new_version} > .version
 changelog_tmpdir=$( mktemp -d )
@@ -52,7 +52,7 @@ git commit -v
 popd &>/dev/null
 echo
-echo "Release message:"
+echo "GitLab release message:"
 echo
 echo "# Parallel Virtual Cluster version ${new_version}"
 echo


@@ -42,13 +42,11 @@ import pvc.cli_lib.network as pvc_network
 import pvc.cli_lib.ceph as pvc_ceph
 import pvc.cli_lib.provisioner as pvc_provisioner
 myhostname = socket.gethostname().split(".")[0]
 zk_host = ""
 is_completion = True if os.environ.get("_PVC_COMPLETE", "") == "complete" else False
 default_store_data = {"cfgfile": "/etc/pvc/pvcapid.yaml"}
-config = dict()
 #
@@ -60,7 +58,7 @@ def print_version(ctx, param, value):
 from pkg_resources import get_distribution
 version = get_distribution("pvc").version
-echo(f"Parallel Virtual Cluster version {version}")
+click.echo(f"Parallel Virtual Cluster version {version}")
 ctx.exit()
@@ -168,18 +166,9 @@ if not is_completion:
 CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"], max_content_width=120)
-def echo(msg, nl=True, err=False):
-if config.get("colour", False):
-colour = True
-else:
-colour = None
-click.echo(message=msg, color=colour, nl=nl, err=err)
 def cleanup(retcode, retmsg):
 if retmsg != "":
-echo(retmsg)
+click.echo(retmsg)
 if retcode is True:
 exit(0)
 else:
@@ -268,7 +257,9 @@ def cluster_add(description, address, port, ssl, name, api_key):
 }
 # Update the store
 update_store(store_path, existing_config)
-echo('Added new cluster "{}" at host "{}" to local database'.format(name, address))
+click.echo(
+'Added new cluster "{}" at host "{}" to local database'.format(name, address)
+)
 ###############################################################################
@@ -289,7 +280,7 @@ def cluster_remove(name):
 print('No cluster with name "{}" found'.format(name))
 # Update the store
 update_store(store_path, existing_config)
-echo('Removed cluster "{}" from local database'.format(name))
+click.echo('Removed cluster "{}" from local database'.format(name))
 ###############################################################################
@@ -363,9 +354,9 @@ def cluster_list(raw):
 if not raw:
 # Display the data nicely
-echo("Available clusters:")
-echo()
-echo(
+click.echo("Available clusters:")
+click.echo()
+click.echo(
 "{bold}{name: <{name_length}} {description: <{description_length}} {address: <{address_length}} {port: <{port_length}} {scheme: <{scheme_length}} {api_key: <{api_key_length}}{end_bold}".format(
 bold=ansiprint.bold(),
 end_bold=ansiprint.end(),
@@ -402,7 +393,7 @@ def cluster_list(raw):
 api_key = "N/A"
 if not raw:
-echo(
+click.echo(
 "{bold}{name: <{name_length}} {description: <{description_length}} {address: <{address_length}} {port: <{port_length}} {scheme: <{scheme_length}} {api_key: <{api_key_length}}{end_bold}".format(
 bold="",
 end_bold="",
@@ -421,7 +412,7 @@ def cluster_list(raw):
 )
 )
 else:
-echo(cluster)
+click.echo(cluster)
 # Validate that the cluster is set for a given command
@@ -429,7 +420,7 @@ def cluster_req(function):
 @wraps(function)
 def validate_cluster(*args, **kwargs):
 if config.get("badcfg", None):
-echo(
+click.echo(
 'No cluster specified and no local pvcapid.yaml configuration found. Use "pvc cluster" to add a cluster API to connect to.'
 )
 exit(1)
@@ -472,24 +463,24 @@ def node_secondary(node, wait):
 task_retcode, task_retdata = pvc_provisioner.task_status(config, None)
 if len(task_retdata) > 0:
-echo(
+click.echo(
 "Note: There are currently {} active or queued provisioner jobs on the current primary node.".format(
 len(task_retdata)
 )
 )
-echo(
+click.echo(
 " These jobs will continue executing, but status will not be visible until the current"
 )
-echo(" node returns to primary state.")
-echo()
+click.echo(" node returns to primary state.")
+click.echo()
 retcode, retmsg = pvc_node.node_coordinator_state(config, node, "secondary")
 if not retcode:
 cleanup(retcode, retmsg)
 else:
 if wait:
-echo(retmsg)
-echo("Waiting for state transition... ", nl=False)
+click.echo(retmsg)
+click.echo("Waiting for state transition... ", nl=False)
 # Every half-second, check if the API is reachable and the node is in secondary state
 while True:
 try:
@@ -525,24 +516,24 @@ def node_primary(node, wait):
task_retcode, task_retdata = pvc_provisioner.task_status(config, None) task_retcode, task_retdata = pvc_provisioner.task_status(config, None)
if len(task_retdata) > 0: if len(task_retdata) > 0:
echo( click.echo(
"Note: There are currently {} active or queued provisioner jobs on the current primary node.".format( "Note: There are currently {} active or queued provisioner jobs on the current primary node.".format(
len(task_retdata) len(task_retdata)
) )
) )
echo( click.echo(
" These jobs will continue executing, but status will not be visible until the current" " These jobs will continue executing, but status will not be visible until the current"
) )
echo(" node returns to primary state.") click.echo(" node returns to primary state.")
echo() click.echo()
retcode, retmsg = pvc_node.node_coordinator_state(config, node, "primary") retcode, retmsg = pvc_node.node_coordinator_state(config, node, "primary")
if not retcode: if not retcode:
cleanup(retcode, retmsg) cleanup(retcode, retmsg)
else: else:
if wait: if wait:
echo(retmsg) click.echo(retmsg)
echo("Waiting for state transition... ", nl=False) click.echo("Waiting for state transition... ", nl=False)
# Every half-second, check if the API is reachable and the node is in primary state # Every half-second, check if the API is reachable and the node is in primary state
while True: while True:
try: try:
@@ -1027,7 +1018,7 @@ def vm_modify(
text=current_vm_cfgfile, require_save=True, extension=".xml" text=current_vm_cfgfile, require_save=True, extension=".xml"
) )
if new_vm_cfgfile is None: if new_vm_cfgfile is None:
echo("Aborting with no modifications.") click.echo("Aborting with no modifications.")
exit(0) exit(0)
else: else:
new_vm_cfgfile = new_vm_cfgfile.strip() new_vm_cfgfile = new_vm_cfgfile.strip()
@@ -1038,15 +1029,15 @@ def vm_modify(
new_vm_cfgfile = cfgfile.read() new_vm_cfgfile = cfgfile.read()
cfgfile.close() cfgfile.close()
echo( click.echo(
'Replacing configuration of VM "{}" with file "{}".'.format( 'Replacing configuration of VM "{}" with file "{}".'.format(
dom_name, cfgfile.name dom_name, cfgfile.name
) )
) )
# Show a diff and confirm # Show a diff and confirm
echo("Pending modifications:") click.echo("Pending modifications:")
echo("") click.echo("")
diff = list( diff = list(
difflib.unified_diff( difflib.unified_diff(
current_vm_cfgfile.split("\n"), current_vm_cfgfile.split("\n"),
@@ -1061,14 +1052,14 @@ def vm_modify(
) )
for line in diff: for line in diff:
if re.match(r"^\+", line) is not None: if re.match(r"^\+", line) is not None:
echo(colorama.Fore.GREEN + line + colorama.Fore.RESET) click.echo(colorama.Fore.GREEN + line + colorama.Fore.RESET)
elif re.match(r"^\-", line) is not None: elif re.match(r"^\-", line) is not None:
echo(colorama.Fore.RED + line + colorama.Fore.RESET) click.echo(colorama.Fore.RED + line + colorama.Fore.RESET)
elif re.match(r"^\^", line) is not None: elif re.match(r"^\^", line) is not None:
echo(colorama.Fore.BLUE + line + colorama.Fore.RESET) click.echo(colorama.Fore.BLUE + line + colorama.Fore.RESET)
else: else:
echo(line) click.echo(line)
echo("") click.echo("")
# Verify our XML is sensible # Verify our XML is sensible
try: try:
@@ -3606,7 +3597,7 @@ def ceph_volume_upload(pool, name, image_format, image_file):
""" """
if not os.path.exists(image_file): if not os.path.exists(image_file):
echo("ERROR: File '{}' does not exist!".format(image_file)) click.echo("ERROR: File '{}' does not exist!".format(image_file))
exit(1) exit(1)
retcode, retmsg = pvc_ceph.ceph_volume_upload( retcode, retmsg = pvc_ceph.ceph_volume_upload(
@@ -4478,7 +4469,7 @@ def provisioner_template_storage_disk_add(
""" """
if source_volume and (size or filesystem or mountpoint): if source_volume and (size or filesystem or mountpoint):
echo( click.echo(
'The "--source-volume" option is not compatible with the "--size", "--filesystem", or "--mountpoint" options.' 'The "--source-volume" option is not compatible with the "--size", "--filesystem", or "--mountpoint" options.'
) )
exit(1) exit(1)
@@ -4619,7 +4610,7 @@ def provisioner_userdata_add(name, filename):
try: try:
yaml.load(userdata, Loader=yaml.SafeLoader) yaml.load(userdata, Loader=yaml.SafeLoader)
except Exception as e: except Exception as e:
echo("Error: Userdata document is malformed") click.echo("Error: Userdata document is malformed")
cleanup(False, e) cleanup(False, e)
params = dict() params = dict()
@@ -4656,7 +4647,7 @@ def provisioner_userdata_modify(name, filename, editor):
# Grab the current config # Grab the current config
retcode, retdata = pvc_provisioner.userdata_info(config, name) retcode, retdata = pvc_provisioner.userdata_info(config, name)
if not retcode: if not retcode:
echo(retdata) click.echo(retdata)
exit(1) exit(1)
current_userdata = retdata["userdata"].strip() current_userdata = retdata["userdata"].strip()
@@ -4664,14 +4655,14 @@ def provisioner_userdata_modify(name, filename, editor):
text=current_userdata, require_save=True, extension=".yaml" text=current_userdata, require_save=True, extension=".yaml"
) )
if new_userdata is None: if new_userdata is None:
echo("Aborting with no modifications.") click.echo("Aborting with no modifications.")
exit(0) exit(0)
else: else:
new_userdata = new_userdata.strip() new_userdata = new_userdata.strip()
# Show a diff and confirm # Show a diff and confirm
echo("Pending modifications:") click.echo("Pending modifications:")
echo("") click.echo("")
diff = list( diff = list(
difflib.unified_diff( difflib.unified_diff(
current_userdata.split("\n"), current_userdata.split("\n"),
@@ -4686,14 +4677,14 @@ def provisioner_userdata_modify(name, filename, editor):
) )
for line in diff: for line in diff:
if re.match(r"^\+", line) is not None: if re.match(r"^\+", line) is not None:
echo(colorama.Fore.GREEN + line + colorama.Fore.RESET) click.echo(colorama.Fore.GREEN + line + colorama.Fore.RESET)
elif re.match(r"^\-", line) is not None: elif re.match(r"^\-", line) is not None:
echo(colorama.Fore.RED + line + colorama.Fore.RESET) click.echo(colorama.Fore.RED + line + colorama.Fore.RESET)
elif re.match(r"^\^", line) is not None: elif re.match(r"^\^", line) is not None:
echo(colorama.Fore.BLUE + line + colorama.Fore.RESET) click.echo(colorama.Fore.BLUE + line + colorama.Fore.RESET)
else: else:
echo(line) click.echo(line)
echo("") click.echo("")
click.confirm("Write modifications to cluster?", abort=True) click.confirm("Write modifications to cluster?", abort=True)
@@ -4708,7 +4699,7 @@ def provisioner_userdata_modify(name, filename, editor):
try: try:
yaml.load(userdata, Loader=yaml.SafeLoader) yaml.load(userdata, Loader=yaml.SafeLoader)
except Exception as e: except Exception as e:
echo("Error: Userdata document is malformed") click.echo("Error: Userdata document is malformed")
cleanup(False, e) cleanup(False, e)
params = dict() params = dict()
@@ -4857,20 +4848,20 @@ def provisioner_script_modify(name, filename, editor):
# Grab the current config # Grab the current config
retcode, retdata = pvc_provisioner.script_info(config, name) retcode, retdata = pvc_provisioner.script_info(config, name)
if not retcode: if not retcode:
echo(retdata) click.echo(retdata)
exit(1) exit(1)
current_script = retdata["script"].strip() current_script = retdata["script"].strip()
new_script = click.edit(text=current_script, require_save=True, extension=".py") new_script = click.edit(text=current_script, require_save=True, extension=".py")
if new_script is None: if new_script is None:
echo("Aborting with no modifications.") click.echo("Aborting with no modifications.")
exit(0) exit(0)
else: else:
new_script = new_script.strip() new_script = new_script.strip()
# Show a diff and confirm # Show a diff and confirm
echo("Pending modifications:") click.echo("Pending modifications:")
echo("") click.echo("")
diff = list( diff = list(
difflib.unified_diff( difflib.unified_diff(
current_script.split("\n"), current_script.split("\n"),
@@ -4885,14 +4876,14 @@ def provisioner_script_modify(name, filename, editor):
) )
for line in diff: for line in diff:
if re.match(r"^\+", line) is not None: if re.match(r"^\+", line) is not None:
echo(colorama.Fore.GREEN + line + colorama.Fore.RESET) click.echo(colorama.Fore.GREEN + line + colorama.Fore.RESET)
elif re.match(r"^\-", line) is not None: elif re.match(r"^\-", line) is not None:
echo(colorama.Fore.RED + line + colorama.Fore.RESET) click.echo(colorama.Fore.RED + line + colorama.Fore.RESET)
elif re.match(r"^\^", line) is not None: elif re.match(r"^\^", line) is not None:
echo(colorama.Fore.BLUE + line + colorama.Fore.RESET) click.echo(colorama.Fore.BLUE + line + colorama.Fore.RESET)
else: else:
echo(line) click.echo(line)
echo("") click.echo("")
click.confirm("Write modifications to cluster?", abort=True) click.confirm("Write modifications to cluster?", abort=True)
@@ -4997,7 +4988,7 @@ def provisioner_ova_upload(name, filename, pool):
Storage templates, provisioning scripts, and arguments for OVA-type profiles will be ignored and should not be set. Storage templates, provisioning scripts, and arguments for OVA-type profiles will be ignored and should not be set.
""" """
if not os.path.exists(filename): if not os.path.exists(filename):
echo("ERROR: File '{}' does not exist!".format(filename)) click.echo("ERROR: File '{}' does not exist!".format(filename))
exit(1) exit(1)
params = dict() params = dict()
@@ -5328,19 +5319,19 @@ def provisioner_create(name, profile, wait_flag, define_flag, start_flag, script
if retcode and wait_flag: if retcode and wait_flag:
task_id = retdata task_id = retdata
echo("Task ID: {}".format(task_id)) click.echo("Task ID: {}".format(task_id))
echo() click.echo()
# Wait for the task to start # Wait for the task to start
echo("Waiting for task to start...", nl=False) click.echo("Waiting for task to start...", nl=False)
while True: while True:
time.sleep(1) time.sleep(1)
task_status = pvc_provisioner.task_status(config, task_id, is_watching=True) task_status = pvc_provisioner.task_status(config, task_id, is_watching=True)
if task_status.get("state") != "PENDING": if task_status.get("state") != "PENDING":
break break
echo(".", nl=False) click.echo(".", nl=False)
echo(" done.") click.echo(" done.")
echo() click.echo()
# Start following the task state, updating progress as we go # Start following the task state, updating progress as we go
total_task = task_status.get("total") total_task = task_status.get("total")
@@ -5361,7 +5352,7 @@ def provisioner_create(name, profile, wait_flag, define_flag, start_flag, script
maxlen = curlen maxlen = curlen
lendiff = maxlen - curlen lendiff = maxlen - curlen
overwrite_whitespace = " " * lendiff overwrite_whitespace = " " * lendiff
echo( click.echo(
" " + task_status.get("status") + overwrite_whitespace, " " + task_status.get("status") + overwrite_whitespace,
nl=False, nl=False,
) )
@@ -5371,7 +5362,7 @@ def provisioner_create(name, profile, wait_flag, define_flag, start_flag, script
if task_status.get("state") == "SUCCESS": if task_status.get("state") == "SUCCESS":
bar.update(total_task - last_task) bar.update(total_task - last_task)
echo() click.echo()
retdata = task_status.get("state") + ": " + task_status.get("status") retdata = task_status.get("state") + ": " + task_status.get("status")
cleanup(retcode, retdata) cleanup(retcode, retdata)
@@ -5600,7 +5591,7 @@ def task_init(confirm_flag, overwrite_flag):
exit(0) exit(0)
# Easter-egg # Easter-egg
echo("Some music while we're Layin' Pipe? https://youtu.be/sw8S_Kv89IU") click.echo("Some music while we're Layin' Pipe? https://youtu.be/sw8S_Kv89IU")
retcode, retmsg = pvc_cluster.initialize(config, overwrite_flag) retcode, retmsg = pvc_cluster.initialize(config, overwrite_flag)
cleanup(retcode, retmsg) cleanup(retcode, retmsg)
@@ -5645,19 +5636,10 @@ def task_init(confirm_flag, overwrite_flag):
default=False, default=False,
help='Allow unsafe operations without confirmation/"--yes" argument.', help='Allow unsafe operations without confirmation/"--yes" argument.',
) )
@click.option(
"--colour",
"--color",
"_colour",
envvar="PVC_COLOUR",
is_flag=True,
default=False,
help="Force colourized output.",
)
@click.option( @click.option(
"--version", is_flag=True, callback=print_version, expose_value=False, is_eager=True "--version", is_flag=True, callback=print_version, expose_value=False, is_eager=True
) )
def cli(_cluster, _debug, _quiet, _unsafe, _colour): def cli(_cluster, _debug, _quiet, _unsafe):
""" """
Parallel Virtual Cluster CLI management tool Parallel Virtual Cluster CLI management tool
@@ -5669,9 +5651,7 @@ def cli(_cluster, _debug, _quiet, _unsafe, _colour):
"PVC_QUIET": Suppress stderr connection output from client instead of using --quiet/-q "PVC_QUIET": Suppress stderr connection output from client instead of using --quiet/-q
"PVC_UNSAFE": Always suppress confirmations instead of needing --unsafe/-u or --yes/-y; USE WITH EXTREME CARE "PVC_UNSAFE": Suppress confirmation requirements instead of using --unsafe/-u or --yes/-y; USE WITH EXTREME CARE
"PVC_COLOUR": Force colour on the output even if Click determines it is not a console (e.g. with 'watch')
If no PVC_CLUSTER/--cluster is specified, attempts first to load the "local" cluster, checking If no PVC_CLUSTER/--cluster is specified, attempts first to load the "local" cluster, checking
for an API configuration in "/etc/pvc/pvcapid.yaml". If this is also not found, abort. for an API configuration in "/etc/pvc/pvcapid.yaml". If this is also not found, abort.
@@ -5683,14 +5663,13 @@ def cli(_cluster, _debug, _quiet, _unsafe, _colour):
if not config.get("badcfg", None): if not config.get("badcfg", None):
config["debug"] = _debug config["debug"] = _debug
config["unsafe"] = _unsafe config["unsafe"] = _unsafe
config["colour"] = _colour
if not _quiet: if not _quiet:
if config["api_scheme"] == "https" and not config["verify_ssl"]: if config["api_scheme"] == "https" and not config["verify_ssl"]:
ssl_unverified_msg = " (unverified)" ssl_unverified_msg = " (unverified)"
else: else:
ssl_unverified_msg = "" ssl_unverified_msg = ""
echo( click.echo(
'Using cluster "{}" - Host: "{}" Scheme: "{}{}" Prefix: "{}"'.format( 'Using cluster "{}" - Host: "{}" Scheme: "{}{}" Prefix: "{}"'.format(
config["cluster"], config["cluster"],
config["api_host"], config["api_host"],
@@ -5700,9 +5679,11 @@ def cli(_cluster, _debug, _quiet, _unsafe, _colour):
), ),
err=True, err=True,
) )
echo("", err=True) click.echo("", err=True)
config = dict()
# #
# Click command tree # Click command tree
# #


@@ -2,7 +2,7 @@ from setuptools import setup
setup( setup(
name="pvc", name="pvc",
version="0.9.43", version="0.9.42",
packages=["pvc", "pvc.cli_lib"], packages=["pvc", "pvc.cli_lib"],
install_requires=[ install_requires=[
"Click", "Click",

15
debian/changelog vendored

@@ -1,18 +1,3 @@
pvc (0.9.43-0) unstable; urgency=high
* [Packaging] Fixes a bad test in postinst
* [CLI] Adds support for removing VM interfaces by MAC address
* [CLI] Modifies the default restart + live behaviour to prefer the explicit restart
* [CLI] Adds support for adding additional VM interfaces in the same network
* [CLI] Various ordering and message fixes
* [Node Daemon] Adds additional delays and retries to fencing actions
* [All] Adds Black formatting for Python code and various script/hook cleanups
* [CLI/API] Adds automatic shutdown or stop when disabling a VM
* [CLI] Adds support for forcing colourized output
* [Docs] Remove obsolete Ansible and Testing manuals
-- Joshua M. Boniface <joshua@boniface.me> Mon, 08 Nov 2021 02:27:38 -0500
pvc (0.9.42-0) unstable; urgency=high pvc (0.9.42-0) unstable; urgency=high
* [Documentation] Reworks and updates various documentation sections * [Documentation] Reworks and updates various documentation sections


@@ -95,7 +95,7 @@ The CLI client is self-documenting using the `-h`/`--help` arguments throughout,
The overall management, deployment, bootstrapping, and configuring of nodes is accomplished via a set of Ansible roles and playbooks, found in the [`pvc-ansible` repository](https://github.com/parallelvirtualcluster/pvc-ansible), and nodes are installed via a custom installer ISO generated by the [`pvc-installer` repository](https://github.com/parallelvirtualcluster/pvc-installer). Once the cluster is set up, nodes can be added, replaced, updated, or reconfigured using this Ansible framework. The overall management, deployment, bootstrapping, and configuring of nodes is accomplished via a set of Ansible roles and playbooks, found in the [`pvc-ansible` repository](https://github.com/parallelvirtualcluster/pvc-ansible), and nodes are installed via a custom installer ISO generated by the [`pvc-installer` repository](https://github.com/parallelvirtualcluster/pvc-installer). Once the cluster is set up, nodes can be added, replaced, updated, or reconfigured using this Ansible framework.
Details about the Ansible setup and node installer can be found in those repositories. The Ansible configuration and architecture manual can be found at the [Ansible manual page](/manuals/ansible).
The [getting started documentation](/getting-started) provides a walk-through of using these tools to bootstrap a new cluster. The [getting started documentation](/getting-started) provides a walk-through of using these tools to bootstrap a new cluster.


@@ -210,7 +210,7 @@ The upstream network functions as the main upstream for the cluster nodes, provi
The floating IP address in the cluster network can be used as a single point of communication with the active primary node, for instance to access the DNS aggregator instance or the management API. PVC provides only limited access control mechanisms to the API interface, so the upstream network should always be protected by a firewall; running PVC directly accessible on the Internet is strongly discouraged and may pose a serious security risk, and all access should be restricted to the smallest possible set of remote systems. The floating IP address in the cluster network can be used as a single point of communication with the active primary node, for instance to access the DNS aggregator instance or the management API. PVC provides only limited access control mechanisms to the API interface, so the upstream network should always be protected by a firewall; running PVC directly accessible on the Internet is strongly discouraged and may pose a serious security risk, and all access should be restricted to the smallest possible set of remote systems.
Nodes in this network are generally assigned static IP addresses which are configured at node install time in the [Ansible deployment configuration](https://github.com/parallelvirtualcluster/pvc-ansible). Nodes in this network are generally assigned static IP addresses which are configured at node install time and in the [Ansible deployment configuration](/manuals/ansible).
The upstream router should be able to handle static routes to the PVC cluster, or form a BGP neighbour relationship with the coordinator nodes and/or floating IP address to learn routes to the managed client networks. The upstream router should be able to handle static routes to the PVC cluster, or form a BGP neighbour relationship with the coordinator nodes and/or floating IP address to learn routes to the managed client networks.


@@ -14,7 +14,7 @@ This guide will walk you through setting up a simple 3-node PVC cluster from scr
0. Create an initial `hosts` inventory, using `hosts.default` in the `pvc-ansible` repo as a template. You can manage multiple PVC clusters ("sites") from the Ansible repository easily, however for simplicity you can use the simple name `cluster` for your initial site. Define the 3 hostnames you will use under the site group; usually the provided names of `pvchv1`, `pvchv2`, and `pvchv3` are sufficient, though you may use any hostname pattern you wish. It is *very important* that the names all contain a sequential number, however, as this is used by various components. 0. Create an initial `hosts` inventory, using `hosts.default` in the `pvc-ansible` repo as a template. You can manage multiple PVC clusters ("sites") from the Ansible repository easily, however for simplicity you can use the simple name `cluster` for your initial site. Define the 3 hostnames you will use under the site group; usually the provided names of `pvchv1`, `pvchv2`, and `pvchv3` are sufficient, though you may use any hostname pattern you wish. It is *very important* that the names all contain a sequential number, however, as this is used by various components.
0. Create an initial set of `group_vars` for your cluster at `group_vars/<cluster>`, using the `group_vars/default` in the `pvc-ansible` repo as a template. Inside these group vars are two main files: `base.yml` and `pvc.yml`. These example files are well-documented; read them carefully and specify all required options before proceeding, and reference the [Ansible setup examples](https://github.com/parallelvirtualcluster/pvc-ansible) for more detailed descriptions of the options. 0. Create an initial set of `group_vars` for your cluster at `group_vars/<cluster>`, using the `group_vars/default` in the `pvc-ansible` repo as a template. Inside these group vars are two main files: `base.yml` and `pvc.yml`. These example files are well-documented; read them carefully and specify all required options before proceeding, and reference the [Ansible manual](/manuals/ansible) for more detailed descriptions of the options.
* `base.yml` configures the `base` role and some common per-cluster configurations such as an upstream domain, a root password, a set of administrative users, various hardware configuration items, and, most importantly, the basic network configuration of the nodes. Make special note of the various items that must be generated such as passwords; these should all be cluster-unique. * `base.yml` configures the `base` role and some common per-cluster configurations such as an upstream domain, a root password, a set of administrative users, various hardware configuration items, and, most importantly, the basic network configuration of the nodes. Make special note of the various items that must be generated such as passwords; these should all be cluster-unique.

949
docs/manuals/ansible.md Normal file

@@ -0,0 +1,949 @@
# PVC Ansible architecture
The PVC setup and management framework is written in Ansible. It consists of two roles: `base` and `pvc`.
## Base role
The Base role configures a node to a specific, standard base Debian system, with a number of PVC-specific tweaks. Some examples include:
* Installing the custom PVC repository hosted at Boniface Labs.
* Removing several unnecessary packages and installing numerous additional packages.
* Automatically configuring network interfaces based on the `group_vars` configuration.
* Configuring several general `sysctl` settings for optimal performance.
* Installing and configuring rsyslog, postfix, ntpd, ssh, and fail2ban.
* Creating the users specified in the `group_vars` configuration.
* Installing custom MOTDs, bashrc files, vimrc files, and other useful configurations for each user.
The end result is a standardized "PVC node" system ready to have the daemons installed by the PVC role.
The Base role is optional: if an administrator so chooses, they can bypass this role and configure things manually. That said, for the proper functioning of the PVC role, the Base role should always be applied first.
## PVC role
The PVC role configures all the dependencies of PVC, including storage, networking, and databases, then installs the PVC daemon itself. Specifically, it will, in order:
* Install Ceph, configure and bootstrap a new cluster if `bootstrap=yes` is set, configure the monitor and manager daemons, and start up the cluster ready for the addition of OSDs via the client interface (coordinators only).
* Install, configure, and if `bootstrap=yes` is set, bootstrap a Zookeeper cluster (coordinators only).
* Install, configure, and if `bootstrap=yes` is set, bootstrap a Patroni PostgreSQL cluster for the PowerDNS aggregator (coordinators only).
* Install and configure Libvirt.
* Install and configure FRRouting.
* Install and configure the main PVC daemon and API client.
* If `bootstrap=yes` is set, initialize the PVC cluster (`pvc task init`).
## Completion
Once the entire playbook has run for the first time against a given host, the host will be rebooted to apply all the configured services. On startup, the system should immediately launch the PVC daemon, check in to the Zookeeper cluster, and become ready. The node will be in `flushed` state on its first boot; the administrator will need to run `pvc node unflush <node>` to set the node into active state ready to handle virtual machines. On the first bootstrap run, the administrator will also have to configure storage block devices (OSDs), networks, etc. For full details, see [the main getting started page](/getting-started).
## General usage
### Initial setup
After cloning the `pvc-ansible` repo, set up a set of configurations for your cluster. One copy of the `pvc-ansible` repository can manage an unlimited number of clusters with differing configurations.
All files created during initial setup should be stored outside the `pvc-ansible` repository, as they will be ignored by the main Git repository by default. It is recommended to set up a separate folder, either standalone or as its own Git repository, to contain your files, then symlink them back into the main repository at the appropriate places outlined below.
Create a `hosts` file containing the clusters as groups, then the list of hosts within each cluster group. The `hosts.default` file can be used as a template.
Create a `files/<cluster>` folder to hold the cluster-created static configuration files. Until the first bootstrap run, this directory will be empty.
Create a `group_vars/<cluster>` folder to hold the cluster configuration variables. The `group_vars/default` directory can be used as an example.
### Bootstrapping a cluster
Before bootstrapping a cluster, see the section on [PVC Ansible configuration variables](/manuals/ansible/#pvc-ansible-configuration-variables) to configure the cluster.
Bootstrapping a cluster can be done using the main `pvc.yml` playbook. Generally, a bootstrap run should be limited to the coordinators of the cluster to avoid potential race conditions or strange bootstrap behaviour. The special variable `bootstrap=yes` must be set to indicate that a cluster bootstrap is to be requested.
**WARNING:** Do not run the playbook with `bootstrap=yes` *except during the very first run against a freshly-installed set of coordinator nodes*. Running it against an existing cluster will result in the complete failure of the cluster, the destruction of all data, or worse.
### Adding new nodes
Adding new nodes to an existing cluster can be done using the main `pvc.yml` playbook. The new node(s) should be added to the `group_vars` configuration `node_list`, then the playbook run against all hosts in the cluster with no special flags or limits. This will ensure the entire cluster is updated with the new information, while simultaneously configuring the new node.
### Reconfiguration and software updates
For general, day-to-day software updates such as base system updates or upgrading to newer PVC versions, a special playbook, `oneshot/update-pvc-cluster.yml`, is provided. This playbook will gracefully update and upgrade all PVC nodes in the cluster, flush them, reboot them, and then unflush them. This operation should be completely transparent to VMs on the cluster.
For more advanced updates, such as changing configurations in the `group_vars`, the main `pvc.yml` playbook can be used to deploy the changes across all hosts. Note that this may cause downtime due to node reboots if certain configurations change, and it is not recommended to use this process frequently.
# PVC Ansible configuration manual
This manual documents the various `group_vars` configuration options for the `pvc-ansible` framework. We assume that the administrator is generally familiar with Ansible and its operation.
## PVC Ansible configuration variables
The `group_vars` folder contains configuration variables for all clusters managed by your local copy of `pvc-ansible`. Each cluster has a distinct set of `group_vars` to allow different configurations for each cluster.
This section outlines the various configuration options available in the `group_vars` configuration; the `group_vars/default` directory contains an example set of variables, split into two files (`base.yml` and `pvc.yml`), that set every listed configuration option.
### Conventions
* Settings may be `required`, `optional`, or `ignored`. Ignored settings are used for human-readability in the configuration but are ignored by the actual role.
* A setting may be marked as `depends` on another setting. This indicates that, if the other setting is enabled, this setting is very likely `required` by it.
* If a particular `<setting>` is marked `optional`, and a later setting is marked `depends on <setting>`, the later setting is ignored unless `<setting>` is specified.
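The conventions above can be read as a small set of validation rules. The following is only a sketch of that reading, not code from `pvc-ansible`: the `RULES` table, the `check_settings` helper, and the choice of which settings depend on which are all hypothetical and exist purely for illustration.

```python
# Hypothetical model of the required/optional/ignored/depends conventions.
# The rule table below is illustrative and is not taken from the roles.
RULES = {
    "cluster_group": {"status": "required"},
    "username_ipmi_host": {"status": "optional"},
    "passwd_ipmi_host": {"status": "optional", "depends": "username_ipmi_host"},
    "passwd_root": {"status": "ignored"},
}


def check_settings(settings, rules=RULES):
    """Return a list of human-readable problems found in `settings`."""
    problems = []
    for name, rule in rules.items():
        if rule["status"] == "ignored":
            continue  # kept for human readers, never validated
        parent = rule.get("depends")
        if parent is not None and parent not in settings:
            continue  # a dependent setting is ignored without its parent
        if parent is not None and name not in settings:
            problems.append(f"{name} is needed because {parent} is set")
        elif rule["status"] == "required" and name not in settings:
            problems.append(f"{name} is required")
    return problems
```

Treating a missing dependent setting as a problem is the strictest interpretation of "very likely `required`"; a real check might only warn.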
### `base.yml`
Example configuration:
```
---
cluster_group: mycluster
timezone_location: Canada/Eastern
local_domain: upstream.local
recursive_dns_servers:
- 8.8.8.8
- 8.8.4.4
recursive_dns_search_domains:
- "{{ local_domain }}"
username_ipmi_host: "pvc"
passwd_ipmi_host: "MyPassword2019"
passwd_root: MySuperSecretPassword # Not actually used by the playbook, but good for reference
passwdhash_root: "$6$shadowencryptedpassword"
logrotate_keepcount: 7
logrotate_interval: daily
username_email_root: root
hosts:
- name: testhost
ip: 127.0.0.1
admin_users:
- name: "myuser"
uid: 500
keys:
- "ssh-ed25519 MyKey 2019-06"
networks:
"bondU":
device: "bondU"
type: "bond"
bond_mode: "802.3ad"
bond_devices:
- "enp1s0f0"
- "enp1s0f1"
mtu: 9000
"upstream":
device: "vlan1000"
type: "vlan"
raw_device: "bondU"
mtu: 1500
domain: "{{ local_domain }}"
subnet: "192.168.100.0/24"
floating_ip: "192.168.100.10/24"
gateway_ip: "192.168.100.1"
"cluster":
device: "vlan1001"
type: "vlan"
raw_device: "bondU"
mtu: 1500
domain: "pvc-cluster.local"
subnet: "10.0.0.0/24"
floating_ip: "10.0.0.254/24"
"storage":
device: "vlan1002"
type: "vlan"
raw_device: "bondU"
mtu: 9000
domain: "pvc-storage.local"
subnet: "10.0.1.0/24"
floating_ip: "10.0.1.254/24"
```
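As a sanity check on an example like the one above, the floating and gateway addresses of each network can be compared against that network's `subnet`. This sketch assumes those addresses are expected to fall inside the subnet, which the example values suggest but this manual does not state outright; the `addresses_outside_subnet` helper is hypothetical and uses only the standard-library `ipaddress` module.

```python
import ipaddress

# Values copied from the example `networks` block above.
networks = {
    "upstream": {
        "subnet": "192.168.100.0/24",
        "floating_ip": "192.168.100.10/24",
        "gateway_ip": "192.168.100.1",
    },
    "cluster": {"subnet": "10.0.0.0/24", "floating_ip": "10.0.0.254/24"},
    "storage": {"subnet": "10.0.1.0/24", "floating_ip": "10.0.1.254/24"},
}


def addresses_outside_subnet(networks):
    """Return (network, key) pairs whose address is not inside the subnet."""
    bad = []
    for name, net in networks.items():
        subnet = ipaddress.ip_network(net["subnet"])
        for key in ("floating_ip", "gateway_ip"):
            if key not in net:
                continue
            # ip_interface accepts both "a.b.c.d" and "a.b.c.d/len" forms
            address = ipaddress.ip_interface(net[key]).ip
            if address not in subnet:
                bad.append((name, key))
    return bad
```

Running `addresses_outside_subnet(networks)` on the example values returns an empty list.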
#### `cluster_group`
* *required*
The name of the Ansible PVC cluster group in the `hosts` inventory.
#### `timezone_location`
* *required*
The TZ database format name of the local timezone, e.g. `America/Toronto` or `Canada/Eastern`.
#### `local_domain`
* *required*
The domain name of the PVC cluster nodes. This is the domain portion of the FQDN of each node, and should usually be the domain of the `upstream` network.
#### `recursive_dns_servers`
* *optional*
A list of recursive DNS servers to be used by cluster nodes. Defaults to Google Public DNS if unspecified.
#### `recursive_dns_search_domains`
* *optional*
A list of domain names (must explicitly include `local_domain` if desired) to be used for shortname DNS lookups.
#### `username_ipmi_host`
* *optional*
* *requires* `passwd_ipmi_host`
The IPMI username used by PVC to communicate with the node management controllers. This user should be created on each node's IPMI before deploying the cluster, and should have, at minimum, permission to read and alter the node's power state.
#### `passwd_ipmi_host`
* *optional*
* *requires* `username_ipmi_host`
The IPMI password, in plain text, used by PVC to communicate with the node management controllers.
Generate using `pwgen -s 16`, adjusting length as required.
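Where `pwgen` is not available, a password of comparable strength can be produced with Python's standard `secrets` module. The following is a minimal sketch and not part of the PVC tooling; the function name `generate_password` is hypothetical.

```python
import secrets
import string


def generate_password(length: int = 16) -> str:
    """Return a random alphanumeric password, similar in spirit to `pwgen -s`."""
    alphabet = string.ascii_letters + string.digits
    # secrets.choice draws from the OS cryptographic random source
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

Calling `generate_password(16)` returns a 16-character string suitable for pasting into `passwd_ipmi_host`; pass a different length as required.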
#### `passwd_root`
* *ignored*
The plain-text root password corresponding to `passwdhash_root`. Used only for reference; the playbook does not read this value.
#### `passwdhash_root`
* *required*
The `/etc/shadow`-encoded root password for all nodes.
Generate using `pwgen -s 16`, adjusting length as required, and encrypt using `mkpasswd -m sha-512 <password> $( pwgen -s 8 )`.
#### `logrotate_keepcount`
* *required*
The number of `logrotate_interval` periods for which system logs are kept.
#### `logrotate_interval`
* *required*
The interval for rotating system logs. Must be one of: `hourly`, `daily`, `weekly`, `monthly`.
#### `username_email_root`
* *required*
The email address of the root user, at the `local_domain`. Usually `root`, but can be something like `admin` if needed.
#### `hosts`
* *optional*
A list of additional entries for the `/etc/hosts` files on the nodes. Each list element contains the following sub-elements:
##### `name`
The hostname of the entry.
##### `ip`
The IP address of the entry.
#### `admin_users`
* *required*
A list of non-root users, their UIDs, and SSH public keys, that are able to access the server. At least one non-root user should be specified to administer the nodes. These users will not have a password set; only key-based login is supported. Each list element contains the following sub-elements:
##### `name`
* *required*
The name of the user.
##### `uid`
* *required*
The Linux UID of the user. Should usually start at 500 and increment for each user.
##### `keys`
* *required*
A list of SSH public key strings, in `authorized_keys` line format, for the user.
#### `networks`
* *required*
A dictionary of networks to configure on the nodes.
The key will be used to "name" the interface file under `/etc/network/interfaces.d`, but otherwise the `device` is the real name of the device (e.g. `iface [device] inet ...`).
The three required networks are: `upstream`, `cluster`, `storage`. If `storage` is configured identically to `cluster`, the two networks will be collapsed into one; for details on this, please see the [documentation about the storage network](/cluster-architecture/#storage-connecting-ceph-daemons-with-each-other-and-with-osds).
Additional networks can also be specified here to automate their configuration. In the above example, a "bondU" interface is configured, which the remaining required networks use as their `raw_device`.
Within each `network` element, the following options may be specified:
##### `device`
* *required*
The real network device name.
##### `type`
* *required*
The type of network device. Must be one of: `nic`, `bond`, `vlan`.
##### `bond_mode`
* *required* if `type` is `bond`
The Linux bonding/`ifenslave` mode for the cluster. Must be a valid Linux bonding mode.
##### `bond_devices`
* *required* if `type` is `bond`
The list of physical (`nic`) interfaces to bond.
##### `raw_device`
* *required* if `type` is `vlan`
The underlying interface for the vLAN.
##### `mtu`
* *required*
The MTU of the interface. Ensure that the underlying network infrastructure can support the configured MTU.
##### `domain`
* *required*
The domain name for the network. For the "upstream" network, should usually be `local_domain`.
##### `subnet`
* *required*
The CIDR-formatted subnet of the network. Individual nodes will be configured with specific IPs in this network in a later setting.
##### `floating_ip`
* *required*
A CIDR-formatted IP address in the network to act as the cluster floating IP address. This IP address will follow the primary coordinator.
##### `gateway_ip`
* *optional*
A non-CIDR gateway IP address for the network.
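The relationship between `subnet`, `floating_ip` and `gateway_ip` can be checked before running the playbook. The sketch below uses Python's standard `ipaddress` module; the helper `check_network` is hypothetical, is not part of PVC or the Ansible roles, and only applies to entries that define a `subnet` (so not to a bare bond such as "bondU").

```python
import ipaddress


def check_network(net: dict) -> list:
    """Return a list of problems found in one `networks` entry (empty if none)."""
    problems = []
    subnet = ipaddress.ip_network(net["subnet"])
    floating = ipaddress.ip_interface(net["floating_ip"])

    # The floating IP is CIDR-formatted and must sit inside the subnet
    if floating.ip not in subnet:
        problems.append("floating_ip is outside subnet")
    if floating.network.prefixlen != subnet.prefixlen:
        problems.append("floating_ip prefix length differs from subnet")

    # The gateway is optional and given without a prefix length
    if "gateway_ip" in net:
        if ipaddress.ip_address(net["gateway_ip"]) not in subnet:
            problems.append("gateway_ip is outside subnet")
    return problems


upstream = {
    "subnet": "192.168.100.0/24",
    "floating_ip": "192.168.100.10/24",
    "gateway_ip": "192.168.100.1",
}
print(check_network(upstream))
```

With the `upstream` values from the example configuration, the returned list is empty.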
### `pvc.yml`
Example configuration:
```
---
pvc_log_to_file: False
pvc_log_to_stdout: True
pvc_log_colours: False
pvc_log_dates: False
pvc_log_keepalives: True
pvc_log_keepalive_cluster_details: True
pvc_log_keepalive_storage_details: True
pvc_log_console_lines: 1000
pvc_vm_shutdown_timeout: 180
pvc_keepalive_interval: 5
pvc_fence_intervals: 6
pvc_suicide_intervals: 0
pvc_fence_successful_action: migrate
pvc_fence_failed_action: None
pvc_osd_memory_limit: 4294967296
pvc_zookeeper_heap_limit: 256M
pvc_zookeeper_stack_limit: 512M
pvc_api_listen_address: "0.0.0.0"
pvc_api_listen_port: "7370"
pvc_api_secret_key: ""
pvc_api_enable_authentication: False
pvc_api_tokens:
  - description: "myuser"
    token: ""
pvc_api_enable_ssl: False
pvc_api_ssl_cert_path: /etc/ssl/pvc/cert.pem
pvc_api_ssl_cert: >
  -----BEGIN CERTIFICATE-----
  MIIxxx
  -----END CERTIFICATE-----
pvc_api_ssl_key_path: /etc/ssl/pvc/key.pem
pvc_api_ssl_key: >
  -----BEGIN PRIVATE KEY-----
  MIIxxx
  -----END PRIVATE KEY-----
pvc_ceph_storage_secret_uuid: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
pvc_dns_database_name: "pvcdns"
pvc_dns_database_user: "pvcdns"
pvc_dns_database_password: "xxxxxxxx"
pvc_api_database_name: "pvcapi"
pvc_api_database_user: "pcapi"
pvc_api_database_password: "xxxxxxxx"
pvc_replication_database_user: "replicator"
pvc_replication_database_password: "xxxxxxxx"
pvc_superuser_database_user: "postgres"
pvc_superuser_database_password: "xxxxxxxx"
pvc_asn: "65500"
pvc_routers:
  - "192.168.100.1"
pvc_nodes:
  - hostname: "pvchv1"
    is_coordinator: yes
    node_id: 1
    router_id: "192.168.100.11"
    upstream_ip: "192.168.100.11"
    upstream_cidr: 24
    cluster_ip: "10.0.0.1"
    cluster_cidr: 24
    storage_ip: "10.0.1.1"
    storage_cidr: 24
    ipmi_host: "pvchv1-lom.{{ local_domain }}"
    ipmi_user: "{{ username_ipmi_host }}"
    ipmi_password: "{{ passwd_ipmi_host }}"
  - hostname: "pvchv2"
    is_coordinator: yes
    node_id: 2
    router_id: "192.168.100.12"
    upstream_ip: "192.168.100.12"
    upstream_cidr: 24
    cluster_ip: "10.0.0.2"
    cluster_cidr: 24
    storage_ip: "10.0.1.2"
    storage_cidr: 24
    ipmi_host: "pvchv2-lom.{{ local_domain }}"
    ipmi_user: "{{ username_ipmi_host }}"
    ipmi_password: "{{ passwd_ipmi_host }}"
  - hostname: "pvchv3"
    is_coordinator: yes
    node_id: 3
    router_id: "192.168.100.13"
    upstream_ip: "192.168.100.13"
    upstream_cidr: 24
    cluster_ip: "10.0.0.3"
    cluster_cidr: 24
    storage_ip: "10.0.1.3"
    storage_cidr: 24
    ipmi_host: "pvchv3-lom.{{ local_domain }}"
    ipmi_user: "{{ username_ipmi_host }}"
    ipmi_password: "{{ passwd_ipmi_host }}"
pvc_bridge_device: bondU
pvc_bridge_mtu: 1500
pvc_sriov_enable: True
pvc_sriov_device:
  - phy: ens1f0
    mtu: 9000
    vfcount: 6
pvc_upstream_device: "{{ networks['upstream']['device'] }}"
pvc_upstream_mtu: "{{ networks['upstream']['mtu'] }}"
pvc_upstream_domain: "{{ networks['upstream']['domain'] }}"
pvc_upstream_subnet: "{{ networks['upstream']['subnet'] }}"
pvc_upstream_floatingip: "{{ networks['upstream']['floating_ip'] }}"
pvc_upstream_gatewayip: "{{ networks['upstream']['gateway_ip'] }}"
pvc_cluster_device: "{{ networks['cluster']['device'] }}"
pvc_cluster_mtu: "{{ networks['cluster']['mtu'] }}"
pvc_cluster_domain: "{{ networks['cluster']['domain'] }}"
pvc_cluster_subnet: "{{ networks['cluster']['subnet'] }}"
pvc_cluster_floatingip: "{{ networks['cluster']['floating_ip'] }}"
pvc_storage_device: "{{ networks['storage']['device'] }}"
pvc_storage_mtu: "{{ networks['storage']['mtu'] }}"
pvc_storage_domain: "{{ networks['storage']['domain'] }}"
pvc_storage_subnet: "{{ networks['storage']['subnet'] }}"
pvc_storage_floatingip: "{{ networks['storage']['floating_ip'] }}"
```
#### `pvc_log_to_file`
* *optional*
Whether to log PVC output to the file `/var/log/pvc/pvc.log`. Must be one of, unquoted: `True`, `False`.
If unset, a default value of "False" is set in the role defaults.
#### `pvc_log_to_stdout`
* *optional*
Whether to log PVC output to stdout, i.e. `journald`. Must be one of, unquoted: `True`, `False`.
If unset, a default value of "True" is set in the role defaults.
#### `pvc_log_colours`
* *optional*
Whether to include ANSI coloured prompts (`>>>`) for status in the log output. Must be one of, unquoted: `True`, `False`.
Requires `journalctl -o cat` or file logging in order to be visible and useful.
If set to False, the prompts will instead be text values.
If unset, a default value of "True" is set in the role defaults.
#### `pvc_log_dates`
* *optional*
Whether to include dates in the log output. Must be one of, unquoted: `True`, `False`.
Requires `journalctl -o cat` or file logging in order to be visible and useful (and not clutter the logs with duplicate dates).
If unset, a default value of "False" is set in the role defaults.
#### `pvc_log_keepalives`
* *optional*
Whether to log the regular keepalive messages. Must be one of, unquoted: `True`, `False`.
If unset, a default value of "True" is set in the role defaults.
#### `pvc_log_keepalive_cluster_details`
* *optional*
* *ignored* if `pvc_log_keepalives` is `False`
Whether to log cluster and node details during keepalive messages. Must be one of, unquoted: `True`, `False`.
If unset, a default value of "True" is set in the role defaults.
#### `pvc_log_keepalive_storage_details`
* *optional*
* *ignored* if `pvc_log_keepalives` is `False`
Whether to log storage cluster details during keepalive messages. Must be one of, unquoted: `True`, `False`.
If unset, a default value of "True" is set in the role defaults.
#### `pvc_log_console_lines`
* *optional*
The number of output console lines to log for each VM, to be used by the console log endpoints (`pvc vm log`).
If unset, a default value of "1000" is set in the role defaults.
#### `pvc_vm_shutdown_timeout`
* *optional*
The number of seconds to wait for a VM to `shutdown` before it is forced off.
A value of "0" disables this functionality.
If unset, a default value of "180" is set in the role defaults.
#### `pvc_keepalive_interval`
* *optional*
The number of seconds between node keepalives.
If unset, a default value of "5" is set in the role defaults.
**WARNING**: Changing this value is not recommended except in exceptional circumstances.
#### `pvc_fence_intervals`
* *optional*
The number of keepalive intervals to be missed before other nodes consider a node `dead` and trigger the fencing process. The total time elapsed will be `pvc_keepalive_interval * pvc_fence_intervals`.
If unset, a default value of "6" is set in the role defaults.
**NOTE**: This is not the total time until a node is fenced. A node has a further 6 (hardcoded) `pvc_keepalive_interval`s ("saving throw" attempts) to try to send a keepalive before it is actually fenced. Thus, with the default values, this works out to a total of 60 +/- 5 seconds between a node crashing and it being fenced. An administrator of a very important cluster may want to set this lower, perhaps to 2, or even 1, leaving only the "saving throws", though this is not recommended for most clusters, due to timing overhead from various other subsystems.
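The arithmetic in the note can be written out as a short calculation. This is only an illustration of the figures stated above; the constant and function names are hypothetical and do not come from the PVC code base.

```python
# Number of "saving throw" intervals, hardcoded in PVC according to the note above
SAVING_THROWS = 6


def fence_delay_seconds(keepalive_interval: int = 5, fence_intervals: int = 6) -> int:
    """Nominal seconds between a node crashing and it being fenced."""
    # Time until the node is declared dead, plus the saving-throw window
    return keepalive_interval * (fence_intervals + SAVING_THROWS)


print(fence_delay_seconds())                   # defaults
print(fence_delay_seconds(fence_intervals=1))  # aggressive setting
```

With the defaults this gives the nominal 60 seconds; with `pvc_fence_intervals` set to 1 it gives 35 seconds, of which 30 are the saving throws. The +/- 5 second variation mentioned in the note is not modelled here.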
#### `pvc_suicide_intervals`
* *optional*
The number of keepalive intervals without the ability to send a keepalive before a node considers *itself* to be dead and reboots itself.
A value of "0" disables this functionality.
If unset, a default value of "0" is set in the role defaults.
**WARNING**: This option is provided to allow additional flexibility in fencing behaviour. Normally, it is not safe to set a `pvc_fence_failed_action` of `migrate`, since if the other nodes cannot fence a node, its VMs cannot be safely started on other nodes. This would also apply to nodes without IPMI-over-LAN which could not be fenced normally. This option provides an alternative way to guarantee this safety, at least in situations where the node can still reliably shut itself down (i.e. it is not hard-locked). The administrator should however take special care and thoroughly test their system before using these alternative fencing options in production, as the results could be disastrous.
#### `pvc_fence_successful_action`
* *optional*
The action the cluster should take upon a successful node fence with respect to running VMs. Must be one of, unquoted: `migrate`, `None`.
If unset, a default value of "migrate" is set in the role defaults.
An administrator can set the value "None" to disable automatic VM recovery migrations after a node fence.
#### `pvc_fence_failed_action`
* *optional*
The action the cluster should take upon a failed node fence with respect to running VMs. Must be one of, unquoted: `migrate`, `None`.
If unset, a default value of "None" is set in the role defaults.
**WARNING**: See the warning in the above `pvc_suicide_intervals` section for details on the purpose of this option. Do not set this option to "migrate" unless you have also set `pvc_suicide_intervals` to a non-"0" value and understand the caveats and risks.
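The constraint in this warning can be expressed as a simple pre-flight check on the two values. The sketch below is illustrative only; `fence_options_are_safe` is a hypothetical helper, not something the PVC roles provide.

```python
def fence_options_are_safe(suicide_intervals: int = 0, failed_action: str = "None") -> bool:
    """Return True if the combination respects the warning above."""
    if failed_action not in ("migrate", "None"):
        raise ValueError("pvc_fence_failed_action must be 'migrate' or 'None'")
    # 'migrate' after a failed fence is only defensible when nodes also self-fence
    return failed_action != "migrate" or suicide_intervals > 0


print(fence_options_are_safe())                                              # role defaults
print(fence_options_are_safe(suicide_intervals=0, failed_action="migrate"))  # unsafe combination
```

A `True` result only means the two settings are consistent with each other; it does not replace the testing the warning calls for.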
#### `pvc_fence_migrate_target_selector`
* *optional*
The migration selector to use when running a `migrate` command after a node fence. Must be one of, unquoted: `mem`, `load`, `vcpu`, `vms`.
If unset, a default value of "mem" is set in the role defaults.
**NOTE**: These values map to the standard VM meta `selector` options, and determine how nodes select where to run the migrated VMs.
#### `pvc_osd_memory_limit`
* *optional*
The memory limit, in bytes, to pass to the Ceph OSD processes. Only set once, during cluster bootstrap; subsequent changes to this value must be manually made in the `files/*/ceph.conf` static configuration for the cluster in question.
If unset, a default value of "4294967296" (i.e. 4GB) is set in the role defaults.
As per Ceph documentation, the minimum value possible is "939524096" (i.e. ~1GB), and the default matches the Ceph system default. Setting a lower value is only recommended for systems with relatively low memory availability, where the default of 4GB per OSD is too large; it is recommended to increase the total system memory first before tweaking this setting to ensure optimal storage performance across all workloads.
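Because the value is given in raw bytes, it can help to derive it rather than type it by hand. The following sketch reproduces the two figures quoted above; the names are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

DEFAULT_OSD_MEMORY = 4 * GIB    # 4294967296, the default quoted above
MINIMUM_OSD_MEMORY = 896 * MIB  # 939524096, the minimum quoted above


def gib_to_bytes(gib: float) -> int:
    """Convert a size in GiB to bytes, refusing values below the stated minimum."""
    value = int(gib * GIB)
    if value < MINIMUM_OSD_MEMORY:
        raise ValueError("below the minimum of 939524096 bytes")
    return value


print(gib_to_bytes(2))  # a value for pvc_osd_memory_limit on a smaller node
```

Note that the "~1GB" minimum is exactly 896 MiB.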
#### `pvc_zookeeper_heap_limit`
* *optional*
The memory limit to pass to the Zookeeper Java process for its heap.
If unset, a default value of "256M" is set in the role defaults.
The administrator may set this to a lower value on memory-constrained systems or if the memory usage of the Zookeeper process becomes excessive.
#### `pvc_zookeeper_stack_limit`
* *optional*
The memory limit to pass to the Zookeeper Java process for its stack.
If unset, a default value of "512M" is set in the role defaults.
The administrator may set this to a lower value on memory-constrained systems or if the memory usage of the Zookeeper process becomes excessive.
#### `pvc_api_listen_address`
* *required*
Address for the API to listen on; `0.0.0.0` indicates all interfaces.
#### `pvc_api_listen_port`
* *required*
Port for the API to listen on.
#### `pvc_api_enable_authentication`
* *required*
Whether to enable authentication on the API. Must be one of, unquoted: `True`, `False`.
#### `pvc_api_secret_key`
* *required*
A secret key used to sign and encrypt API Flask cookies.
Generate using `uuidgen` or `pwgen -s 32`, adjusting length as required.
#### `pvc_api_tokens`
* *required*
A list of API tokens that are allowed to access the PVC API. At least one should be specified. Each list element contains the following sub-elements:
##### `description`
* *required*
A human-readable description of the token. Not parsed anywhere, but used to make this list human-readable and identify individual tokens by their use.
##### `token`
* *required*
The API token.
Generate using `uuidgen` or `pwgen -s 32`, adjusting length as required.
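As an alternative to `uuidgen`, a token entry can be generated with Python's standard `uuid` module. This is a sketch under the assumption that any random UUID string is acceptable as a token, as the `uuidgen` suggestion implies; `new_api_token` is a hypothetical helper.

```python
import uuid


def new_api_token(description: str) -> dict:
    """Build one `pvc_api_tokens` list element with a random (version 4) UUID token."""
    return {"description": description, "token": str(uuid.uuid4())}


entry = new_api_token("myuser")
print(entry["description"], entry["token"])
```

The resulting dictionary has the same two keys as the list elements described above and can be copied into `pvc.yml`.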
#### `pvc_api_enable_ssl`
* *required*
Whether to enable SSL for the PVC API. Must be one of, unquoted: `True`, `False`.
#### `pvc_api_ssl_cert_path`
* *optional*
* *required* if `pvc_api_enable_ssl` is `True` and `pvc_api_ssl_cert` is not set.
The path to an (existing) SSL certificate on the node system for the PVC API to use.
#### `pvc_api_ssl_cert`
* *optional*
* *required* if `pvc_api_enable_ssl` is `True` and `pvc_api_ssl_cert_path` is not set.
The SSL certificate, in text form, for the PVC API to use. Will be installed to `/etc/pvc/api-cert.pem` on the node system.
#### `pvc_api_ssl_key_path`
* *optional*
* *required* if `pvc_api_enable_ssl` is `True` and `pvc_api_ssl_key` is not set.
The path to an (existing) SSL private key on the node system for the PVC API to use.
#### `pvc_api_ssl_key`
* *optional*
* *required* if `pvc_api_enable_ssl` is `True` and `pvc_api_ssl_key_path` is not set.
The SSL private key, in text form, for the PVC API to use. Will be installed to `/etc/pvc/api-key.pem` on the node system.
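The conditional requirements of the four SSL options can be summarised as a small check. This sketch assumes the variables have been loaded into a plain dictionary; `check_ssl_options` is hypothetical and not part of the PVC roles.

```python
def check_ssl_options(cfg: dict) -> list:
    """Return a list of missing SSL settings (empty if the combination is valid)."""
    problems = []
    if not cfg.get("pvc_api_enable_ssl", False):
        return problems  # nothing is required when SSL is disabled

    # Either a path to an existing file or the inline text must be provided
    if not (cfg.get("pvc_api_ssl_cert_path") or cfg.get("pvc_api_ssl_cert")):
        problems.append("set pvc_api_ssl_cert_path or pvc_api_ssl_cert")
    if not (cfg.get("pvc_api_ssl_key_path") or cfg.get("pvc_api_ssl_key")):
        problems.append("set pvc_api_ssl_key_path or pvc_api_ssl_key")
    return problems


print(check_ssl_options({"pvc_api_enable_ssl": True,
                         "pvc_api_ssl_cert_path": "/etc/ssl/pvc/cert.pem"}))
```

In the usage line above the certificate is supplied by path but no key is given, so one problem is reported.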
#### `pvc_ceph_storage_secret_uuid`
* *required*
The UUID for Libvirt to communicate with the Ceph storage cluster. This UUID will be used in all VM configurations for the block device.
Generate using `uuidgen`.
#### `pvc_dns_database_name`
* *required*
The name of the PVC DNS aggregator database.
#### `pvc_dns_database_user`
* *required*
The username of the PVC DNS aggregator database user.
#### `pvc_dns_database_password`
* *required*
The password of the PVC DNS aggregator database user.
Generate using `pwgen -s 16`, adjusting length as required.
#### `pvc_api_database_name`
* *required*
The name of the PVC API database.
#### `pvc_api_database_user`
* *required*
The username of the PVC API database user.
#### `pvc_api_database_password`
* *required*
The password of the PVC API database user.
Generate using `pwgen -s 16`, adjusting length as required.
#### `pvc_replication_database_user`
* *required*
The username of the PVC DNS aggregator database replication user.
#### `pvc_replication_database_password`
* *required*
The password of the PVC DNS aggregator database replication user.
Generate using `pwgen -s 16`, adjusting length as required.
#### `pvc_superuser_database_user`
* *required*
The username of the PVC DNS aggregator database superuser.
#### `pvc_superuser_database_password`
* *required*
The password of the PVC DNS aggregator database superuser.
Generate using `pwgen -s 16`, adjusting length as required.
#### `pvc_asn`
* *optional*
The private autonomous system number used for BGP updates to upstream routers.
A default value of "65001" is set in the role defaults if left unset.
#### `pvc_routers`
A list of upstream routers to communicate BGP routes to.
#### `pvc_nodes`
* *required*
A list of all nodes in the PVC cluster and their node-specific configurations. Each node must be present in this list. Each list element contains the following sub-elements:
##### `hostname`
* *required*
The (short) hostname of the node.
##### `is_coordinator`
* *required*
Whether the node is a coordinator. Must be one of, unquoted: `yes`, `no`.
##### `node_id`
* *required*
The ID number of the node. Should normally match the number suffix of the `hostname`.
##### `router_id`
* *required*
The BGP router-id value for upstream route exchange. Should normally match the `upstream_ip`.
##### `upstream_ip`
* *required*
The non-CIDR IP address of the node in the `upstream` network.
##### `upstream_cidr`
* *required*
The CIDR bit mask of the node `upstream_ip` address. Must match the `upstream` network.
##### `cluster_ip`
* *required*
The non-CIDR IP address of the node in the `cluster` network.
##### `cluster_cidr`
* *required*
The CIDR bit mask of the node `cluster_ip` address. Must match the `cluster` network.
##### `storage_ip`
* *required*
The non-CIDR IP address of the node in the `storage` network.
##### `storage_cidr`
* *required*
The CIDR bit mask of the node `storage_ip` address. Must match the `storage` network.
##### `ipmi_host`
* *required*
The IPMI hostname or non-CIDR IP address of the node management controller. Must be reachable by all nodes.
##### `ipmi_user`
* *required*
The IPMI username for the node management controller. Unless a per-host override is required, should usually use the previously-configured global `username_ipmi_host`. All notes from that entry apply.
##### `ipmi_password`
* *required*
The IPMI password for the node management controller. Unless a per-host override is required, should usually use the previously-configured global `passwd_ipmi_host`. All notes from that entry apply.
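The requirement that each node's `*_ip` and `*_cidr` pair match the corresponding network can be verified with Python's standard `ipaddress` module. The helper below is a hypothetical sketch, not part of PVC.

```python
import ipaddress


def node_ip_matches(ip: str, cidr: int, subnet: str) -> bool:
    """Return True if ip/cidr lies in, and has the same prefix as, the given subnet."""
    iface = ipaddress.ip_interface(f"{ip}/{cidr}")
    # iface.network is the network implied by the address and prefix length
    return iface.network == ipaddress.ip_network(subnet)


# pvchv1 from the example configuration against the example networks
print(node_ip_matches("192.168.100.11", 24, "192.168.100.0/24"))
print(node_ip_matches("10.0.0.1", 24, "10.0.1.0/24"))
```

The first call checks the `upstream_ip` of `pvchv1` against the `upstream` subnet and succeeds; the second deliberately compares a `cluster` address with the `storage` subnet and fails.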
#### `pvc_bridge_device`
* *required*
The device name of the underlying network interface to be used for "bridged"-type client networks. For each "bridged"-type network, an IEEE 802.1Q vLAN and bridge will be created on top of this device to pass these networks. In most cases, using the reflexive `networks['cluster']['raw_device']` or `networks['upstream']['raw_device']` from the Base role is sufficient.
#### `pvc_bridge_mtu`
* *required*
The MTU of the underlying network interface to be used for "bridged"-type client networks. This is the maximum MTU such networks can use.
#### `pvc_sriov_enable`
* *optional*
Whether to enable or disable SR-IOV functionality.
#### `pvc_sriov_device`
* *optional*
A list of SR-IOV devices. See the Daemon manual for details.
#### `pvc_<network>_*`
The next set of entries is hard-coded to use the values from the global `networks` list. It should not need to be changed under most circumstances. Refer to the previous sections for specific notes about each entry.


@@ -4,7 +4,7 @@ The PVC Node Daemon is the heart of the PVC system and runs on each node to mana
The node daemon is build using Python 3.X and is packaged in the Debian package `pvc-daemon`. The node daemon is build using Python 3.X and is packaged in the Debian package `pvc-daemon`.
Configuration of the daemon is documented in [the manual](/manuals/daemon), however it is recommended to use the [Ansible configuration system](https://github.com/parallelvirtualcluster/pvc-ansible) to configure the PVC cluster for you from scratch. Configuration of the daemon is documented in [the manual](/manuals/daemon), however it is recommended to use the [Ansible configuration interface](/manuals/ansible) to configure the PVC system for you from scratch.
## Overall architecture ## Overall architecture
@@ -60,7 +60,7 @@ The PVC node daemon ins build with Python 3 and is run directly on nodes. For de
The Daemon is configured using a YAML configuration file which is passed in to the API process by the environment variable `PVCD_CONFIG_FILE`. When running with the default package and SystemD unit, this file is located at `/etc/pvc/pvcnoded.yaml`. The Daemon is configured using a YAML configuration file which is passed in to the API process by the environment variable `PVCD_CONFIG_FILE`. When running with the default package and SystemD unit, this file is located at `/etc/pvc/pvcnoded.yaml`.
For most deployments, the management of the configuration file is handled entirely by the [PVC Ansible framework](https://github.com/parallelvirtualcluster/pvc-ansible) and should not be modified directly. Many options from the Ansible framework map directly into the configuration options in this file. For most deployments, the management of the configuration file is handled entirely by the [PVC Ansible framework](/manuals/ansible) and should not be modified directly. Many options from the Ansible framework map directly into the configuration options in this file.
### Conventions ### Conventions

69
docs/manuals/testing.md Normal file

@@ -0,0 +1,69 @@
# Testing procedures
This manual documents the standard procedures used to test PVC before release. This is a living document and will change frequently as new features are added and new corner cases are found.
As PVC does not currently feature any sort of automated tests, this is the primary way of ensuring functionality is as expected and the various components are operating correctly.
## Basic Tests
### Hypervisors
0. Stop then start all PVC node daemons sequentially, ensure they start up successfully.
0. Observe primary coordinator migration between nodes during startup sequence.
0. Verify reachability of floating IPs on each node across primary coordinator migrations.
0. Manually shuffle primary coordinator between nodes and verify as above (`pvc node primary`).
0. Automatically shuffle primary coordinator between nodes and verify as above (`pvc node secondary`).
### Virtual Machines
0. Deploy a new virtual machine with `vminstall`, using managed networking and storage.
0. Start the VM on the first node, verify reachability over managed network (`pvc vm start`).
0. Verify console logs are operating (`pvc vm log -f`).
0. Migrate VM to another node via auto-selection and back again (`pvc vm migrate` and `pvc vm unmigrate`).
0. Manually shuffle VM between nodes and verify reachability on each node (`pvc vm move`).
0. Kill the VM and ensure restart occurs (`virsh destroy`).
0. Restart the VM (`pvc vm restart`).
0. Shutdown the VM (`pvc vm shutdown`).
0. Forcibly stop the VM (`pvc vm stop`).
### Virtual Networking
0. Create a new managed virtual network (`pvc network add`).
0. Verify network is present on all nodes.
0. Verify network gateway is reachable across all nodes (`pvc node primary`).
## Advanced Tests
### Fencing
0. Trigger node kernel panic and observe fencing behaviour (`echo c | sudo tee /proc/sysrq-trigger`).
0. Verify node is fenced successfully.
0. Verify primary coordinator status transfers successfully.
0. Verify VMs are migrated away from node successfully.
### Ceph Storage
0. Create an RBD volume.
0. Create an RBD snapshot.
0. Remove an RBD snapshot.
0. Remove an RBD volume.

2
format

@@ -11,7 +11,7 @@ fi
pushd $( git rev-parse --show-toplevel ) &>/dev/null pushd $( git rev-parse --show-toplevel ) &>/dev/null
echo ">>> Formatting..." echo "Formatting..."
black --safe ${check} --exclude api-daemon/migrations . black --safe ${check} --exclude api-daemon/migrations .
ret=$? ret=$?
if [[ $ret -eq 0 ]]; then if [[ $ret -eq 0 ]]; then

2
lint

@@ -7,7 +7,7 @@ fi
pushd $( git rev-parse --show-toplevel ) &>/dev/null pushd $( git rev-parse --show-toplevel ) &>/dev/null
echo ">>> Linting..." echo "Linting..."
flake8 flake8
ret=$? ret=$?
if [[ $ret -eq 0 ]]; then if [[ $ret -eq 0 ]]; then


@@ -48,7 +48,7 @@ import re
import json import json
# Daemon version # Daemon version
version = "0.9.43" version = "0.9.42"
########################################################## ##########################################################