Commit Graph

3234 Commits

Author SHA1 Message Date
Joshua Boniface c61d7bc313 Add linting to build-and-deploy
Ensures that bad code isn't deployed during testing.
2021-10-07 11:51:05 -04:00
Joshua Boniface c0f7ba0125 Add limit negation to VM list
When using the "state", "node", or "tag" arguments to a VM list, add
support for a "negate" flag to look for all VMs *not in* the state,
node, or tag state.
2021-10-07 11:50:52 -04:00
Joshua Boniface 761032b321 Add note about fencing at remote sites 2021-10-04 19:58:08 -04:00
Joshua Boniface 3566e13e79 Correct TOC in architecture page 2021-10-04 01:54:22 -04:00
Joshua Boniface 6b324029cf Correct spelling errors 2021-10-04 01:51:58 -04:00
Joshua Boniface 13eeabf44b Double image sizes for example clusters 2021-10-04 01:47:35 -04:00
Joshua Boniface d86768d3d0 Adjust toc_depth for RTD theme 2021-10-04 01:45:05 -04:00
Joshua Boniface a167757600 Revamp about and architecture docs
Makes these a little simpler to follow and provides some more up-to-date
information based on recent tests and developments.
2021-10-04 01:42:08 -04:00
Joshua Boniface a95d9680ac Adjust bump-version changelog heading level 2021-10-04 01:41:48 -04:00
Joshua Boniface 63962f10ba Move changelog headers down one more level 2021-10-04 01:41:22 -04:00
Joshua Boniface a7a681d92a Adjust indent of index/README versions 2021-10-04 00:33:24 -04:00
Joshua Boniface da9248cfa2 Bump version to 0.9.38 2021-10-03 22:32:41 -04:00
Joshua Boniface aa035a61a7 Correct latency units and format name 2021-10-03 17:06:34 -04:00
Joshua Boniface 7c8ba56561 Revamp test result display
Instead of showing CLAT percentiles, which are very hard to interpret
and understand, instead use the main latency buckets.
2021-10-03 15:49:01 -04:00
Joshua Boniface bba73980de Revamp postinst for the API daemon
Ensures that the worker is always restarted and make the NOTE
conditional more specific.
2021-10-03 15:15:26 -04:00
Joshua Boniface 32b3af697c Tweak fio tests for benchmarks
1. Remove ramp_time as this was giving very strange results.

2. Up the runtime to 75 seconds to compensate.

3. Print the fio command to the console to validate.
2021-10-03 15:06:18 -04:00
Joshua Boniface 7c122ac921 Add benchmark format to list 2021-10-03 15:05:58 -04:00
Joshua Boniface 0dbf139706 Adjust ETA for benchmarks 2021-10-02 04:51:01 -04:00
Joshua Boniface c909beaf6d Add format parsing for format 1 storage benchmarks 2021-10-02 04:46:44 -04:00
Joshua Boniface 2da49297d2 Add version 2 benchmark list formatting 2021-10-02 02:47:17 -04:00
Joshua Boniface 0ff9a6b8c4 Handle benchmark running state properly 2021-10-02 01:54:51 -04:00
Joshua Boniface 28377178d2 Fix missing argument in database insert 2021-10-02 01:49:47 -04:00
Joshua Boniface e06b114c48 Update to storage benchmark format 1
1. Runs `fio` with the `--format=json` option and removes all terse
format parsing from the results.

2. Adds a 15-second ramp time to minimize wonky ramp-up results.

3. Sets group_reporting, which isn't necessary with only a single job,
but is here for consistency.
2021-10-02 01:41:08 -04:00
Joshua Boniface 0058f19d88 Fix handling of array of information
With a benchmark info we only ever want test one, so pass only that to
the formatter. Simplifies the format function.
2021-10-02 01:28:39 -04:00
Joshua Boniface 056cf3740d Avoid versioning benchmark lists
This wouldn't work since each individual test is versioned. Instead add
a placeholder for later once additional format(s) are defined.
2021-10-02 01:25:18 -04:00
Joshua Boniface 58f174b87b Add format option to benchmark info
Allows specifying of raw json or json-pretty formats in addition to the
"pretty" formatted option.
2021-10-02 01:13:50 -04:00
Joshua Boniface 37b98fd54f Add benchmark format function support
Allows choosing different list and info functions based on the benchmark
version found. Currently only implements "legacy" version 0 with more to
be added.
2021-10-02 01:07:25 -04:00
Joshua Boniface f83a345bfe Add test format versioning to storage benchmarks
Adds a test_format database column and a value in the API return for the
test format version, starting at 0 for the existing format as of 0.9.37.

References #143
2021-10-02 00:55:27 -04:00
Joshua Boniface ce06e4d81b Load benchmark results as JSON
Load the JSON at the API side instead of client side, because that's
what the API doc says it is and it just makes more sense.
2021-09-30 23:40:24 -04:00
Joshua Boniface 23977b04fc Bump version to 0.9.37 2021-09-30 02:08:14 -04:00
Joshua Boniface bb1cca522f Revamp benchmark tests
1. Move to a time-based (60s) benchmark to avoid these taking an absurd
amount of time to show the same information.

2. Eliminate the 256k random benchmarks, since they don't really add
anything.

3. Add in a 4k single-queue benchmark as this might provide valuable
insight into latency.

4. Adjust the output to reflect the above changes.

While this does change the benchmarking, this should not invalidate any
existing benchmarks since most of the test suit is unchanged (especially
the most important 4M sequential and 4K random tests). It simply removes
an unused entry and adds a more helpful one. The time-based change
should not significantly affect the results either, just reduces the
total runtime for long-tests and increase the runtime for quick tests to
provide a better picture.
2021-09-29 20:51:30 -04:00
Joshua Boniface 9a4dce4e4c Add primary node to benchmark job name
Ensures tracking of the current primary node the job was run on, since
this may be relevant for performance reasons.
2021-09-28 09:58:22 -04:00
Joshua Boniface f6f6f07488 Add timeouts to queue gets and adjust
Ensure that all keepalive timeouts are set (prevent the queue.get()
actions from blocking forever) and set the thread timeouts to line up as
well. Everything here is thus limited to keepalive_interval seconds
(default 5s) to keep it uniform.
2021-09-27 16:10:27 -04:00
Joshua Boniface 142c999ce8 Re-add success log output during migration 2021-09-27 11:50:55 -04:00
Joshua Boniface 1de069298c Fix missing character in log message 2021-09-27 00:49:43 -04:00
Joshua Boniface 55221b3d97 Simplify VM migration down to 3 steps
Remove two superfluous synchronization steps which are not needed here,
since the exclusive lock handles that situation anyways.

Still does not fix the weird flush->unflush lock timeout bug, but is
better worked-around now due to the cancelling of the other wait freeing
this up and continuing.
2021-09-27 00:03:20 -04:00
Joshua Boniface 0d72798814 Work around synchronization lock issues
Make the block on stage C only wait for 900 seconds (15 minutes) to
prevent indefinite blocking.

The issue comes if a VM is being received, and the current unflush is
cancelled for a flush. When this happens, this lock acquisition seems to
block for no obvious reason, and no other changes seem to affect it.
This is certainly some sort of locking bug within Kazoo but I can't
diagnose it as-is. Leave a TODO to look into this again in the future.
2021-09-26 23:26:21 -04:00
Joshua Boniface 3638efc77e Improve log messages during VM migration 2021-09-26 23:15:38 -04:00
Joshua Boniface c2c888d684 Use event to non-block wait and fix inf wait 2021-09-26 22:55:39 -04:00
Joshua Boniface febef2e406 Track status of VM state thread 2021-09-26 22:55:21 -04:00
Joshua Boniface 2a4f38e933 Simplify locking process for VM migration
Rather than using a cumbersome and overly complex ping-pong of read and
write locks, instead move to a much simpler process using exclusive
locks.

Describing the process in ASCII or narrative is cumbersome, but the
process ping-pongs via a set of exclusive locks and wait timers, so that
the two sides are able to synchronize via blocking the exclusive lock.
The end result is a much more streamlined migration (takes about half
the time all things considered) which should be less error-prone.
2021-09-26 22:08:07 -04:00
Joshua Boniface 3b805cdc34 Fix failure to connect to libvirt in keepalive
This should be caught and abort the thread rather than failing and
holding up keepalives.
2021-09-26 20:42:01 -04:00
Joshua Boniface 06f0f7ed91 Fix several bugs in fence handling
1. Output from ipmitool was not being stripped, and stray newlines were
throwing off the comparisons. Fixes this.

2. Several stages were lacking meaningful messages. Adds these in so the
output is more clear about what is going on.

3. Reduce the sleep time after a fence to just 1x the
keepalive_interval, rather than 2x, because this seemed like excessively
long even for slow IPMI interfaces, especially since we're checking the
power state now anyways.

4. Set the node daemon state to an explicit 'fenced' state after a
successful fence to indicate to users that the node was indeed fenced
successfully and not still 'dead'.
2021-09-26 20:07:30 -04:00
Joshua Boniface fd040ab45a Ensure pvc-flush is after network-online 2021-09-26 17:40:42 -04:00
Joshua Boniface e23e2dd9bf Fix typo in log message 2021-09-26 03:35:30 -04:00
Joshua Boniface ee4266f8ca Tweak CLI helptext around OSD actions
Adds some more detail about OSD commands and their values.
2021-09-26 01:29:23 -04:00
Joshua Boniface 0f02c5eaef Fix typo in sgdisk command options 2021-09-26 00:59:05 -04:00
Joshua Boniface 075abec5fe Use re.search instead of re.match
Required since we're not matching the start of the string.
2021-09-26 00:55:29 -04:00
Joshua Boniface 3a1cbf8d01 Raise basic exceptions in CephInstance
Avoids no exception to reraise errors on failures.
2021-09-26 00:50:10 -04:00
Joshua Boniface a438a4155a Fix OSD creation for partition paths and fix gdisk
The previous implementation did not work with /dev/nvme devices or any
/dev/disk/by-* devices due to some logical failures in the partition
naming scheme, so fix these, and be explicit about what is supported in
the PVC CLI command output.

The 'echo | gdisk' implementation of partition creation also did not
work due to limitations of subprocess.run; instead, use sgdisk which
allows these commands to be written out explicitly and is included in
the same package as gdisk.
2021-09-26 00:12:28 -04:00