Commit Graph

2440 Commits

Author SHA1 Message Date
Joshua Boniface e403146bcf Correct bad stop_keepalive_timer call 2021-10-07 14:41:12 -04:00
Joshua Boniface bde684dd3a Remove redundant wording from header 2021-10-07 12:20:04 -04:00
Joshua Boniface 992e003500 Replace headers with links in CHANGELOG.md 2021-10-07 12:17:44 -04:00
Joshua Boniface eaeb860a83 Add missing period to changelog sentence 2021-10-07 12:10:35 -04:00
Joshua Boniface 1198ca9f5c Move changelog into dedicated file
The changelog was getting far too long for the README/docs index to
support, so move it into CHANGELOG.md and link to it instead.
2021-10-07 12:09:26 -04:00
Joshua Boniface e79d200244 Bump version to 0.9.39 2021-10-07 11:52:38 -04:00
Joshua Boniface 5b3bb9f306 Add linting to build-and-deploy
Ensures that bad code isn't deployed during testing.
2021-10-07 11:51:05 -04:00
Joshua Boniface 5501586a47 Add limit negation to VM list
When using the "state", "node", or "tag" arguments to a VM list, add
support for a "negate" flag to look for all VMs *not in* the given
state, node, or tag.
2021-10-07 11:50:52 -04:00
Joshua Boniface c160648c5c Add note about fencing at remote sites 2021-10-04 19:58:08 -04:00
Joshua Boniface fa37227127 Correct TOC in architecture page 2021-10-04 01:54:22 -04:00
Joshua Boniface 2cac98963c Correct spelling errors 2021-10-04 01:51:58 -04:00
Joshua Boniface 8e50428707 Double image sizes for example clusters 2021-10-04 01:47:35 -04:00
Joshua Boniface a4953bc6ef Adjust toc_depth for RTD theme 2021-10-04 01:45:05 -04:00
Joshua Boniface 3c10d57148 Revamp about and architecture docs
Makes these a little simpler to follow and provides some more up-to-date
information based on recent tests and developments.
2021-10-04 01:42:08 -04:00
Joshua Boniface 26d8551388 Adjust bump-version changelog heading level 2021-10-04 01:41:48 -04:00
Joshua Boniface 57342541dd Move changelog headers down one more level 2021-10-04 01:41:22 -04:00
Joshua Boniface 50f8afd749 Adjust indent of index/README versions 2021-10-04 00:33:24 -04:00
Joshua Boniface 3449069e3d Bump version to 0.9.38 2021-10-03 22:32:41 -04:00
Joshua Boniface cb66b16045 Correct latency units and format name 2021-10-03 17:06:34 -04:00
Joshua Boniface 8edce74b85 Revamp test result display
Instead of showing CLAT percentiles, which are very hard to interpret
and understand, use the main latency buckets.
2021-10-03 15:49:01 -04:00
Joshua Boniface e9b69c4124 Revamp postinst for the API daemon
Ensures that the worker is always restarted and makes the NOTE
conditional more specific.
2021-10-03 15:15:26 -04:00
Joshua Boniface 3948206225 Tweak fio tests for benchmarks
1. Remove ramp_time as this was giving very strange results.

2. Up the runtime to 75 seconds to compensate.

3. Print the fio command to the console to validate.
2021-10-03 15:06:18 -04:00
Joshua Boniface a09578fcf5 Add benchmark format to list 2021-10-03 15:05:58 -04:00
Joshua Boniface 73be807b84 Adjust ETA for benchmarks 2021-10-02 04:51:01 -04:00
Joshua Boniface 4a9805578e Add format parsing for format 1 storage benchmarks 2021-10-02 04:46:44 -04:00
Joshua Boniface f70f052df1 Add version 2 benchmark list formatting 2021-10-02 02:47:17 -04:00
Joshua Boniface 1e8841ce69 Handle benchmark running state properly 2021-10-02 01:54:51 -04:00
Joshua Boniface 9c7d39d523 Fix missing argument in database insert 2021-10-02 01:49:47 -04:00
Joshua Boniface 011490bcca Update to storage benchmark format 1
1. Runs `fio` with the `--format=json` option and removes all terse
format parsing from the results.

2. Adds a 15-second ramp time to minimize wonky ramp-up results.

3. Sets group_reporting, which isn't necessary with only a single job,
but is here for consistency.
2021-10-02 01:41:08 -04:00
Joshua Boniface 8de63b2785 Fix handling of array of information
With a benchmark info we only ever want one test, so pass only that to
the formatter. Simplifies the format function.
2021-10-02 01:28:39 -04:00
Joshua Boniface 8f8f00b2e9 Avoid versioning benchmark lists
This wouldn't work since each individual test is versioned. Instead add
a placeholder for later once additional format(s) are defined.
2021-10-02 01:25:18 -04:00
Joshua Boniface 1daab49b50 Add format option to benchmark info
Allows specifying raw json or json-pretty formats in addition to the
"pretty" formatted option.
2021-10-02 01:13:50 -04:00
Joshua Boniface 9f6041b9cf Add benchmark format function support
Allows choosing different list and info functions based on the benchmark
version found. Currently only implements "legacy" version 0 with more to
be added.
2021-10-02 01:07:25 -04:00
Joshua Boniface 5b27e438a9 Add test format versioning to storage benchmarks
Adds a test_format database column and a value in the API return for the
test format version, starting at 0 for the existing format as of 0.9.37.

References #143
2021-10-02 00:55:27 -04:00
Joshua Boniface 3e8a85b029 Load benchmark results as JSON
Load the JSON at the API side instead of client side, because that's
what the API doc says it is and it just makes more sense.
2021-09-30 23:40:24 -04:00
Joshua Boniface 19ac1e17c3 Bump version to 0.9.37 2021-09-30 02:08:14 -04:00
Joshua Boniface 252175fb6f Revamp benchmark tests
1. Move to a time-based (60s) benchmark to avoid these taking an absurd
amount of time to show the same information.

2. Eliminate the 256k random benchmarks, since they don't really add
anything.

3. Add in a 4k single-queue benchmark as this might provide valuable
insight into latency.

4. Adjust the output to reflect the above changes.

While this does change the benchmarking, this should not invalidate any
existing benchmarks since most of the test suite is unchanged (especially
the most important 4M sequential and 4K random tests). It simply removes
an unused entry and adds a more helpful one. The time-based change
should not significantly affect the results either; it just reduces the
total runtime for long tests and increases the runtime for quick tests to
provide a better picture.
2021-09-29 20:51:30 -04:00
Joshua Boniface f39b041471 Add primary node to benchmark job name
Ensures tracking of the current primary node the job was run on, since
this may be relevant for performance reasons.
2021-09-28 09:58:22 -04:00
Joshua Boniface 3b41759262 Add timeouts to queue gets and adjust
Ensure that all keepalive timeouts are set (preventing the queue.get()
actions from blocking forever) and set the thread timeouts to line up as
well. Everything here is thus limited to keepalive_interval seconds
(default 5s) to keep it uniform.
2021-09-27 16:10:27 -04:00
Joshua Boniface e514eed414 Re-add success log output during migration 2021-09-27 11:50:55 -04:00
Joshua Boniface b81e70ec18 Fix missing character in log message 2021-09-27 00:49:43 -04:00
Joshua Boniface c2a473ed8b Simplify VM migration down to 3 steps
Remove two superfluous synchronization steps which are not needed here,
since the exclusive lock handles that situation anyways.

Still does not fix the weird flush->unflush lock timeout bug, but it is
better worked around now, since cancelling the other wait frees this up
and lets it continue.
2021-09-27 00:03:20 -04:00
Joshua Boniface 5355f6ff48 Work around synchronization lock issues
Make the block on stage C only wait for 900 seconds (15 minutes) to
prevent indefinite blocking.

The issue comes if a VM is being received, and the current unflush is
cancelled for a flush. When this happens, this lock acquisition seems to
block for no obvious reason, and no other changes seem to affect it.
This is certainly some sort of locking bug within Kazoo but I can't
diagnose it as-is. Leave a TODO to look into this again in the future.
2021-09-26 23:26:21 -04:00
Joshua Boniface bf7823deb5 Improve log messages during VM migration 2021-09-26 23:15:38 -04:00
Joshua Boniface 8ba371723e Use event to non-block wait and fix inf wait 2021-09-26 22:55:39 -04:00
Joshua Boniface e10ac52116 Track status of VM state thread 2021-09-26 22:55:21 -04:00
Joshua Boniface 341073521b Simplify locking process for VM migration
Rather than using a cumbersome and overly complex ping-pong of read and
write locks, instead move to a much simpler process using exclusive
locks.

Describing the process in ASCII or narrative is cumbersome, but the
process ping-pongs via a set of exclusive locks and wait timers, so that
the two sides are able to synchronize via blocking the exclusive lock.
The end result is a much more streamlined migration (takes about half
the time all things considered) which should be less error-prone.
2021-09-26 22:08:07 -04:00
Joshua Boniface 16c38da5ef Fix failure to connect to libvirt in keepalive
This should be caught and the thread aborted, rather than failing and
holding up keepalives.
2021-09-26 20:42:01 -04:00
Joshua Boniface c8134d3a1c Fix several bugs in fence handling
1. Output from ipmitool was not being stripped, and stray newlines were
throwing off the comparisons. Fixes this.

2. Several stages were lacking meaningful messages. Adds these in so the
output is more clear about what is going on.

3. Reduce the sleep time after a fence to just 1x the
keepalive_interval, rather than 2x, because this seemed excessively
long even for slow IPMI interfaces, especially since we're checking the
power state now anyways.

4. Set the node daemon state to an explicit 'fenced' state after a
successful fence to indicate to users that the node was indeed fenced
successfully and not still 'dead'.
2021-09-26 20:07:30 -04:00
Joshua Boniface 9f41373324 Ensure pvc-flush is after network-online 2021-09-26 17:40:42 -04:00