Commit Graph

624 Commits

Author SHA1 Message Date
Joshua Boniface a8899a1d66 Fix ordering of pvcnoded unit
We want to be after network.target and want network-online.target
2021-11-18 16:56:49 -05:00
Joshua Boniface 817dffcf30 Bump version to 0.9.44 2021-11-11 16:20:38 -05:00
Joshua Boniface eda2a57a73 Add Munin plugin for Ceph utilization 2021-11-08 15:21:09 -05:00
Joshua Boniface 6e9fcd38a3 Bump version to 0.9.43 2021-11-08 02:29:17 -05:00
Joshua Boniface 78faa90139 Reformat recent changes with Black 2021-11-06 03:27:07 -04:00
Joshua Boniface 23b1501f40 Fix linting error F541 f-string placeholders 2021-11-06 03:26:03 -04:00
Joshua Boniface 66bfad3109 Fix linting errors F522/F523 unused args 2021-11-06 03:24:50 -04:00
Joshua Boniface c41664d2da Reformat code with Black code formatter
Unify the code style along PEP and Black principles using the tool.
2021-11-06 03:02:43 -04:00
Joshua Boniface 2e7b9b28b3 Add some delay and additional tries to fencing 2021-10-27 16:24:17 -04:00
Joshua Boniface 55f397a347 Fix bad location of config sets 2021-10-12 17:23:04 -04:00
Joshua Boniface dfebb2d3e5 Also validate on failures 2021-10-12 17:11:03 -04:00
Joshua Boniface e88147db4a Bump version to 0.9.42 2021-10-12 15:25:42 -04:00
Joshua Boniface b8204d89ac Go back to passing if exception
Validation already happened and the set happens again later.
2021-10-12 14:21:52 -04:00
Joshua Boniface fe73dfbdc9 Use current live value for bridge_mtu
This will ensure that upgrading without the bridge_mtu config key set
will keep things as they are.
2021-10-12 12:24:03 -04:00
Joshua Boniface 8f906c1f81 Use power off in fence instead of reset
Use a power off (and then make the power on a requirement) during a node
fence. Removes some potential ambiguity in the power state, since we
will know for certain if it is off.
2021-10-12 11:04:27 -04:00
Joshua Boniface 2d9fb9688d Validate network MTU after initial read 2021-10-12 10:53:17 -04:00
Joshua Boniface f13cc04b89 Bump version to 0.9.41 2021-10-09 19:39:21 -04:00
Joshua Boniface 95e01f38d5 Adjust log type of object setup message 2021-10-09 19:23:12 -04:00
Joshua Boniface 3122d73bf5 Avoid duplicate runs of MTU set
It wasn't the validator duplicating, but the update duplicating, so
avoid that happening properly this time.
2021-10-09 19:21:47 -04:00
Joshua Boniface 7ed8ef179c Revert "Avoid duplicate runs of MTU validator"
This reverts commit 56021c443a.
2021-10-09 19:11:42 -04:00
Joshua Boniface caead02b2a Set all log messages to information state
None of these were "success" messages and thus shouldn't have been ok
state.
2021-10-09 19:09:38 -04:00
Joshua Boniface 87bc5f93e6 Avoid duplicate runs of MTU validator 2021-10-09 19:07:41 -04:00
Joshua Boniface 203893559e Use correct isinstance instead of type 2021-10-09 19:03:31 -04:00
Joshua Boniface 2c51bb0705 Move MTU validation to function
Prevents code duplication and ensures validation runs when an MTU is
updated, not just on network creation.
2021-10-09 19:01:45 -04:00
Joshua Boniface 46d3daf686 Add logger message when setting MTU 2021-10-09 18:56:18 -04:00
Joshua Boniface e9d05aa24e Ensure vx_mtu is always an int() 2021-10-09 18:52:50 -04:00
Joshua Boniface 6ce28c43af Add MTU value checking and log messages
Ensures that if a specified MTU is more than the maximum it is set to
the maximum instead, and adds warning messages for both situations.
2021-10-09 18:48:56 -04:00
Joshua Boniface c45f8f5bd5 Have VXNetworkInstance set MTU if unset
Makes this explicit in Zookeeper if a network is unset, post-migration
(schema version 6).

Addresses #144
2021-10-09 17:52:57 -04:00
Joshua Boniface 3690a2c1e0 Fix migration bugs and invalid vx_mtu
Addresses #144
2021-10-09 17:35:10 -04:00
Joshua Boniface 50d8aa0586 Add handlers for client network MTUs
Refactors some of the code in VXNetworkInterface to handle MTUs in a
more streamlined fashion. Also fixes a bug whereby bridge client
networks were being explicitly given the cluster dev MTU which might not
be correct. Now adds support for this option explicitly in the configs,
and defaults to 1500 for safety (the standard Ethernet MTU).

Addresses #144
2021-10-09 17:02:27 -04:00
Joshua Boniface 6ee4c55071 Correct flawed conditional in verify_ipmi 2021-10-07 15:11:19 -04:00
Joshua Boniface c27359c4bf Bump version to 0.9.40 2021-10-07 14:42:04 -04:00
Joshua Boniface 46078932c3 Correct bad stop_keepalive_timer call 2021-10-07 14:41:12 -04:00
Joshua Boniface bdb9db8375 Bump version to 0.9.39 2021-10-07 11:52:38 -04:00
Joshua Boniface da9248cfa2 Bump version to 0.9.38 2021-10-03 22:32:41 -04:00
Joshua Boniface 23977b04fc Bump version to 0.9.37 2021-09-30 02:08:14 -04:00
Joshua Boniface f6f6f07488 Add timeouts to queue gets and adjust
Ensure that all keepalive timeouts are set (prevent the queue.get()
actions from blocking forever) and set the thread timeouts to line up as
well. Everything here is thus limited to keepalive_interval seconds
(default 5s) to keep it uniform.
2021-09-27 16:10:27 -04:00
Joshua Boniface 142c999ce8 Re-add success log output during migration 2021-09-27 11:50:55 -04:00
Joshua Boniface 1de069298c Fix missing character in log message 2021-09-27 00:49:43 -04:00
Joshua Boniface 55221b3d97 Simplify VM migration down to 3 steps
Remove two superfluous synchronization steps which are not needed here,
since the exclusive lock handles that situation anyways.

Still does not fix the weird flush->unflush lock timeout bug, but is
better worked-around now due to the cancelling of the other wait freeing
this up and continuing.
2021-09-27 00:03:20 -04:00
Joshua Boniface 0d72798814 Work around synchronization lock issues
Make the block on stage C only wait for 900 seconds (15 minutes) to
prevent indefinite blocking.

The issue comes if a VM is being received, and the current unflush is
cancelled for a flush. When this happens, this lock acquisition seems to
block for no obvious reason, and no other changes seem to affect it.
This is certainly some sort of locking bug within Kazoo but I can't
diagnose it as-is. Leave a TODO to look into this again in the future.
2021-09-26 23:26:21 -04:00
Joshua Boniface 3638efc77e Improve log messages during VM migration 2021-09-26 23:15:38 -04:00
Joshua Boniface c2c888d684 Use event to non-block wait and fix inf wait 2021-09-26 22:55:39 -04:00
Joshua Boniface febef2e406 Track status of VM state thread 2021-09-26 22:55:21 -04:00
Joshua Boniface 2a4f38e933 Simplify locking process for VM migration
Rather than using a cumbersome and overly complex ping-pong of read and
write locks, instead move to a much simpler process using exclusive
locks.

Describing the process in ASCII or narrative is cumbersome, but the
process ping-pongs via a set of exclusive locks and wait timers, so that
the two sides are able to synchronize via blocking the exclusive lock.
The end result is a much more streamlined migration (takes about half
the time all things considered) which should be less error-prone.
2021-09-26 22:08:07 -04:00
Joshua Boniface 3b805cdc34 Fix failure to connect to libvirt in keepalive
This should be caught and abort the thread rather than failing and
holding up keepalives.
2021-09-26 20:42:01 -04:00
Joshua Boniface 06f0f7ed91 Fix several bugs in fence handling
1. Output from ipmitool was not being stripped, and stray newlines were
throwing off the comparisons. Fixes this.

2. Several stages were lacking meaningful messages. Adds these in so the
output is more clear about what is going on.

3. Reduce the sleep time after a fence to just 1x the
keepalive_interval, rather than 2x, because this seemed like excessively
long even for slow IPMI interfaces, especially since we're checking the
power state now anyways.

4. Set the node daemon state to an explicit 'fenced' state after a
successful fence to indicate to users that the node was indeed fenced
successfully and not still 'dead'.
2021-09-26 20:07:30 -04:00
Joshua Boniface fd040ab45a Ensure pvc-flush is after network-online 2021-09-26 17:40:42 -04:00
Joshua Boniface e23e2dd9bf Fix typo in log message 2021-09-26 03:35:30 -04:00
Joshua Boniface 0f02c5eaef Fix typo in sgdisk command options 2021-09-26 00:59:05 -04:00