a5763c9d25 
					 
					
						
						
							
							Fix possible race condition applying schemas  
						
						
						
						
						Found an instance where two of these fired too close together and
caused a fatal error. Use a write lock, and then wrap the schema.apply
call in an exception handler in case it still fails. 
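
A minimal sketch of the pattern this commit describes: take a lock before applying, and catch a failure rather than letting it become fatal. The names `schema_lock` and `apply_schema_safely` are hypothetical, and a process-local `threading.Lock` stands in for whatever cluster-wide write lock the daemon actually uses.

```python
import threading

# Hypothetical stand-in for the write lock; the real lock would need to be
# shared between nodes, which a threading.Lock is not.
schema_lock = threading.Lock()


def apply_schema_safely(apply_func, logger=print):
    """Run apply_func under the lock; report a failure instead of crashing."""
    with schema_lock:
        try:
            apply_func()
        except Exception as e:
            # Second line of defence: log the error and carry on.
            logger(f"Failed to apply schema: {e}")
            return False
    return True
```

The lock serialises callers that fire close together, and the handler covers the case where the apply still fails for another reason.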
						
						
					 
					
						2024-01-11 10:21:01 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						123c7ce857 
					 
					
						
						
							
							Update copyright header on all files for 2024  
						
						
						
						
						Last release of 2023 is probably the best time to do this. 
						
						
					 
					
						2023-12-29 11:16:59 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e654fbba08 
					 
					
						
						
							
							Move debug condition handling to Logger  
						
						
						
						
						Avoids many dozens of conditionals sprinkled throughout the code by
centralizing this check into the main Logger instance. 
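
As an illustration of the idea, the sketch below keeps the debug check inside a toy logger so that callers never test a debug flag themselves. The class layout, the `out()` signature, and the use of state `"d"` for debug messages are assumptions made for this example, not the project's confirmed API.

```python
class Logger:
    """Toy logger: callers always call out(); the debug check lives here."""

    def __init__(self, debug=False):
        self.debug = debug
        self.lines = []

    def out(self, message, state="i"):
        # Assumed convention: state "d" marks a debug message.
        if state == "d" and not self.debug:
            return
        self.lines.append(f"[{state}] {message}")
```

A call site then shrinks from `if debug: logger.out(...)` to a plain `logger.out(..., state="d")`.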
						
						
					 
					
						2023-12-27 13:01:45 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3e4cc53fdd 
					 
					
						
						
							
							Add node network statistics and utilization values  
						
						
						
						
						Adds a new physical network interface stats parser to the node
keepalives, and leverages this information to provide a network
utilization overview in the Prometheus metrics. 
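
One common way to turn interface counters into a utilization figure is sketched below, assuming two byte-counter samples taken a known interval apart and a known link speed. The function name and parameters are hypothetical; the commit does not state the formula it uses.

```python
def link_utilization(prev_bytes, curr_bytes, interval_s, link_speed_mbps):
    """Return utilization of one direction of a link as a percentage."""
    if interval_s <= 0 or link_speed_mbps <= 0:
        return 0.0
    # Bytes transferred over the interval, converted to bits per second.
    bits_per_second = (curr_bytes - prev_bytes) * 8 / interval_s
    capacity_bps = link_speed_mbps * 1_000_000
    return round(bits_per_second / capacity_bps * 100, 2)
```

For example, 62,500,000 bytes over 5 seconds on a 1000 Mbps link is 100 Mbps, or 10% utilization.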
						
						
					 
					
						2023-12-21 15:45:01 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0f24184b78 
					 
					
						
						
							
							Explicitly clear resources of fenced node  
						
						
						
						
						This actually solves the bug originally "fixed" in
5f1432ccdd 
						
						
					 
					
						2023-12-11 12:14:56 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1ba37fe33d 
					 
					
						
						
							
							Restore VM resource allocation location  
						
						
						
						
						Commit 5f1432ccdd 
						
						
					 
					
						2023-12-11 11:52:59 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1a05077b10 
					 
					
						
						
							
							Fix missing fstring  
						
						
						
						
					 
					
						2023-12-11 11:29:49 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7bc0760b78 
					 
					
						
						
							
							Add time to "starting keepalive" message  
						
						
						
						
						Matches the pvchealthd output and provides a useful message detail to
this otherwise contextless message. 
						
						
					 
					
						2023-12-10 00:40:32 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1fb0463dea 
					 
					
						
						
							
							Adjust daemon service startup  
						
						
						
						
						Add healthd, adjust workerd, lower waittime 
						
						
					 
					
						2023-11-30 03:28:02 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						03a738f878 
					 
					
						
						
							
							Move config parser into daemon_lib  
						
						
						
						
						And reformat/add config values for API. 
						
						
					 
					
						2023-11-30 00:05:37 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4a2eba0961 
					 
					
						
						
							
							Improve node output messages (from pvchealthd)  
						
						
						
						
						1. Output startup "list" entries in cyan with s state
2. Add start of keepalive run message 
						
						
					 
					
						2023-11-29 21:21:51 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						83ceb41138 
					 
					
						
						
							
							Add daemon name to Logger entries  
						
						
						
						
					 
					
						2023-11-29 15:18:37 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2545a7b744 
					 
					
						
						
							
							Allow similar for IPMI hostnames  
						
						
						
						
					 
					
						2023-11-28 16:09:01 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ce907ff26a 
					 
					
						
						
							
							Allow specifying static IPs instead of a file  
						
						
						
						
					 
					
						2023-11-28 15:28:31 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fc3d292081 
					 
					
						
						
							
							Add missing subdirectory configs  
						
						
						
						
					 
					
						2023-11-27 13:40:07 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						eab1ae873b 
					 
					
						
						
							
							Ensure upstream_gateway key will exist  
						
						
						
						
					 
					
						2023-11-27 13:37:57 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						eaf93cdf96 
					 
					
						
						
							
							Readd missing subsystem configurations  
						
						
						
						
					 
					
						2023-11-27 13:33:41 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c8f4cbb39e 
					 
					
						
						
							
							Fix node entry keys  
						
						
						
						
					 
					
						2023-11-27 13:24:01 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bcc57638a9 
					 
					
						
						
							
							Refactor pvcnoded to use new configuration  
						
						
						
						
					 
					
						2023-11-26 15:41:25 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						18e43a9377 
					 
					
						
						
							
							Adjust name in worker log output  
						
						
						
						
					 
					
						2023-11-16 02:25:14 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aef38639cf 
					 
					
						
						
							
							Rename pvcapid-worker to pvcworkerd  
						
						
						
						
					 
					
						2023-11-15 20:31:39 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5f1432ccdd 
					 
					
						
						
							
							Fix memory allocation updates and add more debug  
						
						
						
						
						Previously, we were assigning memalloc/memprov/vcpualloc during an
earlier phase using the main d_domain list. I'm not sure exactly why,
but this was throwing off stats after a fence. Instead, set these values
later on while parsing the actually-active VMs. 
						
						
					 
					
						2023-11-10 10:29:32 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d6b8808448 
					 
					
						
						
							
							Clean up fencing handler  
						
						
						
						
						1. Remove all format strings in favour of f-strings
2. Ensure all logger messages have a prefix
3. Add a few more logger messages for clarity 
						
						
					 
					
						2023-11-10 10:09:54 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						83c4c6633d 
					 
					
						
						
							
							Readd RBD lock detection and clearing on startup  
						
						
						
						
						This is still needed due to the nature of the locks and freeing them on
startup, and to preserve lock=fail behaviour on VM startup.
Also fixes the fencing lock flush to directly use the client library
outside of Celery. I don't like this hack but it seems prudent until we
move fencing to the workers as well. 
						
						
					 
					
						2023-11-10 01:33:48 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2c15036f86 
					 
					
						
						
							
							Add KeyDB to node startup services  
						
						
						
						
						Also ensure API worker starts on all nodes, not just coordinators. 
						
						
					 
					
						2023-11-05 19:26:38 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						30d7e49401 
					 
					
						
						
							
							Start API worker with node daemon on coordinators  
						
						
						
						
					 
					
						2023-11-04 13:08:16 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8b93f9a80e 
					 
					
						
						
							
							Handle OSD index errors during stats collection  
						
						
						
						
					 
					
						2023-11-01 21:33:40 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0769f1ea52 
					 
					
						
						
							
							Increase service start time to 10s  
						
						
						
						
					 
					
						2023-10-23 22:24:03 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						457b7bed3d 
					 
					
						
						
							
							Handle exceptions in fence migrations  
						
						
						
						
					 
					
						2023-09-16 22:56:09 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						48662e90c1 
					 
					
						
						
							
							Remove obsolete monitoring_instance passing  
						
						
						
						
					 
					
						2023-09-15 22:47:45 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						079381c03e 
					 
					
						
						
							
							Move printing to end and add runtime  
						
						
						
						
					 
					
						2023-09-15 22:40:09 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4d51318a40 
					 
					
						
						
							
							Make monitoring interval configurable  
						
						
						
						
					 
					
						2023-09-15 16:54:51 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						254303b9d4 
					 
					
						
						
							
							Use coordinator_state instead of router_state  
						
						
						
						
						Makes it much clearer what this variable represents. 
						
						
					 
					
						2023-09-15 16:47:56 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						40b7d68853 
					 
					
						
						
							
							Separate monitoring and move to 60s interval  
						
						
						
						
						Removes the monitoring subsystem's dependency on the node
keepalives, and runs the plugins at a 60s interval to avoid excessive backups
if a plugin takes too long.
Adds its own logs and related items as required.
Finally, adds a new required argument to the run() of plugins, the
coordinator state, which can be used by a plugin to determine actions
based on whether the node is a primary, secondary, or non-coordinator. 
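
A rough sketch of how a plugin might branch on the new argument is shown below. Only the idea of `run()` receiving the coordinator state comes from the commit message; the class name, the string values `"primary"` and `"secondary"`, and the return values are assumptions for illustration.

```python
class ExamplePlugin:
    """Hypothetical plugin that picks its work from the coordinator state."""

    def run(self, coordinator_state):
        if coordinator_state == "primary":
            return "ran cluster-wide check"
        if coordinator_state == "secondary":
            return "ran standby check"
        # Any other value is treated as a non-coordinator node.
        return "ran local-only check"
```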
						
						
					 
					
						2023-09-15 16:47:11 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cb413e5ce6 
					 
					
						
						
							
							[Bookworm] Fix Ceph 16 OSD stat parsing  
						
						
						
						
					 
					
						2023-08-31 00:45:03 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed087d83c2 
					 
					
						
						
							
							Round cpuload to 2 decimal places  
						
						
						
						
					 
					
						2023-08-29 21:41:44 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7c07fbefff 
					 
					
						
						
							
							Adjust keepalive health printing and ordering  
						
						
						
						
					 
					
						2023-02-24 11:08:30 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f4eef30770 
					 
					
						
						
							
							Add JSON health to cluster data  
						
						
						
						
					 
					
						2023-02-15 15:26:57 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bc88d764b0 
					 
					
						
						
							
							Add logging flag for monitoring plugin output  
						
						
						
						
					 
					
						2023-02-13 22:04:39 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2ee52e44d3 
					 
					
						
						
							
							Move Ceph cluster health reporting to plugin  
						
						
						
						
						Also removes several outputs from the normal keepalive that were
superfluous/static so that the main output fits on one line. 
						
						
					 
					
						2023-02-13 12:13:56 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3c742a827b 
					 
					
						
						
							
							Initial implementation of monitoring plugin system  
						
						
						
						
					 
					
						2023-02-13 12:06:26 -05:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						726d0a562b 
					 
					
						
						
							
							Update copyright header year  
						
						
						
						
					 
					
						2022-10-06 11:55:27 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5942aa50fc 
					 
					
						
						
							
							Avoid raise/handle deadlocks  
						
						
						
						
						Can cause log flooding in some edge cases and isn't really needed any
longer. Use a proper conditional followed by an actual error handler. 
						
						
					 
					
						2022-10-03 14:04:12 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8d0f26ff7a 
					 
					
						
						
							
							Add additional kb_ values to OSD stats  
						
						
						
						
						Allows for easier parsing later to get e.g. % values and more details on
the used amounts. 
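
The kind of later parsing this enables might look like the sketch below, which derives a used percentage from two of the values. The key names `kb` and `kb_used` are assumed here to hold the total and used sizes in KiB; the commit only says the values are prefixed with `kb_`.

```python
def osd_used_percent(osd_stats):
    """Return the used share of an OSD as a percentage, or 0.0 if unknown."""
    total_kb = osd_stats.get("kb", 0)
    if total_kb <= 0:
        return 0.0
    return round(osd_stats.get("kb_used", 0) / total_kb * 100, 2)
```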
						
						
					 
					
						2022-08-11 11:06:36 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						23b1501f40 
					 
					
						
						
							
							Fix linting error F541 f-string placeholders  
						
						
						
						
					 
					
						2021-11-06 03:26:03 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c41664d2da 
					 
					
						
						
							
							Reformat code with Black code formatter  
						
						
						
						
						Unify the code style along PEP and Black principles using the tool. 
						
						
					 
					
						2021-11-06 03:02:43 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2e7b9b28b3 
					 
					
						
						
							
							Add some delay and additional tries to fencing  
						
						
						
						
					 
					
						2021-10-27 16:24:17 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						55f397a347 
					 
					
						
						
							
							Fix bad location of config sets  
						
						
						
						
					 
					
						2021-10-12 17:23:04 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fe73dfbdc9 
					 
					
						
						
							
							Use current live value for bridge_mtu  
						
						
						
						
						This will ensure that upgrading without the bridge_mtu config key set
will keep things as they are. 
						
						
					 
					
						2021-10-12 12:24:03 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8f906c1f81 
					 
					
						
						
							
							Use power off in fence instead of reset  
						
						
						
						
						Use a power off (and then make the power on a requirement) during a node
fence. Removes some potential ambiguity in the power state, since we
will know for certain if it is off. 
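
The sequence described above can be sketched as follows, assuming a hypothetical `bmc` object with `power_off()`, `power_on()`, and `power_status()` methods. This interface is invented for illustration and is not the project's actual fencing code.

```python
def fence_node(bmc):
    """Power a node off, confirm it is off, then require a power on."""
    bmc.power_off()
    if bmc.power_status() != "off":
        # State is still ambiguous, so do not treat the node as fenced.
        return False
    bmc.power_on()
    return bmc.power_status() == "on"
```

Unlike a reset, the intermediate "off" check gives a point at which the node is known to be down.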
						
						
					 
					
						2021-10-12 11:04:27 -04:00