97329bb90d
Sort Ceph pool data by name

There is no guarantee that both commands output the pools in the same
order, so sort them by name first so the iteration over the pools by ID
is successful.

2024-07-22 13:26:27 -04:00
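A minimal sketch of the idea, not PVC's actual code. It assumes each command returns a list of per-pool dicts with a `name` key, in no guaranteed order. Sorting both lists by name before pairing them means index i refers to the same pool in each. `merge_pool_data` and the dict layout are hypothetical.

```python
from typing import Any


def merge_pool_data(
    pool_list: list[dict[str, Any]], pool_stats: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Pair entries from two pool listings that may arrive in different orders.

    Both inputs are hypothetical: lists of dicts that each carry a "name" key.
    """
    if len(pool_list) != len(pool_stats):
        raise ValueError("the two listings report a different number of pools")

    # Sort both listings by pool name so index i refers to the same pool in each
    sorted_list = sorted(pool_list, key=lambda pool: pool["name"])
    sorted_stats = sorted(pool_stats, key=lambda pool: pool["name"])

    merged = []
    for pool, stats in zip(sorted_list, sorted_stats):
        # Fail loudly if the listings disagree on which pools exist
        if pool["name"] != stats["name"]:
            raise ValueError(f"pool mismatch: {pool['name']} != {stats['name']}")
        merged.append({**pool, "stats": stats})
    return merged
```

Suppose the pool list holds `vms` before `images` and the stats list holds them the other way round. The function still pairs each pool with its own stats.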

dcb9c0d12c
Improve fence handling conditions

Use the intermediate output text when judging the fence status, rather
than the retcode of the stop command, as this should be more reliable.

2024-05-08 10:55:15 -04:00
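The commit does not show the new check itself. As a hypothetical sketch, assume the fencing code runs a power-status query after the power-off step and receives its text output. The sketch uses the style of `ipmitool chassis power status`, which prints a line such as `Chassis Power is off`. The function names are illustrative only.

```python
def power_state_from_output(output: str) -> str:
    """Classify power-status text as "off", "on", or "unknown".

    Assumes ipmitool-style lines such as "Chassis Power is off"; this is an
    illustration of text-based checking, not PVC's fencing code.
    """
    for line in output.lower().splitlines():
        line = line.strip()
        if line.endswith("power is off"):
            return "off"
        if line.endswith("power is on"):
            return "on"
    # Errors, timeouts, or unexpected text: do not assume the node is off
    return "unknown"


def fence_confirmed(status_output: str) -> bool:
    # Only a positive "off" reading counts as a successful fence; the exit
    # status of the earlier power-off command is not consulted at all.
    return power_state_from_output(status_output) == "off"
```

Treating anything other than an explicit "off" as unconfirmed is a conservative choice. A failed or garbled query never counts as a successful fence.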

79ad09ae59
Switch virtual memory free to allocated

Avoids incorrect reporting if cache/buffers exceed normal levels.

2024-04-19 10:25:33 -04:00

a5763c9d25
Fix possible race condition applying schemas

Found an instance where two of these fired too close together and
caused a fatal error. Use a write lock, and then wrap the schema.apply
call in an exception handler in case it fails anyway.

2024-01-11 10:21:01 -05:00

123c7ce857
Update copyright header on all files for 2024

Last release of 2023 is probably the best time to do this.

2023-12-29 11:16:59 -05:00

e654fbba08
Move debug condition handling to Logger

Avoids many dozens of conditionals sprinkled throughout the code by
centralizing this check into the main Logger instance.

2023-12-27 13:01:45 -05:00
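A minimal sketch of centralizing the debug check inside the logger, so callers no longer wrap each call in `if debug:`. This is not PVC's Logger. The single-letter states `"i"` (info) and `"d"` (debug) are assumptions for illustration.

```python
class Logger:
    """Minimal logger that owns the debug check (hypothetical, not PVC's Logger)."""

    def __init__(self, debug: bool = False) -> None:
        self.debug = debug

    def out(self, message: str, state: str = "i") -> bool:
        """Print a message; return False if it was suppressed."""
        # The one place that decides whether debug output is shown, so callers
        # no longer need their own "if debug:" wrappers.
        if state == "d" and not self.debug:
            return False
        print(f"[{state}] {message}")
        return True
```

A caller now writes `logger.out("Entering fence loop", state="d")` unconditionally. Whether the line appears depends only on how the Logger was constructed.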

3e4cc53fdd
Add node network statistics and utilization values

Adds a new physical network interface stats parser to the node
keepalives, and leverages this information to provide a network
utilization overview in the Prometheus metrics.

2023-12-21 15:45:01 -05:00
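One way such a parser can derive utilization is from two samples of cumulative byte counters. This sketch assumes the standard Linux sysfs files `/sys/class/net/<iface>/statistics/rx_bytes`, `tx_bytes` and `speed` (in Mb/s). PVC's own parser may read different sources, and the function names are hypothetical.

```python
from pathlib import Path


def read_iface_counters(iface: str, sysfs: Path = Path("/sys/class/net")) -> dict[str, int]:
    """Read cumulative rx/tx byte counters and link speed (Mb/s) from sysfs."""
    base = sysfs / iface
    return {
        "rx_bytes": int((base / "statistics" / "rx_bytes").read_text()),
        "tx_bytes": int((base / "statistics" / "tx_bytes").read_text()),
        # Reading "speed" can fail (e.g. link down) or give -1 on some virtual NICs
        "speed_mbps": int((base / "speed").read_text()),
    }


def utilization_percent(prev: dict[str, int], curr: dict[str, int], interval_s: float) -> float:
    """Percent of link capacity used by the busier direction between two samples."""
    speed_bps = curr["speed_mbps"] * 1_000_000
    if speed_bps <= 0 or interval_s <= 0:
        return 0.0  # unknown speed or bad interval: report no utilization
    rx_bps = (curr["rx_bytes"] - prev["rx_bytes"]) * 8 / interval_s
    tx_bps = (curr["tx_bytes"] - prev["tx_bytes"]) * 8 / interval_s
    # Full-duplex links carry rx and tx independently, so use the larger one
    return round(max(rx_bps, tx_bps) / speed_bps * 100, 2)
```

Take two `read_iface_counters` samples (e.g. for a hypothetical `eth0`) `interval_s` seconds apart and pass them to `utilization_percent`. The result is one data point that an exporter could publish as a gauge. On a 1000 Mb/s link, receiving 125,000,000 bytes in 10 seconds works out to 10.0%.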

0f24184b78
Explicitly clear resources of fenced node

This actually solves the bug originally "fixed" in
5f1432ccdd

2023-12-11 12:14:56 -05:00

1ba37fe33d
Restore VM resource allocation location

Commit 5f1432ccdd

2023-12-11 11:52:59 -05:00

1a05077b10
Fix missing f-string

2023-12-11 11:29:49 -05:00

7bc0760b78
Add time to "starting keepalive" message

Matches the pvchealthd output and adds useful detail to this otherwise
contextless message.

2023-12-10 00:40:32 -05:00

1fb0463dea
Adjust daemon service startup

Add healthd, adjust workerd, lower waittime

2023-11-30 03:28:02 -05:00

03a738f878
Move config parser into daemon_lib

Also reformats and adds config values for the API.

2023-11-30 00:05:37 -05:00

4a2eba0961
Improve node output messages (from pvchealthd)

1. Output startup "list" entries in cyan with s state
2. Add start of keepalive run message

2023-11-29 21:21:51 -05:00

83ceb41138
Add daemon name to Logger entries

2023-11-29 15:18:37 -05:00

2545a7b744
Allow similar for IPMI hostnames

2023-11-28 16:09:01 -05:00

ce907ff26a
Allow specifying static IPs instead of a file

2023-11-28 15:28:31 -05:00

fc3d292081
Add missing subdirectory configs

2023-11-27 13:40:07 -05:00

eab1ae873b
Ensure upstream_gateway key will exist

2023-11-27 13:37:57 -05:00

eaf93cdf96
Re-add missing subsystem configurations

2023-11-27 13:33:41 -05:00

c8f4cbb39e
Fix node entry keys

2023-11-27 13:24:01 -05:00

bcc57638a9
Refactor pvcnoded to use new configuration

2023-11-26 15:41:25 -05:00

18e43a9377
Adjust name in worker log output

2023-11-16 02:25:14 -05:00

aef38639cf
Rename pvcapid-worker to pvcworkerd

2023-11-15 20:31:39 -05:00

5f1432ccdd
Fix memory allocation updates and add more debug

Previously, we were assigning memalloc/memprov/vcpualloc during an
earlier phase using the main d_domain list. I'm not sure exactly why,
but this was throwing off stats after a fence. Instead, set these values
later on while parsing the actually-active VMs.

2023-11-10 10:29:32 -05:00
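The fix computes the totals while walking the actually-active VMs rather than the main `d_domain` list. This minimal sketch of that idea uses a hypothetical `Domain` record and a set of active VM names. `memprov` is left out because the commit does not say how it is derived.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical per-VM record; the field names are illustrative only."""

    name: str
    memory_mb: int
    vcpus: int


def allocation_totals(domains: list[Domain], active_names: set[str]) -> dict[str, int]:
    """Sum memory and vCPUs over the VMs that are actually running on this node."""
    memalloc = 0
    vcpualloc = 0
    for dom in domains:
        # Count only VMs in the active set; anything that is merely listed
        # (for example, still present in the main domain list) is skipped.
        if dom.name not in active_names:
            continue
        memalloc += dom.memory_mb
        vcpualloc += dom.vcpus
    return {"memalloc": memalloc, "vcpualloc": vcpualloc}
```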

d6b8808448
Clean up fencing handler

1. Remove all format strings in favour of f-strings
2. Ensure all logger messages have a prefix
3. Add a few more logger messages for clarity

2023-11-10 10:09:54 -05:00

83c4c6633d
Re-add RBD lock detection and clearing on startup

This is still needed due to the nature of the locks and freeing them on
startup, and to preserve lock=fail behaviour on VM startup.
Also fixes the fencing lock flush to directly use the client library
outside of Celery. I don't like this hack, but it seems prudent until we
move fencing to the workers as well.

2023-11-10 01:33:48 -05:00

2c15036f86
Add KeyDB to node startup services

Also ensure the API worker starts on all nodes, not just coordinators.

2023-11-05 19:26:38 -05:00

30d7e49401
Start API worker with node daemon on coordinators

2023-11-04 13:08:16 -04:00

8b93f9a80e
Handle OSD index errors during stats collection

2023-11-01 21:33:40 -04:00

0769f1ea52
Increase service start time to 10s

2023-10-23 22:24:03 -04:00

457b7bed3d
Handle exceptions in fence migrations

2023-09-16 22:56:09 -04:00

48662e90c1
Remove obsolete monitoring_instance passing

2023-09-15 22:47:45 -04:00

079381c03e
Move printing to end and add runtime

2023-09-15 22:40:09 -04:00

4d51318a40
Make monitoring interval configurable

2023-09-15 16:54:51 -04:00

254303b9d4
Use coordinator_state instead of router_state

Makes it much clearer what this variable represents.

2023-09-15 16:47:56 -04:00

40b7d68853
Separate monitoring and move to 60s interval

Decouples the monitoring subsystem from the node keepalives and runs it
at a 60s interval to avoid excessive backups if a plugin takes too long.
Adds its own logs and related items as required.
Finally, adds a new required argument to the plugins' run() function:
the coordinator state, which a plugin can use to determine actions
based on whether the node is a primary, secondary, or non-coordinator.

2023-09-15 16:47:11 -04:00
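A rough sketch of the two ideas in this commit. Plugins receive the coordinator state as a `run()` argument, and monitoring runs on its own fixed interval. The class and function names, the integer return convention, and the state strings (`"primary"`, `"secondary"`, and so on) are assumptions for illustration. A real daemon would likely run the loop in its own thread.

```python
import time
from typing import Callable, Optional


class ExamplePlugin:
    """Hypothetical plugin; only the run(coordinator_state) shape follows the commit."""

    name = "example"

    def run(self, coordinator_state: str) -> int:
        # Cluster-wide checks only need to run once, so this plugin acts only
        # on the primary coordinator (state strings are assumptions).
        if coordinator_state != "primary":
            return 0
        return 10  # pretend the check found a problem worth 10 health points


def run_monitoring_pass(plugins: list, coordinator_state: str) -> dict[str, Optional[int]]:
    """Run every plugin once; a plugin that raises is recorded as None."""
    results: dict[str, Optional[int]] = {}
    for plugin in plugins:
        try:
            results[plugin.name] = plugin.run(coordinator_state)
        except Exception:
            results[plugin.name] = None
    return results


def monitoring_loop(
    plugins: list,
    get_coordinator_state: Callable[[], str],
    interval_s: float = 60.0,
    max_runs: Optional[int] = None,
) -> None:
    """Run passes on a fixed schedule, independent of any keepalive timer.

    If a pass overruns, the missed slots are skipped instead of being run
    back-to-back, so a slow plugin cannot cause a growing backlog.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    next_run = time.monotonic()
    runs = 0
    while max_runs is None or runs < max_runs:
        run_monitoring_pass(plugins, get_coordinator_state())
        runs += 1
        next_run += interval_s
        now = time.monotonic()
        if next_run <= now:
            # Overran: jump forward to the first slot that is still in the future
            missed = int((now - next_run) // interval_s) + 1
            next_run += missed * interval_s
        time.sleep(max(0.0, next_run - now))
```

Skipping missed slots rather than queueing them is one way to avoid the "excessive backups" mentioned above. With this design, a slow pass simply delays the next one to the following interval boundary.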

cb413e5ce6
[Bookworm] Fix Ceph 16 OSD stat parsing

2023-08-31 00:45:03 -04:00

ed087d83c2
Round cpuload to 2 decimal places

2023-08-29 21:41:44 -04:00

7c07fbefff
Adjust keepalive health printing and ordering

2023-02-24 11:08:30 -05:00

f4eef30770
Add JSON health to cluster data

2023-02-15 15:26:57 -05:00

bc88d764b0
Add logging flag for monitoring plugin output

2023-02-13 22:04:39 -05:00

2ee52e44d3
Move Ceph cluster health reporting to plugin

Also removes several outputs from the normal keepalive that were
superfluous/static so that the main output fits on one line.

2023-02-13 12:13:56 -05:00

3c742a827b
Initial implementation of monitoring plugin system

2023-02-13 12:06:26 -05:00

726d0a562b
Update copyright header year

2022-10-06 11:55:27 -04:00

5942aa50fc
Avoid raise/handle deadlocks

Can cause log flooding in some edge cases and isn't really needed any
longer. Use a proper conditional followed by an actual error handler.

2022-10-03 14:04:12 -04:00

8d0f26ff7a
Add additional kb_ values to OSD stats

Allows for easier parsing later to get e.g. % values and more details on
the used amounts.

2022-08-11 11:06:36 -04:00
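With the raw kilobyte counters stored for each OSD, a percentage is a single division. This sketch assumes keys named after those in `ceph osd df --format json` output (`kb`, `kb_used`, `kb_used_data`, `kb_used_omap`, `kb_used_meta`). The commit does not list which fields PVC actually added, and `osd_usage_summary` is a hypothetical helper.

```python
def osd_usage_summary(osd: dict[str, int]) -> dict[str, float]:
    """Derive percentage values from the kb_* fields of one OSD stats entry."""
    total_kb = osd["kb"]
    # Output name -> source kb_* key (key names are assumptions, see above)
    fields = {
        "used_pct": "kb_used",
        "data_pct": "kb_used_data",
        "omap_pct": "kb_used_omap",
        "meta_pct": "kb_used_meta",
    }
    if total_kb <= 0:
        # Avoid dividing by zero for an OSD that reports no capacity
        return {name: 0.0 for name in fields}
    return {name: round(osd[key] / total_kb * 100, 2) for name, key in fields.items()}
```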

23b1501f40
Fix linting error F541 (f-string is missing placeholders)

2021-11-06 03:26:03 -04:00

c41664d2da
Reformat code with Black code formatter

Unify the code style along PEP and Black principles using the tool.

2021-11-06 03:02:43 -04:00

2e7b9b28b3
Add some delay and additional tries to fencing

2021-10-27 16:24:17 -04:00
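The commit gives no detail beyond its title. One common shape for "delay and additional tries" is a bounded retry loop like the one below. `attempt_fence`, the default of three tries, and the five-second delay are hypothetical, not PVC's actual values.

```python
import time
from typing import Callable


def fence_with_retries(
    attempt_fence: Callable[[], bool],
    tries: int = 3,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call attempt_fence up to `tries` times, pausing `delay_s` seconds between attempts.

    attempt_fence is a hypothetical callable that returns True once the node
    is confirmed fenced. The default tries/delay_s values are illustrative only.
    """
    for attempt in range(1, tries + 1):
        if attempt_fence():
            return True
        if attempt < tries:
            # Give the remote management interface time to settle before retrying
            sleep(delay_s)
    return False
```

The `sleep` parameter is injectable so the loop can be exercised without real delays. There is no pause after the final failed attempt.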