ENH: Add rule to remove downtime #297

lizsalmon · 2025-01-15T16:39:38Z

Description:

This PR:

Adds the values REBOOTED, EMPTY and DOWN to the enum in hypervisor_states.py, so that it has a state for every possible machine.
Adds a rule that removes the downtime on a hypervisor being "patched and rebooted" only once it is back up (in the REBOOTED state).

Submitter:

Have you (where applicable):

Added unit tests?
Checked the latest commit runs on Dev?
Updated the example config file(s) and README?

Reviewer:

Does this PR:

Place non-StackStorm code into the lib directory?
Have unit tests for the action/sensor and lib layers?
Have clear and obvious action parameter names and descriptions?

Let the patch and reboot action end just after starting a reboot and then use a sensor to pick up when the machine is back before removing the downtime.

Also added "EMPTY" and "DOWN"

codecov · 2025-01-15T16:48:24Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.12%. Comparing base (5c66bae) to head (6466ae4).
Report is 4 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #297      +/-   ##
==========================================
- Coverage   98.14%   98.12%   -0.02%     
==========================================
  Files         126      126              
  Lines        4575     4580       +5     
  Branches      242      243       +1     
==========================================
+ Hits         4490     4494       +4     
- Misses         76       77       +1     
  Partials        9        9


Update icinga downtime actions and tests to use IcingaObject enums

gmatthews20 · 2025-01-16T11:50:03Z

lib/workflows/hv_patch_and_reboot.py

-    finally:
-        remove_downtime(
-            icinga_account=icinga_account,
-            object_type="Host",
-            object_name=hypervisor_name,
-        )


Can we leave this in and change it to an except, so the downtime is removed if either the patch or the reboot command fails?

Should I only catch SSHException?

Yes, I think so. You may also have to re-raise the exception after removing the downtime.
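
A minimal sketch of what this thread suggests, under stated assumptions: `patch_and_reboot`, `run_patch` and `run_reboot` are hypothetical names, and the local `SSHException` class is a stand-in for the SSH library's exception so the example runs on its own. It is not the code in lib/workflows/hv_patch_and_reboot.py. The downtime is removed only when an SSH command fails, and the exception is re-raised so the failure is still reported.

```python
class SSHException(Exception):
    """Stand-in for the SSH library's exception (assumption for this sketch)."""


def patch_and_reboot(
    icinga_account, hypervisor_name, run_patch, run_reboot, remove_downtime
):
    """Patch then reboot a hypervisor.

    run_patch, run_reboot and remove_downtime are passed in as callables
    (hypothetical, for illustration) so the sketch has no external dependencies.
    """
    try:
        run_patch(hypervisor_name)
        run_reboot(hypervisor_name)
    except SSHException:
        # A command failed, so no reboot will be picked up by the sensor:
        # lift the downtime here rather than leaving it in place.
        remove_downtime(
            icinga_account=icinga_account,
            object_type="Host",
            object_name=hypervisor_name,
        )
        raise  # re-raise so the caller still sees the failure
```

On success the downtime is deliberately left in place, so that the sensor-driven rule can remove it once the hypervisor is back up.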

gmatthews20 · 2025-01-16T11:54:17Z

sensors/hypervisor.state_change.yaml

@@ -2,7 +2,7 @@
 class_name: HypervisorStateSensor
 entry_point: src/hypervisor_state_sensor.py
 description: Monitor state of Hypervisors
-poll_interval: 600
+poll_interval: 30


Can we increase the poll_interval back to 600 (10 minutes)?

gmatthews20 · 2025-01-16T11:56:00Z

rules/hv.remove.downtime.yaml

+criteria:
+  trigger.previous_state:
+    type: equals
+    pattern: DOWN


This might need to be DOWN or DRAINED, so we catch any hypervisors that reboot faster than the polling interval.

Changed criteria to be either DOWN or DRAINED
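
The reasoning behind the change can be sketched in Python. This is only an illustration of the matching logic, not the rule file itself: the `HypervisorState` class name, the `should_remove_downtime` helper and the check on the current state are assumptions. If a hypervisor reboots between two polls, the sensor never observes DOWN, so the previous state it reports is still DRAINED.

```python
from enum import Enum


class HypervisorState(Enum):
    """Hypothetical subset of the states mentioned in this PR."""

    DRAINED = "DRAINED"
    DOWN = "DOWN"
    REBOOTED = "REBOOTED"
    EMPTY = "EMPTY"


# Previous states that count as "was being rebooted".
REBOOT_PREVIOUS_STATES = {HypervisorState.DOWN, HypervisorState.DRAINED}


def should_remove_downtime(previous_state, current_state):
    """Return True when a state change looks like a completed reboot."""
    return (
        current_state is HypervisorState.REBOOTED
        and previous_state in REBOOT_PREVIOUS_STATES
    )
```

With only DOWN accepted, a fast reboot seen as DRAINED followed by REBOOTED would not match and the downtime would stay in place.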

lizsalmon added 3 commits January 10, 2025 16:19

Check for rebooted using sensor

13262af

Let the patch and reboot action end just after starting a reboot and then use a sensor to pick up when the machine is back before removing the downtime.

Changed the sensor to have a "REBOOTED" option

7bb4df9

Also added "EMPTY" and "DOWN"

Add tests

0b634d9

lizsalmon changed the title ~~Add Rule to remove downtime~~ Add rule to remove downtime Jan 15, 2025

lizsalmon changed the title ~~Add rule to remove downtime~~ ENH: Add rule to remove downtime Jan 15, 2025

lizsalmon force-pushed the patch-reboot-complete branch from ea43ded to 713875b Compare January 15, 2025 16:45

Update downtime actions

2daac3b

Update icinga downtime actions and tests to use IcingaObject enums

lizsalmon force-pushed the patch-reboot-complete branch from 713875b to 2daac3b Compare January 16, 2025 09:08

lizsalmon marked this pull request as ready for review January 16, 2025 09:11

gmatthews20 requested changes Jan 16, 2025

lizsalmon force-pushed the patch-reboot-complete branch from 5aadc94 to 8a7496f Compare January 17, 2025 09:44

Respond to comments

6466ae4

Changed criteria to be either DOWN or DRAINED

lizsalmon force-pushed the patch-reboot-complete branch from 8a7496f to 6466ae4 Compare January 17, 2025 09:52
