ENH: Add rule to remove downtime #297

Open
wants to merge 5 commits into
base: main

lizsalmon
Contributor

Description:

This PR:

  • Adds the REBOOTED, EMPTY and DOWN values to the enum in hypervisor_states.py, so that there is a state for every possible machine.
  • Adds a rule that removes the downtime from a hypervisor being "patched and rebooted" only once it is back up (in the REBOOTED state).
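
As a rough illustration of the first bullet, the enum might look something like the sketch below. Only the names REBOOTED, EMPTY, DOWN and DRAINED appear in this PR; the class name `HypervisorState`, the `auto()` values and the helper `is_back_up` are assumptions made for illustration and are not taken from hypervisor_states.py.

```python
from enum import Enum, auto


class HypervisorState(Enum):
    """Hypothetical stand-in for the enum in hypervisor_states.py."""

    # Only the states named in this PR are listed; the real enum
    # may well contain further members.
    DRAINED = auto()
    DOWN = auto()
    EMPTY = auto()
    REBOOTED = auto()


def is_back_up(state: HypervisorState) -> bool:
    """Return True once a hypervisor has reached the REBOOTED state."""
    return state is HypervisorState.REBOOTED
```

Under these assumptions, the rule would only act on hosts for which `is_back_up` is true.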

Submitter:

Have you (where applicable):

  • Added unit tests?
  • Checked the latest commit runs on Dev?
  • Updated the example config file(s) and README?

Reviewer

Does this PR:

  • Place non-StackStorm code into the lib directory?
  • Have unit tests for the action/sensor and lib layers?
  • Have clear and obvious action parameter names and descriptions?

Let the patch and reboot action end just after starting a reboot and
then use a sensor to pick up when the machine is back before removing
the downtime.
Also added "EMPTY" and "DOWN"
@lizsalmon lizsalmon changed the title Add Rule to remove downtime Add rule to remove downtime Jan 15, 2025
@lizsalmon lizsalmon changed the title Add rule to remove downtime ENH: Add rule to remove downtime Jan 15, 2025
@lizsalmon lizsalmon force-pushed the patch-reboot-complete branch from ea43ded to 713875b Compare January 15, 2025 16:45

codecov bot commented Jan 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.12%. Comparing base (5c66bae) to head (6466ae4).
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #297      +/-   ##
==========================================
- Coverage   98.14%   98.12%   -0.02%     
==========================================
  Files         126      126              
  Lines        4575     4580       +5     
  Branches      242      243       +1     
==========================================
+ Hits         4490     4494       +4     
- Misses         76       77       +1     
  Partials        9        9              


Update icinga downtime actions and tests to use IcingaObject enums
@lizsalmon lizsalmon force-pushed the patch-reboot-complete branch from 713875b to 2daac3b Compare January 16, 2025 09:08
@lizsalmon lizsalmon marked this pull request as ready for review January 16, 2025 09:11
Comment on lines 44 to 49
    finally:
        remove_downtime(
            icinga_account=icinga_account,
            object_type="Host",
            object_name=hypervisor_name,
        )
Collaborator

Can we leave this in and change it to except, so that the downtime is removed if either the patch or the reboot command fails?

Contributor Author

Should I only catch SSHException?

Collaborator

Yes, I think so. You may also have to re-raise the exception after removing the downtime.
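
A minimal sketch of the pattern discussed in this thread, assuming the action wraps the patch and reboot commands in a try block. The `SSHException` class below is a local stand-in so the example runs on its own, and `patch_and_reboot`, `run_patch_and_reboot` and the injected `remove_downtime` callable are hypothetical names; the `icinga_account` argument from the real call is left out for brevity.

```python
class SSHException(Exception):
    """Stand-in for the SSH library's exception so the sketch is self-contained."""


def patch_and_reboot(run_patch_and_reboot, remove_downtime, hypervisor_name):
    """Start the patch and reboot; on SSH failure, clear the downtime and re-raise.

    Both callables are hypothetical placeholders injected for illustration.
    """
    try:
        run_patch_and_reboot(hypervisor_name)
    except SSHException:
        # The commands failed, so the sensor may never see the host come
        # back as REBOOTED: remove the downtime here instead.
        remove_downtime(object_type="Host", object_name=hypervisor_name)
        # A bare raise re-raises the original exception with its traceback,
        # so the action is still reported as failed.
        raise
```

On success nothing is removed here, which leaves the downtime in place until the new rule fires.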

@@ -2,7 +2,7 @@
 class_name: HypervisorStateSensor
 entry_point: src/hypervisor_state_sensor.py
 description: Monitor state of Hypervisors
-poll_interval: 600
+poll_interval: 30
Collaborator

Can we increase the poll_interval back to 10 mins?

criteria:
  trigger.previous_state:
    type: equals
    pattern: DOWN
Collaborator

This might need to be DOWN or DRAINED, so that we catch any hypervisors that reboot faster than the polling interval.
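
To illustrate the concern, here is a small sketch of the transition check in plain Python rather than rule YAML. It assumes the trigger carries a previous and a current state as strings and that the rule should fire only when the current state is REBOOTED; the function and constant names are hypothetical.

```python
# Previous states that count as "was on its way through a reboot".
# DRAINED is included because a host that goes down and comes back
# between two polls is never observed as DOWN by the sensor.
REBOOT_PREVIOUS_STATES = {"DOWN", "DRAINED"}


def should_remove_downtime(previous_state: str, current_state: str) -> bool:
    """Return True when the state change looks like a completed reboot."""
    return (
        current_state == "REBOOTED"
        and previous_state in REBOOT_PREVIOUS_STATES
    )
```

With a 10 minute poll interval, a host that reboots in a couple of minutes would go straight from DRAINED to REBOOTED in the sensor's view, which an equals DOWN check would miss.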

@lizsalmon lizsalmon force-pushed the patch-reboot-complete branch from 5aadc94 to 8a7496f Compare January 17, 2025 09:44
Changed criteria to be either DOWN or DRAINED
@lizsalmon lizsalmon force-pushed the patch-reboot-complete branch from 8a7496f to 6466ae4 Compare January 17, 2025 09:52
2 participants