[Question/Feedback]: Provide guidance & best practices on reacting to recommended alerts #390

Open
1 task done
Kalaivin opened this issue Oct 24, 2024 · 2 comments
Assignees
Labels
AMBA Core Issues / PR's related AMBA Core enhancement New feature or request

Comments

@Kalaivin

Check for previous/existing GitHub issues

  • I have checked for previous/existing GitHub issues

Description

When monitoring IT systems, it’s a best practice to first identify the events or system properties of interest, then determine the appropriate actions to take when these events or property changes occur. Only after this should monitoring and alerts be configured to ensure these events and changes are observed. This approach reduces alert noise and clarifies the actions needed when a particular alert is triggered. Ultimately, it also allows for automation of these actions, reducing the time to resolution.

My request is to update the documentation to provide additional context for each alert: why it is important, and what the recommended actions are when the alert fires.
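As a rough illustration of what such per-alert context could look like, here is a minimal Python sketch. It is not part of AMBA or its documentation; the `AlertGuidance` record, the `GUIDANCE` table, the `describe` helper, and the example alert name and guidance text are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertGuidance:
    """Hypothetical documentation record for one alert rule."""
    why_it_matters: str
    recommended_actions: tuple[str, ...]


# Illustrative entry only: the alert name and guidance text are made up.
GUIDANCE = {
    "Example VM availability alert": AlertGuidance(
        why_it_matters="The VM is not reporting as available, so workloads on it may be down.",
        recommended_actions=(
            "Check the resource health of the VM.",
            "Restart or redeploy the VM if it is unresponsive.",
        ),
    ),
}


def describe(alert_name: str) -> str:
    """Return the guidance text for an alert, or flag that none is documented."""
    entry = GUIDANCE.get(alert_name)
    if entry is None:
        return f"{alert_name}: no guidance documented yet"
    steps = "; ".join(entry.recommended_actions)
    return f"{alert_name}: {entry.why_it_matters} Actions: {steps}"
```

Keeping the "why" and the "what to do" next to each alert name also makes gaps visible: any alert without an entry is a candidate for either documenting or disabling.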

Thank you!

@Kalaivin Kalaivin added the question Further information is requested label Oct 24, 2024
@Brunoga-MS Brunoga-MS added the AMBA Core Issues / PR's related AMBA Core label Oct 29, 2024
@ThojoUno

ThojoUno commented Nov 9, 2024

Coming from a System Center Operations Manager (SCOM) background, I can say alert tuning is always a hot topic. SCOM was always delivered with a default set of alert rules and monitors enabled out of the box. After deploying SCOM, you would spend the next several months tuning alerts by running "Top 10" alert reports to see which alerts were the top talkers. You would either address the underlying hardware or software issue, tune the alert to a threshold that keeps it from firing so often, or both. Other times, you would just disable the alert because it was not relevant to your environment.

Initially, I recommend enabling email notifications through the action group only for Severity 0 alerts, then going into Azure Monitor weekly to review the top alerts firing and address or remediate each one.
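The workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record shape (`rule`, `severity`), the sample data, and the helpers `should_email` and `top_alerts` are hypothetical, and the sketch does not call any Azure Monitor API. In practice the records would come from an export of fired alerts.

```python
from collections import Counter

# Hypothetical fired-alert records for one week (made-up data).
fired_alerts = [
    {"rule": "High CPU", "severity": 2},
    {"rule": "High CPU", "severity": 2},
    {"rule": "High CPU", "severity": 2},
    {"rule": "Low disk space", "severity": 1},
    {"rule": "Low disk space", "severity": 1},
    {"rule": "VM unavailable", "severity": 0},
]


def should_email(alert: dict) -> bool:
    """Only Severity 0 (critical) alerts are sent to email at first."""
    return alert["severity"] == 0


def top_alerts(alerts: list[dict], n: int = 10) -> list[tuple[str, int]]:
    """Rank alert rules by how often they fired, most frequent first."""
    return Counter(a["rule"] for a in alerts).most_common(n)


emailed = [a["rule"] for a in fired_alerts if should_email(a)]
weekly_report = top_alerts(fired_alerts, n=2)
```

With the sample data, only the "VM unavailable" alert would be emailed, while the weekly report surfaces "High CPU" and "Low disk space" as the rules to remediate or tune first.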

I like having all the alerts firing initially, because I can go into Azure Monitor and get an overall picture of the environment's health by seeing which alerts are firing. As time goes on, you will adjust thresholds, windows, and alert severities to match what is supportable in your environment. There is no one-size-fits-all set of alerts or configuration...

@judyer28 judyer28 self-assigned this Dec 16, 2024
@judyer28 judyer28 added enhancement New feature or request AMBA Core Issues / PR's related AMBA Core and removed AMBA Core Issues / PR's related AMBA Core question Further information is requested labels Dec 16, 2024
@judyer28
Contributor

@Kalaivin, thank you for your interest in AMBA. This is a good suggestion, and I have marked this as an enhancement for future consideration.

4 participants