Alert generation is not consistent

I created an alert policy for an APM Error Percentage metric. The incident preference is set to “By condition and entity”. For three violations of the alert condition, here is what happened:

  1. The first violation did not generate an alert; it simply opened an incident.
  2. The second violation, a few minutes later, generated an alert, and I received an email notification.
  3. The third violation shows in the “Events” tab as having generated an alert, but I have not received any email so far.

Any idea why this behavior is so inconsistent?

Hi there @rabansa - sorry to hear that you are experiencing such unexpected behavior. It would be very helpful if you could please share a permalink to the policy so that we can take a look at what’s going on and dive deeper with you.

Thanks, hross, for getting back to me on this. Is this what you are looking for? I don't see any explicit control for generating a permalink.

@rabansa When you are looking at the Events tab for an incident, this is showing the timeline for all events that occur under a single incident, but as some events can happen concurrently, it is important to understand the actual, expected behavior of an incident.

First and foremost, alert notifications are sent out on a per incident basis, where you will receive a maximum of three notifications for every incident:

  1. Notify when the incident opens after a critical violation.
  2. Notify when a user acknowledges the incident (optional).
  3. Notify when the incident closes, after all associated critical violations have resolved.
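To make the lifecycle above concrete, here is a minimal Python sketch. It is only an illustration of the three notification points, not New Relic's implementation: the `Incident` class, its method names, and the choice to notify only on the first acknowledgement are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical model of a single incident (not a New Relic API object)."""
    open_violations: set = field(default_factory=set)
    notifications: list = field(default_factory=list)
    acknowledged: bool = False
    closed: bool = False

    def violation_opened(self, violation_id):
        # Only the violation that opens the incident triggers a notification;
        # later violations roll up silently.
        if not self.notifications:
            self.notifications.append("opened")
        self.open_violations.add(violation_id)

    def acknowledge(self):
        # Assumption: only the first acknowledgement notifies.
        if not self.acknowledged:
            self.acknowledged = True
            self.notifications.append("acknowledged")

    def violation_closed(self, violation_id):
        self.open_violations.discard(violation_id)
        # The incident closes only once every rolled-up violation has resolved.
        if not self.open_violations and not self.closed:
            self.closed = True
            self.notifications.append("closed")


incident = Incident()
incident.violation_opened("v1")   # opens the incident, notifies
incident.violation_opened("v2")   # rolls up, no notification
incident.acknowledge()
incident.violation_closed("v1")   # incident stays open, v2 is unresolved
incident.violation_closed("v2")   # last violation resolved, notifies
print(incident.notifications)     # ['opened', 'acknowledged', 'closed']
```

Two violations still produce only three notifications in this sketch, because the second violation rolls up under the incident that is already open.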

When your incident preferences are set to “By Policy,” the first known violation will open an incident. The act of opening an incident is what generates the first notification. Any other subsequent violations that open under that policy while there is still an open incident will roll up under the existing incident, and will not generate any new notifications.

Looking at incident #20029714, which was opened for the policy you linked to, the incident events follow what I have described. The first violation opens the incident, after which notifications are sent out. Subsequent violations open and are rolled up under the incident; as violations close, the incident remains open until every critical violation that was rolled up has resolved. Once they have all resolved, the incident closes, and closing notifications are sent.

If you wish to receive more notifications, you will want your violations to roll up less often by updating the incident preferences for your alert policy to “By Condition” or “By Condition and Entity.”
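As a rough illustration of how the three incident preferences change roll-up, the Python sketch below groups violations by a different key for each preference. It assumes that all of the listed violations overlap in time (none has closed yet), and the function names, policy name, condition names, and entity names are hypothetical examples, not New Relic identifiers or API calls.

```python
def incident_key(preference, policy, condition, entity):
    """Return the grouping key under which a violation rolls up."""
    if preference == "By Policy":
        return (policy,)
    if preference == "By Condition":
        return (policy, condition)
    if preference == "By Condition and Entity":
        return (policy, condition, entity)
    raise ValueError(f"unknown incident preference: {preference}")


def count_opening_notifications(preference, violations):
    """Count incidents opened, assuming no violation closes in between."""
    open_incidents = set()
    opened = 0
    for policy, condition, entity in violations:
        key = incident_key(preference, policy, condition, entity)
        if key not in open_incidents:
            # A new incident opens, so an opening notification is sent.
            open_incidents.add(key)
            opened += 1
        # Otherwise the violation rolls up under the existing open incident.
    return opened


# Hypothetical violations: (policy, condition, entity)
violations = [
    ("my-policy", "error-percentage", "app-a"),
    ("my-policy", "error-percentage", "app-b"),
    ("my-policy", "apdex", "app-a"),
    ("my-policy", "error-percentage", "app-a"),  # same condition and entity again
]

for preference in ("By Policy", "By Condition", "By Condition and Entity"):
    print(preference, count_opening_notifications(preference, violations))
# By Policy 1
# By Condition 2
# By Condition and Entity 3
```

Note that even under “By Condition and Entity,” the fourth violation in this sketch does not open a new incident, because an incident for the same condition and entity is still open.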

Here is a great post from our Community explaining how to make this change, and more things to keep in mind when selecting your policy’s incident preferences:

Thank you @cwhite for taking the time to explain this. I want to draw your attention to my first post again: I explicitly mentioned that the incident preference is set to “By condition and entity”. That's why I expected to receive a notification per condition violation. I hope that clarifies the concern.

@rabansa I definitely misread that, thank you for correcting me. After checking our logs, it looks like the incident preferences for that policy were updated on 01 Apr at 23:44:31 UTC, which was after incident #20029714 had been opened and several violations had already rolled up under it. In that scenario, additional opening notifications are not sent after the fact, but new violations will open new incidents (rather than rolling up).

It looks like there has only been one subsequent violation from the policy, which opened a new incident (#20169104). As far as I can tell based on the latest incident, your incident preferences are working correctly. I did notice that incident #20029714 sent additional closing notifications after you had updated your preferences, which may have been confusing.

EDIT: Correcting myself again: the extra notifications I saw in incident #20029714 were not closing notifications, but notifications sent when the incident was acknowledged. This aligns with how our incidents are designed to send notifications.