Alert violations not opening when expected

Hi,

I’ve attempted to create some alerts for when certain critical non-web transactions have an abnormally long duration. To test that the alerts are working properly, I temporarily set the condition on one of them to alert when the duration exceeds 1 second at least once in 1 minute. I can see on the chart preview that there have been several breaches of the critical threshold since I made the change; however, no incidents are being opened and no alerts are being sent out.

For reference, here’s an example of one of the NRQL queries I’m using for an alert:

SELECT MAX(duration) FROM Transaction WHERE transactionType = 'Other' AND appName = '<app name>' AND name = '<full transaction name>'
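As a rough illustration of what this query plus the test condition should do, here is a minimal Python sketch. It assumes a simplified model (fixed 1-minute windows, a static "above 1 second" threshold) and uses made-up sample events; it is not New Relic's actual evaluation code, and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample events: (timestamp in seconds, duration in seconds).
# These values are invented for illustration only.
events = [
    (0, 0.4), (20, 1.3), (45, 0.7),  # minute 0
    (70, 0.9), (100, 0.8),           # minute 1
    (130, 2.1),                      # minute 2
]

THRESHOLD_SECONDS = 1.0  # "duration exceeds 1 second"
WINDOW_SECONDS = 60      # 1-minute aggregation window

def max_per_window(events, window=WINDOW_SECONDS):
    """Mimic SELECT MAX(duration), bucketed into fixed windows."""
    maxima = defaultdict(float)  # durations are positive, so 0.0 is a safe start
    for ts, duration in events:
        bucket = ts // window
        maxima[bucket] = max(maxima[bucket], duration)
    return dict(sorted(maxima.items()))

def breaching_windows(maxima, threshold=THRESHOLD_SECONDS):
    """Windows whose aggregated value is above the threshold
    ("at least once in 1 minute" with 1-minute windows)."""
    return [bucket for bucket, value in maxima.items() if value > threshold]

maxima = max_per_window(events)
print(maxima)                     # {0: 1.3, 1: 0.9, 2: 2.1}
print(breaching_windows(maxima))  # [0, 2]
```

Under this model, windows 0 and 2 should each produce a violation, which is what the chart preview appears to show.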

Is this an issue with my configuration, or is something else going on here?

Hey @cody.woolsey -

There could be a number of things going on. The most common reason we see for incidents not being opened is how the policy’s incident preference is set up, so that’s the first thing to check.

For example, if you have the preference set to “By policy” and there is already an open incident for one condition on that policy, no new incident will be opened when a second condition is violated.
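To make the grouping behavior concrete, here is a minimal Python sketch of how the incident preference decides whether a violation opens a new incident or joins one that is already open. It is a simplified model (incidents never close, and the preference strings and `Violation` type are hypothetical labels), not New Relic’s implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Violation:
    policy: str
    condition: str
    entity: str

def incident_key(v, preference):
    """Which open incident a violation is grouped into, under a simplified
    model of the incident preferences (labels are hypothetical)."""
    if preference == "by_policy":
        return (v.policy,)
    if preference == "by_condition":
        return (v.policy, v.condition)
    if preference == "by_condition_and_entity":
        return (v.policy, v.condition, v.entity)
    raise ValueError(f"unknown preference: {preference}")

def new_incidents(violations, preference):
    """Return the violations that would open a new incident; the rest are
    rolled into an incident that is already open (none ever close here)."""
    open_keys = set()
    opened = []
    for v in violations:
        key = incident_key(v, preference)
        if key not in open_keys:
            open_keys.add(key)
            opened.append(v)
    return opened

violations = [
    Violation("Critical jobs", "Job A duration", "app-1"),
    Violation("Critical jobs", "Job B duration", "app-1"),
]
print(len(new_incidents(violations, "by_policy")))                # 1
print(len(new_incidents(violations, "by_condition_and_entity")))  # 2
```

With the coarser preference, the second condition’s violation lands in the existing incident, so no new incident (and no new incident notification) appears.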

If that doesn’t seem to be the answer, please share a link to the policy (only we can open it!) and we’ll keep digging with you.

Hi @hross,

The “Incident Preference” for the policy is set to “By Condition And Entity”, and no violations have ever been opened for this policy.

Here’s a link to the specific alert condition I’ve set up for testing.

And here’s a link to the overall policy for these conditions.

Thanks!

Hello @cody.woolsey, it looks like this is happening because your data is arriving “late.” Essentially, data points are arriving with timestamps roughly 6-8 minutes in the past, and your evaluation offset is not high enough to catch it.
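Here is a minimal Python sketch of why late data gets missed. It assumes a simplified model: each 1-minute window is evaluated a fixed number of minutes (the evaluation offset) after it closes, and any point arriving after that is ignored. The function name and the 3- and 6-minute offsets are hypothetical examples; only the 6-8 minute latency comes from this thread.

```python
import math

def is_counted(event_minute, latency_minutes, offset_minutes, window_minutes=1):
    """Simplified model: is a late data point still included when its
    window is evaluated?"""
    # End of the window that the point's own timestamp falls into.
    window_end = (math.floor(event_minute / window_minutes) + 1) * window_minutes
    # When the point actually reaches the alerting pipeline.
    arrival = event_minute + latency_minutes
    # In this model, the window is evaluated offset_minutes after it closes.
    evaluated_at = window_end + offset_minutes
    return arrival <= evaluated_at

latencies = [6, 7, 8]  # "roughly 6-8 minutes" late, per this thread
for offset in (3, 6, 9):
    counted = [is_counted(10, lat, offset) for lat in latencies]
    print(offset, counted)
# 3 [False, False, False]
# 6 [True, True, False]
# 9 [True, True, True]
```

With a small offset, every late point misses its window, so the condition never sees a breach even though the chart preview (which queries after the data has arrived) does.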

You can read more about this phenomenon here: Relic Solution: Better Latent Than Never – How Data Latency Affects NRQL Alert Conditions.

To work around this in your case, I would suggest increasing the evaluation offset to 9 or 10 minutes. That should give the late data enough time to arrive before each window is evaluated.
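The 9-10 minute figure follows from simple arithmetic: the offset should be at least the worst observed delay, plus a little headroom. Here is a minimal Python sketch of that rule of thumb; `suggest_offset` and the sample latencies are hypothetical and only illustrate the reasoning.

```python
import math

def suggest_offset(latencies_minutes, buffer_minutes=1):
    """Hypothetical helper: smallest whole-minute offset that covers the
    worst observed latency, plus a safety buffer."""
    worst = max(latencies_minutes)
    return math.ceil(worst) + buffer_minutes

observed = [6.2, 7.5, 8.0]  # hypothetical samples in the 6-8 minute range
print(suggest_offset(observed))                    # 9
print(suggest_offset(observed, buffer_minutes=2))  # 10
```

The trade-off is that a larger offset also delays evaluation, so alerts fire correspondingly later.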

Please try that configuration out and let me know how it goes!


That seemed to do the trick, thanks Jeffrey!

You’re welcome, glad I could help!