NRQL based condition not opening incidents


Hello,

I’ve created a policy with a NRQL condition based on error metrics reported via the noticeError Java API. When my test app throws multiple errors, a violation is triggered, but no incident is opened and I don’t receive the email notification. Policies based on APM-related metrics, however, do open incidents, and I do receive the notification.

I see the violations but no incident is opened. Thanks, and please let me know if there’s any information I can provide to help.

I have tried many different settings. The last attempt used Incident Preference set to By condition and a threshold of TransactionError query results > 0 for at least 5 minutes, with the app throwing errors every 30 seconds for about 15 minutes.

This is the condition:

I’m not sure where to find this:

entity that is misbehaving, scoped to the relevant time window.

The relevant period is 03/22, between 14:00 and 15:00 PST.

I’d appreciate it if you could help me figure out what’s wrong.

Thanks!


I’ll answer my own question with my findings, in case they are helpful to someone else.

It looks like there’s a difference between what we see in the condition configuration page and when the alerting system actually runs the query to find out if there’s a violation.

I’m not yet sure of the specifics. In my case it might be related to my using the local NR agent, which perhaps takes longer to send data to NR. But the fact is that increasing the aggregation window to 5 minutes and the evaluation offset to 2 windows seems to have done the trick.

In release and production environments the behaviour will probably differ and require some different fine-tuning, but for now this will do.


Hi @Tomaz_Fernandes ,

I’m glad that you’ve managed to figure this one out and we really appreciate you sharing your solution here!

I just wanted to chime in here to elaborate a bit more on this:

I’m not yet sure of the specifics. In my case it might be related to my using the local NR agent, which perhaps takes longer to send data to NR. But the fact is that increasing the aggregation window to 5 minutes and the evaluation offset to 2 windows seems to have done the trick.

It sounds like you’re spot on here. There are a couple of things to keep in mind when configuring a NRQL condition like this:

  • It’s important to account for latent data, which is what the evaluation offset is for. Latency tends to be particularly problematic for data coming from AWS or other cloud providers. More background on this can be found here: Relic Solution: Better Latent Than Never – How Data Latency Affects NRQL Alert Conditions
  • It’s always a good idea to verify how your data is arriving via Dashboards rather than relying on the preview graph in Alerts. Adding a TIMESERIES 1 minute clause gives you a very clear view of how data is coming in, which can help a lot with fine-tuning.
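
As a rough sketch of the second point, a dashboard query along these lines would chart error events in one-minute buckets. The app name 'my-test-app' and the 30-minute range are placeholders chosen for illustration, not values from this thread, and the query assumes the errors reported through noticeError land in the TransactionError event type, as the condition described above suggests.

```nrql
-- Hypothetical example: 'my-test-app' and the time range are placeholders.
-- Counts TransactionError events per one-minute bucket over the last 30 minutes.
SELECT count(*)
FROM TransactionError
WHERE appName = 'my-test-app'
SINCE 30 minutes ago
TIMESERIES 1 minute
```

If the most recent buckets stay empty for a while and fill in later, that suggests the data is arriving late, which is the situation a longer evaluation offset is meant to cover.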

If you have other questions about this, let me know!

