NRQL Alert condition not triggering violations properly

Hi New Relic Team,

I have a question about alerts that use NRQL queries. I use the query below to count the failures of a monitor in Synthetics.

SELECT count(*) FROM SyntheticCheck WHERE result = 'FAILED' AND monitorName = 'XYZ' AND locationLabel = 'ABC'

The threshold is static, and the condition should trigger an alert when the ‘sum of query results’ equals 3 at least once in 15 minutes. The incident preference in the alert policy is ‘By condition and entity’.
My monitor runs every 5 minutes. I want to trigger an alert only when it fails 3 consecutive times. Since the ‘sum of query results’ is calculated as a rolling sum, I expect that if the check fails on 3 consecutive runs, the sum of the query results will reach 3 within 15 minutes (the monitor frequency is 5 minutes).
However, I see multiple consecutive failures for the monitor in Synthetics, and the graph in the alert condition also exceeds 3, yet no violation was opened and I have not received any notifications. Can anyone tell me why it is not opening a new violation? There were no open violations or incidents before the monitor started failing.

I also have a question about the evaluation offset. It is set to 3 minutes in the alert condition, so my assumption is that the sum of query results is calculated starting from 3 minutes before the current time. But in incidents, I see the query shown as:

“SELECT count(*) FROM SyntheticCheck WHERE result = 'FAILED' AND monitorName = 'XYZ' SINCE 3 minutes ago UNTIL 2 minutes ago”.

What does ‘SINCE 3 minutes ago UNTIL 2 minutes ago’ mean? If the offset is 3 minutes, do I need to increase the interval from 15 minutes to 18 minutes in order to check the query value over 15 minutes?
If anyone can explain the alert mechanism in detail (how it calculates query results, when it triggers an alert, and how the offset works), it will really help us configure the alert condition correctly.

Thanks in advance!
regards,
Athira

Hi @athira.mathamkode

Not sure if you already did this, but have you had a look at this page?

https://docs.newrelic.com/docs/alerts/new-relic-alerts/defining-conditions/create-alert-conditions-nrql-queries

Also, which alert policy are you using? Please review it once more.

thanks

Hi @MKhanna,

I created the conditions and policies by following the docs.
I received alerts from this same alert condition last week. But yesterday the monitor failed again and no violations were opened.
In the preview chart for the condition, I can see that the query results stayed above the threshold for more than 15 minutes, but still no violation was triggered.

Thanks,
Athira

Did you check if the violation or incident was open?

We faced an issue where a condition failed: one alert went out, and then we did not get any more alerts.

Also, for alerts, the incident preference setting is the key.

If you have multiple entities in the policy, choose the third option.

Also, if possible, share screenshots or more details so it’s easier to reproduce or relate to.

thanks

Hi @athira.mathamkode! Was @MKhanna’s last comment able to help you troubleshoot further? Eager to see where you are in getting this solved!

No violations were opened for the failure. We have multiple alert conditions in the policy, and the preference is set to “By condition and entity”.

I couldn’t upload screenshots; the upload fails.

regards,
Athira

Hi @athira.mathamkode,

I’d be happy to dig in and investigate why this alert condition isn’t behaving as expected. To help me do that, will you please send over a link to the alert condition? Don’t worry – only valid users of your account and New Relic admins will be able to use the link.

Before I see the alert condition, I can help to shed some light on your other questions, however.

Evaluation Offset: every NRQL alert condition comes with an invisible SINCE x minutes ago UNTIL x-1 minutes ago. This is because the alerts evaluation system always looks at a single minute at a time. Evaluation offset controls the value of x in that clause. So, with Evaluation offset set to 3 minutes, the invisible clause will be SINCE 3 minutes ago UNTIL 2 minutes ago. This allows the Synthetics monitor time to run as well as time to get ingested by our collectors – Synthetics monitors are especially prone to latent data as a failed monitor involves 3 separate checks and can take a while waiting for a timeout, but the timestamp associated with the check is set to whenever the check began. You can read more about data latency (which this is directly related to) in this article.
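
To make the offset concrete, here is a minimal Python sketch of the behaviour described above. It is a simplified model, not New Relic's implementation: the function names are hypothetical, and it assumes each one-minute window is evaluated exactly once, at the moment it becomes the SINCE x minutes ago UNTIL x-1 minutes ago window. Under that assumption, a check whose data arrives later than the offset allows is never counted.

```python
from datetime import datetime, timedelta

def evaluation_window(now, offset_minutes):
    """Return (since, until) for the implicit one-minute window:
    SINCE offset_minutes ago UNTIL offset_minutes - 1 minutes ago."""
    since = now - timedelta(minutes=offset_minutes)
    until = now - timedelta(minutes=offset_minutes - 1)
    return since, until

def is_counted(check_start, arrival, offset_minutes):
    """Assumed model: the minute containing check_start is evaluated once,
    offset_minutes after that minute begins. The data point is counted
    only if it has been ingested (arrival) by that time."""
    minute_start = check_start.replace(second=0, microsecond=0)
    evaluated_at = minute_start + timedelta(minutes=offset_minutes)
    return arrival <= evaluated_at

now = datetime(2020, 1, 1, 12, 10)
since, until = evaluation_window(now, 3)
print(since.time(), until.time())  # 12:07:00 12:08:00

# A failed check is timestamped when it begins (12:00:20) but its result
# only arrives 3.5 minutes later, after retries and timeouts.
check_start = datetime(2020, 1, 1, 12, 0, 20)
arrival = check_start + timedelta(minutes=3, seconds=30)

print(is_counted(check_start, arrival, 3))  # False: too late for a 3-minute offset
print(is_counted(check_start, arrival, 5))  # True: a 5-minute offset leaves enough room
```

In this model, the offset needs to be at least as long as the slowest failed check plus ingestion time, otherwise the failure lands in a minute that has already been evaluated.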

Mechanism of a sum of NRQL alert condition: the alerts evaluation system is only looking at 1 minute at a time, but with sum of, each minute will be added to a rolling total, and when the value of that rolling total breaches the threshold a violation will be opened. With a setting of 15 minutes, that rolling total is summed over the course of a 15-minute period.
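
The rolling total can be sketched in Python as follows. This is an illustration only: `first_violation_minute` is a hypothetical helper, the list holds one query result per evaluated minute, and the comparison uses "at least the threshold" as an assumption about the operator chosen in the condition.

```python
from collections import deque

def first_violation_minute(per_minute_counts, window_minutes, threshold):
    """Return the index of the first minute at which the rolling sum over
    the last window_minutes one-minute results reaches the threshold,
    or None if it never does."""
    window = deque(maxlen=window_minutes)  # oldest minute drops out automatically
    for minute, count in enumerate(per_minute_counts):
        window.append(count)
        if sum(window) >= threshold:  # assumed operator: at least `threshold`
            return minute
    return None

# Monitor runs every 5 minutes; 1 means a failed check was counted in that minute.
three_in_a_row = [0] * 20
for m in (0, 5, 10):
    three_in_a_row[m] = 1

# Same failures, but the middle one arrived too late to be counted.
one_missed = [0] * 20
for m in (0, 10):
    one_missed[m] = 1

print(first_violation_minute(three_in_a_row, 15, 3))  # 10
print(first_violation_minute(one_missed, 15, 3))      # None
```

The second case shows why a single late-arriving failure is enough to keep the 15-minute sum below 3 even though the monitor failed three times in a row.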

I hope this helps, and I look forward to being able to investigate your specific alert condition.

Hi @Fidelicatessen,
Thanks a lot for the detailed explanation and the article on data latency.
We found the reason the alerts were not triggering: the failed check was running for more than 3 minutes, and the offset we had set was 3 minutes.
We changed the offset to 5 minutes, and after that alerts are triggered for failures and we are receiving notifications.
regards,
Athira

Hi @athira.mathamkode,

Excellent news! I’m glad I could provide some useful information.

As always, let us know if further difficulties crop up.

Hi @Fidelicatessen,
I’m facing a similar issue where I’m trying to get the NRQL based alert policy to trigger a violation. I have 2 notification channels (Emails) assigned to this policy. I’ve intentionally failed the test case for the last 24 hours and it meets the alert policy, but I don’t see any email due to this failure.

[Screenshot: alert condition chart]

Am I missing something here?

Hi @priyadarshini.r.kolw - from your screenshot, the chart shows a maximum of 6, not a value exceeding 6. It also looks like the value does not stay at 6 for the full 10 minutes.

Thanks for the clarification @stefan_garnham. I’ll tweak the threshold values now to observe when the violation is triggered.
