NRQL alert firing when it shouldn't

We have an NRQL alert monitoring the number of Redis instances that are reporting.
Normally this should be 3 - if it drops below 3, we get an alert.
We got a series of alerts for this today - a batch at around 12am and another at around 5pm (UTC+1).

The graphs on the incident alert page show the number of samples briefly dropping to zero, triggering the alert. (https://alerts.newrelic.com/accounts/5202/incidents/137988146/violations?id=961044841)

However, if I run the same query on Insights, it’s just a flat line showing 3 instances continuously reporting. https://insights.newrelic.com/accounts/5202/query?query=SELECT%20filter(uniqueCount(entityName),%20WHERE%20`cluster.role`%20%3D%20’master’)%20%20as%20%27Masters%27%20FROM%20RedisSample%20WHERE%20%60label.environment%60%20%3D%20%27production%27%20TIMESERIES%20SINCE%20%272020-06-25%2016:00:00%27%20UNTIL%20%272020-06-25%2016:10:00%27
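
For reference, the query decoded from that link is:

```
SELECT filter(uniqueCount(entityName), WHERE `cluster.role` = 'master') as 'Masters'
FROM RedisSample
WHERE `label.environment` = 'production'
TIMESERIES
SINCE '2020-06-25 16:00:00' UNTIL '2020-06-25 16:10:00'
```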

What’s the difference between how that NRQL query is computed in Insights vs Alerts?

Hello @jdelStrother, NRQL alert conditions evaluate the results of your query in one-minute slices, and by default we look at data that’s three minutes old. That three-minute evaluation offset is a safeguard to account for most instances of data latency. However, if some data from your integration takes more than three minutes to reach your account, that data is “missed” by our alerts evaluation system, because it carries a timestamp for a point in time that has already been evaluated. Insights, by contrast, runs the query after the fact against all of the data that eventually arrived, which is why the same query shows a flat line of 3 there.
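
To make the timing concrete, here’s a rough sketch of the behaviour described above - not our actual implementation, just a model of why a late-arriving data point never gets counted (the one-minute slices and three-minute offset are the only “real” numbers here):

```python
from datetime import datetime, timedelta

# Conceptual model only: each data point falls into a one-minute slice, and a
# slice is evaluated EVALUATION_OFFSET after it closes. Anything that arrives
# after its slice has already been evaluated is never counted.
EVALUATION_OFFSET = timedelta(minutes=3)  # the default offset described above
SLICE = timedelta(minutes=1)

def slice_evaluated_at(point_timestamp: datetime) -> datetime:
    """Wall-clock time at which the slice containing this timestamp is evaluated."""
    slice_end = point_timestamp.replace(second=0, microsecond=0) + SLICE
    return slice_end + EVALUATION_OFFSET

def is_counted(point_timestamp: datetime, arrival_time: datetime) -> bool:
    """A data point only counts if it arrives before its slice is evaluated."""
    return arrival_time <= slice_evaluated_at(point_timestamp)

# A RedisSample stamped 16:00:30 belongs to the 16:00-16:01 slice,
# which is evaluated at 16:04:00.
stamped = datetime(2020, 6, 25, 16, 0, 30)
print(is_counted(stamped, stamped + timedelta(minutes=2)))  # True  - arrived in time
print(is_counted(stamped, stamped + timedelta(minutes=5)))  # False - missed, so the
# result for that minute drops and the condition opens a violation
```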

The general practice is to set the evaluation offset to 15-20 minutes on an NRQL alert condition that queries cloud integration data, to account for the delay in AWS integration data reaching New Relic.
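
As a sketch of what that change looks like via the REST API (v2) rather than the condition settings in the UI - the API key, condition ID, and term values below are placeholders, and an update expects the full condition definition, so mirror your existing settings:

```python
import requests

API_KEY = "YOUR_ADMIN_API_KEY"  # placeholder
CONDITION_ID = 1234567          # placeholder - your NRQL condition's ID

# Illustrative condition body; `since_value` is the evaluation offset in
# minutes (1-20). The terms shown are examples, not your exact settings.
payload = {
    "nrql_condition": {
        "name": "Redis masters reporting",
        "enabled": True,
        "value_function": "single_value",
        "terms": [{
            "duration": "5",
            "operator": "below",
            "priority": "critical",
            "threshold": "3",
            "time_function": "all",
        }],
        "nrql": {
            "query": "SELECT filter(uniqueCount(entityName), WHERE `cluster.role` = 'master') as 'Masters' FROM RedisSample WHERE `label.environment` = 'production'",
            "since_value": "20",  # evaluation offset, in minutes
        },
    }
}

resp = requests.put(
    f"https://api.newrelic.com/v2/alerts_nrql_conditions/{CONDITION_ID}.json",
    headers={"X-Api-Key": API_KEY, "Content-Type": "application/json"},
    json=payload,
)
resp.raise_for_status()
```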

More info here:
