Let’s peek under the hood for a moment
Let’s talk for a moment about how New Relic Alerts evaluates your data. Once per minute, every alert condition looks at the data stream it’s given and evaluates it numerically against the condition’s threshold. One minute later, the data is again evaluated. Each minute is evaluated discretely (on a pass/fail scale), without regard to any data before or after that single minute.
If your threshold has a time window of more than 1 minute, the evaluation system builds a model of the data. For example, “for at least 15 minutes” keeps track of each minute’s pass/fail result over a rolling 15 minutes. Once the alerts evaluation system gets enough fail results in a row, it opens a violation.
Keep in mind that each minute is evaluated discretely. The evaluation system does not look at any of the minutes in the past, other than to develop the pass/fail model over a rolling time window.
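To make the rolling pass/fail model concrete, here is a minimal sketch in Python. It is a simplified illustration, not New Relic's actual implementation: the function name evaluate_minutes is made up for this example, and it assumes a minute "fails" when its value is above the threshold.

```python
from collections import deque


def evaluate_minutes(values, threshold, window_minutes):
    """Simplified model of per-minute alert evaluation (illustrative only).

    Each value is one minute's data point. A minute "fails" when its value
    is above the threshold (an assumption for this sketch). A violation is
    considered open for a minute only when that minute and the preceding
    window_minutes - 1 minutes all failed.
    """
    recent = deque(maxlen=window_minutes)  # rolling pass/fail results
    violation_open = []
    for value in values:
        # Each minute is judged on its own value, nothing else.
        recent.append(value > threshold)
        # The past only matters as a run of pass/fail results.
        violation_open.append(len(recent) == window_minutes and all(recent))
    return violation_open
```

For example, evaluate_minutes([1, 5, 5, 5, 1, 5], threshold=3, window_minutes=3) returns [False, False, False, True, False, False]: only the fourth minute completes three fails in a row.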
One minute at a time, got it
If you think about this for a moment, you might see how NRQL queries using stddev are a lot less useful than they seem when used in an alert condition. After all, if you calculate the standard deviation over an hour (or 24 hours), that can be meaningful. But percentile(duration,95) calculated over only 60 seconds is less meaningful.
Whoa! So … how do I set up an alert condition to monitor standard deviation over the past 24 hours?
Since the alert evaluation system only looks at a single, discrete minute at a time, but NRQL queries in Insights are much more flexible, you just need to figure out a way to wrangle a standard Insights NRQL query (which can perform functions over longer periods than a single minute) into an alert condition. Here is one way you can accomplish that.
- Set up a cron job to run a script once per minute (since the alerts evaluation system expects to see a data point every minute). Alternatively, you can use a Synthetics Scripted API Test Monitor to run a script for you once each minute.
- In the script, use the Insights Query API to run the exact NRQL query you want. For example: SELECT stddev(someAttribute) FROM SomeEventType SINCE 24 hours ago
- The script should then parse the JSON that is returned and extract the value or values that are important.
- Next, the script should re-package the important values as a JSON object.
- Finally, the script should use the Insights Event API to insert the JSON as a custom event.
- Once the cron job and script are up and running, set up a NRQL alert condition to monitor the attribute in the custom event that is of interest.
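The steps above could be sketched roughly as follows in Python, using only the standard library. Treat this as a starting point under stated assumptions rather than a finished script: the endpoint URLs and the X-Query-Key / X-Insert-Key header names are written as I recall them from the Insights API documentation and should be checked against your account's API pages, and the event type StddevRollup and attribute stddev24h are hypothetical names chosen for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoints; verify against the Insights API documentation.
QUERY_URL = "https://insights-api.newrelic.com/v1/accounts/{account_id}/query"
INSERT_URL = "https://insights-collector.newrelic.com/v1/accounts/{account_id}/events"

NRQL = "SELECT stddev(someAttribute) FROM SomeEventType SINCE 24 hours ago"


def run_query(account_id, query_key, nrql):
    """Run a NRQL query through the Insights Query API and return parsed JSON."""
    url = QUERY_URL.format(account_id=account_id) + "?" + urllib.parse.urlencode({"nrql": nrql})
    request = urllib.request.Request(
        url, headers={"Accept": "application/json", "X-Query-Key": query_key}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def extract_value(response):
    """Return the first numeric value found in the first result row.

    The key name in the result row depends on the NRQL function used,
    so this sketch does not hard-code it.
    """
    first_row = response["results"][0]
    for value in first_row.values():
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return float(value)
    raise ValueError("no numeric value found in query result")


def build_event(value, event_type="StddevRollup", attribute="stddev24h"):
    """Re-package the value as a custom event (names are hypothetical)."""
    return {"eventType": event_type, attribute: value}


def insert_event(account_id, insert_key, event):
    """Send one custom event through the Insights Event API; return the HTTP status."""
    request = urllib.request.Request(
        INSERT_URL.format(account_id=account_id),
        data=json.dumps([event]).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Insert-Key": insert_key},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


def run_once(account_id, query_key, insert_key):
    """One cron tick: query, extract, re-package, insert."""
    value = extract_value(run_query(account_id, query_key, NRQL))
    return insert_event(account_id, insert_key, build_event(value))
```

The cron job (or Synthetics monitor) would call run_once with your account ID and keys once per minute. With the hypothetical names used here, the NRQL alert condition could then watch something like SELECT latest(stddev24h) FROM StddevRollup.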
With this method, exactly the value you are looking for is inserted into Insights as a custom event once per minute. That lets the alerts evaluation system evaluate, for example, a 24-hour calculation of standard deviation, the 95th percentile over the past 12 hours, or the count of events over the past 30 minutes. Anything you can write a standard NRQL query for in Insights, you can now monitor with an alert condition!
I hope this helps you better understand how the alerts evaluation system works and gives you a way to expand its functionality. Let us know if you come up with other ways to do this!