Let’s talk for a moment about how New Relic Alerts evaluates your data. Once per evaluation window (configurable anywhere from 30 seconds to 2 hours), every alert condition takes the data stream it’s given and evaluates it numerically against the condition’s threshold. One window later, the data is evaluated again. Each window is evaluated discretely, on a pass/fail basis, without regard to any data before or after that single window*.
If your threshold has a duration longer than one evaluation window, the evaluation system builds a model of the data. For example, “for at least 15 minutes” with a 1-minute evaluation window keeps track of each minute’s pass/fail result over a rolling 15 minutes. Once the alerts evaluation system sees enough fail results in a row, it opens a violation.
Keep in mind that each window is evaluated discretely. The evaluation system does not look at any of the windows in the past, other than to develop the pass/fail model over a rolling time window*.
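Here is a simplified Python sketch of the per-window model. It makes several assumptions. The condition fails when a window’s value is above the threshold. A violation opens only when every window in the rolling duration has failed. The function names are hypothetical. New Relic’s real evaluator also handles other operators, missing data, and more.

```python
from collections import deque
from typing import Iterable, List


def evaluate_window(value: float, threshold: float) -> bool:
    """Judge one window on its own: True means this window FAILS.

    Assumes an "above threshold" condition; no earlier data is consulted.
    """
    return value > threshold


def duration_breached(window_values: Iterable[float], threshold: float,
                      duration_windows: int) -> List[bool]:
    """After each window, report whether every window in the rolling duration failed.

    Only the last `duration_windows` pass/fail results are remembered (the
    rolling model); the raw values of earlier windows are never revisited.
    In this simplified model, True marks the point where a violation would open.
    """
    recent = deque(maxlen=duration_windows)  # rolling pass/fail history
    states = []
    for value in window_values:
        recent.append(evaluate_window(value, threshold))
        states.append(len(recent) == duration_windows and all(recent))
    return states


# Threshold of 10 "for at least 3 windows" (e.g. 3 minutes of 1-minute windows).
print(duration_breached([5, 12, 13, 14, 3, 15], threshold=10, duration_windows=3))
# [False, False, False, True, False, False]
```

Each window is reduced to a single pass or fail before it enters the rolling history. The value 14 only matters because the two windows before it also failed.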
If you think about this for a moment, you might see how NRQL queries using functions such as stddev() or percentile() are a lot less useful in an alert condition than they seem. After all, a standard deviation, or percentile(duration, 95), calculated over 24 hours can be meaningful, but the same calculation over only 60 seconds is much less so.
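Here is a toy illustration of the problem. It uses made-up sample data and Python’s statistics.pstdev, which computes the population standard deviation; NRQL’s stddev() may not use exactly the same formula. When the samples are split into 60-second windows, each window holds too few points for its standard deviation to mean much. Computed over the full range, the standard deviation does show the spread.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def stddev_per_window(samples: List[Tuple[int, float]], window_seconds: int) -> Dict[int, float]:
    """Population stddev of the values inside each fixed, non-overlapping window.

    Keys are window indexes (timestamp // window_seconds).
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in samples:
        buckets[timestamp // window_seconds].append(value)
    return {index: statistics.pstdev(values) for index, values in sorted(buckets.items())}


def stddev_overall(samples: List[Tuple[int, float]]) -> float:
    """Population stddev of every value in the whole range."""
    return statistics.pstdev([value for _, value in samples])


# Made-up (seconds, value) samples: steady at 100, then one spike to 400.
samples = [(0, 100.0), (30, 100.0), (65, 100.0), (130, 400.0)]
print(stddev_per_window(samples, 60))     # {0: 0.0, 1: 0.0, 2: 0.0}
print(round(stddev_overall(samples), 1))  # 129.9
```

The spike at 130 seconds sits alone in its 60-second window, so that window’s standard deviation is 0 even though it is the most unusual point in the data.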
Whoa! So … how do I set up an alert condition to monitor standard deviation over the past 24 hours?
The alert evaluation system only looks at a single, discrete window at a time, but NRQL queries in general are much more flexible: a standard NRQL query can apply functions over periods far longer than a single alerts evaluation window. So you just need a way to wrangle a standard NRQL query into an alert condition. Here is one way you can accomplish that.
- Set up a cron job to run a script once per minute (since the alerts evaluation system expects to see a data point every minute). Alternatively, you can use a Synthetics Scripted API Test Monitor to run a script for you once each minute.
- In the script, use the Query API or NerdGraph to run the exact NRQL query you want. For example:
SELECT stddev(someAttribute) FROM SomeEventType SINCE 24 hours ago
- The script should then parse the JSON that is returned and extract the value or values that are important.
- Next, the script should repackage the important values as a JSON object.
- Finally, the script should use the Insights Event API to insert that JSON as a custom event.
- Once the cron job and script are up and running, set up a NRQL alert condition that monitors the attribute of interest in the custom event.
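The steps above could be sketched in Python roughly as follows, using only the standard library. This is a minimal sketch with error handling kept to a bare minimum. Several details are hypothetical choices made for illustration:

- the US-region NerdGraph and Event API endpoints
- the environment variable names
- the `stdDev` alias added to the example query
- the custom event type `LongWindowStats` and its attribute `stdDev24h`

NerdGraph is called with a user API key in the `API-Key` header, and the Event API with an insert key in the `X-Insert-Key` header.

```python
"""Hypothetical once-per-minute job: run a long-range NRQL query through
NerdGraph and re-insert the result as a custom event via the Event API."""
import json
import os
import sys
import urllib.request
from typing import Any, Dict, List

# Assumed US-region endpoints; EU-region accounts use different hosts.
NERDGRAPH_URL = "https://api.newrelic.com/graphql"
EVENT_API_URL = "https://insights-collector.newrelic.com/v1/accounts/{account_id}/events"

# The example query, with a hypothetical alias so the result key is predictable.
NRQL = "SELECT stddev(someAttribute) AS 'stdDev' FROM SomeEventType SINCE 24 hours ago"


def build_nerdgraph_request(account_id: int, nrql: str) -> Dict[str, Any]:
    """GraphQL payload asking NerdGraph to run `nrql` against one account."""
    query = (
        "query($accountId: Int!, $nrql: Nrql!) {"
        " actor { account(id: $accountId) { nrql(query: $nrql) { results } } } }"
    )
    return {"query": query, "variables": {"accountId": account_id, "nrql": nrql}}


def extract_value(response: Dict[str, Any], key: str = "stdDev") -> float:
    """Pull the aliased value out of the first row of the NRQL results."""
    results: List[Dict[str, Any]] = response["data"]["actor"]["account"]["nrql"]["results"]
    value = results[0].get(key) if results else None
    if value is None:
        raise ValueError(f"no {key!r} value in NRQL results: {results!r}")
    return float(value)


def build_event(value: float) -> List[Dict[str, Any]]:
    """Repackage the value as a custom event (hypothetical type and attribute)."""
    return [{"eventType": "LongWindowStats", "stdDev24h": value}]


def post_json(url: str, payload: Any, headers: Dict[str, str]) -> Dict[str, Any]:
    """POST `payload` as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def main() -> None:
    required = ("NR_ACCOUNT_ID", "NR_USER_API_KEY", "NR_INSERT_KEY")
    missing = [name for name in required if name not in os.environ]
    if missing:
        print("missing environment variables: " + ", ".join(missing), file=sys.stderr)
        return
    account_id = int(os.environ["NR_ACCOUNT_ID"])
    # Run the long-range NRQL query through NerdGraph (user API key).
    response = post_json(NERDGRAPH_URL, build_nerdgraph_request(account_id, NRQL),
                         {"API-Key": os.environ["NR_USER_API_KEY"]})
    # Parse out the value, then insert it as a custom event (insert key).
    value = extract_value(response)
    post_json(EVENT_API_URL.format(account_id=account_id), build_event(value),
              {"X-Insert-Key": os.environ["NR_INSERT_KEY"]})


if __name__ == "__main__":
    main()
```

You could schedule this with a crontab entry such as `* * * * * python3 /path/to/long_window_stats.py` (the path is hypothetical). The alert condition would then query something like SELECT average(stdDev24h) FROM LongWindowStats. With one event per window, the average is simply that event’s value.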
With this method, the exact value you are looking for is inserted into Insights as a custom event once per minute (or at whatever interval you’d like). That lets the alerts evaluation system evaluate, for example, a standard deviation calculated over 24 hours, or the 95th percentile over the past 12 hours. If you can write a standard NRQL query for it in Insights, you can now set up an alert condition to monitor it!
I hope this helps you better understand how the alerts evaluation system works, and gives you a way to expand its functionality. Let us know if you come up with other ways to do this!
*The exception to this model is Sliding Windows Aggregation (SWA), which evaluates one window’s worth of data every X seconds, where X is the slide-by interval you set. Read more about how SWA works in this documentation.
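The difference between the default model and SWA can be sketched with a simplified Python example that sums made-up sample values in 60-second windows. With a 60-second slide the windows do not overlap, so there is one result per window. A 30-second slide starts a new, overlapping window every 30 seconds, so each data point counts toward more than one window. The function name and the use of sum() as the aggregation are illustrative assumptions, not New Relic’s implementation.

```python
from typing import List, Tuple


def sliding_window_sums(samples: List[Tuple[int, float]], window: int, slide: int,
                        end: int) -> List[Tuple[int, float]]:
    """Sum the values in [start, start + window) for a window starting every `slide` seconds.

    slide == window gives ordinary back-to-back windows; slide < window gives
    overlapping (sliding) windows. Only windows that end by `end` are returned.
    """
    results = []
    start = 0
    while start + window <= end:
        total = sum(value for ts, value in samples if start <= ts < start + window)
        results.append((start, total))
        start += slide
    return results


# Made-up (seconds, value) samples.
samples = [(10, 1.0), (40, 2.0), (70, 4.0), (100, 8.0)]
print(sliding_window_sums(samples, window=60, slide=60, end=120))  # [(0, 3.0), (60, 12.0)]
print(sliding_window_sums(samples, window=60, slide=30, end=120))  # [(0, 3.0), (30, 6.0), (60, 12.0)]
```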