Relic Solution: Alert on X Synthetics Failures in Y Minutes

synthetics
nrqlalerts
rfb

#1

In the past, the Synthetics alerting mechanism was limited: you got an alert as soon as your monitor failed from a single location. When there were network disruptions between a location and your application, this could cause “flapping”, where the alert is raised and cleared multiple times. Now, with NRQL Alerts, you can set things up so that an alert is sent only after a certain number of failures.

Requirement: NRQL Alerts

For this to work, you need access to NRQL Alerts, which is available with your Alerts subscription. To get started, open your alert policy and create a new NRQL condition.

Query Settings

Since we are going to look at Synthetics failures, you want a query that counts your monitor’s failed checks in the SyntheticCheck event type.

You should then see a preview chart that shows whether you have had any recent failures. My monitor had a failure recently, so I can see the blip on the graph right after 01:30 PM.
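As a hedged sketch, the failure-count query can be assembled with a small Python helper. It assumes the SyntheticCheck attributes `monitorName` and `result` (with failed checks recorded as `'FAILED'`); the function name and the monitor name are hypothetical:

```python
def failure_count_query(monitor_name: str) -> str:
    """Build an NRQL query that counts failed checks for one monitor.

    Hypothetical helper. Assumes SyntheticCheck events carry `monitorName`
    and a `result` attribute equal to 'FAILED' for failed checks.
    """
    # Backslash-escape single quotes so the name stays a valid NRQL string
    # literal (assumed NRQL escaping rule; verify against the NRQL docs).
    escaped = monitor_name.replace("'", "\\'")
    return (
        "SELECT count(*) FROM SyntheticCheck "
        f"WHERE monitorName = '{escaped}' AND result = 'FAILED'"
    )


if __name__ == "__main__":
    # "My Website Check" is a hypothetical monitor name.
    print(failure_count_query("My Website Check"))
```

The query has no SINCE clause because the condition’s “Y” minutes window is assumed to supply the time range.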

Threshold Settings

Here is where we can supply the “X” failures and “Y” minutes. This is what you need to set:

  • You must change the dropdown to “sum of query results”. This way Alerts adds 1 to the total every time you get a failure. You must also change the threshold dropdown to “above”, since we normally expect 0 failures.
  • Then enter your value for “X” failures (3 in my example) and “Y” minutes (15 minutes in my example).
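To see why the “sum of query results” and “above” settings matter, here is a simplified Python model of the threshold: per-minute failure counts are summed over a rolling “Y”-minute window, and a violation opens when that sum is strictly greater than “X”. This is an assumption-laden sketch for reasoning about settings, not New Relic’s actual evaluation logic, and the function name is hypothetical:

```python
from collections import deque


def first_violation_minute(failures_per_minute, threshold, window_minutes):
    """Return the first minute where the rolling sum exceeds threshold, or None.

    Simplified model of "sum of query results is above X" over a Y-minute
    window; it is not New Relic's actual evaluation engine.
    """
    window = deque()
    running_sum = 0
    for minute, count in enumerate(failures_per_minute):
        window.append(count)
        running_sum += count
        # Drop the oldest minute once the window is longer than Y minutes.
        if len(window) > window_minutes:
            running_sum -= window.popleft()
        # "above" is modeled as a strict greater-than comparison.
        if running_sum > threshold:
            return minute
    return None


if __name__ == "__main__":
    failures = [0] * 30
    for minute in (2, 7, 12):  # one failure every 5 minutes
        failures[minute] = 1
    print(first_violation_minute(failures, threshold=3, window_minutes=15))  # None
    print(first_violation_minute(failures, threshold=2, window_minutes=15))  # 12
```

Under this model, a monitor that fails every 5 minutes from one location never exceeds a threshold of 3 in 15 minutes, because “above” needs a fourth failure.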

Finally, enter a name for your condition. It is used in the alert notifications.

Notification Result

In my testing, the alert notification looked like this. Note that “Multi Location failed 3 times in 15 minutes” is the name I entered for my condition.

When you click View Incident Details, you see a screen like this that shows the SUM of the failures:

If you click the “Go to SyntheticCheck Overview” button, it takes you to an Insights query. Note that this query does not show the SUM; it shows a TIMESERIES of the individual errors.

If you want a list of the exact error messages, you could change your query to return them instead of a count.
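A hedged sketch of such a query, again built in Python: grouping failed checks with FACET on the `error` attribute lists each distinct error message with its count. The `error` attribute, the helper name, and the default time range are assumptions:

```python
def error_messages_query(monitor_name: str, since: str = "1 day ago") -> str:
    """Build an NRQL query listing distinct error messages for failed checks.

    Hypothetical helper. Assumes SyntheticCheck events carry `monitorName`,
    `result`, and an `error` attribute holding the failure message.
    """
    # Same assumed backslash quote-escaping rule for NRQL string literals.
    escaped = monitor_name.replace("'", "\\'")
    return (
        "SELECT count(*) FROM SyntheticCheck "
        f"WHERE monitorName = '{escaped}' AND result = 'FAILED' "
        f"FACET error SINCE {since}"
    )


if __name__ == "__main__":
    # "My Website Check" is a hypothetical monitor name.
    print(error_messages_query("My Website Check"))
```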

Or you could navigate to an Insights dashboard that has details on the recent failures (I added a filter for my specific monitor that is going haywire).

Some things to consider:

  • Consider how frequently your monitor runs and the number of locations compared with the “Y” minutes. If your monitor runs every 5 minutes from just 1 location, you can get at most 1 failure per 5 minutes. Do the math when setting this up, and test, test, test!
  • Also remember to change the dropdown to “sum of query results”.
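The arithmetic in the first point can be written down as a small Python check. It assumes each location runs exactly one check per interval and that checks are evenly spaced, so it gives an approximate upper bound; the function names are hypothetical:

```python
def max_failures_in_window(frequency_minutes: int, locations: int, window_minutes: int) -> int:
    """Approximate maximum number of failed checks in a Y-minute window.

    Assumes one check per location every `frequency_minutes`, evenly spaced.
    """
    checks_per_location = window_minutes // frequency_minutes
    return checks_per_location * locations


def threshold_is_reachable(threshold: int, frequency_minutes: int,
                           locations: int, window_minutes: int) -> bool:
    """True if an "above `threshold`" condition can fire at all."""
    # "above" is treated as strictly greater than, so threshold + 1 failures are needed.
    return max_failures_in_window(frequency_minutes, locations, window_minutes) > threshold


if __name__ == "__main__":
    # Every 5 minutes from 1 location: at most 3 failures in 15 minutes.
    print(max_failures_in_window(5, 1, 15))     # 3
    print(threshold_is_reachable(3, 5, 1, 15))  # False
    print(threshold_is_reachable(2, 5, 1, 15))  # True
    # The same monitor run from 3 locations: up to 9 failures.
    print(threshold_is_reachable(3, 5, 3, 15))  # True
```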

What else?
NRQL Alerts are brand new, and we know our customers are brilliant at using our products in interesting ways. What have you discovered you can do with NRQL Alerts and Synthetics?


#2

Fantastic share! Exactly what I was after!


#3

@kahrens Thank you so much for posting this write-up! We’ll give it a try and let you know if it improves our “flapping” Synthetics alerts. It would still be nice to have these thresholds directly in Synthetics alerts, but this is a great option in the meantime.

Thanks!!!


#5

Please note that this does not ensure failures from multiple monitoring locations before alerting, only that there are multiple failures from any location. It’s not well integrated, but a workaround using uniqueCount can be found on this page: Synthetics Pings Failed From One Location (but were successful from others): Feature Idea
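The difference between “many failures” and “failures from many locations” can be illustrated with a small Python model. Counting failed checks treats repeated failures from one location the same as failures spread across locations, while counting distinct locations (what uniqueCount on a location attribute such as `locationLabel` would measure, assuming that attribute) does not. This is a hypothetical sketch, not the query from the linked page, and the location names are made up:

```python
def failures_in_window(failures, now_minute, window_minutes):
    """Return the (minute, location) failures inside the window ending at now_minute."""
    start = now_minute - window_minutes
    return [(m, loc) for m, loc in failures if start < m <= now_minute]


def distinct_failing_locations(failures, now_minute, window_minutes):
    """Count distinct locations with at least one failure in the window."""
    return len({loc for _, loc in failures_in_window(failures, now_minute, window_minutes)})


if __name__ == "__main__":
    # Hypothetical data: three failures, all from the same location.
    one_location = [(1, "Location A"), (6, "Location A"), (11, "Location A")]
    print(len(failures_in_window(one_location, 14, 15)))     # 3
    print(distinct_failing_locations(one_location, 14, 15))  # 1

    spread = [(1, "Location A"), (6, "Location B"), (11, "Location C")]
    print(distinct_failing_locations(spread, 14, 15))  # 3
```

With a count-based threshold of “above 2”, both data sets would alert; a distinct-location threshold of “above 2” would alert only on the second.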


#6

This is really a long-awaited solution, but to be honest I’m struggling to understand it, and my tests have not been successful yet.

My doubt: if the monitor runs every 5 minutes, how could it fail more than 3 times within 15 minutes? I would expect the threshold to be above 2. Or where does my reasoning go wrong?

I tried to set up a monitor and alert condition as explained here, but it never fires. I also tried testing for Success with a reversed condition, but that never fired either.

The monitor is here: https://synthetics.newrelic.com/accounts/1536085/monitors/6c188dc6-09c0-45f6-adcb-ad9d6ec3acb4 and the alert policy is here: https://alerts.newrelic.com/accounts/1536085/policies/380107

What did I miss?

Kind regards
Wolfgang