
Is it possible to only alert once a second location has also failed in a similar manner?


#1

Is it possible to only alert once a second location has also failed in a similar manner? New Relic seems to frequently have issues in single locations, which makes for extremely brittle monitoring.

I’d love to see the system work like this:

  • A single location has an issue.
  • It notifies another location there was an issue.
  • The first location retries.
  • If both the first and second location fail, THEN send an alert.

Asking for a Second Opinion could greatly improve the reliability of these alerts.
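The flow above could be sketched as a simple decision function (a hypothetical illustration; the function name is made up, and the 'SUCCESS'/'FAILED' values mirror the result strings Synthetics records, not any actual New Relic API):

```javascript
// Decide whether to raise an alert, given the results of the first
// location's check, its retry, and a second location's check.
function shouldAlert(firstResult, retryResult, secondOpinionResult) {
  // No alert unless the first location failed in the first place.
  if (firstResult !== 'FAILED') {
    return false;
  }
  // Alert only if the retry AND the second location also failed.
  return retryResult === 'FAILED' && secondOpinionResult === 'FAILED';
}

console.log(shouldAlert('FAILED', 'FAILED', 'FAILED'));  // true: both locations agree
console.log(shouldAlert('FAILED', 'FAILED', 'SUCCESS')); // false: second opinion passed
```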

Thanks,
Chris Henry


#2

Hey @henrych - great points and observations! I see you added your use case here:

https://discuss.newrelic.com/t/ping-failures-from-one-location/30648/7?u=lisa

I am going to file a feature request on this subject for our product managers to see. :slight_smile:


#3

@henrych, you should be able to craft a workaround using Insights and Synthetics API queries:

Set up a monitor with multiple locations but no alerting.

Build an Insights query over the Synthetics data that picks the results for a fixed monitor name and facets on the location label. Something like this:

select
filter( latest(result), where 1=1 ) as LatestStatus,
filter( max(timestamp), where result = 'SUCCESS') as LastSuccess,
filter( max(timestamp), where result = 'FAILED') as LastFailure
from SyntheticCheck 
where monitorName ='MyMonitorName'
facet locationLabel 
since 1 week ago

This should give you the latest summary of all regional monitors for a given synthetic check.

Now you’ve got the summary data, use a Synthetics API monitor to query Insights for it - see here for more details: https://docs.newrelic.com/docs/insights/new-relic-insights/adding-querying-data/querying-your-data-remotely#cURL

You’ll now have the summary data available in a single JSON response and can build your own rules as to what generates an alert - perhaps only when all locations fail?

If you’re really looking for a more forgiving monitor, you might decide to implement a timeout and retry when querying Insights, so you only get an alert “if everything fails for two subsequent queries”.
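That “two subsequent queries” rule can be expressed as a small counter that only signals an alert after N consecutive all-location failures (a generic sketch; wiring it up to the actual Insights query is left out, and the function names are illustrative):

```javascript
// Returns a function that records each query outcome and reports
// whether the consecutive-failure threshold has been reached.
function makeFailureTracker(threshold) {
  var consecutiveFailures = 0;
  return function recordResult(allLocationsFailed) {
    // Any query where at least one location succeeded resets the streak.
    consecutiveFailures = allLocationsFailed ? consecutiveFailures + 1 : 0;
    return consecutiveFailures >= threshold;
  };
}

var record = makeFailureTracker(2);
console.log(record(true));  // false: first failure, no alert yet
console.log(record(true));  // true: second consecutive failure, alert
```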

You might also want to raise a different alert if a single location is out for more than, say, 20 minutes, so your customer service team can at least be aware of the issue even if you can’t necessarily do anything to resolve it.

P


#4

Wow, @pault - this is pretty incredible. It is this sort of ingenuity that can be a real game changer - thank you for sharing!


#5

@pault thank you for providing the Insights NRQL example for querying Synthetics.

Below is a snippet for alerting when all locations fail, using a Synthetics API test (as suggested) against a Synthetics scripted browser via Insights NRQL. It is crude but could be expanded further.

//Define your authentication credentials
var myAccountID = 'ACCOUNT_ID';
var myQueryKey = 'QUERY_KEY';
//Generate a Query key at https://insights.newrelic.com/accounts/ACCOUNT_ID/manage/api_keys

/*
The escaped NRQL below corresponds to:

select
filter( latest(result), where 1=1 ) as LatestStatus
from SyntheticCheck
where monitorName = 'MONITOR_NAME'
facet locationLabel
since 2 hours ago
*/
//Generate the escaped NRQL from the Query key settings at https://insights.newrelic.com/accounts/ACCOUNT_ID/manage/api_keys
var nrql = 'select%20filter%28%20latest%28result%29%2C%20where%201%3D1%20%29%20as%20LatestStatus%20from%20SyntheticCheck%20%20where%20monitorName%20%3D%27MONITOR_NAME%27%20facet%20locationLabel%20since%202%20hours%20ago';

var assert = require('assert');

var options = {
    //Define the endpoint URI
    uri: 'https://insights-api.newrelic.com/v1/accounts/' + myAccountID + '/query?nrql=' + nrql,
    //Define the query key and expected data type
    headers: {
        'X-Query-Key': myQueryKey,
        'Accept': 'application/json'
    }
};

//Inspect the results in a callback; fail the monitor (raising an alert) only when every location failed
function callback(err, response, body) {
    if (err) {
        assert.fail('Insights query failed: ' + err);
    }
    //Each facet is one location; count how many report a non-SUCCESS latest result
    var facets = JSON.parse(body).facets;
    var total = 0;
    var failures = 0;
    for (var i = 0; i < facets.length; i++) {
        var location = facets[i].name;
        for (var j = 0; j < facets[i].results.length; j++) {
            var latest = facets[i].results[j].latest;
            console.log(location, latest);
            total++;
            if (latest != 'SUCCESS') {
                failures++;
            }
        }
    }
    console.log('total locations: ' + total, 'failed locations: ' + failures);
    //The assertion fails (triggering the alert) when failures equals total, i.e. all locations failed
    assert.ok(total != failures, 'All locations have failed');
    console.log('end of script');
}

//Make the GET request, passing in the options and callback
$http.get(options, callback);
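Rather than hand-building the escaped query, JavaScript’s standard encodeURIComponent can generate a URL-safe version of the readable NRQL at runtime. Note it leaves characters such as parentheses and single quotes unescaped, so the result differs slightly from the hand-escaped string above while decoding to the same query; this is a sketch, not a byte-for-byte reproduction:

```javascript
// URL-encode the readable NRQL for use as a query-string parameter.
var plainNrql = "select filter( latest(result), where 1=1 ) as LatestStatus " +
    "from SyntheticCheck where monitorName = 'MONITOR_NAME' " +
    "facet locationLabel since 2 hours ago";
var nrql = encodeURIComponent(plainNrql);
console.log(nrql);
```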

#6

We’ve run into this on and off pretty much since we started using Synthetics - is this still the only way?
Usually it’s just a blip, but sometimes we’re forced to switch to a different site. Today, for example, the Newark location refused to work correctly (two other locations were fine).


#7

@pault, thank you, this is very useful.
I have written an API monitor to generate an alert when two or more locations fail.
One small doubt: should the Synthetics API monitor we set up use the same locations as the normal monitor?


#8

tl;dr - don’t worry about this.

The API isn’t a region-centric endpoint, so you can run your “monitor of monitors” from any region you choose. It won’t make a difference to your results or the timely delivery of alerts.

The only real question is the availability of a single geography within the New Relic infrastructure: what happens when an entire region is offline? Synthetic monitoring already carries this risk, in that the loss of a region would mean you wouldn’t know your monitors aren’t running. The same risk extends to monitoring via layers, so you might want to run your secondary monitors in another region as a layer of protection, coding them to detect when monitors aren’t running regularly.

However, you’ve now just pushed the problem to another layer: what if your “monitor of monitors” region fails and you lose monitoring? Maybe you need a third layer in a third region.

This can all get out of hand if you’re not careful, and it all depends on how you run your operations. Do you already use Insights to graph key metrics that your team(s) see regularly? If so, I’d suggest that graphing the number of executed monitors across an account, plus the aggregate time spent running all monitors, is a more valuable sniff test. If either of these drops significantly, you’ll be able to spot much more quickly that things are wrong. I’ve used exactly these graphs before to help highlight when private minions go offline (far more likely than losing a New Relic region).

One last thing to remember with this multi-layered monitor approach: you’re potentially doubling your cost, as you pay per execution.

You might now want to think about building smarter, bulkier monitors that perform multiple service checks within a single “product” monitor, ultimately firing multiple results into custom tables for each of the monitored services. With this, you can possibly even reduce your New Relic costs.

Lots of options here. It all depends on your requirements and how smart you want to get with your coding.


#9

Hey folks - just wanted to share that there is another workaround for this that you might find useful: