Synthetics Pings Failed From One Location (but were successful from others): Feature Idea

feature-idea

#1

Vote Now! | Is the ideal situation that users would be able to configure the number of locations that are seeing failures?

  • I want this, too
  • I have more details to share (reply below)
  • I have a different solution

0 voters

We take feature ideas seriously and our product managers review every one when plotting their roadmaps. However, there is no guarantee this feature will be implemented. This post ensures the idea is put on the table and discussed though. So please vote and share your extra details with our team. :thumbsup:


Ping failures from one location
Feature Idea: Synthetic Ping failures should trigger only when all locations fail at the same time
#2

We recently encountered an issue where pings failed from one location but succeeded from others (there was no issue on the site at the time, and we are not sure what happened). Is there a way to require multiple consecutive failures from multiple locations before an alert is triggered? That way, a hypothetical problem at a single New Relic location would not end up alerting the team in the middle of the night.


#3

@nari - we actually ensure that the failure is not a faulty one by triple-checking any failures. So if we see a failure we’ll check it 2 more times before we mark it as down and send alerts.


#4

Thanks. Just to confirm … if I have the pings from 2 different locations set to ping every minute and only one location is constantly failing for whatever reason, the alert is not triggered since the other location is getting a successful response. In the event that we encounter 3 consecutive failures across both locations, the alert would be triggered. Is that correct?


#5

@nari the 3 consecutive failures is per location. So for it to be failing in Tokyo, we would have checked that location 2 more times before marking that location as failing. This is regardless of what’s happening in any of the other locations.
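
For readers following along, the per-location behavior described in this reply can be sketched roughly as below. This is only an illustration of the description above, not New Relic's actual implementation; the function name, the `check` callable, and the `rechecks` parameter are all hypothetical.

```python
from typing import Callable


def location_is_failing(check: Callable[[], bool], rechecks: int = 2) -> bool:
    """Return True only if the first check and every recheck fail.

    `check` is a hypothetical callable that returns True when the ping
    from this one location succeeds. Other locations are not consulted.
    """
    if check():
        return False  # first attempt succeeded, nothing to recheck
    # First attempt failed: run the check up to `rechecks` more times.
    for _ in range(rechecks):
        if check():
            return False  # a recheck succeeded, treat the failure as transient
    return True  # every attempt from this location failed


# Example: fail, fail, succeed -> the location is not marked as failing.
results = iter([False, False, True])
print(location_is_failing(lambda: next(results)))
```

With the default of 2 rechecks, a location is only marked as failing after 3 failed attempts in a row, regardless of what any other location reports.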


#6

I second the original request. If Synthetics is monitoring from say 9 locations and only one location is seeing a problem, the issue is more likely with the New Relic monitor at that location (e.g., a network issue at that site) than with the site that you’re monitoring. So the alerting logic could be something like “if three consecutive failures AND at least one other monitor is also reporting an issue for the monitored site, then alert”.
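
A minimal sketch of the rule proposed in this post, assuming we already have a count of consecutive failures per location. The function name and the input shape are hypothetical and do not correspond to any Synthetics API.

```python
from typing import Mapping


def should_alert(consecutive_failures: Mapping[str, int], required: int = 3) -> bool:
    """Alert only if one location has `required` consecutive failures
    AND at least one other location is also reporting a failure."""
    for location, count in consecutive_failures.items():
        if count < required:
            continue
        # This location is confirmed as failing; look for corroboration.
        corroborated = any(
            other_count > 0
            for other, other_count in consecutive_failures.items()
            if other != location
        )
        if corroborated:
            return True
    return False


print(should_alert({"Tokyo": 3, "London": 0}))  # only Tokyo failing
print(should_alert({"Tokyo": 3, "London": 1}))  # London also sees a failure
```

In the first call only one location reports a problem, so no alert is raised; in the second, a second location corroborates it.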


#7

Hey @aht- I definitely see your point, however this doesn’t take into account usability/accessibility between regions.

Network issues between your site and our Synthetics monitor locations could give you insight into users in different parts of the world. If we only alerted against failures in more than one location, you may never be alerted about issues that may be affecting users in US West versus US East who may be able to access the site just fine.

Beyond the ability to de-select monitor locations yourself, do you think you could (in a few words) describe a use case for a feature that may allow you to keep your monitors across varying regions, and alert based on your criteria?

Ideally, would this be within Synthetics itself, or do you think that control over those constraints would be better served in our Alerts beta?

Also, if I’m completely misunderstanding, feel free to correct me. =)


#8

I think the ideal situation would be for users to be able to configure the number of locations that are seeing failures.

Very often, we see a single region alerting, but that’s not always helpful or actionable. It really just means that New Relic’s network connectivity went down, not necessarily our users.

Ideally, I’d like to be able to configure a threshold for number of locations that detect an issue before sending an alert. As in, don’t alert until 3 locations have detected a problem.

Thanks,
Chris Henry
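
To make the request above concrete, here is a rough sketch of a user-configurable location threshold. The names `alert_on_threshold` and `min_failing_locations` are made up for illustration and are not existing Synthetics settings.

```python
from typing import Mapping


def alert_on_threshold(
    location_failing: Mapping[str, bool], min_failing_locations: int = 3
) -> bool:
    """Alert once at least `min_failing_locations` locations report a failure.

    `location_failing` maps a location name to True if that location
    currently detects a problem.
    """
    failing = sum(1 for is_failing in location_failing.values() if is_failing)
    return failing >= min_failing_locations


status = {"Tokyo": True, "London": False, "Newark": False, "Sydney": False}
print(alert_on_threshold(status))                           # threshold of 3
print(alert_on_threshold(status, min_failing_locations=1))  # any location
```

Setting the threshold to 1 would keep the behavior where a failure from any single location alerts.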


#9

Closing the loop - got your feature request (and included all comments) from here, @henrych :smile:


#10

Our use case for Synthetics alerts is pretty similar, so I thought I’d toss it in as well. :smile:

We use Synthetics to monitor API endpoints hosted in Windows Azure (West US). Earlier this morning there was some kind of internet connectivity problem between the SF Synthetics location (AWS US West) and Azure West US, which caused network timeouts on all of the monitored endpoints. (The other 4 Synthetics locations we monitor from were all green)

While the knowledge that our APIs aren’t accessible from AWS US West may be useful, there’s no point in creating an alert because there’s nothing our team can do about it.

The ideal solution for us would be for a Synthetics check to fail from multiple locations before an alert is created. (If a failure in one location automatically triggered a “second opinion” check from a different location, that would be pretty cool!)
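
One way the "second opinion" idea could look, as a sketch only: a failure at one location triggers a single check from a different location, and an alert is raised only if that check fails too. The `checks` mapping and the function name are hypothetical.

```python
from typing import Callable, Mapping


def confirmed_failure(checks: Mapping[str, Callable[[], bool]], primary: str) -> bool:
    """Return True if `primary` fails and a second location agrees.

    `checks` maps a location name to a hypothetical callable that returns
    True when the endpoint is reachable from that location.
    """
    if checks[primary]():
        return False  # primary check passed, no second opinion needed
    for location, check in checks.items():
        if location != primary:
            # Ask the first other location for a second opinion.
            return not check()
    return True  # no other location available, so the primary result stands


checks = {
    "AWS US West": lambda: False,  # times out from this location
    "AWS US East": lambda: True,   # reachable from this location
}
print(confirmed_failure(checks, "AWS US West"))
```

Here the second location can still reach the endpoint, so the failure is not confirmed and no alert would be created.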


#11

Hey All,

thanks for this thread.
We definitely need this kind of engagement to better understand how to help our customers.

I have recorded an entry in our backlog to go over this thread and see if we can translate some of your requests into action on our side.

Thanks again
Ivan


Flapping Synthetic Alerts
#12

Any update on this issue? Would love to see this implemented, as well.


#13

Updates still to come, @jproudfo! I will make sure my product managers know you are interested as well! Passing along your feature request now! :thumbsup:


#14

Same here, I think giving the option to customers to choose how many regions need to fail before triggering alerts would be great.

Our use case: we have 2 load-balanced web servers in North America (where most of our traffic comes from). One of our uses of New Relic is to monitor service availability. We see a lot of failures on the Sydney and Brazil monitors, which is understandable (network, etc.). While these insights are very useful to us, our service is not technically down (we chose not to have a replica in Asia/Pacific).


#15

Thanks for sharing your use case with us, @gmolter ! I will be sure to pass it along to my product managers via feature request.


#16

Just adding my viewpoint here that this is a really important feature.

We had a load of alerts from failed synthetic ping checks this morning in one location.

Looking at the status page for the provider who owns the London IPs, they say they had connectivity issues at the same time as the alerts we received.

Our applications weren’t down, so some consensus between different geographic locations would be great to stop some of these false positives.


#17

We completely understand your request @nedsbeds! I’ll make sure your request is passed to my product managers via feature request.


#18

Another +1 for this being a user-configurable feature. I’m much more interested in whether my site is actually down over whether a particular part of the world is having connection issues at the moment. Being able to say “only trigger if more than one location fails” would greatly cut down on the myriad of false-positive alerts I receive now.


#19

Gotcha, @jro! Too many false positives can be :thumbsdown:. I will create a feature request for you and be sure I include the details and use case that you provided. Thanks so much—check back on this thread for future updates.


#20

Adding my team’s vote/request for this too. On Friday Sep 2nd, we got a number of failures from only the Newark NJ probe location, and nowhere else. We monitor each of our sites from 2 random U.S. probe locations. False positives like this, for which there isn’t any corrective action for us to take, add to alert fatigue. A user-adjustable setting/flag for “two or more location failures” or for “any location failure” (today’s behavior) would meet our needs while maintaining NR’s current behavior.