Using Insights to calculate MTBF and MTTR


#1

Hi,

Is it possible to use Synthetics and Insights to calculate MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair)? I am looking to create an Insight dashboard with those values for specific applications.

/D


#2

Hey @daniel.osterhof - Thanks for providing the interesting use case! Just to make sure we’re on the same page, could you provide a bit more detail on “Failure” and “Repair” events in regards to your New Relic apps? How do these events manifest in the New Relic UI?

For example, is MTBF the mean time between 2 adjacent Synthetic monitor events where the monitor changes state from “successful” to “failing”?


#3

Alternatively it could be the time between an alert being opened to being closed?


#4

Hi,

The MTBF could be, for example, the average time between Synthetic monitor failures over the last 12 months.
But in general it could apply to any type of alert within New Relic: the average time between one alert triggering and the next over a period of time.

MTTR would be the average time that a type of alert is active over a period of time.

Does that explain what I would like to achieve?
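The two definitions in this post can be written down as a small sketch. It assumes each alert is reduced to an (opened, closed) pair of timestamps. The function names and the sample data are hypothetical, and MTBF is measured from the start of one alert to the start of the next, which is one reading of "average time between alerts triggering" (some definitions measure from a repair to the next failure instead).

```python
from statistics import mean

def mttr(incidents):
    """Mean time to repair: average of (closed - opened) over all incidents."""
    return mean(closed - opened for opened, closed in incidents)

def mtbf(incidents):
    """Mean time between failures: average gap between consecutive incident starts."""
    starts = sorted(opened for opened, _ in incidents)
    if len(starts) < 2:
        raise ValueError("need at least two incidents to measure a gap")
    return mean(later - earlier for earlier, later in zip(starts, starts[1:]))

# Hypothetical incidents as (opened, closed) pairs, in minutes from an arbitrary start.
incidents = [(0, 30), (600, 610), (1500, 1520)]
print(mttr(incidents))  # average of 30, 10 and 20 minutes
print(mtbf(incidents))  # average of the 600 and 900 minute gaps
```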


#5

@btribbia - I think this is the type of information that would be great to query on directly, without needing the webhook from Alerts into Insights. For example:

SELECT average(timeResolved - timeOpened) AS 'MTTR' FROM Incidents WHERE policyId = xxxx OR conditionId = yyyyy SINCE 12 months ago


#6

Hey @daniel.osterhof

Thanks for the use case clarification and example. I had a discussion with the team regarding potential options here. We do log all Synthetics monitor check data within Insights. Additionally, @stefan_garnham provided a great example for sending Alert data to Insights by using a custom webhook.

That said, there is presently no way to use average() on the time between two events within Insights. I believe providing MTBF and MTTR within Synthetics is a great idea, and I have filed a feature request on your behalf with our Product Managers for this functionality.

In the meantime, we recommend downloading Synthetics data from Insights, either by CSV export or API, and calculating these averages separately.
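A minimal sketch of that offline calculation, using only the Python standard library. It assumes the export has a `timestamp` column (epoch milliseconds) and a `result` column whose values are `SUCCESS` or `FAILED`; check these names against your own export. A failure is taken to start when the monitor goes from successful to failing and to end at the next successful check, so precision is limited by the check frequency. The function names and sample data are hypothetical.

```python
import csv
import io
from statistics import mean

MS_PER_MINUTE = 60_000

def failure_windows(rows):
    """Collapse runs of failed checks into (first_failure, next_success) pairs.

    rows: iterable of (timestamp_ms, result), sorted by timestamp.
    """
    windows = []
    failing_since = None
    for ts, result in rows:
        if result == "FAILED" and failing_since is None:
            failing_since = ts                   # successful -> failing
        elif result != "FAILED" and failing_since is not None:
            windows.append((failing_since, ts))  # failing -> successful
            failing_since = None
    return windows  # a failure still open at the end of the data is ignored

def mtbf_mttr_minutes(csv_text):
    """Return (MTBF, MTTR) in minutes; either is None if there is too little data."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = sorted((int(r["timestamp"]), r["result"]) for r in reader)
    windows = failure_windows(rows)
    repairs = [end - start for start, end in windows]
    gaps = [nxt[0] - cur[0] for cur, nxt in zip(windows, windows[1:])]
    mttr = mean(repairs) / MS_PER_MINUTE if repairs else None
    mtbf = mean(gaps) / MS_PER_MINUTE if gaps else None
    return mtbf, mttr

# Hypothetical export: one check every 10 minutes, two separate outages.
sample = """timestamp,result
0,SUCCESS
600000,FAILED
1200000,FAILED
1800000,SUCCESS
2400000,SUCCESS
6600000,FAILED
7200000,SUCCESS
"""
print(mtbf_mttr_minutes(sample))
```

Consecutive failed checks are merged into one outage here; counting every failed check as its own failure would make MTBF look much shorter than it is.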


#7

It's somewhat unbelievable that New Relic still doesn’t provide the ability to do this out of the box.

After a lot of trial and error, reading, and advice from an account manager, I have got to a point where MTTR by app (well, by policy_name) is close. There is still some way to go, and I'm not yet sure my MTTR calculation is right, as I have yet to test it outside of New Relic.

This is how I’ve done it:

  • Set up a webhook as an alerting Notification Channel
  • Set up Alert Policies named after applications or services (the custom JSON format for the webhook doesn’t pass the app name!!). You’ll need to use the policy_name to group the alerts, unfortunately
  • Let a few alerts happen and check they’re present in Insights
  • Create a dashboard and name it ‘MTTR stuff’ or something
  • Go into Insights and run the following query:
    SELECT (max(timestamp)-min(timestamp))/60/1000/count(incident_id) AS 'Average (min)', (max(timestamp)-min(timestamp))/60/1000 AS 'Total Duration (min)', count(incident_id) FROM Alert WHERE current_state = 'closed' FACET policy_name, severity SINCE 4 weeks AGO
  • Give the output from the query a name and ‘add to dashboard’, saving it to the dashboard you created earlier
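One way to cross-check the MTTR figure outside New Relic is to export the webhook events and pair each incident's open and closed events. The sketch below reuses the field names from the query above (`incident_id`, `policy_name`, `current_state`, `timestamp` in milliseconds). The `'open'` state value, the function name and the sample events are assumptions. Note that the query above, as written, appears to divide the span between the first and last closed event by the number of incidents, which is not the same thing as averaging each incident's open-to-closed duration.

```python
from collections import defaultdict
from statistics import mean

MS_PER_MINUTE = 60_000

def mttr_by_policy(events):
    """Average open-to-closed duration in minutes, grouped by policy_name."""
    opened = {}                    # incident_id -> timestamp of its first 'open' event
    durations = defaultdict(list)  # policy_name -> durations in milliseconds
    for event in sorted(events, key=lambda e: e["timestamp"]):
        incident = event["incident_id"]
        if event["current_state"] == "open":      # assumed state name
            opened.setdefault(incident, event["timestamp"])
        elif event["current_state"] == "closed" and incident in opened:
            start = opened.pop(incident)
            durations[event["policy_name"]].append(event["timestamp"] - start)
        # any other state (or a close with no matching open) is skipped
    return {policy: mean(ms) / MS_PER_MINUTE for policy, ms in durations.items()}

# Hypothetical webhook events for a policy named after an application.
events = [
    {"incident_id": 1, "policy_name": "checkout", "current_state": "open", "timestamp": 0},
    {"incident_id": 1, "policy_name": "checkout", "current_state": "closed", "timestamp": 600_000},
    {"incident_id": 2, "policy_name": "checkout", "current_state": "open", "timestamp": 3_600_000},
    {"incident_id": 2, "policy_name": "checkout", "current_state": "closed", "timestamp": 5_400_000},
]
print(mttr_by_policy(events))  # incidents lasting 10 and 30 minutes
```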

This is my stab at doing what New Relic should be doing automatically. I’d be very happy for any tips on how to make this better or enhance the average/MTTR calculation.

Anyone who’s managed to calculate the mean time between failures (MTBF) and wants to share their NRQL would be a star!


#8

Excellent share, @jake2! I hope someone can share their helpful tips and NRQL solutions with you. :blush: Thanks again for posting how you are solving for this.