Self Healing Systems!



Is anyone using New Relic integrated with other tooling to build self-healing and auto-scaling systems?

If so, could you please share some examples and use cases for knowledge sharing?

The idea I am looking for: if APM detects that Apdex is falling because of a lack of resources, are you auto-scaling? If yes, how, and what are the steps?
If Infrastructure sees that a service is dead or has a memory leak, can it trigger a service restart, or even a server restart? If yes, please share how you achieved it.



I’ve not implemented it myself but my thoughts were:

A custom endpoint that receives the alert webhook
The custom endpoint uses the alert or policy to decide which specific code to execute

Examples I had considered were shutting down servers due to low traffic (based on transaction count), adding servers when traffic increases and Apdex falls below a specific threshold, and restarting processes in response to specific error types.
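To make that idea concrete, here is a minimal sketch of such an endpoint in Python, using only the standard library. It is untested against a real account: the policy names, the action functions, and the payload fields `policy_name`, `condition_name`, and `current_state` are assumptions for illustration, so check them against the payload your alert webhook actually sends.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical remediation actions. Real ones would call a cloud API or a script.
def scale_out(alert):
    return f"scale out: {alert.get('condition_name')}"

def scale_in(alert):
    return f"scale in: {alert.get('condition_name')}"

def restart_process(alert):
    return f"restart process: {alert.get('condition_name')}"

# Assumed policy names mapped to the action each one should trigger.
ACTIONS = {
    "Low Apdex": scale_out,
    "Low Traffic": scale_in,
    "Process Errors": restart_process,
}

def handle_alert(payload):
    """Choose an action from the policy name; ignore anything else."""
    if not isinstance(payload, dict) or payload.get("current_state") != "open":
        return "ignored: alert not open"
    action = ACTIONS.get(payload.get("policy_name"))
    if action is None:
        return "ignored: no action for policy"
    return action(payload)

class AlertWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        body = handle_alert(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)

def serve(port=8080):
    """Block forever, handling webhook POSTs on the given port."""
    HTTPServer(("0.0.0.0", port), AlertWebhook).serve_forever()
```

Calling `serve()` starts the listener. Keeping the decision logic in `handle_alert` means it can be tested without any HTTP traffic, and a real deployment would also want authentication on the endpoint.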

Unfortunately there’s just not enough time in the day to actually implement it!


@MKhanna Internally we use New Relic to monitor New Relic… So there are a number of our engineering teams that rely on NR monitoring to determine their resource usage plans and scaling.

One particular example is Synthetics, where the load on public locations can change at a moment's notice (think about customers creating hundreds of monitors running every minute, flooding the system with new jobs to run).

The Synthetics team has Insights queries that look at total system latency, as well as a number of other metrics. Those Insights queries feed a capacity management tool that the team created, which uses the query results to increase or decrease the number of minions reporting to a given location, whether that location is AWS-based or Linode.
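The internal tool itself isn't public, so the following is only an illustrative sketch of the general pattern: take a latency figure from a query result and turn it into a target minion count. The function name, the proportional rule, and the bounds are all assumptions, not the Synthetics team's actual logic.

```python
import math

def desired_minions(current, latency_s, target_s, min_minions=1, max_minions=50):
    """Scale the minion count in proportion to observed vs. target latency.

    All parameters are hypothetical: `latency_s` stands in for a value
    returned by a latency query, and `target_s` for the latency you want.
    """
    if current < 1 or target_s <= 0:
        raise ValueError("current must be >= 1 and target_s must be > 0")
    wanted = math.ceil(current * latency_s / target_s)
    # Clamp so one bad reading cannot scale to zero or to an unbounded size.
    return max(min_minions, min(max_minions, wanted))
```

For example, with 10 minions, 60 seconds of observed latency, and a 30 second target, this asks for 20 minions. The clamp is there because a proportional rule on its own will overreact to a single spike.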


In that case, you should have some useful NRQL queries to add to the NRQL Library @RyanVeitch :wink:


They’re not so useful for you @stefan_garnham - A large portion of it is custom instrumented events that report to an internal account.
The queries wouldn’t work for you without having that custom instrumentation report to your account.
The best I can do right now is to get a feature request filed for you - hoping to get the Synthetics team to open up those useful custom events to the public :smiley:


I second this request!

Thank you @stefan_garnham and @RyanVeitch for your thoughts. I hope others can weigh in too.

@Linds @ebeach, if there is a way, could you kindly broadcast this thread so that all members can give their input?


Ok, so the data may not be currently available and there may be some intellectual property reasons to not provide the information. I understand.

Still, it would be great to get an insight into how New Relic uses New Relic :slight_smile:

It’s late on a Friday afternoon so I make no apologies for using poor puns. I’m trying to catch up with @philweber :smiley:


Historically, we have used our ticketing system to run self-healing scripts. The concept will be the same for our cloud presence, but the vehicle for the scripts may be migrated to something else that the cloud team owns internally (Ansible, perhaps?).

The basics are that we define situations where self-healing makes sense, and when our various monitoring platforms send events to our ticketing system, the events are evaluated and scripts are executed as appropriate. The major benefit here is that the script results are added to the incident notes, thus elevating our ITSM nerdvana. :slight_smile:
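A rough Python sketch of that flow, for anyone who wants a starting point: an event is matched against a rule table, the matching script runs, and its output lands in the incident notes. The `Incident` class, the `RUNBOOK` table, and the placeholder command are hypothetical; a real setup would use the ticketing system's own API and real remediation scripts.

```python
import subprocess
import sys
from dataclasses import dataclass, field

@dataclass
class Incident:
    source: str       # which monitoring platform raised the event
    event_type: str   # e.g. "service_down"
    host: str
    notes: list = field(default_factory=list)

# Hypothetical rule table: event type -> command to run.
# The command here is a harmless placeholder, not a real remediation script.
RUNBOOK = {
    "service_down": [sys.executable, "-c", "print('service restarted')"],
}

def self_heal(incident, runbook=RUNBOOK, timeout=60):
    """Run the matching script, if any, and record the result in the notes."""
    command = runbook.get(incident.event_type)
    if command is None:
        incident.notes.append("no self-healing rule matched; left for a human")
        return False
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    output = (result.stdout + result.stderr).strip()
    incident.notes.append(
        f"self-heal for {incident.event_type} on {incident.host}: "
        f"exit {result.returncode}: {output}"
    )
    return result.returncode == 0
```

Events without a matching rule are noted and left alone, so only the situations explicitly defined as safe ever trigger a script.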


Ironically, I was just asked this question today. I think @anon85944545’s response is exactly what I would have said. All we want is a tool that can call a webhook on some other system and tell it to run some automation tasks. Auto-scaling will be handled by the specific cloud environment and not by our monitoring tool.

The use-case that has been socialized is auto-scaling disks.

Disk A hits 90%, extend by X GB. Disk A hits 90% again, extend by X GB. Disk A hits 90% a third time, extend the disk and wake someone up.

There are lots of different iterations of that model, but that is the general idea.
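As one possible iteration, the escalation logic could look something like the Python sketch below. The class name, the 10 GB step, and the "page on the third alert" rule are assumptions standing in for X and for whatever paging and disk tooling you use; it only returns the actions to take rather than performing them.

```python
from collections import defaultdict

class DiskExtender:
    """Track threshold alerts per disk and escalate after repeated extensions."""

    def __init__(self, step_gb=10, page_after=3):
        self.step_gb = step_gb        # the "X GB" in the description (assumed value)
        self.page_after = page_after  # extend-and-page from this alert onward
        self.extensions = defaultdict(int)

    def on_alert(self, disk, used_pct, threshold=90.0):
        """Return the list of actions to take for this alert."""
        if used_pct < threshold:
            return []
        self.extensions[disk] += 1
        actions = [f"extend {disk} by {self.step_gb} GB"]
        if self.extensions[disk] >= self.page_after:
            actions.append(f"page on-call about {disk}")
        return actions
```

A real version would probably also reset the counter after a quiet period, so three alerts spread over a year don't wake anyone.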



Hi All @Certified_Explorers,

I have a use case for which I need some ideas.

We have a process running on a server, and I have an alert on it. If the process goes down, I open an incident. The next step I want is to execute my PowerShell script to start that process.

I am having a hard time working out how to trigger the start of that service. We have Jira, but it's not scaled for use with RPC tasks.

Can this be done by NR directly, or by another tool interfacing with NR that can run a script on a remote Windows server?


You’re going to need an event management solution or some sort of API front end to take those events and execute some logic on them and then trigger your scripts. You can try Lambda but it doesn’t support PowerShell. Google Functions only does JS. Azure Functions does PowerShell though :wink:
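Whatever front end you choose, the piece that actually starts the service can be quite small. Here is a hedged Python sketch that builds a PowerShell remoting command and runs it. It assumes PowerShell remoting (WinRM) is enabled on the target, that the machine running this has `powershell.exe` on its path, and that the account has rights on the remote server; the function names and the name-validation pattern are made up for illustration.

```python
import re
import subprocess

# Allow-list for host and service names so alert data cannot inject commands.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")

def build_restart_command(host, service):
    """Build the argument list that starts `service` on `host` via remoting."""
    for value in (host, service):
        if not SAFE_NAME.match(value):
            raise ValueError(f"unsafe name: {value!r}")
    script = (
        f"Invoke-Command -ComputerName {host} "
        f"-ScriptBlock {{ Start-Service -Name '{service}' }}"
    )
    return ["powershell.exe", "-NoProfile", "-NonInteractive", "-Command", script]

def restart_service(host, service, timeout=120):
    """Run the command and return its exit code (0 normally means success)."""
    result = subprocess.run(
        build_restart_command(host, service),
        capture_output=True, text=True, timeout=timeout,
    )
    return result.returncode
```

A webhook handler would call something like `restart_service("web01", "MyService")` when the "process down" alert opens. Validating the names matters here because the values arrive from an external alert payload.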