Custom alerts for HTTP 500 errors

Hi,

I would like to trigger an alert in Slack each time our Ruby application server encounters a 500 error.
Is it possible to create an alert policy based on the HTTP response code or the server log content?

At present it’s not possible to create an alert based off of a specific HTTP response code. We’ll put in a feature request for you on that.

In the meantime, you could add some logic to your app that sends up a custom metric, reporting 0 when you don’t see a 500 response and 1 when you do. If you create an alert policy condition that opens an incident whenever the value of the metric is > 0 at least once in 5 minutes, this could alert you when you get these errors. You would need to report the metric every minute, whether it’s 0 or 1, in order to get proper incident opening and closure, but it’s an option.
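As a rough sketch of that idea in Ruby (not an official implementation): the class name FiveHundredFlag and the metric name are made up for illustration, and the recorder is passed in so the example runs without the New Relic gem installed.

```ruby
# Sketch only: FiveHundredFlag and METRIC_NAME are hypothetical names.
# The recorder is injected so this runs without the New Relic agent.
class FiveHundredFlag
  METRIC_NAME = 'Custom/Errors/Http500Seen' # assumed metric name

  def initialize(recorder)
    @recorder = recorder # callable taking (metric_name, value)
    @seen = false
    @lock = Mutex.new
  end

  # Call once per response, for example from a Rack middleware.
  def observe(status)
    @lock.synchronize { @seen = true } if status.to_i == 500
  end

  # Call once per minute. Reports 1 if a 500 was seen since the
  # previous flush, otherwise 0, and resets the flag.
  def flush
    value = @lock.synchronize do
      seen = @seen
      @seen = false
      seen ? 1 : 0
    end
    @recorder.call(METRIC_NAME, value)
    value
  end
end

reported = []
flag = FiveHundredFlag.new(->(name, value) { reported << [name, value] })

flag.observe(200)
flag.flush # minute 1: no 500 seen
flag.observe(500)
flag.observe(200)
flag.flush # minute 2: a 500 was seen
flag.flush # minute 3: flag was reset, nothing new seen
```

In a real app with the Ruby agent loaded, the recorder could presumably be a lambda that calls NewRelic::Agent.record_metric(name, value), with flush driven by a once-a-minute timer. Reporting the 0 values matters as much as the 1 values, since the condition needs them to close the incident.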

Hope this helps get you started!

Do you have any sense for where this is in the priority list? I see similar (if not the same) feature requests going back many months across a bunch of different platforms. This seems like pretty basic functionality that I’m really surprised doesn’t exist.

I don’t have an ETA for this but I can assure you that customer requests are always looked at by the product manager when decisions on where to spend engineering time are made.

I think if you allow me to explain a little about our philosophy and what’s going on inside New Relic it might make more sense why this isn’t necessarily something that can be just slipped into Alerts. It actually goes back to how information about errors is collected by the agent monitoring your app.

When New Relic is collecting information about errors that are seen, we don’t instrument or trace every occurrence of every error. We’re trying to have a minimal impact on your application’s performance, so we collect a representative sampling of the errors seen. We’ll send up traces of some of these that you can view in the APM UI so that you can try to get a handle on what’s happening with your app. If we instrumented and traced every error, then New Relic might wind up having a big impact on application performance, or even bandwidth usage, in the event that a lot of errors occurred.

We have a similar philosophy for collecting slow transaction traces. Most of the agents only collect a trace on the slowest transaction that ended in a given minute.

I hope this helps shed light on why alerting on every occurrence of a certain error code is not trivial. We’re not actually guaranteeing that we collect information about every single error seen. Since you can only alert off of data that is sent up to New Relic, if we’re collecting a representative sampling then you might miss a notification you’d expect.

This is why I’ve suggested a workaround where you send up a custom metric for the error code you see. You could also write a plugin that sends up every response code you see, and alert either when the percentage of unwanted response codes gets too high or as soon as you see one at all.

Please let me know if you’ve got more questions or if any of this doesn’t make sense. We want you to be able to get the relevant and timely notifications you need and sometimes the product doesn’t work or wasn’t designed to work in quite the way you’d like. I know that can be frustrating, especially when the reasoning behind why it behaves the way it does isn’t clear. Hopefully this clears things up some for you.

Thanks for the detailed reply!

Any time! I’m glad you found the information useful. It’s hard to see the entire picture from the outside sometimes and I hate to think someone is frustrated as a result of that.

Hi Guys,

Triggering an alert for HTTP 500 errors is also a feature we would like to use for our clients. We don’t specifically need to be alerted for every single instance of an HTTP 500 error; however, it would be beneficial for us to be notified when a spike in these errors has occurred within a specific time.

For example: Setting up an alert condition where ‘X’ amount of HTTP 500 errors have occurred within a 10 minute period.

Another example: Alternatively, this could work off a percentage metric. If our accepted error rate for HTTP 500 errors is 1% within a 10 minute period, we would be alerted when the HTTP 500 error rate went above 2% within a 10 minute time frame.
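To make the two conditions concrete, here is a minimal Ruby sketch of the arithmetic over a 10 minute window. ErrorWindow is a hypothetical helper, not anything in New Relic, and the count threshold of 3 standing in for ‘X’ is an assumed value.

```ruby
# Hypothetical helper: keeps [timestamp, status] pairs for the
# last `window` seconds (600 seconds = 10 minutes).
class ErrorWindow
  def initialize(window: 600)
    @window = window
    @events = []
  end

  def record(status, at)
    @events << [at, status.to_i]
    self
  end

  # Drops events older than the window, then returns
  # [number_of_500s, rate_of_500s_in_percent].
  def stats(now)
    @events.reject! { |at, _| at <= now - @window }
    total = @events.size
    errors = @events.count { |_, status| status == 500 }
    rate = total.zero? ? 0.0 : (errors * 100.0 / total)
    [errors, rate]
  end
end

window = ErrorWindow.new(window: 600)
t0 = 0 # integer seconds; a real app might use Time.now.to_i
97.times { |i| window.record(200, t0 + i) }
3.times  { |i| window.record(500, t0 + 100 + i) }

errors, rate = window.stats(t0 + 300)
count_alert = errors >= 3 # 'X' errors in 10 minutes, X = 3 assumed
rate_alert  = rate > 2.0  # the 2% threshold from the example above
```

With 3 errors out of 100 requests, both conditions would fire here. The rate-based condition scales with traffic, while the count-based one is easier to reason about on low-volume apps.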

Specifically for our major clients, we want to be aware of when these scenarios happen. From a Support perspective, if we are alerted that a spike has occurred, we could proactively dig into our logs (at that point in time) and work towards understanding why there has been an influx of these errors in that specific time period. Essentially, this would let us handle an incident more efficiently and, if required, communicate with our client(s) in advance.

Appreciate your time, feel free to make any comments.

Thanks
Pierre

Hi @pierre_dang. Thank you very much for your detailed input on why you’d like to see 500 errors reported in Alerts. I’ve gone ahead and added your suggestions and use case to the feature request for our product managers for their review.

Are there any updates on this? Is it something that will be implemented soon, or has it already been implemented?

No exciting updates to share on this. Outside of using the custom metric API and alerting on that signal (as David mentioned above), we do want to make this easier and have roadmap plans to address it long term.

How about allowing a NRQL query to be defined for custom alerts? This would make it possible to trigger on 5xx errors as well as on other aggregated metrics. We already have a nice graph in Insights that shows only errors with a 5xx response, and I’d like to trigger an alert when the rate of these passes a certain threshold.

NRQL Alerting is something that we’re actively developing right now, actually! Keep an eye out for news from @NateHeinrich here in the Alerts topic. I’ll file a feature request on your behalf so that you can get notified of any updates if/when they’re available.

Hello,
We are currently trialing New Relic and today had a period of 500 server error responses from UAT.
“The man” had us turn to our new monitoring system to find out what was happening. NOTHING. Not a thing. So I googled, only to find out that New Relic doesn’t capture 500 server errors. How can this be?

Having previous experience with AppDynamics, I thought that this would be a default trigger point for an alert. Could you please give a timeframe for the implementation of NRQL alerting? I need to somehow explain this away if we have any hope of proceeding with the product.

Chris

@chris.tranter Unfortunately it is not possible to provide timeframes for feature requests. In some cases our developers will post about an upcoming feature when they have integrated the code into a release that is coming out soon, and they are confident the feature will be in that release. That is the closest we can come to providing an update. We may have a feature scoped for a month or two out, but due to technical issues with implementation, have to push that to four or five months out (or longer, depending on the circumstance). It would be unfair to raise your expectations for a specific timeframe, then end up having to postpone the release.

I would like to address the concept regarding capturing 500 errors. It would be more accurate to say [most of] our agents do not capture handled exceptions. Unhandled exceptions are captured. More often than not, the issue with 500 errors is that we may not get a stack trace. A notable exception is Node.js, where both handled and unhandled exceptions are captured, but we don’t get a trace for unhandled exceptions, as this would terminate the Node application, including our agent.

Right now Alerts can be triggered on a percentage of errors threshold. As this thread points out, pinpointing specific errors is the feature that has been requested as there is no specific provision for this now. In fact, this type of request is just one of the reasons we’re working so diligently on NRQL alerting. We just do not have any specifics right now on when it will be available.

Thanks for the reply, Kyle.

It sounds like NRQL is the answer I need to plug the gaps.

Thanks
Chris

Hi Chris, could you help me with how to achieve this?

Could you give us a little information about your specific use case, @prasanna.s?

Here is a NRQL query that we use for alerting on specific error codes (using 500 as an example):

SELECT count(*) FROM TransactionError WHERE appId = <App Product #> AND error.class = 'HttpClientError 500'

You can find out how to pull the appId from this link in the New Relic Docs: https://docs.newrelic.com/docs/apis/rest-api-v2/requirements/find-product-id

Thanks so much for sharing that example @jayson.conry - that’s really great stuff! @prasanna.s - let us know if that helps you alert on 500 errors!

Hello,

Has the feature for tracking HTTP errors based on response code been added to New Relic? Or is there any workaround where I can track HTTP error codes and break them down by host?

Please let me know. We are in need of this feature, and any help from the NR team would be appreciated :)

Regards,
-Vishal Poptani