Problem: I’m asking NR to alert me if it sees 5 specific errors in 5 minutes, but it sends an alert when the error occurs just once in 5 minutes.
I have an app, “API”, which acts as a kind of traffic cop for our product: it accepts client connections and connects out to several services. I have built an alert intended to notify me when the app returns more than 5 HTTP 500 errors to clients in a 5-minute period. The alert configuration looks like this:
When target application
Search Metric Names… “Network/Inbound/StatusCode/500”
has “a minimum value” “above”
“5” “at least once in” “5 minutes”
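To make the intent concrete, here is a rough Python sketch of the behavior I expect from that condition. The function name, the timestamp list, and the parameters are my own illustration, not NR's implementation or API:

```python
def should_alert(error_timestamps, now, window_seconds=300, threshold=5):
    """Return True if more than `threshold` HTTP 500s fell in the window ending at `now`.

    `error_timestamps` is a hypothetical list of times (in seconds) at which
    the app returned a 500 to a client. This models the intent only.
    """
    # Keep only the errors inside the trailing 5-minute window.
    recent = [t for t in error_timestamps if now - window_seconds < t <= now]
    # "Above 5" means strictly more than 5 errors in the window.
    return len(recent) > threshold


# A single 500 in the window should not alert.
print(should_alert([100], now=120))
# Six 500s inside the same window should alert.
print(should_alert([10, 20, 30, 40, 50, 60], now=100))
```

In other words, I expect a count of errors summed over the whole window to be compared against 5, not any single data point.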
However, NR triggers an alert for any 500 that the app returns to a client. When we dig into it, we see two different graphs depending on where we look:
- In “View Incident” view, the graph spikes up to 5 and then comes back down to 0. It sent an alert because the spike hit “5”.
- When moving to Overview and Error views, those graphs show 1 error.
To understand what’s happening, I’m looking for a few things:
- Detailed documentation about the “Network/Inbound/StatusCode/500” metric, so I can figure out whether it’s counting what I think it’s counting.
- A recommendation on whether I should be using something other than “a minimum value”, and why.
- A recommendation on whether I should be using something other than “at least once in”, and why.
TIA … Todd