
Incorrect reporting of "Containers had restarted too much"?

alerts

#1

We are seeing a peculiar issue and wanted to know if anyone else is seeing the same.

I see that there is a “default Kubernetes alert policy” out of the box when you integrate Kubernetes with New Relic. It has some good alert conditions that we wanted to make use of. However, a particular alert condition called “Container restarted too much” is very noisy and is reporting incorrect data from the past. Our containers crashed at some point during the operations maintenance schedule (three weeks back) and generated a bunch of violations/incidents, which we cleaned up. However, this particular alert condition does not seem to go away, even after acknowledging the alerts, closing the violations manually, and confirming that there is currently no issue with the containers themselves.

We have a setting that should alert us if the number of restarts is above 100 in 5 minutes. However, it appears that it is considering a 5-minute timeframe from the past and triggering alerts as soon as we enable the condition.

How can I have this condition look at the current data instead of past data?


#2

Hi @sai.annavajjula

This is a metric we pull directly from Kubernetes that steadily climbs based on how many restarts have happened over the life of the container. I would recommend disabling this alert condition if it is too noisy.
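For reference, the kind of NRQL query this condition is built on might look roughly like the following. This is only a sketch; the K8sContainerSample event type and restartCount attribute are assumptions based on the Kubernetes integration and may differ in your account or integration version:

SELECT max(restartCount) FROM K8sContainerSample FACET podName, containerName

Because that counter only ever grows over the life of a container, a threshold like “above 100 in 5 minutes” will keep opening violations once the lifetime total has passed 100, even if no new restarts are happening in the current window.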

You may also want to start a thread in the Feature Ideas section requesting that we add a delta on this count each minute, which would tell you how many restarts a container has had since the last check. That way, other users could share their workarounds and add their votes to your idea.
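In the meantime, if you want to experiment with a workaround yourself, one option is a custom NRQL condition that alerts on the change in the counter rather than its absolute value, for example with NRQL’s derivative() function. This is a sketch under the same assumptions about event and attribute names as above, and you should check that derivative() is supported in alert conditions on your account:

SELECT derivative(restartCount, 5 minutes) FROM K8sContainerSample FACET podName, containerName

A condition on a query like this would only open violations while restarts are actually accumulating during the evaluation window, rather than whenever the lifetime total is above the threshold.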