Starting last week, our High Disk Util alarms stopped closing on their own, even after conditions drop below the threshold. The problem is limited to two servers and three storage volumes. When I follow up on an alarm, I find that the conditions are below the threshold, so I close the alarm manually. I have restarted the NR Infra agents, and yesterday they were upgraded to v1.18. This only affects storage; high CPU alarms clear normally. I'm trying to figure out what changed, as this all worked fine until Wed June 2. Thank you for any assistance.
This is related to the topic I created a few minutes ago. We also have a few Warnings that don't seem to close after conditions drop below the threshold. I have not seen a repeatable pattern with these like I have with the alarms. Is there a way to manually close Warnings, as you can with Alarms? Thank you.
This continues to be a problem on two host machines. The NR Infrastructure agents are at v1.18, and one of the servers was patched and restarted to see whether that made a difference. The alarm is for high disk utilization, as you can see here.
And this is what the agent reported to New Relic. I manually closed the alarm around 7 AM, but this clearly shows that the conditions were below the threshold for many hours and the alarm still did not clear. This problem started last week, and I am not aware of any changes to our environment that would explain it. All other NR alarms work as expected, including memory and CPU alarms on these same hosts; it seems to affect only storage. Thank you for any assistance.
Would you be able to send over a permalink to the condition and one for the manually closed incident?
You can create a permalink by clicking Share or the three-dot share icon in the top right of the page. This allows a New Relic employee to view the exact account, page, and time period you are viewing. Here is a document that explains this in more detail: https://docs.newrelic.com/whats-new/share-dashboards-curated-views-permalinks
Yes, I will do that, thank you.
UPDATE (20210616): I have been experimenting with changing the threshold and have learned more. So far, when the alert condition is set to 100% for 60 or 59 minutes, the alarms do not close on their own once conditions drop below the threshold. When it is set to 30 or 50 minutes, they do close. I just set it to 55 minutes to try to find the breaking point.
Here is an alarm from yesterday that I closed manually. At that time, the threshold was set to 100% for 59 minutes.
As you can see, utilization fell below 100% many hours before I closed it.
Hi, I just reread your comment and realized that you wanted a permalink for the alert condition as well. Here it is: https://one.newrelic.com/-/06vjAVvrgQP
I have been experimenting with different durations to find the breaking point. So far, conditions set to 59 and 60 minutes do not auto-close, but those set to 30, 50, and 55 minutes do. It is now set to 56 minutes, so I will know tomorrow whether it works.