Alerts do not close automatically

Hello everyone!!

Could you please help or guide me with the following situation?

I am using a synthetic monitor (API script) to monitor the availability (ICMP ping) of the company’s internal servers.
Monitoring works as follows:

Problem:

  • Alerts are not closed automatically when code = 1 is detected again on the server. I have to wait for the “Close open incidents after:” period to elapse before the alert closes.

What am I doing wrong that prevents the alerts from closing automatically?

Alert condition permalink:
https://onenr.io/0nQx34YEBjV

I appreciate your help.

Hi @NewRelic.CO

I just took a look at your alert condition and recommend two changes. One is vital for closing your incidents in a timely manner, the other will cause them to open more quickly and reliably.

  1. Add a Loss of Signal threshold. This is because your query returns NULL values when there are no results, and NULL can’t be evaluated numerically.
  2. Change your streaming aggregation method to use Event Timer instead of Event Flow. This will reduce your time-to-detect, since aggregation windows will get evaluated more quickly (or at all). You can use a minimum value of 5 seconds for Event Timer, which will work well here.
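
To illustrate the first point, here is a minimal Python sketch of why a NULL result can leave an incident stuck open. It is a toy model, not New Relic's actual evaluation logic: the function name `evaluate_windows`, the "open below threshold" rule, and the sample values are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def evaluate_windows(values: Sequence[Optional[float]], threshold: float = 1.0) -> List[bool]:
    """Return the incident state (True = open) after each aggregation window.

    Hypothetical rule: a numeric value below the threshold opens the incident,
    a numeric value at or above it closes the incident, and a window with no
    value (None, standing in for NULL) cannot be compared, so the state is
    left unchanged.
    """
    incident_open = False
    states = []
    for value in values:
        if value is None:
            pass  # nothing to compare against the threshold
        elif value < threshold:
            incident_open = True
        else:
            incident_open = False
        states.append(incident_open)
    return states


# A failure followed only by empty windows: the incident never closes.
print(evaluate_windows([1, 0, None, None]))
# A failure followed by a numeric recovery value: the incident closes.
print(evaluate_windows([1, 0, 1]))
```

In this model only a numeric value can change the state, which is why a separate Loss of Signal rule is needed to act on windows that contain no data.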

I hope these suggestions help – let us know how it goes!

Thanks a lot! @Fidelicatessen

I’m going to make the changes to the alert condition and wait for the next alert so I can follow up.

Something I forgot to mention initially: the polling frequency for the synthetic transaction is 5 minutes.

Could we leave this discussion open while we verify that it works correctly?

Of course! If my suggestions don’t help, I may defer this to a support engineer, however.

Hi @Fidelicatessen!

Two devices had packet loss and the NR alert was indeed generated (screenshots shared).

However, as the ping monitoring logs in the screenshot show, the devices recovered connectivity on the next poll, yet the alert has not closed.

Would it be better to replace the zero reported when connectivity is lost with 2 or some other number?

Alert Condition: https://onenr.io/0qwLdNKxmw5

Hi @NewRelic.CO

This is a wrinkle I had not expected. I’m going to ask technical support to help with this. You should see one of my support engineer colleagues in this thread soon.

Hi @NewRelic.CO - Can you provide a link to that alert Issue in your screenshot? Thanks!

Hi @dkoyano!
Here are the permalinks of the alerts raised this weekend.

Alerts:
https://onenr.io/0yw4pE7q3w3
https://onenr.io/0LREW8XqlQa

Alert condition:
https://onenr.io/0ZQWA9PL3RW

Thank you very much.

@NewRelic.CO - I took a look at this, and you will want a Loss of Signal threshold set up to close the violation. Please see the Explorer’s Hub here for more information on this.

Hi @dkoyano.
Thanks a lot.

  1. After configuring Loss of Signal, the active alerts were closed and a new loss-of-signal alert was created.
  2. After the devices responded to ping again, the loss-of-signal alerts were not closed automatically.

Note: The alerted devices have been responding to ping for more than 40 minutes.

Condition link: https://onenr.io/08jqnE6DlQl

Alert link: https://onenr.io/0oQD4Ap8NQy

Screenshots shared as evidence.

Hi @dkoyano and @Fidelicatessen ,

Could you suggest another way to close the alerts automatically?

Thanks.

Hi @NewRelic.CO ,

This is the solution we currently have in place for managing alerts.
From your previous post and screenshot, I noticed that Loss of Signal is set to 5 minutes. To auto-close a Loss of Signal violation, the signal must be present for 5 consecutive minutes; if the value drops back to 0 (NULL) at any point, the counter resets.
Given your condition, you might want to use only the option to close open violations with the Loss of Signal feature.
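
The reset behaviour described above can be sketched in Python as follows. This is only a toy model of the rule as stated in this post, not the platform's implementation: the function name `minutes_until_close`, the one-sample-per-minute input, and the sample data are hypothetical.

```python
from typing import Optional, Sequence


def minutes_until_close(signal_present: Sequence[bool], required_minutes: int = 5) -> Optional[int]:
    """Return the 1-based minute at which the violation would auto-close.

    Assumed rule: the signal must be present for `required_minutes`
    consecutive minutes, and any minute without signal resets the counter.
    Returns None if the requirement is never met.
    """
    consecutive = 0
    for minute, present in enumerate(signal_present, start=1):
        if present:
            consecutive += 1
            if consecutive >= required_minutes:
                return minute
        else:
            consecutive = 0  # a gap resets the counter
    return None


# Three minutes of signal, one gap, then five more minutes of signal.
print(minutes_until_close([True] * 3 + [False] + [True] * 5))
# Signal that keeps dropping out never accumulates five consecutive minutes.
print(minutes_until_close([True, False] * 10))
```

Under this reading, a signal that arrives only intermittently would keep resetting the counter. Whether sparse data from a 5-minute polling interval is treated this way is not confirmed in this thread.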

Hope this helps.