New incidents/alerts are not opening when there is an already open incident

I am trying to achieve the following for my service.

The requirement

  • If 5 or more 500 errors are thrown from the application within 1 minute, an incident needs to be opened as soon as possible.
  • The actual requirement says it should happen well within 5 seconds, but the earliest New Relic can offer would work.
  • The incidents should not close automatically even though there have been no new 500s. Someone will assess and close them manually within 24 hours after doing an RCA.

What I have got working

  • I have created an alert condition with the NRQL query below:
SELECT count(*) FROM TransactionError WHERE appName = 'XYZ-service' and http.statusCode = 500
  • The example graph shows the data properly, including how and when alerts should have been triggered.

  • I have defined the condition so that more than 4 occurrences within 1 minute raise an incident. Since I don’t want these to be closed automatically, I have unchecked the option to close alerts automatically under signal loss.

  • Since these are rare errors, I have used the event timer as the streaming method. Since I want the alert ASAP, I have chosen a window size of only 1 minute, with the timer set to aggregate after 5 seconds. I have also configured gap filling to set null values to 0.

  • As seen in the chart above, there should have been 5 incidents, but there are only 3, and they have been closed automatically (screenshot below).

  • Alert policy is set to One issue per incident
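
To make the threshold I am describing concrete, here is a minimal Python sketch of the check I expect. It is only a simplified model, not how New Relic evaluates conditions internally: it buckets error timestamps into fixed 1-minute windows and flags every window with more than 4 errors. The function name `breaching_windows` and the sample timestamps are hypothetical.

```python
from collections import Counter

WINDOW_SECONDS = 60  # window size from the condition above
THRESHOLD = 4        # "more than 4 occurrences"


def breaching_windows(error_timestamps, window_seconds=WINDOW_SECONDS,
                      threshold=THRESHOLD):
    """Return the start (in seconds) of each window whose count exceeds the threshold."""
    # Assign every timestamp to the fixed window that contains it.
    counts = Counter(
        (int(ts) // window_seconds) * window_seconds for ts in error_timestamps
    )
    return sorted(start for start, n in counts.items() if n > threshold)


# Hypothetical timestamps (seconds) of 500 errors, not real data.
sample = [1, 5, 9, 20, 30, 61, 62, 125, 126, 127, 128, 129, 130]
print(breaching_windows(sample))
```

With this sample, the first and third windows each contain more than 4 errors, so I would expect one incident for each of them.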

Please help with below things:

  • How can I open new incidents even when there are alerts already open for the same condition?
  • What is the earliest New Relic can generate alerts for the incidents? Right now it seems to take more than 2 minutes for the email notification to arrive.

Link to the alert: https://one.newrelic.com/nrai/alerts-classic/policies?account=3689718&state=f6012476-8ded-1a4e-cffb-e58ef53f1af9

Thank you in advance.

Hi @mritunjay.dubey ,

Welcome to Explorers Hub.
Notifications are sent based on alert conditions. From your screenshot, we can see that the value needs to exceed the critical threshold for 1 minute. This means the issue is created, and the notification sent, one minute after the breach is detected, so the time between the issue and the notification is under 2 minutes.
The current limitation on alert conditions is that the threshold duration must be at least 60 seconds.

As for the auto-closing issue, this should be logged as a feature request. Our alert condition closes the issue when the opposite of the condition is met. In your case, if the threshold is breached once within 1 minute, the issue is created, but if there is no threshold breach in the next minute, the issue is closed.
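
As a rough illustration of that open/close behaviour, here is a small Python sketch. It is a simplified model under stated assumptions, not our actual implementation: it evaluates one count per 1-minute window, opens an incident on the first breaching window, ignores further breaches while that incident is open, and closes it on the first window that does not breach. The function name `simulate_incidents` and the per-window counts are hypothetical.

```python
def simulate_incidents(window_counts, threshold=4):
    """Return (open_index, close_index) pairs; close_index is None if still open."""
    incidents = []
    open_start = None  # index of the window that opened the current incident
    for i, count in enumerate(window_counts):
        breaching = count > threshold
        if breaching and open_start is None:
            # First breach while nothing is open: a new incident opens.
            open_start = i
        elif not breaching and open_start is not None:
            # Opposite of the condition is met: the open incident closes.
            incidents.append((open_start, i))
            open_start = None
        # A breach while an incident is already open creates nothing new.
    if open_start is not None:
        incidents.append((open_start, None))
    return incidents


# Hypothetical per-minute 500 counts: five breaching windows in total.
counts = [5, 0, 6, 7, 0, 5, 5, 0]
print(simulate_incidents(counts))
```

In this made-up series, five windows breach the threshold but only three incidents are opened, because consecutive breaching windows fall into an incident that is already open.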

Let us know if you have additional questions.

KR, Marin