I am trying to achieve the following for my service.
- If 5 or more 500 errors are thrown from the application within 1 min, an incident needs to be opened as soon as possible.
- The actual requirement says that it should be well within 5 seconds, but the earliest New Relic can offer would work.
- The incidents should not close automatically even if there have been no new 500s. Someone will assess and close them manually within 24 hours after doing an RCA.
I have created an alert condition with the NRQL query below:
SELECT count(*) FROM TransactionError WHERE appName = 'XYZ-service' and http.statusCode = 500
The example graph shows the data properly, including how and when alerts should have been triggered.
I have defined the condition so that more than 4 occurrences within 1 min should raise an incident. Since I don't want these to be closed automatically, I have unchecked the option to close alerts automatically in the signal-loss settings.
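To make the threshold rule concrete, here is a minimal Python sketch of how I read it. It assumes fixed (non-sliding) 1-minute windows and is only an illustration of the intended rule, not New Relic's actual evaluation logic. The function name and the sample timestamps are made up for the example.

```python
from collections import Counter

WINDOW_SECONDS = 60  # 1-minute window, as in the condition
THRESHOLD = 4        # incident when there are more than 4 occurrences


def breaching_windows(error_timestamps, window=WINDOW_SECONDS, threshold=THRESHOLD):
    """Return the start times (in seconds) of windows whose count exceeds the threshold."""
    # Bucket each timestamp into the fixed window that contains it.
    counts = Counter((int(ts) // window) * window for ts in error_timestamps)
    return sorted(start for start, n in counts.items() if n > threshold)


# Hypothetical timestamps (seconds since some start) of 500 errors.
errors = [3, 10, 12, 40, 59, 61, 70, 125, 126, 127, 128, 129, 170]
print(breaching_windows(errors))  # [0, 120]
```

With this sample data, the windows starting at 0 s and 120 s each contain more than 4 errors, so I would expect one incident for each of them.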
Since these errors are rare, I have used event timer as the streaming method. Also, since I want alerts as soon as possible, I have chosen a window size of only 1 min, with the timer set to aggregate after 5 secs. I have also defined gap-filling to set the null values to
As seen in the chart above, there should have been 5 incidents, but there are only 3, and they have been closed automatically. Screenshot below.
The alert policy is set to One issue per incident.
Please help with the following:
- How can I open incidents even when there are alerts already open for the same condition?
- What is the earliest New Relic can generate alerts for the incidents? Right now it seems to take more than 2 mins for the email notification to arrive.
Thank you in advance.