Alerts stopped working

As this seems to be the only way to get support now, can you let me know why email and Slack alerts have stopped on my account?

@sandy - can you share some links for the alert incidents you expect to have sent notifications to Slack/Email?

I can take a look internally for any potential issues with those.

https://alerts.newrelic.com/accounts/461476/incidents/56461455/violations?id=550267088

Hey @sandy -

The very first violation in that incident comes from February this year - and is somehow still open.

While that violation is open, the incident will not close.

Notifications rely on the incident lifecycle - Open, Acknowledged, Closed. You get a notification for each of these.

Since that incident has not yet closed, you have not received a new notification from it since February.
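To illustrate why one stuck violation can silence everything that follows, here is a toy model of the lifecycle described above. It is only a sketch and not how New Relic is implemented: the `Incident` class, its method names, and the violation IDs are all made up for illustration, and it assumes every new violation rolls up into the incident that is already open (which depends on your incident preference).

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Toy incident (hypothetical): notifies only on lifecycle transitions."""
    open_violations: set = field(default_factory=set)
    state: str = "closed"
    notifications: list = field(default_factory=list)

    def _notify(self, event):
        # Stand-in for sending a Slack/email notification.
        self.notifications.append(event)

    def violation_opened(self, violation_id):
        self.open_violations.add(violation_id)
        if self.state == "closed":
            self.state = "open"
            self._notify("opened")
        # Otherwise the violation joins the already-open incident silently.

    def acknowledge(self):
        if self.state == "open":
            self.state = "acknowledged"
            self._notify("acknowledged")

    def violation_closed(self, violation_id):
        self.open_violations.discard(violation_id)
        # The incident can only close once no violations remain open.
        if not self.open_violations and self.state != "closed":
            self.state = "closed"
            self._notify("closed")


incident = Incident()
incident.violation_opened("feb-stuck")    # notification: opened
incident.violation_opened("ping-check")   # silent, incident is already open
incident.violation_closed("ping-check")   # silent, "feb-stuck" is still open
stuck_notifications = list(incident.notifications)

incident.violation_closed("feb-stuck")    # manual close -> notification: closed
incident.violation_opened("next-outage")  # new incident -> notification: opened
```

In this model the only notification sent while "feb-stuck" stays open is the original "opened" one. Once it is closed, the incident closes and the next violation opens a fresh incident that notifies again.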

The following link should take you to the violation that remains open.

https://alerts.newrelic.com/accounts/461476/incidents/56461455/violations?id=252762012

You’ll see a button marked "manually close violation". Clicking it closes the violation, which allows the incident to close, and you should then be notified.


Side Note: Reconfiguring your incident preference can help you avoid such long-running incidents, and can help ensure you get the notifications you need.

Hi,

Are you saying we need to manually acknowledge each violation, otherwise any subsequent violations will not be alerted on and are effectively hidden? And does this apply even if the initial violation, e.g. a ping check, had recovered after 5 mins?

@sandy - Not necessarily -

You shouldn’t need to manually close violations (unless you have opted in to violations not auto-closing).

Typically, when the opposite of a condition threshold is met, the violation will close. For example, if your threshold is "Transaction Error rate greater than 5% for at least 10 minutes", then for the violation triggered by this condition to close, the application must have a Transaction Error rate lower than 5% for at least 10 minutes.
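As a rough sketch of that open/close rule, the snippet below walks through a series of error rates and tracks when a violation would open and close. It assumes one error-rate sample per minute, and `violation_timeline` is a hypothetical helper written for this example, not the actual New Relic evaluator.

```python
def violation_timeline(error_rates, threshold=5.0, duration=10):
    """Return, for each minute, whether the violation is open (True) or closed.

    Assumes one error-rate sample (in percent) per minute.
    """
    is_open = False
    streak_above = 0  # consecutive minutes above the threshold
    streak_below = 0  # consecutive minutes below the threshold
    timeline = []
    for rate in error_rates:
        streak_above = streak_above + 1 if rate > threshold else 0
        streak_below = streak_below + 1 if rate < threshold else 0
        if not is_open and streak_above >= duration:
            is_open = True   # threshold breached for long enough
        elif is_open and streak_below >= duration:
            is_open = False  # opposite of the threshold met for long enough
        timeline.append(is_open)
    return timeline


# 10 minutes at 8% opens the violation; 10 minutes at 2% closes it.
recovered = violation_timeline([8.0] * 10 + [2.0] * 10)

# A 5-minute dip is too short to close it, so the violation stays open.
brief_dip = violation_timeline([8.0] * 10 + [2.0] * 5 + [8.0] * 5)
```

Here `recovered` ends with the violation closed, while `brief_dip` ends with it still open, because the recovery never lasted the full 10 minutes.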

Violations are what trigger incidents to open. Acknowledging an incident is not required for it to close - however, all of its violations must end before the incident can close.

In your case, since the violation has been open since February - it’s very likely that its status got stuck on our side, and manually closing it is the only way to end the incident.

Typically this isn’t required though - since normal operating behaviour is for violations to close themselves.

Ok that makes more sense - something got stuck on the New Relic side.

We’ve been using alerts for a while, and up to now they’ve been working as I’d expect with no manual interventions.

I’ll manually clear the stuck ones and see if things go back to normal.

Thanks @sandy - let us know how that goes & if you have any further questions.

I have a downtime alert policy. On the summary page it shows 1 open incident for each site checked (see screenshot).

When I click through though, there are no open incidents, so I can’t clear them, and can’t be fully sure that future alerts will be received. Also, under alerts / open incidents there are no Downtime related ones, yet we did miss receiving downtime alerts as described above. Can you check if we have some data corruption on our account?

@sandy - I think those open incidents just reference the same open violation here:
https://alerts.newrelic.com/accounts/461476/incidents/56461455/violations?id=252762012

Could you try manually closing that?