AWS ALB 5xx error alert doesn't fire when another alert for 503 errors has already occurred, and the opened incident doesn't close automatically

I send AWS metrics to New Relic via Kinesis Firehose with Amazon CloudWatch Metric Streams.
After enabling the feature, I tested alerts for 5xx and 503 errors using NRQL queries.
The 503 error alert worked, but the 5xx error alert did not, even though the graph for the 5xx condition shows a critical violation.

My NRQL queries are below, and both conditions use the same settings.
The conditions are set in different policies.

SELECT sum(`aws.applicationelb.HTTPCode_ELB_503_Count`) FROM Metric WHERE `collector.name` = 'cloudwatch-metric-streams' AND `aws.accountId` = 'xxxxxxxxxxx'
SELECT sum(`aws.applicationelb.HTTPCode_ELB_5xx_Count`) FROM Metric WHERE `collector.name` = 'cloudwatch-metric-streams' AND `aws.accountId` = 'xxxxxxxxxxxxx'

https://one.newrelic.com/launcher/nrai.launcher?pane=eyJuZXJkbGV0SWQiOiJhbGVydGluZy11aS1jbGFzc2ljLnBvbGljaWVzIiwibmF2IjoiUG9saWNpZXMiLCJwb2xpY3lJZCI6IjEzMTQzODAifQ==&sidebars[0]=eyJuZXJkbGV0SWQiOiJucmFpLm5hdmlnYXRpb24tYmFyIiwibmF2IjoiUG9saWNpZXMifQ==&platform[accountId]=2802209
https://one.newrelic.com/launcher/nrai.launcher?pane=eyJuZXJkbGV0SWQiOiJhbGVydGluZy11aS1jbGFzc2ljLnBvbGljaWVzIiwibmF2IjoiUG9saWNpZXMiLCJwb2xpY3lJZCI6IjEzMTQzNzgifQ==&sidebars[0]=eyJuZXJkbGV0SWQiOiJucmFpLm5hdmlnYXRpb24tYmFyIiwibmF2IjoiUG9saWNpZXMifQ==&platform[accountId]=2802209

Also, the opened 503 incident didn't close automatically even though the error count has stayed at 0.

Please help, thanks!

For alerts not getting triggered, you could check whether there is any data gap. You might want to fill the data gaps with a suitable option:
(screenshot of the gap-filling options)
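
For example, running the condition query with a short TIMESERIES window (just a sketch; the TIMESERIES and SINCE clauses are my additions for inspection only) would show whether any one-minute buckets come back empty:

SELECT sum(`aws.applicationelb.HTTPCode_ELB_5xx_Count`) FROM Metric WHERE `collector.name` = 'cloudwatch-metric-streams' AND `aws.accountId` = 'xxxxxxxxxxxxx' TIMESERIES 1 minute SINCE 6 hours ago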

For the alert not getting closed automatically, you could use the loss-of-signal settings (e.g. closing open violations when the signal expires) while setting up the alert condition:

Please check this for more details: Create NRQL alert conditions | New Relic Documentation


Thanks for the reply!

I set Fill data gaps to Custom with a static value of 0,
and also set the signal to be considered lost after 5 minutes.

I tested it again, and strangely, the 503 error no longer causes an incident, but the 5xx error now does.

Maybe New Relic doesn't raise incidents from different policies at the same time?

My settings are below.
Both graphs show the metric above the critical threshold, but only one incident was opened.

503 errors (screenshot)

5xx errors (screenshot)


That’s strange.

I can confirm that New Relic triggers alerts from different policies; we use this in our application without any issues. Did you check the query data by running each query separately? You may want to re-verify the alert condition or metric name, or someone from the New Relic support team might be able to help with this specific case.
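
For example (just a sketch reusing your two metric names and filters; the AS labels are mine), you could sum both metrics in one query and compare them on the same chart:

SELECT sum(`aws.applicationelb.HTTPCode_ELB_503_Count`) AS '503', sum(`aws.applicationelb.HTTPCode_ELB_5xx_Count`) AS '5xx' FROM Metric WHERE `collector.name` = 'cloudwatch-metric-streams' AND `aws.accountId` = 'xxxxxxxxxxx' TIMESERIES SINCE 1 day ago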

I'm so confused…

Does the pricing plan affect this?
Our account is on the Original pricing plan, and we haven't paid anything so far.
This is also strange :frowning:

Hi @dev113

Are you still encountering this issue? I see that both todocu_development_alb_high_503_errors and todocu_development_alb_high_5xx_errors are opening violations (see this link).

I did notice that Evaluation Offset is set very low on both of these conditions (2 minutes on the 503 errors condition, 1 minute on the 5xx errors condition). I ran a check on the backend and saw that, in the past 3 days, some data came in late on the 5xx condition, meaning it failed to get evaluated. The data was ~140 seconds later than the offset, so I would recommend raising Evaluation Offset on both conditions to 4 or 5 minutes to account for latent data. This will make the conditions more reliable.
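
If it helps, a rough way to gauge how far behind the stream is running (just a sketch reusing your WHERE clause, not the exact check I ran on the backend) is to query the newest timestamp and compare it with the current time:

SELECT latest(timestamp) FROM Metric WHERE `collector.name` = 'cloudwatch-metric-streams' AND `aws.accountId` = 'xxxxxxxxxxx' SINCE 30 minutes ago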

If you’d like to learn more about latent data and how it affects NRQL alert conditions, take a look at this article.

If you’re still encountering false negatives on either of these conditions, please link to a query showing the data breaching the threshold (not a screenshot) and I’ll investigate.
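
Something along these lines (a sketch based on your 5xx condition query, with a TIMESERIES clause added to cover the same 3 days) would work:

SELECT sum(`aws.applicationelb.HTTPCode_ELB_5xx_Count`) FROM Metric WHERE `collector.name` = 'cloudwatch-metric-streams' AND `aws.accountId` = 'xxxxxxxxxxxxx' TIMESERIES MAX SINCE 3 days ago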


@Fidelicatessen thx!

Your answer resolved the issue.
I expected alerts to fire more quickly because I'm using CloudWatch Metric Streams, but that's OK.