Issues with alerting

Hi there

I am attempting to get my alert to trigger - but it's not working. The condition is ‘equals 0 at least once in 5 minutes’.

I can see in the graph that this condition has been met several times over the last few hours, but still no notification is raised in Slack.

Hi @faf29fba9eab7b9b6e70

I ran the same query you are trying to alert on in the Query Builder, though I changed the aggregation from Sum to Latest, and as you can see, the most recent data is not 0s; it is NULLs.

The Alerts evaluator requires a numeric value to evaluate on, and NULLs do not equal 0s in this evaluation.

As far as I can see, this metric has only come in once in the past month, as you can see here:

So an alert condition targeting when this value drops to zero will not work: since the value is not hitting zero but is NULL instead, it will not be evaluated in the way you may expect. And even if this value did drop to zero, the condition would be in violation almost constantly; with only one spike above zero in the past month, everything else would be treated as zero, so the condition would rarely come out of violation.
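For reference, a quick way to see the difference in the Query Builder is to compare the two aggregations side by side. The metric name below is just a placeholder for whatever metric you are alerting on:

SELECT sum(your.metric.name), latest(your.metric.name) FROM Metric SINCE 5 minutes ago TIMESERIES

Where the timeseries has no data at all, the result is NULL rather than 0, and the condition will not treat it as a breach.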

Can you clarify further what your goals are for this alert? Perhaps we can advise you on an alternate setup to hit your goals, as I don't think the condition in its current form will help.


Hi there - thanks

I have a cron task running every hour, and I record a metric every time it runs. If the task is down, I want to alert myself - essentially, if there is no activity from this metric in 2 hours, alert me. I want the alert to trigger again if there is no activity in the next 2 hours, regardless of whether there is an open incident or not.

It is returning NULL in the Query Builder because I have only manually triggered the metric from my staging server, so I could verify the alert before pushing to production.

Also, what is the default timeframe for the alert query? Is it based on the critical violation? i.e. if the critical violation is ‘at least once in 120 mins’, does the condition become SELECT sum(**) from Metric since 120 minutes ago?

Hey @faf29fba9eab7b9b6e70, you might want to use the Loss of Signal threshold for your use case. Loss of signal occurs when no data matches the NRQL condition over a specific period of time. You can set your loss of signal threshold duration and also what happens when that threshold is crossed, including opening a lost signal violation.
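As a rough sketch (the metric name is a placeholder and the numbers are only examples, so adjust them to your setup): the condition query can simply count how many data points your heartbeat metric reports, along the lines of

SELECT count(*) FROM Metric WHERE metricName = 'cron.heartbeat'

and in the condition's loss of signal settings you would set the signal expiration to 2 hours (7200 seconds) and choose to open a lost signal violation when the signal expires. That way a 2-hour gap with no data opens a violation rather than relying on the value ever reaching zero.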

You can read more about this here:

Alerts are evaluated every minute. If you're using ‘sum of query results goes above a value at least once in 120 minutes’, the alerts system aggregates the query results over those 120 minutes, and if that aggregated value breaches the threshold, you will be alerted. See: Sum of query results (limited or intermittent data).
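In other words, with that threshold the evaluation is roughly equivalent to running something like the following every minute (again, the metric name is a placeholder) and comparing the result against your threshold:

SELECT sum(your.metric.name) FROM Metric SINCE 120 minutes ago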