Setting up an alert for count(traceId) varying from 0 to max

I need some suggestions on monitoring the count of a certain attribute that varies from 0 to as high as 40-50k throughout the day and then drops back to 0 by EOD.
I have set up a baseline alert for now, but it sometimes results in false alerts.

What I want to monitor is a sudden drop in that count at peak time.
Since the count will be as low as zero during certain hours of the day, setting up an alert that fires when the count goes below a fixed threshold won't work.
We can't use the failure percentage either, since it will be as high as 90% or above during hours when traffic is extremely low, like 1 or 2 counts in a minute.

To make it clearer: what I am monitoring is the order trend.
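
For illustration, a query of roughly this shape produces the kind of trend described above (the event type, attribute, and appName value are placeholders, not the exact query in use):

SELECT uniqueCount(traceId) FROM Transaction WHERE appName = 'XXX' TIMESERIES 1 hour SINCE 1 day ago

Plotted over a day, the count climbs into the tens of thousands at peak and returns to 0 by EOD, which is the shape the alert has to tolerate.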

Hey there @RishavSharma,

I hope you are well!

It looks like you want to create an alert for an order trend that notifies you if there is a drop in count during peak time? I think I understand your goal here, but I do not know exactly how to accomplish it, as it is a bit outside my scope. I am looping in an engineer from our alerts team to provide further insight.

Would you be able to share the baseline alert you are using and any screenshots showing the false alerts? This will help us pinpoint a possible solution for you. Thank you!

Hi @RishavSharma, as @michaelfrederick requested, it would be helpful to have a link to your baseline alert condition and examples of the false alerts that have opened. Since a baseline condition sounds like the best fit for your use case, we would like to see the false incidents and check whether there are further configurations that can be made.

@michaelfrederick @cschmid1
So basically, any query that gives me a successful order count is what I set up in a baseline condition, but when traffic is low for a stretch of time it becomes a mess.
I use multiple queries against different tables. For example, against the Log table I match a specific message string for a service:
SELECT uniqueCount(trace.id) FROM Log WHERE entity.name = 'XXX' AND message LIKE 'XXX' AND attribute1 = 'XXX'

Or I use the Transaction table, like:
SELECT uniqueCount(traceId) FROM Transaction WHERE appName = 'XXX' AND name = 'XXX' AND request.uri = 'XXX' AND error IS false

Or I use metric queries, or queries against the PageAction table.
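
For illustration, a PageAction query of that kind might look roughly like this (the appName and actionName values here are placeholders):

SELECT count(*) FROM PageAction WHERE appName = 'XXX' AND actionName = 'XXX'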

But eventually everything ends up in the same place: a baseline condition. I have fine-tuned the threshold, but the problem is that a sudden drop in traffic is sometimes legitimate, and that is what results in the false alerts.

I thought of using a percentage, like:
SELECT percentage(count(traceId), WHERE error IS true) FROM Transaction WHERE appName = 'XXX' AND name = 'XXXX'
But the percentage will be really high when traffic is low and even a small portion of it gets rejected, which results in a false alert.

What substitutes do we have for if-else conditions?
Is there some kind of nesting we can do to achieve what I am trying to do?
We have limited nesting capabilities in NRQL, and so far I haven't been able to achieve what I want with it.
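
For reference, one conditional-style construct NRQL does support is filter(), which lets a single query return the failed and total counts side by side; a sketch, with placeholder attribute values:

SELECT filter(uniqueCount(traceId), WHERE error IS true) AS 'failed', uniqueCount(traceId) AS 'total' FROM Transaction WHERE appName = 'XXX' AND name = 'XXX'

This still doesn't express "only evaluate when the total is above some floor", but it does make it obvious how small the sample is whenever the percentage spikes.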

Hi @RishavSharma

Thanks for reaching back out with such detail; it will be very helpful.

Unfortunately, Alerts are not my area of expertise, so I will loop in the engineering team here to have another look.

Should you have any updates, fixes, or questions, please do let us know!

Hi @RishavSharma - We would need a link to the actual condition to see whether it would work for you. That said, please reference the Explorers Hub post below to see if this is a good use case for a baseline condition.

Relic Solution: When to Use (and when not to use) Baseline Alerting

Another idea might be to use a static alert condition and implement the solution in this post, where you could use the following in your NRQL query:

hourOf(timestamp) IN ('1:00', '2:00', '3:00')

As for the actual query you construct for your use case, it will depend on which data type provides the most useful attributes for the data you actually want to measure; we can't really tell you that. I really think scoping your query to specific times would be your best bet here.
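
As an illustration of that approach, a static condition scoped to peak hours could look roughly like the sketch below; the appName, name, and hour values are placeholders standing in for the real peak window:

SELECT uniqueCount(traceId) FROM Transaction WHERE appName = 'XXX' AND name = 'XXX' AND error IS false AND hourOf(timestamp) IN ('9:00', '10:00', '11:00', '12:00')

Outside the listed hours the query matches no events, so it would also be worth reviewing the condition's loss-of-signal and gap-filling settings to make sure the off-peak gaps themselves don't open incidents.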


Thanks, I will try the things you have shared and will share the permalink in some time.
For now, if you ever receive cases like mine, a workaround can be muting rules for alerts.
I have started using muting and will share my experience with that as well.

Hey @rishavsharma2101,

Thank you, we look forward to hearing from you soon regarding your results! Have a great day!