NRQL not working for SNMP Alerts and Policies

Hi,
I obtained the NRQL query below from the documentation: SNMP monitoring results have metrics missing | New Relic Documentation

FROM Log SELECT count(*) WHERE collector.name = 'ktranslate' and message ='%OID failed to return results%'
This is to set up an alert policy that fires if any data is missing or an SNMP device isn't reporting data to New Relic.

This NRQL query isn't working. Can someone help me figure out the right NRQL query for this scenario?

Thanks!

Hi @Ajeet.Khan

There is a small mistake in your query:

You should use message LIKE '%OID failed to return results%' instead of message ='%OID failed to return results%'.
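For reference, the full query with that change would look something like this (keeping the rest of your query as you have it):

FROM Log SELECT count(*) WHERE collector.name = 'ktranslate' AND message LIKE '%OID failed to return results%'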

Can you try and let me know?

thanks

Rodrigo

@rcorgozinho I tried that and stopped the ktranslate SNMP container for at least 15 minutes. I verified that the related devices stopped reporting metrics in the New Relic dashboard, but the NRQL query was still returning 0.
This means the NRQL query is either wrong or not able to fetch any errors related to the missing data.
Please let me know what else to check.

FYI, I used this query from the New Relic documentation.

Thanks!

Hi @Ajeet.Khan

I’ll be happy to check it further.

Could you let me know when you stopped the container, and to which of your New Relic accounts this information should be reported?

thanks

Rodrigo

@rcorgozinho I stopped the container between 5:37 PM and 7:00 PM IST (India time) on Thursday, 14th April. Account: 2925874

I was also trying to get a screenshot of the missing data for you, but it seems that when I started the container again, it reported the missing data for that window. While the container was stopped, I could see that data for CPU, RAM, etc. wasn't being reported on the New Relic dashboard graphs.
However, even if the data is reported back to New Relic once the container starts, an alert should still have been sent for not receiving data during that window.

Thanks!

@rcorgozinho Did you get a chance to check my previous comment?
Thanks!

@rcorgozinho Is anyone else available to help with this?

Hi @Ajeet.Khan! I may be able to assist with this. Can you provide a link to the alert condition that is not opening incidents for this query?

@dkoyano Here you go - https://one.newrelic.com/nrai/alerts-classic/policies?account=2925874&state=b7c0b363-1e25-1887-c695-08dd19baa8b0

Thanks for the link @Ajeet.Khan. I had a look at your condition and noticed it is using Event Flow for its aggregation method.

Event Flow is used for data that arrives frequently and relatively consistently. If you run the query from your condition, you'll see that your data arrives very infrequently. In fact, I am not seeing any violations of your threshold within the last 30 days.

If your data indeed is infrequent, I would recommend changing your aggregation method to Event Timer which is used for data arriving infrequently or with large gaps in between.

I noticed that your query is using '%OID failed to return results%', but the error message isn't exactly OID failed to return results: there are several characters between OID and failed to return results, which is another reason why your condition isn't working. You'll need to use something like '%OID%failed to return results%'.
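Putting that together with the earlier LIKE fix, the condition query would be something along these lines:

FROM Log SELECT count(*) WHERE collector.name = 'ktranslate' AND message LIKE '%OID%failed to return results%'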

I’m also including an Explorer’s Hub post regarding the new aggregation methods and how to determine which one to use:

Relic Solution: How Can I Figure Out Which Aggregation Method To Use?

I hope this helps!

@dkoyano Yes, the alert was received after making the changes you suggested. But there is still a problem: the alert email doesn't include which entity (Firewall/Switch/iDRAC/ILO) is having the issue. Can you please help me get a proper alert with the relevant details?
Thanks!

@dkoyano Another issue is that the alert gets triggered when the container is started again, but it doesn't get triggered when we stop the container.

I have the same problem and haven't been able to find a solution. I am querying logs as suggested, but when the container stops and there are no logs, I don't get notifications. Is there a way to get notified when logs stop being generated or the Kentik ktranslate container is not running at all?

@Ajeet.Khan - In order for the violating entity (Firewall/Switch/iDRAC/ILO) to be included, you will need to facet on it in your query.
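As a rough sketch (the attribute name here is only an assumption; check your ktranslate Log data for the attribute that actually carries the device name, for example device_name), the faceted query would look something like:

FROM Log SELECT count(*) WHERE collector.name = 'ktranslate' AND message LIKE '%OID%failed to return results%' FACET device_name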

@Ajeet.Khan - Can you provide a timeframe (date/time) for when the container was started and stopped?

@Omprakash.Paliwal - Do you have a link to your alert condition? I am guessing this has to do with a Loss of Signal, but I would need to take a look at how your condition is configured and how the data is coming in to be sure.

Please share an example.
Regarding the timeframe when the container was stopped: you already have the account ID, and you can check when an alert was triggered for that policy. The container was stopped just before that alert.

Please respond ASAP. We are facing this issue in a production environment, hence the urgency.

Hey there @Ajeet.Khan,

In order to serve you better, we would appreciate the information that @dkoyano asked for. Sending us the timeframe for when this container stopped helps us narrow things down more quickly. While we do have the account ID, if we go in and search blindly we may not find exactly what you need, and it will take us longer to solve your issue. I apologize for any inconvenience, but rest assured we want to give you the best support possible. We appreciate your patience and hope to hear from you soon.

@michaelfrederick If you check my previous comment, you can see the timestamp as well. I hope this helps. Please arrange a call so that I can show it live. It has been a while, we have done a lot of follow-up, and the issue remains as it is. We are a paying customer and this is affecting our production. If this doesn't get solved as a priority, I might soon decide to go with another monitoring tool.
Thanks!

Hey there @Ajeet.Khan,

I appreciate your patience while we try to solve this. I have created a case with our engineers on your behalf to assist further with this matter. They will reach out shortly via email. Please do not hesitate to reach back out if you need anything else. I hope you have a great day!