Redirect notifications based on target for an Alert

We were wondering whether there is a way to route alert notifications based on some kind of criteria. For example, if an alert is raised for a certain target (hostnames starting with m), could it be redirected to a particular channel, with alerts for other targets going to another notification channel? Is something like this possible?

Can we modify the alert conditions to consider only certain targets?

Currently, in the included Alerts capabilities, all violations for a single condition get routed to the same location.

We do offer that in our Incident Intelligence offering. This is currently an add-on, but there is a free tier for this. I encourage you to explore that. This uses the attributes in the violation itself to determine which issue to route to.

Also, if you just want to silence notifications from certain VMs, check out Muting Rules: http://docs.newrelic.com/docs/muting-rules

Another question related to this. It seems NRQL might serve the purpose, since we can introduce an appropriate filter to select hosts that match a certain naming pattern. But I want to double-check the limit on NRQL conditions. Is it 4000 NRQL conditions per policy or per account?

Hey @siddhant.agarwal

The Alerts limits are documented here: https://docs.newrelic.com/docs/alerts-applied-intelligence/new-relic-alerts/rules-limits-glossary/rules-limits-alerts

As that doc shows, policies are limited to 500 conditions of any type. The 4000 NRQL condition limit is per account.

That might be an issue for us. Is there a way to increase this limit on NRQL conditions?

I’m afraid I don’t believe that limit can be boosted.

Can you share more details? Perhaps we can help you keep your conditions below that 4000 limit while still reaching your goals.

If your hosts are named in an easy-to-manage way, for example with the backend applications that handle login authentication hosted on one of 3 servers named prod-login-[1|2|3]server, then you can create a condition with a NRQL WHERE clause such as:

WHERE hostName like 'prod-login%'

This will then be part of a policy with the authentication & access team at the other end of the notification channel.

Then you can repeat this with conditions that target the right entities for each team you need to notify.

While it’s certainly possible to reach it, 4000 NRQL conditions does feel like more than enough. With more info about how your alerts are, or need to be, set up, we can try to advise you on optimising this.

So here is the scenario:

Each of our clients has multiple environments, namely mock, build and prod. We want to set up alerts so that notifications for violations from services in prod are directed to one team, and notifications from the other two environments are directed to another team.

We have over 200 services which are deployed for each of the clients (approx. 60 currently). This list is ever growing as more and more services and clients are onboarded to the platform. Each of these services is containerized, and we run multiple containers for each service in each environment. Something like:

200 services X 60 clients X 3 environments per client X min. 2 instances per service, i.e. 72000 instances in total.
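The estimate above can be checked with a few lines of Python. This is only a sketch: the figures are the approximate ones quoted in this post, the variable names are illustrative, and the "one condition per service per client environment" case is a hypothetical worst case rather than a setup anyone in this thread has proposed.

```python
# Approximate figures from the post; names are illustrative only.
services = 200
clients = 60
environments_per_client = 3
min_instances_per_service = 2

total_instances = (
    services * clients * environments_per_client * min_instances_per_service
)

# Hypothetical worst case: one NRQL condition for every service
# in every environment of every client.
worst_case_conditions = services * clients * environments_per_client

NRQL_CONDITIONS_PER_ACCOUNT = 4000  # account-level limit quoted earlier in the thread

print(total_instances)        # 72000
print(worst_case_conditions)  # 36000
print(worst_case_conditions > NRQL_CONDITIONS_PER_ACCOUNT)  # True
```

Even without counting instances, a naive one-condition-per-service-per-environment layout would be far above the account limit, which is why grouping services into shared conditions matters here.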

What we need is to be able to define multiple conditions for each service, evaluated per host, while separating the alerts for hosts in the mock and build environments from those for hosts in production. Each container is reported as one host in New Relic. We know that we can do this kind of segregation in New Relic by filtering on host, but that would mean more policies and more NRQL conditions.

How do we ensure that we are able to monitor enough alert metrics and at the same time ensure NRQL condition count is below the limit?

Any updates? We could have combined all the services into one NRQL condition, which would have solved the issue. The problem is that each service has a different owner, and we want to be able to notify the right owner for each alert incident.

Here is what the whole scenario looks like around notifications. We have two stakeholders:

  1. Operations
  2. Service Owners

Ask:

  1. Notify Service Owners for all the alert incidents. The notification channel doesn’t change based on the environment type the alert originates from.
  2. Notify the Operations team for all the alert incidents. The notification channel differs by environment, as described in the previous post.
  3. Be able to modify the thresholds per service in the future. Combining everything into one NRQL condition (FACET by appName and host) would prevent us from defining a different threshold for each service.

Would something like Incident Intelligence help here?

Hey @siddhant.agarwal

Sorry for the delay, I thought about this a lot on Friday and tried to come up with some thoughts.

I think the way to tackle this is to focus on notification channels.

If you need 3 different teams to be notified, for Prod/Mock/Build, then that is 3 policies.

In those policies, have a few conditions targeting all of the services you need to target, based on the environment (prod policy targets prod services, etc.).

Once you have that base set of policies set up, anyone else who needs to be notified could either be added as a notification channel to the most pertinent policy, or these additional channels could get their own policies.

Each of your NRQL conditions could target multiple services. This is where it is important that your services are named appropriately (a NRQL condition could say WHERE service like 'prod%', for example).

I’m not sure Incident Intelligence would be most helpful here, but you should be able to achieve what you need with NRQL conditions that target multiple services, rather than a condition for each. That will help you reduce the number of conditions needed.

I thought so too. I spent time over the weekend and came up with the same solution of combining multiple services into a single NRQL condition. However, there is a small catch. The conditions we define would use the default thresholds, but a service owner would have the liberty to override the default thresholds in certain scenarios. This capability would be restricted by a very strict review process.
In such a case, my idea is to exclude those services from the generic condition and create a new condition for them alone (it should be a small number). Again, if the same overridden threshold applies to multiple services, we can combine them all into one single NRQL condition and exclude them from the generic condition. What do you think about it?
Additionally, to support exclusion, does NRQL support a NOT IN clause, or just NOT LIKE? How many conditions can be combined under a WHERE clause with an AND operator in NRQL?
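The split described above (one generic condition, plus one condition per distinct overridden threshold) can be sketched in Python. This is a rough illustration under assumptions, not a New Relic API: `serviceName` is a placeholder attribute, `DEFAULT_THRESHOLD` and the service names are made up, and the function only builds WHERE-clause text that would still need to be pasted into real NRQL conditions.

```python
from collections import defaultdict

DEFAULT_THRESHOLD = 90  # hypothetical default threshold


def quote_list(names):
    """Render names as a comma-separated list of single-quoted literals."""
    return ", ".join(f"'{name}'" for name in names)


def plan_conditions(overrides):
    """Return (where_clause, threshold) pairs.

    overrides maps a service name to its overridden threshold.
    Services sharing the same override are grouped into one condition,
    and all overridden services are excluded from the generic condition.
    """
    by_threshold = defaultdict(list)
    for service in sorted(overrides):
        by_threshold[overrides[service]].append(service)

    conditions = []
    if overrides:
        generic = f"WHERE serviceName NOT IN ({quote_list(sorted(overrides))})"
    else:
        generic = ""  # nothing to exclude, the generic condition covers everything
    conditions.append((generic, DEFAULT_THRESHOLD))

    for threshold in sorted(by_threshold):
        clause = f"WHERE serviceName IN ({quote_list(by_threshold[threshold])})"
        conditions.append((clause, threshold))
    return conditions


plan = plan_conditions({"billing": 80, "search": 80, "login": 95})
for where_clause, threshold in plan:
    print(threshold, where_clause)
```

With this layout the number of conditions grows with the number of distinct thresholds rather than with the number of services, which keeps the count well away from the account limit as long as overrides stay rare.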

Hey @siddhant.agarwal

Yes, I would suggest the base set of policies with the base set of conditions.

If certain service owners require different thresholds, then a new policy, specifically for more tightly controlled thresholds, can be created, just to segment them into their own space and make them easier to manage.

You could leave these in the generic conditions, or exclude them if you wish.

NRQL does indeed support NOT IN. It takes an attribute value or a list of attribute values.

Here’s an example

SELECT count(*) FROM MyEvent WHERE serviceName NOT IN ('name1', 'name2', 'name3')

I don’t think there is a limit to the number of AND operators or conditional WHERE statements in the NRQL query, but if the query gets too long there may be a character limit.

What is the character limit? Additionally, what’s the limit on the number of parameters that can be passed to NOT IN?

Hi @siddhant.agarwal

Max Query Length is 4096 chars. There is no limit to the NOT IN operator that I’m aware of, other than that they cannot make the whole query exceed 4096 chars.
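Since the only practical bound on NOT IN is the overall query length, it may help to check generated queries before creating a condition. Below is a minimal Python sketch, assuming the 4096-character limit quoted above; the function name is hypothetical, `MyEvent` and `serviceName` are the placeholder names from the earlier example, and values containing a single quote are simply rejected rather than escaped.

```python
MAX_NRQL_QUERY_LENGTH = 4096  # limit quoted in this reply


def build_not_in_query(event_type, attribute, excluded_values):
    """Build a NOT IN query and fail early if it exceeds the length limit."""
    for value in excluded_values:
        if "'" in value:
            # Simplification: quoting rules are not handled in this sketch.
            raise ValueError(f"unsupported character in value: {value!r}")

    quoted = ", ".join(f"'{value}'" for value in excluded_values)
    query = f"SELECT count(*) FROM {event_type} WHERE {attribute} NOT IN ({quoted})"

    if len(query) > MAX_NRQL_QUERY_LENGTH:
        raise ValueError(
            f"query is {len(query)} characters, limit is {MAX_NRQL_QUERY_LENGTH}"
        )
    return query


print(build_not_in_query("MyEvent", "serviceName", ["name1", "name2", "name3"]))
```

If the exclusion list grows large enough to hit the limit, a naming pattern with NOT LIKE is likely to be easier to keep within bounds than an ever-longer list.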

The limit should be OK. Another related question: how do we access the trial for Incident Intelligence?
I ask because we still might want to route notifications to the respective service owners, based on the service the alert originated from. Would such a thing be possible via Incident Intelligence?

Yes, using Incident Intelligence you can set up pathways from a particular source, based on criteria such as service name, to send to a specific destination.

https://docs.newrelic.com/docs/alerts-applied-intelligence/applied-intelligence/incident-intelligence/get-started-incident-intelligence

Trials are managed by our sales team. If you have a sales rep at New Relic, you can contact them. If not, you can reach out to https://newrelic.com/contact-sales

Does Incident Intelligence come with an API?

Yes! https://docs.newrelic.com/docs/alerts-applied-intelligence/applied-intelligence/incident-intelligence/rest-api-applied-intelligence

Nice. I like that. Thanks for the help. Let me evaluate these inputs, and I’ll reach out to you if I need any clarification.

Sounds good @siddhant.agarwal - thanks
