
Relic Solution: How to Use the Infrastructure Alerts REST API to Its Maximum Potential - Part 2: Compound Alert Conditions


Continuing our series on using the Infrastructure Alerts REST API to its maximum potential, I will now describe how to create compound alert conditions.

Alerts in New Relic are organized like this. Policies have one or more conditions, and each condition has up to two thresholds: warning and critical. When a threshold is violated, it either opens an incident or is rolled up into one that is already open. Incidents send notifications only when they are opened, acknowledged or closed. But what if you want a condition to be more complex than that? What if you want a gatekeeper threshold that must be violated before either the critical or the warning threshold can be violated? As long as everything within the condition uses the same "event_type", this can be done. This is what we call a compound alert condition. It’s sort of like two conditions lumped together.

What makes this possible is that the Infrastructure Alerts condition POST API call has a field for a NRQL "where_clause" and separate fields for the warning and critical threshold values. A NRQL WHERE clause can express a threshold on its own, so we can use the "where_clause" to create the gatekeeper threshold that must be violated before the critical or warning violations can occur. Let’s run through an example. I will take the API call apart to show you how each piece works and then show you the whole thing put together.

Here’s where curl is invoked and the alert condition is given a general classification:

curl -X POST 'https://infra-api.newrelic.com/v2/alerts/conditions' \
     -H 'X-Api-Key:{admin_api_key}' -i \
     -H 'Content-Type: application/json' \
     -d \
'{
    "data":{
    "type":"infra_metric",
    "name":"High CPU Utilization",
    "enabled":true,
    "policy_id":{policy_id},

This is where we start to build the critical threshold and define the "event_type", which must be the same for both thresholds. This is a cloud integration alert, so our "event_type" is ComputeSample. The condition will automatically FACET by ec2InstanceId because ec2InstanceId is the domain of ComputeSample. All cloud integration metrics start with the "provider." prefix. Let’s select the Minimum cpuUtilization percentage, which we will compare against a threshold value shortly:

    "select_value":"provider.cpuUtilization.Min",
    "event_type":"ComputeSample",

With that we’ve started to build the critical threshold. Now we need to multitask and whip up the gatekeeper threshold with a NRQL condition in the "where_clause". The "event_type" must be the same, and we only want to know about instances whose Maximum cpuUtilization percentage is below 90 (a different team of hotshots will fight the hotter fires). Notice the numeric threshold value at the end of the "where_clause":

    "where_clause":"`provider.cpuUtilization.Max` < 90",
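To make the gatekeeper concrete, here is a rough shell sketch of what the "where_clause" does to incoming samples. It is a simplified model, not New Relic's evaluation engine. It assumes each sample arrives as a line of the form `ec2InstanceId min max`. The `gatekeeper_filter` function name and the instance IDs are hypothetical.

```shell
# gatekeeper_filter [GATE]
# Hypothetical model of the where_clause `provider.cpuUtilization.Max` < 90.
# Reads "ec2InstanceId min max" lines on stdin and passes through only the
# samples whose max is below GATE (default 90). Everything else is dropped
# before any threshold is evaluated.
gatekeeper_filter() {
  awk -v gate="${1:-90}" '$3 < gate'
}

# Hypothetical instance IDs and readings.
printf '%s\n' 'i-aaa 65 80' 'i-bbb 70 95' 'i-ccc 40 60' | gatekeeper_filter
# i-aaa 65 80
# i-ccc 40 60
```

The second instance never reaches the critical threshold at all, however high its minimum is, because its maximum of 95 fails the gatekeeper.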

Now let’s go back to the critical threshold we were building. We selected the Minimum cpuUtilization percentage, so let’s compare it against 60. With a "comparison" of "above", the threshold is violated when the value goes above 60. A "time_function" of "all" corresponds to the threshold time function "for at least", and a "time_function" of "any" corresponds to "at least once in". The threshold below is therefore the equivalent of "above 60 for at least 10 minutes":
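The difference between "all" and "any" can be sketched in a few lines of shell. This is a simplified model under stated assumptions, not how New Relic evaluates conditions internally. It assumes one reading per minute, listed oldest first, and the `breaches` function name is hypothetical. It also does not try to model how minutes filtered out by the "where_clause" are treated.

```shell
# breaches TIME_FUNCTION THRESHOLD WINDOW VALUE...
# Hypothetical, simplified model of a critical threshold with
# "comparison":"above". VALUEs are assumed to be one reading per minute,
# oldest first. Only the most recent WINDOW readings are considered.
#   all -> every reading in a full window is above THRESHOLD ("for at least")
#   any -> at least one reading in the window is above THRESHOLD ("at least once in")
breaches() {
  local fn="$1" threshold="$2" window="$3"
  shift 3
  printf '%s\n' "$@" | tail -n "$window" |
    awk -v fn="$fn" -v t="$threshold" -v w="$window" '
      { n++; if ($1 > t) above++ }
      END {
        if (fn == "all") { result = (n == w && above == n) }
        else             { result = (above > 0) }
        print (result ? "yes" : "no")
      }'
}

breaches all 60 10 61 62 63 64 65 66 67 68 69 70   # yes
breaches all 60 10 61 62 63 64 59 66 67 68 69 70   # no: one minute dipped to 59
breaches any 60 10 50 50 50 50 50 50 50 50 50 61   # yes
```

A single dip to 59 is enough to keep "all" from violating. With "any", a single reading above 60 inside the window is enough to violate.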

    "comparison":"above",
    "critical_threshold":{
         "value":60,
         "duration_minutes":10,
         "time_function":"all"
         }
    }
}'

Here is the completed API call:

curl -X POST 'https://infra-api.newrelic.com/v2/alerts/conditions' \
     -H 'X-Api-Key:{admin_api_key}' -i \
     -H 'Content-Type: application/json' \
     -d \
'{
    "data":{
    "type":"infra_metric",
    "name":"High CPU Utilization",
    "enabled":true,
    "policy_id":{policy_id},
    "select_value":"provider.cpuUtilization.Min",
    "event_type":"ComputeSample",
    "where_clause":"`provider.cpuUtilization.Max` < 90",
    "comparison":"above",
    "critical_threshold":{
         "value":60,
         "duration_minutes":10,
         "time_function":"all"
         }
    }
}'
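If you create conditions like this often, it can help to wrap the call in a small script. The sketch below is one way to do that, assuming bash. The function names and the ADMIN_API_KEY and POLICY_ID environment variables are hypothetical. The JSON fields are exactly those of the call above. Inside an unquoted heredoc, you must escape the backticks in the "where_clause". Otherwise the shell treats them as command substitution.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the ADMIN_API_KEY / POLICY_ID variables
# are hypothetical. The JSON mirrors the complete API call above.

# Print the condition payload for the given numeric policy ID.
build_condition_payload() {
  local policy_id="$1"
  # policy_id is sent as a bare JSON number, so reject anything non-numeric.
  case "$policy_id" in
    '' | *[!0-9]*) echo "policy_id must be numeric" >&2; return 1 ;;
  esac
  # Unquoted heredoc so ${policy_id} expands. The backticks are escaped
  # so the shell does not run them as a command substitution.
  cat <<EOF
{
  "data": {
    "type": "infra_metric",
    "name": "High CPU Utilization",
    "enabled": true,
    "policy_id": ${policy_id},
    "select_value": "provider.cpuUtilization.Min",
    "event_type": "ComputeSample",
    "where_clause": "\`provider.cpuUtilization.Max\` < 90",
    "comparison": "above",
    "critical_threshold": {
      "value": 60,
      "duration_minutes": 10,
      "time_function": "all"
    }
  }
}
EOF
}

# POST the payload to the Infrastructure Alerts API.
create_condition() {
  local payload
  payload="$(build_condition_payload "$POLICY_ID")" || return 1
  curl -X POST 'https://infra-api.newrelic.com/v2/alerts/conditions' \
       -H "X-Api-Key:${ADMIN_API_KEY}" -i \
       -H 'Content-Type: application/json' \
       --data-binary "$payload"
}
```

To send the request, export ADMIN_API_KEY and POLICY_ID, then run create_condition. To inspect the JSON before anything is posted, call build_condition_payload on its own.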

What we end up with is a condition that will only violate if the Minimum cpuUtilization percentage is above 60 while the Maximum stays below 90, for at least 10 minutes straight. It then FACETs by ec2InstanceId and lets you know which host is violating.

Here are the other workaround posts:

Part 1: Exclusion Filtering
Part 3: FACET more than 500 hosts
Part 4: Cloud Integration Metrics & Evaluation Offset

