One of the most common questions raised in the community and in support tickets is when notifications are sent in New Relic Alerts. It reminds us of the old “if a tree falls in the woods” adage: If an alert triggers but no notification was sent, did the alert really happen?
Obviously the answer is no. If you’re not receiving a notification, the alert is of no use to you in managing and operating your software. Fortunately, in many of the cases we have encountered, the problem has come down to how the Alerts policy is configured, and more specifically to choosing the right incident preference for the policy.
The Scenario:
Your site has an outage. You see it. Your customers see it. Even more concerning than the outage is that leading up to, during, and after the outage, not a single one of your server alerts sent you or your team a notification. Once you get service restored, you check on those Alert policies to see what happened. Sure enough, your policy is there and your alert conditions seem clear.
So what happened? To find out, let’s take a look at what goes into an Alert policy.
The Anatomy of an Alert Policy:
An Alert policy is made up of a few parts:
- One or more conditions that define when violations should be opened and closed
- Notification channels that determine who receives notifications and how they are delivered
- Alert policy incident preferences to define how the violations emitted from the policy’s conditions will be grouped into incidents
When you think an Alert notification should have been sent, checking these three areas is the best first step.
Navigate to the Alerts product and select the policy you are interested in to view the conditions (which include the thresholds) and the notifications that have been established.
If everything looks like it is configured correctly for conditions and notifications, the last thing to check is the incident preference.
An incident is a collection of violations. New Relic can greatly reduce alert noise because it sends notifications when incidents are opened, acknowledged, and closed, rather than for each individual violation. How you choose to create incidents is up to you and will affect how many notifications you receive.
Here are the three options for how your policy can group violations into incidents:
- By Policy: Only one incident will be open at a time for the entire policy. This creates the fewest alert notifications and is effective only if you act on incidents immediately and close them. If an incident is already open while this preference is selected, no new alert notifications can be created until that incident is closed.
- By Condition: One incident will be open at a time for each condition in your policy. This configuration creates more alert notifications and is useful for policies containing conditions that focus on targets (resources) that perform the same job; for example, servers that all serve the same application(s).
- By Condition and Entity: An incident will be created for every violation in your policy. This option creates the most alert notifications and is useful if you need to be notified of every violation or if you have an external system where you want to send alert notifications.
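To make the difference between the three options concrete, here is a minimal Python sketch that counts how many "incident opened" notifications the same set of violations would produce under each preference. It is a simplified model for illustration only, not New Relic's implementation: the names Violation, incident_key, and count_open_notifications, the preference strings, and the sample conditions and servers are all hypothetical, and the sketch assumes no incident is closed while the violations arrive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    condition: str  # name of the condition that opened the violation
    entity: str     # target (for example, a server) that violated it


def incident_key(preference, violation):
    """Return the grouping key that decides which incident a violation joins."""
    if preference == "by_policy":
        return ("policy",)  # everything in the policy shares one incident
    if preference == "by_condition":
        return (violation.condition,)  # one incident per condition
    if preference == "by_condition_and_entity":
        return (violation.condition, violation.entity)  # one per violation
    raise ValueError(f"unknown preference: {preference}")


def count_open_notifications(preference, violations):
    """Count 'incident opened' notifications, assuming nothing is closed."""
    open_incidents = set()
    notifications = 0
    for violation in violations:
        key = incident_key(preference, violation)
        if key not in open_incidents:
            # A new incident opens, so a notification is sent.
            open_incidents.add(key)
            notifications += 1
        # Otherwise the violation joins an open incident: no new notification.
    return notifications


# Hypothetical sample: two conditions, each violated on two servers.
violations = [
    Violation("High CPU", "server-1"),
    Violation("High CPU", "server-2"),
    Violation("Low disk", "server-1"),
    Violation("Low disk", "server-2"),
]

for preference in ("by_policy", "by_condition", "by_condition_and_entity"):
    print(preference, count_open_notifications(preference, violations))
```

In this model the four violations produce one notification under By Policy, two under By Condition, and four under By Condition and Entity. The By Policy case is the one to watch: after the first incident opens, every later violation is absorbed into it silently.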
Note that when you create a new alert policy, the default incident preference is “By Policy.” Remember, if you have an open incident AND your incident preference is set to “By Policy,” no new alert notifications will be sent. You’ll need to close the open incident before notifications will be sent again. You may want to set the incident preference to “By Condition” if the alert is truly important, or make sure that closing open incidents is part of your incident response playbooks.
Now that we’ve looked at the settings that determine how and when notifications are sent, you should feel much more confident that the alert policy you set up will get your attention at the expected times.