
Fundamentals Certification Tip of the Week, 3 of 5

certification
tipoftheweek


New Relic University’s Fundamentals Certification: ‘Get Certified, Get Socks’.

If you were not directed to this post from the ‘Get Certified, Get Socks’ post, I recommend you take a look at that for some more context before continuing on this thread.

The Fundamentals Certification is difficult: there are over 180 possible questions covering every area of New Relic, and you get a random selection of 20 questions on each test you take. It’s impossible to know what will come up in your test, but it’s good to be as prepared as possible. To that end, we are posting a weekly Tip of the Week, each geared towards a specific sub-section of the Fundamentals Certification.

This week’s Tip of the Week is aimed at Alerts, so we will be covering some of the most important pieces of the alerting platform. From New Relic Alerts, we’ll look at some Alerts terms and what they mean, incident preferences, notification channels, and some bonus resources.



Introduction:

Effective alerting is the key to monitoring and observability. The value you get from monitoring your services is multiplied when you know you will be told if there’s a problem. Not having to constantly stare at dashboards allows you to focus on further iterating and innovating on your services.

But there is such a thing as ‘too much’ when it comes to alerting. It’s important to set things up so that the things that wake you up at 3am really should wake you up at 3am. Having the right conditions and the right notification channels in place is crucial to effective alerting. So let’s dive right in.

Alerts Terms:

There is a full New Relic Glossary here, but the important Alerts-specific terms are detailed below:

  • Condition: Conditions contain the criteria for your alerts. The condition defines the threshold that must be breached for a violation to occur.
  • Threshold: The threshold you define in an Alert Condition is the event that must occur for a violation to occur. An example: Response Time > 5 seconds for at least 10 minutes. This threshold would be breached if your application response time is consistently slow (or, slower than optimal) for 10 consecutive minutes.
  • Violation: A violation is a single instance of your condition’s threshold being crossed. Violations are what trigger incidents, or get rolled up into existing incidents.
  • Incident: An Alert incident is an event, or series of events, that notifies you of your thresholds being breached. Incidents contain 1 or more associated violations. Incidents are what trigger notifications.
  • Notifications: Notifications are what our Alerts service sends to you when an incident occurs. Notifications are sent on 3 occasions: Incident Open, Incident Acknowledged, and Incident Closed.

These are some of the most important terms to know for Alerting. As I said before, there is a full glossary of New Relic terms that I absolutely recommend you scan through, but for now, these top 5 for alerts should give us enough to make it through this Tip of the Week.
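To make the Condition/Threshold relationship concrete, here is a minimal sketch of creating the “Response Time > 5 seconds for at least 10 minutes” condition through the Alerts REST API (see the REST API Docs in the Bonus Resources below). The API key, policy ID, and application ID are placeholders, and you should verify the exact field values against the docs for your account:

```python
import requests

API_KEY = "YOUR_ADMIN_API_KEY"  # placeholder; your Admin API key
POLICY_ID = 123456              # placeholder; the Alert Policy to hold the condition
APP_ID = 987654                 # placeholder; the APM application to monitor

# An APM metric condition: web response time above 5 seconds
# for at least 10 consecutive minutes opens a critical violation.
payload = {
    "condition": {
        "type": "apm_app_metric",
        "name": "Response time > 5s for 10 minutes",
        "enabled": True,
        "entities": [APP_ID],
        "metric": "response_time_web",
        "condition_scope": "application",
        "terms": [{
            "duration": "10",        # minutes the threshold must be breached
            "operator": "above",     # direction of the breach
            "priority": "critical",
            "threshold": "5",        # seconds (assumed unit for this metric)
            "time_function": "all",  # "all" = for at least the full duration
        }],
    }
}

response = requests.post(
    f"https://api.newrelic.com/v2/alerts_conditions/policies/{POLICY_ID}.json",
    headers={"X-Api-Key": API_KEY, "Content-Type": "application/json"},
    json=payload,
)
response.raise_for_status()
print(response.json())
```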


Incident Preference:

I want to preface this section by saying that my colleague @sschneider did a far better job than I could of deep-diving into Incident Preferences here: Relic Solution: Alert Incident Preferences are the Key to Consistent Alert Notifications. But we’ll go through a high-level overview here.

Your incident preference setting (a policy-wide setting) determines which violations in your policy create new incidents.

As we’ve already seen, notifications are only sent 3 times in an incident’s lifecycle, so it’s important to set up your incident creation strategy in such a way that you are notified of the events that are important to you.

There are 3 incident preference settings:

  • By Policy (Default): By Policy means that any critical threshold violated in your policy, under any condition, can trigger a new incident. Since the incident preference here is by policy, we are saying that we would like every other violation within the policy to roll up into this incident. When that initial incident is over, a new violation will create a new incident. But during the lifetime of any incident, all further violations will be included in that incident.
    • What does this mean for my notifications?
      • Since we have only one incident per policy at any given time, we only have the 3 notifications associated with that incident: Incident Open, Incident Acknowledged, and Incident Closed.
  • By Condition: As the name suggests, By Condition allows us to say that we want a new incident created for the first violation that breaches each condition’s threshold. Every following violation of that condition is rolled up into the open incident tied to that condition, until that incident is closed.
    • What does this mean for my notifications?
      • For a policy with 10 conditions, we now have a maximum of 10 open incidents at a given time. Since each incident can send 3 notifications, we have increased our notification potential from 3 with By Policy to 30 with By Condition. This is a more granular notification strategy than By Policy.
  • By Condition and Entity: By Condition and Entity lets us get even more granular than By Condition. Where By Condition will create 1 incident for each condition, regardless of the number of violations within that condition, By Condition and Entity will create a new incident for every App/Host/Site/Service that violates the configured threshold.
    • What does this mean for my notifications?
      • This is by far the most verbose setting in terms of incidents created and notifications sent. If we take our previous example of a policy with 10 conditions, and we say that each condition has 10 entities attached to it, then where our notification potential for By Policy was 3 and By Condition was 30, By Condition and Entity is 300. That is, a new incident for each of the 10 entities under each of the 10 conditions (100 incidents), with 3 notifications per incident.

So - making sure this setting is configured properly is key to making sure you are notified when you need to be.
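Since the incident preference lives on the policy itself, it can also be set when you create or update a policy through the REST API. Here is a minimal sketch, with a placeholder API key and policy name, showing how the 3 settings above map onto the values the API accepts:

```python
import requests

API_KEY = "YOUR_ADMIN_API_KEY"  # placeholder

# The 3 incident preference values correspond to the settings above:
#   "PER_POLICY"               -> By Policy (default)
#   "PER_CONDITION"            -> By Condition
#   "PER_CONDITION_AND_TARGET" -> By Condition and Entity
payload = {
    "policy": {
        "name": "Payments Team Policy",          # placeholder name
        "incident_preference": "PER_CONDITION",  # pick your rollup strategy
    }
}

response = requests.post(
    "https://api.newrelic.com/v2/alerts_policies.json",
    headers={"X-Api-Key": API_KEY, "Content-Type": "application/json"},
    json=payload,
)
response.raise_for_status()
# Keep the policy ID for attaching conditions and notification channels later.
print(response.json()["policy"]["id"])
```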

Notification Channels:

The name is quite self-explanatory: Notification Channels dictate who gets the message that an incident is occurring.

What’s important here is making sure the right channels are tied to the right policies.

That is to say, I wouldn’t recommend having a single policy full of APM application conditions notifying every team that manages an APM application. With that setup, the team that manages your payment service will get woken up for issues in the login service, a service they may not be familiar with. This is a specific example, but I’m sure the point stands true for your engineering department too.

For this reason, my suggestion is to set your policies up based on who gets woken up.

In my example we mention a payment service. There are a couple of things that can be included in that: Synthetics monitors running through the checkout workflow, APM Key Transactions for your payment-specific functions, and Browser JS errors on URLs that correlate to the payment service (/checkout, for example).

All of these things can be added as separate conditions in one policy, perhaps named Payments Team Policy. The notification channels for that policy could be your team’s PagerDuty address, team email address, team Slack channel, a direct email to your team’s manager, etc…

This way our policies won’t notify the wrong people. But on the topic of who gets woken up for an incident, another tip is to audit your alert policies and remove conditions that may not be critical, or at least route the non-critical conditions to a separate policy that points to a notification channel that won’t wake you. Alert fatigue is real; ensuring only your most critical conditions come to you immediately is a good way to keep your Alert Policies tidy, so you and your engineers don’t become fatigued by too many notifications.
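To round out the Payments Team example, here is a minimal sketch of creating an email notification channel and attaching it to a policy through the REST API. The recipient address, channel name, and policy ID are placeholders:

```python
import requests

API_KEY = "YOUR_ADMIN_API_KEY"  # placeholder
POLICY_ID = 123456              # placeholder; e.g. the Payments Team Policy

headers = {"X-Api-Key": API_KEY, "Content-Type": "application/json"}

# 1. Create an email notification channel for the team.
channel_payload = {
    "channel": {
        "name": "Payments Team Email",  # placeholder name
        "type": "email",
        "configuration": {
            "recipients": "payments-team@example.com",  # placeholder address
            "include_json_attachment": True,
        },
    }
}
resp = requests.post(
    "https://api.newrelic.com/v2/alerts_channels.json",
    headers=headers, json=channel_payload,
)
resp.raise_for_status()
channel_id = resp.json()["channels"][0]["id"]

# 2. Attach the channel to the policy so its incidents notify the team.
resp = requests.put(
    "https://api.newrelic.com/v2/alerts_policy_channels.json",
    headers=headers,
    params={"policy_id": POLICY_ID, "channel_ids": channel_id},
)
resp.raise_for_status()
```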


Bonus Resources:

  • Alerts - New Relic Documentation parent page for all things Alerts.
  • REST API Docs - New Relic Documentation pages for interacting with Alerts through the REST API.
  • Incident Preference - Explorers Hub Level Up post on the Incident Preference settings available, and what they mean.
  • Level Up Posts - Explorers Hub Level Up Posts covering Alerting.
  • New Relic University - New Relic University materials on Alerts.

What’s next?

Now, use your Alerting knowledge in your attempts at the New Relic University Fundamentals Certification. Post your results back here to earn some socks:


New Relic Fundamentals Certification: Get Certified, Get Socks