As the Alerts beta continues, I have noticed that we aren’t all speaking the same language when it comes to terminology. Here is a handy guide to what we mean when we throw around “notification” or “critical threshold” in tickets and forum responses. I am going to introduce the terms from the point of view of someone who has just signed up for the beta, taking them in the order you might encounter them in a hypothetical first experience. The links to keywords are to our public documentation.
Let’s start with the term “Alert” since that is ostensibly what we’re talking about. This is not the message you get telling you that something has gone wrong. From here on Alert will be used to refer to the v3 Alerts and the concept of alerting in general. It will all make sense by the end, I promise.
So you’ve just opted into Alerts. Congratulations! The first thing you need to get going is a Policy. Click Alert Policies, pick a meaningful name, and click Create policy.
You’re on a roll! But what have you created? You can think of a Policy as a container for Conditions and Notification Channels. More on those later; for now, just know that the Policy is a way to collect potentially related Conditions and Notification Channels. You don’t have to group several Conditions in one Policy, but doing so is a much cleaner way to keep track of them than creating a separate Policy for every Condition.
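If it helps to picture the container idea, here is a minimal sketch in Python. It is only a mental model: the class and field names below are made up for illustration and are not the Alerts API or how Alerts stores anything internally.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical names throughout: this models the vocabulary, not the product.
@dataclass
class NotificationChannel:
    kind: str    # e.g. "email"
    target: str  # e.g. an email address


@dataclass
class Condition:
    name: str


@dataclass
class Policy:
    """A Policy is just a named container for Conditions and Channels."""
    name: str
    conditions: List[Condition] = field(default_factory=list)
    channels: List[NotificationChannel] = field(default_factory=list)


# One Policy collecting related pieces, rather than one Policy per Condition.
policy = Policy(name="My test policy")
policy.channels.append(NotificationChannel(kind="email", target="me@example.com"))
policy.conditions.append(Condition(name="High web throughput"))
```

The point of the sketch is only that a Policy starts out empty and you fill it with Conditions and Notification Channels afterwards, which is the order the rest of this walkthrough follows.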
So that brings us to the contents of the Policy container: Conditions and Notification Channels. I like to start with Notification Channels because without them you are unlikely to know that anything is happening. The Notification Channels link up in the grey navigation bar will take you to the page where you can find Create a notification channel.
The simplest, in my opinion, is email because it doesn’t involve any additional setup. So click “Create a notification channel”, select Email from the drop-down menu, enter your email address, and click Create channel.
Now we’re ready to create an Alert Condition! Select your new policy, then the tab that says 0 Alert conditions, then click “Create a condition”.
This button reveals a lot of options. Fortunately, the first row of choices purposely mimics the tabs up in the grey navigation bar that correspond to our various products. Hopefully you have an app reporting to APM, or something else you would like to create an Alerts Policy for. If not, you can still follow along, but I suspect you’re not going to get much out of it. I am going to choose APM as my product and Application Metric as my type of condition.
Here again my choice is guided primarily by simplicity. We’re trying to explain key terms through examples, not dissect all of Alerts. Next, select targets (hit that button!). These are the apps we are going to look at when this Alert Condition is evaluated.
Next, define thresholds (hit that button!).
It’s been a while since we introduced a new term, and this is a great spot to talk about Critical Thresholds. The easiest way to think of a Critical Threshold is as a line in the sand. When it is crossed, either once or for a period of time (more to come!), things really kick into action. In the UI a Critical Threshold is the one with the red octagon containing an ‘X’ beside an empty text box. Let’s fill in some boxes and then I’ll talk a little more about Critical Thresholds and what happens when you violate them.
I like setting up test Alert policies with just one Condition, and I like that Condition to use Throughput. The reason is that it’s a very straightforward metric, and it is easy to generate data that either violates or does not violate a Threshold. So I’m picking Throughput (web) has a call count above 200 calls for at least 5 mins. What this translates into is, “When my app has more than 200 requests per minute for at least 5 mins, I would like to consider that a violation of a Critical Threshold.”
In order to cross my line in the sand I am going to need to step over it and stay there for a little bit. Because I chose “for at least 5 mins” I must have 5 consecutive minutes of data reporting in violation of the Critical Threshold. This is a really crucial point! If you picked a metric that might sometimes have a 0 value then, at the time of this writing, a data point might be skipped for that minute; the run of consecutive minutes is then broken and Alerts will not consider your Critical Threshold to have been violated!
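To make the “5 consecutive minutes” idea concrete, here is a rough sketch in Python. It is not how Alerts is implemented; the function name, the per-minute dictionary, and the rule that a missing minute resets the count are all assumptions I am making to illustrate the behavior described above.

```python
from typing import Dict


def violates_critical_threshold(
    calls_per_minute: Dict[int, float],
    threshold: float = 200,
    duration_minutes: int = 5,
) -> bool:
    """Return True if `duration_minutes` consecutive minutes are all above `threshold`.

    `calls_per_minute` maps a minute number to the value reported for that
    minute. A minute with no reported data point is simply absent from the
    dictionary (assumption: a skipped data point breaks the streak).
    """
    if not calls_per_minute:
        return False
    streak = 0
    for minute in range(min(calls_per_minute), max(calls_per_minute) + 1):
        value = calls_per_minute.get(minute)
        if value is not None and value > threshold:
            streak += 1
            if streak >= duration_minutes:
                return True
        else:
            # Below the line in the sand, or no data point at all: start over.
            streak = 0
    return False


# Five minutes in a row above 200: the threshold is violated.
steady = {0: 250, 1: 260, 2: 240, 3: 230, 4: 255}
# Minute 2 never reported, so the longest run is only three minutes.
gappy = {0: 250, 1: 260, 3: 230, 4: 255, 5: 270}

steady_result = violates_critical_threshold(steady)
gappy_result = violates_critical_threshold(gappy)
```

With these two inputs, `steady_result` is `True` and `gappy_result` is `False`, even though the second app was over 200 in every minute it reported. That is the trap to watch out for when a metric can go quiet.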
So while I was typing all this, I had my app, with lots of elements refreshing every 20 seconds, generating data to violate the Critical Threshold of the Condition in my Alerts Policy. Now, if I navigate to the Alerts beta page by clicking up in the grey navigation bar, I see that I have an open Incident (which is also our final vocab word for the day).
Incidents are like another container. When a Critical Threshold is violated an Incident gets created and we put events related to it inside the container. These events can be violations of critical thresholds, notifications sent, opening and closing the incident, and acknowledgement of incidents (among other things). Note that an Incident will only be created when a Critical Threshold is violated but other events will still show up in the Incident if they are related. I will avoid too much detail here but may elaborate more in a future post.
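Here is a small Python sketch of the “Incident as a container of events” idea. Again, this is a mental model under my own assumptions: the class names, the event kinds, and the `record` helper are hypothetical and do not correspond to anything in the product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    kind: str  # hypothetical kinds: "violation", "notification", "acknowledged", "closed"
    detail: str = ""


@dataclass
class Incident:
    events: List[Event] = field(default_factory=list)
    is_open: bool = True


def record(event: Event, current: Optional[Incident]) -> Optional[Incident]:
    """Attach an event to the open Incident, creating one only for a violation."""
    if current is None or not current.is_open:
        if event.kind != "violation":
            # Only a Critical Threshold violation creates an Incident.
            return current
        current = Incident()
    current.events.append(event)
    if event.kind == "closed":
        current.is_open = False
    return current


incident = record(Event("notification", "email sent"), None)   # nothing to attach to
incident = record(Event("violation", "throughput above 200"), incident)
incident = record(Event("notification", "email sent"), incident)
incident = record(Event("acknowledged"), incident)
```

After these calls the Incident holds three events (the violation, the notification, and the acknowledgement) and is still open. The first notification was dropped because no violation had opened an Incident yet.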
Hopefully by now you’ve got an Alerts policy created that is meaningful to you, or you at least feel comfortable creating one and communicating with us about it.
For complete details about the New Relic Alerts Beta, including setting up alert policies, assigning notification channels, and viewing your dashboards, see New Relic’s Documentation site.