When to Use (and when not to use) Baseline Alerting
Baseline Alerting is a very cool new tool that New Relic unveiled recently. You can read more about it in this document, which shows you exactly how to set up Baseline Alerting and goes over a bit of what I cover in this post.
Everyone wants to use Baseline Alerting! But there are certain situations where it shines, and other situations where you can accomplish your needs more easily (and with less needless noise) by using normal alert conditions.
What Baseline Alerting needs to work well
Baseline Alerting, in order to work well, needs every one of the following:
Lots of data points. Without a strong signal (ideally at least triple-digit data points daily), our algorithms will not accurately assess your baselines. More data is better and improves the algorithms’ ability to predict your baseline metric.
Varying data points. If your data points all flatline at a certain value or consistently bounce between two values, our algorithms will wind up setting your baseline very tight to that value and will treat anything that deviates from it as a violation. If your data points have some variation, the algorithms will shine, finding a baseline range that alerts you when data points start to fall out of control. The wider the variability, the wider the baseline will wind up.
A data stream that has been around for at least a week. Without a considerable history to examine, the algorithms will wind up setting your baselines too tight, the baselines will fluctuate wildly as the algorithms try to settle on a good value, and you will get notifications when you’re not likely to need them.
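If you want to sanity-check a metric against these three requirements before setting up a baseline condition, here is a rough sketch. It is not part of any New Relic API: the names MetricHistory and baseline_readiness are hypothetical, and the cutoffs (100 points per day, 7 days, more than two distinct values) are just my reading of the guidelines above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricHistory:
    """Hypothetical container for a metric you have exported yourself."""
    values: List[float]      # one entry per data point
    days_of_history: float   # how long the data stream has existed


def baseline_readiness(history, min_points_per_day=100, min_days=7):
    """Return the reasons a metric is a poor baseline candidate (empty if none).

    The default cutoffs are assumptions taken from the rules of thumb in
    this post, not values used by New Relic's algorithms.
    """
    problems = []

    # Requirement: at least a week of history.
    if history.days_of_history < min_days:
        problems.append("less than a week of history")

    # Requirement: lots of data points (roughly triple digits per day).
    if history.days_of_history > 0:
        points_per_day = len(history.values) / history.days_of_history
    else:
        points_per_day = 0
    if points_per_day < min_points_per_day:
        problems.append("too few data points per day")

    # Requirement: varying data, not flat or bouncing between two values.
    if len(set(history.values)) <= 2:
        problems.append("values are flat or bounce between two values")

    return problems
```

An empty list means the metric passes all three checks. Anything else tells you which requirement to fix, or suggests that a normal alert condition is the safer choice for now.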
What are good use cases for Baseline Alerting?
APM Transaction metrics that fluctuate during certain times of day or during the week work well, as the baseline algorithms will learn from the data history and attempt to predict these fluctuations.
NRQL alert conditions pulling in many data points that might fluctuate during weekly maintenance schedules will benefit from baseline algorithms, so that you won’t get alerted for metrics that change during maintenance windows.
Page load times on a high-throughput web app work well, as you’ll need to know when load times fall out of control, but you can allow the baseline algorithms to take page load history into account and only notify you when you need to be notified.
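To make the idea of learning from history more concrete, here is a toy sketch of a time-of-day baseline. To be clear, this is not New Relic's algorithm, which this post does not describe. It is a simple stand-in that assumes a band of mean plus or minus a few standard deviations per hour of day, and the function names and the width parameter are my own.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_baseline(samples, width=3.0):
    """Build a toy baseline band for each hour of the day.

    samples: iterable of (hour_of_day, value) pairs from your history.
    Returns {hour: (lower, upper)}, where the band is the hourly mean
    plus or minus `width` population standard deviations (an assumption
    made for illustration only).
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)

    bands = {}
    for hour, values in by_hour.items():
        center = mean(values)
        spread = pstdev(values)
        bands[hour] = (center - width * spread, center + width * spread)
    return bands


def is_violation(bands, hour, value):
    """True when a value falls outside the band learned for that hour."""
    lower, upper = bands[hour]
    return value < lower or value > upper


# Example history: page loads are fast at 03:00 and slower at 14:00.
history = [(3, v) for v in (90, 100, 110, 95, 105)]
history += [(14, v) for v in (380, 400, 420, 390, 410)]
bands = hourly_baseline(history)

print(is_violation(bands, 14, 400))  # normal for mid-afternoon
print(is_violation(bands, 3, 400))   # far outside the 03:00 band
```

The same value of 400 is fine at 14:00 but a violation at 03:00, which is the kind of behavior a single static threshold can't give you. The sketch also shows why flat data is a problem: if every historical value for an hour is identical, the spread is zero, the band collapses to a single value, and any deviation at all counts as a violation.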
When should I use normal, non-baseline alert conditions instead?
An app that is brand new will have very little history for the baseline algorithms to work with – give it at least a week to build up some data history first.
If you already know exactly what range the metric should fall within, it’s best to just set a regular alert condition to notify you when it falls outside of that range.
If your metric has a very low throughput, the baseline algorithms will react unpredictably. In these cases, it’s best to pick a threshold yourself and set it with a normal alert condition.
Baseline Alerting is an awesome new tool – it allows you to set up a fire-and-forget alert condition that will adjust to data that fluctuates over time – but it’s important to make sure you’re using it in cases where it is able to really shine so that you’re not woken up needlessly.