Right now, outlier alert conditions evaluate something along these lines (paraphrased): “If any result of the query violates this condition's criteria by being an outlier to the trend of the query results, continuously for at least this long, trigger an alert.”
Suppose the NRQL query looks something like this (we use several variations of this general approach):
SELECT average(duration) FROM Transaction WHERE appName = 'myAppName' FACET host
And suppose the condition is configured as an outlier condition with a particular threshold.
This configuration correctly detects when a single server goes bad and its response time differs wildly from the rest of the servers in the group.
The problem is that if several servers each get brief performance “spikes” lasting only a minute or so, the condition treats this as “there was at least one outlier at all times during the threshold duration of this condition, so fire the alert.” That's not what we want. We want to trigger only on an individual host that stays an outlier, not on the fact that the group as a whole has many short-lived outlier blips coming from different hosts.
To solve this issue, I propose an additional setting that controls whether the outlier alert condition looks for a Group Outlier (the current behavior: some host, not necessarily the same one, is an outlier throughout the duration) or an Instance Outlier (the same single host is an outlier throughout the duration).
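To make the difference between the two modes concrete, here is a minimal Python sketch. It assumes a hypothetical, simplified outlier rule: within a given minute, a host is an outlier if its value sits more than `threshold` population standard deviations from that minute's group mean. New Relic's actual outlier algorithm may work differently. All names (`outliers_in_minute`, `group_outlier_fires`, `instance_outlier_hosts`, the `web-*` hosts) are illustrative only and are not part of any New Relic API.

```python
from statistics import mean, pstdev
from typing import Dict, List, Set

# One minute of faceted results: host -> average(duration) in seconds.
Minute = Dict[str, float]


def outliers_in_minute(sample: Minute, threshold: float) -> Set[str]:
    """Hypothetical outlier rule: hosts more than `threshold` population
    standard deviations away from this minute's group mean."""
    values = list(sample.values())
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return set()  # every host identical, so nothing stands out
    return {h for h, v in sample.items() if abs(v - center) > threshold * spread}


def group_outlier_fires(window: List[Minute], threshold: float) -> bool:
    """Group Outlier (current behavior as described): fire if *some* host is
    an outlier in every minute, even if it is a different host each time."""
    return bool(window) and all(outliers_in_minute(m, threshold) for m in window)


def instance_outlier_hosts(window: List[Minute], threshold: float) -> Set[str]:
    """Instance Outlier (proposed): hosts that are outliers in every minute."""
    per_minute = [outliers_in_minute(m, threshold) for m in window]
    return set.intersection(*per_minute) if per_minute else set()


HOSTS = [f"web-{i}" for i in range(1, 6)]


def minute(slow_host: str) -> Minute:
    # One slow host at 2.0 s, everyone else at 0.2 s.
    return {h: (2.0 if h == slow_host else 0.2) for h in HOSTS}


rotating = [minute(h) for h in HOSTS]       # a different host blips each minute
stuck = [minute("web-3") for _ in HOSTS]    # web-3 stays slow the whole window

print(group_outlier_fires(rotating, 1.5))     # True  (the unwanted alert)
print(instance_outlier_hosts(rotating, 1.5))  # set() (no single bad host)
print(group_outlier_fires(stuck, 1.5))        # True
print(instance_outlier_hosts(stuck, 1.5))     # {'web-3'}
```

With rotating one-minute blips, the group mode fires because every minute contains some outlier, while the instance mode stays quiet because no host is an outlier for the whole window. A host that genuinely goes bad is caught by both modes, and the instance mode also names it.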
New Relic Edit
We take feature ideas seriously, and our product managers review every one when plotting their roadmaps. However, there is no guarantee this feature will be implemented. This post does ensure the idea is put on the table and discussed, though. So please vote and share your extra details with our team.