NRQL Outlier alert conditions end-of-life

Hi folks,

You may have read the announcement I posted about the exciting new initiatives for AIOps/Alerts in New Relic over the next 9 months or so. If not, I encourage you to take a look at this link.

One thing I mentioned in that post is that we would be deprecating and then disabling NRQL Outlier alert conditions. I’ll go into a bit more detail on that here.

  • At first, we’ll post a banner in the UI when you edit an alert condition. You should see the banner in the Alerts UI this week. The banner is a warning that EOL is approaching and is planned for 1 February, 2022.
  • After a month or two, we’ll turn off the ability to make new Outlier conditions. However, any pre-existing Outlier conditions on your account will continue functioning. We’re tentatively aiming for 1 December, 2021 for this.
  • On 1 February, 2022, we will turn off Outlier alert conditions.

Over the last year, we have improved our NRQL Baselines’ evaluation method to support the FACET clause, so you can now watch for anomalies in a cluster or any group of similar signals (see docs on faceted baseline conditions at this link). This method is an improvement over Outlier conditions, since it will alert you to exactly which member of the group is misbehaving. A faceted baseline condition may be able to meet the same need that your old Outlier conditions met.

Thanks for taking the time to clarify, @Fidelicatessen, much appreciated.

Do correct me if I’m wrong, but going by the docs, faceted baseline alerts are not an exact replacement by any means. Notably, a baseline alert takes each entity’s individual historic performance into account, while the outlier alert works on the immediate group’s signal pattern.

In practice, if a server has been misbehaving for an extended period of time, the faceted baseline alert treats this as “normal”, despite a significant difference from its load-balanced counterparts. The outlier condition, on the other hand, alerts accordingly, since it picks up the anomalous behaviour that contrasts with the rest of the group. We ran into one such incident today where the outlier picked up what the faceted baseline missed:

image
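
To make the difference concrete, here is a minimal sketch of the two styles of check. It is a toy model only, not New Relic's actual evaluation logic: the function names, thresholds, and sample values are all made up for illustration.

```python
from statistics import mean, pstdev

def deviates_from_own_history(history, current, threshold=3.0):
    """Baseline-style check (toy): compare a signal with its own past values."""
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold

def deviates_from_group(peers, current, tolerance=0.5):
    """Outlier-style check (toy): compare a signal with its peers' current values."""
    centre = mean(peers)
    return abs(current - centre) > tolerance * abs(centre)

# Hypothetical response times in ms. host-c has been slow for a long time,
# so its own history looks "normal", while its peers sit around 200 ms.
history_c = [900, 910, 895, 905, 900]
current_c = 902
peers_now = [200, 210]

own_history_flag = deviates_from_own_history(history_c, current_c)
group_flag = deviates_from_group(peers_now, current_c)
```

With these made-up numbers the history-based check stays quiet and the group-based check fires, which matches the incident described above.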

That being said, I agree it’s an improvement to alert on a specific facet entity instead of a generic group. At this time, though, it seems as if we’re about to drop support for a stand-out feature without a reliable replacement. Perhaps stddev() can be wrangled in some way, as @tyzbit has recently shown off elsewhere?

Hi @rishav.dhar,

Good to hear from you. You are the primary user that we thought about while making this decision.

You are correct in saying that faceted baselines are not a direct replacement for NRQL Outlier conditions. The 2 features are best pictured as a Venn diagram: they overlap, but neither is a 100% replacement for the other. The baseline technology is where we are focusing our investment.

The current Outlier conditions may point you to 1 member that is out of band for 5 minutes, or they may indicate that there were 5 separate entities that were each out of band for only 1 minute and then self-resolved. Then, during analysis, we won’t be able to tell you which 1 it was, or how many there were. This creates too much noise, and is one of the reasons for such low adoption of this feature.
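
As a rough illustration of that ambiguity, the sketch below collapses per-member flags into a single group-level signal. This is a toy model with hypothetical host names, not how Outlier conditions are actually evaluated.

```python
# Each list holds one flag per minute: True if that member was out of band.
one_member_five_minutes = {
    "host-a": [True, True, True, True, True],
    "host-b": [False, False, False, False, False],
    "host-c": [False, False, False, False, False],
    "host-d": [False, False, False, False, False],
    "host-e": [False, False, False, False, False],
}
five_members_one_minute = {
    "host-a": [True, False, False, False, False],
    "host-b": [False, True, False, False, False],
    "host-c": [False, False, True, False, False],
    "host-d": [False, False, False, True, False],
    "host-e": [False, False, False, False, True],
}

def group_signal(members):
    """Collapse per-member flags into one flag per minute: was anyone out of band?"""
    return [any(flags) for flags in zip(*members.values())]

same_signal = (
    group_signal(one_member_five_minutes) == group_signal(five_members_one_minute)
)
```

Both scenarios reduce to the same group-level signal, so nothing at that level says which member, or how many, were involved.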

If there is 1 entity that is out of band, and that is abnormal for it, then a well-tuned baseline should identify that anomaly.

We are focusing a lot more of our attention on improving our anomaly detection capabilities. This will include various approaches for monitoring cluster health, and identifying problematic cluster members. However, this particular implementation is not viable going forward.

Hi @bgoleno, thanks for your response, I really appreciate it.

I can definitely understand the shift towards faceted baseline alerts; they’re significantly more valuable for precisely identifying the source of an anomaly with minimal distractions.


Like you say, it’s a loss of existing functionality with no 100% replacement for it, which is a bit of a shame for existing workflows that rely on it.


I think it’d be fairer to say the emphasis is on “that is abnormal for it” rather than the “out of band”, since the facet baseline alert condition has no concept of a grouping: just the faceted entity’s past performance, I’d assume?


I’m happy to attest to that. Particularly with regard to the recent developments @Fidelicatessen announced in the alerting space, it’s been a pleasant surprise to see these implementations go live with full API support. This allows users like myself to actually try out these features on day 1, using Terraform to orchestrate large-scale changes across our monitoring estate.

Always keen to see more progress in this area, particularly with the backing of Terraform support.
