Alerting on Spotty Metrics - Possible to increase Aggregation Window times?


I was wondering if it is possible to increase the aggregation window beyond the 15-minute maximum that’s currently in place.

Where we’re looking at:
New Relic One > Policies > my team’s policy > Alert Conditions > Condition Settings > Advanced Signal Settings > Aggregation window

Problem we’re trying to solve:
Our metric is fairly spotty: during the day we get a lot of events, but during the night we can go for an hour without seeing a single event fire. However, when events do happen, we would like to be alerted if they are failing more often than they are succeeding. Specifically, we would like to alert if the success rate drops below 30% and stays there for an extended period of time. Our current train of thought is to evaluate the percentage over 60 minutes (total number of successes / total number of requests) and to send an alert if it stays below 30% for an extended period.

Open to other suggestions on how to monitor a spotty % metric, but increasing the aggregation windows seemed like something that we’d want.

Edit: We’re assuming that the aggregation window is a rolling window, so the evaluation (calculating the %) being suggested would also happen every minute, but for data from the past 60 minutes.
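To make the evaluation we have in mind concrete, here is a rough Python sketch of it. This is not New Relic code and does not use any New Relic API; the `Event` type, the function names, and the 15-minute `sustain` value are all made up for illustration. It assumes the window is rolling, is re-evaluated once per minute, and that a window with no events should not trigger an alert.

```python
from dataclasses import dataclass


@dataclass
class Event:
    minute: int    # event time in whole minutes (hypothetical unit)
    success: bool  # True if the request succeeded


def rolling_success_ratio(events, now, window=60):
    """Successes / total requests over the window (now - window, now].

    Returns None when the window holds no events, so that a quiet
    period is not mistaken for a 0% success rate.
    """
    recent = [e for e in events if now - window < e.minute <= now]
    if not recent:
        return None
    return sum(e.success for e in recent) / len(recent)


def should_alert(events, now, window=60, threshold=0.30, sustain=15):
    """True if the rolling ratio was below `threshold` at each of the
    last `sustain` one-minute evaluations ending at `now`."""
    for t in range(now - sustain + 1, now + 1):
        ratio = rolling_success_ratio(events, t, window)
        # Assumption: no data, or a healthy ratio, resets the alert.
        if ratio is None or ratio >= threshold:
            return False
    return True


# Sparse overnight traffic: five events in an hour, one success.
events = [Event(1, False), Event(2, False), Event(3, False),
          Event(4, True), Event(40, False)]
print(rolling_success_ratio(events, now=60))
print(should_alert(events, now=60))
```

Whether an empty window should count as "healthy" or as a violation is a judgment call on our side; the sketch picks "healthy" so that quiet nights do not page anyone.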

I ran your question by our support team and they confirmed that this is a feature that our product team is already working on. Can’t give you specifics on timeline. There is a potential workaround that could help that you may want to check out:


Note: We were told separately about the “Metric Aggregator” app as well, which may be able to do some of the aggregation described without custom building anything. I’m not sure if the solution was successful, but it is another place to look.

Thanks for the update @rraposa! Let us know how the Metric Aggregator app works out for you if you get the chance!

Unfortunately, I learned from my colleagues that the Metric Aggregator didn’t help and was found to be buggy. New Relic suggested using the APIs directly, which I think is what the linked doc above describes. I don’t think we moved forward on that because we don’t have the time to implement a custom solution. We’re hoping the Aggregation Window functionality gets expanded beyond 15 minutes.

Can someone confirm whether the assumption above, that the aggregation window is a rolling window, is correct? I’m looking to increase the aggregation window in alerts and want to understand whether, after changing this value, alerts are still evaluated every minute or end up being evaluated less frequently.