I was wondering if it is possible to increase the aggregation window beyond the 15-minute maximum that’s currently in place.
Where we’re looking at:
New Relic One > Policies > my team’s policy > Alert Conditions > Condition Settings > Advanced Signal Settings > Aggregation window
Problem we’re trying to solve:
Our metric is fairly spotty: during the day we get a lot of events, but during the night we can go for an hour without seeing a single event fire. However, we would like to be alerted if the events are failing more than they are succeeding when they do happen (we would like to alert if the success percentage drops below 30% and stays there for an extended amount of time). Our current train of thought is to evaluate the percentage every 60 minutes (total number of successes / total number of requests) and, if it stays below 30% for an extended period of time, to send an alert.
We’re open to other suggestions on how to monitor a spotty % metric, but a larger aggregation window seemed like what we’d want.
Edit: We’re assuming that the aggregation window is a rolling window, so the evaluation (calculating the %) being suggested would still happen every minute, but over data from the past 60 minutes.
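
To make the evaluation we have in mind concrete, here is a rough Python sketch of the logic. This is not New Relic code or configuration, just an illustration of the behaviour we’re after. The names (`RollingSuccessRate`, `SUSTAINED_EVALUATIONS`) are made up, the value of 30 consecutive evaluations is only a placeholder for "an extended period of time", and leaving the breach count unchanged when the window is empty is our own assumption.

```python
from collections import deque
from dataclasses import dataclass, field

WINDOW_SECONDS = 60 * 60     # rolling window: the past 60 minutes
THRESHOLD = 0.30             # alert when the success rate is below 30%
SUSTAINED_EVALUATIONS = 30   # placeholder for "an extended period of time"


@dataclass
class RollingSuccessRate:
    """Hypothetical helper; events must be recorded in timestamp order."""
    events: deque = field(default_factory=deque)  # (timestamp_seconds, succeeded)
    breaches: int = 0                             # consecutive evaluations below threshold

    def record(self, timestamp: float, succeeded: bool) -> None:
        self.events.append((timestamp, succeeded))

    def evaluate(self, now: float) -> bool:
        """Intended to run once per minute; returns True when an alert should fire."""
        # Drop events that have fallen out of the rolling window.
        while self.events and self.events[0][0] <= now - WINDOW_SECONDS:
            self.events.popleft()

        total = len(self.events)
        if total > 0:
            successes = sum(1 for _, ok in self.events if ok)
            rate = successes / total  # total successes / total requests
            if rate < THRESHOLD:
                self.breaches += 1
            else:
                self.breaches = 0
        # Assumption: with no events in the window, the breach count is left as-is.
        return self.breaches >= SUSTAINED_EVALUATIONS
```

With a window this long, a quiet night with only a handful of events still produces a percentage based on everything seen in the past hour, rather than on whatever landed in a single 15-minute slice.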