New Release: Sliding Windows Aggregation Alert Conditions

Hello folks!

Today we’re officially releasing Sliding Windows Aggregation Alerts (SWA). This is a great way to smooth out a volatile signal over the time period you specify. I went into detail about how these will work and some of the ways we’ll be providing “guard rails” to make sure your SWA alert conditions work correctly in my #whats-on-deck post: What’s On Deck for Alerts: Sliding Windows Aggregation!
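
To make the smoothing idea concrete, here is a minimal Python sketch of a sliding window average. It is only an illustration, not how the alerting pipeline is implemented: the function name, the made-up CPU data, and the choice to measure the window and slide in data points rather than minutes are all assumptions for the example.

```python
def sliding_window_averages(values, window, slide):
    """Average `values` over overlapping windows.

    `window` and `slide` are counts of data points (a hypothetical unit,
    e.g. one point per minute). Each window starts `slide` points after
    the previous one, so consecutive windows overlap.
    """
    if window <= 0 or slide <= 0:
        raise ValueError("window and slide must be positive")
    return [
        sum(values[start:start + window]) / window
        for start in range(0, len(values) - window + 1, slide)
    ]

# A volatile CPU % signal (made-up data), one point per minute.
cpu = [20, 90, 25, 85, 30, 95, 20, 80]
smoothed = sliding_window_averages(cpu, window=4, slide=1)
print(smoothed)
```

The raw points swing between 20 and 95, while the 4-point averages all land between 55 and 59, which is the kind of value you would then compare against a threshold.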

The documentation for SWA can be found at this link.

We’ve implemented the limits I mentioned in that post as a UI slider for selecting your slide by interval.

The values shown in the slider will change based on your window duration setting. The minimum value in this screenshot is circled in blue, the maximum value is circled in red, and your current selected value is circled in green. As you move the slider, it will hop to the various possible values, dependent upon your window duration setting. You can also change the slider to show seconds, which may result in more options being available (for example, in the above screenshot, if I switch to “seconds” instead of “minutes,” the lowest possible value becomes 30 seconds).

If you edit your window duration, the slider will adjust to fit the new value. If you select a window duration that has only one possible slide by value, the slider will disappear entirely, and that value will be selected automatically and shown in the box. An example of this would be a window duration of 1 minute: since the slide by interval can’t go any lower than 30 seconds, that is the only value available.
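
As a rough sketch of how the available values could depend on the window duration, the Python below lists candidate slide by intervals in seconds. The rule it encodes is an assumption that this post does not spell out: an interval is at least 30 seconds, shorter than the window, and divides the window evenly. The function name `slide_by_options` is hypothetical.

```python
def slide_by_options(window_seconds, minimum_seconds=30):
    """List candidate slide by intervals (in seconds) for a window.

    Assumed rule, not confirmed by the post: a valid interval is at
    least `minimum_seconds`, shorter than the window, and divides the
    window evenly.
    """
    return [
        s for s in range(minimum_seconds, window_seconds)
        if window_seconds % s == 0
    ]

print(slide_by_options(60))   # 1-minute window
print(slide_by_options(300))  # 5-minute window
```

Under that assumption, a 1-minute window yields a single option (30 seconds), which matches the case where the slider disappears, while a 5-minute window yields several.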

As an added benefit, we have raised the maximum Window duration (aka “aggregation window”) to 120 minutes. This increase in the maximum value is available across all NRQL alert conditions, not only SWA!

I encourage you to use this for any use-case where you need to alert on a volatile signal but would prefer to use an aggregated value (CPU % and throughput both come immediately to mind). Feel free to post the use-case where you’re finding SWA useful, or any questions you might have, below.


Hi @Fidelicatessen, thanks for sharing this update.

Of the 4 Golden Signals for SRE, you mentioned Traffic (i.e., throughput) and Saturation (i.e., CPU utilisation) are good use-cases for SWA.

For Errors (i.e., rate of errors), I have a NRQL alert condition that triggers when the query returns a value above a threshold for at least a few minutes, using the “event flow” streaming method. The query is:

FROM TransactionError, Transaction
SELECT percentage(count(*), WHERE error.class IS NOT NULL)
FACET appName

Which looks like this in situ:

Do you think this might also qualify for a SWA use-case, or not so much?

Hi @Rishav! Nice to hear from you again :slight_smile:

If you want to get alerted on what could be a volatile spike of errors, then don’t use SWA. Personally, if it were my app, I would want to know about volatile spikes of errors, but use-cases are always different, so it would really depend on what you need for this use-case.

If you want to “smooth out” the volatile spikes of errors, and alert on an aggregated value over X minutes, SWA will definitely help to do that.
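
To illustrate that trade-off, here is a small Python sketch comparing a raw signal with a smoothed one against the same threshold. The error percentages, the threshold of 5, and the helper names are all made up for the example, and the smoothing is a plain 3-point average that slides one point at a time.

```python
def breaches(values, threshold):
    """Return the indexes of points strictly above the threshold."""
    return [i for i, v in enumerate(values) if v > threshold]

def smooth(values, window):
    """Average each run of `window` consecutive points (slide by one point)."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Made-up error percentages per minute, with one short spike.
error_pct = [1, 1, 12, 1, 1, 1]
threshold = 5  # hypothetical alert threshold, in percent

raw_breaches = breaches(error_pct, threshold)
smoothed_breaches = breaches(smooth(error_pct, 3), threshold)
print(raw_breaches)       # breaches on the raw signal
print(smoothed_breaches)  # breaches on the 3-point sliding average
```

The single spike crosses the threshold on the raw signal but not on the smoothed one, so whether that is what you want depends on whether short spikes matter for your use-case.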
