What's On Deck for Alerts: Sliding Windows Aggregation!

Hi folks!

As part of our new What’s On Deck? series, I want to share with you a feature we are actively developing and that will be coming to you soon. Follow the #whats-on-deck tag in general for new Alerts updates, and follow this thread specifically if you’d like to get updates on how close we are to release!

We will be releasing the ability to use Sliding Windows Aggregation (SWA) in your NRQL alert conditions. Sliding windows are already available in NRQL queries, and I encourage you to read more about them in this documentation. We will be adding this functionality to Alerts soon, and I want to make sure you’re aware so that it won’t take you by surprise, and so that you can get excited about it.

As part of this improvement, we will also be increasing the maximum aggregation window size from 15 minutes to 120 minutes!

Why is this so cool?

  • It will allow for more consistent aggregation of erratic or volatile signals
  • More accurate and reliable alerting for infrequent or inconsistent signals
  • Ease of troubleshooting – you can duplicate sliding window behavior in ad-hoc NRQL queries
  • You can use aggregators other than sum

OK, how does it work?

The documentation on Sliding Windows in NRQL queries covers the basics, but I’ll quickly go over the formula we’ll be using (and that you can use too) to convert your “Sum of query results” alert conditions over to using SWA.

First of all, here’s the formula – you can reproduce this in NRQL for now, so I’m using TIMESERIES, which we do not normally allow in Alert conditions:

<your query> TIMESERIES <your threshold window> SLIDE BY <your aggregation window>

So if you have a threshold like Sum of query results is over 100 at least once in 3 minutes, your threshold window is 3 minutes. Let’s assume you have an aggregation window of 1 minute (the default). This would result in the last 3 minutes’ worth of data being aggregated each minute.
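To make the formula concrete, here is a minimal Python sketch that appends the two clauses to a query string. The helper name `to_swa_query` and the sample query are illustrative assumptions, not part of any New Relic tooling; substitute the query from your own condition.

```python
def to_swa_query(base_query: str, threshold_window_min: int, aggregation_window_min: int) -> str:
    """Append TIMESERIES / SLIDE BY clauses following the formula above.

    Hypothetical helper for illustration only.
    """
    def minutes(n: int) -> str:
        # NRQL accepts both "1 minute" and "3 minutes"; pick the natural form.
        return f"{n} minute" if n == 1 else f"{n} minutes"

    return (
        f"{base_query} TIMESERIES {minutes(threshold_window_min)} "
        f"SLIDE BY {minutes(aggregation_window_min)}"
    )


# Illustrative query: a 3-minute threshold window and a 1-minute aggregation window.
example = to_swa_query("SELECT sum(duration) FROM Transaction", 3, 1)
print(example)
```

Pasting the printed query into the query builder is one way to preview how the sliding window behaves on your own data.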

Here’s an example of what that would look like. Imagine each block is 1 aggregation window of data, and inside the block is the aggregated value for that window (1 for minute 1, 2 for minute 2, and so on). I’m going to use sum for my aggregator, since that just makes things easier to think about.

  • On minutes 1 and 2, no evaluation would take place. That’s because a buffer is being filled, and we do not yet have 3 minutes’ worth of data.
  • On minute 3, we now have a full buffer of data (3 minutes’ worth), so we can aggregate the values. The evaluated value for minute 3 would be 6 (1 + 2 + 3)
  • On minute 4, the 3-minute window slides by 1 minute, and a value of 9 is evaluated (2 + 3 + 4)
  • On minute 5, the 3-minute window slides by 1 minute again, and a value of 12 is evaluated (3 + 4 + 5)
  • and so forth
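The walkthrough above can be reproduced with a short Python sketch. This is only a simulation of the arithmetic, assuming one data point per minute with values 1, 2, 3, 4, 5; it is not how the Alerts pipeline is implemented, and the function name is made up for the example.

```python
from typing import List, Optional


def sliding_sums(values: List[int], window: int) -> List[Optional[int]]:
    """Return one evaluated value per minute, sliding forward by one minute.

    None means the buffer is still filling, so nothing is evaluated yet.
    """
    results: List[Optional[int]] = []
    for i in range(len(values)):
        if i + 1 < window:
            results.append(None)  # not enough data for a full window
        else:
            results.append(sum(values[i + 1 - window : i + 1]))
    return results


per_minute = [1, 2, 3, 4, 5]  # assumed aggregated value for each minute
evaluated = sliding_sums(per_minute, window=3)
print(evaluated)  # [None, None, 6, 9, 12]
```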

Will I be able to use TIMESERIES and SLIDE BY in my query?

We do have plans to allow this, but for now you will be able to use sliders in the UI to control these values. Keep in mind that your aggregation window (used for SLIDE BY) needs to be smaller than your threshold window (used for TIMESERIES), and the threshold window should be evenly divisible by the aggregation window.
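As a rough illustration of those two rules, here is a Python sketch of the kind of check you could run on your own settings. The function name and messages are assumptions for this example; the actual validation in the product may differ.

```python
from typing import List


def validate_windows(threshold_window_s: int, aggregation_window_s: int) -> List[str]:
    """Return a list of problems with the two window settings (empty if none).

    Both values are in seconds. Hypothetical helper for illustration only.
    """
    if threshold_window_s <= 0 or aggregation_window_s <= 0:
        raise ValueError("window sizes must be positive")

    problems: List[str] = []
    if aggregation_window_s >= threshold_window_s:
        problems.append("aggregation window must be smaller than threshold window")
    if threshold_window_s % aggregation_window_s != 0:
        problems.append("threshold window must be evenly divisible by aggregation window")
    return problems


print(validate_windows(180, 60))   # valid: 3-minute threshold, 1-minute aggregation
print(validate_windows(180, 180))  # same size
print(validate_windows(180, 360))  # aggregation window larger than threshold window
print(validate_windows(90, 60))    # not evenly divisible
```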

Which brings us to …

Ways a slide-by condition can be broken

We plan to disallow these cases, but I want to make sure you all understand the why.

1st way to break your slide-by: use the same threshold window and aggregation window

This is not a terrible way to break your condition, but it will make it so that you’re not really getting slide-by functionality.

If your threshold window and aggregation window are the same value, you wind up actually getting the traditional alerts behavior. That is, instead of getting a nice, incremental slide, like this

You wind up with a “cascading” aggregation, which looks more like this
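To compare the two behaviors in numbers, here is a small Python sketch. It assumes one data point per minute with values 1 through 6 and a sum aggregator, and it only simulates the arithmetic, not the Alerts pipeline itself.

```python
from typing import List


def windowed_sums(values: List[int], window: int, slide: int) -> List[int]:
    """Sum each full window of `window` points, advancing `slide` points each time."""
    return [
        sum(values[start : start + window])
        for start in range(0, len(values) - window + 1, slide)
    ]


per_minute = [1, 2, 3, 4, 5, 6]  # assumed aggregated value for each minute

# 3-minute window sliding by 1 minute: overlapping windows, one result per minute.
incremental = windowed_sums(per_minute, window=3, slide=1)
# 3-minute window sliding by 3 minutes: no overlap, one result every 3 minutes.
cascading = windowed_sums(per_minute, window=3, slide=3)

print(incremental)  # [6, 9, 12, 15]
print(cascading)    # [6, 15]
```

With equal settings, every data point is counted exactly once and you only get a new value every 3 minutes, which is the traditional behavior.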

2nd way to break your slide-by: use a larger aggregation window than threshold window

This is a pretty terrible way to break your condition, since you will wind up with gaps which are not evaluated.

Let’s say you had your threshold window set to 3 minutes, but your aggregation window set to 6 minutes. That would look like TIMESERIES 3 minutes SLIDE BY 6 minutes.

You would wind up with behavior like this
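A quick way to see the gaps is to list which minutes never fall inside any window. The Python sketch below is a simplified model (whole minutes, windows starting at minute 0), with a made-up helper name, rather than a description of the real evaluation engine.

```python
from typing import List


def uncovered_minutes(window: int, slide: int, total: int) -> List[int]:
    """Return the minutes in [0, total) that no window covers.

    Windows are `window` minutes long and start every `slide` minutes from 0.
    """
    covered = set()
    for start in range(0, total - window + 1, slide):
        covered.update(range(start, start + window))
    return [minute for minute in range(total) if minute not in covered]


# TIMESERIES 3 minutes SLIDE BY 6 minutes over a 12-minute span.
gaps = uncovered_minutes(window=3, slide=6, total=12)
print(gaps)  # [3, 4, 5, 9, 10, 11]

# TIMESERIES 3 minutes SLIDE BY 1 minute: every minute is evaluated.
no_gaps = uncovered_minutes(window=3, slide=1, total=12)
print(no_gaps)  # []
```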

3rd way to break your slide-by: use a SLIDE BY setting that does not divide evenly into your aggregation window

This is a somewhat terrible way to break your condition: your SLIDE BY setting is lower than your aggregation window, but it will still leave a gap once in a while.

Here’s an example: imagine a SLIDE BY setting of 60 seconds, and an aggregation window of 90 seconds. For the first minute, everything is good, but on every other minute, the SLIDE BY setting moves forward by half an aggregation window, which leaves half an aggregation window as a “gap” that does not get evaluated.

We will have validation in place to disallow these, but it’s important that you understand why.

You may think that SWA sounds familiar. That’s because I included this feature in my big announcement, which I recommend checking out at this link if you haven’t already.

Sliding Windows Aggregation will be our replacement for the Sum of query results threshold type in NRQL alert conditions (documented here). This will be a gradual replacement, so you will still be able to use your Sum of query results thresholds for a time after SWA is released.

What’s wrong with “Sum of query results” thresholds?

In a nutshell, they will only sum data. They won’t give you the maximum value, the minimum value, or the average value; they will only add data points together and give you a sum over a rolling time window. While this is certainly useful for some use-cases, there are many other cases where an average, min, max or some other aggregator is needed.

So … you’re saying Sliding Windows Aggregation will fix this?

Yes! When you use Sliding Windows Aggregation (SWA), you control how we aggregate the sliding window in your NRQL query when you use an aggregator function. If you use average, we will give you the average over the sliding window, instead of always the sum and only the sum.
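Here is a small Python sketch of that idea: the same sliding window, with the aggregator swapped out. The data (one value per minute, 1 through 5) and the function names are assumptions for illustration, and the code only mimics the arithmetic rather than the NRQL engine.

```python
from typing import Callable, List, Sequence


def average(window: Sequence[float]) -> float:
    """Arithmetic mean of the values in one window."""
    return sum(window) / len(window)


def sliding_aggregate(
    values: List[float], window: int, aggregator: Callable[[Sequence[float]], float]
) -> List[float]:
    """Apply `aggregator` to each full window, sliding forward one point at a time."""
    return [
        aggregator(values[start : start + window])
        for start in range(len(values) - window + 1)
    ]


per_minute = [1, 2, 3, 4, 5]  # assumed value for each minute

print(sliding_aggregate(per_minute, 3, sum))      # [6, 9, 12]
print(sliding_aggregate(per_minute, 3, average))  # [2.0, 3.0, 4.0]
print(sliding_aggregate(per_minute, 3, max))      # [3, 4, 5]
print(sliding_aggregate(per_minute, 3, min))      # [1, 2, 3]
```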

Hey, this seems pretty cool! When is it available?

I can’t share an exact date with you yet, but I encourage you to try out the SLIDE BY function in ad-hoc NRQL queries to see how it works now!