Hi folks!
We recently released Sliding Windows Aggregation (SWA) alert conditions. You can read more about them in this post or the release announcement. SWA is an upgrade to “Sum of query results” thresholds, so we will eventually be getting rid of “Sum of query results” thresholds (you can find an announcement at this link, and further announcements will be forthcoming as the date approaches).
As the date gets closer, we will provide migration tools, for single conditions and in bulk, to help you convert your old “Sum of” conditions to new SWA conditions. However, some of you may want to be proactive about this change, or may be curious about the logic applied in this conversion. What follows is a look at a case where you might not want to do a straight conversion, and a breakdown of the logic of the conversion itself.
First, let’s look at the reasons why people use “Sum of query results,” and a situation where you may not want to do a straight conversion to SWA.
UPDATE as of 6 June:
- Bulk conversion tool is available – see this link. It will list out all of the “sum of” conditions on your account(s) and let you convert them all at once.
- An in-UI conversion tool is also available, to convert one condition at a time. Note that this tool also shows you what would change in Terraform if you were to convert the condition using a Terraform script, as well as the NerdGraph mutation you would make to convert it. If you are converting your automation scripts, this should help!
The two most common reasons people chose “Sum of” thresholds
Many folks have found value in “Sum of” thresholds because they smooth out a volatile data stream. This is the intended use case, and “Sum of” works well for that, although SWA allows for finer control over timing and aggregation method. If you have a “Sum of” condition that was created for this reason, doing a straight conversion makes perfect sense.
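To make the smoothing idea concrete, here is a minimal Python sketch. It is only an illustration of summing over a sliding window, not how the alerting pipeline is implemented. The function name `sliding_sums` and the sample per-minute values are made up for this example.

```python
from collections import deque


def sliding_sums(per_minute_values, window_minutes):
    """Return the sum of the most recent `window_minutes` values,
    re-evaluated each time a new one-minute value arrives.

    Hypothetical helper for illustration only.
    """
    window = deque(maxlen=window_minutes)  # oldest value drops off automatically
    sums = []
    for value in per_minute_values:
        window.append(value)
        # Only emit a result once the window is completely filled.
        if len(window) == window_minutes:
            sums.append(sum(window))
    return sums


# A volatile stream that swings between low and high values each minute.
volatile = [0, 9, 1, 8, 0, 10, 2, 7]
smoothed = sliding_sums(volatile, window_minutes=4)
print(smoothed)  # [18, 18, 19, 20, 19]
```

The raw values jump between 0 and 10, while the 4-minute sums stay between 18 and 20, which is much easier to set a stable threshold against.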
The other most common reason for using a “Sum of” threshold is to address an issue with sporadic data. “Sum of” was introduced at a time when aggregation windows were fixed at 1 minute and there was no “Gap Filling” option, so users were advised by support and account teams (or figured out for themselves) to use “Sum of” to “smooth over” sporadic data. In this case, you can now configure Gap Filling to address your need more directly. You can find more information on Gap Filling at the following links:
- Alerting: Loss of signal detection and configurable gap-filling strategies
- Streaming alerts: key terms and concepts
- Fill data gaps
- Relic Solution: How Can I Figure Out When To Use Gap Filling and Loss of Signal?
Keep in mind that converting a “Sum of” condition of this latter type to SWA will not affect the functionality of the condition – it will still function as before.
Now that you understand the subtle difference between using “Sum of” for smoothing volatile data and using it to make up for the lack of a Gap Filling strategy, let’s talk about the logic used to convert from “Sum of” to SWA.
How to manually convert a “Sum of” condition to Sliding Windows Aggregation
- Switch “Sum of query results” in your threshold to “Query returns a value of”
- Turn on the “Use sliding window aggregation” toggle
- Make note of your original window duration
- Determine the condition’s threshold duration (ex.: “for at least 10 minutes,” “at least once in 5 minutes,” etc.). Set “Window duration” to this value.
- Set the “Slide by interval” to the original value of “Window duration” that you noted down earlier
- Save your condition
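The steps above can be sketched as a small Python function. This is a hedged illustration of the mapping only: the `ConditionSettings` class, its field names, and the threshold type strings are hypothetical and do not correspond to the Terraform or NerdGraph schemas. The steps do not mention changing the threshold duration itself, so this sketch leaves it untouched.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ConditionSettings:
    """Hypothetical, simplified view of an alert condition (durations in seconds)."""
    threshold_type: str            # "sum_of_query_results" or "query_returns_a_value_of"
    window_duration_s: int         # aggregation window duration
    threshold_duration_s: int      # e.g. "for at least 10 minutes" -> 600
    slide_by_s: Optional[int] = None  # None means sliding windows are off


def convert_sum_of_to_swa(cond: ConditionSettings) -> ConditionSettings:
    """Apply the manual conversion steps to a "Sum of" condition."""
    if cond.threshold_type != "sum_of_query_results":
        return cond  # nothing to convert
    original_window = cond.window_duration_s  # "make note of your original window duration"
    return replace(
        cond,
        threshold_type="query_returns_a_value_of",
        window_duration_s=cond.threshold_duration_s,  # window duration <- threshold duration
        slide_by_s=original_window,                   # slide by <- original window duration
    )


# Example: 1-minute windows, "sum of query results ... for at least 10 minutes".
old = ConditionSettings("sum_of_query_results", window_duration_s=60, threshold_duration_s=600)
new = convert_sum_of_to_swa(old)
print(new.window_duration_s, new.slide_by_s)  # 600 60
```

In this example, a condition with 1-minute windows and a 10-minute threshold duration ends up with a 10-minute window that slides by 1 minute.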
Feel free to use these steps to manually convert your existing “Sum of” conditions. Keep in mind that a UI-based conversion tool (for a single alert condition) and a bulk conversion NR1 app are also available, as noted in the update above.
I hope this article helps you better understand how the conversion of “Sum of” conditions to Sliding Windows Aggregation works. Please feel free to post any comments or questions below.