Relic Solution: What's the Difference Between Evaluation Offset And Delay/Timer?

Hi folks!

You may have noticed that we’ve added two new streaming aggregation methods in NRQL alert conditions. You can find the announcement about this over at this link. I think it’s pretty exciting, since we’ve been looking forward to this ever since we moved everyone over to the streaming alerts platform back in Summer of 2020. As part of this change, we have done away with the evaluation offset setting, and introduced the delay/timer setting to take its place.

We strongly recommend you use one of the new methods in your new and pre-existing alert conditions – they improve accuracy and reduce mean time-to-detect (MTTD). However, for one reason or another, you may want to continue using the “Cadence” method for now. Which brings up the question: If my evaluation offset was x, what value should I use for Delay?

Evaluation Offset: the old version

Back in the olden days (a week or two ago), we had only one aggregation method and we used evaluation offset to set the delay. The way this worked is that we would query your data once per aggregation window, using the query you gave us along with the evaluation offset.

Imagine a very simple alert condition, using an aggregation window of 1 minute and an evaluation offset of 3 “windows.” Let’s populate this imaginary alert condition with a simple query:

SELECT average(cpuPercent) FROM SystemSample

With the settings given above, we would run this query once per aggregation window (once per minute) with the following clause added:

SINCE 3 minutes ago UNTIL 2 minutes ago

In every case:

  • We would formulate the SINCE value by multiplying the aggregation window by the evaluation offset
  • We would formulate the UNTIL value by subtracting one aggregation window from the SINCE value
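If it helps to see the arithmetic from those two bullets written out, here is a minimal sketch. The function name `offset_window` is made up for illustration and is not part of any New Relic API, and everything is in whole minutes to keep it simple.

```python
def offset_window(window_minutes: int, evaluation_offset: int) -> tuple[int, int]:
    """Return (since, until) in "minutes ago" for the old evaluation offset.

    Hypothetical helper for illustration only; not a New Relic API.
    """
    if evaluation_offset < 1:
        # The old setting required at least one aggregation window.
        raise ValueError("evaluation offset must be at least 1 window")
    since = window_minutes * evaluation_offset  # aggregation window x offset
    until = since - window_minutes              # one aggregation window later
    return since, until


since, until = offset_window(window_minutes=1, evaluation_offset=3)
print(f"SINCE {since} minutes ago UNTIL {until} minutes ago")
```

With a 1 minute window and an offset of 3 windows, this reproduces the clause shown above.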

To put it a different way, if we had an aggregation window of 1 minute and an evaluation offset of 3 windows, we’d wait 3 minutes from the start of the window for any late-arriving data before aggregating the window and evaluating it.

An evaluation offset of 0 was not allowed, since this would result in a query which would never return results (SINCE 0 seconds ago UNTIL now). Instead, you had to use a value of at least one aggregation window.

Delay/timer: the new version

With delay and timer, we aren’t working in terms of aggregation windows any more - we’ve decoupled this configuration from the aggregation window entirely. These values effectively say, “how long AFTER the end of the bucket should we wait for any late arriving data before aggregating and evaluating it?”

In addition, the streaming alerts platform no longer makes a once-per-minute query, but instead gathers data literally as it is streaming in to New Relic. Keep that in mind as you read on – these SINCE ... UNTIL clauses are approximations as to what is actually happening.

Imagine the same alert condition from above. If we keep the exact same settings, but replace the evaluation offset with a delay setting of 3 minutes, we effectively get this added on to the query in our alert condition:

SINCE 4 minutes ago UNTIL 3 minutes ago

Carrying the old value (3) over unchanged actually increases the MTTD by one aggregation window! That’s because the SINCE ... UNTIL clause is formulated differently, now that we’re using delay/timer instead of evaluation offset. With delay/timer, in every case:

  • We formulate the SINCE value by adding the delay/timer value to the aggregation window
  • We formulate the UNTIL value by subtracting one aggregation window from the SINCE value
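The same kind of sketch works for the delay/timer arithmetic. Again, `delay_window` is a hypothetical name used only for illustration, the values are in whole minutes, and the result is only an approximation of what the streaming platform actually does.

```python
def delay_window(window_minutes: int, delay_minutes: int) -> tuple[int, int]:
    """Return (since, until) in "minutes ago" for a delay/timer setting.

    Hypothetical helper for illustration only; not a New Relic API.
    """
    if delay_minutes < 0:
        raise ValueError("delay/timer cannot be negative")
    since = delay_minutes + window_minutes  # delay/timer + aggregation window
    until = since - window_minutes          # which is just the delay itself
    return since, until


since, until = delay_window(window_minutes=1, delay_minutes=3)
print(f"SINCE {since} minutes ago UNTIL {until} minutes ago")
```

Note that the UNTIL value always works out to the delay/timer value itself, which is another way of saying that the wait is measured from the end of the window.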

So, before, you could only set the evaluation offset to a whole number of aggregation windows. Now, however, you can set delay/timer to any value, including 0. What SINCE ... UNTIL clause would a delay/timer of 0 result in? If you were using an aggregation window of 1 minute, you’d get:

SINCE 1 minute ago UNTIL now

As a further example, if you set delay/timer to 2 minutes (using a 1 minute aggregation window and the “Cadence” aggregation method), you’d get the following clause added:

SINCE 3 minutes ago UNTIL 2 minutes ago

OK, this is all very interesting, but what’s the TL;DR?

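To translate an existing condition to the new settings: delay/timer setting = (evaluation offset - 1) * aggregation window. For example, an evaluation offset of 3 windows with a 1 minute aggregation window becomes a delay/timer of 2 minutes.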

Since delay/timer measures time from the end of the aggregation window, instead of the beginning, you will need to remove one aggregation window when thinking about evaluation offset as a delay/timer setting.

To reiterate, delay/timer configures how long to wait AFTER each aggregation window for late data. The evaluation offset was expressed in the number of windows and included the window itself.
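To sanity-check the conversion rule, here is a small sketch that applies it and confirms the old and new settings describe the same window. All three function names are hypothetical and exist only for this example, and the values are in whole minutes.

```python
def offset_to_delay(window_minutes: int, evaluation_offset: int) -> int:
    """Translate an old evaluation offset (in windows) to a delay/timer (in minutes)."""
    if evaluation_offset < 1:
        raise ValueError("evaluation offset must be at least 1 window")
    return (evaluation_offset - 1) * window_minutes


def offset_window(window_minutes: int, evaluation_offset: int) -> tuple[int, int]:
    """(since, until) in minutes ago, measured the old evaluation offset way."""
    since = window_minutes * evaluation_offset
    return since, since - window_minutes


def delay_window(window_minutes: int, delay_minutes: int) -> tuple[int, int]:
    """(since, until) in minutes ago, measured the delay/timer way."""
    since = delay_minutes + window_minutes
    return since, since - window_minutes


# The converted delay should always describe the same window as the old offset.
for window in (1, 5):
    for offset in (1, 2, 3):
        delay = offset_to_delay(window, offset)
        assert offset_window(window, offset) == delay_window(window, delay)

print(offset_to_delay(window_minutes=1, evaluation_offset=3))  # prints 2
```

The loop only covers a handful of values, but the algebra holds in general: (offset - 1) * window + window is the same as offset * window.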

CAVEAT: Keep in mind that this is only important if you’re sticking with the “Cadence” method. By switching to one of the new methods instead, you can potentially reduce this delay/timer value even more, which will reduce your MTTD.
