Relic Solution: How Can I Figure Out When To Use Gap Filling and Loss of Signal?

Hi friends! I’m writing this about a month after the Streaming Alerts Platform started rolling out, and I wanted to proactively answer some questions that are coming up a lot in support tickets. If you’d like to learn more about these new features, read on!

First of all, let’s make sure to provide some documentation links here:

Now that you have some documentation to refer back to, let’s take a step back to discover when you’ll want to use these new settings. Before you can answer this question, you need to understand…

Query order of operations

By default, the aggregation window is 1 minute. You can change that, but let’s move forward with the assumption that you’re using the default setting.

Every minute, a window of collected data is aggregated using the function in the NRQL alert condition’s query. The query is parsed and executed by our systems in the following order:

  1. FROM clause – which event type needs to be grabbed?
  2. WHERE clause – what can be filtered out?
  3. SELECT clause – what information needs to be returned from the now-filtered data set?
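To make the ordering concrete, here is a minimal Python sketch that models a single aggregation window as a list of event dictionaries. It is a simplified illustration under my own assumptions, not how New Relic actually executes NRQL. The function `run_query`, the `eventType` key, and the dict-shaped events are all hypothetical. The detail to notice is that the SELECT step never runs when nothing survives FROM and WHERE, and Python’s `None` stands in for NULL.

```python
from typing import Any, Callable, Optional

# Hypothetical event shape: a flat dict with an "eventType" key plus attributes.
Event = dict[str, Any]


def run_query(
    events: list[Event],
    event_type: str,
    where: Callable[[Event], bool],
    select: Callable[[list[Event]], float],
) -> Optional[float]:
    """Simplified, hypothetical model of one aggregation window: FROM, then WHERE, then SELECT."""
    # 1. FROM: keep only events of the requested type.
    rows = [e for e in events if e.get("eventType") == event_type]
    # 2. WHERE: filter what is left.
    rows = [e for e in rows if where(e)]
    # 3. SELECT: runs only if something survived; otherwise the result is NULL (None).
    if not rows:
        return None
    return select(rows)


if __name__ == "__main__":
    window = [{"eventType": "Transaction", "duration": 0.5}]
    # No event passes the WHERE filter, so SELECT (here: len) never runs.
    print(run_query(window, "Transaction", lambda e: e["duration"] > 1, len))  # None
```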

Let’s take an example query and see what this means in practice.

SELECT count(*) FROM SyntheticCheck WHERE monitorName = 'My Cool Monitor' AND result = 'FAILURE'

Let’s say that, for this minute, there are no failures.

The system first grabs all of the SyntheticCheck events on your account (FROM clause). It then filters through that mountain of events, keeping only the ones that match the monitor name and result I’ve specified (WHERE clause). Once that is done, here’s the very important part:

If there are no events left after the first two steps, the SELECT clause will not be executed.

This means that aggregators like count() and uniqueCount() will never return a zero value. When no events make it through the FROM and WHERE clauses, the SELECT clause is skipped and no data is returned, so the result is NULL instead of 0.
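Here is the same idea applied to the Synthetics example query, as a small Python sketch. It hard-codes the FROM and WHERE conditions from that query and uses a hypothetical window of two successful checks. The function name `failure_count` and the event shape are assumptions for illustration only.

```python
from typing import Any, Optional

# Hypothetical one-minute window of SyntheticCheck events (no failures this minute).
window: list[dict[str, Any]] = [
    {"eventType": "SyntheticCheck", "monitorName": "My Cool Monitor", "result": "SUCCESS"},
    {"eventType": "SyntheticCheck", "monitorName": "My Cool Monitor", "result": "SUCCESS"},
]


def failure_count(events: list[dict[str, Any]]) -> Optional[int]:
    """Hypothetical model of: SELECT count(*) FROM SyntheticCheck
    WHERE monitorName = 'My Cool Monitor' AND result = 'FAILURE'."""
    matches = [
        e
        for e in events
        if e.get("eventType") == "SyntheticCheck"  # FROM
        and e.get("monitorName") == "My Cool Monitor"  # WHERE
        and e.get("result") == "FAILURE"
    ]
    # SELECT is skipped when nothing matched, so the result is NULL (None), never 0.
    return len(matches) if matches else None


print(failure_count(window))  # None, not 0
```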

In the past, New Relic inserted synthetic zeroes to cover over those NULL values in some cases and let the NULL value stand in others. With the Streaming Alerts Platform, New Relic never inserts a synthetic zero. Instead, you have the power to configure what is done with all of those NULL values.

Does this mean I can never get a zero value in a NRQL alert condition?

Not at all! If you have a data source delivering legitimate numeric zeroes, the query will return them. Here’s an example. Imagine that myCoolAttribute is an attribute that can sometimes be equal to 0:

SELECT average(myCoolAttribute) FROM MyCoolEvent

If, in the minute being evaluated, there is at least one MyCoolEvent event and the average of all myCoolAttribute values from that minute is zero, then a 0 value will be returned, not a NULL.

However, if there are no MyCoolEvent events during that minute, then a NULL will be returned (because of the order of operations).
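A similar Python sketch for the average() query shows both outcomes side by side. Again, this is a hypothetical model: the `average_attr` function and the dict-shaped events are made up for illustration, and it assumes every MyCoolEvent carries a numeric myCoolAttribute.

```python
from statistics import mean
from typing import Any, Optional


def average_attr(events: list[dict[str, Any]]) -> Optional[float]:
    """Hypothetical model of SELECT average(myCoolAttribute) FROM MyCoolEvent for one window."""
    rows = [e for e in events if e.get("eventType") == "MyCoolEvent"]  # FROM
    if not rows:
        # No MyCoolEvent events this minute: SELECT never runs, so the result is NULL.
        return None
    # Assumption: every MyCoolEvent has a numeric myCoolAttribute.
    return mean(e["myCoolAttribute"] for e in rows)  # SELECT


print(average_attr([{"eventType": "MyCoolEvent", "myCoolAttribute": 0}]))  # a real zero
print(average_attr([]))  # None (NULL)
```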

OK, I think I get it. Now what’s the deal with Loss of Signal and Gap Filling?

Loss of Signal (LoS) and Gap Filling (GF) allow you to determine how New Relic’s Alerts Evaluation Service handles any NULL values that are returned by your query.

Loss of Signal: If you’re using a Synthetics query like my example above, you’ll get a violation when there is a failure or a series of failures. However, that violation will seem to never close! That’s because this count query can never return a 0 (it returns NULL when there are no failures), and a NULL can’t be evaluated numerically against your threshold. Once you understand this behavior, you can plan for it by setting up LoS. If I set up LoS with a 10-minute window and check the box labeled Close all current open violations, then the violation opened by that failure will close only once I’ve gone 10 minutes with no further failures.

Keep in mind that Loss of Signal needs at least one non-NULL data point to kick in. If you create a new condition (or edit and save an old condition) that has nothing but NULL values, your LoS behavior won’t kick in until after a numeric data point gets evaluated.
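If it helps to see the timeline, here is a rough Python model of the Synthetics example with LoS set to 10 minutes and Close all current open violations checked. It is a simplified sketch, not the Alerts Evaluation Service: the function `simulate_los` and its log format are hypothetical, a violation opens after a single minute above the threshold, and numeric recovery is not modeled (this count query never returns a value at or below a threshold of 0 anyway). It also encodes the rule that LoS stays disarmed until the first non-NULL value arrives.

```python
from typing import Optional


def simulate_los(
    values: list[Optional[float]],
    threshold: float,
    los_minutes: int,
) -> list[str]:
    """Hypothetical model of LoS with 'Close all current open violations' checked.

    values[i] is the query result for minute i (None means NULL).
    Returns a log like ["open@0", "los@10", "close@10"].
    """
    log: list[str] = []
    violation_open = False
    seen_numeric = False  # LoS stays disarmed until the first non-NULL value
    null_run = 0  # consecutive NULL minutes so far
    for minute, value in enumerate(values):
        if value is None:
            null_run += 1
            if seen_numeric and null_run == los_minutes:
                log.append(f"los@{minute}")
                if violation_open:
                    violation_open = False
                    log.append(f"close@{minute}")
            continue
        seen_numeric = True
        null_run = 0
        # Simplification: one minute above the threshold opens a violation.
        if value > threshold and not violation_open:
            violation_open = True
            log.append(f"open@{minute}")
    return log


# One failure at minute 0, then ten minutes with no failures (all NULL).
print(simulate_los([1.0] + [None] * 10, threshold=0, los_minutes=10))
# ['open@0', 'los@10', 'close@10']

# A brand-new condition that has only ever seen NULLs: LoS never kicks in.
print(simulate_los([None] * 15, threshold=0, los_minutes=10))
# []
```

Note that a second failure inside the 10-minute window resets the NULL counter in this model, which matches the idea that the violation closes only after 10 minutes with no further failures.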

Gap Filling: Gap Filling pretty much does what it says on the tin. When a gap is detected, it inserts the values you specify. You can fill with zeroes, with some other static value, or with the last numeric value that was reported.
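The three fill options can be sketched as a small Python function that produces the replacement values for a gap of a given length. The strategy names `"zero"`, `"static"`, and `"last"` and the function `fill_gap` are hypothetical labels for the three choices described above, not New Relic setting names.

```python
from typing import Optional


def fill_gap(
    last_value: float,
    gap_length: int,
    strategy: str,
    static_value: Optional[float] = None,
) -> list[float]:
    """Return replacement values for a run of `gap_length` NULL minutes.

    Hypothetical strategies: "zero" -> all 0s, "static" -> static_value,
    "last" -> the last numeric value reported before the gap.
    """
    if strategy == "zero":
        fill = 0.0
    elif strategy == "static":
        if static_value is None:
            raise ValueError("the static strategy needs a static_value")
        fill = static_value
    elif strategy == "last":
        fill = last_value
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill] * gap_length


print(fill_gap(5.0, 3, "last"))  # [5.0, 5.0, 5.0]
print(fill_gap(5.0, 2, "zero"))  # [0.0, 0.0]
```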

Here’s the thing with Gap Filling: it needs to detect a gap before it kicks in, which means a gap needs both a start and an end. In other words, there must be two numeric data points separated by a stretch of NULL data points. Until that second numeric data point shows up, the stretch of NULL values may turn into a LoS if it lasts long enough to trip your LoS setting. If the gap is shorter than your LoS setting (or if you have no LoS setting), the values you’ve specified get inserted once the second data point shows up. That could open or close a violation, but it won’t happen until that second numeric data point arrives.
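To see why the closing data point matters, here is a hypothetical Python model of a stream evaluated minute by minute with a single static fill value. It buffers NULL minutes, fills the gap only when the next numeric point arrives, and reports LoS instead if the run of NULLs reaches the LoS window first. The function `stream_gap_fill` and its output records are made up for illustration; this is only a sketch of the logic described above, not the real evaluation service.

```python
from typing import Optional


def stream_gap_fill(
    values: list[Optional[float]],
    fill_value: float,
    los_minutes: Optional[int] = None,
) -> list[tuple[int, str, list[float]]]:
    """Hypothetical model of when a gap gets filled.

    Returns (minute, action, filled_values) records, where `minute` is the
    minute at which the action happens. None in `values` means NULL.
    """
    actions: list[tuple[int, str, list[float]]] = []
    seen_numeric = False  # a gap needs a numeric point before it (its start)
    null_run = 0
    for minute, value in enumerate(values):
        if value is None:
            null_run += 1
            if seen_numeric and los_minutes is not None and null_run == los_minutes:
                # No closing point arrived in time: treated as Loss of Signal.
                actions.append((minute, "loss_of_signal", []))
            continue
        if seen_numeric and null_run > 0 and (los_minutes is None or null_run < los_minutes):
            # The closing numeric point arrived: the gap is now known, so fill it.
            actions.append((minute, "gap_filled", [fill_value] * null_run))
        seen_numeric = True
        null_run = 0
    return actions


print(stream_gap_fill([3.0, None, None, 4.0], fill_value=0.0, los_minutes=10))
# [(3, 'gap_filled', [0.0, 0.0])]
print(stream_gap_fill([3.0] + [None] * 10 + [4.0], fill_value=0.0, los_minutes=10))
# [(10, 'loss_of_signal', [])]
```

In the first run, nothing happens during minutes 1 and 2; the fill is applied only at minute 3, when the second numeric point shows up. In the second run, the NULLs reach the LoS window before any closing point arrives, so the stretch becomes a LoS rather than a filled gap.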

Phew! That was a lot to cover! If any of this is unclear, please post your questions below!