
Relic Solution: Evaluating an Absence of Data with a NRQL Alert Condition



An introduction

To better optimize your Alerts experience, here’s a question that you should ask yourself – how, exactly, does a NRQL alert condition handle an absence of data?

If you’re not already familiar, the answer will be one of the following – your condition’s underlying NRQL query will return 0 (which we’re able to evaluate mathematically), or your query will return NULL (which will stop our evaluation).

There is a specific set of circumstances in which “no data” will be interpreted as one over the other, and this post can be referenced as a quick cheat-sheet so that you don’t have to throw up your hands in despair and answer that question with “it depends.”

To summarize why this all matters – we can’t evaluate a NULL response against the numerical value represented by your alert condition’s threshold. Depending on whether you do or do not want an absence of data factored into the evaluation process, you’ll want to ensure that you write or configure a NRQL alert condition appropriate to the rules I’ll outline in this post.

A quick refresher on evaluation

As you may know, NRQL alert conditions are evaluated in one-minute slices. With a default evaluation offset, we’ll evaluate the output of the following query every minute:

{Your Query} SINCE 3 minutes ago UNTIL 2 minutes ago

Your query’s syntax will require the use of a compatible aggregator function that can distill any number of rows returned over an individual minute into a single numeric value (or, in a faceted query, a single numeric value for each facet).

These minute-by-minute results are then compared to the thresholds defined in your alert condition’s configuration. This is all fine and good, presupposing that an actual number is returned every minute.

In a situation where no data is returned – such as a lack of events returned by the targeted event type(s), or if a queried attribute is not present – this lack of data will be interpreted as either 0 or NULL.

Recall that we can only compare a numerical value against an alert condition’s defined Warning and Critical thresholds. So long as the condition’s underlying query returns a number, or for as long as that query’s results are interpreted as 0, evaluation will continue.

However, for any minute in which its results are interpreted as NULL, evaluation will stop. Evaluation can only restart once your query throws a number back into the mix.

(It’s important to stress that in this circumstance, evaluation is restarting and not resuming. For example, if the threshold is set to “CPU % > 80 for at least 30 minutes” and the condition had seen 28 minutes of breaching data points before encountering a NULL, a non-NULL result will not cause evaluation to pick back up with minute 29; rather, evaluation will restart at minute 1.)
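To make the restart-versus-resume distinction concrete, here’s a minimal Python sketch. It is only a model of the behavior described above, not how the Alerts evaluator is actually implemented; the function name `minutes_until_violation` and the use of `None` to stand in for NULL are my own assumptions.

```python
from typing import Optional, Sequence


def minutes_until_violation(
    results: Sequence[Optional[float]],
    threshold: float,
    required_minutes: int,
) -> Optional[int]:
    """Return the 1-based minute at which a violation would open, or None.

    Hypothetical model: None stands in for a NULL minute, and a violation
    opens after `required_minutes` consecutive minutes above `threshold`.
    """
    streak = 0  # consecutive breaching minutes seen so far
    for minute, value in enumerate(results, start=1):
        if value is None:
            # NULL: evaluation stops and the progress so far is discarded.
            streak = 0
            continue
        if value > threshold:
            streak += 1
        else:
            streak = 0
        if streak >= required_minutes:
            return minute
    return None


# 28 breaching minutes, one NULL minute, then breaching data again.
series = [95.0] * 28 + [None] + [95.0] * 30
print(minutes_until_violation(series, threshold=80, required_minutes=30))
```

In this model, the single NULL at minute 29 pushes the violation out to minute 59, whereas an uninterrupted series would have opened it at minute 30.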

Rules governing an absence of data

I’ve teased this portion out long enough, so here it is: the specific circumstances in which {No data} = 0 and {No data} IS NULL. It’s easiest to group these rules into two categories:

  1. NRQL alert conditions without a FACET clause, and
  2. NRQL alert conditions with a FACET clause.

Let’s start by digging into the former and conclude by taking a hard look at the latter.

1. Non-faceted queries

As a baseline, you can expect any non-faceted NRQL alert condition to return a 0 in place of NULL – so long as you are using an aggregator function that is not latest() or percentage().

So why is that?

The problem with latest()

As an example, let’s say that I have the following query set as a NRQL alert condition:

SELECT latest(coolAttribute) FROM CoolEvent

In order for this query to return a number, two conditions must be met in a single minute:

  1. There must be at least one CoolEvent.
  2. That one CoolEvent must contain a value for coolAttribute.

The simple fact of the matter is that in order to pull the latest value of a specific attribute from a given event, both the event and the attribute must exist.

While there’s no way to work around this for latest(), if you’re hitting an issue in which your alert condition frequently and undesirably stops its evaluation, you may wish to reconsider whether latest() is really what you need.

For example, if I want to alert on coolAttribute, it might report two values in a single minute: 33 at the top, and 99 at the bottom. With latest(coolAttribute), I’d only get evaluation on the 99, and I’m not going to take 33 into account at all!

Knowing this, you might wish to explore something like average(), sum(), min(), or max(). Not only will these mathematical aggregator functions return a 0 if I don’t have any coolAttributes, they’ll calculate the full minutely range of my CoolEvents to return their value.
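As a rough illustration of the difference, the Python sketch below models a single minute of coolAttribute values. The `latest` and `average` helpers are hypothetical stand-ins written to mirror the rules in this post (an empty minute is NULL for latest(), and 0 for the mathematical aggregators); they are not NRQL itself.

```python
from typing import List, Optional


def latest(values: List[float]) -> Optional[float]:
    # No event (or no attribute) in this minute: nothing to pull, so NULL.
    return values[-1] if values else None


def average(values: List[float]) -> float:
    # Modeled on the rule above: an empty minute is interpreted as 0.
    return sum(values) / len(values) if values else 0.0


minute_with_data = [33.0, 99.0]   # 33 at the top of the minute, 99 at the bottom
empty_minute: List[float] = []    # no CoolEvents reported at all

# latest() only sees the 99; average() and max() consider the whole minute.
print(latest(minute_with_data), average(minute_with_data), max(minute_with_data))

# With no data, latest() is NULL (evaluation stops), average() is 0 (evaluation continues).
print(latest(empty_minute), average(empty_minute))
```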

An issue with percentage()

For this individual lesson, here’s an example percentage() function to work with:

SELECT percentage(count(*), WHERE coolAttribute > 1) FROM CoolEvent

The only circumstance in which the above query will return NULL is when there are no CoolEvents present. In that circumstance, we’d have to divide by 0 – which, as of November 26th, 2019, is not mathematically possible.

Here’s what that operation looks like:

Events that meet a specific criteria
––––––––––––––––––––––––––––––––––––
Total events overall

So when all events drop off, we end up dividing zero by zero, which subsequently returns a NULL response.

You can avoid this problem by simulating your very own percentage function. For example, you could write a query that looks like this, where x is some non-zero number:

Events that meet a specific criteria
––––––––––––––––––––––––––––––––––––
Total events overall + x

Translated to NRQL, it may look something like this:

SELECT filter(count(*), WHERE coolAttribute > 1) / (count(*) + x) FROM CoolEvent

(I have @Fidelicatessen to thank for this workaround, as there’s no way my remedial understanding of “math” could have produced any equation that remotely resembles “long division.”)
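If it helps to see the arithmetic, here’s a small Python sketch of both versions. The function names and the default value of `x` are assumptions of mine, chosen purely for illustration.

```python
from typing import Optional


def percentage(matching: int, total: int) -> Optional[float]:
    """Model of percentage(): NULL (None) when there are no events to divide by."""
    if total == 0:
        return None  # zero divided by zero has no numeric answer
    return 100.0 * matching / total


def padded_ratio(matching: int, total: int, x: float = 0.001) -> float:
    """Model of filter(count(*), ...) / (count(*) + x) with a non-zero x."""
    return matching / (total + x)


print(percentage(0, 0))       # no events: NULL, so evaluation would stop
print(padded_ratio(0, 0))     # no events: 0.0, so evaluation can continue
print(padded_ratio(50, 100))  # with data: very slightly under 0.5
```

Two things to keep in mind with this sketch: the padded version returns a fraction between 0 and 1 rather than a percentage, so a threshold written for percentage() would need rescaling, and the larger x is, the further the result drifts below the true ratio.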

2. Faceted queries

In an absence of data, faceted NRQL alert conditions will abide by the same rules governing latest() and percentage(). But the story doesn’t end there!

As mentioned at the outset of this guide, each individual facet returned by a query will have its response evaluated independently (and with a non-baseline NRQL alert condition, up to 5,000 unique facets can be evaluated at any moment).

However, evaluation won’t happen at all if there are no facets “reporting.” Like most things in life, @Fidelicatessen summarized it best in their classic guide to faceted NRQL alert conditions:

When all facets are returning zeroes, the system reports NULL values.

Generally speaking, faceted NRQL alert conditions require a strong “signal” in order for evaluation to happen. If a signal is present (i.e. at least one facet is “alive”), evaluation will occur. If no signal is present (i.e. all facets are “dead”), then nothing will be evaluated.

As long as your query returns at least one facet with a non-zero value, and as long as latest() and percentage() aren’t in use, a facet that stops reporting a numeric value will have its value interpreted as 0. However, any interruption along the way will stop evaluation, and it will only restart once the “signal” resumes.

A dead facet also has an “expiration date” of 2 hours. This is counted as the wall clock time from when the facet first ceased to report, so it’s important to keep in mind that this timer will continue to count down even if evaluation is stopped.
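The “signal” rule can be sketched in a few lines of Python. This is a simplified model of one evaluated minute under the rules above (ignoring latest(), percentage(), and the expiration timer); `interpret_minute` and its arguments are hypothetical names, and `None` stands in for NULL.

```python
from typing import Dict, Optional, Set


def interpret_minute(
    reported: Dict[str, float],
    known_facets: Set[str],
) -> Dict[str, Optional[float]]:
    """Return how each known facet's value would be interpreted for one minute."""
    # A signal is present if at least one facet reports a non-zero value.
    signal_present = any(value != 0 for value in reported.values())
    interpreted: Dict[str, Optional[float]] = {}
    for facet in sorted(known_facets):
        if not signal_present:
            # All facets are "dead" (or all zero): everything is NULL.
            interpreted[facet] = None
        else:
            # A facet that stopped reporting is filled in as 0.
            interpreted[facet] = reported.get(facet, 0.0)
    return interpreted


facets = {"checkout", "search"}
print(interpret_minute({"checkout": 120.0}, facets))  # search is filled in as 0.0
print(interpret_minute({}, facets))                   # no signal: both are None
```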

An example

Following the information that I’ve shared above, here’s an example scenario that you might face. Let’s say my NRQL alert condition has the following settings:

Query

SELECT average(coolAttribute) FROM CoolEvent FACET niceAttribute

Critical threshold

When the query returns a value greater than 100 at least once in 15 minutes

Understanding the information that I’ve shared above, I know that my alert condition will open a violation whenever any unique value of niceAttribute has an average coolAttribute exceeding 100 over a single minute.

Subsequently, the violation should automatically close once that individual niceAttribute facet has an average coolAttribute of 100 or less for fifteen consecutive minutes.

Again, this should all work as expected, so long as at least one other facet consistently returns a non-zero value. But what happens if the whole signal drops off?

In this imaginary but all-too-real situation, let’s say the following has happened:

  1. The violating facet dropped off at 00:05:00, and it will never appear again.
  2. All other facets (i.e., the “full signal”) dropped off at 00:10:00.

Here, we would’ve been evaluating the response from the dead facet as 0 up until 00:10:00, when we began to interpret the facet as NULL (and thereby stopped the evaluation that had continued up to this point).

To help you continue to build a better understanding of facets and NULLs, let’s imagine two endings from this scenario:

A) The signal resumes at 01:30:00.
B) The signal resumes at 02:30:00.

In ending A, the evaluation that was stopped at 00:10:00 will restart at 01:30:00. Because this is before the violating facet’s two-hour expiration date, the violation will close at 01:45:00, assuming the signal continues for at least 15 minutes after 01:30:00.

However, in ending B, the facet’s expiration date has passed, and we “forget” that it ever existed. Subsequently, the initial violation is “stuck open,” and will not resolve until it is closed manually (or until a configured violation time limit expires).
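Both endings can be checked with a short Python sketch. It assumes the two-hour expiration and fifteen-minute close window from this example, with times expressed as offsets from 00:00:00. The function name is hypothetical, and the case where the facet expires partway through the fifteen-minute window isn’t covered by the rules above, so the sketch simply treats it like ending A.

```python
from datetime import timedelta
from typing import Optional

FACET_EXPIRATION = timedelta(hours=2)   # a dead facet is forgotten after this long
CLOSE_AFTER = timedelta(minutes=15)     # non-breaching minutes needed to close


def violation_close_time(
    facet_dropped: timedelta,
    signal_resumed: timedelta,
) -> Optional[timedelta]:
    """Return when the violation closes on its own, or None if it is stuck open."""
    # The timer is wall-clock time from when the facet first stopped reporting.
    expires_at = facet_dropped + FACET_EXPIRATION
    if signal_resumed >= expires_at:
        # The facet was forgotten before evaluation restarted.
        return None
    # Assumption: the signal then continues for the full close window.
    return signal_resumed + CLOSE_AFTER


dropped = timedelta(minutes=5)  # the violating facet dropped off at 00:05:00
print(violation_close_time(dropped, timedelta(hours=1, minutes=30)))  # ending A
print(violation_close_time(dropped, timedelta(hours=2, minutes=30)))  # ending B
```

Note that the countdown starts at 00:05:00, when the facet first dropped off, and not at 00:10:00, when evaluation stopped, so the facet in this scenario expires at 02:05:00.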

Those are the rules, and if you stick to them, you will find that most Alerts mysteries can be resolved.

In conclusion...

I exerted all of my effort on the earlier paragraphs and I’m not sure how to wrap things up appropriately. All I can say is that I was trying to find a synonym for the word “problem” earlier and my computer’s thesaurus suggested that I use “gremlin,” which is great.

In summary, I hope that you can gain from this a better understanding of The Gremlin With Nulls. While other gremlins may still haunt your dreams, this information should at least get your total gremlin count to be less than nine.