What is a faceted NRQL alert condition?
I’m glad you asked! Documentation on faceted NRQL alert conditions can be found at this link (scroll down to the FACET attribute section). Essentially, faceted NRQL alert conditions let you split your results by an attribute and alert on each value of that attribute independently.
Here is an example query, using Synthetic Checks to demonstrate:
SELECT count(*) FROM SyntheticCheck WHERE result = 'FAILED' FACET monitorName
This query returns a count of 0 while monitors are passing and a count of 1 when a monitor fails. Because of the FACET clause, each monitor name is tracked separately, so a violation opens for each monitor that has a failure. You would set your threshold to detect when the query returns a value over 0, so that a violation opens any time a monitor fails.
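To make the per-facet behavior concrete, here is a minimal Python sketch of what the query and threshold do over a single evaluation window. It is only a toy model: the function names `failed_counts` and `violating_facets` and the sample monitor names are made up for illustration, and it does not reproduce how the alerting system is actually implemented.

```python
from collections import Counter

def failed_counts(checks):
    """Count FAILED results per monitor name (the facet) for one window."""
    return Counter(name for name, result in checks if result == "FAILED")

def violating_facets(counts, threshold=0):
    """Return the monitor names whose count is above the threshold."""
    return sorted(name for name, count in counts.items() if count > threshold)

# Hypothetical check results for one minute: (monitorName, result)
window = [
    ("login", "SUCCESS"),
    ("checkout", "FAILED"),
    ("search", "FAILED"),
    ("login", "SUCCESS"),
]

print(violating_facets(failed_counts(window)))  # ['checkout', 'search']
```

Each failing monitor shows up as its own entry, which corresponds to one violation per facet.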
OK, that makes sense. But what do you mean by “well-behaved?”
Faceted NRQL alert conditions have a quirk that happens when all facets are reporting no data. In this case, if all of your Synthetics monitors were passing, all facets would be returning a count of 0. When all facets are returning zeroes, the system reports NULL values. So if one of your monitors had a failure, a violation would open as you would expect. However, once that monitor stopped failing (resulting in all facets again reporting zeroes), NULL values would again be returned. Since a NULL value can’t be evaluated numerically, your violation would get stuck open and would require you to manually close it.
Huh. Well, that’s annoying. What’s the solution, then?
Since this is a known behavior, we can plan for it and turn the query upside-down. Instead of querying for failures, let’s query for successes. Take a look at this modified query:
SELECT count(*) FROM SyntheticCheck WHERE result = 'SUCCESS' FACET monitorName
This will result in a bunch of facets usually reporting values of 1. Only when a failure happens will that value drop to 0. Your threshold, therefore, would watch for the query to return a value less than 1.
The great thing about this is, since most facets are nearly always returning a 1 value, NULL values would never come into play, and when your monitor started passing its checks again (raising the value to 1), any open violation would automatically close.
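The difference between the two queries can be sketched with a small Python simulation of a single facet. This is a simplified model resting on one assumption taken from the description above: a NULL value (represented here as `None`) cannot be evaluated, so the violation state is left unchanged for that minute. The function name `violation_states` and the sample series are hypothetical.

```python
def violation_states(values, breaches):
    """Return the open/closed violation state after each minute.

    `values` holds one facet's per-minute query result, with None for NULL.
    `breaches` is the threshold test applied to numeric values.
    """
    is_open = False
    states = []
    for value in values:
        if value is not None:
            # A numeric value can be evaluated: open or close accordingly.
            is_open = breaches(value)
        # A None value cannot be evaluated, so the state is left as it was.
        states.append(is_open)
    return states

# Counting FAILED results: NULL while passing, 1 in the failing minute.
failure_query = [None, None, 1, None, None]
# Counting SUCCESS results: 1 while passing, 0 in the failing minute.
success_query = [1, 1, 0, 1, 1]

print(violation_states(failure_query, lambda v: v > 0))
# [False, False, True, True, True]   (stuck open)
print(violation_states(success_query, lambda v: v < 1))
# [False, False, True, False, False] (closes on recovery)
```

In the first series the violation never sees a numeric value below the threshold, so it stays open. In the second, the return to 1 closes it.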
Sweet! What’s the catch?
Good question! Our system evaluates each minute discretely, so for the method I’ve outlined to be this straightforward, each of your monitors needs to run a check at least once per minute. Otherwise, the count will drop to 0 in the minutes between checks and result in a false positive violation.
You can still make this work if your monitors run at slower frequencies. In that case, you would just need to use a “sum of” threshold and take your monitor frequency into account. Using the query I posted above as an example, if your Synthetic monitor’s frequency was once every 5 minutes, you could set up a threshold like: Sum of query results is less than 1 at least once in 5 minutes. This will keep a rolling total of SUCCESS results and only open a violation if there are no successes in the last 5 minutes.
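Here is a rough Python sketch of why the rolling sum helps for a monitor that runs once every 5 minutes. It models the threshold as a plain sliding-window sum over per-minute SUCCESS counts. That model is an assumption for illustration, as are the name `rolling_sums` and the sample data, and the real evaluation may differ in detail.

```python
def rolling_sums(per_minute_counts, window=5):
    """Sum of the last `window` per-minute values, once a full window exists."""
    return [
        sum(per_minute_counts[i - window + 1 : i + 1])
        for i in range(window - 1, len(per_minute_counts))
    ]

# Per-minute SUCCESS counts for a monitor that checks every 5 minutes.
healthy = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
# Same monitor, but the check at minute 5 failed (no SUCCESS recorded).
one_miss = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

# Evaluating each minute on its own breaches "less than 1" most of the time.
print(sum(1 for v in healthy if v < 1))  # 12

# The 5-minute rolling sum only drops below 1 when a check is missed.
print(rolling_sums(healthy))   # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(rolling_sums(one_miss))  # [1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

With the window matched to the monitor frequency, a healthy monitor always has one success in view, and the sum returns to 1 as soon as the monitor passes again.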
Although I used a Synthetics monitor to demonstrate this, the same principle can be used for any sort of faceted NRQL alert condition. Happy alerting!