What's On Deck: NR AI Analytics Events

Hi folks!

Today I’d like to talk about NR AI Analytics Events, which will be released soon. We have been working on these for some time, and they have been available in beta, so you may have already discovered them. Specifically, I’m talking about NrAiIncident and NrAiSignal. We have more events of this type planned, so you will eventually see NrAiNotification and NrAiIssue as well.

What are they for? What information do they show?

These events primarily carry alerts metadata, so that you can use NRQL to query directly how your alerts are behaving.

  • NrAiIncident shows details every time an incident opens or closes. Keep in mind that this uses the newer definition of “incident,” so this is the most granular form of alert. Read more about the terminology at this link. Documentation for this event type can be found here.
  • NrAiSignal shows details from every NRQL alert condition and every signal on your account, for every aggregation window that passes. This data is posted immediately after each aggregation window is aggregated and evaluated, so it shows you exactly what New Relic Alerts is seeing. Documentation for this event type can be found here.
  • NrAiNotification (not available yet) will show details from every alert notification that is sent.
  • NrAiIssue (not available yet) will show details from every issue on your account, and will have separate records for open, acknowledge, and close events.

What sorts of things can I do with these new event types?

I’m glad you asked! Let’s look at some use cases where these event types are valuable.

Looking for noisy alerts

By querying NrAiIncident and faceting on conditionName, we can see how many incidents each alert condition opens, and how often. This can help to pinpoint alert conditions that may be contributing to an overly noisy alerts environment. The basic query for this looks like this:

SELECT count(*) FROM NrAiIncident FACET conditionName SINCE 1 month ago TIMESERIES 24 hours

This can show us when any specific alert condition had a spike of incidents, and it can also highlight noisy alert conditions whose thresholds may need to be made less sensitive. You could facet on conditionId instead, if preferred. You could also expand or contract either the time frame of the query (SINCE 1 month ago) or the granularity of the data points (TIMESERIES 24 hours), depending on your use case.
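
As a rough sketch of those variations: the query below facets on both conditionId and conditionName, narrows the window to one week with hourly buckets, and counts only the records written when an incident opened. The event = 'open' filter is my assumption about how open and close records are told apart in NrAiIncident, so check it against the event's documentation (or run a quick FACET event) before relying on it.

```nrql
// Sketch only: assumes NrAiIncident records carry event = 'open' or 'close'
SELECT count(*) FROM NrAiIncident
WHERE event = 'open'
FACET conditionId, conditionName
SINCE 1 week ago TIMESERIES 1 hour
```

Without a filter like this, count(*) includes both the open record and the close record for each incident. That inflates the totals, but it should not change which conditions look noisiest.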

Check for noisy/problem entities

As with the query above, you can find any “hot spots” among your entities by faceting on entity.guid and then looking up that guid to see which entity (or entities) is opening a lot of incidents:

SELECT count(*) FROM NrAiIncident FACET entity.guid SINCE 1 month ago TIMESERIES 24 hours
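
If looking up each guid by hand gets tedious, one possible shortcut is to facet on a name as well. This sketch assumes an entity.name attribute is populated on your NrAiIncident records, which may not hold for every condition type, and it drops TIMESERIES so the result is a simple ranked list. The LIMIT of 20 is an arbitrary choice.

```nrql
// Sketch only: entity.name is assumed to be present alongside entity.guid
SELECT count(*) FROM NrAiIncident
FACET entity.guid, entity.name
SINCE 1 month ago LIMIT 20
```

If the name column comes back empty, fall back to the guid-only query above and look the guid up separately.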

Checking to see what your alert condition is evaluating

Have you ever wondered why your alert condition is failing to open incidents? One of the first things you can do when this happens is to check NrAiSignal to see exactly what is being evaluated. You’d do that by using this query:

SELECT aggregatedDataPointsCount, signalValue FROM NrAiSignal WHERE event = 'value' AND conditionId = '123456'

If you replace 123456 with the ID of the condition you’re interested in, this will show you how many data points are getting aggregated for each aggregation window, and the value that is being evaluated by the system. If you do not see any results for this, it indicates that your condition is failing to aggregate any data and that you may need to change your aggregation method or delay.
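
To spot empty aggregation windows at a glance, you could also chart the same two attributes over time. This is only a sketch: the one-hour range and one-minute buckets are my assumptions and should be matched to your condition's aggregation window, and 123456 is still a placeholder condition ID.

```nrql
// Sketch only: set TIMESERIES to match your condition's aggregation window
SELECT sum(aggregatedDataPointsCount), average(signalValue)
FROM NrAiSignal
WHERE event = 'value' AND conditionId = '123456'
SINCE 1 hour ago TIMESERIES 1 minute
```

Gaps in the chart correspond to windows where nothing was evaluated. If the condition produces more than one signal, the average blends them together, so treat this as a quick check and go back to the raw query above for exact values.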

For more ideas…

Take a look at the Alerts & AI → Analyze → Overview page in the New Relic UI. All of the charts on that page were made using queries to NrAiIncident. You can view the query for any of those widgets by clicking on the ellipsis (...) and selecting View query.

I encourage you to start using these events right now. We have a bit more work to do before they are ready to release fully (non-beta), but they can already help to meet some of your use cases!

If you come up with more use cases that can be met by querying these events, please post your ideas below.