Relic Solution: Using Insights to create the Ultimate SLA report

With the departure of legacy alerting (RIP :cry:), the legacy availability monitor is no longer configurable. This means the Uptime stat in the APM SLA report may no longer be available on new accounts and apps. While Synthetics has its own marvelous SLA report, it can be nice to see everything in the same place for easy consumption. Here is a way you can put all of that into a single Insights dashboard that can be easily shared with others!

Hold on!

Before I dive into that, there is one caveat to keep in mind.

There is a limit on the number of APM transaction events that can be gathered before the agent begins to sample the data. This can mean that only a sample of the total events is being sent to Insights. For most agents, that limit is 2000 events per minute, but it can be raised in the agent configuration. To check the defaults and configurable options, see your agent’s configuration documentation and look for the transaction_events setting (known as analytics_events in the past).

Alright, let’s get querying!

Here is my query to replicate the APM SLA report:

SELECT (count(*)/1000) AS 'Requests (thousands)', 
(average(duration)*1000) AS 'Response Time(ms)', 
apdex(duration, t: 0.1) AS Apdex, 

percentage(count(*), WHERE apdexPerfZone = 'S') AS '% Satisfied', 
percentage(count(*), WHERE apdexPerfZone = 'T') AS '% Tolerating', 
percentage(count(*), WHERE apdexPerfZone = 'F') AS '% Frustrated' 

FROM Transaction WHERE appName = '[APP NAME]' 
FACET dateOf(timestamp) since 14 days ago limit 20

Here is my query for the Synthetics SLA report:

SELECT average(duration) as 'Duration', 
percentage(count(*), WHERE result='SUCCESS') AS 'Uptime', 
apdex((duration/1000), t:[t]), 

percentage(count(timestamp), 
    WHERE (duration/1000) <= [t] AND result = 'SUCCESS') 
  AS '% Satisfied', 
percentage(count(timestamp), 
    WHERE (duration/1000) > [t] AND (duration/1000) < [4t] AND result = 'SUCCESS') 
  AS '% Tolerating', 
percentage(count(timestamp), 
    WHERE (duration/1000) >= [4t] OR result != 'SUCCESS') 
  AS '% Frustrated' 

FROM SyntheticCheck WHERE monitorId = '[MONITOR ID]' 
FACET dateOf(timestamp) since 14 days ago limit 20

Simply replace the bracketed values with your own, and you will be set!
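
If you would rather not edit the brackets by hand, here is a minimal Python sketch that does the substitution for you. The helper name fill_placeholders and the sample monitor ID are made up for illustration and are not part of any New Relic tooling; it is plain string replacement, and it assumes your query text contains no other square brackets.

```python
import re
from typing import Mapping

# Matches one bracketed placeholder such as [MONITOR ID], [t] or [4t].
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_placeholders(template: str, values: Mapping[str, object]) -> str:
    """Replace every [NAME] in the template with values["NAME"].

    Hypothetical helper for illustration only. Raises KeyError if a
    placeholder in the template has no value supplied.
    """
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder [{name}]")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


# Example: a 7 second threshold, so 4t is 28 seconds.
t = 7
values = {"MONITOR ID": "my-monitor-id", "t": t, "4t": 4 * t}
from_clause = fill_placeholders(
    "FROM SyntheticCheck WHERE monitorId = '[MONITOR ID]'", values
)
tolerating_filter = fill_placeholders(
    "WHERE (duration/1000) > [t] AND (duration/1000) < [4t] AND result = 'SUCCESS'",
    values,
)
```

Computing 4t from t in one place means the Tolerating and Frustrated boundaries cannot drift apart when you change the threshold.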

Hi @cwhite - I have been looking at the APM SLA query and noticed that apdexPerfZone is not an attribute on all agents, definitely not the .NET agent. Are there any plans to add this attribute for the .NET agent, or is there a way to add it as a custom attribute in code?

Hey @stefan_garnham

It looks like apdexPerfZone is only available in the Java Agent :confounded:

I’ll send a feature request to the development teams for you though, to get it added to the remaining agents.

Apdex is calculated from transaction response times (and transaction errors, which are automatically filed as frustrated…)

If you can grab response times in code (I’m not sure this is possible, but maybe with a trigger before and after every transaction that you can take timestamps from and then do some math magic…), then you should be able to make the calculation yourself. The formula is simple enough:

Apdex = (Satisfied + (Tolerating / 2)) / TotalSamples

So you’ll need to categorise your transaction times as you grab them and increment the following counters, assuming T = the preset Apdex threshold:

Satisfied: Response time <= T
Tolerating: Response time > T and <= 4T
Frustrated: Response time > 4T

Like I said, transactionErrors are auto filed as frustrated responses, so they’ll factor into the TotalSamples part of the calculation.
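
For anyone who wants to try the do-it-yourself route, here is a minimal Python sketch of that bookkeeping, using the standard Apdex formula (Satisfied + Tolerating / 2) / TotalSamples. The names ApdexCounts and record are hypothetical and not part of any New Relic agent API, and the sketch assumes you already have each transaction's response time and error flag from your own instrumentation. A response time of exactly 4T is counted as tolerating here, per the standard Apdex definition.

```python
from dataclasses import dataclass


@dataclass
class ApdexCounts:
    """Running counters for the three Apdex zones (hypothetical helper)."""
    satisfied: int = 0
    tolerating: int = 0
    frustrated: int = 0

    @property
    def total(self) -> int:
        # Errors are already included because they are counted as frustrated.
        return self.satisfied + self.tolerating + self.frustrated

    def score(self) -> float:
        """Apdex = (satisfied + tolerating / 2) / total samples."""
        if self.total == 0:
            raise ValueError("no samples recorded")
        return (self.satisfied + self.tolerating / 2) / self.total


def record(counts: ApdexCounts, response_time: float, t: float,
           errored: bool = False) -> None:
    """Classify one transaction against threshold t (same units as response_time)."""
    if errored or response_time > 4 * t:
        counts.frustrated += 1   # errors always count as frustrated
    elif response_time > t:
        counts.tolerating += 1   # greater than T, up to and including 4T
    else:
        counts.satisfied += 1    # less than or equal to T


# Example with T = 0.5 seconds.
counts = ApdexCounts()
for seconds, errored in [(0.2, False), (1.0, False), (2.5, False), (0.2, True)]:
    record(counts, seconds, t=0.5, errored=errored)
print(counts, counts.score())
```

From these counters you could report the three zone percentages yourself, or send the zone as a custom attribute if your agent supports that.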

Hopefully this helps - but I’ll get that Feature Request filed for you as well.

Hey @RyanVeitch

The NRQL query you mentioned here doesn’t seem to be plotting any data.
Also, did you get to raise the feature request for tracking the apdexPerfZone attribute?

@shubham.utwal - I did raise that feature idea - though I have no update on that currently.

Can you clarify which NRQL query it is that isn’t working for you?

Apologies, it worked; I had to use the app’s display name in the appName filter.
Also, is there a way I can get an aggregate over a month for the % Satisfied, etc. metrics?
@RyanVeitch

Yes, you can vary the timeframe by modifying the SINCE clause, for example SINCE 1 month ago.

@shubham.utwal - let us know if modifying the time range as Stefan suggested is helpful. :slight_smile:

It looks to me like the “since” is limited to “24 days ago”.

I can’t go further back than 24 days

Hi, @ethiele: It will depend on the amount of Insights data retention on your account.

Hijacking this thread :slight_smile:
Can I use this to show an SLA report on Error percentage as well?
Thanks
Lior

Hey @lior.avni,

You should be able to modify the above SLA report to include Error Percentage. Just add the following as another line in the NRQL:

percentage(count(*), WHERE error) AS '% Errors'

Let me know if this helps.

Cheers!

Hi sudu
Thank you for this, but where exactly do I need to edit the NRQL for the SLA report? Can you direct me to the correct place?
Thank you
Lior

Hey @lior.avni,

Something like this should work:

SELECT (count(*)/1000) AS 'Requests (thousands)', 
(average(duration)*1000) AS 'Response Time(ms)', 
apdex(duration, t: 0.1) AS Apdex, 

percentage(count(*), WHERE apdexPerfZone = 'S') AS '% Satisfied', 
percentage(count(*), WHERE apdexPerfZone = 'T') AS '% Tolerating', 
percentage(count(*), WHERE apdexPerfZone = 'F') AS '% Frustrated',

percentage(count(*), WHERE error) AS '% Errors'

FROM Transaction WHERE appName = '[APP NAME]' 
FACET dateOf(timestamp) since 14 days ago limit 20

Cheers!

It should be noted that the apdexPerfZone attribute is not available for the .NET agent.

Thanks for adding that clarifying point Stefan. :slight_smile:
