How to measure throughput and alert if it impacts latency

My service has a “Throughput” widget, but I want to put that data in my dashboard. I can't find any query that returns the same data shown in the widget. Could someone recommend a query that may work to get that data?

Once I have that data, I would like to create an alert when that number goes up and impacts my latency measurement.

Lastly, are there standard queries documented somewhere on the site that a new person could reference OR a map to tell me data relationships? For example, Transaction has all of these datapoints.


Hi there @reopelle.scott - welcome to the Explorers Hub!

We do have a well-trafficked post here that may be of use to you:

Additionally, if you haven’t had a chance yet, I would recommend checking out the Insights Data Explorer and this list of default data from New Relic products available in Insights.

Let me know if you are looking for something more specific.


@hross Thank you for the response. I took time to look over that post and got some good information. I configured a query to track throughput as such:
SELECT count(*) from Transaction WHERE appName = 'MyApplication' since 1 hour ago COMPARE WITH 1 hour ago

Do you have any thoughts on a query or alert that could help me determine a relationship between high throughput and latency? I’m hoping someone has created a relationship of this type before.

Hey @reopelle.scott - While you could construct a query to use filter in order to grab throughput and response time, you can see from my screenshot that this does not return great results.

Query:

SELECT filter(count(*), where appName = 'myAppName' ) as 'Throughput', filter(average(duration), where appName = 'myAppName') as 'Response Time' FROM Transaction TIMESERIES since 2 hours ago

A typical expectation is for throughput to be > 30, and response time to be < 2s
As you can see, because there’s no way to have a dual Y-axis, the throughput scale makes it nearly impossible to read the response time from this chart.

Instead I would recommend having two widgets side by side in one dashboard. One widget with your query targeting the throughput, and another query exactly the same looking at the average(duration). If both queries are looking at the same timeframe then you’ll be able to correlate a spike in throughput to a spike in response time.
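As a rough sketch of that two-widget setup, the pair of queries might look like the following. This assumes the same placeholder app name ('myAppName') and the same 2-hour window as the combined query above; adjust both to match your own application.

```sql
-- Widget 1: throughput (transaction count per time bucket)
SELECT count(*) FROM Transaction WHERE appName = 'myAppName' TIMESERIES SINCE 2 hours ago

-- Widget 2: response time (average duration in seconds per time bucket)
SELECT average(duration) FROM Transaction WHERE appName = 'myAppName' TIMESERIES SINCE 2 hours ago
```

Because each chart gets its own Y-axis, the response time line is no longer flattened by the much larger throughput numbers.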

Alternately, if you remove timeseries from the query I shared above, you can get 2 numeric values, one for throughput and one for the response time, in the one widget.

Hope that helps :smiley:


Can you expand on what this means to you?


What I mean by that is all of the sub-items under the Transaction Table.

I’m finding it difficult to locate items to create my queries because I don’t yet know what is collected by default and what needs to be configured on the client to be captured. This is due to my inexperience with the product. When I test some of my queries in Data Explorer, or on my dashboard, I notice that if the query does not have the items I am searching for, it does not provide an autofill when I select ‘appName =’. I’m guessing because the client has not been configured to collect that data. Does this make sense?

I may have just not found it yet, but does a document exist that outlines what is collected by default and specifics about what each field is capturing?


This is very helpful @RyanVeitch, and I understand the scales are way out of proportion. I think I will work on creating an alert with the threshold values for each of the items. That is ultimately the goal.

Some additional background: I am an Enterprise Architect researching how to implement Site Reliability Engineering. If anyone is not familiar, my ideas are coming from this book, Site Reliability Engineering. I am starting with a single team to develop the documentation, service level indicators and objectives needed to monitor and provide feedback to our business side of the house. To date, we have been very reactive, and this book has given me hope that we can change. I want to build templates, reusable monitors, and defined standard metrics that can be reused for future products. That’s my goal anyways.


@reopelle.scott - Thanks for the additional context. I think alerting on your thresholds is a good place to start.

In terms of the default collected attributes, you can check out the Attribute Dictionary to see those.

You can then collect Custom Attributes to fill the gaps that our default attributes leave. Check out the docs on sending custom data to Insights here - I would encourage you however to be aware of any sensitive data that you may be sending. Our agents typically avoid collecting sensitive information by default, but custom attributes are unrestricted.


We all figure out at some point that apps in Insights can have different collections of attributes. As you note, this can be due to configuration or customization. There are a couple of simple methods for exploring attributes:

For a given application, run something like this (note that attribute names such as appName are case-sensitive):
SELECT * FROM Transaction WHERE appName = 'foo' LIMIT 100

You can then scroll right and left and see the attribute names and samples of the data being collected.

To get an exact list of the attributes, run:

SELECT keyset() FROM Transaction SINCE 1 week ago
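Since different apps can report different attributes, it may also help to scope the attribute list to a single application. A minimal sketch, assuming the same placeholder app name 'foo' used above:

```sql
-- List the attributes seen on Transaction events for one app over the past week
SELECT keyset() FROM Transaction WHERE appName = 'foo' SINCE 1 week ago
```

Comparing that output between two apps is a quick way to spot which attributes come from custom instrumentation or agent configuration.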

I think your goal is very similar to that of most new New Relic customers. You should be able to find help along the way.


Sounds like @6MM has gotten you on the right path @reopelle.scott - let us know if you need further help.

@RyanVeitch I am trying to use this NRQL to relate High Transaction Rate to High Response time for the purpose of alerting.
SELECT percentage(count(*), WHERE duration > 2) FROM Transaction WHERE appName = 'MyApp'

While this defines a threshold for the percentage of durations over 2 seconds, how would I put in a threshold for the average number of transactions that happen over 1 minute?

Essentially, I want to be alerted “if transaction count is > 2000 and average duration is over 2 seconds for a 1 minute time frame.”

Composite alerts, where the states of two or more metrics are compared, are not possible at this time.
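That said, for a dashboard (not an alert condition), a query along these lines would at least put both numbers next to each other in 1-minute buckets so the two thresholds can be checked by eye. This is only a sketch: 'MyApp' is the placeholder app name from the question, and the 1-hour window is an assumption.

```sql
-- Per-minute transaction count and average duration for one app
SELECT count(*) AS 'Throughput', average(duration) AS 'Response Time'
FROM Transaction WHERE appName = 'MyApp'
TIMESERIES 1 minute SINCE 1 hour ago
```

Viewed as a table rather than a line chart, each row is one minute, so a minute with a count above 2000 and an average duration above 2 seconds is the combination described in the question.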

Well that is truly disappointing, but what am I seeing when I run this query ?

SELECT average(duration) FROM Transaction WHERE duration >= 1 AND databaseDuration > 2 AND appName = 'MyApp'

It shows a chart. Did it select one metric of the two?

You are first taking a result set that is reduced to events where duration is at least 1 second and databaseDuration is over 2 seconds for your app (for the last hour, since no time range is given). You are then returning the average duration value from that result set. You are only measuring duration.

Well that’s not doing what I want then… :confused::-1:t3:

Hi @reopelle.scott! What a great conversation above—thank goodness for @6MM and @RyanVeitch 's super smarts, right?!

I am just checking in to see if you had any more progress to share on this? The last we heard from you is that you may not still be getting exactly what you need? Hoping to help if I can! :blush:

I’m hoping there are other suggestions on how to trigger an alert if two conditions are met.


If you needed to make this product do it, you might be able to poll 2 or more alert states with a synthetic and compare them. The result of that comparison could generate emails etc., but you could also simply fail the synthetic and alert off that, or send Insights events that could then be used for alerts. Or you might be able to generate Insights events from alert callbacks and then use those messages to alert. None of this is very pretty, and it would be hard to scale to hundreds of conditions, but you could get to a solution if it’s critical.


Thanks @6MM. I’d be interested in how to configure the items you’ve mentioned. I’ll try to sort through your comment and configure what you identified. I’m sure there will be more questions to follow.


Is there a method to create an alert if a policy has 2 or 3 incidents open at the same time?