Relic Solution: Video - Understanding Metric Grouping Issues in 80 seconds

This post assumes you have a basic understanding of APM and are interested in learning more about Metric Grouping Issues.

Application Performance Monitoring (APM) tracks the performance of applications. Fundamentally, it answers one question: how fast do the methods run?

New Relic agents create metrics that time method calls, and they aggregate the timings of all calls that share the same metric name.
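A minimal sketch of name-keyed aggregation, written in Python as a toy model. The MetricStats and MetricAggregator names are hypothetical, and this is not how any New Relic agent is actually implemented. It only shows the idea the rest of this post depends on: every timing recorded under one name is folded into the same count and total, so a differently spelled name starts a separate bucket.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MetricStats:
    """Aggregate for one metric name: number of calls and their total time."""

    call_count: int = 0
    total_seconds: float = 0.0

    @property
    def average_seconds(self) -> float:
        return self.total_seconds / self.call_count if self.call_count else 0.0


class MetricAggregator:
    """Toy model of name-keyed aggregation (hypothetical, not agent code)."""

    def __init__(self) -> None:
        # One bucket per distinct metric name, created on first use.
        self.metrics: dict[str, MetricStats] = defaultdict(MetricStats)

    def record(self, name: str, seconds: float) -> None:
        # Every timing recorded under the same name lands in the same bucket.
        stats = self.metrics[name]
        stats.call_count += 1
        stats.total_seconds += seconds


agg = MetricAggregator()
for seconds in (1.0, 2.0, 3.0):
    agg.record("methodB", seconds)
agg.record("methodB7", 4.0)  # a different name gets a separate bucket

for name, stats in agg.metrics.items():
    print(name, stats.call_count, stats.total_seconds, stats.average_seconds)
```

The three "methodB" calls collapse into one line of output, while "methodB7" is reported on its own, even if it were the same code doing the same work.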

Metric Grouping Issues (MGIs) are often described as having “too many metric names.” That is just a symptom. MGIs are really about the dilution of metrics. When the same metrics and transactions are given different names, their recorded time is split across those names, and the APM dashboard misrepresents the performance and bottlenecks of the application.


To understand why Metric Grouping Issues make APM less useful, let’s take a look at the following pseudocode example:

methodA() {
  // does stuff that takes 95 seconds
  setTransactionName( "methodA" + Random(1,95) );  // up to 95 different names
}

methodB() {
  // does stuff that takes 5 seconds
  setTransactionName( "methodB" );  // always the same name
}

In this case, methodA takes most of the time (95 seconds), while methodB takes significantly less (5 seconds). However, the name for methodA is broken out into 95 different names, so its metric name is not appropriately grouped. In APM this would show up as:
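The percentages in the APM view follow from simple arithmetic. Assuming both methods run equally often and methodA's calls are spread evenly over its 95 names, each name's share of the total recorded time is:

$$
\text{share}(\texttt{methodB}) = \frac{5}{95 + 5} = 5\%,
\qquad
\text{share}(\texttt{methodA}k) = \frac{95 / 95}{95 + 5} = 1\% \quad (k = 1, \dots, 95)
$$

The 95 methodA names still add up to 95%, but no single one of them outranks methodB.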

APM Most Time Consuming:
methodB - 5%
methodA1 - 1%
methodA2 - 1%
methodA3 - 1%
methodA4 - 1%
methodA5 - 1%
... (and so on, through methodA95)

So instead of leading application developers to focus on methodA, where 95% of the application's time is actually spent, the dashboard leads them to focus on methodB, because it falsely appears to be the most time consuming. The real problem code is diluted by a poorly designed naming scheme, which can ultimately lead to an MGI.
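To see the effect in numbers, here is a small Python simulation of the pseudocode under stated assumptions: both methods run 95 times (equal throughput), and Random(1,95) is replaced with a round-robin suffix so the result is deterministic. The simulate and most_time_consuming helpers are hypothetical illustrations, not New Relic APIs. Passing group_names=True models giving methodA one stable name.

```python
from collections import Counter

CALLS_PER_METHOD = 95  # assumption: both methods run 95 times (equal throughput)
A_SECONDS, B_SECONDS = 95, 5  # per-call durations from the pseudocode


def simulate(group_names: bool) -> Counter:
    """Return the total seconds recorded under each transaction name.

    Random(1,95) is replaced by a round-robin suffix so the output is
    deterministic: with 95 calls, each methodA name receives exactly one call.
    """
    totals: Counter = Counter()
    for i in range(CALLS_PER_METHOD):
        name_a = "methodA" if group_names else f"methodA{i % 95 + 1}"
        totals[name_a] += A_SECONDS
        totals["methodB"] += B_SECONDS
    return totals


def most_time_consuming(totals: Counter, top: int = 3) -> list[tuple[str, float]]:
    """Rank names by their share (in percent) of all recorded time."""
    grand_total = sum(totals.values())
    return [(name, 100 * t / grand_total) for name, t in totals.most_common(top)]


for grouped in (False, True):
    label = "grouped" if grouped else "ungrouped"
    ranking = most_time_consuming(simulate(grouped))
    print(label, [(name, round(pct, 1)) for name, pct in ranking])
```

Ungrouped, methodB ranks first at 5% and every methodA name sits at 1%. With a single stable name, the same workload puts methodA at 95% and methodB at 5%.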

The best way to solve this is to carefully plan and implement a good transaction naming scheme using newrelic.setTransactionName. Using setTransactionName, as opposed to regex matching rules, makes it much less likely that future changes to application code will break the naming scheme.

Note that the throughput for both pages shown is roughly the same.

Additional information is available on the Docs site.

This post is the first in a series of posts about Metric Grouping Issues: Part 2, Part 3.