Oftentimes the New Relic Support team receives requests from users reporting that the Java agent might be responsible for unacceptable memory usage. If you suspect that the Java Agent is causing your application to run out of memory, or you've received an OutOfMemoryError, reviewing a heap dump from the JVM with a memory analysis tool can be helpful in determining a path forward. Given the size of some heap dumps, as well as security requirements in your organization, providing a heap dump to our team may not be an option. In this document, we will discuss using the Eclipse Memory Analyzer (MAT) to identify memory management problems and detect leaks as they relate to the Java agent.
Memory issue types
In general, there are only three classes of memory issues involving the Java Agent.
1) Heap size too small
This is the simplest memory issue: it occurs when a user has set their -Xmx value low enough that adding the agent simply pushes them over that limit. For example, if a user runs with -Xmx set to 256MB and their application has been tuned to sit right at 250MB of heap usage, then adding the agent will almost certainly push them over the edge and into an OutOfMemoryError.
This occurs because the agent requires a fixed amount of memory in order to function, and it does not use off-heap memory to avoid this issue. An undersized heap is not an especially common issue for our Support team, but it is still worth noting.
JVM options for allocating memory:
-Xms: initial Java heap size
-Xmx: maximum Java heap size
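Before suspecting the agent, it can be worth confirming what heap the JVM actually received. The following is a minimal sketch (the class name HeapCheck is ours, purely for illustration) that reports the configured and committed heap from inside the JVM:

```java
// Sketch: confirm what heap the JVM actually received, regardless of
// what -Xms/-Xmx you believe was passed on the command line.
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() roughly corresponds to -Xmx; totalMemory() is the
        // currently committed heap (starts near -Xms and can grow).
        long maxMb = rt.maxMemory() / (1024 * 1024);
        long committedMb = rt.totalMemory() / (1024 * 1024);
        long freeMb = rt.freeMemory() / (1024 * 1024);
        System.out.println("max heap (~ -Xmx):   " + maxMb + " MB");
        System.out.println("committed heap:      " + committedMb + " MB");
        System.out.println("free (of committed): " + freeMb + " MB");
    }
}
```

If the reported max heap sits only a few MB above your application's steady-state usage, adding any agent is likely to tip it over, independent of any leak.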
2) Real memory leak (caused by application code)
This issue is relatively self-explanatory: the application has always had a memory leak, and it happened to surface while the agent was attached. This issue is actually pretty rare from what we can tell, but it can be confirmed or ruled out based on the heap dump contents (more on that below).
3) Real memory leak (caused by the New Relic Java Agent)
When we run into real memory leaks caused by the agent, the causes include (but are not limited to):
- ClassLoader leaks (preventing app servers like Tomcat from freeing up memory on redeployment)
- Instrumentation that stores strong references to objects that are never freed
- Internal Java Agent services that store strong references to objects that are never freed
- Instrumentation that prevents objects from being collected (rare)
It should be noted that when we encounter one of these memory leaks it is very common that a large number of users end up running into the same issue (and it is very likely on our Support team’s radar already).
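To make the "strong references that are never freed" pattern concrete, here is a minimal sketch. All names are ours for illustration; this is not actual agent code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only -- not real agent code. A static,
// strongly-referenced cache like this grows for the life of the JVM:
// nothing ever removes entries, so the referenced objects can never be
// garbage collected, and the Map eventually shows up near the top of
// MAT's dominator tree.
public class LeakyCache {
    private static final Map<String, byte[]> CACHE = new ConcurrentHashMap<>();

    public static void record(String key, byte[] payload) {
        CACHE.put(key, payload); // strong reference held forever
    }

    public static int size() {
        return CACHE.size();
    }
}
```

Typical remedies for this pattern are bounding the cache, removing entries explicitly on shutdown/redeploy, or holding values through weak references (e.g. WeakHashMap) so the garbage collector can reclaim them.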
For analyzing the heap dump, we will be using the Eclipse Memory Analyzer (MAT): Eclipse Memory Analyzer Open Source Project
You may need to increase the heap available to MAT to allow large heap dumps to be loaded:
- Edit mat.app/Contents/Eclipse/MemoryAnalyzer.ini
- Change “-Xmx1024m” to “-Xmx4096m” or similar (the higher the better, depending on your machine’s available resources)
- In order to troubleshoot any OutOfMemory issue, we must get a heap dump. There is no other simple way to verify the cause of a memory issue without a heap dump, which contains a snapshot of everything on the heap at the time the dump was triggered.
There are two ways to capture a heap dump:
1. By setting the following JVM property, which lets the JVM capture a dump automatically when an OutOfMemoryError is triggered: -XX:+HeapDumpOnOutOfMemoryError
2. By manually triggering a heap dump with the following command: jmap -dump:format=b,file=/tmp/app_heapdump.hprof <pid>
NOTE: If using step #2, it’s important that it is run only when the heap memory utilization is at or near maximum heap usage, otherwise the heap dump might not contain enough information to troubleshoot further.
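If jmap is unavailable (for example, in a locked-down container image), a dump can also be triggered from inside the process via the HotSpot diagnostic MXBean. This is a sketch assuming a HotSpot-based JVM; the HeapDumper class name is ours:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;

public class HeapDumper {
    // Dump the current JVM's heap to the given .hprof file. With live=true,
    // only reachable objects are written, which keeps the file smaller and
    // filters out garbage that was already eligible for collection.
    public static void dump(String path, boolean live) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                server, "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, live); // fails if the file already exists
    }
}
```

Note that dumpHeap refuses to overwrite an existing file, so pick a fresh path each time. The same timing caveat applies: trigger the dump when heap usage is at or near its maximum.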
- Now that we have a heap dump to work with, we need to download it locally and load it into the Eclipse Memory Analyzer. Heap dumps will generally end with .hprof or .bin, and both can be loaded by MAT:
- MAT will take a while to parse the file (depending on how large it is). Once complete, you'll be shown the "Getting Started Wizard". Select Leak Suspects Report and click Finish. This step is not required, but the Leak Suspects Report can help save you some time when trying to find the cause of a memory leak.
- As soon as the dump is loaded, you’ll be shown the Leak Suspects page. The important things to take note of are:
- The total size of the heap
If the size of the heap is far less than the -Xmx value, then it's unlikely that this dump will be helpful. If we have a memory leak and the dump was captured at the right time, the total size should end up very close to the maximum size of the heap.
- The size of each Problem Suspect
If a problem suspect is taking up 50% or more of the heap, it's very likely that we've found the cause of our memory leak.
- The location (package) of each Problem Suspect
The package of a problem suspect can tell us quite a few things. If the problem suspect is in "com.newrelic.*", then it's very likely that we have a real memory leak somewhere in the agent, and this should be reviewed by New Relic Support. If the problem suspect is in the user's or application's package, then more investigation will be required to figure out the cause of the leak. It is always possible that a memory leak is occurring due to an instrumentation bug that gets weaved into the application's classes.
As an example:
The total size of the heap is 221.7MB. This is pretty small, so unless the -Xmx is set to 256MB, this would not necessarily indicate a real memory leak.
The problem suspects are only about 25% of the entire heap combined, so it's not likely that they are significant in this case.
The fact that one of the problem suspects is a ClassLoader is interesting and could indicate a problem with instrumentation or the agent, since we do make use of ClassLoaders fairly extensively. But notice that both of these are outside of the com.newrelic.* prefix, so this is not conclusive at this point.
Assuming that the Leak Suspects page did not show any obvious signs of a memory leak (from our agent or otherwise) the next step to take would be to look at the Histogram (inside the red box on the left) and Dominator Tree (inside the red box on the right).
- The Histogram displays an overview of each class in the JVM, how many instances of it have been created, and how much memory those instances are collectively holding. It's pretty common to see byte[] and other primitive arrays at the top of this list due to the common usage of Strings and other objects that hold char arrays, byte arrays, etc.
The best thing you can look for here is anything that seems to be holding an abnormally high amount of heap or has millions (or more) instances of itself. Memory leaks caused by extremely large numbers of very small objects have been pretty rare in the Java Agent from what we can tell, but this would be the place to find something like that.
- The Dominator Tree displays a list of all objects in the system that are preventing other objects from being garbage collected. By default, the list is sorted by how much of the heap a given object is holding on to. These dominators are frequently going to be things like a Map, Collection, ClassLoader, or a static reference in a class.
In the large majority of cases a real memory leak will appear at the top of the dominator tree so your investigation should usually start here. However, it is important to compare the relative sizes of Retained Heap for the items at the top of the dominator tree and also compare it to the configured size of the heap. For example, if the item at the top of the dominator tree is taking up 100MB but the heap is configured for 4GB then it’s very likely that this heap dump is not going to be useful for troubleshooting.
Seeing com.newrelic.agent.instrumentation.context.InstrumentationContextManager near the top of the dominator tree does not indicate a memory leak in our agent. It is not uncommon to see this class taking up 10-25MB of heap space, as we use it for caching important weave-related information. However, anything else with a com.newrelic.agent package prefix taking up more than 20MB is generally not expected and should be reviewed more closely.
From here, it should either be clear where the memory leak is (or at least which classes to start looking into), or you can continue to use the dominator tree and histogram to explore the objects in the heap, find a probable cause, and narrow down where to look.