We’ve been spending a lot of time on this recently, and it looks like there are actually two separate issues here:
- In some applications running on Ruby 2.1, the Ruby agent causes higher but stable memory usage while the application is under load, as @richard_schneeman found.
- Applications that are idle and using Ruby agent versions 3.9.4 through 3.9.6 show steady growth in memory usage over time.
We believe we’ve isolated the cause for the second issue, but we’re still working on fully understanding the first.
The second issue is related to how the Ruby agent communicates with our servers. By default, we initiate a new SSL connection to our servers every minute in order to submit data. In the process of establishing a new SSL connection, the openssl library makes a number of native allocations that do not go through Ruby’s malloc wrapper, and thus don’t contribute to triggering a garbage collection run. The bulk of these allocations are associated with parsing the .pem file that we ship with the agent, which contains the set of root CA certificates that we use to validate the certificate presented by our servers when establishing an SSL connection. Changes we made to this .pem file in version 3.9.4 of the agent exacerbated this problem.
These allocations don’t seem to be a true leak, in the sense that they are released when the Ruby objects that reference them are garbage collected. The problem is that in an idle application that’s not servicing any requests, Ruby doesn’t see the allocations for these objects, and thus doesn’t realize that it should trigger GC.
Starting in version 3.9.7 of the Ruby agent (due to ship soon), we will instead maintain a persistent SSL connection to our servers for as long as possible in order to work around this issue. We’re also planning to cache the results of parsing the .pem file in case the connection does get severed, but that likely won’t make it into 3.9.7.
If you want to try this behavior today, you can do so by adding aggressive_keepalive: true to your newrelic.yml file, or by running:
heroku config:set NEW_RELIC_AGGRESSIVE_KEEPALIVE=1
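For anyone curious what reusing one SSL connection looks like in plain Ruby, here is a minimal sketch using the standard Net::HTTP library. It is an illustration only and not the agent’s actual code: the PersistentSubmitter class, the 90-second timeout, and the host name in the usage note are all assumptions made for the example.

```ruby
require 'net/http'
require 'openssl'

# Hypothetical illustration; not the agent's implementation.
class PersistentSubmitter
  attr_reader :http

  def initialize(host, port = 443, keep_alive_timeout: 90)
    @http = Net::HTTP.new(host, port)
    @http.use_ssl = true
    @http.verify_mode = OpenSSL::SSL::VERIFY_PEER
    # Net::HTTP only reuses an idle socket within keep_alive_timeout seconds
    # (2 by default), so the value has to exceed the one-minute submit
    # interval or every submission would reconnect anyway.
    @http.keep_alive_timeout = keep_alive_timeout
  end

  # Opens the connection on first use; later calls reuse the same session
  # instead of repeating the SSL handshake and certificate parsing.
  def post(path, body)
    @http.start unless @http.started?
    @http.post(path, body)
  end

  # Closes the connection if it is currently open.
  def close
    @http.finish if @http.started?
  end
end
```

Usage would look something like PersistentSubmitter.new('collector.example.invalid').post('/submit', payload) once a minute, where the host and path are placeholders. The point is simply that the handshake cost is paid once rather than on every submission.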
As for the first issue, the stable-but-high memory usage for some apps using the Ruby agent on Ruby 2.1, we’re still trying to figure out how best to address it. You may have some success in forcing major GC runs to be triggered more frequently by lowering the value of the RUBY_GC_OLDMALLOC_LIMIT_MAX environment variable, at the expense of the increased overhead of doing those additional GC runs.
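If you experiment with that setting, it may help to check whether major GC runs really do become more frequent. The sketch below reads a few counters from GC.stat; the gc_snapshot helper is a made-up name for this example, and because the oldmalloc keys were renamed after Ruby 2.1, it tries both spellings and returns nil for any counter your Ruby does not expose.

```ruby
# Hypothetical helper for observing GC behavior; not part of the agent.
def gc_snapshot
  stat = GC.stat
  {
    major_gc_count: stat[:major_gc_count],
    minor_gc_count: stat[:minor_gc_count],
    # Ruby 2.1 used :oldmalloc_increase / :oldmalloc_limit; later versions
    # use the *_bytes names. Whichever is present is reported.
    oldmalloc_increase: stat[:oldmalloc_increase_bytes] || stat[:oldmalloc_increase],
    oldmalloc_limit: stat[:oldmalloc_increase_bytes_limit] || stat[:oldmalloc_limit]
  }
end
```

Logging gc_snapshot before and after a load test, once with the default limit and once with a lowered RUBY_GC_OLDMALLOC_LIMIT_MAX, should show whether major_gc_count climbs faster with the lower value.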
Part of the difficulty in understanding these issues is that every application triggers various code paths in the agent with differing frequencies. In our testing, we’ve not yet been able to reproduce deltas as extreme as @richard_schneeman’s application has, but I suspect this is just a matter of us not triggering the right code paths in the agent.
That said, the investigation that we’ve done so far suggests that the most expensive feature of the agent from a memory perspective is our transaction trace feature. If you want to see whether that’s the case for your application, you can try setting the NEW_RELIC_TRANSACTION_TRACER_ENABLED environment variable to false. If you try this, we’d be very interested to know the results for your application.
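To make a before-and-after comparison easier to report, one rough option is to log the process’s resident memory from inside the app. The rss_kb helper below is a hypothetical sketch, not something the agent provides; it assumes either a Linux /proc filesystem or a ps command that supports -o rss=.

```ruby
# Hypothetical helper: resident set size of the current process, in kilobytes.
def rss_kb
  status = '/proc/self/status'
  if File.readable?(status)
    # On Linux the line looks like "VmRSS:     12345 kB".
    line = File.foreach(status).find { |l| l.start_with?('VmRSS:') }
    return line.split[1].to_i if line
  end
  # Fallback for systems without /proc (for example macOS).
  `ps -o rss= -p #{Process.pid}`.to_i
end
```

Recording rss_kb at the same point in a comparable load test, once with the transaction tracer enabled and once with it disabled, would give a number to compare, though RSS is only a coarse measure.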