Memory leaking only with Node.js agent installed

Any updates on this? I have a production setup for a socket.io server and I want to track my socket calls.

After adding the Node New Relic agent, I see memory climbing steadily. As shown in the attached charts, memory stabilizes after commenting out require('newrelic') in commit 8371d.

I can confirm this issue still exists and since removing SSL is not an option for us, removing NewRelic was the only option. We’d love to continue using NewRelic with our Node app but we cannot while this problem exists.

In our case, we were able to determine that the specific version of memcached libraries we were using (0.2.8) contained a memory leak. Simply downgrading those libraries one version made a significant difference, to the point where it isn’t even clear that there is a problem. There may still be memory leaks but the scope is nowhere near what we were facing prior to this discovery. This is no longer blocking us.

Since New Relic resources were also attached to those leaked objects, when monitoring was on we experienced a noticeably larger leak. That is to say, enabling NR monitoring magnified the existing problem.

We initially used this approach to help identify the leaking objects: http://strongloop.com/strongblog/how-to-heap-snapshots/
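For readers on a newer runtime, here is a minimal sketch of capturing heap snapshots for comparison. It assumes Node.js 11.13 or later, where `v8.writeHeapSnapshot` is built in (the linked post predates that API), and `dumpHeap` is a hypothetical helper name, not something from the linked post.

```javascript
const v8 = require('v8');
const os = require('os');
const path = require('path');

// Hypothetical helper: writes a heap snapshot into `dir` and returns its path.
// v8.writeHeapSnapshot blocks the event loop while it writes, so call it
// sparingly on a production process.
function dumpHeap(label, dir = os.tmpdir()) {
  const name = `${label}-${process.pid}-${Date.now()}.heapsnapshot`;
  return v8.writeHeapSnapshot(path.join(dir, name));
}
```

One way to use it: call `dumpHeap('baseline')` after warm-up and `dumpHeap('after-load')` once memory has grown, then load both files in the Chrome DevTools Memory tab and use the Comparison view to see which object types accumulated.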


So was there a memory leak in the Node New Relic agent? Or was it just amplifying the leaks in application code?

In our case it was the memcached libraries that were leaking, and the NR agent was amplifying that. This might be the case at other sites as well – perhaps not our specific memcached leak, but some other leak which is also being amplified by NR (since they will be attaching objects wherever network calls are made).


The memory leak in its current state is believed to be an issue in Node core. The problem is that some OpenSSL (and Node) objects are being leaked on outbound HTTPS requests. This is provable against google.com (so it is not a misconfiguration of New Relic servers). The problem always exists with HTTPS, but is amplified with the agent because we use custom certs on outbound connections back home and those are being leaked too.
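A rough sketch of how one might check for this kind of growth without the agent loaded: issue many outbound HTTPS requests and compare resident memory before and after. The helper names `httpsGet` and `measureRssGrowth` are hypothetical, the code uses modern promise syntax rather than what was available on Node 0.10/0.12, and a Node version where the core bug is fixed should not show the growth.

```javascript
const https = require('https');

// Issues one HTTPS GET and resolves once the response body is fully drained.
function httpsGet(url) {
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      res.resume(); // discard the body; only the connection lifecycle matters
      res.on('end', resolve);
    }).on('error', reject);
  });
}

// Runs `request` `count` times in sequence and returns RSS growth in bytes.
async function measureRssGrowth(request, count) {
  const before = process.memoryUsage().rss;
  for (let i = 0; i < count; i++) {
    await request();
  }
  if (global.gc) global.gc(); // only defined when node runs with --expose-gc
  return process.memoryUsage().rss - before;
}
```

For example, `measureRssGrowth(() => httpsGet('https://www.google.com/'), 1000).then(console.log)` prints the growth after 1000 requests. RSS is noisy, so a single run proves little; growth that keeps scaling with the request count across runs is the signal to look for.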

I got pulled into some other things that needed to be released but should be back on the memory leak hunt soon. Once I’ve developed a full fix for it in core, I’ll also use that patch in our outbound connections to newrelic servers. The trick is, we need to know exactly what the patch looks like in core so we can reliably detect if we need to inject it or not for our connections, which means I can’t really apply any fix in the agent until something has been settled on for core.


I tried updating to node 0.12.0 and that didn’t help but I can confirm disabling SSL fixes the leak. The severity of the leak also seems to depend on what other modules you require. Some make the leak worse than others.

Well, looks like turning SSL off totally stopped the reporting of metrics to New Relic when running on Heroku, sigh.

Edit: Related post:
Double edit: All good, you also need to explicitly set the port setting to 80 if ssl is false.

It’s worth noting that once I disabled SSL I also saw a significant increase in application performance under heavy concurrent loads. Even with 0.12.0, ssl on node appears to still have a lot of overhead and the ssl reporting to new relic can negatively impact your app.

Fortunately we don’t send URL params or custom metrics so turning SSL off is a viable option for us…and we will probably keep it off even if this memory leak is fixed due to the performance increases.

Is there any update on this memory leak?

Turning off clustering, or turning off SSL: neither of these is really an option for us. We’re hoping to scale up a production instance of an application that’s been under development, but this is really a blocking issue as we need a monitoring solution.

Turning off clustering, or turning off SSL is not an option for us either.
NewReleak…

Hi all - we are still working on this. It is an issue in Node core; however, our engineers take this seriously and are looking into what we can do here.

Essentially this update is the same as the earlier one:

I don’t have an ETA on changes here; this is important and we are actively working on it.

Is the leak still not a priority?

The leak is one of the highest priorities. That said other things do come up and sometimes we have to react to them.

If the problem were in our agent, the fix would likely be very simple, but the problem exists in Node core, which makes it vastly more difficult to fix for a number of reasons. Part of that is having reproduction cases that meet core’s standards. Currently there is a provable leak when you hit google.com, but core would like a test that doesn’t involve hitting an external server. This means recording packets and timing, then replaying them as exactly as possible to show the problem using a Node-based TLS server (only some TLS servers cause the race condition).

Another part of it is that the race condition involves both the tls module and the net module. Neither is particularly simple, because each is a confluence of several duplex streams (which are really two different streams, each with its own state), plus a couple of other state orchestration objects. I am working with core folks on this issue.

Just to reiterate: the issue is with Node core; the agent is using all the APIs properly and is not leaking the memory itself. In fact, you can find cases of other libraries running into this bug (such as aws.js). They eventually stopped using custom certs on their outbound connections, which makes the leak much smaller, though it still exists. We are considering going that route, but there would still be a leak, just a slower one, and we have a few reasons why we ship our own certs instead of trusting what Node ships with.


Hey @wraithan. Just wondering if there is an ETA for the fix for this?

Fully appreciate that it’s not an issue with the Node agent itself. However, we can’t deploy New Relic at the moment due to the leak (node 0.10.37)

@briandela_barn2door Since it’s a bug in Node.js core, we can’t provide an ETA on a fix.

However, we did make some changes to the Node.js agent in version 1.18.0 that should mitigate the impact of the Node.js core issue on the agent:

https://docs.newrelic.com/docs/release-notes/agent-release-notes/nodejs-release-notes

We recommend upgrading to this version of the agent when you have the chance.

Upgrading to 1.18.x did not help with our memory leak; it still leaks.

Same problem here. We cannot put this into production until a solution is found, or we will have to investigate an alternative service.

@nzpost @kmiyashiro is it possible that either of you would be willing/able to provide us with a reproduction of your respective situations?

Is there any update on this?