We have built a performance lab that uses JMeter to generate load against an ASP.NET site running on Windows Server 2012 / IIS 8.0. In summary, we see significant performance degradation when the New Relic .NET Agent is monitoring the site.
Our load-testing method is to add virtual users (VUs) incrementally until the average response time exceeds 30 seconds or the error rate becomes significant. Without monitoring, our lab reaches about 1332 VUs before the response time exceeds 30 seconds. When we enable the New Relic .NET Agent and the New Relic Server Monitor, several things happen at 612 VUs:
- The response time climbs dramatically, eventually exceeding 30 seconds.
- The error rate jumps dramatically, reaching over 50%. Almost all of these errors are thrown from System.Data.SqlClient.SqlConnection.Open(): "System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool."
- The CPU usage of the process rises dramatically, up to 100%.
So the load we can sustain before failure drops by more than half (612 vs. 1332 VUs). As an experiment, we turned off the New Relic Server Monitor, since it is less important to us, and saw similar behavior, but at 720 VUs. So the problem seems to come mostly from the New Relic .NET Agent.
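To make the "more than half" figure concrete, here is a small Python calculation of how much of the baseline capacity is lost in each configuration. It uses only the VU counts reported above; the function name is just for illustration.

```python
def capacity_reduction(baseline_vus: int, degraded_vus: int) -> float:
    """Fraction of baseline VUs lost before the failure point is reached."""
    return 1 - degraded_vus / baseline_vus


BASELINE_VUS = 1332  # failure point with no New Relic monitoring

for label, vus in [("Agent + Server Monitor", 612), ("Agent only", 720)]:
    print(f"{label}: {vus} VUs, {capacity_reduction(BASELINE_VUS, vus):.1%} fewer than baseline")
```

With both monitors enabled the failure point is roughly 54% lower; with only the .NET Agent it is still roughly 46% lower, which is why the Agent looks like the main contributor.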
- How much overhead should the New Relic .NET Agent put on the web server? From what I've read, it should be minimal.
- Are there any known issues with the New Relic Agent when CPU usage is high? We only saw these problems when the system was pinned at 100% CPU.
- Is it fairly common to use New Relic during load testing? I've seen a number of posts about things like BlazeMeter integration (NR and BlazeMeter) and people using JMeter with Ruby and Java (Page Loading Time with Apache JMeter). So it doesn't seem like we're using the tool in an unusual way.
- Is there anything we can do to tune the agent? For example, could we reduce what it records by changing settings in newrelic.config?
- Why would New Relic cause SqlConnection errors? I would not have expected New Relic to affect connection pooling. My only theory is that the system is so overloaded that it can't service its own database connections in time.
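One way to see how a saturated CPU could produce pool timeouts, as in the last question's theory, is Little's Law: the average number of connections checked out equals the rate at which they are requested times how long each one is held. The Python sketch below is only an illustration. The request rate and hold times are hypothetical, not measurements from our lab, and it assumes the ADO.NET SqlClient default Max Pool Size of 100, which a connection string can override.

```python
# Assumption: ADO.NET SqlClient default "Max Pool Size" (can be changed per connection string).
DEFAULT_MAX_POOL_SIZE = 100


def connections_in_use(opens_per_sec: float, hold_time_s: float) -> float:
    """Little's Law: average connections checked out = arrival rate x time each is held."""
    return opens_per_sec * hold_time_s


def pool_exhausted(opens_per_sec: float, hold_time_s: float,
                   pool_size: int = DEFAULT_MAX_POOL_SIZE) -> bool:
    """True when average demand exceeds the pool, so further Open() calls must wait."""
    return connections_in_use(opens_per_sec, hold_time_s) > pool_size


# Hypothetical numbers: same request rate, but CPU saturation stretches how long
# each request keeps its connection open (query time plus time waiting for CPU).
RATE = 200  # connection opens per second (illustrative only)
for hold in (0.2, 0.4, 0.6):
    demand = connections_in_use(RATE, hold)
    print(f"hold={hold}s -> ~{demand:.0f} connections needed, "
          f"exhausted={pool_exhausted(RATE, hold)}")
```

Under these assumptions the query rate never changes; only the hold time grows. Once average demand passes the pool size, callers queue inside Open() until the connection timeout expires, which matches the error message above without New Relic touching the pool directly.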