Decreased PHP Agent performance when Meltdown mitigation is applied


#1

Please share your agent version and other relevant versions below:

NR PHP Agent 7.7.0.203, PHP version 7.0.27

Please share your question/describe your issue below. Include any screenshots that may help us understand your question:

We run a fleet of Amazon Linux servers in AWS, on the m4.large instance type. Our applications run in docker (ECS) using the official php-apache image from docker.

We recently upgraded to the latest Amazon Linux AMI (ECS-Optimized) that included the patches for the Meltdown vulnerability. After doing this, we saw a 2x-6x response time penalty in most of our web applications. It seems like the most impacted applications are the ones that run on top of larger frameworks (primarily Zend Framework 1, Zend Framework 2, and Doctrine 1). The application that was affected the most (going from about 250ms average response time to over 1500ms) actually loads both Zend Framework 1 AND Zend Framework 2.

Two of our web applications that don’t load any of these large frameworks seem to be impacted very minimally.

We found two ways to fix this problem (other than the obvious one, rolling back to the previous AMI build), both of which appear to get us back to the pre-upgrade response times:

  1. Upgrade to m5.large (I assume this is due to newer CPUs being able to perform a more efficient Meltdown mitigation)
  2. Remove the New Relic PHP Agent

Option #1 isn’t viable for us currently because of the reservations that we have locked in with AWS. However, after some digging, it turned out that the New Relic PHP Agent was the main culprit behind our response time increase when running on m4.large instances. I just wanted to report our findings and see if anyone else has experienced similar issues.


#2

Hello,

We’ve seen decreased performance after the Meltdown patch in some environments which have a clocksource that does not support vDSO:

http://man7.org/linux/man-pages/man7/vdso.7.html

This is for all “gettimeofday” calls on the system, not just those made by New Relic.

One workaround may be to switch to a clocksource that does support vDSO such as “tsc”. It is possible that your old AMI was using this and the new AMI switched to “xen”.
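
To see which clocksource a host is using, something like the following sketch may help. It assumes the standard Linux sysfs location for clocksource information; the `CLOCKSOURCE_DIR` variable and the function names are only illustrative and are not part of any New Relic tooling.

```shell
#!/bin/sh
# Standard Linux sysfs location; CLOCKSOURCE_DIR is an illustrative override.
CLOCKSOURCE_DIR="${CLOCKSOURCE_DIR:-/sys/devices/system/clocksource/clocksource0}"

current_clocksource() {
    cat "$CLOCKSOURCE_DIR/current_clocksource"
}

clocksource_available() {
    # Succeeds if $1 is listed exactly in available_clocksource.
    for src in $(cat "$CLOCKSOURCE_DIR/available_clocksource"); do
        [ "$src" = "$1" ] && return 0
    done
    return 1
}

report_clocksource() {
    if [ ! -r "$CLOCKSOURCE_DIR/current_clocksource" ]; then
        echo "no clocksource information under $CLOCKSOURCE_DIR" >&2
        return 1
    fi
    echo "current:   $(current_clocksource)"
    echo "available: $(cat "$CLOCKSOURCE_DIR/available_clocksource")"
}

report_clocksource || true

# Switching needs root and lasts only until the next reboot:
#   clocksource_available tsc && echo tsc | sudo tee "$CLOCKSOURCE_DIR/current_clocksource"
```

If the switch helps, the `clocksource=tsc` kernel boot parameter is the usual way to make it persistent. Whether “tsc” is offered at all depends on the hypervisor and instance type.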


#3

Hi!

We experienced a similar problem and have now turned off New Relic.
Our app runs on AWS Elastic Beanstalk on 8 t2.large instances. I attached a screenshot of CPU usage with New Relic installed and without it. Average CPU load dropped from 20-25% to 3-4% after turning New Relic off.


#4

@marcis Just out of curiosity, what is the impact on response times?


#5

Hi!

I cannot give you the exact improvement (as all the data on New Relic disappears after 2 days without a pro account), but I guess it became 2-3 times faster after removing New Relic.


#6

Because the Meltdown fixes are OS-level patches that protect against a CPU flaw, any CPU-intensive workload will see some decrease in performance. The only way to isolate this is to run your systems without New Relic installed to measure the performance impact of the Meltdown patch alone. The comparison can then be repeated with New Relic installed to measure the combined impact, if any, of the patch and the New Relic agent. Depending on how much spare capacity the architecture had before the Meltdown patch, additional servers may need to be added to maintain service levels.

At least this is the approach that we had to take as part of our risk analysis.


#7

If you are unable to change your clocksource to one that supports vDSO, one workaround is to disable the transaction trace feature. The transaction tracer makes a number of “gettimeofday” calls to time the segments in a trace, and those calls have become more resource intensive after the Meltdown patch. To disable this feature, set the following in your newrelic.ini:

newrelic.transaction_tracer.enabled = false

newrelic.transaction_tracer.detail = 0

Then restart your PHP handler to pick up the change. This should reduce resource usage on systems where vDSO is not available or supported.
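
For anyone applying this across several hosts, here is a rough sketch of scripting the two settings above. The `set_ini_value` helper and the `NEWRELIC_INI` variable are hypothetical names made up for this example; the location of newrelic.ini differs between installs, so the path has to be supplied.

```shell
#!/bin/sh
# set_ini_value FILE KEY VALUE
# Hypothetical helper: removes active assignments of KEY and appends "KEY = VALUE".
set_ini_value() {
    file=$1; key=$2; value=$3
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Escape dots so the key is matched literally in the regex.
    pattern=$(printf '%s' "$key" | sed 's/[.]/\\./g')
    # Keep every line except active (uncommented) assignments of KEY.
    grep -vE "^[[:space:]]*${pattern}[[:space:]]*=" "$file" > "$tmp" || true
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# NEWRELIC_INI is an assumed variable; point it at your actual newrelic.ini.
if [ -n "${NEWRELIC_INI:-}" ]; then
    set_ini_value "$NEWRELIC_INI" newrelic.transaction_tracer.enabled false
    set_ini_value "$NEWRELIC_INI" newrelic.transaction_tracer.detail 0
fi
```

Comment lines and other settings are left untouched. The PHP handler still needs a restart afterwards.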


#8

I see severe performance degradation on Ubuntu 18.04, DigitalOcean, clock source is tsc. PHP7.2. Latest version of NewRelic agent. The notch at 21:18 on the attached graph is when I set newrelic.enabled = false in newrelic.ini, and restarted php-7.2-fpm. Results are repeatable. CPU load went down from ~50% to ~25%.


#9

TSC as the clock source is part of it but your system may still not support vDSO.

To confirm whether vDSO is in use, check that “gettimeofday” does not show up in an strace. With the Agent enabled, this command should produce no output if vDSO is in use:

strace php -m 2>&1 | grep gettimeofday

Note that this affects the New Relic PHP Agent quite a bit because most of what we do is timing things, but it also affects every other process making system calls like this. Enabling vDSO can improve post-Meltdown performance for processes not monitored by the Agent, and for non-PHP processes as well.
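
To get a count instead of a yes/no answer, the strace output can be tallied. This is only a sketch: the `count_time_syscalls` function is a made-up helper, and it also counts `clock_gettime` on the assumption that other timing code on the same host may use that call instead (it is normally served through vDSO as well).

```shell
#!/bin/sh
# Hypothetical helper: reads strace output on stdin and prints how many lines
# are gettimeofday or clock_gettime calls (an optional "[pid N]" prefix from
# strace -f is allowed).
count_time_syscalls() {
    grep -cE '^(\[pid +[0-9]+\] )?(gettimeofday|clock_gettime)\(' || true
}

# Only run the check when both tools are present. A count of 0 suggests the
# time calls are served by vDSO; it is also what you get if strace itself
# fails to attach, so check its output if in doubt.
if command -v strace >/dev/null 2>&1 && command -v php >/dev/null 2>&1; then
    n=$(strace -f php -m 2>&1 | count_time_syscalls)
    echo "time-related syscalls seen by strace: $n"
fi
```

A large count here with the Agent enabled would point at the clocksource/vDSO issue rather than at the Agent alone.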