We have been using the New Relic agent for a few years. It allows us to trace the behaviour of our PHP applications. To avoid slowing down every web backend, we only deploy and enable the agent on the first host of a given pool. We know that the agent adds overhead on the host's resources, and we accept that the load on these hosts is higher than on the others. However, for the past few days, we have not been able to understand why the load of our New Relic-enabled host is sometimes far above the average of the others. Most of the time the host behaves normally, but from time to time its load spikes to about twice its usual level.
For example, the capture below is taken from our Prometheus monitoring stack and shows the load of our pool of web service backends. I have only kept the curve of the New Relic-enabled host, in green, and the average load of the 11 other backends, in red (this average also includes the load of the New Relic-enabled host).
The green curve is always about 1-2 points above the red one, but sometimes the gap grows by 6-8 points (the load doubles). Our backends are Debian 8 VMs with 16 vCPUs and 16 GB of RAM, and they are load-balanced so that requests are spread equally.
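Because the red curve is an average that also contains the New Relic host, it slightly understates the real difference: with n hosts in the pool, the gap to the pool average is (n - 1)/n of the gap to the average of the other hosts. The minimal Python sketch below makes that explicit and flags the samples where the gap jumps well above its usual level. It assumes the per-host load values have already been exported from Prometheus; the function names, the sample values and the 2x-median threshold are hypothetical choices for illustration, not part of our monitoring setup.

```python
from statistics import mean, median


def gap_vs_pool_average(traced: float, others: list[float]) -> float:
    """Gap between the traced host and a pool average that includes that host."""
    return traced - mean([traced, *others])


def gap_vs_others(traced: float, others: list[float]) -> float:
    """Gap between the traced host and the average of the other hosts only."""
    return traced - mean(others)


def flag_spikes(gaps: list[float], factor: float = 2.0) -> list[int]:
    """Indices of samples whose gap exceeds `factor` times the median gap.

    The factor of 2 is a hypothetical threshold, not a tuned value.
    """
    baseline = median(gaps)
    return [i for i, gap in enumerate(gaps) if gap > factor * baseline]


if __name__ == "__main__":
    others = [4.0] * 11  # hypothetical loads for 11 untraced hosts
    traced = 10.0        # hypothetical load for the New Relic host
    print(gap_vs_others(traced, others))        # 6.0
    print(gap_vs_pool_average(traced, others))  # 5.5, i.e. 11/12 of 6.0
    # Hypothetical gap series: ~1-2 points normally, 6-7 points during spikes.
    print(flag_spikes([1.5, 1.8, 7.0, 1.2, 6.5]))  # [2, 4]
```

With a dozen hosts the dilution factor is close to 1, so it cannot by itself explain a gap growing from 1-2 points to 6-8 points.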
System details:
- Kernel:
Linux prod-ws1 4.9.0-0.bpo.12-amd64 #1 SMP Debian 4.9.210-1+deb9u1~deb8u1 (2020-06-09) x86_64 GNU/Linux
- PHP:
php7.4-fpm 7.4.7-1+0~20200612.18+debian8~1 amd64 server-side, HTML-embedded scripting language (FPM-CGI binary)
- nginx:
nginx 1.16.0-1~jessie amd64 high performance web server
- New Relic:
newrelic-php5 220.127.116.110 amd64 The New Relic agent for PHP
PHP module configuration (we disabled most of the tracing options to keep the overhead low):
newrelic.appname = backend1-ws
newrelic.cross_application_tracer.enabled = false
newrelic.cross_application_tracer.explain_enabled = false
newrelic.daemon.logfile = "/var/log/newrelic/newrelic-daemon.log"
newrelic.daemon.loglevel = error
newrelic.enabled = true
newrelic.license = xxxxxx
newrelic.logfile = /var/log/newrelic/php_agent.log
newrelic.loglevel = error
newrelic.process_host.display_name = backend1
newrelic.transaction_tracer.detail = 0
newrelic.transaction_tracer.enabled = false
newrelic.transaction_tracer.explain_enabled = false
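To double-check that a fragment like the one above really leaves the tracing options off, a small Python sketch can parse the `key = value` lines and list any tracing key whose value is not an "off" value. The helpers `parse_ini_fragment` and `still_enabled` are hypothetical names; the sketch only handles the flat, section-less format shown here, and it checks the file text, not the settings the running agent actually loaded.

```python
# Hypothetical helper: checks the ini *text*, not the running agent's settings.
OFF_VALUES = {"false", "0", "off", "no"}

# Tracing-related keys taken from the configuration fragment above.
TRACING_KEYS = [
    "newrelic.cross_application_tracer.enabled",
    "newrelic.cross_application_tracer.explain_enabled",
    "newrelic.transaction_tracer.detail",
    "newrelic.transaction_tracer.enabled",
    "newrelic.transaction_tracer.explain_enabled",
]


def parse_ini_fragment(text: str) -> dict[str, str]:
    """Parse flat `key = value` lines, skipping blank lines and `;` comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding double quotes, e.g. around the daemon log file path.
        settings[key.strip()] = value.strip().strip('"')
    return settings


def still_enabled(settings: dict[str, str], keys: list[str]) -> list[str]:
    """Return the keys whose value is not an "off" value.

    A key missing from the file is reported too, because we do not assume
    which default the agent would use for it.
    """
    return [k for k in keys if settings.get(k, "").lower() not in OFF_VALUES]


if __name__ == "__main__":
    fragment = """
    newrelic.cross_application_tracer.enabled = false
    newrelic.cross_application_tracer.explain_enabled = false
    newrelic.daemon.logfile = "/var/log/newrelic/newrelic-daemon.log"
    newrelic.transaction_tracer.detail = 0
    newrelic.transaction_tracer.enabled = false
    newrelic.transaction_tracer.explain_enabled = false
    """
    print(still_enabled(parse_ini_fragment(fragment), TRACING_KEYS))  # []
```

Because PHP-FPM and the PHP CLI can load different ini files, a file that passes this check does not by itself prove that the FPM pool is using it.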
The backend has not been modified manually, but the services have been restarted (PHP-FPM, nginx). There are no errors in the log files in /var/log/newrelic.
The VM should have enough spare resources to work correctly: the capture below shows the system metrics (node-exporter) over the same time range as before:
Could you help us understand what is wrong?