[PHP] php-fpm randomly crashes with the NR agent enabled

We have a couple of WordPress sites with a pretty standard configuration: the boilerplate is generated with Bedrock and deployed with Trellis on AWS EC2.

With the New Relic PHP agent installed and enabled, HTTP requests to the WordPress built-in REST API endpoints randomly crash the php-fpm workers and return 502 Bad Gateway errors. This happens for roughly 1 in 20 requests.
Regular web pages that return HTML do not seem to be affected, only the REST API endpoints that return JSON.

When I disable the NR agent in the PHP config, the issue goes away.

I’ve also tried setting newrelic.browser_monitoring.auto_instrument to false, but it didn’t help.

Not much information can be found in the error logs. The log that indicates a crash is /var/log/apport.log:

ERROR: apport (pid 2046127) Fri Apr 29 19:59:10 2022: called for pid 2045950, signal 11, core limit 0, dump mode 2
ERROR: apport (pid 2046127) Fri Apr 29 19:59:10 2022: not creating core for pid with dump mode of 2
ERROR: apport (pid 2046127) Fri Apr 29 19:59:10 2022: executable: /usr/sbin/php-fpm7.4 (command line "php-fpm:\ pool\ wordpress")
ERROR: apport (pid 2046127) Fri Apr 29 19:59:10 2022: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment
ERROR: apport (pid 2046127) Fri Apr 29 19:59:10 2022: apport: report /var/crash/_usr_sbin_php-fpm7.4.0.crash already exists and unseen, doing nothing to avoid disk usage DoS
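
To pull the crashed worker's pid, signal, and executable out of a longer apport log, something like the following minimal Python sketch could be used. It assumes the log lines follow the same format as the excerpt above; the function name summarize_crashes and the regular expressions are illustrative and are not part of apport or the New Relic agent.

```python
import re

# Both patterns are derived from the excerpt above and are assumptions
# about the apport log format, not an official specification.
CRASH_RE = re.compile(
    r"apport \(pid (?P<apport_pid>\d+)\) (?P<when>.+?): "
    r"called for pid (?P<pid>\d+), signal (?P<signal>\d+)"
)
EXE_RE = re.compile(
    r"apport \(pid (?P<apport_pid>\d+)\) .+?: executable: (?P<exe>\S+)"
)

# Two lines copied from the excerpt above, used as sample input.
SAMPLE = [
    r'ERROR: apport (pid 2046127) Fri Apr 29 19:59:10 2022: called for pid 2045950, signal 11, core limit 0, dump mode 2',
    r'ERROR: apport (pid 2046127) Fri Apr 29 19:59:10 2022: executable: /usr/sbin/php-fpm7.4 (command line "php-fpm:\ pool\ wordpress")',
]


def summarize_crashes(lines):
    """Return one dict per crash: crashed pid, signal, timestamp, executable."""
    crashes = []
    by_apport_pid = {}  # apport handler pid -> most recent crash it reported
    for line in lines:
        m = CRASH_RE.search(line)
        if m:
            crash = {
                "pid": int(m["pid"]),
                "signal": int(m["signal"]),  # 11 is SIGSEGV on Linux
                "when": m["when"],
                "executable": None,
            }
            crashes.append(crash)
            by_apport_pid[m["apport_pid"]] = crash
            continue
        m = EXE_RE.search(line)
        if m and m["apport_pid"] in by_apport_pid:
            # Attach the executable to the crash reported by the same handler.
            by_apport_pid[m["apport_pid"]]["executable"] = m["exe"]
    return crashes


if __name__ == "__main__":
    for crash in summarize_crashes(SAMPLE):
        print(crash)
```

To run it against the real log, the SAMPLE list can be replaced by an open file handle for /var/log/apport.log, since the function only needs an iterable of lines.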

I can provide the dump file if it’s useful.

Hello @robert.h,

Welcome to the Explorers Hub, I hope you are well!

We want you to know that we see your question and are working to get you the best support. While your question is a bit out of my scope, I am looping in an expert from our PHP team to help answer it. We appreciate your patience while we continue to support you with this issue.

Sounds good. Thanks for the update!

Hi @robert.h

Just touching base to let you know that the engineering team is working on this. They will reach out here with their findings.

Please feel free to share any new findings or updates you have.

Hi @robert.h ,

Can you confirm whether you can reproduce this crash in a staging or development environment? We’d like to suggest some changes to see whether they affect the behavior you outlined. Can you also confirm whether the problem still occurs after downgrading to agent version 9.18.1.303?

Also, are you using opcache? And if so, can you see an improvement when disabling opcache?

Hi @tlugo , thanks for looking into this.

We only have the agent installed in prod, so we didn’t really think about reproducing the issue in a staging or dev environment. We will try installing the agent there and see if we can reproduce the crash.

Regarding opcache, yes, we have it enabled. We will also try turning it off.

I’ll get back to you with the test results maybe in a day or two.

All right, here are our findings with some more tests.

  • Disabling opcache does resolve the issue.
  • Downgrading to agent version 9.18.1.303 doesn’t really help. The site still exhibits the same behavior.
  • My apologies, but I also need to correct the original description of the observed behavior, which said that only the REST API endpoints returning JSON were affected and regular HTML pages were not.

That is actually not true. After testing in a staging environment, we noticed that the crash is not limited to the REST API endpoints. It only occurs, at random, when at least 2 concurrent requests are being processed. It never occurs when we send requests sequentially.

We missed this largely because of the site’s access pattern. It runs a decoupled CMS: it has a user-facing frontend and, at the same time, exposes its REST API to power another site. Its own frontend currently gets fairly light traffic, so it rarely receives multiple concurrent requests. The REST API, by contrast, gets very bursty traffic from the other site. That’s why we initially noticed crashes in the logs only for REST API requests and not for HTML page requests.
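
For anyone who wants to check the sequential-versus-concurrent difference on their own setup, here is a minimal Python sketch of that kind of comparison. It is not the exact test we ran; the function names fetch_status and run_batch, the request count of 40, and the concurrency of 4 are all arbitrary choices for illustration.

```python
import sys
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url, timeout=10):
    """Return the HTTP status code for a GET request, or None on a network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; a crashed worker
        # behind the web server would show up here as 502.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def run_batch(url, total, concurrency, fetch=fetch_status):
    """Send `total` requests with at most `concurrency` in flight; count status codes."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(fetch, [url] * total))
    return Counter(statuses)


if __name__ == "__main__":
    # The target URL is passed on the command line; nothing is sent without it.
    if len(sys.argv) > 1:
        target = sys.argv[1]
        print("sequential:", run_batch(target, total=40, concurrency=1))
        print("concurrent:", run_batch(target, total=40, concurrency=4))
```

Pointing it at a staging URL (for example a REST route such as /wp-json/wp/v2/posts on a placeholder host) and comparing the two counters should show whether 502 responses appear only in the concurrent run.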

Hi @robert.h

Thank you for reaching back out!

I see that you mention disabling opcache resolved the issue, but downgrading the agent did not help. I will be sure to highlight this to the engineering team.

Have the crashes stopped since you applied the changes? Apologies, this is out of my scope. If you require additional support, please let me know and I will be sure to loop in the team to help!

Hi @dcody

Thanks for the follow-up. We have yet to deploy the workaround to prod. Ideally, we would still love to have both the New Relic agent and opcache enabled. In the meantime, we will most likely roll out the change this week after evaluating the potential impact on site speed. I’ll post an update after seeing the result in prod.

Hi @robert.h

Thanks for the reply here, it’s greatly appreciated.

Please let us know once the update has been made and whether the workaround was successful. From there we can proceed with support.

The change has been live in prod for a couple of days now. The logs indicate that this workaround has fixed the issue.

Appreciate your assistance!

Thank you for the update!

Have a great day!