NewRelic Infra Agent is launching semodule -l over and over again killing the box

I noticed that a new box I just installed the newrelic-infra agent on is running under high load.
I checked htop and hit t to view the process tree: the agent keeps running semodule -l, and after it exits the agent starts it again. If I stop newrelic-infra, the problem goes away.

New Relic Infrastructure Agent version: 1.0.724

8990 root 20 0 651M 19592 5872 S 0.6 1.9 0:18.72 ├─ /usr/bin/newrelic-infra
9690 root 20 0 29156 3232 1108 R 88.4 0.3 0:04.46 │ ├─ semodule -l
9042 root 20 0 651M 19592 5872 S 0.0 1.9 0:00.02 │ ├─ /usr/bin/newrelic-infra
9034 root 20 0 651M 19592 5872 S 0.0 1.9 0:02.25 │ ├─ /usr/bin/newrelic-infra
9027 root 20 0 651M 19592 5872 S 0.0 1.9 0:03.32 │ ├─ /usr/bin/newrelic-infra
9022 root 20 0 651M 19592 5872 S 0.0 1.9 0:02.10 │ ├─ /usr/bin/newrelic-infra
9021 root 20 0 651M 19592 5872 S 0.0 1.9 0:02.21 │ ├─ /usr/bin/newrelic-infra
9018 root 20 0 651M 19592 5872 S 0.0 1.9 0:02.55 │ ├─ /usr/bin/newrelic-infra

@adam.stracener,

Thank you for making us aware of this. I would like to work with you to determine the cause and see if we can’t get your load reduced. I have opened a ticket so that we can discuss the details of this host with you.

Please let me know if you have any difficulty accessing the ticket.

Hello,

Any hints on how to solve this?

Thanks,
Seb.

@sebosp

We are tracking this issue and are working on finding the cause and coming up with a long-term fix for it. However, for now, something you could try would be to add this line to your config file, then restart the service:

selinux_interval_sec: 120

After setting this and restarting the service, I would appreciate it if you could monitor htop again and let us know if the behavior is changed at all. This may help other community members in the future in case they also encounter this problem.
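For anyone who wants to script the change, here is a minimal sketch of one way to apply it. The helper name set_selinux_interval is hypothetical, and the config path (/etc/newrelic-infra.yml) and the systemctl restart shown in the trailing comment are assumptions about a typical Linux install; adjust both for your host.

```shell
#!/bin/sh
# Sketch: set selinux_interval_sec in an agent config file without
# creating a duplicate key. The helper name is hypothetical.

set_selinux_interval() {
    config_file=$1
    interval=$2

    if grep -q '^selinux_interval_sec:' "$config_file"; then
        # Key already present: rewrite that line via a temp file.
        tmp=$(mktemp)
        sed "s/^selinux_interval_sec:.*/selinux_interval_sec: $interval/" \
            "$config_file" > "$tmp"
        # cat (rather than mv) keeps the original file's owner and mode.
        cat "$tmp" > "$config_file"
        rm -f "$tmp"
    else
        # Make sure the file ends with a newline before appending.
        if [ -s "$config_file" ] && [ -n "$(tail -c 1 "$config_file")" ]; then
            printf '\n' >> "$config_file"
        fi
        printf 'selinux_interval_sec: %s\n' "$interval" >> "$config_file"
    fi
}

# Example, run as root. The path and the restart command are assumptions:
#   set_selinux_interval /etc/newrelic-infra.yml 120
#   systemctl restart newrelic-infra
```

Running the helper twice leaves a single selinux_interval_sec line, so it should be safe to include in a provisioning script.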

Sorry I haven’t replied to this but I stopped using the Infrastructure Agent due to this problem. Just started using it again and still having the issue… but nice to see you have a “fix”.

EDIT: I just tested and have the same issue: the semodule -l process continues to run over and over, killing my boxes. Oddly enough, it doesn’t happen on every box.

We are very interested in fixing problems that crop up with our products, but it’s difficult to fix a problem if we can’t reproduce it. At this point, we haven’t been able to reproduce this problem on our own systems.

If anyone who is experiencing this problem would be willing to dig into it with us, let me know on this thread and we can open a ticket to try to find what’s causing it, any possible workarounds, and get it reproduced on our own systems so that we can squash it!

I have not seen the selinux_interval_sec option in the Configuration Docs. Are there other options that are not documented?

@adam.stracener,

There are several config options that are not documented, most of which are used in development and troubleshooting. These config settings generally aren’t shared because they would, in many cases, lead to unsupported configurations for the agent.

Alright so I see what’s going on here.
In /var/db/newrelic-infra/data/config there is a file called selinux-modules.json, which of course is a record of all the semodules that are installed. I made a copy of the file, waited until I saw semodule -l run, removed a semodule, and ran a diff:
diff --brief --side-by-side selinux-modules.json selinux-old
The two files differed afterwards.

There are a few other files in there that record which booleans are on or off, plus a few others.
So it seems the default SELinux refresh interval may be set in a way that can kill smaller boxes.
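If anyone wants to repeat the experiment without watching htop, the copy-and-diff check can be roughly automated as below. This is only a sketch: the function names are hypothetical, and it only notices changes to the file's content (for example after a module is added or removed), not every semodule -l run.

```shell
#!/bin/sh
# Sketch: notice when the agent rewrites its SELinux module inventory.
# Function names are hypothetical; nothing here is part of the agent.

# Print a checksum that changes whenever the file's content changes.
file_signature() {
    cksum < "$1"
}

# Poll a file and print a timestamped line each time its content changes.
# $1 = file to watch, $2 = seconds between polls, $3 = number of polls.
watch_for_rewrites() {
    file=$1
    pause=$2
    polls=$3
    last=$(file_signature "$file")
    i=0
    while [ "$i" -lt "$polls" ]; do
        sleep "$pause"
        now=$(file_signature "$file")
        if [ "$now" != "$last" ]; then
            printf '%s content changed\n' "$(date '+%H:%M:%S')"
            last=$now
        fi
        i=$((i + 1))
    done
}

# Example: poll every 5 seconds for 10 minutes (120 polls):
#   watch_for_rewrites /var/db/newrelic-infra/data/config/selinux-modules.json 5 120
```

The gap between the change you make and the printed timestamp gives a rough upper bound on how often the agent refreshes the module list.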

Any chance you could still share the settings? I like to test and fiddle; of course I know it won’t be supported.
I assume the newrelic-infra agent is not open source?

Thanks for keeping us all updated with your findings so far, @adam.stracener! There isn’t any more we can share at this point besides the options that have already been mentioned above. If there were less risky options that you could experiment with, we would share them.

Your enthusiasm for getting to the bottom of this is contagious! I want to loop in our Infrastructure Product Owner @kirani to see if she has any ideas on who or what can continue helping you in your quest.

Of course in the meantime (and beyond), please feel free to continue updating us with your milestones! :blush:

So… I didn’t know this was a problem, and wasn’t having much trouble at all until just recently. Now, on the same server it had been working fine on for a while, this is eating up the CPU on our T2.medium Linux box. Is there really no alternative to turning it off? With server monitoring going away and leaving us just Infrastructure, and Infrastructure causing these problems, we’re about to be in a tight spot if this service can’t monitor our server without driving it into the ground.

newrelic-infra-1.0.724-1

We just updated to newrelic-infra-1.0.785-1, will respond later on results.

Hi Kevin,

Please do let us know if you still have an issue after updating to version 1.0.785.

Thank you,

Paul

It does seem to be working better. I can still see semodule pop up on the server from time to time using 60% or more CPU, but we aren’t having problems like we were before.

I’m glad to hear it! If this does become an issue for you, feel free to reach back out.

Has this been dealt with? I’m having the exact same issue on some AWS machines running CentOS. I had to stop the agents because it is unworkable.
The agent version is 1.0.808.

Hello Mario,

I would encourage you to file a support ticket on this, so we can take a look.

Regards,

Paul

Hi Paul,

Yes, that would seem the logical way forward. However, we are only briefly evaluating NewRelic to see if we should add it to our shortlist of monitoring applications for Cloud. I really don’t have time to start a support call and do debugging.
From my standpoint it’s strange that I bump into this issue. We have made no modifications to the agent’s config and we aren’t running some exotic platform. This is a straight out-of-the-box installation that kills some of our servers. Surely this issue must have been reported before?

Hello Mario,

Disabling SELinux will mitigate the problem. Alternatively, you can add this option to your newrelic-infra.yml file, which may also mitigate it.

selinux_interval_sec: 120

Regards,

Paul

Hi Paul,

SELinux is set to permissive, which doesn’t solve the issue. Disabling it is not an option.
I’ll try using the parameter you propose.

I can confirm that setting the option “selinux_interval_sec: 120” fixes the high CPU load.
What is the default value of this option?
