Pods stuck in Terminating state after installing the Kubernetes integration


We are having issues in our Kubernetes cluster after installing the New Relic Kubernetes integration.

Once it is installed, deleting other pods fails with the following error:

error determining status: rpc error: code = Unknown desc = failed to get sandbox ip: check network namespace closed: remove netns: unlinkat /var/run/netns/cni-ed5f7bb6a276-0694-b596-1e2c3b630aa7: device or resource busy

and the pods stay stuck in the Terminating state.
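To list which pods are affected, a small script can help. The sketch below is only an illustration and is not part of the New Relic integration. It relies on the fact that kubectl reports a pod as Terminating once metadata.deletionTimestamp is set while the pod object still exists. It assumes kubectl is on the PATH and can reach the cluster. The function names terminating_pods and load_pods_from_cluster are hypothetical.

```python
import json
import subprocess


def terminating_pods(pod_list: dict) -> list[tuple[str, str, str]]:
    """Return (namespace, name, deletionTimestamp) for pods marked for deletion.

    kubectl shows a pod as "Terminating" when metadata.deletionTimestamp
    is set but the pod object has not been removed yet.
    """
    stuck = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        ts = meta.get("deletionTimestamp")
        if ts:
            stuck.append((meta.get("namespace", ""), meta.get("name", ""), ts))
    return stuck


def load_pods_from_cluster() -> dict:
    # Assumption: kubectl is installed and configured for the target cluster.
    out = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

For example, call terminating_pods(load_pods_from_cluster()). Alternatively, save the output of kubectl get pods --all-namespaces -o json to a file, load it with json.load, and pass the result in. Pods that keep showing up across repeated runs are the ones stuck on the netns error above.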

If we uninstall New Relic, everything returns to normal (the pods stuck in the Terminating state are removed).

It seems New Relic is holding on to something that cannot be deleted unless New Relic is uninstalled as well.

We are running Kubernetes 1.23.

As for networking, we are using Calico with VXLAN.

Is anyone else experiencing similar problems?

Hi @jesper.mansson

Thanks for reaching out. I hope you are well.

This sounds like a tricky one. Perhaps we can run some basic checks first:

  1. I would suggest following our automated installer guide to make sure nothing is overlooked: Kubernetes integration: install and configure (New Relic Documentation).

  2. I would also ask you to confirm that your environment meets the requirements in Kubernetes integration: compatibility and requirements (New Relic Documentation).

  3. Lastly, our troubleshooting guide may be useful: Kubernetes integration troubleshooting: Error messages (New Relic Documentation).

Please let me know whether any of these options help, or if you have any updates, questions, or fixes.

We deployed the New Relic integration with Helm 3, following the instructions provided under "Add more data". No errors were reported during the installation.

The Linux distribution we use is supported by New Relic, but we are running a newer Kubernetes version than 1.22.

We are using Kubernetes 1.23, which has been out for about five months now. Is New Relic planning to support it officially?

We also noticed that it works without VXLAN.

Hey there @jesper.mansson,

It looks like you currently have a workaround, which is good! I followed up with our engineering team, and they are working on support for Kubernetes 1.23. Once we know when to expect that release, we will follow up and let you know.

Please let us know if there is anything else we can help with in the meantime as well. I hope you have a great day!

We need VXLAN in our production environment, so we decided to remove the New Relic integration from production for now and continue troubleshooting it in our test environment.

Hey @jesper.mansson,

Thank you for the update and I am sorry you had to remove the integration for the time being. Please let us know if we can help with anything else.

We seem to have it working now.

We noticed that new apps deployed to the cluster could be removed without hitting the issue. Only the apps deployed before we added the New Relic integration seem to be affected.

After removing all our apps from the cluster, uninstalling New Relic, and installing it again, we no longer experience this issue.
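To find which workloads predate the integration (the ones that seemed to be affected here) before deciding what to redeploy, a sketch like the following compares each pod's metadata.creationTimestamp against a cutoff. The cutoff is an assumption you supply, for example the creation time of the New Relic pods. The input is the parsed JSON from kubectl get pods --all-namespaces -o json. parse_k8s_time and pods_created_before are hypothetical helpers, not part of any New Relic or Kubernetes tooling.

```python
from datetime import datetime, timezone


def parse_k8s_time(ts: str) -> datetime:
    # Kubernetes object timestamps are RFC 3339 in UTC with second
    # precision, e.g. "2022-05-01T10:00:00Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def pods_created_before(pod_list: dict, cutoff: datetime) -> list[str]:
    """Return "namespace/name" for every pod created before cutoff."""
    older = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        created = meta.get("creationTimestamp")
        if created and parse_k8s_time(created) < cutoff:
            older.append(f"{meta.get('namespace', '')}/{meta.get('name', '')}")
    return older
```

The cutoff must be timezone-aware (UTC), because comparing it with the parsed timestamps would otherwise raise a TypeError. The resulting list is only a starting point for deciding which apps to redeploy. It does not prove those pods are affected.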


Hey @jesper.mansson,

Glad to hear you were able to resolve this, and thank you for letting us know how you did it. Please let us know if we can help with anything in the future!
