Bundle-nrk8s-ksm CrashLoopBackOff

Hi,

The nrk8s-ksm pod is in CrashLoopBackOff:

kubectl -n newrelic get pod
NAME                                                    READY   STATUS             RESTARTS        AGE
newrelic-bundle-nri-metadata-injection-d9fb57d5-wfzmw   1/1     Running            0               34m
newrelic-bundle-nrk8s-ksm-5b4c68fb67-bhsrg              1/2     CrashLoopBackOff   7 (2m50s ago)   23m
newrelic-bundle-nrk8s-kubelet-9t7zf                     2/2     Running            1 (12m ago)     14m
newrelic-bundle-nrk8s-kubelet-ks4dx                     2/2     Running            1 (12m ago)     14m
newrelic-bundle-nrk8s-kubelet-rnbtp                     2/2     Running            1               34m

Logs:

kubectl -n newrelic logs --previous newrelic-bundle-nrk8s-ksm-5b4c68fb67-bhsrg -c ksm
time="2022-05-20T13:00:19Z" level=info msg="Waiting for agent container to be ready..."
time="2022-05-20T13:00:19Z" level=info msg="New Relic Kubernetes integration Version: v3.2.0, Platform: linux/amd64, GoVersion: go1.17.9, GitCommit: 07996399732a02ea798be9df034446fbda254010, BuildDate: Mon May 16 11:56:34 UTC 2022\n"
time="2022-05-20T13:01:23Z" level=error msg="retrieving scraper data: retrieving ksm data: discovering KSM endpoints: timeout discovering endpoints"

Any idea why?

helm chart: nri-bundle-4.4.7

Hey there @dors1,

I hope you are well!

I am sorry you are experiencing trouble here. I am looping in an expert from our Infrastructure team to look at this further, as it is slightly outside my scope. That said, if a cluster does not have enough resources available or is not set up correctly, containers can begin restarting continuously and get stuck in what’s called a “crash loop backoff.” Container Restarts in New Relic’s Kubernetes dashboard will show this, warning you that the issue needs to be addressed.

We definitely appreciate you posting the messages from the logs as well; they will help our engineers pinpoint a cause and possible solution much faster. If you have more questions, please reach out and we will be more than happy to help. We appreciate your patience while we continue to support you.

Hi @dors1,

Is kube-state-metrics running in your cluster? We install it by default as part of our Guided Install, so I’m wondering whether it was disabled. Did you install using Helm?

Can you provide the output of:

kubectl get pods -A | grep kube-state-metrics


Hi @bschmitt,
Sure, here is the output:

kubectl -n kube-system get pod -l app.kubernetes.io/name=metrics-server
NAME                                     READY   STATUS    RESTARTS   AGE
system-metrics-server-784d9647cd-f6bls   1/1     Running   0          5d4h

Here is the metrics-server svc:

kubectl -n kube-system get svc -l app.kubernetes.io/name=metrics-server
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
system-metrics-server   ClusterIP   172.20.226.106   <none>        443/TCP   222d

I can also run kubectl top:

kubectl -n kube-system top pod
NAME                                             CPU(cores)   MEMORY(bytes)   
aws-node-2hnqc                                   3m           54Mi            
aws-node-2n8sg                                   3m           53Mi            
...

We are using the official Kubernetes metrics-server helm chart from here

Could the problem be related to the metrics-server svc name?

@dors1

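The newrelic-bundle-nrk8s-ksm pod is responsible for pulling metrics from Kube State Metrics (not the Metrics Server) in your cluster. The output you posted shows metrics-server, which is a separate component, so it appears that Kube State Metrics itself has not been installed. That would also be consistent with the "timeout discovering endpoints" error in your logs.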
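
If it helps, here is a rough sketch of how you could confirm this from the pod listing. The `check_ksm` helper name is hypothetical (not part of kubectl or the New Relic tooling), and it only looks for the string kube-state-metrics in whatever text it is given, so it assumes the KSM pods keep that name.

```shell
# Hypothetical helper: reads `kubectl get pods -A` output on stdin and
# reports whether any line mentions kube-state-metrics.
check_ksm() {
  if grep -q 'kube-state-metrics'; then
    echo "kube-state-metrics pod found"
  else
    echo "kube-state-metrics pod not found"
    return 1
  fi
}

# Usage against a live cluster (left as a comment so nothing runs by accident):
#   kubectl get pods -A | check_ksm
```

A listing that only contains metrics-server pods, like the one above, would report "not found", since the name metrics-server does not match kube-state-metrics.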
