NRI Deployment on Windows EKS Nodes Troubleshooting

Hi there! We are seeing the following errors when deploying New Relic Infrastructure on some of our Windows nodes but not on others. The last error (“No data was populated”) repeats indefinitely until the pod is killed. Any insight you can provide into troubleshooting the issue would be much appreciated.

newrelic-infra time="2020-07-21T23:50:09-07:00" level=info msg="runtime configuration" agentUser="User Manager\\ContainerUser" component="New Relic Infrastructure Agent" executablePath="c:\\Program Files\\New Relic\\newrelic-infra\\newrelic-infra.exe" maxProcs=1 pluginDir="[C:\\Program Files\\New Relic\\newrelic-infra\\integrations.d]"
newrelic-infra time="2020-07-21T23:50:09-07:00" level=warning msg="commands initial fetch failed" component=AgentService error="command request submission failed: Get \"https://infrastructure-command-api.newrelic.com/agent_commands/v1/commands\": dial tcp: lookup infrastructure-command-api.newrelic.com: no such host" service=newrelic-infra
newrelic-infra time="2020-07-21T23:50:09-07:00" level=info msg="Checking network connectivity..." component=AgentService service=newrelic-infra
newrelic-infra time="2020-07-21T23:50:09-07:00" level=warning msg="collector endpoint not reachable, retrying" collector_url="https://infra-api.newrelic.com" component=AgentService error="Head \"https://infra-api.newrelic.com\": dial tcp: lookup infra-api.newrelic.com: no such host" service=newrelic-infra
newrelic-infra time="2020-07-21T23:50:11-07:00" level=info msg=Initializing component=AgentService elapsedTime=1.2947539s service=newrelic-infra version=1.11.40
newrelic-infra time="2020-07-21T23:50:11-07:00" level=info msg="New Relic Infrastructure Agent Running" component=AgentService elapsedTime=1.4565301s service=newrelic-infra
newrelic-infra time="2020-07-21T23:50:11-07:00" level=info msg="Starting up agent..." component=Agent
newrelic-infra time="2020-07-21T23:50:11-07:00" level=info msg="Agent plugin" plugin=metadata/attributes
newrelic-infra time="2020-07-21T23:50:11-07:00" level=info msg="Agent plugin" plugin=metadata/system
newrelic-infra time="2020-07-21T23:50:11-07:00" level=info msg="Agent plugin" plugin=metadata/host_aliases
newrelic-infra time="2020-07-21T23:50:11-07:00" level=info msg="Agent plugin" plugin=metadata/agent_config
newrelic-infra time="2020-07-21T23:50:11-07:00" level=info msg="Agent plugin" plugin=metadata/proxy_config
newrelic-infra time="2020-07-21T23:50:11-07:00" level=info msg="Integration info" arguments="map[]" command=metrics commandLine="[.\\bin\\nri-kubernetes.exe --metrics]" env-vars="map[CLUSTER_NAME:mbcore-admin ComSpec:C:\\Windows\\system32\\cmd.exe KUBERNETES_SERVICE_HOST:172.20.0.1 KUBERNETES_SERVICE_PORT:443 NRIA_CACHE_PATH:c:\\var\\cache\\nr-kubernetes\\infra-sdk-cache.json NRK8S_NODE_NAME:ip-10-210-167-238.us-west-2.compute.internal PATH:C:\\Windows\\system32;C:\\Windows; SystemRoot:C:\\Windows VERBOSE:0]" instance=nri-kubernetes integration=com.newrelic.kubernetes interval=15 labels="map[]" os=windows prefix=integration/com.newrelic.kubernetes protocolVersion=2 workingDir="C:\\Program Files\\New Relic\\newrelic-infra\\newrelic-integrations"
newrelic-infra time="2020-07-21T23:50:11-07:00" level=info msg="Starting HeartBeat sampler" component=HeartbeatSampler
newrelic-infra time="2020-07-21T23:50:11-07:00" level=info msg="Integration health check starting" instance=nri-kubernetes integration=com.newrelic.kubernetes prefix=integration/com.newrelic.kubernetes working-dir="C:\\Program Files\\New Relic\\newrelic-infra\\newrelic-integrations"
newrelic-infra time="2020-07-21T23:50:11-07:00" level=info msg="connect got id" agent-guid=MTcwNTc4NXxJTkZSQXxOQXwyODQ4NTY4NzU2ODgyMTM3NDkz agent-id=2848568756882137493 component=IdentityConnectService
newrelic-infra time="2020-07-21T23:50:21-07:00" level=error msg="Integration command failed" error="exit status 1" instance=nri-kubernetes integration=com.newrelic.kubernetes prefix=integration/com.newrelic.kubernetes stderr="time=\"2020-07-21T23:50:21-07:00\" level=panic msg=\"No data was populated\"\ntime=\"2020-07-21T23:50:21-07:00\" level=fatal msg=\"No data was populated\"\n" working-dir="C:\\Program Files\\New Relic\\newrelic-infra\\newrelic-integrations"
newrelic-infra time="2020-07-21T23:50:21-07:00" level=info msg="Integration health check finished with some errors" instance=nri-kubernetes integration=com.newrelic.kubernetes prefix=integration/com.newrelic.kubernetes working-dir="C:\\Program Files\\New Relic\\newrelic-infra\\newrelic-integrations"
newrelic-infra time="2020-07-21T23:50:32-07:00" level=error msg="Integration command failed" error="exit status 1" instance=nri-kubernetes integration=com.newrelic.kubernetes prefix=integration/com.newrelic.kubernetes stderr="time=\"2020-07-21T23:50:32-07:00\" level=panic msg=\"No data was populated\"\ntime=\"2020-07-21T23:50:32-07:00\" level=fatal msg=\"No data was populated\"\n" working-dir="C:\\Program Files\\New Relic\\newrelic-infra\\newrelic-integrations"

Hi @andrew.purdin

I believe we’ll need to see some debug output from the integration to determine the nature of the error. You can generate verbose output by running the integration directly, with something like the following:

kubectl exec -it newrelic-infra-pod -- /var/db/newrelic-infra/newrelic-integrations/bin/nri-kubernetes -verbose

Replace newrelic-infra-pod with the name of your specific pod.

Apologies @sellefson, I missed your reply. We are still having this issue, and that command doesn’t seem to work on a Windows pod. It returns: command terminated with exit code 126.

I managed to dig in, located and ran the integration manually, and got a new error:

WARN[0000] Cache file (c:\var\cache\nr-kubernetes\infra-sdk-cache.json) is older than 1m0s, skipping loading from disk.
DEBU[0000] Integration "com.newrelic.kubernetes" with version 1.21.0 started
DEBU[0000] Found cached copy of "kubelet-client" stored at 2020-08-10 00:07:33 -0700 PDT
DEBU[0000] Kubelet node IP = 10.209.103.85
DEBU[0000] Discovering KSM using DNS / k8s ApiServer (default)
DEBU[0000] Found cached copy of "ksm-client" stored at 2020-08-09 23:50:24 -0700 PDT
DEBU[0000] KSM Node = 10.209.99.65
DEBU[0000] Running job: kubelet
DEBU[0000] Calling Kubelet endpoint: https://10.209.103.85:10250/pods
DEBU[0000] Calling Kubelet endpoint: https://10.209.103.85:10250/metrics/cadvisor
DEBU[0000] Calling Kubelet endpoint: https://10.209.103.85:10250/stats/summary
DEBU[0005] Calling Kubelet endpoint: https://10.209.103.85:10250/stats/summary
DEBU[0010] Job kubelet took 10.04s
DEBU[0010] populate errors:, error querying Kubelet. Get "https://10.209.103.85:10250/stats/summary": context deadline exceeded (Client.Timeout exceeded while awaiting headers) datasource=kubelet phase=populate
PANI[0010] No data was populated
DEBU[0010] Integration "com.newrelic.kubernetes" exited
FATA[0010] No data was populated

It seems that querying the kubelet’s /stats/summary endpoint on the node is timing out for some reason.

Hey @andrew.purdin, I reckon we’ll need some more detailed logs from you to investigate this further! Watch out for an email from our support team about that :slight_smile:
