Can't fetch discovery items - PostgreSQL integration on Kubernetes

I’m facing an issue with the latest New Relic image (1.23.0): I can’t integrate a PostgreSQL database with New Relic running on Kubernetes. I’m getting the error below:

time="2020-06-22T10:57:28Z" level=error msg="can't fetch discovery items" component=integrations.runner.Group env=production error="2020/06/22 10:57:28 failed to connect to Kubernetes: failed to execute request against kubelet: Get http://ip-10-186-210-110.dev.admin.aws.io:10255/pods: dial tcp: lookup ip-10-186-210-110.dev.admin.aws.io on 100.64.0.10:53: no such host \nexit status 2" integration_name=nri-postgresql role=postgresql

Hi @S.boopathy - I believe this issue is being dealt with inside a ticket you submitted with us (#409147). Once we work through the ticket I will share our findings here with the community.

HTH.

We have the same issue with the nginx integration. Have you found a solution?

This is because the kubelet’s read-only port is disabled by default in recent kubelet configurations. See the discussion here:

The problem is that the default behavior of nri-discovery-kubernetes changed, per this PR.

The solution is to edit the nri-integration-cfg ConfigMap so it includes the two flags `--tls --port 10250`.

Full sample:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nri-integration-cfg # aimed to be safely overridden by users
  namespace: monitoring
data:
  nginx-config.yml: |
    ---
    # Run auto discovery to find pods with label "app=nginx"
    # https://docs.newrelic.com/docs/integrations/host-integrations/installation/container-auto-discovery
    discovery:
      command:
        # Run NRI Discovery for Kubernetes
        # https://github.com/newrelic/nri-discovery-kubernetes
        exec: /var/db/newrelic-infra/nri-discovery-kubernetes --tls --port 10250
        match:
          label.app: nginx
    integrations:
      - name: nri-nginx
        env:
          # If you're using ngx_http_api_module be certain to use the full path up to and including the version number
          # Use the discovered IP as the host address
          STATUS_URL: http://${discovery.ip}/status
          # Comma separated list of ngx_http_api_module, NON PARAMETERIZED, Endpoints
          # endpoints: /nginx,/processes,/connections,/ssl,/slabs,/http,/http/requests,/http/server_zones,/http/caches,/http/upstreams,/http/keyvals,/stream,/stream/server_zones,/stream/upstreams,/stream/keyvals,/stream/zone_sync
          # Name of Nginx status module OHI is to query against. discover | ngx_http_stub_status_module | ngx_http_status_module | ngx_http_api_module
          STATUS_MODULE: discover
          METRICS: 1

I am also getting the same error, "failed to execute request against kubelet".

time="2020-07-13T09:04:53Z" level=error msg="can't fetch discovery items" component=integrations.runner.Group env=staging error="2020/07/13 09:04:53 failed to connect to Kubernetes: failed to execute request against kubelet: Get https://ip-10-5-7-95.ap-south-1.compute.internal:10250/pods: dial tcp: lookup ip-10-5-7-95.ap-south-1.compute.internal on 172.20.0.10:53: no such host \nexit status 2" integration_name=nri-jmx

But even after adding `--tls --port 10250` or `--tls --port 10255`, it’s not getting resolved. I am on AWS EKS. Can someone please help?

This error in particular seems to indicate that the kube-dns isn’t properly resolving the hostname for your worker node. Can you confirm that the record resolves with something like the following?

dig ip-10-5-7-95.ap-south-1.compute.internal @172.20.0.10
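If `dig` isn’t available in the agent image (many minimal base images omit it), `getent` can exercise the same DNS path, since it reads the pod’s own `/etc/resolv.conf`. The node name below is the one from the error message above; substitute your own:

```shell
# Resolve the worker node's hostname through the pod's configured resolvers.
# A non-zero exit status means the name did not resolve (e.g. NXDOMAIN).
getent hosts ip-10-5-7-95.ap-south-1.compute.internal
echo "getent exit status: $?"
```

Unlike `dig`, `getent` also consults `/etc/hosts` and the NSS configuration, so it reflects exactly what the agent process would see.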

The dig command is not present on the infra-agent pod (or any other pod). Can I check this some other way?

Below is the output of nslookup, executed on infra-agent:

bash-5.0# nslookup ip-10-5-7-95.ap-south-1.compute.internal
Server: 172.20.0.10
Address: 172.20.0.10:53

** server can't find ip-10-5-7-95.ap-south-1.compute.internal: NXDOMAIN

** server can't find ip-10-5-7-95.ap-south-1.compute.internal: NXDOMAIN

I figured out what the issue is:
The infra-agent container is not able to reach the node by its name (ip-10-5-7-95.ap-south-1.compute.internal). It works fine when I update the environment variable NRK8S_NODE_NAME to use the node’s IP (status.hostIP) instead.
The problem is that when I change this variable in the infra-agent’s YAML file, k8s metrics are no longer exported.
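For anyone trying the same diagnostic step, the change described above would look roughly like this in the infra-agent container spec. The env var name and the downward-API field come from the post; the surrounding manifest layout is illustrative, not the official DaemonSet:

```yaml
# Sketch of the workaround: point NRK8S_NODE_NAME at the node's IP
# via the Kubernetes downward API instead of the node hostname.
# (Container/DaemonSet layout is an assumption, not the official manifest.)
env:
  - name: NRK8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP  # node IP rather than spec.nodeName (hostname)
```

As the poster notes, this broke k8s metric export in their cluster, so treat it as a way to confirm the DNS diagnosis rather than as a fix.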

It looks like this is an issue with the nri-discovery-kubernetes code itself. In the infra-agent logs I see in many places that communication with the kubelet happens via the node’s IP address (Calling Kubelet endpoint: https://10.5.7.95:10250/pods); only nri-discovery-kubernetes uses the hostname.

Can you please give us any workaround for this problem?
logs.txt (2.8 MB)

Hi @vamsikrishna.m

I’ve brought this up with our product engineers, and we’re currently investigating how the Discovery plugin can be improved. I’ll have more information as this lead develops.


I am getting the same error message on JMX integration

Hi @vamsikrishna.m

I’m not sure what your specific DNS server configs are, but after spending some time troubleshooting this with my colleague @basma.asaad and the team, we discovered what was causing our nslookup issues for our jmx k8s integration.

The DNS forwarders we had configured for our CoreDNS deployments included both public and private IP addresses, and that mix was causing the internal lookups to fail. Once we switched to a set of internal-only DNS servers, resolution started working again for the New Relic pod. Hope this helps you work around the issue!
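For illustration, in a CoreDNS deployment the forwarders live in the Corefile’s `forward` block. A hypothetical fragment like the one below (all IPs are placeholders, not from this thread) shows the failure mode: when a public resolver is mixed in, internal-only names such as `*.compute.internal` can get NXDOMAIN from the server that cannot see the private zone.

```
# Hypothetical CoreDNS Corefile fragment (placeholder IPs).
.:53 {
    errors
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    # Problematic: mixing a public resolver with an internal one means
    # internal-only names may be answered NXDOMAIN by the public server.
    # forward . 8.8.8.8 10.0.0.2
    # Fix: forward only to resolvers that can see the private zone.
    forward . 10.0.0.2
    cache 30
}
```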

@marvin.matos1 Thank you for sharing how your team resolved this issue. Very curious to see if it will also work for other community members :slight_smile:

So it looks like a fix for the nri-discovery-kubernetes module has been merged, but I’m not sure how long that takes to roll out. https://github.com/newrelic/nri-discovery-kubernetes/issues/17 shows as resolved now, though.

@toby.broyles Thanks for the update! It’s much appreciated. We’ll be sure to keep an eye on this on our end!