NRI Flex doesn't work with Discovery

Hello, this is my first attempt to run NR Flex. We run the newrelic/infrastructure-k8s:2.9.0 Docker image as a DaemonSet in our EKS cluster, and it has been working fine for the past few months. I also ran some of the Flex examples, and those work fine too. The problem, though, is that discovery doesn’t work. Here’s my config:

$ cat /etc/newrelic-infra/integrations.d/nri-flex.yaml
---
discovery:
  command:
    exec: /var/db/newrelic-infra/nri-discovery-kubernetes --port 10250 --tls --namespaces riak
    match:
      image: /my-riak/
integrations:
  - name: nri-flex
    config:
      name: MyRiak
      apis:
        - event_type: MySample
          url: http://${discovery.ip}:8098/stats
          jq: ".mem_allocated"

When I run "/var/db/newrelic-infra/nri-discovery-kubernetes --port 10250 --tls --namespaces riak" directly on the host, it works (it returns JSON with pod information). But when I run

/var/db/newrelic-infra/newrelic-integrations/bin/nri-flex --verbose --pretty --config_file /etc/newrelic-infra/integrations.d/nri-flex.yaml --structured_logs

I get:

{"level":"debug","msg":"sending GET request to http://${discovery.ip}:8098/stats","time":"2022-05-05T07:59:34Z"}
{"err":"parse \"http://${discovery.ip}:8098/stats\": invalid character \"{\" in host name","level":"debug","msg":"http: error","time":"2022-05-05T07:59:34Z"}

Flex version:

name: com.newrelic.nri-flex,
protocol_version: 3,
integration_version: 1.4.3,

What am I missing?

Hey there @Boguslaw.Kalka,

While I am not familiar with this error or why you are receiving it, I did find some documentation that may help you further: https://github.com/newrelic/nri-flex/blob/master/docs/basics/k8s_configure.md. I am also looping in one of our Infrastructure Engineers to take a look at this, and they will reply to this post when we have an answer.

If the documentation helped you solve your issue we would love to hear it. Please feel free to also ask any other questions you may have and we will continue to help out! Have a good weekend!

Hey! I’ve already gone through those instructions. They work all right for local services (the ones on the host), but they don’t cover the discovery feature, which finds containers running on the host. Discovery is documented here: Container auto-discovery for on-host integrations | New Relic Documentation, and my config is almost a copy-paste from that doc.

Hey there @Boguslaw.Kalka,

So this looks like a limitation in Flex itself, as your nri-discovery-kubernetes works fine when you run it manually. Flex is not reading the declared environment variable from anywhere, and is instead just trying to query the raw value itself as a literal string:

{"err":"parse \"http://${discovery.ip}:8098/stats\": invalid character \"{\" in host name","level":"debug","msg":"http: error","time":"2022-05-05T07:59:34Z"}

You may have luck lodging this with the Flex team here: Issues · newrelic/nri-flex · GitHub. I can also create a formal case with our support engineers on your behalf. Please let me know which direction you would like to take and I will be more than happy to help you further.

I hope you have a great day!

Thank you @michaelfrederick, I have created a support case and hope to get an answer there.

Hello @Boguslaw.Kalka ,

I have submitted a feature request to our Engineering team for review. As of now, the Flex integration is limited in that it cannot read the discovery environment variable within Kubernetes.

While we can’t guarantee when or if this feature will be implemented, we take customer requests very seriously and use them to prioritize which features we implement next.

You can track the feature request with our Engineering team via GitHub:
GitHub Feature Request

I hope this helps.

Hi @Boguslaw.Kalka, the k8s discovery capability is limited to integrations executed by the infrastructure agent (it won’t work when calling the nri-flex binary directly). Have you tried that? If so, and it still isn’t working, you’ll need to enable infra-agent logs and see what hints appear there.
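
To illustrate the point about who performs discovery: something has to replace ${discovery.ip} with a real address before Flex builds the request, and when the binary is called directly nothing does. The Python sketch below only illustrates that substitution step and is not the agent’s actual implementation. The function names `render` and `expand`, the sample IP addresses, and the flat name-to-value mapping are all assumptions made for the example.

```python
import re

# Matches placeholders such as ${discovery.ip}
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def render(template: str, variables: dict) -> str:
    """Replace each ${name} with variables[name]; unknown names are left untouched."""
    return PLACEHOLDER.sub(
        lambda m: str(variables.get(m.group(1), m.group(0))), template
    )


def expand(template: str, matches: list) -> list:
    """Return one rendered copy of the template per discovered container."""
    return [render(template, match) for match in matches]


if __name__ == "__main__":
    url = "http://${discovery.ip}:8098/stats"
    # Hypothetical discovery results: two containers that matched the image filter.
    matches = [{"discovery.ip": "10.0.1.15"}, {"discovery.ip": "10.0.2.27"}]
    for rendered in expand(url, matches):
        print(rendered)
    # With no discovery step at all, the placeholder passes through unchanged,
    # which is the literal string seen in the Flex debug log.
    print(render(url, {}))
```

The last call shows the standalone case: with an empty mapping the URL still contains the raw placeholder, so any HTTP client would be handed an invalid host name.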

Hi @jmore, yes, I tried that, and I was not able to get any information from the infra-agent logs that could point me in the right direction. Currently, without a way to debug this tool standalone, it’s of no use to us, as it takes too much work to get it working (ultimately I resorted to just “wiresharking” network packets to see what’s happening :) ).
We decided to go with the old approach and use the New Relic Java agent for the time being, just to have the app up and running (the downside is having to install a JRE inside the container only for this agent).

@cconde Thank you for creating the feature request. Unfortunately, it has already been closed without being resolved.

What could possibly be fixed is the documentation, specifically this document: nri-flex/discovery.md at master · newrelic/nri-flex · GitHub. It tells the reader "to use the improved, fully-supported container auto-discovery", which is not backwards compatible and doesn’t provide a way to debug NRI-Flex standalone.

@jmore Correct me if I’m wrong, but currently the only way to debug nri-flex is to create a config, run it as part of the infra-agent, and search the log files for hints. It might be a good idea to provide some mock configuration to run and debug nri-flex as a standalone app (e.g. by providing the discovery variables via ENV or a config file).
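
As a rough idea of what such a mock setup could look like from the outside, here is a small Python sketch that fills the discovery placeholders from mock values before the config is handed to the nri-flex binary. This is a workaround sketch, not a feature of Flex or the agent. The `MOCK_DISCOVERY_*` variable naming and the function names `unresolved` and `render_with_mocks` are invented for this example.

```python
import re

# Only ${discovery.*} placeholders are handled; anything else is left alone.
PLACEHOLDER = re.compile(r"\$\{(discovery\.[^}]+)\}")


def unresolved(text: str) -> list:
    """Return the sorted, de-duplicated discovery placeholders still present in text."""
    return sorted(set(PLACEHOLDER.findall(text)))


def render_with_mocks(config_text: str, env: dict) -> str:
    """Fill ${discovery.x} from env["MOCK_DISCOVERY_X"] (hypothetical naming scheme).

    Raises ValueError if any discovery placeholder has no mock value, so the
    problem surfaces before an HTTP request is attempted with a literal "${...}".
    """
    def lookup(match):
        key = "MOCK_" + match.group(1).upper().replace(".", "_")
        return env.get(key, match.group(0))

    rendered = PLACEHOLDER.sub(lookup, config_text)
    missing = unresolved(rendered)
    if missing:
        raise ValueError("no mock value for: " + ", ".join(missing))
    return rendered


if __name__ == "__main__":
    snippet = "url: http://${discovery.ip}:8098/stats"
    # In real use the mapping could be dict(os.environ); a literal dict keeps
    # the example deterministic.
    print(render_with_mocks(snippet, {"MOCK_DISCOVERY_IP": "127.0.0.1"}))
```

The rendered text could then be written to a temporary file and passed to the binary with the same --config_file flag used earlier in this thread, which would at least let the API part of the config be debugged without the agent.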