[Ruby] Why have Elasticsearch requests disappeared from "External Services"?

In our Rails app, requests to our Elasticsearch cluster have historically accounted for a high percentage of our outgoing HTTP traffic. Until recently, those requests appeared near the top of the list at “External Services.”

Now they don’t appear there at all. I can still find my way to that “external service” item by going to “Transactions”, selecting a slow transaction, seeing the request to Elasticsearch at the top of it (it usually accounts for a high percentage of those transactions), and clicking that link. At that point I’m sent to “External Services”, looking at graphs of response time and throughput for requests to the Elasticsearch cluster. The throughput (CPM) shown at the top of those charts should easily place it at #1 on the list, but it doesn’t appear in the list at all.

Hi, @brian38

Welcome to Explorer’s Hub. I am not aware of any recent changes to how we observe Elasticsearch or External segments.

  1. Did the behavior change occur as part of upgrading the Ruby agent? If so, what version did you upgrade from/to?

  2. Were any settings recently changed such as turning on/off Distributed Tracing or setting transaction_tracer.attributes.enabled or attributes.enabled (and other similar settings) to false?

  1. Did the behavior change occur as part of upgrading the Ruby agent? If so, what version did you upgrade from/to?

This is possible. In November, we upgraded the newrelic_rpm gem from 3.18.1.330 to 6.14.0, which I realize is an enormous gap. Unfortunately, that’s long enough ago that I can’t compare data in New Relic before and after.

  2. Were any settings recently changed such as turning on/off Distributed Tracing or setting transaction_tracer.attributes.enabled or attributes.enabled (and other similar settings) to false?

No, we haven’t changed any config options in a very long time.

I suppose we could go back to 3.18.1.330, deploy to prod, see if elasticsearch requests begin to appear again, and start bisecting the gap to see if some version caused the change. But that could be time-consuming. Any other ideas?
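For what it’s worth, the bisection itself would be cheap to script; the deploys are the slow part. A minimal sketch, assuming the problem is monotonic (once a version hides the requests, every later version does too). The helper name and the version labels in the usage note are hypothetical placeholders, not real agent releases:

```ruby
# Hypothetical helper: returns the first version for which the block reports
# the problem, or nil if no version does. Relies on Array#bsearch in
# find-minimum mode, so the block must be false for every version before the
# first bad one and true from there on.
def first_bad_version(versions)
  versions.bsearch { |version| yield(version) }
end
```

In practice the block would be “deploy this version and check whether Elasticsearch shows up in External Services”, so something like `first_bad_version(%w[v1 v2 v3 v4 v5]) { |v| requests_missing?(v) }` needs roughly log2(n) deploys rather than n (`requests_missing?` is a stand-in for that manual check).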

@brian38

That’s definitely quite a big upgrade!

Could you share a link to your application in New Relic One? Only those on your account and New Relic Admins will be able to view your application. To create a permalink within New Relic, you can use the ‘Copy a short permalink’ button in the top right corner under your user profile or share the full URL from any page.

I would be happy to take a look at what you’re experiencing and see what might be going on with Elasticsearch.

Michael

Here you go: https://one.newrelic.com/launcher/nr1-core.explorer?pane=eyJuZXJkbGV0SWQiOiJhcG0tbmVyZGxldHMub3ZlcnZpZXciLCJlbnRpdHlJZCI6Ik5UazNOVFI4UVZCTmZFRlFVRXhKUTBGVVNVOU9mREkxTlRnMU5qZyJ9&sidebars[0]=eyJuZXJkbGV0SWQiOiJucjEtY29yZS5hY3Rpb25zIiwiZW50aXR5SWQiOiJOVGszTlRSOFFWQk5mRUZRVUV4SlEwRlVTVTlPZkRJMU5UZzFOamciLCJzZWxlY3RlZE5lcmRsZXQiOnsibmVyZGxldElkIjoiYXBtLW5lcmRsZXRzLm92ZXJ2aWV3In19&platform[accountId]=59754&platform[timeRange][duration]=1800000&platform[$isFallbackTimeRange]=true

Hello @brian38!
In taking a look at this, I noticed that Elasticsearch is not auto-instrumented by the agent. The standard recommendation to work around this is to add custom instrumentation using our API.
I suspect the issue here may be that such a large version change in the agent caused an incompatibility between your custom instrumentation and the agent’s current API, leading to these calls not making it through to New Relic.
I recommend finding where you are instrumenting your calls to Elasticsearch, and updating the API calls there. Here is a link to our docs on instrumenting external requests that should help explain our current API.
Hopefully that helps!
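
As a rough illustration of what such custom instrumentation can look like, here is a minimal sketch. It assumes `NewRelic::Agent::Tracer.start_external_request_segment` is available in the installed agent version; the `trace_external` helper, the library label, and the URL are hypothetical and not taken from this application:

```ruby
require 'uri'

# Hypothetical helper: wraps a block in an external request segment when the
# New Relic agent's Tracer API is loaded, and simply yields when it is not.
def trace_external(library:, url:, verb:)
  tracer = defined?(::NewRelic::Agent::Tracer) ? ::NewRelic::Agent::Tracer : nil
  segment = tracer&.start_external_request_segment(
    library: library,   # label for the HTTP client in use (assumed)
    uri: URI(url),      # target of the outgoing request
    procedure: verb     # HTTP method, e.g. 'GET'
  )
  yield
ensure
  # Always close the segment, even if the wrapped call raises.
  segment&.finish
end
```

A call site would then look something like `trace_external(library: 'Typhoeus', url: 'http://localhost:9200/_search', verb: 'GET') { client.search(...) }`. Older agent versions exposed this functionality under different names, so check the docs for the version you have installed.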

@tmcclure I understand that there’s no instrumentation that specifically targets elasticsearch. But why would requests to elasticsearch be special? We’ve had no problem with http requests in general being instrumented by the newrelic agent, and as far as I can tell, we’ve not set any custom instrumentation for requests; we get them from the newrelic agent as described here: https://docs.newrelic.com/docs/agents/ruby-agent/features/http-client-tracing-ruby

That brings up a really good point, @brian38. There’s a possibility whatever’s managing the Net/HTTP calls may have an issue and I’d like to dig into that.

Which HTTP library or gem do you have configured for Elasticsearch external calls? Whether it’s just the standard Net::HTTP library from Ruby or one of the ones listed, I’d like to specifically target some tests against that library.

Michael

@mwlang appreciate you helping to dig in on this. The short answer is: Typhoeus 1.4.0.

The long answer is: Most of our app makes requests using Typhoeus directly. In the case of requests to elasticsearch, we use the chewy gem, which uses the elasticsearch gem, which uses the elasticsearch-transport gem, which uses the faraday gem. Faraday is essentially just an interface around whatever actual http library it detects is present (or is configured to use), and since we haven’t configured it otherwise, it ends up using Typhoeus. (I’ve double-checked some backtraces during those requests to confirm that’s the case.)
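To illustrate what I mean by “detects is present”, here is a simplified sketch of constant-based adapter detection. It is not the actual code from elasticsearch-transport or Faraday; the candidate list, its order, and the `:net_http` fallback are assumptions for illustration only:

```ruby
# Simplified sketch: pick an HTTP adapter based on which client library's
# top-level constant has been loaded. The list and order are assumed.
CANDIDATE_ADAPTERS = {
  'Patron'     => :patron,
  'Typhoeus'   => :typhoeus,
  'HTTPClient' => :httpclient
}.freeze

def detect_adapter(fallback: :net_http)
  CANDIDATE_ADAPTERS.each do |const_name, adapter|
    # The first candidate whose constant is defined wins.
    return adapter if Object.const_defined?(const_name)
  end
  fallback
end
```

Under this kind of scheme, simply having Typhoeus in the bundle is enough for it to be picked up, which matches what we see in our backtraces.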

We’re on Typhoeus 1.4.0, but as recently as 2 weeks ago we were using Typhoeus 0.8.0. However, I had noticed that these Elasticsearch requests were absent in New Relic even before that, so I think we can rule out the Typhoeus upgrade as a cause.

@brian38 I opened this as an issue on GitHub so we can track our effort researching it. The long and short of it is that we need to set up a reproduction and test it further ourselves to get to the bottom of this.

Stay tuned!

Great, thanks again. If it’s helpful, we’re on:

elasticsearch (7.3.0)
elasticsearch-transport (7.3.0)
faraday (0.17.3)

Let me know if any other information would be helpful in reproducing.

@brian38,

This is great. I am going to be looking into this tomorrow. Stay tuned.