Threshold for PHP agent distributed tracing

Hi,

I am trying to reduce the daily amount of data ingested for tracing.
Under “Manage your data” => “Data ingestion” I can see that I am currently using almost 3 GB per day for “Tracing”.

When I go to “Services - APM” => “Transactions”, under “Transaction traces”, I see (as expected):

“No transaction traces above threshold in the last 3 hours.
If you were expecting traces, no transactions took longer than 2.0 seconds (4 * apdex_t) or there is an error in the agent configuration.”

This is expected: apdex_t is set to 0.5 s in “Services - APM” => “Application”.
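
A minimal sketch of the arithmetic behind that message, assuming (as the UI text states) that apdex_f is 4 × apdex_t and that a transaction must take longer than apdex_f to be a transaction-trace candidate. The function names are hypothetical, and the real tracer additionally keeps only the slowest candidates per harvest.

```python
# Sketch: the "apdex_f" transaction-trace threshold as described in the UI message.
# Assumption: apdex_f = 4 * apdex_t, and only transactions slower than apdex_f
# are candidates for a transaction trace. Function names are hypothetical.


def apdex_f(apdex_t_s: float) -> float:
    """Return the 'frustrating' threshold in seconds: four times apdex_t."""
    return 4 * apdex_t_s


def is_trace_candidate(duration_s: float, apdex_t_s: float) -> bool:
    """True if a transaction took longer than apdex_f."""
    return duration_s > apdex_f(apdex_t_s)


if __name__ == "__main__":
    apdex_t = 0.5  # seconds, as configured in "Services - APM" => "Application"
    print(apdex_f(apdex_t))  # 2.0
    for duration in (0.02, 0.1, 0.2, 2.5):
        print(duration, is_trace_candidate(duration, apdex_t))
```

With apdex_t at 0.5 s, the 20 ms, 100 ms and 200 ms transactions mentioned below are all far from the 2.0 s cut-off.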

Relevant settings in the newrelic.ini (confirmed in “Services - APM” => “Environment” => “Agent initialization”):

;newrelic.transaction_tracer.enabled = true
;newrelic.transaction_tracer.threshold = "apdex_f"
;newrelic.distributed_tracing_enabled = true

(All of these settings are commented out, but the values shown are the defaults.)
So the “Transaction traces” section under “Services - APM” => “Transactions” respects these settings and does not store or show anything below 4 × apdex_t (2 seconds in this case).

However, when I go to “Services - APM” => “Distributed tracing”, I see over 16,000 traces for that same 3-hour period, including traces for transactions with durations of 20 ms, 100 ms, 200 ms, etc. (none of them anywhere near apdex_f).
I suspect that this trace/span data is what accounts for almost 3 GB per day.

The New Relic documentation page “Distributed tracing for the PHP agent” says the following about newrelic.transaction_tracer.threshold under step 4: “If you want to make all transactions eligible for a distributed trace, set this value to 0 seconds.”
However, my threshold is set to the default value, apdex_f, and distributed tracing seems to ignore it: I see 16K distributed traces in the last 3 hours, none of them anywhere near apdex_f in duration.

Is this expected/intended behavior or a bug?
If it is intended, then perhaps step 4 of the “Distributed tracing for the PHP agent” documentation should be removed, or replaced with a note stating that distributed tracing ignores the newrelic.transaction_tracer.threshold setting, because the current text clearly implies that this setting also affects distributed tracing eligibility.

I could of course disable distributed tracing completely, but it would still be nice to be able to see traces for slow transactions (above the threshold).

I am using PHP Agent version: 9.18.1.303

Regards,

Jeroen

Quoting the original post: Under step 4, regarding the newrelic.transaction_tracer.threshold it reads: “If you want to make all transactions eligible for a distributed trace, set this value to 0 seconds.”

Hi, @newrelic-jeroen: That seems incorrect. Transaction traces capture only the slowest transactions (those that exceed the configured trace threshold) each minute.

If you are using head-based sampling (the default), distributed tracing captures a representative sample of system activity, regardless of response time, up to 1000 spans per agent per minute. I don’t think the transaction_tracer.threshold configuration setting has any effect on distributed tracing.
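
To make "regardless of response time, up to a fixed cap" concrete, here is a toy model: a per-minute reservoir (Algorithm R) that keeps at most 1000 spans chosen uniformly at random. This is an illustration under assumptions, not the PHP agent's actual sampler; the class name and span fields are hypothetical.

```python
import random
from typing import Dict, List, Optional

# Toy model (NOT the PHP agent's implementation): a per-minute span reservoir
# capped at a fixed size. Selection is uniform at random, so a 20 ms span is
# exactly as likely to be kept as a 2.5 s span; duration plays no role.


class MinuteSpanReservoir:
    def __init__(self, capacity: int = 1000, rng: Optional[random.Random] = None):
        self.capacity = capacity  # 1000 spans per agent per minute, per the reply above
        self.rng = rng if rng is not None else random.Random()
        self.seen = 0
        self.kept: List[Dict] = []

    def offer(self, span: Dict) -> None:
        """Reservoir sampling (Algorithm R): every offered span has an equal chance."""
        self.seen += 1
        if len(self.kept) < self.capacity:
            self.kept.append(span)
        else:
            j = self.rng.randrange(self.seen)  # uniform in [0, seen - 1]
            if j < self.capacity:
                self.kept[j] = span

    def harvest(self) -> List[Dict]:
        """Return the kept spans and reset for the next minute."""
        kept, self.kept, self.seen = self.kept, [], 0
        return kept


if __name__ == "__main__":
    reservoir = MinuteSpanReservoir(capacity=1000, rng=random.Random(42))
    # One minute of made-up traffic: 5000 spans, 10% of them slow.
    for i in range(5000):
        reservoir.offer({"id": i, "duration_ms": 2500 if i % 10 == 0 else 20})
    kept = reservoir.harvest()
    slow = sum(1 for s in kept if s["duration_ms"] >= 1000)
    print(len(kept), slow)  # 1000 spans kept; roughly 10% of them slow
```

Because the kept sample mirrors the traffic mix, fast transactions dominate it, which matches the 20 ms to 200 ms traces described in the original post.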

You may configure drop data rules to omit data you don’t want.
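
Drop rules can be created through NerdGraph. The sketch below only builds the request body, assuming the `nrqlDropRulesCreate` mutation takes an `accountId` and a list of rules with `action`, `nrql` and `description` fields, as in New Relic's "drop data using NerdGraph" docs; verify the schema before relying on it. The helper name, account ID and example NRQL are hypothetical.

```python
import json

# Hypothetical helper that builds the JSON body of a NerdGraph request creating
# an NRQL drop rule. Assumption: nrqlDropRulesCreate accepts accountId and a list
# of rules with action, nrql and description fields; check the current schema.
# Sending the request (HTTP POST with a User API key) is deliberately left out.


def build_drop_rule_request(account_id: int, nrql: str, description: str) -> dict:
    # json.dumps yields a quoted, escaped string that is also a valid GraphQL
    # string literal for typical NRQL text.
    rule = (
        f"{{action: DROP_DATA, nrql: {json.dumps(nrql)}, "
        f"description: {json.dumps(description)}}}"
    )
    query = (
        "mutation {\n"
        f"  nrqlDropRulesCreate(accountId: {int(account_id)}, rules: [{rule}]) {{\n"
        "    successes { id }\n"
        "    failures { submitted { nrql } error { reason description } }\n"
        "  }\n"
        "}"
    )
    return {"query": query}


if __name__ == "__main__":
    # Illustrative rule only: drops very short spans (Span duration is in seconds).
    body = build_drop_rule_request(
        1234567, "SELECT * FROM Span WHERE duration < 0.01", "Drop very short spans"
    )
    print(json.dumps(body, indent=2))
```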

Hi, @philweber - thanks for your reply!

Do you perhaps have a suggestion on how to create a drop data rule that achieves what I want?
The problem is that a span's duration covers only that function call, query, or segment, so I cannot filter on it. What I want is to keep all spans/traces belonging to transactions that took, for example, 1 second or more.

I could create a DROP_DATA rule with an NRQL query along the lines of:

SELECT * FROM Span WHERE transactionId IN (SELECT guid FROM Transaction WHERE totalTime<1)

That would drop all spans belonging to transactions with a totalTime of less than one second.
However, subqueries do not seem to be supported, so that query does not work.
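
For clarity, here is what the intended rule would select, written as a client-side join over made-up in-memory records. The field names (`guid`, `totalTime`, `transactionId`) follow the NRQL above; the function name and data are hypothetical. The sketch makes visible that the decision for each span depends on a different event (its Transaction), which is exactly what the single query was trying to express.

```python
from typing import Dict, List

# Client-side sketch of what the intended subquery rule would select.
# Field names follow the NRQL in the post; the records below are made up.


def spans_of_fast_transactions(
    spans: List[Dict], transactions: List[Dict], min_total_time_s: float = 1.0
) -> List[Dict]:
    # Collect the guids of transactions faster than the threshold ...
    fast_guids = {t["guid"] for t in transactions if t["totalTime"] < min_total_time_s}
    # ... then select every span whose transactionId points at one of them.
    return [s for s in spans if s.get("transactionId") in fast_guids]


if __name__ == "__main__":
    transactions = [{"guid": "tx-a", "totalTime": 0.2}, {"guid": "tx-b", "totalTime": 1.4}]
    spans = [
        {"id": "s1", "transactionId": "tx-a"},
        {"id": "s2", "transactionId": "tx-a"},
        {"id": "s3", "transactionId": "tx-b"},
    ]
    print([s["id"] for s in spans_of_fast_transactions(spans, transactions)])  # ['s1', 's2']
```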

In my case the application runs on 2 hosts behind a load balancer. I could probably cut the tracing-related ingest in half by creating a DROP_DATA rule that drops the data from ‘host 2’ (or by simply disabling distributed tracing in the agent config on host 2), but that would still leave me with all the traces from host 1 for the fast transactions I am not interested in.

How about:

SELECT * FROM Span WHERE parent.id IS NULL AND duration < 1

I am not sure what happens if you drop the root span; I would hope it would drop the entire transaction. Try it?
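
The open question can be reasoned about with a toy trace. The sketch below models the proposed rule as a per-span filter, under the assumption that a drop rule removes only the matching span events and not their descendants, and then lists the surviving spans whose parent is gone. The span fields (`id`, `parent_id`, `duration`) and function names are hypothetical.

```python
from typing import Dict, List

# Toy model of "parent.id IS NULL AND duration < 1" applied span by span.
# Assumption: only matching spans are dropped; descendants are not removed.


def apply_root_rule(spans: List[Dict], max_duration_s: float = 1.0) -> List[Dict]:
    """Drop root spans (no parent) that are faster than max_duration_s."""
    return [
        s for s in spans
        if not (s["parent_id"] is None and s["duration"] < max_duration_s)
    ]


def orphaned(spans: List[Dict]) -> List[Dict]:
    """Spans that reference a parent which is no longer present."""
    ids = {s["id"] for s in spans}
    return [s for s in spans if s["parent_id"] is not None and s["parent_id"] not in ids]


if __name__ == "__main__":
    trace = [
        {"id": "r", "parent_id": None, "duration": 0.2},
        {"id": "c1", "parent_id": "r", "duration": 0.05},
        {"id": "c2", "parent_id": "c1", "duration": 0.01},
    ]
    kept = apply_root_rule(trace)
    print([s["id"] for s in kept])            # ['c1', 'c2']
    print([s["id"] for s in orphaned(kept)])  # ['c1']
```

Under that assumption, only one span in three is dropped and the rest of the trace survives without its root, so both the ingest savings and the UI behaviour would need to be checked in practice.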

Another option is to simply disable distributed tracing altogether.

Interesting approach! For now, I decided to drop all spans from one of the two servers, which should roughly halve my trace-related ingest.

I might still try your suggestion, but I have not done so yet, mainly for two reasons:

  • Maybe someone from support or development can confirm that this works. If dropping the root span does not also drop all of its children, how will the UI handle spans whose parent no longer exists?

  • Being able to see the ‘anomalous spans’ at a glance can be very handy. I realized that if I drop all the spans for ‘non-slow’ transactions, the slow spans will probably no longer stand out as anomalies, because there are no fast ones left to compare them to.
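
The second concern can be illustrated with a generic outlier test: flag durations more than 3 population standard deviations above the mean. This is a stand-in chosen for illustration, not how New Relic computes anomalous spans; the durations are made up and the function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import List

# Generic stand-in for anomaly detection (NOT New Relic's method): flag
# durations more than k population standard deviations above the mean.


def anomalies(durations_s: List[float], k: float = 3.0) -> List[float]:
    if len(durations_s) < 2:
        return []
    threshold = mean(durations_s) + k * pstdev(durations_s)
    return [d for d in durations_s if d > threshold]


if __name__ == "__main__":
    slow = [2.0, 2.2, 2.5]
    with_baseline = [0.1] * 97 + slow                    # fast spans still ingested
    slow_only = [d for d in with_baseline if d >= 1.0]   # fast spans dropped
    print(anomalies(with_baseline))  # [2.0, 2.2, 2.5]
    print(anomalies(slow_only))      # []
```

With the fast baseline present, the cut-off is about 1.26 s and all three slow spans stand out; once only slow spans remain, the cut-off rises to about 2.85 s and none do.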


New Relic is closed in the U.S. for the Thanksgiving holiday. I will follow up next week and see if I can find someone who knows a way to drop spans based on the duration of the transaction.