We run a backend service that exposes a SOAP API. While handling a request, the service calls both a DB and an external endpoint.
During a recent incident, we saw a drop in throughput on the APM Overview and Transaction pages. As a consequence of this, there was also a drop in the throughput on the APM External Services page.
Now to our problem. The sending party didn’t seem to send fewer requests, so why did we see a drop in throughput? Was it because we were actually receiving less traffic, or because we were slower to process the traffic we did receive?
According to the New Relic glossary, throughput is measured in requests per minute. We usually speak of requests as incoming traffic and responses as outgoing traffic. Throughput, as we and Merriam-Webster understand it, should be the number of completed request-response pairs.
So how does New Relic measure throughput? Is it counting incoming traffic, i.e. the requests? Or is it counting completed requests, i.e. requests that also have a response?
The two scenarios we have, then, are:
- NR counts only incoming traffic: since the sending and receiving parties show different graphs, not all requests made it to our service, and we have a networking issue.
- NR counts request-response pairs: decreased throughput means our service is processing the requests at an increasingly slow rate.
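
To make the difference between the two scenarios concrete, here is a minimal sketch of a toy model. It is not how New Relic computes throughput (that is exactly what we are asking). It assumes a hypothetical single-worker FIFO service, a steady one request per second from the sender, and a processing time that jumps from 0.5 s to 2 s after the first minute. All names and numbers are made up for illustration.

```python
from collections import Counter


def simulate(arrival_times, service_time_for):
    """Single-worker FIFO queue: return the completion time of each request."""
    completions = []
    worker_free_at = 0.0
    for arrived_at in arrival_times:
        # A request starts when it has arrived AND the worker is free.
        started_at = max(arrived_at, worker_free_at)
        worker_free_at = started_at + service_time_for(arrived_at)
        completions.append(worker_free_at)
    return completions


def per_minute(timestamps):
    """Bucket timestamps (in seconds) into counts per minute."""
    return Counter(int(ts // 60) for ts in timestamps)


def service_time_for(arrived_at):
    # Assumed slowdown: 0.5 s per request in minute 0, 2 s afterwards.
    return 0.5 if arrived_at < 60 else 2.0


# Assumed load: the sender steadily sends one request per second for 3 minutes.
arrivals = [float(second) for second in range(180)]
completions = simulate(arrivals, service_time_for)

arrived = per_minute(arrivals)       # "counting incoming traffic"
completed = per_minute(completions)  # "counting request-response pairs"

for minute in range(3):
    print(f"minute {minute}: arrived={arrived[minute]} completed={completed[minute]}")
```

In this model the arrival count stays at 60 per minute throughout, while the completion count drops to roughly half once the service slows down. So if NR counts completed pairs, a throughput drop is consistent with the sender's unchanged graph; if it counts arrivals, the drop would have to come from requests never reaching us.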