Monitoring resource pools


I’m trying to get some visibility into my resource pools (redis, mysql and worker threads).
Each pool has several metrics, for example current size and number of available connections, for the redis connection pool.
I basically want to know if my pools are correctly sized for the current load and if we’re approaching a limit.

On top of that, my application is server by puma, using its cluster mode, which means on each host I have multiple instances of the pools.

I tried following the examples in this post by having a background thread poll the pools for their metrics every second and sending them as event attributes to NR using ::NewRelic::Agent.record_custom_event.
I added an attribute identifying each puma worker (worker_id) to each event, otherwise we would have a “last one wins” effect.

I’m now trying to aggregate (sum) those data across hosts and puma workers, to get a single graph with each metric for a pool, because I don’t need the details per host or worker, and I don’t want to have multiple graphs for each pool type.

I understand from this question that I cannot do something like SELECT latest(sum(size)) ... FACET host, worker_id, so I’m wondering if there is something else I can do.
Maybe events are not the correct tool?



Hey there - You’re right, you can’t currently do SELECT latest(sum()) - though the latest operation will return just the most recent event.
Instead you could try
SELECT sum(attribute) WHERE attribute = "X" FACET host, worker_id SINCE 60 minutes ago LIMIT 1

This will return just 1 result, which is typically the most recent event recorded.

A colleague here also suggested that you might look at the Infrastructure SDK. maybe you can capture all metrics that are important from a single integration and send those events up in logical way to make it easier to write NRQL queries on them.

Is that something you could look into?

Thanks for your answer Ryan.

The issue with using only sum without latest is that this attribute is a “gauge”, so I’m interested in the value it has at each instant (e.g. there were 10 busy connections out of 100 and 2 dead ones at 10:00:00), not the sum of the values during a period of time.
But I do want to sum over all “workers” because I’m not interested in the latest value for any particular worker, and I want to display multiple attributes in the same graph.

Maybe what I want to do is closer to SELECT sum(latest(attribute)) [..], which is not doable either.

Unfortunately the custom infrastructure integration is only available with infrastructure Pro, which we don’t have and which cost is not warranted by this feature only.

While writing this answer, I realized the cardinality of worker_id is low, so I can actually run a query like SELECT histogram(busy, width: 20, buckets: 20) FROM DbPool SINCE 1 hour ago FACET host, worker_id to get an idea of how busy my db connection pools are for each worker of each host.
That does not give me the current value or it’s variation over time, but it’s good enough for my need, and I can still get the current value for each worker when I need it.

Thanks again and happy new year.

Thanks @cyril4 - Happy New Year to you too :partying_face:

Thanks for sharing that workaround with Histograms.

I’m curious though, in a query like SELECT sum(latest(attribute)) - you’ll be summing one value.
latest() will always return 1 single value, adding sum(latest... will not then make latest return the latest values rolling over a time period. It will still return the very latest value, and since there is just one value to return with latest,
essentially sum(latest())... is the same as latest()...

If you need the sum over a time period, then sum(attribute) will be the best option. adding latest() will not help.



I just started using Insight so I’m probably wrong but from what I saw, latest(attribute) [...] TIMESERIES, displays a curve, not a constant line, so I assumed it yielded the latest value at each instant in time (a rolling latest value), not the current value for all points.

The sum I want to do is not over a time period, but over all facet values for the same point in time.
Let’s say I have the following 3 events (I’m ignoring worker_id here, only considering the host facet since the issue is the same):

t0, host: 0, available: 10
t0, host: 1, available: 20

t1, host: 0, available: 10
t1, host: 1, available: 15

t2, host: 0, available: 5
t2, host: 1, available: 15

I want a timeserie which shows, for each point in time, the sum of available over hosts:

t0: 30
t1: 25
t2: 20

Obviously, there is aliasing coming into play too, but that problem can be solved using the correct aggregation method when packing a time period into a single value.
At least that’s doable with graphite, which we are still using for this task.

Hope my need is clearer with this example.

Hey @cyril4 - do you mind sharing details on the exact query you currently have, along with a link to your account? I’d like to see if there’s NRQL I can work out that may help meet those needs. Note the link to your account is only accessible by you, and New Relic admin staff.

Hi @RyanVeitch.
We don’t yet send those events continuously to NR (I was sending some burst from my local machine).
I’ll give you the details once we’re sending the events from prod machines (sometimes next week).
Thanks for following up.

Sounds good @cyril4 - looking forward to hearing back from you.

Hi @RyanVeitch,

We now have events coming in. Account link is and an example of working query is

SELECT histogram(db_connections, width: 60, buckets: 60) FROM HachiPools WHERE appName = 'Hachi (PROD)' SINCE 1 hour ago FACET host, worker_id

I created a couple of dashboards for 2 of our services::

An example of what I’d like to be able to do:

SELECT sumOver(latest(db_connections), host, worker_id) FROM HachiPools WHERE appName = 'Hachi (PROD)' SINCE 1 hour ago TIMESERIES

Where sumOver sums the attribute (or expression) over the values coming from other attributes, as I explained before.

Thank you,

Hey @cyril4 I’m afraid it looks like the query you are trying to achieve, and your goals don’t seem to be possible right now.

When it comes to sumOver, this is not an available function at the moment. You may be able to plot the rate over time with the rate() function, but I don’t think you’ll meet your exact needs.

I’ll get a feature request submitted internally for you - for more enhanced query capabilities.

1 Like

That’s what I thought.
Thanks for looking into it anyway, Ryan.


No worries - be sure to let us know if there’s anything else we can help with.