What is the correct method to get an average(sum(facets))?

I’m trying to figure out how to get an average of sum of facets. (This question is similar to other questions, e.g. Sum average duration of three lines but the solution presented there won’t work particularly well in this case since enumerating every value in a facet is a bit crazy).

Here’s an example query:

SELECT average(cpuUsedCores) from K8sContainerSample facet podName where deploymentName = 'mydeployment' and clusterName = 'mycluster' timeseries auto

This is a somewhat useful graph, but what I actually want out of this graph is the total cpu usage for this application, i.e. the sum of facets. The naive way to write this query would be:

SELECT sum(cpuUsedCores) from K8sContainerSample where deploymentName = 'mydeployment' and clusterName = 'mycluster' timeseries auto

But this does not work because the sum counts duplicate data within the timeslice (i.e. if a given container reports N times during the timeslice, the reported sum will be ~N times larger than the correct value). One (somewhat) better method is this:

SELECT sum(cpuUsedCores) * uniqueCount(podName) / count(podName) from K8sContainerSample where deploymentName = 'mydeployment' and clusterName = 'mycluster' timeseries auto

This has the effect of rescaling the sum by the sample rate (the value of count(podName) / uniqueCount(podName) is a rough approximate of the number of times each pod reported in during the timeslice (the value of N above), and so rescaling by the inverse corrects the sum error). This method sort of works, but only if the sample rate is consistent; during timeslices with rolling pod restarts, for example, this value becomes badly incorrect and there is no way to tell from the graph that it’s wrong. This also appears fool the sorting function when used with facets, like with this query:

SELECT sum(cpuUsedCores) * uniqueCount(podName) / count(podName) from K8sContainerSample facet deploymentName where clusterName = 'mycluster' timeseries auto LIMIT 3

the LIMIT 3 actually hides an bunch of facets who definitely should be sorted higher (facets whose min, median, mean, max, and sum are all above the shown values), I assume because the weird rescaling math is opaque to the sorter; i end up having to do something like LIMIT 1000 just to make sure the peak users end up in the graph (which of course makes this expensive query even more expensive).

Is there a way to do this type of query properly? I should note that, while I’m using the K8sContainerSample above, I end up trying to do this class of query a lot across different datasets, so figuring out a side-channel to get this specific query may not actually solve the problem. If it’s not possible to do this type of query using the currently-available features, perhaps a feature request: a new function averageOfSumUnique which would reproduce the above queries with averageOfSumUnique(cpuUsedCores, podName)

Thanks!


New Relic Edit

  • I want this too
  • I have more info to share (reply below)
  • I have a solution for this

0 voters

We take feature ideas seriously and our product managers review every one when plotting their roadmaps. However, there is no guarantee this feature will be implemented. This post ensures the idea is put on the table and discussed though. So please vote and share your extra details with our team.

2 Likes

Hi, @novas0x2a: As you have discovered, NRQL does not currently support nested functions (average(sum(...))) or subqueries (SELECT sum(...) FROM (SELECT average(...) ... FACET podName)), either of which I think would solve your problem. Both of these features have been requested in the past.

Until our product teams decide to implement these features, there are a couple of workarounds:

  • Use the Insights query API to execute the inner query, and send the resulting JSON to another tool (such as a spreadsheet) for further analysis; or,

  • Use a scheduled task or Lambda function to periodically execute the FACET query and send the result back to Insights as a custom event. You may then calculate the average(), sum(), etc. of the custom event’s attributes.

1 Like

that is 2 years ago, new relic you guys are really slow for a commercial PAAS

6 Likes

Hey @vincent.liu

I agree, it’s been quite a while. It looks like this feature idea wasn’t prioritised over the past while.

I can absolutely get this bumped up to the team in a new feature request - so that the right people will see it. However as I’m sure you can imagine, we get many many feature requests, it’s impossible to prioritise every one of them, all we can do is ensure it’s logged and seen for you. And I will do that

I will say that, while this isn’t one of them, there are many features that have been released in the past while, most of which stemmed from customer driven feature requests like this one. And some of those highlighted here in our current March Mayhem voting rounds: https://discuss.newrelic.com/tags/marchmayhem2020

1 Like