I’m trying to figure out how to get an average of a sum of facets. (This question is similar to other questions, e.g. Sum average duration of three lines, but the solution presented there won’t work particularly well in this case, since enumerating every value in a facet is a bit crazy.)
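To be concrete about why: as far as I can tell, the solution there boils down to enumerating every facet value by hand (with filter() or something equivalent), which for my data would look roughly like the sketch below. The pod names are made up for illustration; in reality there are many more of them and they change on every deploy.
SELECT filter(average(cpuUsedCores), where podName = 'mydeployment-pod-1') + filter(average(cpuUsedCores), where podName = 'mydeployment-pod-2') + filter(average(cpuUsedCores), where podName = 'mydeployment-pod-3') from K8sContainerSample where deploymentName = 'mydeployment' and clusterName = 'mycluster' timeseries auto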
Here’s an example query:
SELECT average(cpuUsedCores) from K8sContainerSample facet podName where deploymentName = 'mydeployment' and clusterName = 'mycluster' timeseries auto
This is a somewhat useful graph, but what I actually want out of it is the total CPU usage for this application, i.e. the sum across the facets. The naive way to write this query would be:
SELECT sum(cpuUsedCores) from K8sContainerSample where deploymentName = 'mydeployment' and clusterName = 'mycluster' timeseries auto
But this does not work because the sum counts duplicate data within the timeslice (i.e. if a given container reports N times during the timeslice, the reported sum will be ~N times larger than the correct value). One (somewhat) better method is this:
SELECT sum(cpuUsedCores) * uniqueCount(podName) / count(podName) from K8sContainerSample where deploymentName = 'mydeployment' and clusterName = 'mycluster' timeseries auto
This has the effect of rescaling the sum by the inverse of the sample rate: the value of count(podName) / uniqueCount(podName) is a rough approximation of the number of times each pod reported in during the timeslice (the value of N above), so dividing the sum by it corrects the overcount. This method sort of works, but only if the sample rate is consistent; during timeslices with rolling pod restarts, for example, this value becomes badly incorrect, and there is no way to tell from the graph that it’s wrong (a sanity-check query for this is sketched further down). This also appears to fool the sorting function when used with facets, as with this query:
SELECT sum(cpuUsedCores) * uniqueCount(podName) / count(podName) from K8sContainerSample facet deploymentName where clusterName = 'mycluster' timeseries auto LIMIT 3
The LIMIT 3 actually hides a bunch of facets that definitely should be sorted higher (facets whose min, median, mean, max, and sum are all above the shown values), I assume because the weird rescaling math is opaque to the sorter; I end up having to do something like LIMIT 1000 just to make sure the peak users end up in the graph (which of course makes this expensive query even more expensive).
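On the sanity-check point above: since the whole correction hinges on the sample rate being roughly constant, a separate chart of the implied per-pod sample rate at least shows when that assumption breaks. This is just a diagnostic sketch of my own, not something I’ve validated:
SELECT count(cpuUsedCores) / uniqueCount(podName) from K8sContainerSample where deploymentName = 'mydeployment' and clusterName = 'mycluster' timeseries auto
When that line is flat, the rescaled sum should be roughly right; when it jumps around (e.g. during a rolling restart), those timeslices can’t be trusted.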
Is there a way to do this type of query properly? I should note that, while I’m using K8sContainerSample above, I end up trying to do this class of query a lot across different datasets, so figuring out a side channel to get this specific result may not actually solve the problem. If it’s not possible to do this type of query using the currently available features, perhaps this can serve as a feature request: a new function averageOfSumUnique, which would reproduce what the queries above are attempting as averageOfSumUnique(cpuUsedCores, podName).
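For concreteness, the sum query above would then become something like this (hypothetical syntax, since the function doesn’t exist today):
SELECT averageOfSumUnique(cpuUsedCores, podName) from K8sContainerSample where deploymentName = 'mydeployment' and clusterName = 'mycluster' timeseries auto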
Thanks!
New Relic Edit
- I want this too
- I have more info to share (reply below)
- I have a solution for this
We take feature ideas seriously, and our product managers review every one when plotting their roadmaps. There is no guarantee this feature will be implemented, but this post ensures the idea is put on the table and discussed. So please vote and share your extra details with our team.