I have a Kubernetes Deployment with more than one replica of its pod. I would like to plot the combined CPU (and memory) usage of this Deployment.
For one replica/pod, the following NRQL query works fine:
FROM K8sContainerSample SELECT max(cpuUsedCores) AS 'Used', min(cpuRequestedCores) AS 'Requested', min(cpuLimitCores) AS 'Limit' WHERE clusterName='CLUSTER' AND namespaceName='NAMESPACE' AND deploymentName='DEPLOYMENT' AND containerName='CONTAINER' TIMESERIES
But since it picks only a single sample (max or min) out of all replicas of the pod, it does not give a correct picture once the Deployment is scaled.
To get a better picture of the load across all replicas, I tried to sum them up:
FROM K8sContainerSample SELECT sum(cpuUsedCores) AS 'Used', sum(cpuRequestedCores) AS 'Requested', sum(cpuLimitCores) AS 'Limit' WHERE clusterName='CLUSTER' AND namespaceName='NAMESPACE' AND deploymentName='DEPLOYMENT' AND containerName='CONTAINER' TIMESERIES
Sadly, this does not work at all: in my tests, the result is off by a factor of 4.
Looking at the data (JSON view) that the WHERE clause selects, it seems that the query sums over all data points sampled in the given timeframe. In other words, it adds up multiple samples from the same container.
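The following minimal Python sketch uses made-up numbers to show how a plain sum can be inflated in this way. It assumes that each pod reports four samples within one time bucket. The pod names and CPU values are hypothetical and are not real New Relic data.

```python
from collections import Counter

# Hypothetical samples that fall into ONE time bucket: (podName, cpuUsedCores).
# Assumption for illustration: each pod reports 4 samples per bucket.
samples = [
    ("pod-a", 0.5), ("pod-a", 0.5), ("pod-a", 0.5), ("pod-a", 0.5),
    ("pod-b", 0.25), ("pod-b", 0.25), ("pod-b", 0.25), ("pod-b", 0.25),
]

# A plain sum adds up every sample in the bucket, repeats included.
naive_sum = sum(cpu for _, cpu in samples)

# Count how many samples each pod contributed to the bucket.
samples_per_pod = Counter(pod for pod, _ in samples)

print(naive_sum)              # 3.0
print(dict(samples_per_pod))  # {'pod-a': 4, 'pod-b': 4}
```

In this example the pods actually use 0.5 + 0.25 = 0.75 cores in total. The plain sum reports 3.0, which is the true value multiplied by the number of samples per pod. This is one plausible way an error like the factor of 4 above could arise.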
Does NRQL have a group-by aggregation step? Something that allows me to express: “Take the samples that match this WHERE clause, then group them by podName and select max for each group, then sum up the result from each group”?
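To pin down the intended semantics, the two-stage aggregation described above can be sketched in plain Python. This is not NRQL and not a New Relic API. The sample data, the 60-second bucket width, and the function name sum_of_per_pod_max are all hypothetical.

```python
from collections import defaultdict

# Hypothetical samples: (timestamp in seconds, podName, cpuUsedCores).
samples = [
    (0, "pod-a", 0.5), (15, "pod-a", 0.75), (30, "pod-a", 0.5), (45, "pod-a", 0.625),
    (0, "pod-b", 0.25), (15, "pod-b", 0.375), (30, "pod-b", 0.25), (45, "pod-b", 0.25),
    (60, "pod-a", 0.875), (75, "pod-a", 0.75),
    (60, "pod-b", 0.125), (75, "pod-b", 0.125),
]

def sum_of_per_pod_max(samples, bucket_seconds):
    """Per time bucket: group samples by pod, take each pod's max, then sum the maxima."""
    # Stage 1: keep the max value for each (bucket, pod) pair.
    per_pod_max = {}
    for ts, pod, cpu in samples:
        key = (ts // bucket_seconds, pod)
        per_pod_max[key] = max(cpu, per_pod_max.get(key, cpu))
    # Stage 2: add up the per-pod maxima within each bucket.
    totals = defaultdict(float)
    for (bucket, _pod), cpu_max in per_pod_max.items():
        totals[bucket] += cpu_max
    return dict(sorted(totals.items()))

print(sum_of_per_pod_max(samples, bucket_seconds=60))  # {0: 1.125, 1: 1.0}
```

Bucketing by timestamp stands in for the per-window grouping that TIMESERIES performs. Each bucket then reports one value per pod, so scaling the Deployment up or down changes the total by the load of the added or removed pods, not by the number of samples.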