I am trying to get the total allocated CPU cores in a cluster. Ideally, this would be the sum of all container CPU requests across the cluster. At node level, this value can be retrieved by running the
kubectl describe node <node-name> command. We used to sum these per-node values to get the total allocated CPU for the cluster.
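To make the old approach concrete, here is a rough sketch of the summing step, assuming the usual "Allocated resources" table layout printed by kubectl describe node; the helper names and the sample output format are illustrative, not part of any tool:

```python
# Sketch: sum the per-node "cpu" Requests figure shown by
# `kubectl describe node <node>` across a cluster.
# Assumes the standard "Allocated resources" table layout.
import re
import subprocess

def cpu_to_cores(value: str) -> float:
    """Convert a Kubernetes CPU quantity ('3250m' or '6') to cores."""
    return float(value[:-1]) / 1000 if value.endswith("m") else float(value)

def requested_cores_from_describe(output: str) -> float:
    """Pull the cpu Requests value out of `kubectl describe node` text."""
    # Matches a table row like: "  cpu   3250m (20%)   6 (37%)"
    m = re.search(r"^\s*cpu\s+(\S+)\s+\(", output, re.MULTILINE)
    return cpu_to_cores(m.group(1)) if m else 0.0

def cluster_requested_cores(node_names) -> float:
    """Sum the per-node cpu Requests over every node in the cluster."""
    total = 0.0
    for node in node_names:
        out = subprocess.run(
            ["kubectl", "describe", "node", node],
            capture_output=True, text=True, check=True,
        ).stdout
        total += requested_cores_from_describe(out)
    return total
```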
Now we are switching over to NR and trying to create dashboards with these values. We are using the NR infra agent to collect metrics from our clusters and building dashboards on top of them. However, we are unable to find a metric that would let us compute the total allocated CPU cores.
We tried the following query…
SELECT latest(cpuRequestedCores)/1000 FROM K8sNodeSample FACET clusterName, nodeName LIMIT MAX
But this returned a value higher than the node's coreCount (e.g., 27 cores were returned for a node that has only 16).
We later realized that this query was also counting the container requests of 'Terminated' pods, and we couldn't find a way to filter those out (we are only interested in 'Running' pods).
We then tried a different query using K8sContainerSample…
SELECT sum(rcores) FROM (SELECT latest(cpuRequestedCores) AS rcores FROM K8sContainerSample WHERE status != 'Terminated' FACET clusterName, nodeName, namespace, podName, containerName LIMIT MAX) FACET clusterName LIMIT MAX
This query does work, but only for smaller clusters, because NR has a maximum limit of 2000 rows per query. One of our clusters has more than 2000 running containers, so the inner query truncates its results to 2000 rows. This yields an incorrect (lower) value of allocated CPU cores for large clusters.
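The only workaround we could sketch for the row limit is to run the inner per-container sum once per node (each node holds far fewer than 2000 containers) and add the per-node sums up client-side, e.g. through the NerdGraph NRQL API. In this sketch the endpoint, the result key name ('sum.rcores'), and the node list are assumptions to verify against your account; the NRQL itself mirrors the K8sContainerSample query above:

```python
# Hedged workaround sketch for the 2000-row limit: query one node at a
# time so each query's facet cardinality stays well under the limit,
# then sum the results client-side via the NerdGraph NRQL API.
import json
import urllib.request

NERDGRAPH_URL = "https://api.newrelic.com/graphql"  # assumed US endpoint

# Same inner query as above, scoped to a single cluster and node.
PER_NODE_NRQL = (
    "SELECT sum(rcores) FROM ("
    "SELECT latest(cpuRequestedCores) AS rcores FROM K8sContainerSample "
    "WHERE status != 'Terminated' "
    "AND clusterName = '{cluster}' AND nodeName = '{node}' "
    "FACET namespace, podName, containerName LIMIT MAX)"
)

def run_nrql(api_key: str, account_id: int, nrql: str) -> list:
    """POST one NRQL query to NerdGraph and return its results list."""
    gql = (
        "query($acct: Int!, $nrql: Nrql!) { actor { account(id: $acct) "
        "{ nrql(query: $nrql) { results } } } }"
    )
    body = json.dumps(
        {"query": gql, "variables": {"acct": account_id, "nrql": nrql}}
    ).encode()
    req = urllib.request.Request(
        NERDGRAPH_URL,
        data=body,
        headers={"API-Key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]["actor"]["account"]["nrql"]["results"]

def total_allocated_cores(per_node_results) -> float:
    """Sum the per-node query results; 'sum.rcores' is the assumed key."""
    return sum(
        rows[0].get("sum.rcores", 0.0) if rows else 0.0
        for rows in per_node_results
    )

def cluster_requested_cores(api_key, account_id, cluster, nodes) -> float:
    """One query per node, summed client-side to dodge the row limit."""
    per_node = [
        run_nrql(api_key, account_id,
                 PER_NODE_NRQL.format(cluster=cluster, node=node))
        for node in nodes
    ]
    return total_allocated_cores(per_node)
```

This avoids truncation at the cost of one API call per node, and it can't be expressed inside a single dashboard widget, which is why we'd still prefer a metric-level fix.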
We need your help on this. It would be ideal if cpuRequestedCores in K8sNodeSample could return the allocated CPU resources based only on 'Running' pods.