Problems getting K8s container restart count from K8sContainerSample

I’m hoping to get a count of restarts using this query:

SELECT sum(restartCount) FROM K8sContainerSample WHERE containerName = 'my-awesome-container'

Unfortunately, it always seems to return 0, even though I know there have been numerous restarts.

The only reliable way of getting the restart count (that I’ve found) seems to be:

SELECT uniqueCount(podName) FROM K8sContainerSample WHERE containerName = 'my-unreliable-but-still-awesome-container'

@jon.yeargers1 Sorry you have been waiting a while for a response from our community. I’m going to bring this back to the attention of our support team. Thanks for your patience!

Neal Mc

Hi @jon.yeargers1, thanks for your question!

The restartCount metric is an increasing counter that represents the total number of restarts since the container was created, so its latest value is the current restart count for a given container. The latest() function returns the most recent value of an attribute (see the NRQL docs on the latest() function). Changing sum() to latest() in your original query should give the result you expect:

SELECT latest(restartCount) FROM K8sContainerSample WHERE containerName = 'your-awesome-container'
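To make the difference between the two aggregations concrete, here is a minimal Python sketch (not NRQL) of a cumulative counter that is sampled repeatedly. The sample values and the helper name `latest` are made up for illustration. It only shows why summing a running total does not give a restart count; it does not explain why the original query returned 0.

```python
# Hypothetical samples of one container's restartCount, oldest first.
# restartCount is cumulative, so every sample repeats the running total.
samples = [0, 0, 1, 1, 2, 3, 3]


def latest(values):
    """Mimic the idea behind latest(): return the most recent sample."""
    return values[-1]


summed = sum(samples)      # adds the running total once per sample
current = latest(samples)  # the counter's current value

print(summed, current)
```

With these made-up samples the container has restarted 3 times, which is what the most recent sample reports, while the sum counts each earlier total again for every sample that carried it.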

Thank you @kmcginley!
Would you be able to explain the latest() function further, please? I read the description in the docs, but I’m still unclear on exactly what it does. Let’s say we have a line graph for SELECT latest(restartCount) FROM K8sContainerSample FACET containerName TIMESERIES over the past hour. The graph looks something like this:


Does that show 15 new restarts around 12:05 and no new restarts from around 12:10 to 12:15?

Thank you!

Hi @tnhuynh,

The latest() function returns the most recent value for an attribute over a specified time range. With TIMESERIES, your query results are grouped into multiple buckets, each representing a distinct period of time. If you use latest(attribute) together with TIMESERIES, each bucket returns only the most recent value of the attribute within that period.

I found a related question where a user used this query that could be applicable to your use case:

SELECT max(restartCount)-min(restartCount) as 'Restarts' from K8sContainerSample [...]

This gets the change in the restartCount value within each time window. For each one-minute bucket, it takes max(restartCount) and subtracts min(restartCount) for that bucket only.
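The per-bucket arithmetic described above can be sketched in Python (not NRQL) for a single container. The timestamps, counter values, and the function name `restarts_per_bucket` are hypothetical, and the 60-second bucket width is an assumption standing in for whatever TIMESERIES width the query uses.

```python
from collections import defaultdict

# Hypothetical (timestamp_seconds, restartCount) samples for one container.
samples = [
    (0, 4), (20, 4), (40, 5),   # first minute: counter goes 4 -> 5
    (60, 5), (80, 5), (100, 5), # second minute: no change
    (120, 5), (140, 7),         # third minute: counter goes 5 -> 7
]


def restarts_per_bucket(points, bucket_seconds=60):
    """Group samples into fixed-width buckets and return max - min per bucket."""
    buckets = defaultdict(list)
    for timestamp, count in points:
        buckets[timestamp // bucket_seconds].append(count)
    return {b: max(v) - min(v) for b, v in sorted(buckets.items())}


per_bucket = restarts_per_bucket(samples)
print(per_bucket)
```

One thing this sketch makes visible: because each bucket only compares samples inside itself, an increase that happens between the last sample of one bucket and the first sample of the next would not be attributed to either bucket.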

Thank you @kmcginley!
I have a follow-up question based on the query you suggested. If I run SELECT max(restartCount)-min(restartCount) [...] FACET podName, it behaves as you describe for individual pods. However, when I try to get the restart count within a time window for a group of pods belonging to the same app in an environment (e.g. SELECT max(restartCount) - min(restartCount) FROM K8sContainerSample where containerImage like 'app-name' facet clusterName), the query returns the highest restart count among the pods minus the lowest restart count among the pods. What’s the proper way to get the restart count within a time window for a group of pods belonging to the same app in an environment?

Hi @tnhuynh,

I tested this query in my own cluster with some sample pods with containers restarting every 60-90 seconds so I had an incrementing restartCount. You are correct, it seems to work correctly for individual pods but not for a group of pods belonging to the same app/using the same container.
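To illustrate why the grouped result differs from the per-pod result, here is a small Python sketch (not NRQL) with made-up pod names and counter values. It compares max minus min taken over all samples in the group with the sum of each pod's own max minus min. It is only meant to show the arithmetic, not a query that NRQL is confirmed to support.

```python
# Hypothetical restartCount samples within one time window, keyed by pod.
window = {
    "app-pod-a": [10, 10, 11],  # 1 new restart
    "app-pod-b": [2, 3, 4],     # 2 new restarts
    "app-pod-c": [0, 0, 0],     # no new restarts
}


def grouped_delta(samples_by_pod):
    """max - min over every sample in the group, ignoring which pod it came from."""
    all_values = [v for values in samples_by_pod.values() for v in values]
    return max(all_values) - min(all_values)


def summed_per_pod_delta(samples_by_pod):
    """max - min computed per pod, then added up across the group."""
    return sum(max(values) - min(values) for values in samples_by_pod.values())


print(grouped_delta(window), summed_per_pod_delta(window))
```

In this example the grouped calculation compares the highest counter of one pod with the lowest counter of another, so it reports 11, while the per-pod deltas add up to the 3 restarts that actually happened in the window.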

There are a couple of NRQL functions for getting the rate of change of an attribute. Neither yielded the results I expected in my own cluster, but I wanted to call them out for you anyway:

  • derivative(attribute [,time interval])
  • latestrate(attribute, time interval)

Unfortunately, there doesn’t seem to be a query at the moment that returns the count of new restarts across the full cluster over a time period. I came across a feature idea posted by another user with a similar issue: getting the new restart count for a specific time period. If you’d like to add your vote there, along with information about your use case, it should increase the chances of this being addressed sooner.

I don’t know if I agree with this logic. The min(restartCount) and max(restartCount) don’t seem to be limited to timeseries buckets for me.

Min is always 0, but max does seem to be restricted to the bucket’s max.

Actually, this is built into the Kubernetes dashboard and it is faceted differently there:

FROM K8sContainerSample SELECT max(restartCount) - min(restartCount) AS 'Restarts' FACET clusterName, podName, containerName TIMESERIES LIMIT 50 SINCE 1633362418437 UNTIL 1633366018437 WHERE clusterName IN ('xxxxx')

This looks much more accurate.

@Larry.Collicott Thank you for sharing that additional information!