K8sDeploymentSample: what is the best way to write an NRQL alert for when pods are unavailable per service, or when pods suddenly go offline?
Hi @mary.gokoffski, you could try a query like this one:
SELECT latest(status) AS 'Status' FROM K8sPodSample WHERE isReady != 1 FACET deploymentName, namespace, podName SINCE 10 minutes ago UNTIL 1 minute ago LIMIT 100
What if the deployment is hung because of its maxUnavailable/maxSurge settings? Since the deployment is manipulating the ReplicaSets, shouldn't we take podsMaxUnavailable into account?
Is there a way to provide a better visualization that shows pod counts over time using your query?
You could use the following to represent the number of pods missing per deployment:
SELECT latest(podsDesired) - latest(podsReady) FROM K8sReplicasetSample WHERE podsReady < podsDesired FACET deploymentName SINCE 60 minutes ago UNTIL 1 minute ago TIMESERIES
We provide an out-of-the-box alert for incomplete deployments under the Kubernetes alert type on the alert creation page:
available pods are less than desired pods.
You would set that alert to trigger only after the condition has held for a period of time. As you mention, a surge or a brief shortfall in pods while rolling out a new version might be expected. But if that variation lasts for a longer period, it probably reflects an issue with the deployment.
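If you would rather build this as an NRQL alert condition than use the built-in one, the two queries below are a rough sketch rather than a tested recipe. They assume that K8sDeploymentSample reports podsDesired, podsAvailable, podsUnavailable and podsMaxUnavailable in your account, so please check the attribute names in the data explorer first. The "for at least 10 minutes" threshold is only an example value.

```
// Sketch 1: pods missing per deployment (desired minus available).
// Assumed threshold: critical when the result is above 0 for at least 10 minutes.
SELECT latest(podsDesired) - latest(podsAvailable) FROM K8sDeploymentSample FACET deploymentName

// Sketch 2: unavailable pods beyond what the rollout strategy allows.
// Assumes podsMaxUnavailable reflects the deployment's maxUnavailable setting.
// A result above 0 means more pods are down than a normal rollout should cause.
SELECT latest(podsUnavailable) - latest(podsMaxUnavailable) FROM K8sDeploymentSample FACET deploymentName
```

Each query would go into its own alert condition. Alert condition queries normally leave out SINCE, UNTIL and TIMESERIES because the evaluation window comes from the condition settings, and you may need to remove the comment lines if the condition editor does not accept them. The second sketch is one possible way to take podsMaxUnavailable into account, so that a rollout staying within its configured limits does not open an incident.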