Alerts for K8s Node down

I am trying to set up alerts for when a node is in a ‘Not Ready’ state in k8s. I can see and query K8sClusterSample (I have attached a screenshot) but am unable to set an alert for it.
Is this a known issue?
Can anyone please help? This is one of the most critical and basic alerts to be able to set up.

Hi, @rashmi.modhwadia: Have you tried creating an alert condition for when “available pods are less than desired pods”? Will that work for you?

Alternatively, you might take a look at the isReady or status attributes of the K8sPodSample event.

Hi Phil,
When a node goes down, the pods will automatically reshuffle to another node.
I have some alerts set up for pods, as it's normal for pods to shuffle (due to resource limits, etc.). So when a node goes down, relying on a pod alert won't be a good solution. Pod alerts can become noisy.

Is it a bug that the NodeSample does not work in alerts?

In your first post, you asked about K8sClusterSample; in the last post, you ask about NodeSample. They are two different events. I would not call it a bug, it is just a limitation in the way the product is designed: only Pod and Container events have isReady and status attributes.

EDIT: I just re-read your original post. Are you asking if it’s a bug that the UI says, “K8sClusterSample has not reported any data yet”, but you are able to see data in the query builder? Yes, that does seem like a bug.

What about:

SELECT uniqueCount(nodeName) FROM K8sNodeSample WHERE clusterName = 'Your Cluster Name'

And alert if the number of nodes is < X?
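To make the threshold idea concrete, here is a minimal Python sketch of the check such an alert would perform. It is only an illustration: the sample rows, the helper names `count_unique_nodes` and `below_threshold`, and the threshold value are made up, and nothing here calls a New Relic API.

```python
from typing import Iterable, Mapping


def count_unique_nodes(samples: Iterable[Mapping[str, str]], cluster_name: str) -> int:
    """Mirror uniqueCount(nodeName) filtered by clusterName."""
    return len({
        s["nodeName"]
        for s in samples
        if s.get("clusterName") == cluster_name and "nodeName" in s
    })


def below_threshold(samples: Iterable[Mapping[str, str]], cluster_name: str, min_nodes: int) -> bool:
    """Return True when fewer than min_nodes distinct nodes reported."""
    return count_unique_nodes(samples, cluster_name) < min_nodes


# Hypothetical rows standing in for K8sNodeSample events in one evaluation window.
SAMPLES = [
    {"clusterName": "prod", "nodeName": "node-a"},
    {"clusterName": "prod", "nodeName": "node-a"},
    {"clusterName": "prod", "nodeName": "node-b"},
    {"clusterName": "staging", "nodeName": "node-z"},
]
```

With these rows, `below_threshold(SAMPLES, "prod", 3)` is true because only two distinct nodes reported for `prod`. Note that `min_nodes` is a fixed number, so it has to be kept in line with the expected cluster size.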

Sorry, I meant ClusterSample.
The bug I was referring to is the NR error that says “We can’t create Alerts for K8sClusterSample because it has not reported any data yet.”
This error seems to give incorrect information. As you can see in my screenshot, I can query K8sClusterSample data.

Your suggestion would work if my cluster didn't grow or shrink. I have autoscaling set up, so this won't be helpful, as I would have to manually update the alert each time the cluster is autoscaled. :slight_smile:

Whether via node or cluster sample data, I was hoping to set up alerts in NR that notify me if a node goes down, as opposed to situations where nodes were scaled up or down by autoscaling.
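One way to picture the distinction: autoscaling changes the number of nodes the cluster is supposed to have, while a failure leaves that number alone and only lowers the number of nodes actually reporting. Below is a rough Python sketch of that comparison. It assumes the desired node count can be obtained from somewhere (for example, from the autoscaler); `desired_nodes`, `reporting_nodes`, `ClusterSnapshot` and `classify` are hypothetical names, not New Relic attributes or APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSnapshot:
    desired_nodes: int    # assumption: what the autoscaler currently wants
    reporting_nodes: int  # assumption: distinct nodes seen in the window


def classify(previous: ClusterSnapshot, current: ClusterSnapshot) -> str:
    """Label the change between two consecutive snapshots."""
    # Fewer nodes reporting than the cluster is meant to have: treat as a failure.
    if current.reporting_nodes < current.desired_nodes:
        return "node_down"
    # The target itself moved, so the size change was intentional.
    if current.desired_nodes != previous.desired_nodes:
        return "autoscaled"
    return "ok"
```

For example, going from 5 desired / 5 reporting to 5 desired / 4 reporting is classified as `node_down`, while going to 4 desired / 4 reporting is classified as `autoscaled`. A real condition would also need a grace period, since nodes added by a scale-up take a while to start reporting.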

Hi @rashmi.modhwadia, just wanted to confirm if this specific issue is now resolved for you or if you need any further help.

Creating an alert condition on Kubernetes Cluster Sample still gives the error: “We can’t create Alerts for K8sClusterSample because it has not reported any data yet. Please try another data source.” So the bug still exists. Phil did acknowledge the bug; I'm not sure whether a fix has been pushed out in a new version and whether I need to update my NR deployment to get the latest.

I can also query and see data in NRQL for K8sNodeSample. Whether via node or cluster sample data, I was hoping to set up alerts in NR that notify me if a node goes down, as opposed to situations where nodes were scaled up or down by autoscaling.

I don’t have a resolution or workaround.

Hi @rashmi.modhwadia, the error “We can’t create Alerts for K8sClusterSample because it has not reported any data yet. Please try another data source.” shows up when no data has been reported for that data source in the last 60 minutes.

Further, for a node down alert, I’d suggest using an Infrastructure Host not reporting alert. Information on this can be found here:

Would that work for you?
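As an illustration of the general idea behind a "not reporting" condition, here is a small Python sketch: a host is flagged once its most recent data point is older than a chosen threshold. This is not how New Relic implements the condition; `hosts_not_reporting`, the host names and the timestamps are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def hosts_not_reporting(
    last_seen: Dict[str, datetime],
    now: datetime,
    threshold: timedelta,
) -> List[str]:
    """Return hosts whose last report is older than the threshold, sorted by name."""
    return sorted(host for host, ts in last_seen.items() if now - ts > threshold)


# Hypothetical last-report times for two nodes.
NOW = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
LAST_SEEN = {
    "node-a": NOW - timedelta(minutes=1),
    "node-b": NOW - timedelta(minutes=12),
}
```

With a 5-minute threshold, `hosts_not_reporting(LAST_SEEN, NOW, timedelta(minutes=5))` flags only `node-b`. A check like this does not depend on the cluster size, which is why it needs no manual updates when the node count changes; whether it also fires for nodes removed on purpose by a scale-down depends on how the condition is configured.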

Hi, thanks for your response.
If you look at my attached screenshots, you can see that I can query for that same data and see results from NRQL.
Phil Weber confirmed that it seems like a bug.

I will check out the host not reporting condition. Because we are using autoscaling, I was hoping the k8s alert would work, as k8s has the correct cluster state.
