Latest() not returning the actual latest?

Hello,
I’m trying to create a NRQL query that returns the latest node status in a k8s cluster based on the NodeReady and NodeNotReady events.
The closest I got to that is:

SELECT latest(event.reason) from InfrastructureEvent where event.reason in ('NodeNotReady','NodeReady') facet event.involvedObject.name since 356 days ago

But sometimes it returns “NodeNotReady” for nodes that are actually ready and if I query the node specific data I can see the latest event.reason is indeed “NodeReady” when sorted by event.lastTimestamp.

I think “latest()” is using the reporting timestamp and not the actual event timestamp.

I’m attaching a few screenshots filtered for a specific node that was showing this behavior a few minutes ago. I’ve highlighted some areas that I think will help clarify the issue.

Query showing the “NodeNotReady” result for a node that is Ready:

Another query showing the NodeReady and NodeNotReady events for that same node with default sorting. Note that the two most recent events have the same timestamp but their event.lastTimestamp values differ:

And the same query sorted by event.lastTimestamp desc:

Thanks.


New Relic Edit


We take feature ideas seriously and our product managers review every one when plotting their roadmaps. However, there is no guarantee this feature will be implemented. This post ensures the idea is put on the table and discussed though. So please vote and share your extra details with our team.


Following because I’m currently working on migrating the k8s mixins from Prometheus to NR. Happened to be on this rule right now.

Hey @fgimenezm, I realize you have been waiting a while for a response. I’m going to reach out to some of our SMEs around this category to see if they can provide some feedback. Thanks for your patience!

Joi


Hey @fgimenezm

You’re right! latest() will always use the reported timestamp (attribute name: timestamp), not the actual event timestamp if that’s included as another attribute.
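To make the difference concrete, here is a minimal Python sketch of the two orderings. It is not how latest() is implemented. The rows, the reported timestamp values, and the tie-breaking rule (the later row wins when reported timestamps are equal) are assumptions made up for illustration; only the attribute names come from this thread.

```python
from datetime import datetime, timezone

# Hypothetical rows for one node. Both were reported in the same batch, so they
# share the same reported `timestamp` (epoch ms), but the events themselves
# happened five minutes apart according to event.lastTimestamp.
events = [
    {"timestamp": 1588327800000,
     "event.lastTimestamp": "2020-05-01T10:05:00Z",
     "event.reason": "NodeReady"},
    {"timestamp": 1588327800000,
     "event.lastTimestamp": "2020-05-01T10:00:00Z",
     "event.reason": "NodeNotReady"},
]


def parse_event_time(value):
    """Parse an RFC 3339 UTC string such as 2020-05-01T10:05:00Z (assumed format)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def latest_by(rows, key):
    """Return the row with the greatest key. On ties the later row wins (an assumption)."""
    best = None
    for row in rows:
        if best is None or key(row) >= key(best):
            best = row
    return best


# Ordering by the reported timestamp: the tie makes the result depend on row order.
by_reported = latest_by(events, key=lambda r: r["timestamp"])

# Ordering by the actual event time: the most recent state change wins.
by_event_time = latest_by(events, key=lambda r: parse_event_time(r["event.lastTimestamp"]))

print(by_reported["event.reason"])
print(by_event_time["event.reason"])
```

With these sample rows the first ordering picks NodeNotReady and the second picks NodeReady, which matches the symptom described in the original post.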

This is overwriteable, but you would need to send the data as a custom event in order to overwrite that attribute. As part of the default InfrastructureEvent, I don’t think there is a way to overwrite it.
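As a rough sketch of the custom event route, the Python below builds a payload whose timestamp is taken from the Kubernetes event's lastTimestamp. The event type name K8sNodeStatusEvent, the attribute names reason and nodeName, and the sample event are hypothetical. It assumes lastTimestamp is an RFC 3339 UTC string and that the custom event's timestamp attribute is a Unix epoch value; the sketch only builds the JSON body and does not send it anywhere.

```python
import json
from datetime import datetime, timezone


def to_custom_event(k8s_event, event_type="K8sNodeStatusEvent"):
    """Map a Kubernetes event dict to a custom event dict.

    `event_type` is a hypothetical name. The `timestamp` attribute is set from
    the event's own lastTimestamp so that queries order by when the event
    happened rather than by when it was reported.
    """
    last_seen = datetime.strptime(
        k8s_event["lastTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "eventType": event_type,
        "timestamp": int(last_seen.timestamp()),  # Unix epoch seconds
        "reason": k8s_event["reason"],
        "nodeName": k8s_event["involvedObject"]["name"],
    }


# Hypothetical Kubernetes event, reduced to the fields used above.
sample = {
    "reason": "NodeReady",
    "lastTimestamp": "2020-05-01T10:05:00Z",
    "involvedObject": {"kind": "Node", "name": "node-a"},
}

custom_event = to_custom_event(sample)
payload = json.dumps([custom_event])  # JSON array body for a custom event request
print(payload)
```

A query such as latest(reason) faceted by nodeName against that custom event type would then order by the event's own time, assuming the ingest side accepts the supplied timestamp.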

I’ll get that filed as a feature request for you though: the ability to optionally select a preferred timestamp to use in queries.


Sounds great!
Thanks!

@fgimenezm The Kubernetes events integration should send the correct timestamp. Would you mind opening an issue on this project here: https://github.com/newrelic/nri-kube-events/issues ?


@jjoly, so you are saying that the events integration should already be using the actual event timestamp to overwrite the reported timestamp?

Is the issue a delay in getting all of the events or do you have events out of order?

If you change your query to the last 5 minutes do you get any events in the last minute?

No, it’s not a delay, it’s about how latest() orders the results to find the latest and how the timestamp is recorded for these events.
The events I’m filtering in this query are “NodeReady” and “NodeNotReady”. These don’t show up very often (unless something breaks). When a node goes down for some reason (rebooting, dead, something) there is a “NodeNotReady” event. When the node goes back online there is a “NodeReady” event.
My plan was to look at the latest(event.reason) for those two events for each node hoping to find the current state of the node which is not reported by other metrics. This worked mostly ok but not always.
Looking into the details of the times it didn’t work is how I found this issue, which is explained in detail in my original post.

Edit: reducing it to 5 minutes will return empty in most cases as these events only happen when the nodes go down and up.

Gotcha. Is this K8? I saw some references above, but wasn’t sure.

We ran NR and K8 for a while but switched to Fargate for various reasons. One of the NR blog posts has this for k8.

FROM Metric select latest(kube_node_status_condition) where condition='Ready' and status = 'true' and clusterName = '<YOUR CLUSTER NAME>' facet nodeName

Looks like that one depends on enabling the Prometheus integration. I’ll give it a try.

Edit: enabling Prometheus just for this sounds like overkill.

Yes. If it doesn’t, then we should change the integration to make sure it uses the timestamp from the event rather than the timestamp from when the data is sent.

As @6MM pointed out, you could use the Prometheus OpenMetrics integration to get the value of kube_node_status_condition. This might be a good workaround until the timestamp handling for Kubernetes events is updated.


Gotcha. Issue opened: https://github.com/newrelic/nri-kube-events/issues/6
