Unexpected number of results using SINCE clause on query cron jobs


#1

I have a batch job that runs once every hour. When I write a query to see whether the job's pod has run, I get hundreds of results when I use the SINCE 30 minutes ago clause. I would expect to get back 1 or 0.

SELECT startTime, createdAt, status, timestamp FROM K8sPodSample WHERE createdKind = 'Job' AND podName LIKE '%batcher%' AND status = 'Succeeded' SINCE 30 minutes ago

Does the timestamp column always update to NOW when running a query, or is it populated by the harvester?


#2

Hey @baskeland

I ran that query in your account and I see the same thing.

I tried running

SELECT * FROM K8sPodSample WHERE createdKind = 'Job' AND podName LIKE '%batcher%' AND status = 'Succeeded' SINCE 2 days ago

Which showed a lot of different results for the attribute createdBy, and also a lot of label.job-name attributes.

So I think your original query might be so vague that it allows for more jobs than just yours to show up.

Could you try narrowing down your query some more by specifying some of these attributes in your WHERE clause?


#3

Thanks Ryan - I tried adding a few more conditions to the WHERE clause, but I’m still getting over 700 results. Maybe I’m not understanding why so many different events are recorded in K8sPodSample for a cron job. I would have thought there would be one ‘job’ record per instance of a cron job that runs?

SELECT * FROM K8sPodSample WHERE `label.job-name` like 'hre-batcher-pro%' and createdKind = 'Job' AND podName LIKE 'hre-batcher-pro%' AND status = 'Succeeded' SINCE 1 hour ago

#4

Hi, @baskeland: “Sample” events typically work by taking a snapshot every few seconds, or once per minute. It sounds like Kubernetes events might be a better fit for your use case. You might also try querying the most-recent status:

SELECT latest(status) 
FROM K8sPodSample 
WHERE `label.job-name` LIKE 'hre-batcher-pro%' 
  AND createdKind = 'Job' 
  AND podName LIKE 'hre-batcher-pro%' 
SINCE 1 hour ago
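
To illustrate why a "Sample" event type returns so many rows for a single job, here is a minimal Python sketch. It assumes a 15-second snapshot interval and a hypothetical pod name `hre-batcher-pro-1`, and it assumes that a finished pod keeps being reported with status Succeeded, as the result counts in this thread suggest. It is a simulation only and does not query New Relic.

```python
from datetime import datetime, timedelta

# Assumption: one snapshot every 15 seconds. The real interval depends on
# how the Kubernetes integration is configured.
SAMPLE_INTERVAL = timedelta(seconds=15)


def simulate_samples(pod_name, finished_at, window_start, window_end):
    """Return one sample row per interval for a pod that has already finished."""
    rows = []
    t = max(finished_at, window_start)
    while t < window_end:
        rows.append({"podName": pod_name, "status": "Succeeded", "timestamp": t})
        t += SAMPLE_INTERVAL
    return rows


now = datetime(2020, 1, 1, 12, 0, 0)
window_start = now - timedelta(minutes=30)  # SINCE 30 minutes ago

# Hypothetical pod that completed 45 minutes before "now".
rows = simulate_samples("hre-batcher-pro-1", now - timedelta(minutes=45), window_start, now)

row_count = len(rows)                                # what SELECT * would list
distinct_pods = len({r["podName"] for r in rows})    # how many pods actually ran
print(row_count, distinct_pods)
```

Under these assumptions, one finished pod produces many rows in the window, while the number of distinct pod names stays at one. Counting distinct pods, or taking the latest value per pod, therefore comes closer to "did the job run" than counting rows.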

#5

Thanks @philweber. I think the updated query will always give me back a “Succeeded” status if the sample events are continually taking a snapshot in regards to the ‘timestamp’ column that is returned. In that case I won’t know which job actually succeeded and if it was the last one to actually run, if I am following the way the data is being modeled. If ‘latest’ was keying off the actual last run time of the pod then that should work, but since ‘timestamp’ doesn’t do that I don’t think it would be that accurate.

I will look into the K8S events like you mention and see what I can come up with.


#6

My next step would be to look at the Prometheus integration for NR to see if it would let me alert on cronjob execution status and frequency. The root of the question is finding a workaround for monitoring Kubernetes cronjobs with NR Infrastructure and Alerts. This would be a great item to add to the product roadmap if it is not already on there.


#7

Let us know what you come up with from looking at the Kubernetes Sample events…

I do want to look at your point:

“I think the updated query will always give me back a ‘Succeeded’ status”

I think your initial query had this issue - where you were specifying in that query to only return records that match status = 'Succeeded'.

Phil’s query won’t necessarily return only Succeeded statuses, unless the other conditions in the WHERE clause happen to match only successful pods. It will just return the most recent status, whether that’s Succeeded or otherwise.
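
As a rough illustration of that difference, the Python sketch below models `latest()` as "take the value from the row with the greatest timestamp". The `latest` helper and the sample rows are made up for illustration; this is not how NRQL is implemented internally.

```python
def latest(rows, attribute):
    """Return `attribute` from the row with the greatest timestamp, or None."""
    if not rows:
        return None
    return max(rows, key=lambda r: r["timestamp"])[attribute]


# Hypothetical samples from pods matching the job-name filter.
samples = [
    {"podName": "hre-batcher-pro-a", "timestamp": 100, "status": "Running"},
    {"podName": "hre-batcher-pro-a", "timestamp": 160, "status": "Succeeded"},
    {"podName": "hre-batcher-pro-b", "timestamp": 220, "status": "Failed"},
]

# Without a status filter, the most recent row decides the answer.
unfiltered_result = latest(samples, "status")

# With status = 'Succeeded' in the WHERE clause, only successes can come back.
succeeded_only = [r for r in samples if r["status"] == "Succeeded"]
filtered_result = latest(succeeded_only, "status")

print(unfiltered_result, filtered_result)
```

In this toy data the unfiltered call reports the newer Failed sample, while the filtered call can only ever report Succeeded.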


I’m not certain how the Prometheus integration works, but I’d be eager to learn whether it can help you.

Then onto your final point - I have added a feature idea for you, for better handling of K8s cronjob monitoring with Infra & Alerts.

Please do report back with what you find.


#8

Here is my workaround for monitoring the cron jobs.

My basic schedule for the cron jobs is */30 * * * *

Here is the NRQL:

SELECT COUNT(*) FROM K8sPodSample WHERE status = 'Running' AND `label.job-name` LIKE 'se-batcher-stg%'

Since I don’t get flooded with events when a job has a status of “Running”, I just check to make sure the query doesn’t return a 0 value for more than 33 minutes.
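
The check described above can be sketched in Python as "alert when the count has been zero for more than 33 consecutive minutes". The per-minute counts, the two-minute job duration, and the function names are assumptions for illustration; the real alert condition is configured in New Relic, not in code like this.

```python
def longest_zero_streak(counts_per_minute):
    """Length of the longest run of consecutive zero counts."""
    longest = current = 0
    for count in counts_per_minute:
        current = current + 1 if count == 0 else 0
        longest = max(longest, current)
    return longest


def should_alert(counts_per_minute, threshold_minutes=33):
    """True if no Running samples were seen for more than the threshold."""
    return longest_zero_streak(counts_per_minute) > threshold_minutes


# Assumed healthy pattern: the job runs for about 2 minutes every 30 minutes.
healthy = ([3] * 2 + [0] * 28) * 3

# Assumed paused pattern: one run, then nothing for the next 58 minutes.
paused = [3] * 2 + [0] * 58

print(should_alert(healthy), should_alert(paused))
```

With a */30 schedule, the 33-minute threshold leaves a few minutes of slack for a late start before the gap counts as a missed run.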

This seems to satisfy the conditions we need and I have tested it out by pausing the jobs and the alert seems to work for me.

Thanks for all the suggestions on this.


#9

Glad you were able to get a solution to this - thanks for coming back to confirm that :smiley: