High Failed Job Queue


#1

Hey guys, I have some private minions, and some of them have a large failed job queue, with more than 100k failed jobs :frowning:
Is there some way to clear this? I did a restart, but I think it will take a long time to clear all of this.


#2

@iara.silva

A queue at a private location can be cleared by clicking the options drop-down at the end of the location row and selecting ‘clear queue’.

If you are referring to the failed jobs stat in the minion overview page, you can clear this by restarting the minion service with:

sudo restart synthetics-minion

#3

Hello @Michel_L,

A few follow-up questions if I may. First, do the failed jobs get cleared from the queue or do they have a repeat pattern prior to being dropped from the queue (say three tries then drop)? What is a typical failure rate for a minion in general terms? Say what percentage of failures is generally acceptable?

Thanks.


#4

@Wren

Coming back to this thread, I suspect the original question refers to the running count of failed jobs on a particular minion, which is displayed on the minion overview page.

When a check first fails, two more checks are immediately scheduled into the private location queue, where the minions assigned to that location pick them up and run them. This is part of the three-strike policy for failed checks. If the third check also fails, it is reported to your account as a failed check and the monitor is put into a failing state. Once a monitor is in a failing state, each subsequent check failure is reported as a failure on the first attempt until the monitor recovers.
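To make the three-strike behaviour concrete, here is a small Python model of it. This is only an illustrative sketch, not New Relic's implementation: the function name `reported_results` is hypothetical, and it assumes that a passing check (including a passing retry) is reported as a success and returns the monitor to a healthy state.

```python
from typing import Iterable, List


def reported_results(raw_results: Iterable[bool], strikes: int = 3) -> List[str]:
    """Model which results reach the account under a three-strike policy.

    raw_results: outcomes of checks in execution order, retries included
                 (True = passed, False = failed).
    Assumption: a passing check resets the strike count and clears the
    failing state. The thread does not spell this out.
    """
    reported: List[str] = []
    failing = False          # is the monitor already in a failing state?
    consecutive_failures = 0

    for passed in raw_results:
        if passed:
            consecutive_failures = 0
            failing = False
            reported.append("SUCCESS")
        elif failing:
            # Monitor already failing: the failure is reported right away.
            reported.append("FAILED")
        else:
            consecutive_failures += 1
            if consecutive_failures >= strikes:
                # Third strike: report the failure, mark monitor as failing.
                failing = True
                consecutive_failures = 0
                reported.append("FAILED")
            # Otherwise this is a silent retry; nothing is reported yet.

    return reported
```

In this model, three consecutive failures produce a single reported failure, while a failure followed by a passing retry produces only a success. Every one of those attempts is still a job a minion has to run, which is one reason a job-level failure count can grow faster than the number of failed checks you see on the account.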

There is no typical failure rate for checks performed on a minion; it can vary widely depending on the endpoints being monitored, their availability, and the sensitivity of the validations performed in scripted checks. That said, I would recommend setting up an Insights dashboard that highlights deviations in failure rates for your Synthetics monitors. For example:

SELECT percentage(count(*), WHERE result = 'FAILED') FROM SyntheticCheck WHERE location = 'Your-private-location' SINCE 2 hours ago COMPARE WITH 1 week ago TIMESERIES AUTO

This compares the failure rate for a location over the last 2 hours against the same window one week earlier, which serves as a baseline.
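The arithmetic behind that comparison can be sketched in Python. This is a minimal illustration under stated assumptions: the helper names `failure_rate` and `deviation_from_baseline` are hypothetical, and the inputs are plain lists of result strings ('FAILED' or anything else), standing in for the `result` values of SyntheticCheck events in the current and baseline windows.

```python
from typing import Sequence


def failure_rate(results: Sequence[str]) -> float:
    """Percentage of checks in the window whose result is 'FAILED'."""
    if not results:
        return 0.0  # assumption: an empty window counts as 0% failures
    failed = sum(1 for r in results if r == "FAILED")
    return 100.0 * failed / len(results)


def deviation_from_baseline(current: Sequence[str], baseline: Sequence[str]) -> float:
    """Difference, in percentage points, between current and baseline rates."""
    return failure_rate(current) - failure_rate(baseline)
```

For example, a current window with 2 failures out of 4 checks (50%) against a baseline with 1 failure out of 4 (25%) gives a deviation of 25 percentage points. A sustained positive deviation is the kind of change worth investigating.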


#5

Thank you @Michel_L. This fit in very nicely with the metrics I was trying to get a view into and the explanation helped a lot with my understanding of what is going on ‘under the sheets’ as it were. Appreciate it! :slight_smile: