Alert when Process Isn't Running Using NRQL

nrql
nrql-alerting

#1

I need to alert the developers when their software stops running on a server. I know that Infrastructure lets us add an alert condition that requires a process to be running, but it would be very helpful to write a NRQL query that checks whether the process is running without specifying how many servers it runs on. Is this possible? The sample query below counts the processes on each server that match a particular pattern:

SELECT count(commandLine) FROM ProcessSample FACET hostname WHERE commandLine LIKE '/usr/java/nds-%/bin/java' AND apmApplicationIds LIKE '%|95505343|%' SINCE 3 minutes ago

My thinking is that if I could run this and require at least 1 instance on each host at all times, it would work as long as apmApplicationIds is always populated. I’m just not sure that it always will be. If it isn’t, I could add a label to the host to indicate that the service is up, but I’m hoping there are other ways of doing this as well.


#2

SELECT count(*) FROM Transaction WHERE appId = '95505343' SINCE 10 minutes ago UNTIL 2 minutes ago

If your application generates transactions, you might simply run the above and alert when the count falls below a threshold.

You can also do this:

SELECT uniqueCount(host) FROM Transaction WHERE appId = '95505343' SINCE 10 minutes ago UNTIL 2 minutes ago

That would give you the count of hosts that have reported any Insights events in the time frame. It will not tell you whether the app is running but idle. Almost all apps have something happening at a regular frequency. If yours doesn’t, you can add something like a heartbeat, either with Synthetics or from within the app.


#3

Hi @jmajors,

Some interesting suggestions from @6MM, but I wanted to address the specific question of doing this using Infrastructure events. Also, keep in mind that a count of Transaction events will be 0 whenever throughput drops to 0, even if the application is still running.

If you’d like to create an Infrastructure Process Running condition using NRQL, you should be able to make that happen by using the following query:

SELECT uniqueCount(entityAndPid) FROM ProcessSample WHERE commandLine like '/usr/java/nds-%/bin/java' AND apmApplicationIds LIKE '%|95505343|%' FACET hostname

This should monitor and count the number of unique entity/PID combinations that match the filters set in the WHERE clause, then facet by hostname. When any facet drops to 0, it would mean that no processes that meet those filter elements are running on that facet.
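As a rough model of what the query computes, here is a small Python sketch. The sample dictionaries, host names, and the `like` and `matching_processes_per_host` helpers are made up for illustration and are not part of any New Relic API. Only the `%` wildcard of LIKE is modeled.

```python
import re
from collections import defaultdict


def like(value, pattern):
    """Emulate a LIKE match where % stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def matching_processes_per_host(samples, command_pattern, app_id):
    """Count distinct entityAndPid values per hostname among matching samples."""
    pids_by_host = defaultdict(set)
    for s in samples:
        if like(s["commandLine"], command_pattern) and like(
            s["apmApplicationIds"], f"%|{app_id}|%"
        ):
            pids_by_host[s["hostname"]].add(s["entityAndPid"])
    return {host: len(pids) for host, pids in pids_by_host.items()}


# Made-up samples: host-a runs the Java process (sampled twice),
# host-b only reports an unrelated process.
samples = [
    {"hostname": "host-a", "entityAndPid": "1/100",
     "commandLine": "/usr/java/nds-1.2/bin/java", "apmApplicationIds": "|95505343|"},
    {"hostname": "host-a", "entityAndPid": "1/100",
     "commandLine": "/usr/java/nds-1.2/bin/java", "apmApplicationIds": "|95505343|"},
    {"hostname": "host-b", "entityAndPid": "2/200",
     "commandLine": "/usr/sbin/sshd", "apmApplicationIds": "|95505343|"},
]

counts = matching_processes_per_host(samples, "/usr/java/nds-%/bin/java", "95505343")
print(counts)  # {'host-a': 1}
```

In this toy model, host-b does not appear with a count of 0; it is simply absent from the result. Whether a real facet behaves the same way is one of the things worth checking when you test the condition in Insights.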

As with any NRQL Alert condition, I would recommend testing this first in Insights to make sure it’s returning the data you expect. As you pointed out, it’s also relying on the apmApplicationIds attribute to remain constant. If there is any danger of the App ID changing, I would recommend using custom attributes on the hosts you want to target with this query.

I hope this helps!


#4

Thank you for your response @Fidelicatessen.

If a host does not report to New Relic for 30 minutes and I’m querying the last 5 minutes’ worth of data, will New Relic remember that the host was associated with that APM application ID in the past, or would it leave that host out of the query result?


#5

Hi @jmajors,

If a host does not report for 30 minutes, it isn’t sending any data, so it contributes nothing to a query over the last 5 minutes. In that case, the host would be left out of the query result.

If the host had come back after being offline for 30 minutes, the linkage between APM and Infrastructure would either reconnect or would have never gone away in the first place. So if the host was back online, your query over the last 5 minutes filtered to apmApplicationId would pick up the host.

If I misread your question, please let me know.


#6

I really like measuring a metric that is the result of a thing working, though that’s not always possible.

SELECT filter(uniqueCount(entityAndPid), where commandLine like '/usr/java/nds-%/bin/java') - uniqueCount(entityId) FROM ProcessSample WHERE apmApplicationIds LIKE '%|95505343|%'

That sort of query compares the number of hosts reporting for the app with the number of matching processes running on them. Maybe that can work for you. Alert when the result isn’t zero.
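To make the arithmetic concrete, here is a minimal Python sketch of the subtraction. It assumes the samples have already been filtered to the app; the sample data and the `is_java` and `running_minus_hosts` helpers are hypothetical stand-ins, not New Relic functions.

```python
def is_java(command_line):
    # Stand-in for: commandLine LIKE '/usr/java/nds-%/bin/java'
    return command_line.startswith("/usr/java/nds-") and command_line.endswith("/bin/java")


def running_minus_hosts(samples, is_target):
    """Distinct matching entityAndPid values minus distinct entityId values."""
    running = {s["entityAndPid"] for s in samples if is_target(s["commandLine"])}
    hosts = {s["entityId"] for s in samples}
    return len(running) - len(hosts)


# Made-up fleet: two hosts, each running the Java process and sshd.
healthy = [
    {"entityId": "1", "entityAndPid": "1/100", "commandLine": "/usr/java/nds-1.2/bin/java"},
    {"entityId": "1", "entityAndPid": "1/101", "commandLine": "/usr/sbin/sshd"},
    {"entityId": "2", "entityAndPid": "2/200", "commandLine": "/usr/java/nds-1.2/bin/java"},
    {"entityId": "2", "entityAndPid": "2/201", "commandLine": "/usr/sbin/sshd"},
]

# Same fleet after the Java process on host 2 has stopped;
# the host itself still reports its other process.
degraded = [s for s in healthy if s["entityAndPid"] != "2/200"]

print(running_minus_hosts(healthy, is_java))   # 0
print(running_minus_hosts(degraded, is_java))  # -1
```

The point of the subtraction is that a host that is still reporting, but no longer running the process, keeps contributing to the host count, so the result moves away from zero.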

*Note that this forum stomps on NRQL sometimes. There are some ticks around entityId.


#7

I should probably point out that sometimes there are multiple processes on a host with the same name. If you really just want to make sure one or more is running, you can use an undocumented feature of uniqueCount(): you can stack the attributes. So uniqueCount(entityId, commandLine) will count as 1 when there are 2 processes on the same box with the same commandLine.

SELECT filter(uniqueCount(entityId, commandLine), where commandLine like '/usr/java/nds-%/bin/java') - uniqueCount(entityId) FROM ProcessSample WHERE apmApplicationIds LIKE '%|95505343|%'
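Here is a small Python sketch of why stacking helps, assuming a stacked uniqueCount() counts distinct combinations of its attributes as described above. The `unique_count` helper and the sample data are made up for illustration.

```python
def unique_count(samples, *attributes):
    """Count distinct combinations of the given attributes across samples."""
    return len({tuple(s[a] for a in attributes) for s in samples})


# Made-up samples, all matching the Java command line:
# host 1 runs two copies of the process, host 2 runs one.
java_samples = [
    {"entityId": "1", "entityAndPid": "1/100", "commandLine": "/usr/java/nds-1.2/bin/java"},
    {"entityId": "1", "entityAndPid": "1/101", "commandLine": "/usr/java/nds-1.2/bin/java"},
    {"entityId": "2", "entityAndPid": "2/200", "commandLine": "/usr/java/nds-1.2/bin/java"},
]

hosts = unique_count(java_samples, "entityId")
per_process = unique_count(java_samples, "entityAndPid")
per_host_and_command = unique_count(java_samples, "entityId", "commandLine")

print(per_process - hosts)           # 1
print(per_host_and_command - hosts)  # 0
```

Counting by entityAndPid leaves a difference of 1 even though every host is healthy, while counting by the (entityId, commandLine) pair collapses the duplicate process on host 1 and brings the difference back to zero.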


#8

Thank you for responding. The information provided makes sense.