Process Monitoring (Systemd)


#1

We recently had an issue where a monitored process supervised by systemd did not fully start, yet no alert fired. We now know why the alert did not trigger: systemd kept trying to start the process over and over, so the monitored process name was almost always present. What we’d like is a way to either:

  1. Alert on process PID changing X times in Y minutes
    or
  2. Alert on systemd failing to start processes.

Is this possible?


#2

Hi, @antonio.berrios: You may be able to use a NRQL alert condition to do this. The following query, for example, will return the number of PIDs for a process named systemd, grouped by host name:

SELECT uniqueCount(entityAndPid) 
FROM ProcessSample 
WHERE processDisplayName = 'systemd' 
FACET hostname

You can add WHERE hostname IN (...) to monitor specific hosts, and create an alert condition to trigger an incident when the query returns a value above X.
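As a rough sketch, restricting the query to specific hosts might look like the following. The names host-a and host-b are placeholders, not hosts from this thread:

```sql
SELECT uniqueCount(entityAndPid)
FROM ProcessSample
WHERE processDisplayName = 'systemd'
AND hostname IN ('host-a', 'host-b')
FACET hostname
```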


#3

This just seems to return 1 PID for each host, i.e. the systemd process running on each host. At the moment we don’t want to monitor whether the supervising process is running; we want to know if another process, supervised by systemd, is constantly restarting (in other words, its PID is changing a lot).

I see there are events in New Relic that show the service is restarting. Is there a way to alert on the number of service restart events within X minutes?

We do restart the service in the normal course of things, for example on a new code deploy. But we want to know if the service is constantly restarting, say 3 times in one minute.


#4

Try restarting the service, then run the above query. You should see 2 PIDs on one of the hosts.
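If the goal is to watch the supervised process rather than systemd itself, the same query can presumably be pointed at that process's name. In the sketch below, my-service is a hypothetical process name and the 5-minute window is an arbitrary choice for trying it out in the query builder:

```sql
SELECT uniqueCount(entityAndPid)
FROM ProcessSample
WHERE processDisplayName = 'my-service'
FACET hostname
SINCE 5 minutes ago
```

A result above 1 for a host would mean that more than one PID was seen for that process name in the window. If the service normally runs several processes under the same name, the baseline will be higher than 1, and a process that lives for less than one sampling interval may not be captured at all.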

You may also use the following query to detect service restarts:

SELECT count(*) 
FROM InfrastructureEvent 
WHERE changeType = 'changed' 
AND source = 'services/systemd'
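
To see how often these events occur per host, a variant like the one below may help when exploring the data. It assumes the events carry a hostname attribute, and the 1-minute buckets and 30-minute range are arbitrary choices:

```sql
SELECT count(*)
FROM InfrastructureEvent
WHERE changeType = 'changed'
AND source = 'services/systemd'
FACET hostname
TIMESERIES 1 minute
SINCE 30 minutes ago
```

For an alert condition, the evaluation window is typically set in the condition itself rather than in the query, so the TIMESERIES and SINCE clauses would likely be dropped there. A threshold such as "3 or more in one minute" would then separate a restart loop from a single restart during a deploy.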