Problem statement: How to monitor process availability SLA?
I thought of using the “state” attribute from ProcessSample to show the percentage availability of a process. According to its definition, state is the current process status (running or sleeping), i.e. R or S. However, I also see another value, “up”, which I believe is the equivalent of “Running” on Windows Server.
The New Relic infrastructure agent does not collect a process STOP signal, so that means process state Running + Sleeping = uptime of the process.
SELECT percentage(count(*), WHERE state = 'S' OR state = 'R') FROM ProcessSample WHERE processDisplayName IN ('newrelic-infra') AND hostname = 'myhostname' SINCE 1 day ago
SELECT percentage(count(*), WHERE state = 'S' OR state = 'up') FROM ProcessSample WHERE processDisplayName IN ('newrelic-infra') AND hostname = 'myhostname' SINCE 1 day ago
Since New Relic does not receive a process-down signal, we are creating a “process running” alert, so whenever my process is down an incident will be raised. We will get the downtime duration from NrAiIncident and subtract it from the process uptime. In this way we can measure the process availability SLA.
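The subtraction could be sketched roughly as below. This is only an illustration of the arithmetic, assuming the incident durations have already been pulled from NrAiIncident by some other means and that the whole query window (for example 1 day) is taken as the baseline. The function name `availability_percent` and the sample durations are hypothetical, not anything provided by New Relic.

```python
from datetime import timedelta


def availability_percent(window: timedelta, incident_durations: list[timedelta]) -> float:
    """Percentage of the window in which no 'process down' incident was open.

    Assumption: each entry in incident_durations is the length of one
    incident already retrieved from NrAiIncident (hypothetical input).
    """
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    downtime = sum(incident_durations, timedelta(0))
    # Clamp so overlapping or overlong incidents cannot push the result below 0%.
    downtime = min(downtime, window)
    return 100.0 * (1 - downtime / window)


# Hypothetical example: two incidents within a 1-day window.
incidents = [timedelta(minutes=12), timedelta(minutes=24)]
print(round(availability_percent(timedelta(days=1), incidents), 2))
```

Clamping the downtime to the window is a design choice here. If incidents can overlap, their durations would need to be merged first rather than simply summed.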
Does this make any sense, or am I missing something here?