
Feature Idea: Windows Service Monitoring

windowsservice
windows
feature-idea
rfb

#61

I’d like some more details on this. Right now we collect metrics on Windows services for a specific host. With New Alerts (this is not available in legacy alerting) there are metrics that can be monitored which will alert when the values drop below a certain point or go to zero. Based on this, I don’t see any reason we cannot deliver on this ask. Have folks set something up like this in New Alerts that isn’t working? If so, I’d like to review the settings and try to figure out why. Can you provide links to the specific alert policies?

I want to point out I’m not promising anything, but I really do think this should work. I can do some testing based on specific settings if your experience is different, and work with our developers to verify conclusions either way. Thanks!


#62

Hi,

I am trying to get an alert when a service on a Windows server stops or starts, but I don’t know how to do that. I am using the New Relic free trial version, so please let me know how to do it.

Example: I am using Windows Server 2012 and have selected one service from the Services tab, such as Puppet Agent.
Puppet Agent is running fine now, but I want an alert through New Relic when I stop or restart that service. Please let me know how I can do this.

Regards,
jaspreet


#64

@kyle I think I’m right in saying that at the moment New Relic only collects metrics like CPU usage and memory of services that are using the agent, as well as the top certain number of processes, but not for all services, not the state of a service and it isn’t possible to look at a certain process, e.g. a particular service that is using Java rather than all Java processes? Also like you say New Relic can alert when a reported value drops below a certain threshold, but not when no value is reported, i.e. the service stopped.


#65

Hey, I would like a clearer explanation. I didn’t understand exactly what you are trying to tell me. Please explain it in terms of my question as soon as possible.


#66

Hi @Trevor_Dearham - The server monitor (I’m using the generic, as this applies to Linux as well) collects metrics for every single process on the system, active or not (a running process that is idle is still recorded). This is one of the reasons it can run into an issue with zombie PIDs. While we see this almost exclusively with Windows, there has been the occasion or two where it has popped up in Linux. It is extremely rare, but not impossible. Regardless, every process, background or foreground, running on the system is queried, metrics are created, and the data reported to New Relic.

The Processes report, on the other hand, only displays the top 20. For a bit of an extreme example, it is possible (though incredibly unlikely) that of the top 20 CPU consumers, none were among the top 20 memory consumers. In such a case, we would have to be sure metrics for at least 40 of the processes were collected. And that could change, literally, minute to minute, so which ones do we include or exclude? How do we know in that third minute it won’t change all over again? To ensure aggregation over any given time window, every process must be queried and accounted for.

Exposure to that data is a different story. You get to see the top 20 memory and/or CPU consumers in the processes list. That’s it. No more. There is not a configuration option to see more or a report that will contain more. That does not mean the metrics are not there or accessible. In fact, you can get at them one of two ways. The first is in the Data Explorer in Insights. There is a “Metrics” option, and from there select the entity type (Servers), the server name, and click on ProcessSamples under the suggested searches. That may not expose every process running on the system, but I have yet to hit a maximum value for that list. My busiest test server has 48 running processes, and all of those show up. There may be a limit, so you might need to type in a process name for a more specific search or increase the time window to see previously running processes (there’s a drop-down in the upper right of the page, and the default is 30 minutes).

It is true that the server monitor does not collect data on the “state” of a service. Keep in mind, however, that the monitor reports on running processes. The lack of a process is something it’s just not designed to look for. Simply the idea of doing so opens a huge can of worms. For Windows, technically we could scan the registry for all installed services. But if we’re going to do that, wouldn’t we also need to include all installed applications? A good, and very important, example of why this would be necessary is the w3wp.exe process. That is the process that serves up web pages, but it is not initially spawned as a service. In fact, it is a child process of svchost.exe, which is started from the World Wide Web Publishing Service (the service name is W3SVC, but even this doesn’t match the process name). We would definitely want to track the usage for this process.

Services would, in the end, be the easy part. Obtaining the state of a particular service in Windows is actually pretty simple. Processes that do not run as a service are a different story. For those, it’s not so much determining the state (not running would be considered “stopped” and running would be considered “started”), but rather how doing the lookup itself would impact system resources. My simple test system has several thousand CLSIDs in the registry. The code couldn’t pick and choose. Any one of them could be important. Iterating through all of them during every single sample cycle (which occurs every 20 seconds) has the potential of significantly impacting server performance. Worse, much like the zombie PID issue, there would be more occurrences of the monitor not being able to finish the sample before the next cycle began, which would in turn cause the monitor to start consuming either CPU or memory at a grossly accelerated rate.
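To illustrate how simple the service-state lookup can be, here is a minimal sketch that reads the state from the output of the built-in Windows `sc query` command. It is not how the server monitor works; the function names and the sample output are illustrative, and the parser assumes the English-language `STATE` label that `sc` prints.

```python
import re
import subprocess
from typing import Optional

# Matches the STATE line of `sc query <name>` output, e.g. "STATE : 4  RUNNING".
_STATE_LINE = re.compile(r"^\s*STATE\s*:\s*(\d+)\s+(\w+)", re.MULTILINE)


def parse_service_state(sc_output: str) -> Optional[str]:
    """Return the state word (e.g. 'RUNNING', 'STOPPED'), or None if no STATE line is found."""
    match = _STATE_LINE.search(sc_output)
    return match.group(2) if match else None


def query_service_state(service_name: str) -> Optional[str]:
    """Run `sc query` (Windows only) and return the parsed state (hypothetical helper)."""
    result = subprocess.run(
        ["sc", "query", service_name],
        capture_output=True, text=True, check=False,
    )
    return parse_service_state(result.stdout)


# Sample output used for illustration only.
SAMPLE = """\
SERVICE_NAME: W3SVC
        TYPE               : 20  WIN32_SHARE_PROCESS
        STATE              : 4  RUNNING
                                (STOPPABLE, PAUSABLE, ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
"""

print(parse_service_state(SAMPLE))
```

On a Windows host, `query_service_state("W3SVC")` would run the real command; the parser itself can be exercised anywhere with captured output.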

That doesn’t mean there aren’t possible solutions. The one that comes immediately to mind is providing configuration options that allow you to specify processes of particular concern, which in turn could “force” the monitor to report on their availability or lack thereof. This would include “state” for particular services as well as time spans for non-service processes that haven’t run during a specified period. This would, of course, require a feature request as appropriate code would need to be included to make this happen.

Doing it this way would also address the concerns with Linux. One of the beauties of Linux, and subsequently, potential issues, is the fact that there is no concept of anything like the Windows registry. Consequently, the only way you could determine what applications were installed would be if you were using a package manager, and had used it for every installed application. Even then, the number of package managers available in the wild would make coding the server monitor to determine all installed packages something of a nightmare. With the option to specify the process or daemon you want to check the state on, that concern would be alleviated. Still, I suspect there would be an upper limit to the number of processes, services, or daemons you could specify so as to ensure the monitor still doesn’t end up causing resource issues.

Right now, though, we can alert on metric data based on the metric name in New Alerts. As long as the monitor/agent (this applies to APM as well) is reporting, zero or null is a valid value. Setting your reporting criteria to < 1 (I would add a second condition, or possibly make it the only condition, equivalent to =0 as a fractional value would fulfill the < 1 criteria) means a result with zero will trigger an alert, as long as the metric exists and the monitor itself is still reporting. I suggest using throughput as the measure because the minimum threshold is five minutes, which means that the process would have to have no activity at all (assuming throughput=0) in that time period before it triggered an alert. (Keep in mind there are multiple “alert trigger” time spans; 10m, 15m, 30m, 1h, 2h.)
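The threshold logic described above can be sketched roughly as follows. This is only a simplified model of the idea, not the actual New Alerts evaluation: it assumes one throughput sample per minute, uses `None` to stand for "nothing reported" (monitor down or metric missing), and the function name is made up for illustration.

```python
from typing import Optional, Sequence


def breaches_zero_threshold(
    samples: Sequence[Optional[float]], window_minutes: int = 5
) -> bool:
    """Return True if every sample in the most recent window exists and equals zero.

    samples: one throughput value per minute, oldest first.
             None means no value was reported for that minute (assumption).
    """
    if window_minutes <= 0 or len(samples) < window_minutes:
        return False  # not enough data to cover the whole window
    recent = samples[-window_minutes:]
    # A missing value is not a zero: the metric must exist for the condition to apply.
    return all(value is not None and value == 0 for value in recent)


print(breaches_zero_threshold([3, 0, 0, 0, 0, 0]))    # five trailing zeros
print(breaches_zero_threshold([0, 0, 0, 0.5, 0]))     # fractional value breaks the run
```

Using "equal to 0" rather than "< 1" keeps a fractional throughput such as 0.5 from counting as a breach, which is the distinction made in the paragraph above.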

With respect to looking at specific Java processes, I can offer a workaround for this as well, but it does take some effort. Both Windows and Linux report on individual processes based on their owner (in APM, the owner is in parentheses to the right of the process name). The metric name includes the process owner as well. If you want Java processes to report as separate processes, you can change the user account that runs each individual process. Create owner names that work best for organizational purposes and you can not only get CPU/memory usage based on the individual processes, but alerting as well.

Finally, alas, we come to one of the cruxes of the issue. To qualify for New Alerts, you must have a paid account. New Alerts is not available for free accounts. That is something I’m afraid there is no workaround for.

Hopefully this will help you in understanding the possible options.


#67

Hi, thanks for sending this useful article. To be clear, though, please let me know whether the requirement I mentioned above can be implemented or not. If it can, please give me a clear-cut solution.

Thanks,
Jaspreet Singh


#68

@kyle Thank you for all the detail.

I think making this feature request generic so that it works against all processes is too problematic. As you say, detecting the lack of a process when you have many process instances of java.exe is going to be difficult, which is the main point of this request.

As this feature request is specifically about Windows services, it is possible to get the state of all Windows services, however it may not be simple to know the intended state of all services. Not all services that start automatically will be running when server monitor is running, e.g. Delayed Start or Trigger Start. They may also only run for short period and then stop, e.g. Software Protection. Therefore I do think that it should be a user defined list of services to check to avoid alerts about services that the user deems unimportant.

I do think that this might be able to be extended to Linux, assuming that service or systemctl is present and the service supports checking the status.

The owner workaround is clever and might be a good security practice, although generally most services use Local System, Local Service or Network Service.

I’m surprised to hear that New Relic alerts shouldn’t be available on free accounts, as I have a few free accounts and one paid account, due to permissions only being available for Synthetics at the moment, and I have New Relic alerts on all those accounts.


#69

Hi,
So what is the final conclusion on Windows service monitoring? Please let us know whether it is possible to implement or not.


#70

@Trevor_Dearham I think the best way to file this is to request two new options. One is to define specific processes and services (by service name) that the monitor should be checking to determine whether A) in the case of a process name, is it running or not and B) in the case of a service, what is the state, what is the intended state, and a “trigger alert” option for any of these. In XML context it could look something like the following for a service:

<serviceWatch enabled="true">
  <service name="W3SVC" stateCheck="running" alertStart="false" alertStop="true" alertNotActive="600000" owner="SYSTEM" />
  <service name="WdiSystemHost" stateCheck="stopped" alertStart="true" alertStop="true" owner="SYSTEM" />
</serviceWatch>
<processWatch>
  <process name="java.exe" owner="JohnDoe" alertNotActive="900000" />
</processWatch>

This just outlines the possible options, not any real configuration. It would also require two code modifications; one in the monitor and the other in alerts.
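To make the proposal a little more concrete, here is a rough sketch of how a monitor might read such a watch list. Everything in it is hypothetical: the option does not exist, the wrapping `<watch>` root element and the element names are assumptions made so the snippet parses as well-formed XML, and the `alertNotActive` values are kept as raw integers because the proposal does not state their unit.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration, wrapped in a made-up <watch> root so it is one document.
CONFIG = """
<watch>
  <serviceWatch enabled="true">
    <service name="W3SVC" stateCheck="running" alertStop="true" alertNotActive="600000" owner="SYSTEM" />
    <service name="WdiSystemHost" stateCheck="stopped" alertStart="true" alertStop="true" owner="SYSTEM" />
  </serviceWatch>
  <processWatch>
    <process name="java.exe" owner="JohnDoe" alertNotActive="900000" />
  </processWatch>
</watch>
"""


def load_watch_list(xml_text: str):
    """Return (services, processes) parsed from the hypothetical watch configuration."""
    root = ET.fromstring(xml_text)
    # Service name -> intended state, as declared by the user.
    services = {
        el.get("name"): el.get("stateCheck")
        for el in root.findall("./serviceWatch/service")
    }
    # (process name, owner) -> alertNotActive value, unit left uninterpreted.
    processes = {
        (el.get("name"), el.get("owner")): int(el.get("alertNotActive", "0"))
        for el in root.findall("./processWatch/process")
    }
    return services, processes


services, processes = load_watch_list(CONFIG)
print(services)
print(processes)
```

Keying processes by both name and owner mirrors the owner-based workaround for telling several java.exe instances apart.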

@jaspreet1.singh to get alerts when a specific process stops reporting, find the exact metric name first. If you use the Data Explorer to get the metric name, note that the displayed name contains spaces between each segment, e.g. ProcessSamples / SYSTEM / w3wp. These spaces are included for readability. If you intend to copy/paste the metric name from there, you will need to remove the spaces.
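If you copy a lot of metric names, a small helper can strip the display spacing for you. This is just a sketch with a made-up function name; it assumes only the spaces around the slashes are display padding, so an owner name that itself contains a space is left alone.

```python
import re


def normalize_metric_name(displayed: str) -> str:
    """Remove the readability spaces around '/' separators in a displayed metric name."""
    return re.sub(r"\s*/\s*", "/", displayed.strip())


print(normalize_metric_name("ProcessSamples / SYSTEM / w3wp"))
```

For the example above this yields `ProcessSamples/SYSTEM/w3wp`, which is the form to paste into the “Enter metric name” field.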

Next, in New Alerts, create a new policy or edit a current policy, and select the option to create a new condition. Select the entity type (Server), the specific servers you want the condition to apply to, and on the “Define thresholds” screen, select “Enter metric name” in the “When target server” drop-down. My suggestion is to use “has a throughput”, “equal to”, and set the value to zero. If the selected metric has a zero value for more than five minutes, it should then alert.


#72

@kyle That looks good to me. The only thing that I’d consider adding was the ability to optionally use the service display name instead of the name for Windows services, but that might cause issues if someone creates two services with the same display name.


#73

Thanks for this article, but I already tried that. My question is this: I want to monitor one service, let’s say Puppet Agent, which does not appear in my New Relic account but is available on my server. How can I get an alert from New Relic, and an email, when somebody (or I) stops or restarts that service on the server?

Please send me a solution for that.


#74

@jaspreet1.singh can you clarify this a little for me? Are you saying that you are not using a server monitor product? From your description it sounds like you want to monitor that specific service, but nothing else. Right now the only way to do this would be to create a plugin that monitors that service, determines its state, reports metrics accordingly, then use the specific metric values for determining the conditions for an alert.

I would avoid using zero for metric values to ensure you don’t end up with a null value. You’ll also want to build in logic that causes the plugin to report a state change for at least five minutes (I’d suggest 10) before going back to nominal so you don’t end up with alert incidents that remain open for long periods. The only possible exception would be for the “stopped” state, for which you’ll likely want the alert to remain open until that state no longer exists. A start or restart change will require a threshold breach for at least five minutes in order to get an alert email.
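The hold logic described above might look something like the sketch below. It is not a working New Relic plugin: the class name, the metric values (1, 2, 3) and the "running"/"stopped" state strings are all assumptions chosen for illustration, and the reporting call to New Relic is left out entirely.

```python
# Hypothetical non-zero metric values, so a reported value is never zero or null.
NOMINAL, STARTED, STOPPED = 1, 2, 3


class StateChangeReporter:
    """Turns sampled service states into metric values, holding a 'started' change."""

    def __init__(self, hold_seconds: float = 600):
        self.hold_seconds = hold_seconds  # 10 minutes, per the suggestion above
        self._last_state = None
        self._hold_until = 0.0

    def metric_value(self, state: str, now: float) -> int:
        """state is 'running' or 'stopped'; now is a timestamp in seconds."""
        previous, self._last_state = self._last_state, state
        if state == "stopped":
            # Report STOPPED for as long as the service stays stopped.
            return STOPPED
        if previous == "stopped":
            # The service just came back: hold the STARTED value for a while.
            self._hold_until = now + self.hold_seconds
        return STARTED if now < self._hold_until else NOMINAL


reporter = StateChangeReporter(hold_seconds=600)
for state, t in [("running", 0), ("stopped", 20), ("running", 40), ("running", 640)]:
    print(t, state, reporter.metric_value(state, t))
```

Note that a restart that completes entirely between two samples would not be seen by this approach; detecting that would need something extra, such as comparing process IDs.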

If I’m misunderstanding, please provide additional details so I can appropriately address the question.


#75

Hello!

Can someone (perhaps Kyle?) tell me if the possibility to define certain services and processes to watch is implemented in new relic infrastructure? I would love to see the functionality


#76

Hi, @martin_helgesen: New Relic Infrastructure monitors everything, but you can configure it to notify you if an individual process stops running: https://docs.newrelic.com/docs/infrastructure/new-relic-infrastructure/infrastructure-alert-conditions/alert-infrastructure-processes.


#77

Hi @philweber

Yes New Relic Infrastructure monitors everything, but I can’t alert according to service name. Yes I can alert by various other details, but some of my services are Java based, as well as a Java based New Relic plugin, so I can’t rely on Java.exe to filter just to my services.

I should be able to use “commandLine contains” as all the services have the same root folder, but that doesn’t seem to select the Java processes. Although if it did work and someone ran a script that was in the same root folder then it would probably be part of the selected processes.

It would be helpful to use some of the Inventory details in alerts, like the service list.


Feature Idea: Alert on service names
#78

Reading through this topic, it seems that there is a fundamental misunderstanding regarding Windows services. The topic is continually directed toward process monitoring. This would work fine if not for the fact that many Windows services run under the same process. For example, svchost runs a multitude of different services under multiple instances of svchost. The workstation I am using is running 14 instances of svchost, with 75 Windows services running under those 14 svchost processes. Until the product can monitor those services independent of the executable and provide service state (Started, Stopped), any solution is purely an attempt to use an alternative method to solve a problem it isn’t intended to solve. Process monitoring is not a replacement for service monitoring.


#79

@jeffj803 I agree that process monitoring is not a replacement for service monitoring, but I wouldn’t expect services that aren’t components of Windows to be running inside of the svchost process. Even some of the Windows components have their own processes, e.g. IIS Worker Processes.

In my case I want to either monitor services of 3rd party programs or programs that I’ve built and so monitoring processes could be a work around, although it has many limitations that would be overcome by actual service monitoring.


#80

A post was split to a new topic: Feature Idea: Alert on service names