Ever wonder what New Relic Windows Server Monitor (WSM) is doing under the hood? This post shows how to use WSM’s verbose logging to see what the agent is up to and to troubleshoot a few rare issues that might crop up.
Where are the logs?
WSM writes to the Windows Application event log, so use Event Viewer to browse its entries. WSM also provides a custom view that isolates its log entries. To view them:
- Launch the Windows Event Viewer.
- Under Custom Views, select New Relic Windows Server Monitor Events View.
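If you prefer the command line to Event Viewer, the built-in Windows wevtutil tool can dump the same entries. The Python sketch below only builds and runs a `wevtutil qe` command. The provider name `New Relic Server Monitor` is a hypothetical placeholder: check the Source column of a WSM entry in Event Viewer and substitute that value.

```python
import subprocess

# Hypothetical provider name: replace it with the "Source" shown for a WSM
# entry in Event Viewer.
WSM_PROVIDER = "New Relic Server Monitor"


def build_wevtutil_args(provider: str, count: int = 50) -> list[str]:
    """Build a wevtutil command that prints the newest `count` Application
    log events written by `provider`, as plain text."""
    xpath = f"*[System[Provider[@Name='{provider}']]]"
    return [
        "wevtutil", "qe", "Application",
        f"/q:{xpath}",  # XPath filter on the event provider
        f"/c:{count}",  # maximum number of events to return
        "/rd:true",     # newest events first
        "/f:text",      # human-readable text instead of XML
    ]


def dump_wsm_events(provider: str = WSM_PROVIDER, count: int = 50) -> str:
    """Run wevtutil (Windows only) and return its text output."""
    result = subprocess.run(build_wevtutil_args(provider, count),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `print(dump_wsm_events(count=20))` on the server would print the 20 most recent matching entries.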
Enable Verbose Logging
WSM doesn’t log much of interest at its “DEFAULT” level. You’ll want to bump that up to “VERBOSE” level to see what’s really going on. To do that:
- Launch the Server Monitor Configuration Tool as Administrator: from the Windows Start menu, select All Programs > New Relic > Server Monitor > Server Monitor Configuration Tool.
- Select the Support tab, then increase the Log level to VERBOSE.
- Select Save Settings, and select Yes if prompted to restart the service.
- Let WSM run long enough to capture the event of interest.
Let’s Examine a Few Log Entries
WSM generally works seamlessly, but below we call out a few issues that can show up in WSM logs:
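If you export WSM events to text, a small scanner can flag the signatures covered in this post. This Python sketch is a plain, case-sensitive substring match. The signatures are copied from the log messages discussed below; the hint wording is our own, not the agent’s.

```python
# Substrings from the log entries discussed in this post, each paired with a
# short hint. Matching is a plain case-sensitive substring test.
KNOWN_SIGNATURES = [
    ("Process Sampling taking longer than 15 seconds",
     "Slow process enumeration; look for a process leaking handles."),
    ("do not possess a common algorithm",
     "TLS mismatch; check whether TLS 1.0 was disabled."),
    ("required .NET 4 patches are not installed",
     "Install the missing .NET 4 patches."),
    ("Could not establish trust relationship for the SSL/TLS secure channel",
     "Certificate problem; check the GeoTrust root certificates."),
    ("Unable to establish a connection with New Relic within",
     "Connectivity problem; check allowed addresses, ports, and proxy."),
    ("ManagementException: Invalid query",
     "Failing WMI query; the WMI installation may need repair."),
    ("ForceRestartError",
     "Service stopped on a restart request; restart the service."),
]


def classify(log_text: str) -> list[str]:
    """Return the hint for every known signature found in log_text."""
    return [hint for signature, hint in KNOWN_SIGNATURES
            if signature in log_text]
```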
Process Sampling taking longer than 15 seconds
This may be accompanied by increased CPU usage by WSM.
This is likely due to an application generating process handles that the system is not reclaiming. WSM calls System.Diagnostics.Process.GetProcesses() to get a process list, and when there are many unreclaimed process handles, that call can take seconds rather than the usual milliseconds.
The likely culprit is a process, perhaps a virus scanner, that orphans large numbers of process handles.
See this article for a discussion of the issue (which is not an agent issue per se).
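To check whether process enumeration itself is slow on a host, you can time a process listing. This Python sketch times whatever listing function you pass in and applies the 15-second threshold from the log message. It does not call the .NET API that WSM uses, so treat the result as a rough indication only.

```python
import time
from typing import Callable, Sequence

# Threshold taken from the WSM log message.
SLOW_THRESHOLD_SECONDS = 15.0


def time_process_listing(list_processes: Callable[[], Sequence],
                         threshold: float = SLOW_THRESHOLD_SECONDS):
    """Call list_processes once and return
    (process_count, elapsed_seconds, is_slow)."""
    start = time.perf_counter()
    processes = list_processes()
    elapsed = time.perf_counter() - start
    return len(processes), elapsed, elapsed > threshold
```

For example, on Windows you could pass `psutil.pids` from the third-party psutil package: `time_process_listing(psutil.pids)`.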
System.AggregateException: One or more errors occurred. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a receive. ---> System.ComponentModel.Win32Exception: The client and server cannot communicate, because they do not possess a common algorithm…
This error is likely due to the agent’s lack of support for TLS 1.1 and higher. Did you recently disable TLS 1.0 on your system? If so, you may need to re-enable TLS 1.0 or consider our Infrastructure product, which communicates via TLS 1.2. You may also want to look at the work-around described here.
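One way to check whether TLS 1.0 has been disabled for client connections is to read the SCHANNEL `Enabled` value under `HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.0\Client`. The Python sketch below reads it with the standard `winreg` module (Windows only). It is a partial check: it ignores `DisabledByDefault`, Group Policy, and any .NET-specific settings, all of which can also affect which protocols the agent can use.

```python
# Standard SCHANNEL location for per-protocol client settings.
TLS10_CLIENT_KEY = (r"SYSTEM\CurrentControlSet\Control\SecurityProviders"
                    r"\SCHANNEL\Protocols\TLS 1.0\Client")


def interpret_enabled(value):
    """Map the SCHANNEL 'Enabled' DWORD (None if absent) to a status."""
    if value is None:
        return "not configured (OS default applies)"
    return "disabled" if value == 0 else "enabled"


def read_tls10_client_enabled():
    """Read the TLS 1.0 client 'Enabled' value; Windows only."""
    import winreg  # standard library, but only available on Windows
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TLS10_CLIENT_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "Enabled")
            return value
    except FileNotFoundError:
        return None  # key or value missing: nothing explicitly configured
```

On the server, `interpret_enabled(read_tls10_client_enabled())` returning `"disabled"` would be consistent with the error above.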
Diagnostics have determined that the New Relic server monitor cannot be run at this time.
It appears one or more required .NET 4 patches are not installed
See this article for a resolution!
System.Net.WebException: The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel. ---> System.Security.Authentication.AuthenticationException: The remote certificate is invalid according to the validation procedure.
This sounds like a certificate issue and probably means you need to download and install the GeoTrust root certificates on your server.
You may also want to examine Windows System and Security event logs for further clues.
Finally, you may need to resort to a Wireshark capture to obtain a more specific reason for the failure.
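Before reaching for Wireshark, you can list which GeoTrust roots Python sees in the default trust store. This is a sketch under stated assumptions: on Windows, `ssl.create_default_context()` loads the system ROOT and CA stores, but the Certificates snap-in for the local computer remains the authoritative view, and matching on a `GeoTrust` organization name is an assumption about how those roots are labeled.

```python
import ssl


def subject_field(cert: dict, field: str) -> list[str]:
    """Collect every value of `field` from a certificate dict's subject."""
    return [value
            for rdn in cert.get("subject", ())
            for name, value in rdn
            if name == field]


def find_roots(certs: list[dict], org_substring: str = "GeoTrust") -> list[str]:
    """Return the commonNames of CA certs whose organizationName
    contains org_substring."""
    matches = []
    for cert in certs:
        orgs = subject_field(cert, "organizationName")
        if any(org_substring in org for org in orgs):
            matches.extend(subject_field(cert, "commonName"))
    return matches


def loaded_ca_certs() -> list[dict]:
    """CA certificates Python loaded from the default trust store."""
    return ssl.create_default_context().get_ca_certs()
```

On the server, `print(find_roots(loaded_ca_certs()))` printing an empty list suggests the GeoTrust roots are missing from the store Python reads.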
Run: SendContentAsync - Exception occurred
System.TimeoutException: Unable to establish a connection with New Relic within 3 seconds.
at NewRelic.ServerMonitor.Core.Collector.SessionedCollectorClientAdapter.SendContentAsync(IContent content, CancellationToken cancellation)…
Something is interfering with the agent’s connection to New Relic. First check that the appropriate addresses and ports have been whitelisted on your side.
If the agent has to communicate through a proxy, be sure it has been configured accordingly.
A Wireshark capture may provide a more specific reason for the failure.
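A quick way to test the first item is a plain TCP connection with the same 3-second limit the agent reports. In this Python sketch the host name is a hypothetical placeholder; substitute one of the addresses New Relic documents for your account. A direct connection bypasses any proxy, so a failure here is expected on networks where outbound traffic must go through one.

```python
import socket

# Hypothetical placeholder: substitute a collector address from New Relic's
# network requirements documentation.
COLLECTOR_HOST = "collector.example.com"
COLLECTOR_PORT = 443


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Try a direct TCP connection, mirroring the agent's 3-second limit."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False
```

For example, `can_connect(COLLECTOR_HOST, COLLECTOR_PORT)` returning `False` on the server points at a firewall or routing problem rather than at the agent.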
Query: SELECT Name,BytesSentPersec,BytesReceivedPersec,PacketsSentPersec,PacketsReceivedPersec,PacketsOutboundErrors,PacketsReceivedErrors,TimeStamp_Sys100NS,Frequency_Sys100NS FROM Win32_PerfRawData_Tcpip_NetworkInterface WHERE Name LIKE 'Red Hat VirtIO Ethernet Adapter'
System.Management.ManagementException: Invalid query
at System.Management.ManagementException.ThrowWithExtendedInfo(ManagementStatus errorCode)…
The above message shows a failing WMI query. It is one specific example; you may run into similar messages for other WMI queries. If these messages coincide with missing chart data, your WMI installation is likely corrupt and needs to be repaired. This is one of the more common causes of WSM not displaying chart data or not recognizing a device. Please see this article for more information on diagnosing the WMI queries used by WSM.
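To check whether a query fails outside of WSM, you can rerun it by hand. The Python sketch below wraps the query in a PowerShell `Get-CimInstance -Query` call, which uses the default root/cimv2 namespace, and returns PowerShell’s output; an invalid-query error there points at WMI rather than at the agent. The wrapper is our own illustration, not a WSM tool.

```python
import subprocess


def powershell_wmi_command(wql: str) -> list[str]:
    """Build a PowerShell command line that runs `wql` via Get-CimInstance."""
    # Inside a PowerShell single-quoted string, a single quote is doubled.
    quoted = "'" + wql.replace("'", "''") + "'"
    return ["powershell.exe", "-NoProfile", "-Command",
            f"Get-CimInstance -Query {quoted}"]


def run_wmi_query(wql: str) -> subprocess.CompletedProcess:
    """Run the query (Windows only); WMI errors appear in stderr."""
    return subprocess.run(powershell_wmi_command(wql),
                          capture_output=True, text=True)
```

Paste the query from the log message into `run_wmi_query(...)` and inspect `.stdout` and `.stderr` of the result.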
ProtocolError: ForceRestartError - ***Restart agent on stale config (account) launch=2017-04-02 02:28:18 config=2017-04-29 01:05:46
The New Relic collectors periodically issue a restart request to WSM so that it picks up configuration changes. WSM should handle this request gracefully by updating its configuration state and continuing, but it doesn’t; the service shuts down instead. The “work-around” is to restart the New Relic Server Monitor service.
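The restart can be scripted, for example from a scheduled task. This Python sketch reads the state from `sc query` and issues `sc start` only when the service reports STOPPED. The service key name `NewRelicServerMonitor` is an assumption; confirm the real name with `sc query state= all` or in the Services console. The parser also assumes the English-language output format of `sc`.

```python
import subprocess
from typing import Optional

# Hypothetical service key name; confirm it before relying on it.
SERVICE_NAME = "NewRelicServerMonitor"


def service_state(sc_query_output: str) -> Optional[str]:
    """Return the state word (e.g. 'RUNNING', 'STOPPED') from
    `sc query <name>` output, or None if no STATE line is found."""
    for line in sc_query_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "STATE":
            parts = value.split()
            return parts[-1] if parts else None
    return None


def restart_if_stopped(service: str = SERVICE_NAME) -> bool:
    """Start the service if it is STOPPED (Windows, admin rights).
    Returns True if a start was issued."""
    status = subprocess.run(["sc", "query", service],
                            capture_output=True, text=True)
    if service_state(status.stdout) != "STOPPED":
        return False
    subprocess.run(["sc", "start", service], check=True)
    return True
```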
As you can see (and probably already know), error messages are not always transparent. We hope the above helps you solve or work around any WSM problems you encounter.
You may also need to do the usual googling to get to the truth behind a WSM log message.
Be sure to search discuss.newrelic.com for more WSM troubleshooting solutions.