Preface: The payload size limit for APM agents is 2MB. There’s also a limit of 10k metrics. I believe those same limits apply to the collector for plugins. At this point, however, I think that may be irrelevant?
Before I get into the possible issues here, I’d like to point out that the last work done on this plugin was three years ago. More importantly, the only change implemented was a modification to the readme file. The last actual code update was almost five years ago, and that was simply to add the Azure MSSQL database as an available option (the connection strings are significantly different).
One of the requirements for the plugin is that .NET 3.5 must be installed. I don’t mean .NET 3.5 or higher. I mean .NET 3.5. The plugin was developed on this version of the framework, and it relies on legacy code in that version. If that code isn’t available, the plugin will not work properly.
Next, are you still seeing the following in the log file?
|ERROR|Context|Unexpected response from the New Relic service. StatusCode: RequestEntityTooLarge (Request Entity Too Large), BodyContents:
If you are, the plugin is doing something it shouldn’t oughta. You didn’t give us anything past the 86 metrics, so I cannot tell for sure. 2MB worth of data is roughly 2,000,000 characters. I can see the payload exceeding that (easily, in fact) with 27,079 metrics. Alternatively, the collector may be rejecting the payload because it exceeded the 10k metric limit, though for APM agents that limit is regulated by the agent, not the collector. It could be that exceeding it was unexpected, which would explain why the entire payload was originally rejected.
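To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 2MB and 10k limits and the 27,079 metric count are the figures quoted above; the function names are mine, and any per-metric size you feed in is an assumption, since I have not seen the actual payload.

```python
# Rough payload arithmetic using the limits quoted in this post.
PAYLOAD_LIMIT_BYTES = 2_000_000   # 2MB payload limit
METRIC_LIMIT = 10_000             # 10k metric limit


def max_avg_bytes_per_metric(metric_count, limit=PAYLOAD_LIMIT_BYTES):
    """Average bytes each metric may use before the payload hits the limit."""
    return limit / metric_count


def exceeds_limits(metric_count, avg_bytes_per_metric):
    """Return (too_big, too_many) for a payload of the given shape.

    avg_bytes_per_metric is an estimate supplied by the caller,
    not something measured from the plugin.
    """
    too_big = metric_count * avg_bytes_per_metric > PAYLOAD_LIMIT_BYTES
    too_many = metric_count > METRIC_LIMIT
    return too_big, too_many
```

Calling max_avg_bytes_per_metric(27_079) gives a budget of a little under 74 bytes per metric, which leaves very little room once a metric name and its values are serialized. At 86 metrics, neither limit should be anywhere close.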
Prior to getting into possible connectivity issues, I do want to address the maximum instance limit for the plugin. The main difference here is between the theoretical and the practical. “Theoretically”, with enough memory and CPU, there is no such thing as an instance limit for the plugin. After nearly six years of working with this plugin (a lot), I can tell you from experience that the practical limit is somewhere around 20, and I would not recommend more than 10 unless you intend to dedicate an application server to running just this plugin. If that is the case, you can probably get away with as many as 50, but you’ll want at least 32 cores and 128GB of RAM.
I’m going to move on to connectivity, but if you have not already done so, turn debug logging on to look at this:
You’ll need to restart the service after setting this. Debug output will sometimes reveal the underlying cause of a connection problem where the info-level logs do not. I’ll leave it at that, since once it is turned on you might find exactly what the problem is.
Another issue might have to do with the protocols. Remember, this plugin was built on .NET 3.5. The newest security protocol at that time was TLS 1.0. There is no provision in that version of the framework for TLS 1.1 or 1.2. If the host is restricted on this (most servers only allow for TLS 1.2 these days), the plugin will run into an encryption algorithm mismatch, and the connection will get rejected. The following document is specific to the .NET agent, but it applies in this case as well:
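If you want to check which protocol versions an endpoint will accept, a small probe like the following Python sketch can help. It is not part of the plugin, the function names are mine, and the host and port are whatever endpoint you suspect is rejecting the connection. One caveat: many current OpenSSL builds refuse TLS 1.0 on the client side, so a False result for TLS 1.0 may reflect the machine running the probe rather than the server.

```python
import socket
import ssl


def make_context(version):
    """Build a client context pinned to exactly one TLS version."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Only the protocol handshake is being tested, so skip certificate checks.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def accepts_tls(host, port, version, timeout=5.0):
    """Return True if host:port completes a handshake at the given version."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with make_context(version).wrap_socket(raw, server_hostname=host):
                return True
    except OSError:
        # ssl.SSLError is a subclass of OSError, so handshake failures,
        # refused connections, and timeouts all land here.
        return False
```

For example, compare accepts_tls(host, 443, ssl.TLSVersion.TLSv1) against accepts_tls(host, 443, ssl.TLSVersion.TLSv1_2). If only the second succeeds, a client limited to TLS 1.0 will be turned away.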
Keep in mind, the plugin does not have to live on the MSSQL server. As long as the instances are set to accept external TCP/IP connections and the plugin is installed on a host in the same network segment, it can be on a different server.
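A quick way to confirm that an instance is reachable over TCP/IP from the plugin host is a plain socket test, sketched below in Python. The function name is mine. Port 1433 is the default for a default SQL Server instance; named instances are often on a different port, so treat the default here as an assumption and pass the real one.

```python
import socket


def can_reach(host, port=1433, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

This only proves the port is open from where the plugin runs. It says nothing about whether the login in the connection string is valid.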
Should the plugin continue to report that traffic is rejected due to the size of the payload, the next step would be to uninstall the plugin, then install it again. Hold on to the plugin.json file if you do this so you don’t have to remake the configuration file. The newrelic.json file is just for the license key, log levels, and proxy settings. If you have proxy settings, you can save the file or just copy them somewhere to be added back in. Then, install the plugin again. Assuming you’re using the NPI installer, run npi -h from an administrator command prompt to output the list of options, then remove the service prior to removing the plugin. If you don’t remove the service first, you’ll get an error running the
The uninstallation/installation process only takes a few minutes. If that doesn’t fix the large payload issue, it will be time to open a support ticket and do some live troubleshooting.
I would like to encourage you to check out the MSSQL On-Host Integration:
This does require an Infrastructure Pro subscription, but there is a lot more detail and flexibility in using this option. Probably the greatest advantage is that the metrics are stored in an Insights datastore and can be queried using NRQL. There is a version of the MSSQL plugin that does this, but it is not authored by New Relic, and would therefore require support from the author or through the community (I think there is a license fee as well).
There is one last item I want to bring up with regard to this plugin. The following log file is built as a buffer to populate the agent log:
Due to a bug in the plugin, that log will grow and grow and never stop. I’ve seen it close to a terabyte in size before. As it is likely no further development will be done on this plugin, it is very doubtful this will change. My recommendation is to create a daily/weekly/monthly job in the Windows Task Scheduler, and have it run a batch file that stops the plugin, deletes that specific file, and starts it back up again. This should take less than two seconds and no metric data loss will occur if it is done in this manner.
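The stop, delete, start sequence can be sketched as follows. I suggested a batch file, and that works just as well; this is the same idea in Python so the steps are explicit. SERVICE_NAME and LOG_PATH are placeholders, not the real values, so substitute the actual service name and the path of the log file mentioned above. The net stop and net start commands are the standard Windows ones and need an elevated account.

```python
import os
import subprocess

# Placeholders only: replace with the real service name and log file path.
SERVICE_NAME = "HypotheticalPluginService"
LOG_PATH = r"C:\path\to\the\buffer.log"


def reset_log(service=SERVICE_NAME, log_path=LOG_PATH, run=subprocess.run):
    """Stop the service, delete the runaway log, then start the service."""
    run(["net", "stop", service], check=True)
    try:
        if os.path.exists(log_path):
            os.remove(log_path)
    finally:
        # Restart even if the delete failed, so the plugin is not left down.
        run(["net", "start", service], check=True)
```

Point a Task Scheduler job at a script that calls reset_log() and set it to run with highest privileges. The run parameter exists only so the sequence can be dry-run without touching a real service.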