There is a complete walkthrough on troubleshooting a non-reporting agent linked below. It should help you track down what is happening.
Can you give that a shot and let us know if you find the cause? If it doesn’t help you out, please let us know where you get stuck in the process.
@stefan_garnham Windows, .NET 4.6.1
Agent version is 4.3.123
Yes, we tried to get guidance from the published guide.
Is there anything else we can try outside of the guide?
Thank you. Yes, we tried that. In our scenario it was already working and then suddenly stopped at 03:00 MT.
The first step is always determining if the agent is creating logs for the application.
Can you follow this post and let me know if the agent is creating logs for the app?
In short, the process is this…
- Delete the existing logs at C:\ProgramData\New Relic\.NET Agent\Logs.
- Restart your application.
- Generate some traffic to your app.
- Find the current process ID (PID) for the application.
- Check the logs directory for a file called
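The steps above can be sketched as a small batch script, run from an elevated prompt (the app pool name is a placeholder; adjust it to your application):

```shell
@echo off
REM 1. Delete the existing agent logs
del /q "C:\ProgramData\New Relic\.NET Agent\Logs\*"

REM 2. Restart the application (here: recycle an IIS app pool; name is a placeholder)
%windir%\system32\inetsrv\appcmd recycle apppool /apppool.name:"MyAppPool"

REM 3. Generate some traffic to the app, then
REM 4. list the worker processes to find the application's PID
%windir%\system32\inetsrv\appcmd list wp

REM 5. Check the logs directory for a profiler log referencing that PID
dir "C:\ProgramData\New Relic\.NET Agent\Logs"
```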
Thank you. The link does not seem to be working. I will try that again during off-peak hours for minimal impact and will update the thread.
Sorry about that. I fixed the link.
For our scenario, there was an Azure-initiated change to the applicationhost.config that changed the application pool ID, so the PID the agent was monitoring no longer exists. The good old machine restart did not work. We also noticed that even with new images the agent hasn’t started at all. Looking at 54152, it seems we have the issue of not having enough permissions on the registry key. We are running the startup task to execute newrelic.cmd in an elevated context, so we’re not sure what changed on May 8. Any ideas?
I think that line provides your line of investigation. What did Microsoft change that impacted the applicationhost.config? It could be an OS upgrade or patches that were applied to the VM.
I doubt New Relic would be able to answer that question, unless you are also running the Infrastructure agent on that VM and can view the changes in the Inventory for the VM.
What jumps out at me is you’re running a version of the agent that is over four years old. That version of the agent doesn’t even support async code. Further, it was built on the .NET 3.5 framework and has some significant issues with TLS v1.1, v1.2, and v1.3. This would be a good time to seriously consider upgrading.
IMPORTANT: Upgrading to the current version will result in some immediately apparent changes. There have been a ton of improvements with transaction naming, so you will likely see transaction names you are not used to. As well, the agent used to track database calls and external requests outside the context of a transaction, which was a bug. That has been fixed. The tracking of async methods in a transaction can cause changes in the charts that might be confusing at first. More detail about that can be found here:
It would be a good idea to read through the release notes for all of the releases since the one you are on:
Finally, I want to address the Azure side of things real quick. @stefan_garnham is definitely on the right track in asking what Microsoft change might have affected this. We see this issue with App Services all the time. A previously reporting application will suddenly stop reporting with no discernible difference. However, when we look at the settings for the application instance, we find there was an “Azure restart event” that took place. This may have been as inconsequential as moving the instance to a new platform, but the result is often that the agent quits reporting. An instance restart typically solves this, though. That said, you may want to review the Diagnose and solve problems section in the Azure portal virtual machine menu for the system(s). There may be something in there that would explain why this happened when it did.
There was a PaaS diagnostics upgrade from 1.12.2 to 1.14.0, and there was also a guest OS upgrade. What we are wondering is whether anyone else is encountering the same issue. It seems that the permission on HKLM\SOFTWARE\New Relic\.NET Agent\ is no longer there.
Thanks @kyle for jumping in.
During the troubleshooting steps we tried:
- rebooting the actual machine
- upgrading to the latest NuGet package
but the issue that the agent does not have access to HKLM\SOFTWARE\New Relic\.NET Agent\ persists.
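One way to confirm that kind of permission problem is to inspect the ACL on the key from an elevated PowerShell prompt (a sketch; the path assumes the default install location):

```shell
REM Show the owner and access rules on the agent's registry key
powershell -NoProfile -Command "Get-Acl 'HKLM:\SOFTWARE\New Relic\.NET Agent' | Format-List Owner, AccessToString"
```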
Since you mentioned this is a Windows environment, I wonder if this is related to something we’ve seen lately where Windows updates are removing New Relic registry keys.
There’s a script you can run to restore the settings mentioned here:
Let me know if that helps!
We have absolutely the same issue with our services and two of our critical production services stopped logging anything to NR a week or so ago.
We install the NR agent when deploying the Azure Cloud service, as one of the startup tasks.
First of all, we started by upgrading the NR .NET Agent from a 4.* version to 8.15.455 and redeploying the service, but it did not help.
So we found this thread and tried manually repairing the NR Agent installation, as suggested in your latest link. And it helped.
I checked everything suggested in this thread and this one. The installation process doesn’t set the COR_ENABLE_PROFILING and COR_PROFILER environment variables (at HKLM\SYSTEM\CurrentControlSet\Services\W3SVC\Environment), although the ‘repair’ process does.
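A quick way to check whether those variables are present for the W3SVC service is an elevated command prompt (a sketch; the key and value name are taken from the path above):

```shell
REM List the Environment value on the W3SVC service key; if the agent is
REM wired up it should contain COR_ENABLE_PROFILING=1 and a COR_PROFILER entry
reg query "HKLM\SYSTEM\CurrentControlSet\Services\W3SVC" /v Environment
```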
But since we install NR during deployment, it seems quite strange to have to ‘repair’ the installation right after installing your agent. I would expect your installation process to work in exactly the same way as the ‘repair’ process; does that make sense?
So what would be your suggestion? Manually ‘repair’ the installation after each deployment, and do the same during the auto-scaling procedure (actually, no, that’s impossible)? Or could you kindly submit a ticket to your engineers to fix the issue, as it looks quite urgent?
Thank you in advance for your help.
We’ve seen this issue quite a few times recently with Azure Cloud services.
In all of the cases we’ve seen so far, the agent installation actually does succeed and set the registry keys correctly, but then something in the VM, outside of the agent, runs after that and removes them. We don’t know what that something is, but our theory is that it has something to do with the Microsoft Diagnostics agent, which also uses the same registry keys.
The purpose of those registry keys is to set the required environment variables in IIS processes, so the workaround would be to set those variables system-wide. That way all processes get the variables, even if the registry keys are deleted.
The easiest way to do that would be to have the .NET agent set the variables with its “Instrument all .NET applications” feature. If you’re installing with the .msi from the command line (msiexec.exe), you can use the INSTALLLEVEL=50 flag. If you’re installing with the NewRelicWindowsAzure NuGet package, you can alter the newrelic.cmd file that comes with the package to add that flag to the msiexec command.
Hope this helps.
Thank you @dmorris!
I tried using INSTALLLEVEL=50, but it did not help, and I don’t see these environment variables (system-wide) after installing your msi from the command line.
We will try to set up these variables manually in newrelic.cmd file on Monday. If you already have a script to do this and can share it, it would be really helpful.
I’ve found that the easiest way to fix this by editing newrelic.cmd is to find the following lines:
IF "%IsWorkerRole%" EQU "true" (
    msiexec.exe /i %NR_INSTALLER_NAME% /norestart /quiet NR_LICENSE_KEY=%LICENSE_KEY% INSTALLLEVEL=50 /lv* %RoleRoot%\nr_install.log
) ELSE (
    msiexec.exe /i %NR_INSTALLER_NAME% /norestart /quiet NR_LICENSE_KEY=%LICENSE_KEY% /lv* %RoleRoot%\nr_install.log
)
The only thing to change would be to add INSTALLLEVEL=50 to the ELSE statement as well, per below (so that when web roles are used, it adds the variables system-wide):
IF "%IsWorkerRole%" EQU "true" (
    msiexec.exe /i %NR_INSTALLER_NAME% /norestart /quiet NR_LICENSE_KEY=%LICENSE_KEY% INSTALLLEVEL=50 /lv* %RoleRoot%\nr_install.log
) ELSE (
    msiexec.exe /i %NR_INSTALLER_NAME% /norestart /quiet NR_LICENSE_KEY=%LICENSE_KEY% INSTALLLEVEL=50 /lv* %RoleRoot%\nr_install.log
)
This might cause more logs to be generated in the New Relic logs folder, so be aware that it may take up some space on the drive.
Hope that helps!
If you look at the scripts inside the IF-ELSE statement, they’re absolutely the same, so I just removed the IF-ELSE statement altogether and kept only the line with INSTALLLEVEL=50.
And as I wrote half an hour ago, it does not work; no events are pushed to the NR portal.
So, yes, I agree that it’s the easiest way, but I’d prefer some harder way that actually works.
Right now I have manually added the two environment variables in our Stage environment and restarted IIS, and everything works well, so I’m planning to implement this with a command-line script on Monday.
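For anyone following along, the manual workaround above can be sketched like this from an elevated command prompt. The profiler GUID shown is the one the .NET Framework agent has historically registered; verify it against your own installation before using it:

```shell
REM Set the profiler variables machine-wide so all processes,
REM including IIS worker processes, inherit them
setx COR_ENABLE_PROFILING 1 /M
setx COR_PROFILER "{71DA0A04-7777-4EC6-9643-7D28B46A8A41}" /M

REM Restart IIS so the worker processes pick up the new variables
iisreset
```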
We worked with MSFT and determined that in our case it was caused by the Guest OS upgrade for Family 5. While I can work around it on my system by not scaling and leaving it up all the time, I suggest that NR work with MSFT to expedite a resolution.
@wsantos3 I’ve notified our product manager of this to see whether we can engage Microsoft on this. Ultimately, this is in Microsoft’s hands, because as you said, this is caused by a Microsoft upgrade. The .NET agent is working as expected, and does not have control over corruption of its installation.