AWS Integration Stream suddenly stopped working

AWS Integration stream was working a couple of days ago.
Now on the UI we have the message “We haven’t received any metrics from AWS account .”

We haven’t changed anything.
Suddenly we now have hundreds of permission errors in the account status dashboard and when querying the IntegrationErrors in NRQL.

The stream method has always seemed flaky and this feels like just another example. It’s really difficult to debug why it’s suddenly broken.
All of a sudden the API calls graph has spiked too (who knows if we’re being charged for that).

Yet I can still query AWS metrics in the Data Explorer?
What’s going on?!

The main reason I ended up here is that I need to debug a different integration, but my IntegrationError metrics are being flooded with AWS UnrecognizedClientException and ServiceAccessDenied errors.

Did you resolve this? I just tried to set up a fresh account and am getting the same error (UnrecognizedClientException ServiceAccessDenied) with no other data coming through…

Only way I was able to fix my issue was setting it up fresh.
Sounds like you are already doing that.

I am still getting the ServiceAccessDenied error but only for a couple of regions and services. We’re still receiving the rest of our data.
If you are receiving that error for all of the requests, then I’d suggest the permissions in AWS might need looking at to make sure New Relic can stream out all the data.

Thanks - looks like I hadn’t finished setting up the firehose from AWS to New Relic: Amazon CloudWatch Metric Streams integration | New Relic Documentation

Will try that and hopefully things will start working…

@mike.smith Following up since I see that you were able to fix the primary issues. Please confirm if you are needing further support. Thanks!

@newrelic387 Would also love to know if it started working for you again after finishing the setup?

Hi @JoiConverse
Please see my latest ticket opening this same issue again.

We keep seeing the metrics disappear for no reason.
Can this be looked into please?
We need to be able to rely on New Relic to alert us to problems, but if it keeps dropping the AWS connection without reason or notification then we cannot rely on it at all to alert us to issues in our infrastructure.

Thanks

@mike.smith Thank you for sharing the topic you posted regarding this issue. It looks like one of our support team members was able to assist.

@JoiConverse @mike.smith What was the solution? The same thing is happening to my account. I opened a new thread but just found yours, and would like to know what the solution was. Thanks a lot!

Yeah as @mike.smith mentioned “UnrecognizedClientException ServiceAccessDenied” is likely an IAM permission error.

There are 3 places that use IAM permissions: when the Metric Stream does an sts:AssumeRole to send data to the Firehose, when Firehose does an sts:AssumeRole to put objects in S3, and when the NewRelicInfrastructure-Integrations role allows your New Relic account to do an sts:AssumeRole.
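
The three trust relationships listed above can be checked offline by looking at each role's trust policy document. Below is a minimal sketch, assuming the trust policy JSON has already been exported from IAM and parsed into a dict. The function names and the sample policy are illustrative only and are not part of any AWS or New Relic tooling. The service principals shown are the ones AWS uses for Metric Streams and Firehose. Conditions (such as an external ID) and explicit Deny statements are deliberately not evaluated, so a True result only means "not obviously missing".

```python
def _as_list(value):
    """IAM accepts either a single string or a list in most fields."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def allows_assume_role(trust_policy, principal_type, principal):
    """Return True if some Allow statement lets `principal` call sts:AssumeRole.

    principal_type is "Service" or "AWS", matching the keys of the
    Principal element. Conditions and Deny statements are ignored.
    """
    for stmt in _as_list(trust_policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        if not {"sts:AssumeRole", "sts:*", "*"} & set(actions):
            continue
        principals = stmt.get("Principal", {})
        if principals == "*":
            return True
        if principal in _as_list(principals.get(principal_type)):
            return True
    return False


# Hypothetical trust policy for the role the Metric Stream assumes
# in order to write to Firehose.
METRIC_STREAM_ROLE_TRUST = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "streams.metrics.cloudwatch.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

METRIC_STREAMS = "streams.metrics.cloudwatch.amazonaws.com"
FIREHOSE = "firehose.amazonaws.com"

print(allows_assume_role(METRIC_STREAM_ROLE_TRUST, "Service", METRIC_STREAMS))
print(allows_assume_role(METRIC_STREAM_ROLE_TRUST, "Service", FIREHOSE))
```

Running the same check against the Firehose role (with the Firehose service principal) and against NewRelicInfrastructure-Integrations (with principal type "AWS" and whatever account ARN the New Relic setup wizard showed you) covers the other two hops.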

Since the streams are configured per region, and you mentioned that some regions were affected and not others, I’m wondering if there are any temporary outages for a given AWS region.

Also, since redeploying the stream likely involves using the same NewRelicInfrastructure-Integrations role, I’m wondering if the problem is specific to the Metric Stream being unable to assume a role (or possibly the token can’t be renewed after it expires) and therefore unable to forward metrics to Firehose.

@troycox @JoiConverse I’m also experiencing the same problem now. All integrations have stopped working (several regions).

The error in the Kinesis delivery stream is:

"Delivery to the endpoint was unsuccessful. See Troubleshooting HTTP Endpoints in the Firehose documentation for more information. Response received with status code. 403"

Based on AWS documentation:

  • 403: Indicates that the access key you configured for your delivery stream does not have permissions to deliver data to the configured endpoint.
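
One way to narrow down a 403 like this is to send a request shaped like a Firehose HTTP endpoint delivery to the same endpoint, using the same access key, and look at the status code that comes back. The following is a rough sketch, assuming the request shape from the Firehose HTTP endpoint delivery specification (the `X-Amz-Firehose-Access-Key` and `X-Amz-Firehose-Request-Id` headers plus a JSON body with base64-encoded records). The helper names `build_probe` and `send_probe` are hypothetical, and the endpoint URL must be copied from the delivery stream's destination settings. The endpoint may reject the placeholder payload for unrelated reasons, so only the difference between a 403 and any other status is of interest, and even that assumes the endpoint checks the key before it checks the payload.

```python
import base64
import json
import time
import urllib.error
import urllib.request
import uuid


def build_probe(access_key, payload=b"{}"):
    """Build headers and body in the Firehose HTTP endpoint request shape."""
    request_id = str(uuid.uuid4())
    headers = {
        "Content-Type": "application/json",
        "X-Amz-Firehose-Request-Id": request_id,
        "X-Amz-Firehose-Access-Key": access_key,
    }
    body = json.dumps(
        {
            "requestId": request_id,
            "timestamp": int(time.time() * 1000),  # epoch milliseconds
            "records": [{"data": base64.b64encode(payload).decode("ascii")}],
        }
    ).encode("utf-8")
    return headers, body


def send_probe(url, access_key, timeout=10):
    """POST the probe and return the HTTP status code (including 4xx/5xx)."""
    headers, body = build_probe(access_key)
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


# Example (not executed here): the URL and the key are placeholders.
# status = send_probe("<endpoint URL from the delivery stream>", "<license key>")
```

If the probe returns 403 with both the old and a newly created key, that points away from the key itself and towards something on the account side.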

We have not changed anything on our end, so I assume that the API key New Relic provided to us has stopped working. Can you please check whether this is the case and fix it?

The same problem comes up in several threads without a solution. In all of them the integration simply stops working after a while, which suggests that the authentication expires and cannot be renewed.

This is a serious problem because it breaks the monitoring without any warning.

I also tried creating a new API key, but that did not help. It seems the only option on our end would be to unlink the AWS account from New Relic and reconfigure it. But that cannot be the solution if it has to be done regularly. I’ll wait to see if New Relic support can get this fixed first.

Hi @sauli.ketola

I hope you are having a good day. Congrats on your first post in the community.

I am reaching out to let you know that I am working on this, apologies that it is taking longer than expected. We see you, and we want to get you supported.

Can you confirm that you are still facing this issue? If so, can you provide a link to where you are seeing the issue in your account? Only New Relics will be able to access this link.

I did find another post here that a user in this thread said was related to their issue, and it appears to have worked for them.

Please let me know if the linked post was helpful.

Hi,

One of the pages where you can see that there’s an issue is when you go to Infrastructure → AWS and there’s a message “We haven’t received any metrics from AWS account xxxxxxxxxxx.”
and of course from the fact that there are no metrics.

So far I have tried to:

  • unlink the AWS account from New Relic and link it again with a new IAM role
  • re-create the Kinesis stream
  • re-create the CloudWatch stream
  • create a new API key and use it in the stream

None of these has worked and I’m still receiving the 403 error from the New Relic API.

I found one message board thread, Stream from AWS not working - #9 by vhenrique, where the issue was with the New Relic subscription. They were using an account with a trial period for infra monitoring, and when the trial ended they started receiving the 403 message. But that should not be the case with us: we have an active Infrastructure Essentials subscription.
Can you confirm that AWS monitoring, and especially ELB monitoring, should work with this subscription?

The monitoring worked for a while. We had it configured for several regions (several Kinesis streams), and all of them stopped working at around the same time without any changes on our end.

Hey there @sauli.ketola,

I hope you are well. I looked over your account and did notice that there was an AWS application in your infrastructure agent on the account. Is this the integration that was previously broken or are you still having troubles with one not working? I am more than happy to bring an engineer in again to look over this but I want to verify you didn’t resolve this yourself. If you did figure it out we would love to hear what you did to fix it!

If you are still needing assistance please reach out and we will continue to provide you with support. I hope to hear from you soon!

Yes, the problem is still present. On this page you will see the issue https://one.newrelic.com/infra/aws?account=3243985&state=0a9c6fc2-0927-48b0-3b7e-2a96e21d4810

The account that is used is “Platform Upgrade”

Hey there @sauli.ketola,

Thank you for letting us know you need further assistance. I am looping in one of our Infrastructure Agent experts to help get to the bottom of this with us. They will respond here once we have a solution for you. I appreciate your patience while we continue to provide support.

Please let us know if there is anything else we can help with as well. I hope you have a great day!

Hi @sauli.ketola. I reached out to our Account team to look over your account, and it appears that several changes were made on May 5th, the date on which you wrote in.

Infrastructure Essentials doesn’t necessarily correlate to our cloud integration products; however, I don’t have much information about how our product categories work.

Since you tried troubleshooting and creating an additional API key, I suspect those changes removed the functionality from your account, and I would encourage you to reach out to your sales/account rep to verify whether that is the case.

Regards,

Troy

First, this problem is not about infrastructure agents; those are not used at all in this case. This is about streaming AWS CloudWatch metrics from the AWS account to New Relic.

And yes, I did create a new API key, but that did not cause the issue. It was something I tried in order to fix this, and I have since switched back to the original API key.

The problem is that the New Relic API is returning a 403 (Forbidden) response to the attempts to send the metrics to New Relic. Could you please investigate why that happens?

Hey there @sauli.ketola,

I hope you are well.

I am having our engineers look at this further to verify why you are experiencing this error. If we find that it is due to a subscription error, then I will get you connected with that team so they can help resolve it. Please let us know if you have any other questions in the meantime.

Hello sauli.ketola, I would reiterate the earlier call to contact your account team as it’s highly likely to be a problem related to usage limits and account status. The reason behind the 403 should be something your account team can readily identify.