Lambda Troubleshooting Framework - General Knowledge Part 1

Part 2 of this guide is here

The goal of this guide is to help you identify where to begin focusing your troubleshooting efforts.

Lifecycle of an Invocation

The lifecycle of our Extension includes several phases: Extension init, Runtime init, Function init, Invoke, and Shutdown.

Some of this work happens asynchronously, but a couple of steps briefly block the invocation with synchronous calls. As one of our Lambda Monitoring Engineers explains:

There are two synchronous phases to the extension’s execution, where it actually blocks the progress of the invocation. We spend less than 1ms receiving telemetry from the agent at the end of the invocation, and storing it in a buffer.

Every 7 seconds we send accumulated telemetry by making an HTTP call to the NR ingest service. This takes on the order of 70ms in us-east-1. In AWS regions further from the ingest service, it may take longer.

If logging is enabled, we use a somewhat different approach: we send logs as they arrive, but we don’t block the invocation lifecycle for it.

In short, the Extension adds only a few milliseconds to an invocation. That pales in comparison to the time it takes to actually instrument the code, which varies with what is being instrumented, how much code there is, what the function is doing, and whether objects such as database clients are initialized at global scope or in a nested scope (for example, inside the handler). Objects initialized in a nested scope are instrumented on every invocation, which takes much longer overall than instrumenting a global-scope object once during the cold start.
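To illustrate the global-versus-nested point, here is a minimal Python sketch. `FakeDbClient` is a hypothetical stand-in for an expensive client, and its counter only records how many clients get built; it does not involve our agent. The global client is built once per execution environment, while the nested handler builds (and would have instrumented) a new client on every invocation.

```python
class FakeDbClient:
    """Hypothetical stand-in for an expensive client, such as a database client."""

    instances = 0  # how many clients have been constructed

    def __init__(self) -> None:
        FakeDbClient.instances += 1

    def query(self, sql: str) -> str:
        return f"result of {sql!r}"


# Global scope: built once when the execution environment starts (cold start),
# then reused by every invocation that lands on this environment.
GLOBAL_CLIENT = FakeDbClient()


def handler_global(event, context):
    return GLOBAL_CLIENT.query("SELECT 1")


def handler_nested(event, context):
    # Nested scope: a new client is built on every single invocation.
    client = FakeDbClient()
    return client.query("SELECT 1")


if __name__ == "__main__":
    for _ in range(3):
        handler_global({}, None)
    print(FakeDbClient.instances)  # 1: only the global client
    for _ in range(3):
        handler_nested({}, None)
    print(FakeDbClient.instances)  # 4: the global client plus one per nested call
```

Moving client setup out of the handler is usually the cheapest way to cut per-invocation instrumentation overhead.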

It’s important to note:

  • The Extension process is just a transport for the telemetry and (optionally) logs.
  • The Extension doesn’t instrument your Lambda function. The agent in our layer (or the OpenTracing SDK) does that, and instrumenting is far more work than sending telemetry.

Details on harvests

  • The first invocation is sent quickly but without some metrics.
  • Subsequent payloads are sent in a batch every 7 seconds.
  • Functions invoked in a burst still send telemetry every 7 seconds. However, if the burst ends before the next 7-second send, the remaining buffered invocations are not sent until AWS shuts down the container. To get that batch sent sooner, send another invocation at least 7 seconds later.
  • Invocation time: time between sending payload and receiving a response.
  • There are environment variables to fine-tune the batch timing per function.
# default ripe harvest: every 7 sec

# default rot harvest: every 14 sec
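The burst behavior described above can be pictured with a simplified Python model. It is a sketch under stated assumptions, not the Extension's code: `HarvestModel` is hypothetical, it uses the 7-second interval from the list above, it assumes the batch age is checked only when an invocation arrives or at shutdown, and it leaves the rot harvest out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HarvestModel:
    """Simplified model of invocation batching (not the Extension's implementation)."""

    ripe_seconds: float = 7.0  # default batch interval from the list above
    pending: List[str] = field(default_factory=list)
    sent_batches: List[List[str]] = field(default_factory=list)
    batch_started: Optional[float] = None
    first_sent: bool = False

    def on_invocation(self, now: float, payload: str) -> None:
        self.pending.append(payload)
        if not self.first_sent:
            # The first invocation is sent right away.
            self.first_sent = True
            self._flush()
            return
        if self.batch_started is None:
            self.batch_started = now
        # Assumption: the batch age is only checked when an invocation arrives.
        if now - self.batch_started >= self.ripe_seconds:
            self._flush()

    def on_shutdown(self) -> None:
        # Anything still buffered goes out when AWS shuts down the container.
        if self.pending:
            self._flush()

    def _flush(self) -> None:
        self.sent_batches.append(list(self.pending))
        self.pending.clear()
        self.batch_started = None
```

For example, invocations at 0, 1, 2, and 3 seconds send the first one immediately and leave three payloads buffered. An invocation at 10 seconds flushes all four; without it, they wait for `on_shutdown()`.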

Architecture and Design

See our architecture doc.

“Advanced Lambda Monitoring” is actually two products, one built on top of the other. The “base” is simply the Infrastructure AWS integration that collects CloudWatch metrics. The “full package” adds APM agent instrumentation to Lambda functions and then ships invocation logs and payloads to us in one of two ways:

  1. The New Relic Lambda Extension built into our layers can be used to bypass CloudWatch and send function logs and invocation payloads to us directly…
  2. Legacy CloudWatch method: Subscribe those functions to our newrelic-log-ingestion function that packages everything together and sends it up to New Relic.

Our endpoints for function logs and invocation payloads are:

InfraEndpointEU -
InfraEndpointUS -
LogEndpointEU   -
LogEndpointUS   -

Whichever method is used to send us invocation telemetry, the function code is instrumented the same way, via our layers, and instrumentation can be implemented in several ways depending on the language.

NRQL Types

See our docs for a description of New Relic’s backend.


The following query will show a list of function names from the linked AWS account and a count of invocations. If names are showing, the integration is working.

SELECT latest(provider.invocations.Sum) FROM ServerlessSample WHERE dataSourceName = 'Lambda' FACET provider.functionName SINCE 1 day ago LIMIT 100


See a description of Lambda monitoring data types in our docs.

The following query determines whether a function is listed as instrumented or not in the Lambda UI:

SELECT count(*) FROM AwsLambdaInvocation WHERE entityGuid = 'YOUR_ENTITY_GUID' SINCE 30 days ago LIMIT 1

In order for AwsLambdaInvocation to be populated with data, we look for one of two log lines.

Here’s how to decode the payloads to see what is getting instrumented prior to sending to New Relic.
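As a rough sketch of that decoding in Python, assuming the payload log line is a JSON array of the form `[version, "NR_LAMBDA_MONITORING", {metadata}, "<base64 string>"]` whose last element is gzip-compressed JSON encoded in base64. Check that shape against a real log line before relying on it; `encode_payload` exists only to produce test input.

```python
import base64
import gzip
import json
from typing import Any, Dict


def decode_payload(log_line: str) -> Dict[str, Any]:
    """Decode one instrumentation payload log line (assumed shape, see above)."""
    envelope = json.loads(log_line)
    version, marker, metadata, encoded = envelope[0], envelope[1], envelope[2], envelope[3]
    # Assumption: the last element is base64 text wrapping gzip-compressed JSON.
    data = json.loads(gzip.decompress(base64.b64decode(encoded)))
    return {"version": version, "marker": marker, "metadata": metadata, "data": data}


def encode_payload(
    metadata: Dict[str, Any],
    data: Dict[str, Any],
    marker: str = "NR_LAMBDA_MONITORING",
    version: int = 2,
) -> str:
    """Build a line in the same assumed shape; handy for testing the decoder."""
    compressed = gzip.compress(json.dumps(data).encode("utf-8"))
    encoded = base64.b64encode(compressed).decode("ascii")
    return json.dumps([version, marker, metadata, encoded])
```

Paste a payload line from your function's logs into `decode_payload` and print `json.dumps(result["data"], indent=2)` to see what was captured.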


AwsLambdaInvocationError events provide detail about an error that occurred during an invocation, more detail than a trace provides. We display this information about invocation errors in the Lambda nerdlet in the New Relic One Explorer.

One of our Lambda engineers describes it this way:

There are cases where the Node.js agent may ignore an error; noticeError() overrides this, forcing the agent to gather details about that specific error.

Here is an example NRQL query to get error details:

SELECT * FROM AwsLambdaInvocationError SINCE 1 week ago WHERE aws.lambda.arn = 'YOUR_FUNCTION_ARN'


Span events and metrics are also recorded. This query shows a count of span events by AWS integration (linked account name).

SELECT count(*) FROM Span FACET providerAccountName SINCE 1 week ago

The Span type is especially useful for seeing distributed traces. The following query will highlight any entities connected by the trace, like an app with an APM agent installed, a browser app with the Browser agent, and one or more Lambda functions. If no parent app is listed, that is the root span. Sort by timestamp to see the flow of the trace.

SELECT count(*), latest(name), max(timestamp) FROM Span WHERE traceId = 'YOUR_TRACE_ID' FACET appName, provider.functionName SINCE 1 week ago LIMIT MAX
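If you pull the individual spans for a trace instead (for example with `SELECT * FROM Span WHERE traceId = 'YOUR_TRACE_ID'`), a small Python helper can order them by timestamp and flag the root. This sketch assumes each row carries `timestamp`, `name`, `parent.id`, and either `appName` or `provider.functionName`; it treats a span with no `parent.id` as the root, standing in for the "no parent app" rule above.

```python
from typing import Any, Dict, List


def trace_flow(spans: List[Dict[str, Any]]) -> List[str]:
    """Return one line per span, ordered by timestamp, with root spans flagged."""
    lines = []
    for span in sorted(spans, key=lambda s: s["timestamp"]):
        # Assumed attributes: appName for APM/Browser entities,
        # provider.functionName for Lambda functions.
        entity = span.get("appName") or span.get("provider.functionName") or "unknown"
        parent = span.get("parent.id")
        role = f"child of {parent}" if parent else "root"
        lines.append(f"{span['timestamp']} {entity}: {span['name']} ({role})")
    return lines
```

Reading the output top to bottom gives the same flow as sorting the query results by timestamp, with each hop labeled by entity.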

This query will count spans between a browser app with the Browser agent installed and an instrumented Lambda function. It can be helpful to see if an approximately equal number of span events are making it from one entity to another.

SELECT filter(count(*), WHERE appName = 'YOUR_BROWSER_APP') AS 'Browser', filter(count(*), WHERE provider.functionName = 'YOUR_LAMBDA_FUNCTION') AS 'Lambda' FROM Span SINCE 1 week ago

One to Many

The typical design is to use one New Relic account linked with many AWS accounts, each with its own name. The name can be specified with the --linked-account-name (or -n) parameter of the following CLI command:

newrelic-lambda integrations install -a NR_ACCOUNT_ID -r REGION -n LINKED_ACCOUNT_NAME -k YOUR_USER_API_KEY --no-aws-permissions-check

The above command will also save your New Relic license key in the AWS Secrets Manager unless the --disable-license-key-secret parameter is specified.

Many to One

It is possible to use many New Relic accounts with one AWS account. However, our New Relic Lambda CLI has not yet been designed for this scenario. The following manual steps outline the process:

Setting up the Integrations

  1. The New Relic accounts should be related in some way: either one’s a main account and the other’s a sub-account, or the two New Relic accounts are sub-accounts of another main account. Note that when setting up each integration, you can reuse an existing integration role with the managed policy ReadOnlyAccess if one already exists in your AWS account, at which point the trusted identity for the role will show both New Relic account IDs.
  2. The different Lambdas would authenticate with the accounts’ different license keys. It might be cleanest to do this with the NEW_RELIC_LICENSE_KEY environment variable rather than using the Secrets Manager, but either method is possible.

If using the AWS Secrets Manager to store your New Relic license key

  1. You’ll need a different secret name and ID for each New Relic account. You’ll also need a dedicated secrets access policy, which would need to be attached to each function’s execution role. Here are the parameters for creating the license key secret. Here is the IAM access policy needed by the function to retrieve the secret.
  2. Set an environment variable on each Lambda function to point it to your specific secret ID: NEW_RELIC_LICENSE_KEY_SECRET: YOUR_SECRET_ID
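To check which license key a function will end up with, the lookup can be approximated in Python. `resolve_license_key` is a hypothetical troubleshooting helper, not the Extension's code. It assumes that a `NEW_RELIC_LICENSE_KEY` variable takes precedence over the secret and that the secret string is JSON with a `LicenseKey` field. The default secret name matches the `NEW_RELIC_LICENSE_KEY-<RANDOM>` secret ARN in the Secrets Manager Role section.

```python
import json
from typing import Any, Mapping, Protocol

# Default secret name, matching the NEW_RELIC_LICENSE_KEY-<RANDOM> secret ARN.
DEFAULT_SECRET_ID = "NEW_RELIC_LICENSE_KEY"


class SecretsClient(Protocol):
    """Anything with boto3's get_secret_value(SecretId=...) shape."""

    def get_secret_value(self, *, SecretId: str) -> Mapping[str, Any]: ...


def resolve_license_key(env: Mapping[str, str], secrets: SecretsClient) -> str:
    """Hypothetical helper that mirrors the assumed lookup order."""
    # Assumption: a key set directly on the function is used first.
    direct = env.get("NEW_RELIC_LICENSE_KEY")
    if direct:
        return direct
    # Otherwise read the secret this function points at, or the default secret.
    secret_id = env.get("NEW_RELIC_LICENSE_KEY_SECRET", DEFAULT_SECRET_ID)
    response = secrets.get_secret_value(SecretId=secret_id)
    # Assumption: the secret string is JSON like {"LicenseKey": "..."}.
    return json.loads(response["SecretString"])["LicenseKey"]
```

With boto3, pass `os.environ` and `boto3.client("secretsmanager")`. Running it under the function's execution role also confirms whether the secrets access policy lets the function read the secret.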

Using the Legacy CloudWatch Method to Send Payloads

If you are not using our Extension to send payloads to New Relic, the legacy CloudWatch method requires special attention to the newrelic-log-ingestion Lambda function.

  1. The Extension can be disabled on each of your Lambda functions with: NEW_RELIC_LAMBDA_EXTENSION_ENABLED: false.
  2. One newrelic-log-ingestion function will be needed for each New Relic account linked. Each newrelic-log-ingestion function would be assigned the NEW_RELIC_LICENSE_KEY for the account you want to point it to. See this doc for further details.
  3. Similarly, when setting up the function’s log subscription filter, you’d specify which log ingestion function gets triggered on a log event.
  4. The New Relic Lambda CLI and AWS Deployment App can’t add multiple newrelic-log-ingestion functions to one AWS account, since they check for an existing function and quit if one is found.

To deploy multiple functions you could do one of the following things:

Request-Response Latency

The Extension’s send blocks delivery of the response.

AWS Roles and Policies

Required by All Integrations

All integrations need at least these permissions on the integration role in the linked AWS account and region.



Config API


Resource Tagging API


Integration Role

The very broad ReadOnlyAccess policy is the default “AWS managed policy” we use when no other policy is specified.

The integration role is specified with the --integration-arn parameter when using the New Relic Lambda CLI. For example:

newrelic-lambda integrations install --nr-account-id <NR_ACCOUNT_ID> --nr-api-key <KEY> --integration-arn arn:aws:iam::<AWS_ACCOUNT_ID>:role/NewRelicLambdaIntegrationRole_<NR_ACCOUNT_ID>

If specifying your own integration role, these are the bare minimum permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                …
            ],
            "Resource": "*"
        }
    ]
}

Specifying both the --integration-arn and --role-name parameters on the newrelic-lambda integrations install command will allow an AWS user without CAPABILITY_IAM to complete the integration.

The role needs a specific trust relationship and condition. Under the “Trust relationships” tab:

  • add account “754728514883” as a trusted entity
  • add your New Relic account ID as a “StringEquals sts:ExternalId YOUR_NEW_RELIC_ID” condition
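The trust relationship above corresponds to an IAM trust policy like the one this Python sketch builds. The policy layout is standard IAM, and writing the trusted account as its root ARN is one of the accepted principal forms. Passing several New Relic account IDs, as in the many-to-one setup, allows each of them as an external ID.

```python
import json
from typing import Any, Dict

NEW_RELIC_AWS_ACCOUNT = "754728514883"  # trusted entity named above


def build_trust_policy(*nr_account_ids: str) -> Dict[str, Any]:
    """Build the integration role's trust policy for one or more New Relic account IDs."""
    if not nr_account_ids:
        raise ValueError("at least one New Relic account ID is required")
    # A list under StringEquals matches if the external ID equals any entry.
    external_id = nr_account_ids[0] if len(nr_account_ids) == 1 else list(nr_account_ids)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{NEW_RELIC_AWS_ACCOUNT}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_trust_policy("YOUR_NEW_RELIC_ID"), indent=2))
```

Comparing this output with the role's "Trust relationships" tab is a quick way to spot a missing trusted entity or a mistyped account ID.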

Note: the permissions on the integration role are not the same as those on the function’s execution role. In the integrations install command, the --role-name option supplies the CLI with the execution role used by the newrelic-log-ingestion function. That role is separate from the integration role, which is used only for the integration between New Relic and the AWS account. The same execution role can be used by many Lambda functions and is typically attached to a function automatically by AWS when the function is created.

Execution Role


The AWSLambdaBasicExecutionRole policy is the default “AWS managed policy” we use when no other policy is specified.

The execution role is specified with the --role-name parameter when using the New Relic Lambda CLI with the integrations install command as described in the Integration Role section. The execution role can be used for multiple functions, including the newrelic-log-ingestion function.

If specifying your own execution role, these are the bare minimum permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                …
            ],
            "Resource": "*"
        }
    ]
}

Secrets Manager Role


The secrets manager role is created by default unless the --disable-license-key-secret parameter is specified, in which case the NEW_RELIC_LICENSE_KEY environment variable should be set on the function instead. For the function to read the license key secret from AWS Secrets Manager, a role attached to the function needs these permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                …
            ],
            "Resource": "arn:aws:secretsmanager:<AWS_REGION>:<AWS_ACCOUNT_ID>:secret:NEW_RELIC_LICENSE_KEY-<RANDOM>"
        }
    ]
}


Our layers include the Extension for collecting and sending invocation payloads, function and platform logs to New Relic. The Node.js and Python layers additionally include agents for instrumenting and handling distributed tracing headers.

See our docs for more.


We publish our layers to the following regions: af-south-1, ap-east-1, ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-central-1, eu-north-1, eu-south-1, eu-west-1, eu-west-2, eu-west-3, me-south-1, sa-east-1, us-east-1, us-east-2, us-west-1, us-west-2.
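For the regions above, a layer version ARN can be assembled from the region, layer name, and version. This Python sketch assumes our layers are published from account `451483290750` (the account in the example layer ARN used for the Gov Cloud download) in every listed region, and that the layer name exists in each; verify the result against the published layer versions. Regions outside the list, such as Gov Cloud, raise an error because no published ARN exists there.

```python
# Regions listed above where our layers are published.
PUBLISHED_REGIONS = frozenset({
    "af-south-1", "ap-east-1", "ap-northeast-1", "ap-northeast-2", "ap-south-1",
    "ap-southeast-1", "ap-southeast-2", "ca-central-1", "eu-central-1", "eu-north-1",
    "eu-south-1", "eu-west-1", "eu-west-2", "eu-west-3", "me-south-1",
    "sa-east-1", "us-east-1", "us-east-2", "us-west-1", "us-west-2",
})

# Assumption: the same publishing account is used in every listed region.
NEW_RELIC_LAYER_ACCOUNT = "451483290750"


def layer_version_arn(region: str, layer_name: str, version: int) -> str:
    """Return the ARN for a published layer version, or fail for unsupported regions."""
    if region not in PUBLISHED_REGIONS:
        raise ValueError(
            f"no published New Relic layer in {region}; publish the layer zip to this region yourself"
        )
    return f"arn:aws:lambda:{region}:{NEW_RELIC_LAYER_ACCOUNT}:layer:{layer_name}:{version}"
```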


Note: We don’t yet publish layers to AWS Gov Cloud regions, so no layer ARN is available there.

If you need to add our layer to your AWS Gov Cloud region, you’ll need to manually download our layer as a zip file, then publish the zip to your gov region. This has to be done once per region.

Here’s how to do it:

  1. Download our layer zip for your function’s runtime, for example Python 3.7. Make sure your AWS profile is configured to match the region of the layer ARN.
aws lambda get-layer-version --layer-name arn:aws:lambda:us-west-2:451483290750:layer:NewRelicPython37 --version-number 35 | jq -r .Content.Location | xargs curl -o
  2. Publish our zipped layer to your region.
aws lambda publish-layer-version --layer-name NewRelicPython37 --description "New Relic Lambda Python 3.7 layer" --compatible-runtimes python3.7 --zip-file fileb://
  3. You can then add the layer to your function. Either:

  • select it from the list of runtime-compatible layers, or
  • copy the layer ARN that was output to your console by the publish command.
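The same download-and-publish steps can be scripted with the boto3 Lambda client's `get_layer_version` and `publish_layer_version` calls, as in this Python sketch. The clients are passed in so the source and target can use separate credentials, since Gov Cloud is a separate AWS partition. Uploading raw bytes via `ZipFile` is subject to the Lambda API's direct-upload size limit; a very large archive would need to go through S3 instead.

```python
import urllib.request
from typing import Any, Callable, List


def download(url: str) -> bytes:
    # Content.Location from get_layer_version is a presigned download URL.
    with urllib.request.urlopen(url) as response:
        return response.read()


def copy_layer(
    source_client: Any,
    target_client: Any,
    layer_arn: str,
    version: int,
    layer_name: str,
    runtimes: List[str],
    description: str,
    fetch: Callable[[str], bytes] = download,
) -> str:
    """Download a published layer version and republish it in another region."""
    # Step 1: look up the layer version (like `aws lambda get-layer-version`).
    layer = source_client.get_layer_version(LayerName=layer_arn, VersionNumber=version)
    zip_bytes = fetch(layer["Content"]["Location"])
    # Step 2: publish the zip in the target region (like `aws lambda publish-layer-version`).
    published = target_client.publish_layer_version(
        LayerName=layer_name,
        Description=description,
        Content={"ZipFile": zip_bytes},
        CompatibleRuntimes=runtimes,
    )
    # Step 3: this ARN is what you add to your function.
    return published["LayerVersionArn"]
```

For example, use `boto3.Session(profile_name="commercial").client("lambda", region_name="us-west-2")` as the source and a client from your Gov Cloud profile as the target (both profile names are hypothetical), then attach the returned ARN to your function.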

Layer Versions

See our docs for a description of our layers. You can find the latest layer versions here.

Java Layer

Our new Java layer can be used with Java functions to provide auto instrumentation.

Extension Layer

The Extension layer can be used with .NET and Go functions. It doesn’t include any instrumentation logic; it includes the Extension for sending invocation telemetry to New Relic (bypassing CloudWatch).

If the function experiences a timeout, verify that all async functions return and don’t lead to unhandled exceptions, and confirm you’ve completed our handler setup. Our Extension processes each payload synchronously and must wait for either:

  • function execution to complete
  • function timeout

OpenTracing should be used to implement instrumentation logic for .NET functions that use our Extension layer; otherwise, a timeout will occur. See our OpenTracing Agent for .NET as a reference.

Node.js and Python Layers

Note: Node 8 and Python 3.6 do not support the AWS Lambda Extensions API, so functions on those runtimes default to using our legacy newrelic-log-ingestion function to send invocation payloads and logs.

The Node.js and Python layers include the Extension as well as instrumentation logic in the form of the Node.js and Python New Relic Agents.


Handlers in our Node.js and Python layers: In the AWS Console, set the handler in the function’s Runtime settings to the handler in our layer, so that the layer runs first. The layer then reads the NEW_RELIC_LAMBDA_HANDLER environment variable, which points to the function’s actual handler. Because the layer starts first, it can instrument the function before your handler runs.
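The handler indirection can be sketched in Python as follows. This is a simplified, hypothetical wrapper, not the code in our layer: it reads `NEW_RELIC_LAMBDA_HANDLER` in the usual `module.function` form, imports the module once per execution environment, and calls your handler. The comment marks where the real layer would start instrumentation.

```python
import importlib
import os
from typing import Any, Callable, Dict, Mapping, Optional

_HANDLER_CACHE: Dict[str, Callable[[Any, Any], Any]] = {}


def load_user_handler(env: Optional[Mapping[str, str]] = None) -> Callable[[Any, Any], Any]:
    """Resolve NEW_RELIC_LAMBDA_HANDLER ("module.function") to a callable."""
    env = os.environ if env is None else env
    spec = env.get("NEW_RELIC_LAMBDA_HANDLER")
    if not spec:
        raise RuntimeError("NEW_RELIC_LAMBDA_HANDLER is not set")
    if spec not in _HANDLER_CACHE:
        module_name, _, func_name = spec.rpartition(".")
        if not module_name:
            raise RuntimeError(f"expected 'module.function', got {spec!r}")
        # Import once per execution environment and reuse on later invocations.
        module = importlib.import_module(module_name)
        _HANDLER_CACHE[spec] = getattr(module, func_name)
    return _HANDLER_CACHE[spec]


def wrapper_handler(event: Any, context: Any) -> Any:
    """Hypothetical layer entry point that the Runtime settings handler points to."""
    user_handler = load_user_handler()
    # A real layer would start instrumentation here, before calling into your code.
    return user_handler(event, context)
```

If NEW_RELIC_LAMBDA_HANDLER is missing or misspelled, a wrapper like this fails before your code ever runs, which is a common reason a function errors right after the layer is added.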