Lambda Troubleshooting Framework - Troubleshooting Lambda

Part 1

(Part 2 in next comment)

We hope this Lambda Troubleshooting Framework will guide you to a quick and easy install. These are the steps that our support teams take to troubleshoot AWS Lambda monitoring.

Resources

The following documentation and repositories will come in handy on your serverless observability journey. You will find requirements, install steps, examples, and implementation inspiration.

Docs

  • Intro
  • Enable
  • UI and Data
  • Troubleshooting

Repos

  • Examples
  • References

Try Examples First

It’s a good idea to try out our example functions first using either the SAM CLI + AWS CLI, or our Terraform examples. You will learn many things that can then be applied to your existing function.

  1. Start with our docs. If using the New Relic Lambda CLI to do a quick working example, make sure to complete the What do you need section before linking your AWS account with your New Relic account.
  2. Follow an example, with one of our supported runtimes, in our Extension repo. For the SAM examples, make sure you have all the Prerequisites and then run the deploy.sh script for your function’s language.
  3. Subsequent deployment strategies may include use of our Continuous Deployment techniques.

Verify Requirements

See our compatibility and requirements doc.

  1. It is highly recommended to use the New Relic Lambda CLI in order to minimize the risk that something gets missed or mixed up.
  2. It is also recommended to install the AWS CLI in order to configure your AWS profile in the terminal, which our CLI will look for and use.
  3. To use our layer that includes our Extension, you will need to verify a few more requirements.
  4. Your AWS profile will also need some permissions to use the New Relic Lambda CLI.

Verify Components

To confirm everything is in place and working, verify:

  1. The right license key is being used in the secrets manager or NEW_RELIC_LICENSE_KEY environment variable on the function. It should match the license key from the New Relic account where the integration exists (find the integration and associated integration role in New Relic One -> Infrastructure -> AWS). Look for your linked account name and click the “Manage services” link to see the associated AWS role ARN.
  2. All required environment variables are present on the function for your runtime. There are environment variables for configuring both the Extension and the agents (Python, Node.js, Go).
  3. The AWS Runtime Settings for your function point to the handler in our layer.
    Python: newrelic_lambda_wrapper.handler
    Node: newrelic-lambda-wrapper.handler
  4. For Node.js and Python functions, the NEW_RELIC_LAMBDA_HANDLER environment variable points to your function’s actual handler.
  5. The integration in New Relic One -> Infrastructure -> AWS exists with the correct AWS account ID and integration role.
  6. Roles for integration, execution, and secretsmanager are all set in AWS, and the integration role additionally includes these base permissions required by all integrations.
  7. The logs and payload endpoints are correct for US or EU (configured with environment variables).
  8. The payload is getting generated and sent successfully by the Extension as seen in CloudWatch logs. Decoding it reveals event spans and error spans included from the function.
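
Some of the checks above can be scripted. The following is a minimal sketch, assuming the function configuration has already been fetched as a dictionary shaped like the output of aws lambda get-function-configuration (Runtime, Handler, Environment.Variables). The function name check_function_config and the wording of the messages are illustrative, and the check only covers the handler and license key items from the list.

```python
# Wrapper handlers from the list above, keyed by runtime prefix.
WRAPPER_HANDLERS = {
    "python": "newrelic_lambda_wrapper.handler",
    "nodejs": "newrelic-lambda-wrapper.handler",
}


def check_function_config(config):
    """Return a list of problems found in a Lambda function configuration.

    `config` is assumed to look like the JSON printed by
    `aws lambda get-function-configuration`.
    """
    problems = []
    runtime = config.get("Runtime", "")
    handler = config.get("Handler", "")
    env = config.get("Environment", {}).get("Variables", {})

    # Node.js and Python functions must point at the wrapper in the layer
    # and tell the wrapper where the real handler lives.
    for prefix, wrapper in WRAPPER_HANDLERS.items():
        if runtime.startswith(prefix):
            if handler != wrapper:
                problems.append(f"Handler is {handler!r}, expected {wrapper!r}")
            if not env.get("NEW_RELIC_LAMBDA_HANDLER"):
                problems.append("NEW_RELIC_LAMBDA_HANDLER is not set")

    # Assumption for this sketch: flag a function that sets both the license
    # key and a secret name, since only one source should be used.
    if env.get("NEW_RELIC_LICENSE_KEY") and env.get("NEW_RELIC_LICENSE_KEY_SECRET"):
        problems.append("Both NEW_RELIC_LICENSE_KEY and NEW_RELIC_LICENSE_KEY_SECRET are set")

    return problems
```

An empty list means none of these particular checks failed; the roles, endpoints, and payload items still need to be verified by hand.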

Verify with NRQL Queries

  1. Verify the integration is working to record invocation counts and other general metrics.
    SELECT latest(provider.invocations.Sum) FROM ServerlessSample WHERE dataSourceName = 'Lambda' FACET provider.functionName SINCE 1 day ago LIMIT 100
  2. Verify the Lambda service is not getting any ServiceAccessDenied errors. If it is, check that the right license key is being used in the AWS Secrets Manager (for use with the Extension) and/or on the newrelic-log-ingestion function. Also check that the role in AWS associated with the integration has at least these permissions.
    SELECT count(*) AS 'Number of errors', max(timestamp) AS 'Last seen' FROM IntegrationError WHERE providerAccountName = 'YOUR_LINKED_ACCOUNT_NAME' AND dataSourceName = 'Lambda' FACET method, error, awsErrorType, awsErrorCode SINCE 1 day ago LIMIT 100
  3. Verify the request ID is showing up in New Relic with the following query:
    SELECT * FROM Span WHERE aws.requestId = 'YOUR_REQUEST_ID' SINCE 1 day ago
  4. Verify instrumentation of span events is making it to New Relic.
    SELECT * FROM AwsLambdaInvocation WHERE provider.functionName = 'YOUR_FUNCTION_NAME' SINCE 1 day ago
  5. Verify retention of span events.
    SELECT latest(insightsTotalRetentionInHours)/24 as 'Total (Days)', latest(insightsIncludedRetentionInHours)/24 as 'Included (Days)', latest(insightsTotalRetentionInHours)/24 - latest(insightsIncludedRetentionInHours)/24 as 'Paid (Days)' FROM NrDailyUsage WHERE insightsEventNamespace LIKE '%span%' FACET consumingAccountName,consumingAccountId,insightsEventNamespace SINCE 1 day ago
  6. Verify distributed tracing spans and associated entities.
    SELECT count(*),max(timestamp) FROM Span WHERE traceId = 'YOUR_TRACE_ID' FACET name,appName,provider.functionName,parent.app SINCE 1 week ago LIMIT MAX
  7. Verify your function is not hitting its memory limit (assuming logs are being sent).
    SELECT count(*) FROM Log WHERE aws.logGroup = '/aws/lambda/YOUR_FUNCTION_NAME' AND message LIKE '%Max Memory Used: YOUR_MEMORY_LIMIT%' SINCE 1 week ago TIMESERIES MAX
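
When the same checks are run for many functions, it can help to fill in the placeholders programmatically. The sketch below only builds query strings for two of the queries above; it does not call any New Relic API. The helper name verification_queries and its parameters are hypothetical.

```python
import re

# Lambda function names may contain letters, digits, hyphens, and underscores.
_FUNCTION_NAME = re.compile(r"[A-Za-z0-9_-]+")


def verification_queries(function_name, memory_limit_mb, since="1 day ago"):
    """Build the function-specific NRQL checks as plain strings."""
    # Reject anything that could break out of the quoted NRQL string.
    if not _FUNCTION_NAME.fullmatch(function_name):
        raise ValueError(f"unexpected function name: {function_name!r}")
    return {
        "invocations": (
            "SELECT * FROM AwsLambdaInvocation "
            f"WHERE provider.functionName = '{function_name}' SINCE {since}"
        ),
        "memory": (
            "SELECT count(*) FROM Log "
            f"WHERE aws.logGroup = '/aws/lambda/{function_name}' "
            f"AND message LIKE '%Max Memory Used: {int(memory_limit_mb)}%' "
            f"SINCE {since}"
        ),
    }
```

The resulting strings can be pasted into the query builder as-is.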

Common Issues and Solutions

  1. Check that your function is using one of our supported runtimes.
  2. If the function experiences a timeout, verify that all async functions are returning and not leading to unhandled exceptions. Confirm you’ve completed the handler setup. Our Extension processes each payload synchronously and must wait for either the function to indicate to our wrapper that execution has completed or the function’s timeout value to be reached.
  3. If needed, increase the timeout value for your function. It might also be necessary to increase the memory limit, which increases the CPU resources available to your function for faster processing.
  4. Update the layer to the latest version.
  5. Use our examples as a starting point to see how our function handler wraps your actual handler (Java, .NET, Go).
  6. Peruse our .NET OpenTracing agent, Java Tracer, Java AWS, and Go agent repositories for examples and implementation inspiration.
  7. Explore our distributed tracing example.
  8. Confirm the roles:
    - Integration role: The integration role by default uses the AWS managed policy called ReadOnlyAccess and gets created when the integration is first set up with the New Relic Lambda CLI.
    - Execution role: The function needs an execution role which uses the AWS managed policy called AWSLambdaBasicExecutionRole by default, and a Secrets Manager policy with action secretsmanager:GetSecretValue if using the secret to store the license key, otherwise the environment variable NEW_RELIC_LICENSE_KEY can be set on the function. However, both should not be used simultaneously.
  9. Update the New Relic Lambda CLI
    pip3 install -U newrelic-lambda-cli
  10. Create the secret
    - Use the New Relic Lambda CLI to update the integration. Newer versions of the CLI will install the license key secret into AWS Secrets Manager if it isn’t there already.
    newrelic-lambda integrations update
    - If the license key secret already exists in your AWS Secrets Manager, verify that it is correct. If it is not correct, delete it and rerun the above command to have the CLI create it again.
    - If manually creating a license key secret, the secret name can be anything, but the secret key must be LicenseKey because that is the key we look for when extracting the value from the JSON map.
  11. Add the managed secrets policy to the function, which should be named something like arn:aws:secretsmanager:<AWS_REGION>:<AWS_ACCOUNT_ID>:secret:NEW_RELIC_LICENSE_KEY-abc123, or add it as an inline policy on the execution role.
  12. When switching from the legacy newrelic-log-ingestion function to the new Extension method for sending telemetry to New Relic, make sure to remove the subscription filter by following these log management instructions.
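
For the inline policy option in step 11, the policy document only needs to allow secretsmanager:GetSecretValue on the license key secret. The following is a minimal sketch of generating that document; the helper name secret_read_policy is hypothetical, and the ARN in the usage lines is a placeholder in the same shape as the example above.

```python
import json


def secret_read_policy(secret_arn):
    """Return an IAM policy document that allows reading one secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": secret_arn,
            }
        ],
    }


# Placeholder ARN; substitute your region, account ID, and secret suffix.
policy = secret_read_policy(
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:NEW_RELIC_LICENSE_KEY-abc123"
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the inline policy editor for the function’s execution role.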

Note: Using our CLI to install the layer is not all that is required to fully instrument Java, .NET, and Go functions since the Extension layer added is only for sending us the invocation telemetry and does not contain an agent (only Node.js and Python layers contain agents).

Typical Things New Relic Support Will Ask About

Issue

State the issue as clearly as possible. Please include as much detail as possible including what troubleshooting steps you have already tried (and the results) as well as addressing the information in the unknowns section below.

Unknowns

  • Whether this is a new endeavor or an existing Lambda that was working before.
  • Whether there are any error messages in CloudWatch, CloudFormation, or the function’s invocation details.
  • The name of the integration at: New Relic One -> Infrastructure -> AWS
  • How the integration was initially set up.
  • Whether the AWS account, integration role, and AWS region are all correct.
  • The name, language, and runtime version of the function.
  • The memory limit and timeout value for the function.
  • How invocation telemetry is being shipped, i.e., the legacy CloudWatch path or the New Relic Lambda Extension.
  • The version of the layer added to the function, and if the newrelic-log-ingestion function is being used with the legacy CloudWatch path, the version of that function.
  • How the handlers have been set up.
  • Environment variables that are set on the function.
  • The process for deploying and invoking the function.
  • Whether the function shows as “instrumented” in the New Relic One Entity Explorer.
  • Whether the dependency tree is missing any dependencies (especially for .NET and Java functions).
  • Whether the function has the needed execution role and secretsmanager policy applied.

Export SAM Config

The SAM config file can highlight issues with runtime config, missing environment variables, incorrect values, versions, etc.

  1. Navigate to your Lambda function in the AWS Console.
  2. In the top-right, click the Actions dropdown, select “Export function”.
  3. Click “Download AWS SAM file”.
  4. Attach the yaml file to the ticket.
  5. Repeat for the newrelic-log-ingestion function if you have one.

Extension debug logs

To see in detail what the Extension is doing, set the following environment variables on the function:

NEW_RELIC_LAMBDA_EXTENSION_ENABLED: true
NEW_RELIC_EXTENSION_LOG_LEVEL: DEBUG
NEW_RELIC_EXTENSION_SEND_FUNCTION_LOGS: true

Export CloudWatch Logs

It is useful to collect CloudWatch logs for both the function’s invocation and the newrelic-log-ingestion function if being used. The function can typically be configured to output details on what is being instrumented, how the invocation is working, and the status of the request to send logs and payloads to New Relic.

  1. Invoke the functions in AWS Lambda.
  2. Navigate to CloudWatch → Logs → Insights in your AWS console.
  3. Select your function(s) and also the newrelic-log-ingestion stream if not using our Extension.
  4. Apply an appropriate Time Filter (depending on when the issue occurred), set the sort order to sort @timestamp asc, and the limit to the exact number of lines for the latest invocation, then click “Run query”. The first log line for the latest invocation should appear at the top.
  5. Under Export results select Copy table to clipboard (markdown).
  6. Paste the text into a new .txt or .log file and upload the file to this ticket.

Most logs exported from CloudWatch will contain a lot of whitespace, which makes the file much larger than it needs to be. Removing the whitespace before attaching to the New Relic Support ticket will make it more manageable. The following has been tested in Bash:

# delete trailing whitespace and the last character from each line
sed -e 's/ *.$//' cw.log > cw-clean.log

CloudFormation Stacks

Get CloudFormation details

# list stacks
aws cloudformation list-stacks | grep StackName | grep NewRelic

# list change-sets if any
aws cloudformation list-change-sets --stack-name NewRelicLogIngestion
aws cloudformation list-change-sets --stack-name NewRelicLambdaIntegrationRole-<account-id>
aws cloudformation list-change-sets --stack-name NewRelicLicenseKeySecret

# get template summaries
aws cloudformation get-template-summary --stack-name NewRelicLogIngestion
aws cloudformation get-template-summary --stack-name NewRelicLambdaIntegrationRole-<account-id>
aws cloudformation get-template-summary --stack-name NewRelicLicenseKeySecret

# delete the problematic stacks
aws cloudformation delete-stack --stack-name NewRelicLogIngestion
aws cloudformation delete-stack --stack-name NewRelicLambdaIntegrationRole-<account-id>
aws cloudformation delete-stack --stack-name NewRelicLicenseKeySecret

Part 2

(Part 1 in above comment)

Environment Variables

Log Ingestion Function

The following environment variables are required for sending invocation payloads to New Relic:

LICENSE_KEY: YOUR_LICENSE_KEY
INFRA_ENABLED: true
LOGGING_ENABLED: true

A subscription filter and optional pattern links logs output from your function invocations to the newrelic-log-ingestion function, which then forwards invocation payloads to our cloud-collector and optionally logs to the log-api.

Lambda Extension

The following environment variables configure the operation of the Extension in the layer that wraps your function.

NEW_RELIC_LAMBDA_EXTENSION_ENABLED: true
NEW_RELIC_LICENSE_KEY: YOUR_LICENSE_KEY
NEW_RELIC_LICENSE_KEY_SECRET: YOUR_SECRET_ID
NEW_RELIC_TELEMETRY_ENDPOINT: TELEMETRY_ENDPOINT_URI
NEW_RELIC_LOG_ENDPOINT: LOG_ENDPOINT_URI
NEW_RELIC_HARVEST_RIPE_MILLIS: 7_000
NEW_RELIC_HARVEST_ROT_MILLIS: 12_000
NEW_RELIC_EXTENSION_SEND_FUNCTION_LOGS: true
NEW_RELIC_EXTENSION_LOG_LEVEL: DEBUG

The Extension can output more detailed logs to CloudWatch by setting the environment variable NEW_RELIC_EXTENSION_LOG_LEVEL: DEBUG. The Extension does not send CloudWatch logs to New Relic; it only sends function logs. As a result, you will not see “extension logs” that start with [NR_EXT] in New Relic, but they can be viewed in CloudWatch to assist in troubleshooting the Extension.
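
When reading a CloudWatch export, it can be convenient to separate the Extension’s own lines from everything else. The following is a small sketch that relies only on the [NR_EXT] prefix described above; split_extension_logs is a hypothetical helper name.

```python
def split_extension_logs(lines):
    """Split log lines into ([NR_EXT] lines, all other lines)."""
    extension, other = [], []
    for line in lines:
        # Extension log lines are marked with the [NR_EXT] prefix.
        (extension if "[NR_EXT]" in line else other).append(line)
    return extension, other


sample = [
    "2021-02-05T13:15:11.543-08:00 [NR_EXT] New Relic Lambda Extension starting up",
    "2021-02-05T13:15:11.547-08:00 hello world!",
]
extension_lines, function_lines = split_extension_logs(sample)
```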

Node.js and Python Handler

The NEW_RELIC_LAMBDA_HANDLER needs to point to your actual function handler. Without this, the handler in our layer won’t know which handler in your function to execute. The environment variables are documented in our legacy Lambda monitoring doc which details manual installation steps for Node.js and for Python.

NEW_RELIC_LAMBDA_HANDLER: YOUR_ACTUAL_HANDLER (required)
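
To illustrate why the variable is required, here is a minimal sketch of how a wrapper could turn a module.function string into a callable. This is an illustration of the general technique only and is not the code used in the New Relic layer; resolve_handler is a hypothetical name.

```python
import importlib


def resolve_handler(handler_path):
    """Resolve a 'module.function' string to the function it names."""
    module_name, _, function_name = handler_path.rpartition(".")
    if not module_name or not function_name:
        raise ValueError(f"expected 'module.function', got {handler_path!r}")
    # Import the module that holds the real handler, then look the function up.
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


# Example with a standard library function standing in for a real handler.
handler = resolve_handler("json.dumps")
```

If the variable is missing or misspelled, a lookup like this has nothing valid to import, which is why the wrapper cannot run your code.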

Distributed Tracing

Most languages need the following environment variables to be set on the function to enable distributed tracing.

NEW_RELIC_ACCOUNT_ID: YOUR_NEW_RELIC_ACCOUNT_ID
NEW_RELIC_TRUSTED_ACCOUNT_KEY: your parent accountId or NEW_RELIC_ACCOUNT_ID if no parent

Node.js and Python additionally need the following environment variable to enable distributed tracing:

NEW_RELIC_DISTRIBUTED_TRACING_ENABLED: true

Java additionally needs the following environment variable to enable distributed tracing:

NEW_RELIC_PRIMARY_APPLICATION_ID: YOUR_NEW_RELIC_ACCOUNT_ID

.NET additionally needs the following environment variable to enable distributed tracing for SQS or SNS:

NEW_RELIC_USE_DT_WRAPPER: true

Node.js

These Node.js environment variables can be set on the Lambda function in the AWS Console or via SAM templates, CloudFormation templates, Terraform templates, etc. The Node.js agent logs can be set to trace level to get more detail on how it is instrumenting your function in CloudWatch.

NEW_RELIC_APP_NAME: YOUR_APP_NAME
NEW_RELIC_NO_CONFIG_FILE: true
NEW_RELIC_LOG_ENABLED: true
NEW_RELIC_LOG_LEVEL: trace
NEW_RELIC_LOG: stdout

Note: The Node.js agent automatically detects if it is running in an AWS environment and so it is recommended to NOT set the NEW_RELIC_SERVERLESS_MODE_ENABLED environment variable.

Python

The following Python environment variables relate to Lambda functions. The Python agent logs can be set to debug level and recorded in CloudWatch by sending to stderr.

NEW_RELIC_SERVERLESS_MODE_ENABLED: true
NEW_RELIC_LOG_LEVEL: debug
NEW_RELIC_LOG: stderr

Note: The Python agent does not automatically detect if it is running in an AWS environment and so it IS recommended to set the NEW_RELIC_SERVERLESS_MODE_ENABLED environment variable.

Java

Optionally, to enable debug logging for our OpenTracing Java Wrapper, set the following environment variable on the function:

NEW_RELIC_DEBUG: true

Go

Our Go agent doesn’t require any environment variables to send agent logs to CloudWatch. Instead, it uses a Go agent logger.

Common Invocation Issues

Integration Errors

If any integration errors are happening, you will notice some missing invocations. You can check for missing invocations by looking up the aws.requestId for a particular invocation. The requestId will be visible in CloudWatch logs.

SELECT * FROM Span WHERE aws.requestId = 'YOUR_REQUEST_ID' SINCE 1 day ago

In our internal downstream consumers of Lambda telemetry, if we get a payload that has a problem (invalid JSON, etc), we drop the request and create an NrIntegrationError event. This allows you to see the type of error and uuid of the request as a way to debug what happened.

SELECT * FROM NrIntegrationError SINCE 1 week ago

For Infrastructure-related account errors for a particular integration name (linked account name), like ServiceAccessDenied:

SELECT count(*) as 'Number of errors', max(timestamp) as 'Last seen' FROM IntegrationError WHERE providerAccountName = 'YOUR_INTEGRATION_NAME' FACET dataSourceName, method, error SINCE 1 day ago

New Relic Support can additionally look up errors not exposed externally for issues like lambdaIntegrationDisabled. Please contact New Relic Support if you are missing invocations but don’t see anything reported in NrIntegrationError or IntegrationError and request a check of LambdaLogConsumerError in staging for AWS Lambda.

Lambda Timeouts

See Common Issues and Solutions.

  1. To narrow down the possible causes of the timeout, it’s a good idea to test with one of our example functions.

  2. If the timeout still happens, we can rule out an issue with the function’s code. In this case it would more likely be an environmental issue or an issue with the New Relic layer itself.

  3. Check the function’s timeout setting. The default value may be set too low for your function.

Import Module Errors

After following the manual instructions to add our layer to your function, you may encounter the following errors.

[ERROR] Runtime.ImportModuleError: Unable to import module 'newrelic_lambda_wrapper': No module named 'newrelic_lambda_wrapper'
[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': No module named 'aiohttp'

The MODULE_NOT_FOUND error will occur if the layer has not been added to your function.

Console.log

Depending on the language of your function, there are different ways that you can log output to CloudWatch.

  • In the case of a timeout, it would be good to know precisely where the timeout is occurring in the invocation lifecycle.

  • In the case of excessive overhead after adding the layer, logging resource usage along the way can help determine what is contributing to the overhead.

Logging the start and end of each call within the function to the console can provide more context.

Node.js

const fs = require("fs")
const newrelic = require("newrelic")

// Option 1: catch uncaught exceptions and log the transaction state
process.on("uncaughtException", function (err) {
  console.log(err)
  console.log(newrelic.getTransaction())
  newrelic.noticeError(err)
})

// Option 2: also log where the exception originated
process.on("uncaughtException", (err, origin) => {
  fs.writeSync(process.stderr.fd, `Caught exception: ${err}\n` + `Exception origin: ${origin}\n`)
  console.log(newrelic.getTransaction())
  // noticeError takes custom attributes as an object in its second argument
  newrelic.noticeError(err, { origin })
})

// Option 3: log memory usage (myFunction is a placeholder for the function being measured)
const used = process.memoryUsage()
for (const key in used) {
  console.log(`function name: ${myFunction.name}, key: ${key}, value: ${Math.round(used[key] / 1024 / 1024 * 100) / 100} MB`)
}

.NET

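using System;
using System.Diagnostics;
// Log the name of the calling method (frame 0 is the current method, frame 1 is its caller)
StackTrace stackTrace = new StackTrace();
Console.WriteLine(stackTrace.GetFrame(1).GetMethod().Name);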
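
Python

The same start-and-end logging can be sketched for a Python function as a decorator. This is a minimal illustration and not part of the New Relic layer; log_calls and fetch_order are hypothetical names, and it relies on print output from a Lambda function being written to CloudWatch.

```python
import functools
import time


def log_calls(func):
    """Log the start, end, and elapsed time of each call to `func`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        print(f"START {func.__name__}")
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on both normal return and exception, so the END line
            # is logged even when the call fails.
            elapsed_ms = (time.perf_counter() - start) * 1000
            print(f"END {func.__name__} after {elapsed_ms:.1f} ms")

    return wrapper


@log_calls
def fetch_order(order_id):
    # Placeholder for a call whose timing you want to see in the logs.
    return {"id": order_id}
```

If a START line has no matching END line before the timeout, the call in between is the first place to look.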

What To Do With CloudWatch Logs

CloudWatch logs can give us an idea of how the invocation is working. Of interest are log lines about invocation errors like duration and timeouts, memory usage, function errors, agent logs that show what is being instrumented, and the status of the requests to send function logs and the invocation payload to New Relic.

Verify Common Log Entries

Extension method

Our AWS Lambda Extension outputs log lines like this for cold starts at DEBUG level:

2021-02-05T13:15:11.309-08:00 START RequestId: 2c6ce1cf-c1f0-4591-b717-774acffabd85 Version: $LATEST
2021-02-05T13:15:11.354-08:00 LOGS Name: cloudwatch_lambda_agent State: Subscribed Types: [platform]
2021-02-05T13:15:11.543-08:00 [NR_EXT] New Relic Lambda Extension starting up
2021-02-05T13:15:11.544-08:00 [NR_EXT] Starting log server.
2021-02-05T13:15:11.545-08:00 LOGS Name: newrelic-lambda-extension State: Subscribed Types: [platform]
2021-02-05T13:15:11.545-08:00 EXTENSION Name: cloudwatch_lambda_agent State: Ready Events: [INVOKE,SHUTDOWN]
2021-02-05T13:15:11.545-08:00 EXTENSION Name: newrelic-lambda-extension State: Ready Events: [INVOKE,SHUTDOWN]
2021-02-05T13:15:11.547-08:00 hello world!
2021-02-05T13:15:12.098-08:00 [NR_EXT] Sent 1/1 New Relic payload batches with 1 log events successfully in 476.101ms (475ms to transmit 1.0kB).
2021-02-05T13:15:12.099-08:00 END RequestId: 2c6ce1cf-c1f0-4591-b717-774acffabd85
2021-02-05T13:15:12.099-08:00 REPORT RequestId: 2c6ce1cf-c1f0-4591-b717-774acffabd85 Duration: 552.75 ms Billed Duration: 917 ms Memory Size: 128 MB Max Memory Used: 77 MB Init Duration: 363.83 ms

And like this for subsequent invocations:

2021-02-05T13:15:12.920-08:00 START RequestId: 7c9c9601-8756-4010-82b5-9cb70025a85a Version: $LATEST
2021-02-05T13:15:12.924-08:00 hello world!
2021-02-05T13:15:13.043-08:00 END RequestId: 7c9c9601-8756-4010-82b5-9cb70025a85a
2021-02-05T13:15:13.043-08:00 REPORT RequestId: 7c9c9601-8756-4010-82b5-9cb70025a85a Duration: 119.71 ms Billed Duration: 120 ms Memory Size: 128 MB Max Memory Used: 79 MB
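
The REPORT lines carry the duration, memory, and cold start figures that matter most when troubleshooting timeouts and memory limits. The following is a minimal sketch that pulls those numbers out of a REPORT line in the format shown above; parse_report and the field names in the returned dictionary are hypothetical.

```python
import re

# Matches the numeric fields of a Lambda REPORT line. Init Duration is
# only present on cold starts, so that group is optional.
REPORT_FIELDS = re.compile(
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>[\d.]+) ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<max_used_mb>\d+) MB"
    r"(?:\s+Init Duration: (?P<init_ms>[\d.]+) ms)?"
)


def parse_report(line):
    """Return the REPORT metrics as floats, or None if the line is not a REPORT line."""
    match = REPORT_FIELDS.search(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items() if value is not None}


line = (
    "2021-02-05T13:15:12.099-08:00 REPORT RequestId: 2c6ce1cf-c1f0-4591-b717-774acffabd85 "
    "Duration: 552.75 ms Billed Duration: 917 ms Memory Size: 128 MB "
    "Max Memory Used: 77 MB Init Duration: 363.83 ms"
)
report = parse_report(line)
is_cold_start = "init_ms" in report
memory_used_ratio = report["max_used_mb"] / report["memory_mb"]
```

A ratio close to 1.0 suggests the function is near its memory limit.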

Legacy CloudWatch method

The newrelic-log-ingestion function has log lines like:

2021-02-05T12:35:38.508-08:00 START RequestId: a3eb03ab-689d-4f55-b631-cc57d87fd186 Version: $LATEST
2021-02-05T12:35:38.787-08:00 Log entry sent. Response code: 202. url: https://cloud-collector.newrelic.com/aws/v1
2021-02-05T12:35:38.806-08:00 Log entry sent. Response code: 202. url: https://log-api.newrelic.com/log/v1
2021-02-05T12:35:38.820-08:00 END RequestId: a3eb03ab-689d-4f55-b631-cc57d87fd186

The function logs in CloudWatch will have a line containing NR_LAMBDA_MONITORING, which includes the base64-encoded payload and looks like:

| 2021-02-17 00:14:58.867 | START RequestId: 1234abcd-1234-abcd-5678-123456789abc Version: $LATEST
| 2021-02-17 00:14:58.949 | [2,"NR_LAMBDA_MONITORING",{"protocol_version":17,"execution_environment":"AWS_Lambda_python3.6","agent_version":"5.16.1.146","arn":"arn:aws:lambda:us-west-2:123456789:function:the-function-name","function_version":"7"},"abcd1234"]
| 2021-02-17 00:14:59.016 | END RequestId: 1234abcd-1234-abcd-5678-123456789abc
| 2021-02-17 00:14:59.016 | REPORT RequestId: 1234abcd-1234-abcd-5678-123456789abc Duration: 147.07 ms Billed Duration: 148 ms Memory Size: 1280 MB Max Memory Used: 219 MB

See how we’ve implemented the collection of the NR_LAMBDA_MONITORING payload for each function language.

Decode the Payload

Sometimes it can be useful to see what is included in the payload sent to New Relic. The payload can be accessed in CloudWatch and details the instrumentation of your function, including: metadata, metric_data, transaction_sample_data, span_event_data, analytic_event_data, errors, traceIds, etc.

When sending via CloudWatch and with NEW_RELIC_LAMBDA_EXTENSION_ENABLED: false, a NR_LAMBDA_MONITORING payload will be seen in CloudWatch logs for the function after each invocation. The value can be decoded for inspection with:

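# macOS (pbpaste reads the clipboard; macOS base64 uses -D)
pbpaste | base64 -D | gunzip | jq . > payload.json; code payload.json

# Linux (base64 uses -d; assumes xclip is installed to read the clipboard)
xclip -selection clipboard -o | base64 -d | gunzip | jq . > payload.json; code payload.json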
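
Where a clipboard tool or jq is not available, the same decoding can be done in Python. The following is a minimal sketch that mirrors the shell pipeline (base64 decode, gunzip, parse JSON) and also accepts a whole NR_LAMBDA_MONITORING log line; the function names are hypothetical.

```python
import base64
import gzip
import json


def decode_payload(encoded):
    """Decode a base64-encoded, gzip-compressed JSON payload."""
    return json.loads(gzip.decompress(base64.b64decode(encoded)))


def decode_monitoring_line(line):
    """Return (metadata, payload) from a NR_LAMBDA_MONITORING log line."""
    # The envelope is a JSON array; raw_decode ignores anything after it,
    # such as the trailing table characters in a CloudWatch export.
    envelope, _ = json.JSONDecoder().raw_decode(line, line.index("["))
    if envelope[1] != "NR_LAMBDA_MONITORING":
        raise ValueError("not a NR_LAMBDA_MONITORING line")
    return envelope[2], decode_payload(envelope[3])
```

Calling decode_monitoring_line on a line copied from CloudWatch returns the metadata map and the decoded payload.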

When using the Extension, the agent telemetry bytes are written to CloudWatch logs when the environment variable NEW_RELIC_EXTENSION_LOG_LEVEL: DEBUG is set. The telemetry bytes value from CloudWatch logs can be decoded with:

# macOS
cnt=$(pbpaste | base64 -D | jq length); pbpaste | base64 -D | jq -r ".[$cnt-1]" | base64 -D | gunzip | jq . > /tmp/payload.json; pbpaste | base64 -D | jq ".[$cnt-1] = $(cat /tmp/payload.json)" > payload.json; code payload.json

# Linux (assumes xclip is installed to read the clipboard)
cnt=$(xclip -selection clipboard -o | base64 -d | jq length); xclip -selection clipboard -o | base64 -d | jq -r ".[$cnt-1]" | base64 -d | gunzip | jq . > /tmp/payload.json; xclip -selection clipboard -o | base64 -d | jq ".[$cnt-1] = $(cat /tmp/payload.json)" > payload.json; code payload.json
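
The one-liners above are dense, so here is the same procedure as a short Python sketch. It assumes, as the shell commands do, that the telemetry bytes value is a base64-encoded JSON array whose last element is itself a base64-encoded, gzip-compressed JSON document; decode_extension_telemetry is a hypothetical name.

```python
import base64
import gzip
import json


def decode_extension_telemetry(telemetry):
    """Decode the telemetry bytes value logged by the Extension at DEBUG level."""
    # Outer layer: base64-encoded JSON array.
    envelope = json.loads(base64.b64decode(telemetry))
    # Inner layer: the last element is base64 + gzip + JSON, so replace it
    # with its decoded form for easier inspection.
    envelope[-1] = json.loads(gzip.decompress(base64.b64decode(envelope[-1])))
    return envelope
```

Printing the result with json.dumps(result, indent=2) gives output comparable to the payload.json file produced by the shell version.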

Troubleshooting Distributed Tracing

Explore our distributed tracing example.

Verify Spans and Parents

If no root span exists, the trace will be orphaned. In other words, if every entry span has a parent, then no root span exists and it will be considered an orphaned trace.
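
The orphaned-trace rule can be expressed as a small check over span data exported from a query. This is a minimal sketch of the rule as stated above; it assumes each span is a dictionary with a parentId attribute that is missing or empty on a root span, and the function names are hypothetical.

```python
def find_root_spans(spans):
    """Return the spans that have no parent."""
    return [span for span in spans if not span.get("parentId")]


def is_orphaned(spans):
    """A trace is orphaned when every span has a parent, so no root span exists."""
    return len(find_root_spans(spans)) == 0


complete_trace = [
    {"id": "a", "parentId": None},
    {"id": "b", "parentId": "a"},
]
orphaned_trace = [
    {"id": "b", "parentId": "a"},
    {"id": "c", "parentId": "b"},
]
```

In the second list, span b points at a parent that was never reported, which is the typical shape of an orphaned trace.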

To find spans for your trace, the best way is to use a NRQL query. The service map can also be used, but it’s important to note that NRQL queries will be the most accurate way to see each span and the entities involved in a trace. Queries can then be added to a dashboard to see more than a service map can provide. Everything at New Relic revolves around NRDB, so NRQL is your friend.

Visualize DT Spans with NRQL

If you have the traceId, this will find all spans and associated entities, assuming there are APM apps and Lambda functions included in the trace. Sort by timestamp to see how the trace is being passed and received.

SELECT count(*),latest(nr.entityType),latest(aws.lambda.eventSource.eventType),latest(name),max(timestamp) FROM Span WHERE traceId = '123456789abcdef' FACET entity.name,parent.app SINCE 1 week ago LIMIT MAX

Headers

Determine what attributes are in the header:

console.log(event.request.headers)

The request event is going to show headers that get passed in from an upstream entity.

Diagramming the Flow of Spans

For distributed tracing, it’s often helpful to diagram what’s happening and where root spans should originate. It can help to understand how many entities/functions are involved, what calls what, and if a certain thing needs special configuration, like API Gateway to allow custom headers to pass through. This example provides a format that works well in support tickets.

SQS
-> ALB
-> Lambda
    name1,
    Java OpenTracing implements TracingRequestStreamHandler,
    runtime version x,
    using Extension layer,
    shows as instrumented in New Relic UI
-> API Gateway
-> Lambda
    name2,
    Java OpenTracing implements StreamLambdaTracing,
    runtime version y,
    using NR_LAMBDA_MONITORING payload,
    not showing as instrumented in New Relic UI, just basic CloudWatch metrics

Serverless Plugin Errors

Invalid Credentials

This is not a New Relic error, but you may encounter it when using our plugin upon sls deploy. It happens when the Serverless Framework can’t find your AWS profile.

  ServerlessError: The security token included in the request is invalid.
      at ..\npm\node_modules\serverless\lib\plugins\aws\provider.js:1481:27
      at processTicksAndRejections (internal/process/task_queues.js:93:5)

By default, the Serverless Framework uses the credentials configured in the “default” profile. If you have a custom profile defined, please make sure to let the Serverless Framework know to use it.

sls config credentials --profile YOUR_PROFILE_NAME --provider aws --key YOUR_KEY --secret YOUR_SECRET
export AWS_PROFILE="YOUR_PROFILE_NAME"

Hitting the EU Endpoint with a US License Key Bug

When performing the serverless deployment, you might see this error:

$ npx serverless@1.34.0 deploy -v --stage=dev --force
Serverless: Plugins: ["serverless-newrelic-lambda-layers"]
Serverless: Adding NewRelic layer to image-usages-forEdition
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Fetch Error --------------------------------------------
invalid json response body at https://api.eu.newrelic.com/graphql reason: Unexpected token < in JSON at position 0

This bug has been fixed in v0.4.2 of the serverless-newrelic-lambda-layers plugin.

No Integration Found

------------------------------------------------------------
Serverless Deploy Errors:
Serverless: No New Relic AWS Lambda integration found for this New Relic linked account and aws account.
Serverless: Enabling New Relic integration for linked account: New Relic Lambda Integration - 123456 and aws account: ************.
Serverless: Error while creating the New Relic AWS Lambda cloud integration: Error: data.cloudLinkAccount missing in response.
------------------------------------------------------------

Errors in the newrelic-log-ingestion Lambda function:

[NR_EXT] Failed to retrieve license key AccessDeniedException: User: arn:aws:sts::123456789:assumed-role/newrelic-log-ingestion-production-eu-west-1-lambdaRole/newrelic-log-ingestion is not authorized to perform: secretsmanager:GetSecretValue on resource: arn:aws:secretsmanager:eu-west-1:123456789:secret:NEW_RELIC_LICENSE_KEY-ABC123

Run the integration install to add the AWS Secrets Manager role and secret:

newrelic-lambda integrations install -a NR_ACCOUNT_ID -r REGION -n LINKED_ACCOUNT_NAME -k YOUR_USER_API_KEY --no-aws-permissions-check

CloudFormation template snippet that allows secretsmanager:GetSecretValue:

          - Effect: Allow
            Action:
              - 'secretsmanager:GetSecretValue'
            Resource: !Ref LicenseKeySecret