Lambda Troubleshooting Framework - Troubleshooting Lambda Part 1

Part 2 of this guide is here

We hope this Lambda Troubleshooting Framework will guide you to a quick and easy install. These are the steps that our support teams take to troubleshoot AWS Lambda monitoring.

Resources

The following documentation and repositories will come in handy on your serverless observability journey. You will find requirements, install steps, examples, and implementation inspiration.

Docs

  • Intro
  • Enable
  • UI and Data
  • Troubleshooting

Repos

  • Examples
  • References

Try Examples First

It’s a good idea to try out our example functions first using either the SAM CLI + AWS CLI, or our Terraform examples. You will learn many things that can then be applied to your existing function.

  1. Start with our docs. If using the New Relic Lambda CLI to do a quick working example, make sure to complete the What do you need section before linking your AWS account with your New Relic account.
  2. Follow an example, with one of our supported runtimes, in our Extension repo. For the SAM examples, make sure you have all the Prerequisites and then run the deploy.sh script for your function’s language.
  3. Subsequent deployment strategies may include use of our Continuous Deployment techniques.

Verify Requirements

See our compatibility and requirements doc.

  1. It is highly recommended to use the New Relic Lambda CLI in order to minimize the risk that something gets missed or mixed up.
  2. It is also recommended to install the AWS CLI in order to configure your AWS profile in the terminal, which our CLI will look for and use.
  3. To use our layer that includes our Extension, you will need to verify a few more requirements.
  4. Your AWS profile will also need some permissions to use the New Relic Lambda CLI.

Verify Components

To confirm everything is in place and working, verify:

  1. The right license key is being used, either in the AWS Secrets Manager secret or in the NEW_RELIC_LICENSE_KEY environment variable on the function. It should match the license key from the New Relic account where the integration exists. To find the integration and its associated integration role, go to New Relic One -> Infrastructure -> AWS, look for your linked account name, and click the “Manage services” link to see the associated AWS role ARN.
  2. All required environment variables are present on the function for your runtime. There are environment variables for configuring both the Extension and the agents (Python, Node.js, Go).
  3. The AWS Runtime Settings for your function point to the handler in our layer.
    Python: newrelic_lambda_wrapper.handler
    Node: newrelic-lambda-wrapper.handler
  4. For Node.js and Python functions, the NEW_RELIC_LAMBDA_HANDLER environment variable points to your function’s actual handler.
  5. The integration in New Relic One -> Infrastructure -> AWS exists with the correct AWS account ID and integration role.
  6. Roles for integration, execution, and secretsmanager are all set in AWS, and the integration role additionally includes these base permissions required by all integrations.
  7. The logs and payload endpoints are correct for US or EU (configured with environment variables).
  8. The payload is getting generated and sent successfully by the Extension as seen in CloudWatch logs. Decoding it reveals event spans and error spans included from the function.
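
Items 3 and 4 above can be checked mechanically against the JSON that `aws lambda get-function-configuration` prints. The following is only a minimal sketch under that assumption: `check_function_config` and its messages are hypothetical names made up for illustration, not part of any New Relic tool. The wrapper handler names are the ones listed above.

```python
# Wrapper handlers from the list above, keyed by runtime prefix.
WRAPPER_HANDLERS = {
    "python": "newrelic_lambda_wrapper.handler",
    "nodejs": "newrelic-lambda-wrapper.handler",
}


def check_function_config(config):
    """Return a list of human-readable problems found in a function config.

    `config` is assumed to be the dict parsed from the JSON output of
    `aws lambda get-function-configuration` (hypothetical helper).
    """
    problems = []
    runtime = config.get("Runtime", "")
    handler = config.get("Handler", "")
    variables = config.get("Environment", {}).get("Variables", {})

    for prefix, wrapper in WRAPPER_HANDLERS.items():
        if runtime.startswith(prefix):
            # The runtime handler should point at the wrapper in the layer.
            if handler != wrapper:
                problems.append(f"Handler is {handler!r}, expected {wrapper!r}")
            # The wrapper needs to know the function's actual handler.
            if not variables.get("NEW_RELIC_LAMBDA_HANDLER"):
                problems.append("NEW_RELIC_LAMBDA_HANDLER is not set")

    if not any(layer.get("Arn") for layer in config.get("Layers", [])):
        problems.append("No layers are attached to the function")

    return problems
```

An empty list means these particular checks passed. It says nothing about the license key, roles, or endpoints, which still need to be verified separately.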

Verify with NRQL Queries

  1. Verify the integration is working to record invocation counts and other general metrics.
    SELECT latest(provider.invocations.Sum) FROM ServerlessSample WHERE dataSourceName = 'Lambda' FACET provider.functionName SINCE 1 day ago LIMIT 100
  2. Verify the Lambda service is not getting any ServiceAccessDenied errors. If it is, check that the right license key is being used in the AWS Secrets Manager (for use with the Extension) and/or on the newrelic-log-ingestion function. Also check that the role in AWS associated with the integration has at least these permissions.
    SELECT count(*) AS 'Number of errors', max(timestamp) AS 'Last seen' FROM IntegrationError WHERE providerAccountName = 'YOUR_LINKED_ACCOUNT_NAME' AND dataSourceName = 'Lambda' FACET method, error, awsErrorType, awsErrorCode SINCE 1 day ago LIMIT 100
  3. Verify the request ID is showing up in New Relic with the following query:
    SELECT * FROM Span WHERE aws.requestId = 'YOUR_REQUEST_ID' SINCE 1 day ago
  4. Verify instrumentation of span events is making it to New Relic.
    SELECT * FROM AwsLambdaInvocation WHERE provider.functionName = 'YOUR_FUNCTION_NAME' SINCE 1 day ago
  5. Verify retention of span events.
    SELECT latest(insightsTotalRetentionInHours)/24 as 'Total (Days)', latest(insightsIncludedRetentionInHours)/24 as 'Included (Days)', latest(insightsTotalRetentionInHours)/24 - latest(insightsIncludedRetentionInHours)/24 as 'Paid (Days)' FROM NrDailyUsage WHERE insightsEventNamespace LIKE '%span%' FACET consumingAccountName,consumingAccountId,insightsEventNamespace SINCE 1 day ago
  6. Verify distributed tracing spans and associated entities.
    SELECT count(*),max(timestamp) FROM Span WHERE traceId = 'YOUR_TRACE_ID' FACET name,appName,provider.functionName,parent.app SINCE 1 week ago LIMIT MAX
  7. Verify your function is not hitting a memory limit (assuming logs are being sent).
    SELECT count(*) from Log where aws.logGroup = '/aws/lambda/YOUR_FUNCTION_NAME' AND message LIKE '%Max Memory Used: YOUR_MEMORY_LIMIT%' SINCE 1 week ago TIMESERIES MAX
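
Query 7 matches the REPORT line that Lambda writes at the end of each invocation. If logs are not reaching New Relic yet, the same comparison can be made on a line copied from CloudWatch. This is a rough sketch: it assumes the usual REPORT layout with "Memory Size: N MB" followed by "Max Memory Used: N MB", and `memory_usage` and `hit_memory_limit` are hypothetical helper names.

```python
import re

# Assumed layout of the REPORT line: "... Memory Size: 128 MB ... Max Memory Used: 64 MB ..."
_MEMORY_RE = re.compile(r"Memory Size: (\d+) MB.*Max Memory Used: (\d+) MB")


def memory_usage(report_line):
    """Return (limit_mb, used_mb) parsed from a REPORT line, or None if absent."""
    match = _MEMORY_RE.search(report_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def hit_memory_limit(report_line):
    """True when the invocation used at least as much memory as its limit."""
    usage = memory_usage(report_line)
    return usage is not None and usage[1] >= usage[0]
```

A function that repeatedly reports usage equal to its limit is a candidate for the memory increase described under Common Issues and Solutions.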

Common Issues and Solutions

  1. Check that your function is using one of our supported runtimes.
  2. If the function experiences a timeout, verify that all async functions are returning and not leading to unhandled exceptions. Confirm you’ve completed the handler setup. Our Extension processes each payload synchronously and must wait for either the function to indicate to our wrapper that execution has completed, or the function’s timeout value to be reached.
  3. If needed, increase the timeout value for your function. It might also be necessary to increase the memory limit, which increases the CPU resources available to your function for faster processing.
  4. Update the layer to the latest version.
  5. Use our examples as a starting point to see how our function handler wraps your actual handler (Java, .NET, Go).
  6. Peruse our .NET OpenTracing agent, Java Tracer, Java AWS, and Go agent repositories for examples and implementation inspiration.
  7. Explore our distributed tracing example.
  8. Confirm the roles:
    - Integration role: The integration role by default uses the AWS managed policy called ReadOnlyAccess and gets created when the integration is first set up with the New Relic Lambda CLI.
    - Execution role: The function needs an execution role, which by default uses the AWS managed policy called AWSLambdaBasicExecutionRole. If the license key is stored in a secret, the role also needs a Secrets Manager policy with the action secretsmanager:GetSecretValue; otherwise, set the NEW_RELIC_LICENSE_KEY environment variable on the function. Do not use both at the same time.
  9. Update the New Relic Lambda CLI
    pip3 install -U newrelic-lambda-cli
  10. Create the secret
    - Use the New Relic Lambda CLI to update the integration. Newer versions of the CLI install the license key secret into AWS Secrets Manager if it isn’t there already.
    newrelic-lambda integrations update
    - If the license key secret already exists in your AWS Secrets Manager, verify that it is correct. If not correct, delete it and rerun the above command to have the CLI create it again.
    - If manually creating a license key secret, the secret name can be anything but the secret key must be LicenseKey as it is what we look for when extracting the key value from the JSON map.
  11. Add the managed secrets policy to the function, which should be named something like arn:aws:secretsmanager:<AWS_REGION>:<AWS_ACCOUNT_ID>:secret:NEW_RELIC_LICENSE_KEY-abc123, or add it as an inline policy on the execution role.
  12. When switching from the legacy newrelic-log-ingestion function to the new Extension method for sending telemetry to New Relic, make sure to remove the subscription filter by following these log management instructions.
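
Steps 10 and 11 come down to two small JSON documents: the secret value, which must hold the key under LicenseKey, and a policy that lets the execution role read it. The sketch below illustrates both under stated assumptions: `validate_license_secret` and `secret_read_policy` are hypothetical helpers, and the policy is a generic least-privilege example rather than the exact policy the CLI creates.

```python
import json


def validate_license_secret(secret_string):
    """Return the license key if the secret JSON has a non-empty LicenseKey entry."""
    data = json.loads(secret_string)
    if not isinstance(data, dict) or not data.get("LicenseKey"):
        raise ValueError("secret JSON must contain a non-empty 'LicenseKey' entry")
    return data["LicenseKey"]


def secret_read_policy(secret_arn):
    """Build an inline IAM policy document that allows reading a single secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_arn,
            }
        ],
    }
```

Passing the secret's ARN to `secret_read_policy` and serializing the result with `json.dumps` gives a document that could be pasted as an inline policy on the execution role. The key name is case-sensitive, so a secret stored as "licensekey" fails the check.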

Note: Using our CLI to install the layer does not fully instrument Java, .NET, and Go functions. The Extension layer added for those runtimes only sends us the invocation telemetry and does not contain an agent (only the Node.js and Python layers contain agents).

Typical Things New Relic Support Will Ask About

Issue

State the issue as clearly as possible. Include as much detail as you can, including the troubleshooting steps you have already tried (and their results), and address the items in the Unknowns section below.

Unknowns

  • Whether this is a new endeavor or an existing Lambda that was working before.
  • Whether there are any error messages in CloudWatch, CloudFormation, or the function’s invocation details.
  • The name of the integration at: New Relic One -> Infrastructure -> AWS
  • How the integration was initially set up.
  • Whether the AWS account, integration role, and AWS region are all correct.
  • The name, language, and runtime version of the function.
  • The memory limit and timeout value for the function.
  • How invocation telemetry is being shipped, i.e., the legacy CloudWatch path or the New Relic Lambda Extension.
  • The version of the layer added to the function, and if the newrelic-log-ingestion function is being used with the legacy CloudWatch path, the version of that function.
  • How the handlers have been set up.
  • Environment variables that are set on the function.
  • The process for deploying and invoking the function.
  • Whether the function shows as “instrumented” in the New Relic One Entity Explorer.
  • Whether the dependency tree is missing any dependencies (especially for .NET and Java functions).
  • Whether the function has the needed execution role and secretsmanager policy applied.

Export SAM Config

The SAM config file can highlight issues with runtime config, missing environment variables, incorrect values, versions, etc.

  1. Navigate to your Lambda function in the AWS Console.
  2. In the top-right, click the Actions dropdown, select “Export function”.
  3. Click “Download AWS SAM file”.
  4. Attach the yaml file to the ticket.
  5. Repeat for the newrelic-log-ingestion function if you have one.

Extension debug logs

It’s good to check what the Extension is doing in detail. Set the following environment variables on the function to enable debug logging:

NEW_RELIC_LAMBDA_EXTENSION_ENABLED: true
NEW_RELIC_EXTENSION_LOG_LEVEL: DEBUG
NEW_RELIC_EXTENSION_SEND_FUNCTION_LOGS: true
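
When these variables are applied with `aws lambda update-function-configuration --environment`, the value supplied replaces the function's whole variable map, so the existing variables have to be carried along. A minimal sketch of that merge follows. It assumes the current variables have already been read into a dict, and `with_debug_logging` and `environment_argument` are hypothetical helper names.

```python
import json

# The three debug settings listed above.
DEBUG_VARIABLES = {
    "NEW_RELIC_LAMBDA_EXTENSION_ENABLED": "true",
    "NEW_RELIC_EXTENSION_LOG_LEVEL": "DEBUG",
    "NEW_RELIC_EXTENSION_SEND_FUNCTION_LOGS": "true",
}


def with_debug_logging(current_variables):
    """Return a copy of the function's variables with the debug settings added."""
    merged = dict(current_variables)
    merged.update(DEBUG_VARIABLES)
    return merged


def environment_argument(variables):
    """Serialize variables into the {"Variables": {...}} JSON shape Lambda uses."""
    return json.dumps({"Variables": variables}, sort_keys=True)
```

The string returned by `environment_argument(with_debug_logging(current))` is in the shape accepted by the `--environment` option. Remember to set the log level back once the debug logs have been collected.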

Export CloudWatch Logs

It is useful to collect CloudWatch logs for both the function’s invocation and the newrelic-log-ingestion function, if it is being used. The function can typically be configured to output details on what is being instrumented, how the invocation is working, and the status of the request to send logs and payloads to New Relic.

  1. Invoke the functions in AWS Lambda.
  2. Navigate to CloudWatch → Logs → Insights in your AWS console.
  3. Select your function(s) and also the newrelic-log-ingestion stream if not using our Extension.
  4. Apply an appropriate Time Filter (depending on when the issue occurred), set the sort order to sort @timestamp asc, and the limit to the exact number of lines for the latest invocation, then click “Run query”. The first log line for the latest invocation should appear at the top.
  5. Under Export results select Copy table to clipboard (markdown).
  6. Paste the text into a new .txt or .log file and upload the file to this ticket.

Most logs exported from CloudWatch will contain a lot of whitespace, which makes the file much larger than it needs to be. Removing the whitespace before attaching to the New Relic Support ticket will make it more manageable. The following has been tested in Bash:

# delete the last character of each line and any spaces before it
sed -e 's/ *.$//' cw.log > cw-clean.log
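
Where Bash and sed are not available, a similar cleanup can be sketched in Python. This variant is slightly more conservative than the sed command: it removes trailing whitespace and a trailing table pipe only, rather than whatever the last character happens to be. `clean_line` and `clean_log` are hypothetical names.

```python
def clean_line(line):
    """Strip trailing whitespace and one trailing table pipe from a line."""
    stripped = line.rstrip()
    if stripped.endswith("|"):
        stripped = stripped[:-1].rstrip()
    return stripped


def clean_log(text):
    """Clean every line of an exported CloudWatch table and rejoin the lines."""
    return "\n".join(clean_line(line) for line in text.splitlines())
```

Reading cw.log, passing its contents through `clean_log`, and writing the result to cw-clean.log mirrors the shell command above.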

CloudFormation Stacks

Get CloudFormation details

# list stacks
aws cloudformation list-stacks | grep StackName | grep NewRelic

# list change-sets if any
aws cloudformation list-change-sets --stack-name NewRelicLogIngestion
aws cloudformation list-change-sets --stack-name NewRelicLambdaIntegrationRole-<account-id>
aws cloudformation list-change-sets --stack-name NewRelicLicenseKeySecret

# get template summaries
aws cloudformation get-template-summary --stack-name NewRelicLogIngestion
aws cloudformation get-template-summary --stack-name NewRelicLambdaIntegrationRole-<account-id>
aws cloudformation get-template-summary --stack-name NewRelicLicenseKeySecret

# delete the problematic stacks
aws cloudformation delete-stack --stack-name NewRelicLogIngestion
aws cloudformation delete-stack --stack-name NewRelicLambdaIntegrationRole-<account-id>
aws cloudformation delete-stack --stack-name NewRelicLicenseKeySecret
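
The grep in the first command only shows stack names, and `list-stacks` also returns stacks that were deleted recently. A small filter over the JSON output can show each New Relic stack together with its status. This is a sketch under the assumption that the output of `aws cloudformation list-stacks` has been parsed into a dict; `new_relic_stacks` is a hypothetical helper name.

```python
def new_relic_stacks(list_stacks_output):
    """Return (name, status) pairs for New Relic stacks that still exist.

    `list_stacks_output` is assumed to be the parsed JSON printed by
    `aws cloudformation list-stacks`.
    """
    return [
        (stack["StackName"], stack["StackStatus"])
        for stack in list_stacks_output.get("StackSummaries", [])
        if "NewRelic" in stack["StackName"]
        and stack["StackStatus"] != "DELETE_COMPLETE"
    ]
```

A status such as ROLLBACK_COMPLETE or CREATE_FAILED on one of these stacks is a reasonable hint about which stack to inspect or delete before rerunning the install.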