(Part 2 in next comment)
Lambda Troubleshooting Framework - Troubleshooting Lambda
- Try Examples First
- Verify Requirements
- Verify Components
- Verify with NRQL Queries
- Common Issues and Solutions
- Typical Things New Relic Support Will Ask About
- Environment Variables
- Common Invocation Issues
- What To Do With CloudWatch Logs
- Troubleshooting Distributed Tracing
- Serverless Plugin Errors
We hope this Lambda Troubleshooting Framework will guide you to a quick and easy install. These are the steps that our support teams take to troubleshoot AWS Lambda monitoring.
The following documentation and repositories will come in handy on your serverless observability journey. You will find requirements, install steps, examples, and implementation inspiration.
- Account Linking
- Instrumentation Examples
- Instrument Your Functions
- Configure Lambda Monitoring
- Legacy Manual Instrumentation
- Update Serverless Monitoring
UI and Data
- New Relic Lambda CLI
- AWS Log Ingestion (Legacy)
- Layers (Node.js and Python)
- Serverless plugin (Node.js and Python)
- New Relic .NET OpenTracing agent
- New Relic Java Tracer
- New Relic Java AWS and New Relic Java AWS OpenTracing
- New Relic Go agent
- Node.js agent
- Python agent
- Serverless Plugin
- Distributed Tracing
- Java Distributed Tracing
- Java Instrumentation with Errors
- Extension config
- CloudFormation Templates and additional templates
- Our Endpoints
- newrelic-log-ingestion function
- Default log subscription pattern
- Node.js agent API
- Python agent API
- Go agent API
Try Examples First
It’s a good idea to try out our example functions first, using either the SAM CLI + AWS CLI or our Terraform examples. What you learn there can then be applied to your existing function.
- Start with our docs. If you use the New Relic Lambda CLI to build a quick working example, complete the “What do you need” section before linking your AWS account with your New Relic account.
- Follow an example for one of our supported runtimes in our Extension repo. For the SAM examples, make sure you have all the Prerequisites, then run the deploy.sh script for your function’s language.
- For later deployments, consider our Continuous Deployment techniques.
Verify Requirements
See our compatibility and requirements doc.
- We highly recommend using the New Relic Lambda CLI to minimize the risk that something gets missed or mixed up.
- We also recommend installing the AWS CLI to configure your AWS profile in the terminal; our CLI looks for and uses that profile.
- To use our layer that includes our Extension, you will need to verify a few more requirements.
- Your AWS profile will also need some permissions to use the New Relic Lambda CLI.
Verify Components
To confirm everything is in place and working, verify that:
- The right license key is being used, either in AWS Secrets Manager or in the NEW_RELIC_LICENSE_KEY environment variable on the function. It should match the license key of the New Relic account where the integration exists. To find the integration and its integration role, go to New Relic One -> Infrastructure -> AWS, look for your linked account name, and click the “Manage services” link to see the associated AWS role ARN.
- All required environment variables are present on the function for your runtime. There are environment variables for configuring both the Extension and the agents (Python, Node.js, Go).
- The AWS Runtime Settings for your function point to the handler in our layer.
- For Node.js and Python functions, the NEW_RELIC_LAMBDA_HANDLER environment variable points to your function’s actual handler.
- The integration in New Relic One -> Infrastructure -> AWS exists with the correct AWS account ID and integration role.
- The integration role, the execution role, and the Secrets Manager policy are all set up in AWS, and the integration role also includes these base permissions required by all integrations.
- The logs and payload endpoints are correct for US or EU (configured with environment variables).
- The Extension generates the payload and sends it successfully, as seen in the CloudWatch logs. Decoding the payload should reveal the event spans and error spans recorded for the function.
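Part of the checklist above can be automated. The following is a minimal Python sketch, not a New Relic tool. It assumes `config` is the parsed JSON printed by `aws lambda get-function-configuration`, and that the Node.js and Python layers use the wrapper handlers `newrelic-lambda-wrapper.handler` and `newrelic_lambda_wrapper.handler` (confirm these in the layer docs for your runtime). The function name `check_function_config` and the "NewRelic" substring test on layer ARNs are illustrative assumptions.

```python
# Assumed wrapper handlers for the New Relic Node.js and Python layers;
# confirm the exact names in the layer documentation for your runtime.
WRAPPER_HANDLERS = {
    "python": "newrelic_lambda_wrapper.handler",
    "nodejs": "newrelic-lambda-wrapper.handler",
}


def check_function_config(config: dict) -> list:
    """Return warnings for a parsed get-function-configuration result (empty = no issues found)."""
    warnings = []
    runtime = config.get("Runtime", "")
    env = config.get("Environment", {}).get("Variables", {})
    layer_arns = [layer["Arn"] for layer in config.get("Layers", [])]

    # Heuristic (assumption): New Relic layer names contain "NewRelic".
    if not any("NewRelic" in arn for arn in layer_arns):
        warnings.append("no layer ARN containing 'NewRelic' is attached")

    # Only the Node.js and Python layers wrap the handler and need
    # NEW_RELIC_LAMBDA_HANDLER to find the function's real handler.
    family = next((name for name in WRAPPER_HANDLERS if runtime.startswith(name)), None)
    if family is not None:
        expected = WRAPPER_HANDLERS[family]
        if config.get("Handler") != expected:
            warnings.append(f"Handler is {config.get('Handler')!r}, expected {expected!r}")
        if not env.get("NEW_RELIC_LAMBDA_HANDLER"):
            warnings.append("NEW_RELIC_LAMBDA_HANDLER is not set")

    return warnings
```

For example, save the output of `aws lambda get-function-configuration --function-name my-function` to `config.json` and pass `json.load(open("config.json"))` to the function. The configuration does not show secrets or IAM policies, so the license key and role checks still have to be done by hand.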
Verify with NRQL Queries
- Verify the integration is working to record invocation counts and other general metrics.
SELECT latest(provider.invocations.Sum) FROM ServerlessSample WHERE dataSourceName = 'Lambda' FACET provider.functionName SINCE 1 day ago LIMIT 100
- Verify the Lambda service is not getting any ServiceAccessDenied errors. If it is, check that the right license key is being used in AWS Secrets Manager (for use with the Extension) and/or on the newrelic-log-ingestion function. Also check that the AWS role associated with the integration has at least these permissions.
SELECT count(*) AS 'Number of errors', max(timestamp) AS 'Last seen' FROM IntegrationError WHERE providerAccountName = 'YOUR_LINKED_ACCOUNT_NAME' AND dataSourceName = 'Lambda' FACET method, error, awsErrorType, awsErrorCode SINCE 1 day ago LIMIT 100
- Verify the request ID is showing up in New Relic with the following query:
SELECT * FROM Span WHERE aws.requestId = 'YOUR_REQUEST_ID' SINCE 1 day ago
- Verify that invocation events (AwsLambdaInvocation) from your instrumented function are making it to New Relic.
SELECT * FROM AwsLambdaInvocation WHERE provider.functionName = 'YOUR_FUNCTION_NAME' SINCE 1 day ago
- Verify retention of span events.
SELECT latest(insightsTotalRetentionInHours)/24 as 'Total (Days)', latest(insightsIncludedRetentionInHours)/24 as 'Included (Days)', latest(insightsTotalRetentionInHours)/24 - latest(insightsIncludedRetentionInHours)/24 as 'Paid (Days)' FROM NrDailyUsage WHERE insightsEventNamespace LIKE '%span%' FACET consumingAccountName,consumingAccountId,insightsEventNamespace SINCE 1 day ago
- Verify distributed tracing spans and associated entities.
SELECT count(*),max(timestamp) FROM Span WHERE traceId = 'YOUR_TRACE_ID' FACET name,appName,provider.functionName,parent.app SINCE 1 week ago LIMIT MAX
- Verify your function is not hitting its memory limit (assuming logs are being sent to New Relic).
SELECT count(*) from Log where aws.logGroup = '/aws/lambda/YOUR_FUNCTION_NAME' AND message LIKE '%Max Memory Used: YOUR_MEMORY_LIMIT%' SINCE 1 week ago TIMESERIES MAX
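When logs are not forwarded to New Relic, the same memory check can be run on exported CloudWatch logs. Lambda writes one REPORT line per invocation that includes Memory Size and Max Memory Used. This Python sketch flags invocations close to the limit; the function name and the 90% default threshold are arbitrary choices, not New Relic settings.

```python
import re

# Matches the per-invocation REPORT line Lambda writes to CloudWatch, e.g.
# REPORT RequestId: abc  Duration: 12.3 ms  Billed Duration: 13 ms  Memory Size: 128 MB  Max Memory Used: 128 MB
REPORT_RE = re.compile(
    r"RequestId:\s*(?P<request_id>\S+).*?"
    r"Memory Size:\s*(?P<size>\d+)\s*MB.*?"
    r"Max Memory Used:\s*(?P<used>\d+)\s*MB"
)


def near_memory_limit(lines, threshold=0.9):
    """Yield (request_id, used_mb, size_mb) for REPORT lines at or above threshold * size."""
    for line in lines:
        if "REPORT RequestId" not in line:
            continue  # skip START/END lines and function output
        match = REPORT_RE.search(line)
        if match is None:
            continue
        size = int(match["size"])
        used = int(match["used"])
        if used >= threshold * size:
            yield match["request_id"], used, size
```

An invocation whose Max Memory Used equals its Memory Size is the same case the NRQL query above looks for.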
Common Issues and Solutions
- Check that your function is using one of our supported runtimes.
- If the function times out, verify that all async functions return and do not lead to unhandled exceptions, and confirm you’ve completed the handler setup. Our Extension processes each payload synchronously and must wait until either the function signals to our wrapper that execution has completed, or the function’s timeout is reached.
- If needed, increase the timeout value for your function. You may also need to increase the memory limit, which also increases the CPU available to your function for faster processing.
- Update the layer to the latest version.
- Use our examples as a starting point to see how our function handler wraps your actual handler (Java, .NET, Go).
- Peruse our .NET OpenTracing agent, Java Tracer, Java AWS, and Go agent repositories for examples and implementation inspiration.
- Explore our distributed tracing example.
- Confirm the roles:
- Integration role: By default, the integration role uses the AWS managed policy ReadOnlyAccess and is created when the integration is first set up with the New Relic Lambda CLI.
- Execution role: The function needs an execution role, which by default uses the AWS managed policy AWSLambdaBasicExecutionRole. If the license key is stored as a secret, the role also needs a Secrets Manager policy with the action secretsmanager:GetSecretValue; otherwise, set the NEW_RELIC_LICENSE_KEY environment variable on the function. Use one method or the other, not both.
- Update the New Relic Lambda CLI
pip3 install -U newrelic-lambda-cli
- Create the secret
- Use the New Relic Lambda CLI to update the integration; newer versions of the CLI install the license key secret into AWS Secrets Manager if it isn’t there already.
newrelic-lambda integrations update
- If the license key secret already exists in your AWS Secrets Manager, verify that it is correct. If not correct, delete it and rerun the above command to have the CLI create it again.
- If you create the license key secret manually, the secret name can be anything, but the secret key must be LicenseKey, because that is the key we look for when extracting the value from the JSON map.
- Attach the managed secrets policy to the function’s execution role so it can read the secret (whose ARN looks something like arn:aws:secretsmanager:<AWS_REGION>:<AWS_ACCOUNT_ID>:secret:NEW_RELIC_LICENSE_KEY-abc123), or add the same permission as an inline policy on the execution role.
- When switching from the legacy newrelic-log-ingestion function to the new Extension method for sending telemetry to New Relic, remove the subscription filter by following these log management instructions.
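For the manual secret and policy steps above, this Python sketch builds the two JSON documents involved: the SecretString with the required LicenseKey key, and an IAM policy granting secretsmanager:GetSecretValue on a single secret ARN. The function names are hypothetical, and scoping the policy to one secret is a design choice rather than something New Relic requires.

```python
import json


def license_key_secret_string(license_key: str) -> str:
    """SecretString for a manually created secret; the key must be "LicenseKey"."""
    return json.dumps({"LicenseKey": license_key})


def get_secret_policy(secret_arn: str) -> str:
    """IAM policy document that lets the execution role read one secret."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_arn,
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

With boto3, these strings could be passed to the Secrets Manager client's `create_secret(Name=..., SecretString=...)` and the IAM client's `put_role_policy(RoleName=..., PolicyName=..., PolicyDocument=...)`; the secret, role, and policy names are yours to choose.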
Note: For Java, .NET, and Go functions, installing the layer with our CLI is not enough for full instrumentation. The Extension layer it adds only sends invocation telemetry to New Relic and does not contain an agent (only the Node.js and Python layers include agents).
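On the timeout item in the list above: in Python, one common cause is async work that is started but never awaited. This is a minimal, generic sketch (not New Relic code) that runs all async work to completion inside the synchronous handler. `fetch_item` is a hypothetical stand-in for real I/O; with the New Relic Python layer, NEW_RELIC_LAMBDA_HANDLER would point at this `handler` (for example `app.handler`, assuming the file is `app.py`).

```python
import asyncio


async def fetch_item(item_id: int) -> dict:
    # Hypothetical stand-in for real async I/O (HTTP call, database query, ...).
    await asyncio.sleep(0.01)
    return {"id": item_id}


async def process(event: dict) -> list:
    # gather() waits for every task, so nothing is left running after the handler returns.
    return await asyncio.gather(*(fetch_item(i) for i in event.get("ids", [])))


def handler(event, context):
    # asyncio.run() blocks until process() finishes and re-raises any exception,
    # so the invocation either returns a result or fails fast instead of hanging.
    items = asyncio.run(process(event))
    return {"statusCode": 200, "body": items}
```

Letting exceptions propagate, rather than swallowing them, keeps failures visible as errors on the invocation.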
Typical Things New Relic Support Will Ask About
State the issue as clearly as possible and include as much detail as you can: the troubleshooting steps you have already tried (and their results), plus the information listed below.
- Whether this is a new endeavor or an existing Lambda that was working before.
- Whether there are any error messages in CloudWatch, CloudFormation, or the function’s invocation details.
- The name of the integration in New Relic One -> Infrastructure -> AWS.
- How the integration was initially set up.
- Confirm that the AWS account, integration role, and AWS region are all correct.
- The name, language, and runtime version of the function.
- The memory limit and timeout value for the function.
- How invocation telemetry is being shipped, i.e., through the legacy CloudWatch path or the New Relic Lambda Extension.
- The version of the layer added to the function and, if the newrelic-log-ingestion function is being used with the legacy CloudWatch path, the version of that function.
- How the handlers have been set up.
- Environment variables that are set on the function.
- The process for deploying and invoking the function.
- Whether the function shows as “instrumented” in the New Relic One Entity Explorer.
- Whether the dependency tree is missing any dependencies (especially for .NET and Java functions).
- Whether the function has the needed execution role and secretsmanager policy applied.
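Several of these answers can be gathered in one place. This Python sketch pulls the function name, runtime, handler, memory limit, timeout, layer ARNs, and environment variables out of parsed `aws lambda get-function-configuration` output, masking variables whose names look sensitive (such as NEW_RELIC_LICENSE_KEY) before you paste them into a ticket. The function name and the name-based redaction rule are assumptions.

```python
# Variable names containing any of these substrings are masked (assumed rule).
SENSITIVE_MARKERS = ("KEY", "SECRET", "TOKEN", "PASSWORD")


def support_summary(config: dict) -> dict:
    """Collect the fields support asks about from get-function-configuration output."""
    env = config.get("Environment", {}).get("Variables", {})
    redacted = {
        name: "<redacted>" if any(m in name.upper() for m in SENSITIVE_MARKERS) else value
        for name, value in env.items()
    }
    return {
        "FunctionName": config.get("FunctionName"),
        "Runtime": config.get("Runtime"),
        "Handler": config.get("Handler"),
        "MemorySize": config.get("MemorySize"),  # MB
        "Timeout": config.get("Timeout"),  # seconds
        # Layer ARNs end in the layer version number, e.g. ...:layer:Name:42
        "Layers": [layer["Arn"] for layer in config.get("Layers", [])],
        "Environment": redacted,
    }
```

Print the result with `json.dumps(support_summary(config), indent=2)`. The redaction is a simple name match, so review the output before sharing it.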
Export SAM Config
The SAM config file can highlight issues with runtime config, missing environment variables, incorrect values, versions, etc.
- Navigate to your Lambda function in the AWS Console.
- In the top-right, click the Actions dropdown and select “Export function”.
- Click “Download AWS SAM file”.
- Attach the YAML file to the ticket.
- Repeat for the newrelic-log-ingestion function if you have one.
Extension debug logs
To see in detail what the Extension is doing, set these environment variables on the function:
NEW_RELIC_LAMBDA_EXTENSION_ENABLED: true
NEW_RELIC_EXTENSION_LOG_LEVEL: DEBUG
NEW_RELIC_EXTENSION_SEND_FUNCTION_LOGS: true
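One way to apply these debug settings from the command line: `aws lambda update-function-configuration --environment` replaces the function's entire variable map, so existing variables must be merged in rather than sent alone. This Python sketch (the function name is hypothetical) produces the merged JSON document.

```python
import json

# Debug settings listed above for the New Relic Lambda Extension.
DEBUG_VARS = {
    "NEW_RELIC_LAMBDA_EXTENSION_ENABLED": "true",
    "NEW_RELIC_EXTENSION_LOG_LEVEL": "DEBUG",
    "NEW_RELIC_EXTENSION_SEND_FUNCTION_LOGS": "true",
}


def environment_with_debug(current) -> str:
    """Return an --environment JSON document that keeps the existing variables.

    `current` is the function's existing variable map (None if it has none).
    Debug values override any existing entries with the same name.
    """
    merged = {**(current or {}), **DEBUG_VARS}
    return json.dumps({"Variables": merged})
```

For example, fetch the current map with `aws lambda get-function-configuration --function-name my-function --query Environment.Variables`, write the sketch's output to `env.json`, then run `aws lambda update-function-configuration --function-name my-function --environment file://env.json`.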
Export CloudWatch Logs
It is useful to collect CloudWatch logs for both the function’s invocation and, if it is being used, the newrelic-log-ingestion function. The function can typically be configured to output details on what is being instrumented, how the invocation is working, and the status of the requests that send logs and payloads to New Relic.
- Invoke the functions in AWS Lambda.
- Navigate to CloudWatch → Logs → Insights in your AWS console.
- Select your function’s log group(s), and also the newrelic-log-ingestion log group if you are not using our Extension.
- Apply a time filter narrow enough to cover the latest invocation (depending on when the issue occurred), sort with sort @timestamp asc, and set the limit to the exact number of log lines in that invocation, then click “Run query”. The first log line of the latest invocation should appear at the top.
- Under Export results select Copy table to clipboard (markdown).
- Paste the text into a new .txt or .log file and attach the file to the ticket.
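Before attaching the logs, you can also decode a payload line yourself to check for the event and error spans mentioned earlier in the component checklist. This Python sketch assumes the legacy `NR_LAMBDA_MONITORING` layout: a JSON array of version, marker, metadata, and a base64-encoded, gzip-compressed JSON string. The Extension's debug output may use a different format, so treat this as illustrative; the function name is hypothetical.

```python
import base64
import gzip
import json


def decode_nr_payload(log_line: str) -> dict:
    """Decode a payload line, assuming [version, "NR_LAMBDA_MONITORING", metadata, data]."""
    # Find the array that holds the marker, skipping any timestamp/level prefix.
    marker_pos = log_line.index('"NR_LAMBDA_MONITORING"')
    start = log_line.rindex("[", 0, marker_pos)
    # raw_decode parses the array and ignores anything after it (e.g. a newline).
    (version, marker, metadata, data), _ = json.JSONDecoder().raw_decode(log_line[start:])
    # The last element is assumed to be base64 of gzip-compressed JSON.
    payload = json.loads(gzip.decompress(base64.b64decode(data)))
    return {"version": version, "marker": marker, "metadata": metadata, "payload": payload}
```

Call it on a single copied log line, e.g. `decode_nr_payload(line)["payload"]`, and inspect the decoded data for span and error entries.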
Most logs exported from CloudWatch will contain a lot of whitespace, which makes the file much larger than it needs to be. Removing the whitespace before attaching to the New Relic Support ticket will make it more manageable. The following has been tested in Bash:
# strip the trailing padding and the final character (the closing |) from each line
sed -e 's/ *.$//' cw.log > cw-clean.log
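If you prefer Python, this sketch goes further than the sed one-liner by trimming the padding inside every cell of the exported Markdown table, not just the last one. It assumes the export is a pipe-delimited Markdown table; a `|` inside a log message is treated as a cell boundary, which only changes the spacing around it. The function names are hypothetical.

```python
def compact_markdown_row(line: str) -> str:
    """Strip the padding inside each cell of a Markdown table row."""
    stripped = line.rstrip("\n")
    if not stripped.startswith("|"):
        return stripped  # not a table row; leave it unchanged
    cells = stripped.strip("|").split("|")
    return "| " + " | ".join(cell.strip() for cell in cells) + " |"


def compact_file(src: str, dst: str) -> None:
    """Write a compacted copy of an exported CloudWatch table."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(compact_markdown_row(line) + "\n")
```

For example, `compact_file("cw.log", "cw-clean.log")` produces the same file name as the sed command.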
Get CloudFormation details
# list stacks
aws cloudformation list-stacks | grep StackName | grep NewRelic
# list change-sets if any
aws cloudformation list-change-sets --stack-name NewRelicLogIngestion
aws cloudformation list-change-sets --stack-name NewRelicLambdaIntegrationRole-<account-id>
aws cloudformation list-change-sets --stack-name NewRelicLicenseKeySecret
# get template summaries
aws cloudformation get-template-summary --stack-name NewRelicLogIngestion
aws cloudformation get-template-summary --stack-name NewRelicLambdaIntegrationRole-<account-id>
aws cloudformation get-template-summary --stack-name NewRelicLicenseKeySecret
# delete the problematic stacks
aws cloudformation delete-stack --stack-name NewRelicLogIngestion
aws cloudformation delete-stack --stack-name NewRelicLambdaIntegrationRole-<account-id>
aws cloudformation delete-stack --stack-name NewRelicLicenseKeySecret
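Before deleting anything, it can help to see which New Relic stacks still exist and in what state. `list-stacks` also returns stacks that were already deleted, so this Python sketch (the function name is hypothetical) filters a parsed `aws cloudformation list-stacks` response down to non-deleted stacks whose names contain "NewRelic".

```python
def live_newrelic_stacks(response: dict) -> list:
    """Return (StackName, StackStatus) pairs for non-deleted stacks named like NewRelic*."""
    return [
        (stack["StackName"], stack["StackStatus"])
        for stack in response.get("StackSummaries", [])
        if "NewRelic" in stack["StackName"] and stack["StackStatus"] != "DELETE_COMPLETE"
    ]
```

With boto3, the same response shape can be built by paging through `client.get_paginator("list_stacks")` and combining each page's StackSummaries. Statuses such as ROLLBACK_COMPLETE or DELETE_FAILED are the ones worth a closer look.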