APM Best Practice Guide
Application Performance Monitoring (APM) is crucial in the digital age. Knowing exactly what your applications and systems are doing at any given time lets you spot anomalies early, before your customers do. APM gives you the visibility to truly understand your digital business.
We want to empower you and your team to have more insight into each of your applications and, more importantly, know exactly what you need to do to improve their performance.
You may want to start by reviewing these short videos to help you get the lay of the land. Then, in this post, we aim to help you set yourself up for success.
When you have reviewed these best practices, show off your newfound skills: take the APM Best Practices Quiz to earn your badge.
Create a naming convention:
When several different applications are on the same account, and each application spans multiple environments (for example, development, test, pre-production, production), it can be hard to find a specific application in your overview dashboard. That’s why we recommend that you establish a naming convention for all your apps and use labels to help with search and filter. Using a convention like
<environment>-<appname>-<language> can make it much easier to spot patterns in your data.
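If you want to enforce the convention rather than rely on memory, a small helper can build and check names for you. This is a minimal sketch, not a New Relic API: the function names, the `AppName` type, and the list of allowed environments are all assumptions for illustration (the environments are taken from the examples above), so adjust them to your own setup.

```python
from typing import NamedTuple

# Assumed environment names, copied from the examples in this post.
ENVIRONMENTS = ("development", "test", "pre-production", "production")


class AppName(NamedTuple):
    environment: str
    appname: str
    language: str


def build_app_name(environment: str, appname: str, language: str) -> str:
    """Build a name of the form <environment>-<appname>-<language>."""
    environment = environment.strip().lower()
    appname = appname.strip().lower()
    language = language.strip().lower()
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    if not appname or not language:
        raise ValueError("appname and language must not be empty")
    if "-" in language:
        raise ValueError("language must not contain '-'")
    return f"{environment}-{appname}-{language}"


def parse_app_name(name: str) -> AppName:
    """Split a conforming name back into its three parts."""
    # Try the longest environment first so a longer name is never
    # shadowed by a shorter one that happens to be its prefix.
    for env in sorted(ENVIRONMENTS, key=len, reverse=True):
        prefix = env + "-"
        if name.startswith(prefix):
            rest = name[len(prefix):]
            # The language is the last segment; the app name may contain hyphens.
            appname, sep, language = rest.rpartition("-")
            if sep and appname and language:
                return AppName(env, appname, language)
    raise ValueError(f"{name!r} does not follow <environment>-<appname>-<language>")
```

For example, `build_app_name("production", "checkout", "java")` gives `production-checkout-java`, and `parse_app_name` raises a `ValueError` for any name that does not fit the pattern, which makes it easy to flag stragglers in a list of existing app names.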
For more on application naming (including how to rename your app) and labeling, check out these docs:
Set up Account & User Access:
You’ll want to establish a hierarchy of accounts and sub-accounts for different applications. This is useful, for example, to limit users to viewing only apps relevant to their group. After the Owner or an Admin creates one or more sub-accounts, they can also manage those accounts, including providing account users with granular access controls to each part of the account.
- Master and Sub Account Hierarchy
- Relic Solution: New Relic Account Architecture and Moving Accounts
- Relic Solution: SAML SSO Tips
- Add-on Roles
Understanding what you are seeing
Set your Apdex value to understand your customer experience:
The APM Application overview page has a number of charts, which means you may have trouble knowing where to start. So where do you look first to start troubleshooting the performance of your application? The charts on that page are often correlated. Your response times might spike during a period of higher-than-normal throughput, or perhaps you are seeing slower transactions due to a high error rate. Most of these charts have an impact on the Apdex score, shown in the upper-right corner of the Overview page.
Apdex is an industry-standard measure of application performance: a score between 0 and 1 that gives you an at-a-glance look at the performance of your apps. You set a target response time (the Apdex T value) based on your application’s performance history and your audience, and Apdex classifies each transaction’s response time as Satisfied, Tolerating, or Frustrated. This makes it easy to understand if your app’s performance is impacting the customer experience.
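To make the score less of a black box, here is a minimal sketch of the standard Apdex calculation: a response time at or below the threshold T counts as Satisfied, one above T but at or below 4T counts as Tolerating (worth half), and anything slower is Frustrated (worth nothing). The function name and the sample numbers are made up for illustration, and this is the generic Apdex formula rather than a description of how New Relic computes the score internally.

```python
from typing import Iterable


def apdex_score(response_times: Iterable[float], t: float) -> float:
    """Return the Apdex score (0 to 1) for response times and threshold t.

    Both the response times and t must use the same unit (seconds here).
    """
    times = list(response_times)
    if t <= 0:
        raise ValueError("t must be positive")
    if not times:
        raise ValueError("at least one response time is required")

    satisfied = sum(1 for rt in times if rt <= t)
    tolerating = sum(1 for rt in times if t < rt <= 4 * t)
    # Frustrated samples (rt > 4 * t) add nothing to the numerator.
    return (satisfied + tolerating / 2) / len(times)


# Hypothetical sample: T = 0.5 s, five transactions.
# 0.2 and 0.4 are Satisfied, 0.9 and 1.5 are Tolerating, 3.0 is Frustrated.
score = apdex_score([0.2, 0.4, 0.9, 1.5, 3.0], t=0.5)
print(score)  # (2 + 2/2) / 5 = 0.6
```

Notice how much the choice of T matters: the same five transactions score 1.0 with T = 3.0 seconds, which is why it is worth picking a value that reflects what your users actually expect.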
Analyze and resolve errors:
You’re looking at the error rate chart and it is spiking. What next? Let’s click through to Error Analytics and see what we can find. In Error Analytics we can see a breakdown of the count of each error type. Clicking into one of those will bring you to the Error Trace. Each error trace captures a lot of helpful data about that error, such as the returned HTTP response code and a stack trace. Any custom attributes you have set up are also shown here. It is all there to help you locate errors in your stack and speed up your time to resolution.
In addition to Error Tracing, Transaction Traces help you find slow transactions, and with Distributed Tracing you can analyze requests across your stack to see where transactions are slowing down.
Making data actionable
Create a Dashboard with these NRQL Queries:
Insights brings all your data together in one place. NRQL (New Relic Query Language) gives you the power to ask the important questions of your data. A good place to start in Insights is to review what data is there by default, and then fill any gaps with custom attributes.
Percentage of erroring transactions, per application:
FROM Transaction SELECT percentage(count(*), WHERE error IS FALSE) AS 'Good', percentage(count(*), WHERE error IS TRUE) AS 'Errors', percentage(count(*), WHERE error IS NULL) AS 'Unknowns' SINCE 2 DAYS AGO FACET appName
Transaction response times over an hour, per application:
FROM Transaction SELECT average(duration) SINCE 1 HOUR AGO FACET appName TIMESERIES MAX
Set up Alerts:
Alerts are the most important factor in monitoring. Monitoring your entire stack serves no purpose if you are not going to be notified when your apps are misbehaving. New Relic Alerts allows you to send notifications to a number of common destinations (Slack, OpsGenie, HipChat, etc.), as well as to a generic webhook.
When getting started with APM Alerting, we recommend setting up conditions for Apdex falling below your acceptable threshold, response times rising beyond acceptable levels, and error rates rising. The exact thresholds for these conditions depend on what you consider to be abnormal for your application, so you may need to experiment to find the right level.
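One way to get a first guess at "abnormal" is to look at what your application has done historically. The sketch below is only a starting heuristic, not a New Relic feature or recommendation: it assumes you have exported a list of past response times (for example, per-minute averages), takes a high percentile of them, and adds some headroom. The function name, the 95th percentile, and the 20% headroom are all arbitrary assumptions for you to tune.

```python
import statistics
from typing import Sequence


def suggest_threshold(
    samples: Sequence[float], percentile: int = 95, headroom: float = 1.2
) -> float:
    """Suggest a starting alert threshold from historical samples.

    Returns the given percentile of the samples multiplied by `headroom`,
    so the alert fires only when values exceed what is normally seen.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    if headroom <= 0:
        raise ValueError("headroom must be positive")

    # quantiles(n=100) returns the 99 cut points between percentiles,
    # so the p-th percentile sits at index p - 1.
    cut_points = statistics.quantiles(samples, n=100, method="inclusive")
    return cut_points[percentile - 1] * headroom
```

You would then use the returned number as the initial value for a response time condition, watch how often it fires over a week or two, and adjust the percentile or headroom until the notifications match what you consider a real problem.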
- Guide to Effective Alerting
- Tutorial: Intro to APM Alerting
- Relic Solution: Alert Incident Preferences are the Key to Consistent Alert Notifications
Ready to Learn More?
Looking for more APM best practices and tips? Check out the Level Up categories.