This dashboard serves as a single view of the overall health of an application, targeted at all stakeholders, including leadership, engineering, and operations. It displays simple, non-technical metrics that serve as quick indicators of issues (response time, number of users), and it also summarizes the health of every component of the application (web jobs, Redis cache, etc.).
Required Products: APM, Synthetics, Browser, Infrastructure
Level of Effort: Medium
This dashboard uses custom Insights events and an Azure Infrastructure integration.
Use the gear button to edit your dashboard and configure the following settings (below is an example):
- Dashboard Filter: Enabled
- Enabled Event Types:
- Enabled Attributes:
Average Response Time per Service
FROM Transaction SELECT average(duration) where appName LIKE 'app_name' facet appName since 1 day ago LIMIT 100
Shows the breakdown of response time by microservice. Clicking on a service facets the dashboard.
Subscription Message Buckets
FROM AzureServiceBusSubscriptionSample SELECT uniqueCount(name) facet cases(where messages < 1000 as 'Message Count < 1000', where messages >= 1000 and messages <= 9000 as 'Message Count 1000-9000', where messages > 9000 as 'Message Count > 9000') LIMIT 1000 where resourceGroupName = 'resource_group'
Displays service bus subscriptions in buckets based on the number of messages within the subscription. Clicking on one of the buckets filters the adjacent chart to display only those subscriptions.
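As a rough illustration of the bucketing, the Python sketch below mirrors the idea of the cases() facet outside of NRQL. It assumes the buckets are contiguous, split at 1,000 and 9,000 messages; the function names and sample data are hypothetical.

```python
from collections import Counter


def message_bucket(messages: int) -> str:
    """Return the bucket label for one subscription's message count."""
    if messages < 1000:
        return "Message Count < 1000"
    if messages <= 9000:
        return "Message Count 1000-9000"
    return "Message Count > 9000"


def count_buckets(subscriptions: dict) -> Counter:
    """Count subscriptions per bucket, keyed by subscription name.

    Roughly what uniqueCount(name) faceted by the cases above reports.
    """
    return Counter(message_bucket(count) for count in subscriptions.values())


# Hypothetical sample data: subscription name -> message count.
sample = {"orders": 12, "billing": 1000, "audit": 9000, "emails": 25000}
buckets = count_buckets(sample)
```

Checking the boundary values (999, 1000, 9000, 9001) this way is a quick check that no subscription falls between buckets.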
Service Bus Subscriptions
FROM AzureServiceBusSubscriptionSample SELECT max(messages) AS 'Total Messages', (sum(deadLetterMessages)/sum(messages))*100 as 'Dead Letter %', (sum(activeMessages)/sum(messages))*100 as 'Active %' facet name LIMIT 1000 where resourceGroupName = 'resource_group'
Displays all service bus subscriptions, with the subscriptions holding the most messages at the top. Also shows the active and dead-letter percentages to indicate how healthy each subscription is.
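The percentage columns are plain ratios of message counts. A minimal Python sketch of the same arithmetic follows, assuming the three totals have already been summed per subscription; the function name and the choice to report 0 for an empty subscription are assumptions, not part of the dashboard.

```python
def subscription_health(messages: int, active: int, dead_letter: int) -> dict:
    """Return active and dead-letter messages as percentages of all messages."""
    if messages == 0:
        # Assumption: treat an empty subscription as 0% rather than dividing by zero.
        return {"active_pct": 0.0, "dead_letter_pct": 0.0}
    return {
        "active_pct": active / messages * 100,
        "dead_letter_pct": dead_letter / messages * 100,
    }


# Hypothetical totals: 200 messages, 150 active, 50 dead-lettered.
health = subscription_health(messages=200, active=150, dead_letter=50)
```

A high dead-letter percentage suggests consumers are failing to process messages, which is why it is shown next to the active percentage.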
Web Job Status
from WebJobSample select webjobEmoji AS 'Status', webJobName where webJobName like '%job%'
Shows the status of all web jobs…with emojis. Each of the three states (“Running,” “Stopped,” and “Pending restart”) has its own emoji.
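Since WebJobSample is a custom event, something has to build it before the query can read it. The Python sketch below shows one way the payload might be assembled. The attribute names webjobEmoji and webJobName come from the query above; the emoji choices, the fallback emoji, and the function name are hypothetical, and sending the event to New Relic is left out.

```python
# Hypothetical emoji choices; the dashboard's actual emojis may differ.
STATUS_EMOJI = {
    "Running": "✅",
    "Stopped": "🛑",
    "Pending restart": "🔄",
}


def build_webjob_event(web_job_name: str, status: str) -> dict:
    """Build a custom event dict for one web job.

    eventType is the field New Relic uses to name a custom event type.
    Unknown statuses fall back to a question mark emoji (an assumption).
    """
    return {
        "eventType": "WebJobSample",
        "webJobName": web_job_name,
        "webjobEmoji": STATUS_EMOJI.get(status, "❓"),
    }


# Hypothetical job name; it contains "job" so the LIKE '%job%' filter matches it.
event = build_webjob_event("nightly-import-job", "Running")
```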
Cosmos DB Doomsday Clock
from CosmosDBSample select max(CollectionSizeGB) since 60 minutes ago where Account = 'account' and Collection = 'collection'
This custom event shows the size of our largest Cosmos DB collection against a maximum of 10 GB. Unpartitioned Cosmos DB collections cannot hold more than 10 GB of data, hence the doomsday. My personal favorite chart and least favorite type of SQL.
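For reference, here is a small Python sketch of the arithmetic behind the clock and of a matching custom event. The attribute names CollectionSizeGB, Account, and Collection come from the query above; the function names, the cap at 100%, and the sample values are assumptions for illustration only.

```python
# Size limit for an unpartitioned collection, as stated above.
COSMOS_UNPARTITIONED_LIMIT_GB = 10.0


def percent_of_limit(size_gb: float, limit_gb: float = COSMOS_UNPARTITIONED_LIMIT_GB) -> float:
    """Return the collection size as a percentage of the limit, capped at 100."""
    return min(size_gb / limit_gb * 100.0, 100.0)


def build_cosmos_event(account: str, collection: str, size_gb: float) -> dict:
    """Build a custom event dict carrying the collection size in GB."""
    return {
        "eventType": "CosmosDBSample",
        "Account": account,
        "Collection": collection,
        "CollectionSizeGB": size_gb,
    }


# Hypothetical sample: a 7.5 GB collection is three quarters of the way to doomsday.
cosmos_event = build_cosmos_event("account", "collection", 7.5)
used_pct = percent_of_limit(cosmos_event["CollectionSizeGB"])
```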