Automatic Replacement of Infrastructure Metrics in Dashboards

This post explains the replacements we are transparently applying to some dashboards. They ensure that our traditional sample-based metrics can continue to coexist with our newer, more powerful dimensional metrics as the two subsystems become more tightly integrated.

Dimensional metrics restrict the operations that can be specified in an NRQL query: you can’t combine two metrics with an arithmetic operator inside an aggregation function. The following example queries work today, but might stop working in the future:

SELECT max(memoryUsedBytes / memoryTotalBytes * 100) FROM SystemSample
SELECT average(podsDesired - podsReady) FROM K8sReplicasetSample
SELECT latest(cpuUsedCores / cpuRequestedCores) FROM K8sContainerSample
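
To audit your own queries for this pattern, a heuristic check like the following Python sketch may help. It is an illustration only, not a New Relic tool. The function name `operations_between_metrics` is hypothetical, the list of aggregation functions is partial, and the regular expressions ignore nested parentheses and string literals. It flags aggregation calls whose argument joins two attribute names with +, -, * or /. It leaves an operation between an attribute and a numeric constant (such as / 100) alone, because the replacement queries in this post keep that form.

```python
import re

# Partial, illustrative list of NRQL aggregation functions; extend it for your own queries.
AGGREGATIONS = ("max", "min", "average", "sum", "latest", "percentile")

# An aggregation call whose argument contains no nested parentheses, e.g. max(a / b * 100).
CALL_RE = re.compile(r"\b(" + "|".join(AGGREGATIONS) + r")\s*\(([^()]*)\)", re.IGNORECASE)

# Two attribute names (identifiers, not numeric literals) joined by +, -, * or /.
BETWEEN_ATTRS_RE = re.compile(r"[A-Za-z_]\w*\s*[-+*/]\s*[A-Za-z_]\w*")


def operations_between_metrics(nrql: str) -> list[str]:
    """Return the aggregation calls in `nrql` that operate between two attributes."""
    return [
        call.group(0)
        for call in CALL_RE.finditer(nrql)
        if BETWEEN_ATTRS_RE.search(call.group(2))
    ]


print(operations_between_metrics(
    "SELECT average(podsDesired - podsReady) FROM K8sReplicasetSample"
))  # ['average(podsDesired - podsReady)']
print(operations_between_metrics(
    "SELECT latest(cpuCoresUtilization / 100) FROM K8sContainerSample"
))  # []
```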

To avoid breaking customers’ dashboards, we are transparently replacing these calculations with equivalent single metrics for some infrastructure event types. For example, if the queries above appeared in a dashboard, they would be automatically rewritten as follows:

SELECT max(memoryUsedPercent) FROM SystemSample
SELECT average(podsMissing) FROM K8sReplicasetSample
SELECT latest(requestedCpuCoresUtilization / 100) FROM K8sContainerSample
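
These substitutions can be read as exact rather than approximate, under one assumption: each replacement attribute is computed, sample by sample, from the same two values the legacy operation uses. In the derivation below, $u_i$ and $t_i$ are memoryUsedBytes and memoryTotalBytes in sample $i$, $p_i$ is that sample's memoryUsedPercent, and $f$ is any aggregation function, such as max, average, latest or percentile:

```latex
p_i = 100 \cdot \frac{u_i}{t_i} \ \text{for every sample } i
\quad\Longrightarrow\quad
f\!\left(\left\{100 \cdot \frac{u_i}{t_i}\right\}_i\right) = f\!\left(\{p_i\}_i\right),
\qquad
f\!\left(\left\{\frac{u_i}{t_i}\right\}_i\right) = f\!\left(\left\{\frac{p_i}{100}\right\}_i\right)
```

Because the identity holds for each individual sample, the aggregation receives exactly the same list of values on both sides, whichever function it applies.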

Although the change is automatic, customers who define their dashboards in Terraform files might undo it as soon as they re-apply those files, unless they also update the queries in the files themselves.

What metrics are being replaced?

The following table lists the operations that are replaced with an equivalent single metric when they appear inside an aggregation function (max, average, latest, percentile…). You can also use it as a guide to find and replace these operations in your local NRQL queries.

Event type            Legacy operation                                  Replacement metric
SystemSample          memoryFreeBytes/memoryTotalBytes*100              memoryFreePercent
SystemSample          memoryFreeBytes/memoryTotalBytes                  memoryFreePercent/100
SystemSample          memoryUsedBytes/memoryTotalBytes*100              memoryUsedPercent
SystemSample          memoryUsedBytes/memoryTotalBytes                  memoryUsedPercent/100
StorageSample         readBytesPerSecond+writeBytesPerSecond            readWriteBytesPerSecond
K8sReplicasetSample   podsDesired-podsReady                             podsMissing
K8sStatefulsetSample  podsDesired-podsReady                             podsMissing
K8sDeploymentSample   podsDesired-podsReady                             podsMissing
K8sDaemonsetSample    podsDesired-podsReady                             podsMissing
K8sContainerSample    cpuUsedCores/cpuLimitCores*100                    cpuCoresUtilization
K8sContainerSample    cpuUsedCores/cpuLimitCores                        cpuCoresUtilization/100
K8sContainerSample    cpuUsedCores/cpuRequestedCores*100                requestedCpuCoresUtilization
K8sContainerSample    cpuUsedCores/cpuRequestedCores                    requestedCpuCoresUtilization/100
K8sContainerSample    memoryUsedBytes/memoryLimitBytes*100              memoryUtilization
K8sContainerSample    memoryUsedBytes/memoryLimitBytes                  memoryUtilization/100
K8sContainerSample    memoryUsedBytes/memoryRequestedBytes*100          requestedMemoryUtilization
K8sContainerSample    memoryUsedBytes/memoryRequestedBytes              requestedMemoryUtilization/100
K8sNodeSample         cpuUsedCores/allocatableCpuCores*100              allocatableCpuCoresUtilization
K8sNodeSample         cpuUsedCores/allocatableCpuCores                  allocatableCpuCoresUtilization/100
K8sNodeSample         memoryWorkingSetBytes/allocatableMemoryBytes*100  allocatableMemoryUtilization
K8sNodeSample         memoryWorkingSetBytes/allocatableMemoryBytes      allocatableMemoryUtilization/100
K8sNodeSample         fsUsedBytes/fsCapacityBytes*100                   fsCapacityUtilization
K8sNodeSample         fsUsedBytes/fsCapacityBytes                       fsCapacityUtilization/100
K8sEtcdSample         processOpenFds/processMaxFds*100                  processFdsUtilization
K8sEtcdSample         processOpenFds/processMaxFds                      processFdsUtilization/100
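
For queries kept outside the product, such as those in Terraform files, the table can also drive a scripted rewrite. The following Python sketch is hypothetical. `rewrite` and `REPLACEMENTS` are illustrative names, and only a few rows of the table are included. It assumes each query has a single FROM event type. An operation is replaced only when it forms the entire first argument of an aggregation call, and nested parentheses are not handled.

```python
import re

AGGREGATIONS = ("max", "min", "average", "sum", "latest", "percentile")

# Aggregation call whose argument has no nested parentheses, e.g. max(a / b * 100).
CALL_RE = re.compile(r"\b(" + "|".join(AGGREGATIONS) + r")\s*\(([^()]*)\)", re.IGNORECASE)
FROM_RE = re.compile(r"\bFROM\s+(\w+)", re.IGNORECASE)

# A few rows of the table: event type -> {legacy operation without spaces: replacement}.
REPLACEMENTS = {
    "SystemSample": {
        "memoryUsedBytes/memoryTotalBytes*100": "memoryUsedPercent",
        "memoryUsedBytes/memoryTotalBytes": "memoryUsedPercent / 100",
    },
    "K8sReplicasetSample": {"podsDesired-podsReady": "podsMissing"},
    "K8sContainerSample": {
        "cpuUsedCores/cpuRequestedCores*100": "requestedCpuCoresUtilization",
        "cpuUsedCores/cpuRequestedCores": "requestedCpuCoresUtilization / 100",
    },
}


def rewrite(nrql: str) -> str:
    """Replace legacy operations that form the first argument of an aggregation call."""
    source = FROM_RE.search(nrql)
    table = REPLACEMENTS.get(source.group(1), {}) if source else {}

    def substitute(call: re.Match) -> str:
        function, argument = call.group(1), call.group(2)
        # Keep extra arguments, such as the 95 in percentile(x, 95), untouched.
        expression, comma, rest = argument.partition(",")
        # Compare without whitespace so "a / b * 100" matches "a/b*100".
        replacement = table.get(re.sub(r"\s+", "", expression))
        if replacement is None:
            return call.group(0)
        return f"{function}({replacement}{comma}{rest})"

    return CALL_RE.sub(substitute, nrql)


print(rewrite("SELECT latest(cpuUsedCores / cpuRequestedCores) FROM K8sContainerSample"))
# SELECT latest(requestedCpuCoresUtilization / 100) FROM K8sContainerSample
```

Matching the whole argument, rather than any substring, avoids rewriting the ratio inside a larger expression where its meaning could change. Anything the sketch does not recognize is left as it was.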

What if my queries contain metric operations that the table above doesn’t cover?

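Where possible, move the operation outside the aggregation. Instead of operating between two metrics inside one aggregation function, operate between the results of separate aggregation functions. This gives the same result for sum and average, as long as every sample reports both metrics. It does not for functions such as max, min or percentile, whose results for each metric may come from different samples.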

For example, replace queries like:

FROM SystemSample SELECT average(cpuSystemPercent + cpuUserPercent)

with an equivalent like:

FROM SystemSample SELECT average(cpuSystemPercent) + average(cpuUserPercent)
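
A quick numeric check with made-up values shows why this rewrite works for average but should not be applied blindly to other functions. The sample values below are hypothetical, and every sample is assumed to report both attributes.

```python
from statistics import mean

# Hypothetical SystemSample values; every sample reports both attributes.
samples = [
    {"cpuSystemPercent": 5.0, "cpuUserPercent": 30.0},
    {"cpuSystemPercent": 15.0, "cpuUserPercent": 10.0},
    {"cpuSystemPercent": 10.0, "cpuUserPercent": 20.0},
]

system = [s["cpuSystemPercent"] for s in samples]
user = [s["cpuUserPercent"] for s in samples]
totals = [a + b for a, b in zip(system, user)]  # per-sample cpuSystemPercent + cpuUserPercent

# average(a + b) versus average(a) + average(b): equal, because the mean is linear.
average_inside = mean(totals)
average_outside = mean(system) + mean(user)

# max(a + b) versus max(a) + max(b): the peaks fall in different samples, so they differ.
max_inside = max(totals)
max_outside = max(system) + max(user)

print(average_inside, average_outside)  # 30.0 30.0
print(max_inside, max_outside)  # 35.0 45.0
```

Splitting max(cpuSystemPercent + cpuUserPercent) into two max calls would overstate the peak (45.0 instead of 35.0) in this example. For such functions, look for a precomputed single metric instead.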

What about my NRQL alerts?

At the moment, alerts are not affected by this change.