In Technical Support, we see a lot of questions about the CPU Percent metric, which appears on both the Hosts and Processes pages in New Relic Infrastructure. I’m going to go into some detail about what it means on each page and how to use this knowledge to your best advantage!
What does CPU Percent mean on the Hosts page?
On the Hosts page, CPU Percent is actually a derived metric that is part of the SystemSample event. That is, we don’t collect CPU Percent directly, but derive it from several other metrics. Specifically, the cpuPercent attribute is an aggregation of cpuUserPercent, cpuSystemPercent, cpuIOWaitPercent, and cpuStealPercent.
Because of the way we aggregate this metric, what is shown on the Infrastructure dashboard will differ from what you see in, for example, CloudWatch. CloudWatch, in particular, does not add cpuStealPercent into its “CPU Utilization” metric. There is nothing wrong with the way Amazon calculates CPU Utilization, but it’s important to know that it’s done differently, and how it differs.
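To make the aggregation concrete, here is a minimal Python sketch of the arithmetic, assuming cpuPercent is the plain sum of the user, system, I/O wait, and steal attributes on SystemSample. The function names and sample values are hypothetical, and the "without steal" figure only illustrates the difference described above; it is not CloudWatch's actual formula.

```python
# Hypothetical sample values, each a percentage of total host CPU.
sample = {
    "cpuUserPercent": 30.0,
    "cpuSystemPercent": 10.0,
    "cpuIOWaitPercent": 5.0,
    "cpuStealPercent": 5.0,
}


def host_cpu_percent(s):
    """Sum all four components (assumed to match the Hosts page figure)."""
    return (
        s["cpuUserPercent"]
        + s["cpuSystemPercent"]
        + s["cpuIOWaitPercent"]
        + s["cpuStealPercent"]
    )


def cpu_percent_without_steal(s):
    """Same sum with steal time left out, for comparing against tools that omit it."""
    return host_cpu_percent(s) - s["cpuStealPercent"]


print(host_cpu_percent(sample))           # 50.0
print(cpu_percent_without_steal(sample))  # 45.0
```

With these made-up numbers, the two views disagree by exactly the steal component, which is the kind of gap you might notice when comparing dashboards.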
Regardless of how cpuPercent is calculated, the Hosts page shows metrics from a “whole system” perspective. That is, if a given server has 32 cores installed and is showing a cpuPercent of 50%, the equivalent of 16 of those 32 cores is being used in one way or another.
What does CPU Percent mean on the Processes page?
On the Processes page, CPU Percent is scoped to individual processes instead of hosts. Because of this, the CPU Percent metric does not take into account the resources of the entire system. Instead, it shows how much of a single CPU core each process is taking.
This means that, using the example I presented above of a 32-core server, a process showing a cpuPercent of 50% is actually only taking half of a single core, or 1.5625% (50 / 32) of the CPU cycles available on the whole server. This is why you will see cpuPercent go over 100% on the Processes page: that simply means that the process is demanding more than a single core’s worth of CPU cycles.
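The conversion between the two perspectives is a single division. The sketch below is only an illustration of that arithmetic; the function name is hypothetical, and the core count is something you would supply yourself.

```python
def process_share_of_host(process_cpu_percent, core_count):
    """Convert a per-core process percentage into a share of the whole host."""
    if core_count <= 0:
        raise ValueError("core_count must be positive")
    return process_cpu_percent / core_count


# Half of one core on a 32-core server.
print(process_share_of_host(50.0, 32))   # 1.5625

# A process using two and a half cores' worth of cycles on the same server.
print(process_share_of_host(250.0, 32))  # 7.8125
```

The second call shows why a value above 100% on the Processes page can still be a small fraction of the host's total capacity.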
What if I only want to alert on 2 or 3 aspects of CPU Percent?
This is where it gets fun. All data collected by the New Relic Infrastructure Agent is sent to Insights, where it can be queried using NRQL. Since you can set up NRQL alert conditions, you can build a query scoped to only certain components of the cpuPercent metric shown on the Hosts page.
The important detail to know is that, for host monitoring (rather than process monitoring), cpuPercent is part of SystemSample. We also need to know which parts of cpuPercent we’d like to target in our alert condition.
For the sake of this example, let’s say we’re not concerned with either cpuIOWaitPercent or cpuStealPercent. We only want to open an alert violation if the combined cpuUserPercent and cpuSystemPercent go past the threshold. We could use the following query as a basis for our NRQL alert condition:
SELECT average(cpuUserPercent + cpuSystemPercent) FROM SystemSample
You can change which metrics the NRQL query is targeting by adding or removing elements from within the parentheses.
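If you maintain several variations of this condition, it may help to generate the query text rather than edit it by hand. The following is a rough Python sketch under that assumption; build_alert_query is a hypothetical helper that only assembles a string, and it does not call any New Relic API.

```python
def build_alert_query(attributes, event_type="SystemSample"):
    """Build an NRQL string that averages the sum of the given attributes."""
    if not attributes:
        raise ValueError("at least one attribute is required")
    # Join the attribute names with " + " inside the parentheses.
    expression = " + ".join(attributes)
    return f"SELECT average({expression}) FROM {event_type}"


print(build_alert_query(["cpuUserPercent", "cpuSystemPercent"]))
# SELECT average(cpuUserPercent + cpuSystemPercent) FROM SystemSample
```

Adding or removing a name from the list changes what the query sums, which mirrors editing the contents of the parentheses directly.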
We can even generate a timeseries graph in Insights to look at only the aggregation of these two metrics. We’d do that using this query:
SELECT average(cpuUserPercent + cpuSystemPercent) FROM SystemSample SINCE 60 minutes ago TIMESERIES 1 minute
You can adjust this query’s time frame and timeseries bucketing to suit your needs.
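As a small illustration of adjusting the window, the Python sketch below appends the SINCE and TIMESERIES clauses to a base query. The helper name and its parameters are hypothetical, and it assumes both values are expressed in whole minutes.

```python
def with_time_window(base_query, since_minutes=60, bucket_minutes=1):
    """Append SINCE and TIMESERIES clauses, both expressed in whole minutes."""
    if since_minutes <= 0 or bucket_minutes <= 0:
        raise ValueError("time values must be positive")

    def minutes(n):
        # Use the singular form for exactly one minute.
        return f"{n} minute" if n == 1 else f"{n} minutes"

    return (
        f"{base_query} SINCE {minutes(since_minutes)} ago "
        f"TIMESERIES {minutes(bucket_minutes)}"
    )


base = "SELECT average(cpuUserPercent + cpuSystemPercent) FROM SystemSample"
print(with_time_window(base))
# SELECT average(cpuUserPercent + cpuSystemPercent) FROM SystemSample SINCE 60 minutes ago TIMESERIES 1 minute
```

Calling with_time_window(base, 180, 5) would instead produce a three-hour window with five-minute buckets.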
I hope all of this helps you better interpret your Infrastructure dashboard and find more ways to use Infrastructure to suit your personal use cases.