Relic Solution: CPU Percent in Infrastructure -- What It Means And How To Use It

levelup
infrastructure
rfb
cpuusage

#1

In Technical Support, we see a lot of questions about the CPU Percent metric, which appears on both the Hosts and Processes pages in New Relic Infrastructure. I’m going to go into some detail about what it means on each page and how to use this knowledge to your best advantage!

What does CPU Percent mean on the Hosts page?

On the Hosts page, CPU Percent is actually a derived metric that is part of the SystemSample event. That is, we don’t collect CPU Percent, but derive it from several other metrics. Specifically, the cpuPercent attribute is an aggregation of cpuUserPercent, cpuSystemPercent, cpuIoWaitPercent and cpuStealPercent.

Because of the way we aggregate this metric, what is shown on the Infrastructure dashboard will differ from what you see in, for example, CloudWatch. CloudWatch, in particular, does not add cpuStealPercent into its “CPU Utilization” metric. There is nothing wrong with the way Amazon calculates CPU Utilization, but it’s important to know that it’s done differently, and how it differs.
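To make the aggregation concrete, here is a minimal Python sketch. It assumes that “aggregation” means a plain sum of the four attributes, and the sample values are made up for illustration. The second total simply leaves out cpuStealPercent to show how large the gap can be; it is not meant to reproduce CloudWatch’s exact formula.

```python
# Hypothetical SystemSample values; the attribute names come from the post,
# the numbers are invented for illustration.
sample = {
    "cpuUserPercent": 40.0,
    "cpuSystemPercent": 10.0,
    "cpuIoWaitPercent": 5.0,
    "cpuStealPercent": 2.5,
}

COMPONENTS = (
    "cpuUserPercent",
    "cpuSystemPercent",
    "cpuIoWaitPercent",
    "cpuStealPercent",
)


def combined_percent(sample, components=COMPONENTS):
    """Sum the chosen CPU components (assumes the aggregation is a plain sum)."""
    return sum(sample[name] for name in components)


# All four components, as described for cpuPercent on the Hosts page.
host_cpu_percent = combined_percent(sample)

# The same total with steal time left out.
without_steal = combined_percent(
    sample, [name for name in COMPONENTS if name != "cpuStealPercent"]
)

print(host_cpu_percent, without_steal)
```

With these made-up numbers the two totals differ by exactly the steal percentage, which is the kind of gap to expect when comparing against a tool that leaves steal time out.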

Beyond how cpuPercent is calculated, the Hosts page shows metrics from a “whole system” perspective. That is, if a given server has 32 cores installed and is showing cpuPercent of 50%, the equivalent of 16 of those 32 cores’ worth of CPU time is being used in one way or another.

What does CPU Percent mean on the Processes page?

On the Processes page, CPU Percent is scoped to individual processes instead of hosts. Because of this, the CPU Percent metric does not take into account the resources of the entire system. Instead, it shows how much of a single CPU core each process is taking.

This means that, using the example I presented above of a 32-core server, a process showing cpuPercent of 50% is actually only taking half of a single core, or 1.5625% (50% divided by 32) of the CPU cycles available on the whole server. This is also why you will see cpuPercent go over 100% on the Processes page: that simply means that the process is demanding more than a single core’s worth of CPU cycles.
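The arithmetic behind the two perspectives can be sketched in a few lines of Python. The function names here are hypothetical helpers for illustration, not part of any New Relic tooling.

```python
def process_share_of_host(process_cpu_percent, core_count):
    """Convert a Processes-page percentage (relative to one core)
    into a share of the whole host's CPU capacity."""
    if core_count <= 0:
        raise ValueError("core_count must be positive")
    return process_cpu_percent / core_count


def cores_in_use(host_cpu_percent, core_count):
    """Convert a Hosts-page percentage (relative to the whole host)
    into an equivalent number of fully busy cores."""
    if core_count <= 0:
        raise ValueError("core_count must be positive")
    return host_cpu_percent / 100.0 * core_count


# The 32-core example from the post.
print(process_share_of_host(50, 32))   # a process at 50% of one core
print(cores_in_use(50, 32))            # a host at 50% overall
print(process_share_of_host(250, 32))  # a process using 2.5 cores' worth
```

A process reading of 250% is therefore not an error; on this hypothetical host it is still well under a tenth of the total capacity.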

What if I only want to alert on 2 or 3 aspects of CPU Percent?

This is where it gets fun. All data collected by the New Relic Infrastructure Agent is sent to Insights, where it can be queried using NRQL. Since you can set up NRQL alert conditions, you can build a query to scope to only certain aspects of the cpuPercent metric shown on the Hosts page.

The important details to know are that, for host monitoring (rather than process monitoring), cpuPercent is part of SystemSample. We also need to know which parts of cpuPercent we’d like to target in our alert condition.

For the sake of this example, let’s say we’re not concerned with either cpuIoWaitPercent or cpuStealPercent. We only want to open an alert violation if the combined cpuUserPercent and cpuSystemPercent go past the threshold. We could use the following query as a basis for our NRQL alert condition:

SELECT average(cpuUserPercent + cpuSystemPercent) FROM SystemSample

You can change which metrics the NRQL query is targeting by adding or removing elements from within the parentheses.

We can even generate a timeseries graph in Insights to look at only the aggregation of these two metrics. We’d do that using this query:

SELECT average(cpuUserPercent + cpuSystemPercent) FROM SystemSample SINCE 60 minutes ago TIMESERIES 1 minute

You can adjust this query’s time frame and timeseries bucketing to suit your needs.


I hope all of this helps you better interpret your Infrastructure dashboard and find more ways to use Infrastructure to suit your personal use cases.


#2

Love this article. I’m still searching for a way to compare CPU count and load average against the CPU % usage. The NRQL query seems to fail me, but the logic looks something like below:

IF CPU % Utilization > 90% AND (LoadAvg15Min >= CPUCount) THEN

It is so close I can taste it – this article just makes me want that NRQL query even more! :smiley:


#3

Hi @jbiggley

Unfortunately, it isn’t possible to compare the two attributes in a NRQL query. This is because coreCount is stored as a string, while loadAverageFifteenMinute is stored as a float. Storing coreCount as a numeric value is an excellent idea, and I encourage you to post your use case at https://discuss.newrelic.com/c/feature-ideas/feature-ideas-infrastructure.

That said, if you wanted to set a hard limit for loadAverageFifteenMinute, you could make this query work like so (using a hard limit of 4 as an example):

SELECT count(*) FROM SystemSample WHERE cpuPercent > 90 AND loadAverageFifteenMinute > 4

You could then use this query to set up an alert condition.
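If you are able to pull the raw samples out and evaluate them yourself, the comparison from the question above can be done outside NRQL. The following is only a sketch under assumptions: the sample dictionaries are hypothetical, fetching them from Insights is not shown, and coreCount is treated as a string that has to be converted before comparing.

```python
def should_alert(sample, cpu_threshold=90.0):
    """Return True when CPU is above the threshold AND the 15-minute
    load average is at least the number of cores."""
    # coreCount arrives as a string, so convert it before comparing.
    core_count = int(sample["coreCount"])
    return (
        sample["cpuPercent"] > cpu_threshold
        and sample["loadAverageFifteenMinute"] >= core_count
    )


# Hypothetical samples for illustration only.
busy_host = {"coreCount": "4", "cpuPercent": 95.2, "loadAverageFifteenMinute": 4.7}
calm_host = {"coreCount": "4", "cpuPercent": 95.2, "loadAverageFifteenMinute": 1.3}

print(should_alert(busy_host))
print(should_alert(calm_host))
```

This keeps the threshold relative to each host’s core count instead of relying on one hard-coded load limit.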

I hope this helps!


#4

For some reason I thought coreCount was being changed from a string to a float variable? Is there a bug/feature request for that floating around somewhere that hasn’t been implemented yet? Maybe I’m just having dreams about New Relic :stuck_out_tongue:


#5

This is indeed a feature request, but posts from our customers on the Feature Ideas section can definitely help to prioritize this request, as our Product Managers watch those sections and take them very seriously, especially when the feature request exists both internally and on the forums.


#6

Interesting read, did not know this.

thank you.


#7

Those SystemSample queries are good to get overall system CPU cycles. How can I (with the infrastructure agent) now query CPU cycles for a specific process only? We run load tests for performance metrics and utilize NR for stats. With the server agent I was able to gather stats for w3w process only.


#8

Hi @jasmith0429,

To do what you describe, you would need to query ProcessSample instead of SystemSample. One way to do this is through the UI: scope to the host of interest, filter to the process or processes you’re concerned with, and then make sure that one of the graphs shows CPU%.

The other way you can do this is through Insights, by using a query similar to the one below:

SELECT average(cpuPercent) FROM ProcessSample WHERE entityName = 'myHost' AND processDisplayName = 'myProcess' SINCE 3 hours ago TIMESERIES 1 minute

I hope this helps!