Max CPU Stats Decline Over Longer Period

insights
bug
systemsample
maxcpu

#1

Please paste the permalink to the page in question:

https://insights.newrelic.com/accounts/383185/dashboards/399007?edit=3019560ow:

Please include any NRQL you are using:

SELECT max(cpuPercent) FROM SystemSample where hostname in ('USW2YGL02','USW2YGL03','USW2YGL04','USW2YGL05','USW2YGLBETA01') since '2017-10-17 17:45:00' until '2017-10-17 18:15:00' facet hostname

SELECT max(cpuPercent) FROM SystemSample where hostname in ('USW2YGL02','USW2YGL03','USW2YGL04','USW2YGL05','USW2YGLBETA01') since '2017-10-17 17:15:00' until '2017-10-17 18:15:00' facet hostname

Please share your question/describe your issue below. Include any screenshots that may help us understand your question:

When I query for max CPU over a 30-minute period, I get 100% for at least one facet.

When I query for max CPU over a 60-minute period that encompasses the 30-minute period, the maximum CPU goes down to 66.56%. This is obviously incorrect: the maximum over a window cannot be lower than the maximum over a sub-window it contains.

This occurs regularly.

Our CPU peaks are short lived. We need to be able to see these even when viewing a large time period. How best can I do that?
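Since the 30-minute query above does show the peak, one possible workaround is to split a long period into consecutive short windows and run one query per window. The sketch below only builds the query strings; the helper name `windowed_max_queries` and the 30-minute default are illustrative assumptions, and whether 30 minutes is always short enough to preserve the peak is not confirmed anywhere in this thread.

```python
from datetime import datetime, timedelta


def windowed_max_queries(host, start, end, step_minutes=30):
    """Build one max(cpuPercent) query per sub-window of [start, end).

    Hypothetical helper: it only formats NRQL strings in the same
    since/until style used in this thread; it does not run them.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    step = timedelta(minutes=step_minutes)
    queries = []
    cursor = start
    while cursor < end:
        # The last window is clipped so it never runs past `end`.
        window_end = min(cursor + step, end)
        queries.append(
            "SELECT max(cpuPercent) FROM SystemSample "
            f"WHERE hostname = '{host}' "
            f"SINCE '{cursor.strftime(fmt)}' UNTIL '{window_end.strftime(fmt)}'"
        )
        cursor = window_end
    return queries


# The 60-minute period from the question, split into two 30-minute windows.
queries = windowed_max_queries(
    "USW2YGL02",
    datetime(2017, 10, 17, 17, 15),
    datetime(2017, 10, 17, 18, 15),
)
for q in queries:
    print(q)
```

The overall maximum for the long period would then be the largest of the per-window results.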


#2

Hi @BaronS - When faceting query results over extended periods of time, I always add LIMIT 1000 at the end to ensure I am returning all of the data rather than the default number of results.


#3

Thanks Stefan!

I think this simplified example makes the issue more clear:

SELECT max(cpuPercent) FROM SystemSample where hostname = 'USW2YGL02' since '2017-10-17 17:45:00' until '2017-10-17 18:15:00'

Returns: 99.84

SELECT max(cpuPercent) FROM SystemSample where hostname = 'USW2YGL02' since '2017-10-17 17:15:00' until '2017-10-17 18:15:00'

Returns: 66.56

This is clearly a lie.

Perhaps more prevarication than calumny but it still makes convincing decision makers much more difficult.

“Yes, more New Relic please!”

“Yes we are maxing CPU! New Relic just doesn’t show it! You have to zoom in like this!”

Puts me in a difficult spot.


#4

I can confirm that I see the same behavior.

That said, what would be the purpose of this sort of metric for you? If CPU ever hits 100 for any single data point, is that a problem? You are trying to get one data point in an hour or over some other period. How fine-grained is this data, and aren’t you really interested in whether the CPU load is sustained for some period of time within that window?

Try this:
SELECT count(*) FROM SystemSample where cpuPercent = 100 since '2017-10-17 17:45:00' until '2017-10-17 18:15:00'

Or this:
SELECT count(*) FROM SystemSample where cpuPercent > 90 since '2017-10-17 17:45:00' until '2017-10-17 18:15:00'

I don’t disagree with wanting to be able to get the max value of an attribute within a time period without NR trying to aggregate data points as the window gets larger.
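To illustrate why aggregation would produce this symptom, here is a minimal sketch. It assumes, without confirmation from New Relic, that samples are averaged into larger buckets before max() is applied when the window grows. The data and the bucket size of 4 are made up for the example.

```python
def bucket_averages(samples, bucket_size):
    """Average consecutive groups of `bucket_size` samples."""
    averages = []
    for i in range(0, len(samples), bucket_size):
        bucket = samples[i:i + bucket_size]
        averages.append(sum(bucket) / len(bucket))
    return averages


# Hypothetical data: a steady 20% load with one short spike to 100%.
samples = [20.0] * 12
samples[7] = 100.0

raw_max = max(samples)                           # max over raw samples
averaged_max = max(bucket_averages(samples, 4))  # max over averaged buckets

print(raw_max, averaged_max)
```

In this made-up data the spike survives in the raw maximum but is diluted by its three neighbours once the samples are averaged, which is the same shape as 99.84 dropping to 66.56 when the window is widened.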


#5

@BaronS : I am also interested to hear the use case for this kind of metric, as @6MM mentioned above. :blush:

Let us know if you need anything!


#6

Hi, sorry, thought it was more of a rhetorical question. :slight_smile:

We get momentary spikes where we hit 100% for several seconds, say up to 20, and these indicate that we have a problem.

In our case this is a coding issue where our pages get in a bad state.

The user gets unpredictable behavior at these times so this is not obvious to the end user. One of the only indications in instrumentation is the CPU spike.

I would like to look at a day and see how many times it occurred. I also want to see what time it occurred and then link it to unpredictable behavior from user experience, NR stats, or IIS logs in Loggly.
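As a rough sketch of that kind of analysis, the snippet below groups above-threshold samples into spike episodes and reports when each one started and ended. It assumes the raw samples have already been exported as (timestamp, cpuPercent) pairs; how to export them is not covered here, and the function name, the 90% threshold, the 5-second sampling interval, and the data are all illustrative assumptions.

```python
from datetime import datetime, timedelta


def find_spikes(samples, threshold=90.0, max_gap=timedelta(seconds=5)):
    """Group above-threshold samples into (start, end) episodes.

    Hot samples that are at most `max_gap` apart are merged into one
    episode; anything further apart starts a new episode.
    """
    episodes = []
    for ts, cpu in sorted(samples):
        if cpu <= threshold:
            continue
        if episodes and ts - episodes[-1][1] <= max_gap:
            # Extend the current episode to this sample.
            episodes[-1] = (episodes[-1][0], ts)
        else:
            episodes.append((ts, ts))
    return episodes


# Hypothetical data: one sample every 5 seconds for 2 minutes.
base = datetime(2017, 10, 17, 17, 45, 0)
hot = {3, 4, 5, 15}  # indexes of samples that hit 100%
samples = [
    (base + timedelta(seconds=5 * i), 100.0 if i in hot else 35.0)
    for i in range(24)
]

episodes = find_spikes(samples)
for start, end in episodes:
    print(start.time(), "to", end.time())
```

The number of episodes gives the count of occurrences for the day, and the start times can be matched against user reports or IIS log entries.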

This allows me to build a business case to my bosses so we can justify investing resources to fix the issue.

Does that make my motivation clear?


#7

Sounds good, @BaronS! Thank you for the follow-up! It is always helpful and interesting to hear about others’ use cases. Your reply definitely makes things clear. Thanks!