
Detect search bots crawling stats


#1

Hello,

There was a huge increase in CPU load on my cloud server, and my host told me that search engine bots caused it.
Is there a way to see crawling stats for search bots, so I can find out which bot caused the high load and block or rate-limit it?

Regards


#2

@pcnador - With Insights, you can create a query that looks at the userAgentName attribute of PageView events over the last couple of days, and use that to identify the user agent of the bot in question.


#3

I tried this query: SELECT * FROM PageView WHERE appName = 'Site - Desktop' WHERE userAgentName LIKE '%bot%' SINCE 1 day ago. No luck, though; it returned no results.


#4

@samar.panda It may be that the user agent name does not contain "bot". If you are seeing an increase in throughput that you think might be down to bots, you may want to try a query like the one below:

SELECT count(*) FROM PageView WHERE appName = 'Site - Desktop' SINCE 1 day ago facet userAgentName

This shows you which user agents most requests are coming from, and you may be able to tell from that whether they are bots. If not, it may still point you in the right direction and give you the info you need to dig further into your site's access logs.


#5

userAgentName only shows 'Chrome', 'Firefox', etc. It's missing the detailed user agent string, so it's difficult to tell bot traffic apart from normal user traffic.


#6

It's possible that the bots are not using a unique user agent that you can pick out through that Insights attribute.

Would it be possible to check the site's server access logs to see whether the bot traffic has a unique attribute?

If you can identify an attribute that the bot traffic uses, you may be able to send it to New Relic as a custom attribute, which would let you filter down to that traffic in your Insights queries.
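
As a rough illustration of the access-log check, here is a minimal TypeScript sketch that tallies requests per User-Agent. It assumes the logs are in the common "combined" format, where the User-Agent is the last double-quoted field on each line; `userAgentOf` and `countByUserAgent` are hypothetical helper names, and the sample lines in the usage note are made up.

```typescript
// Extract the User-Agent from one access-log line.
// Assumption: "combined" log format, i.e. three quoted fields
// (request, referer, user agent) with the User-Agent last.
function userAgentOf(line: string): string | null {
  const quoted = line.match(/"([^"]*)"/g);
  if (!quoted || quoted.length < 3) {
    return null; // not a combined-format line
  }
  // Strip the surrounding double quotes from the last quoted field.
  return quoted[quoted.length - 1].slice(1, -1);
}

// Count requests per User-Agent, busiest first.
function countByUserAgent(lines: string[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const line of lines) {
    const ua = userAgentOf(line);
    if (ua === null) {
      continue; // skip lines that do not match the assumed format
    }
    counts.set(ua, (counts.get(ua) || 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}
```

Calling `countByUserAgent(lines)` on the lines of a log file returns `[userAgent, requestCount]` pairs sorted by count, so an unusually busy crawler should show up near the top. User-Agent strings containing escaped quotes are not handled by this sketch.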


#7

Hi @samar.panda - From the post below, it looks like you will have to capture the raw header and either add the raw value as a custom attribute, or add a custom attribute that flags whether the request comes from a bot (e.g. a true/false value).


#8

Hi @samar.panda - this sounds like it's been a challenge to work through! I was wondering whether you have had any success yet with the suggestions from @RyanVeitch or @stefan_garnham.


#9

I can try adding a custom attribute to detect bots. I need to set it before the page's load event fires so that it is included in New Relic's PageView events.
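
A minimal sketch of that idea in TypeScript, assuming the New Relic Browser agent is already loaded on the page and exposes `newrelic.setCustomAttribute`. The attribute names `rawUserAgent` and `isBot`, the `BOT_PATTERN` token list, and the helper names are all hypothetical; swap in whatever tokens actually show up in your access logs.

```typescript
// Hypothetical token list; extend it with patterns seen in your own logs.
const BOT_PATTERN = /bot|crawler|spider|slurp/i;

// Crude heuristic: does the User-Agent string contain a known bot token?
function looksLikeBot(userAgent: string): boolean {
  return BOT_PATTERN.test(userAgent);
}

// Attach the raw User-Agent and a bot flag to the current page view.
// Intended to run as early as possible, before the page's load event.
function tagPageView(): void {
  const g = globalThis as any;
  const ua: string = (g.navigator && g.navigator.userAgent) || "";
  // Guard in case the Browser agent has not been loaded on this page.
  if (g.newrelic && typeof g.newrelic.setCustomAttribute === "function") {
    g.newrelic.setCustomAttribute("rawUserAgent", ua);
    // Sent as the string "true"/"false" to stay within basic value types.
    g.newrelic.setCustomAttribute("isBot", String(looksLikeBot(ua)));
  }
}

tagPageView();
```

If the attributes arrive on PageView events, a query such as SELECT count(*) FROM PageView FACET isBot SINCE 1 day ago should then split traffic by the flag. Note that this only catches bots that execute JavaScript; crawlers that do not run scripts would still need to be found in the server access logs.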


#10

We haven't implemented custom attribute collection for the browser agent yet; I'll plan to do so. Is there any other way to start collecting these values automatically, such as enabling something in the New Relic configuration, dashboard, or console?

That could be done instantly, since it would not require a release.


#11

@samar.panda - I'm not aware of any way to capture the raw headers you would need without custom instrumentation. Unfortunately, this is not automatic.


#12

OK, then we will go ahead with the custom instrumentation.


#13

Please do follow up here to let us know if that helps solve your problem :smiley: