
Feature idea: Auto-Detecting Robots vs. Crawlers

feature-idea

#1

Auto-detect robots vs. crawlers by scanning and flagging the IPs (or sets of IPs) sending the requests that APM records.

We observe spikes quite often due to scraping/spamming, and I assume the same must be happening to others as well. A DNS/reputation API that shows which IPs are attacking globally and affecting many of us would help us nail them down, not just by blocking IPs but by introducing filters that respond with 429 "Too Many Requests".
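
As a rough sketch of the detection half, assuming a custom remoteIpAddress attribute is recorded on each transaction (see the query further down), a NRQL query like this could back an alert condition that flags any IP crossing a request threshold, making it a candidate for the 429 response:

    SELECT count(*) FROM Transaction FACET remoteIpAddress SINCE 5 minutes ago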

Actually, New Relic could help here by collecting and surfacing a classification for each IP, for example (see the query sketch after this list):

  1. Crawler
  2. Suspicious spammer
  3. Spammer
  4. N/A - normal user / group of users
  5. XYZ - a company's load-test tool
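
If such a classification existed as a queryable attribute, anyone could break their traffic down by it. A minimal sketch, assuming a hypothetical ipClassification attribute supplied by New Relic:

    SELECT count(*) FROM Transaction FACET ipClassification SINCE 1 day ago TIMESERIES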

Different companies logging data in New Relic could also help by tagging suspicious IPs.
I am assuming this would reduce many unwanted attacks.
We tried adding a custom attribute that marks an IP as a suspicious spammer:

  • NRQL query I have tried so far (note the event type is Transaction; remoteIpAddress is our custom attribute):
    SELECT count(*) FROM Transaction FACET appName, httpHost, remoteIpAddress SINCE 1 day ago TIMESERIES

That custom attribute works for us, but it may not help other companies, which might also be getting spam from the same IP(s).

So it would work like a Google AdSense service alert: the owner would not be allowed to proceed further until their domain/IP is whitelisted.


#2

This is an interesting conversation to have!

Right now New Relic doesn’t collect any data (by default) that could be used to identify the source.
By that I mean, we don’t collect any identifiable info such as IP Addresses.

Do you think there are ways, without collecting PII data, that New Relic could highlight requests that appear to be crawlers/bots?
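
One non-PII angle might be the user agent string. As a minimal sketch, assuming your agent is configured to capture the request.headers.userAgent attribute, this would surface traffic that self-identifies as a bot:

    SELECT count(*) FROM Transaction WHERE request.headers.userAgent LIKE '%bot%' FACET request.headers.userAgent SINCE 1 day ago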

Of course, your solution of custom instrumentation is good! You are able to collect the IP address data, which I’m sure helps you here.


#3

I really appreciate your prompt reply.
Please see my responses to your comments below.

Right now New Relic doesn’t collect any data (by default) that could be used to identify the source.
By that I mean, we don’t collect any identifiable info such as IP Addresses.

Oh! In that case New Relic would need to rely on its clients, which won’t support what I was proposing.

Do you think there are ways, without collecting PII data, that New Relic could highlight requests that appear to be crawlers/bots?

I doubt it! Without collecting PII it would hardly be possible. But with GDPR in place, it becomes more defensible legally to treat IPs as PII and track them, since many companies use this tool as essential infrastructure over which the end user has no control. So I have a positive feeling that this approach would work under GDPR.

A spammer won’t accept the GDPR terms anyway, so that can be treated as the default-accepted case. It still depends on the country and region, where the reverse may apply, but even so it’s half the battle won.

Of course, your solution of custom instrumentation is good! You are able to collect the IP address data, which I’m sure helps you here.

Yes, we rely on this custom instrumentation to some extent.


#4

The information in the post below may give you a start: you could use your firewall logs to get the IP addresses if the header information is also recorded.


#5


Depending on the metrics and parameters you capture, it is possible to detect a lot more.
In our case we created a unique identifier per session on the APM side and check for dedicated request parameters (custom attributes) on the Transaction event.
That way we are able to identify click fraud on Google CPC (cost-per-click) campaigns.
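
As an illustration, with a hypothetical sessionId custom attribute on the Transaction event, repeated clicks from the same session stand out immediately:

    SELECT count(*) FROM Transaction WHERE sessionId IS NOT NULL FACET sessionId SINCE 1 day ago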

But at the end of the day you can slice and dice everything.


#6

Thanks for the information, but anyone can set whatever user agent they want, so I could make my requests to any server look as if they came from a real bot.

Check the user-agent switcher plugins available for browsers like Chrome or Firefox.


#7

Hi all,

What I am proposing here is a global solution, not a company-specific one.

Your company could publish a list of known crawlers versus spammers that anyone could use, even before they have been attacked, to verify whether a given client is a real crawler or a spammer.


#8

Recently someone tried to hit our servers from three different IPs with requests made to look as if they were coming from Bing searches for Amazon.
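
For what it's worth, a sketch of how such impostors could be isolated, assuming our custom remoteIpAddress attribute and the request.headers.userAgent attribute are both recorded: facet the Bingbot-claiming traffic by IP, then check each IP against Microsoft's documented reverse-DNS verification (genuine Bingbot IPs resolve to search.msn.com).

    SELECT count(*) FROM Transaction WHERE request.headers.userAgent LIKE '%bingbot%' FACET remoteIpAddress SINCE 1 day ago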


#9

This sounds like a log use case, and something you could accomplish with New Relic as well.
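
A minimal sketch of that angle, assuming logs are forwarded to New Relic Logs and the offending crawler name appears in the log message:

    SELECT count(*) FROM Log WHERE message LIKE '%bingbot%' FACET hostname SINCE 1 hour ago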