Sogou: A Not So Clever Chinese Spider
Sogou is a Chinese web spider that supposedly collects data for the Sogou search engine. We tried to block it using robots.txt, without any success:
User-agent: Sogou web spider
Disallow: /
User-agent: Sogou inst spider
Disallow: /
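As a sanity check that rules like these, written one directive per line, really do match the spider's user agent string, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed` is made up for illustration, and this only shows how one particular parser reads the rules. robots.txt is advisory, so a crawler can simply ignore it.

```python
from urllib.robotparser import RobotFileParser

# The rules from this post, one directive per line.
ROBOTS_TXT = """\
User-agent: Sogou web spider
Disallow: /
User-agent: Sogou inst spider
Disallow: /
"""

# User agent string exactly as it appears in the firewall log.
SOGOU_UA = "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"


def is_allowed(user_agent: str, url: str = "/") -> bool:
    """Return True if ROBOTS_TXT permits `user_agent` to fetch `url`.

    Hypothetical helper: it parses the rules in memory, no network access.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print("Sogou allowed:", is_allowed(SOGOU_UA))
    print("Generic browser allowed:", is_allowed("Mozilla/5.0"))
```

If the parser reports the Sogou user agent as disallowed and the spider keeps crawling anyway, the problem is not the rules but a crawler that does not honor them.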
So we had to manually block that specific user agent (sogou).
But after we manually blocked the user agent:
Date/Time: Sun, 24 Sep 2023 18:43:34 +0300
IP address: 123.126.68.40
User agent: Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
Why blocked: Suspected bot or scraper probe!
Country code lookup: CN
Verified identity: Sogou
Request method: GET
Hostname: sogouspider-123-126-68-40.crawl.sogou.com
Sogou immediately came back, two seconds later, from a different IP range and with NO HOST NAME:
Date/Time: Sun, 24 Sep 2023 18:43:36 +0300
IP address: 49.7.20.28
User agent: Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
Why blocked: Suspected bot or scraper probe!
Country code lookup: CN
Request method: GET
Hostname: –
And one second after that it came back yet again from the same IP, still with NO HOST NAME and now with the USER AGENT CHANGED!
Date/Time: Sun, 24 Sep 2023 18:43:37 +0300
IP address: 49.7.20.28
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36
Why blocked: Outdated browser (C)!
Country code lookup: CN
Request method: GET
Hostname: –
So this is NOT a legitimate search engine spider. It is a not-so-clever scraper from China, and it is now completely blocked using other, more drastic methods.
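The hostname difference in the log entries above is what a forward-confirmed reverse DNS check looks at: resolve the IP to a hostname, require that hostname to sit under the crawler's domain, then resolve the hostname back and confirm it returns the same IP. The sketch below is only an illustration of that general technique, not the blocking method used on this site. The function name `verify_crawler` and the `.sogou.com` suffix are assumptions (the suffix is inferred from the one hostname in the log), and the stub lookups in the usage part stand in for real DNS so the example runs offline.

```python
import socket


def verify_crawler(ip, allowed_suffix=".sogou.com", reverse=None, forward=None):
    """Forward-confirmed reverse DNS check (hypothetical helper).

    `reverse` maps an IP to a hostname, `forward` maps a hostname to a list
    of IPs. Both default to real DNS lookups via the socket module.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])

    try:
        hostname = reverse(ip)
    except OSError:
        return False  # no PTR record at all, like the "Hostname: –" entries

    # The hostname must belong to the domain the crawler claims (assumed suffix).
    if not hostname.lower().rstrip(".").endswith(allowed_suffix):
        return False

    try:
        # The forward lookup must lead back to the very same IP.
        return ip in forward(hostname)
    except OSError:
        return False


if __name__ == "__main__":
    # Stub DNS data copied from the log entries in this post.
    ptr = {"123.126.68.40": "sogouspider-123-126-68-40.crawl.sogou.com"}
    a = {"sogouspider-123-126-68-40.crawl.sogou.com": ["123.126.68.40"]}

    def stub_reverse(ip):
        if ip not in ptr:
            raise OSError("no PTR record")
        return ptr[ip]

    def stub_forward(host):
        return a.get(host, [])

    for ip in ("123.126.68.40", "49.7.20.28"):
        ok = verify_crawler(ip, reverse=stub_reverse, forward=stub_forward)
        print(ip, "verified" if ok else "NOT verified")
```

A request that carries a Sogou user agent but fails this check, like the second log entry, is either not Sogou or is Sogou crawling from unannounced addresses. Either way it does not deserve search engine treatment.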
Note that the HUAWEI and Alibaba spiders use the very same bad tactics. You block the spiders and they come back as cloud visitors from non-Chinese IPs (some of them from AWS). More on this in a future blog post.
bye bye Sogou …