PChome: rude PchomeBot spider


Type(Type) 2013/11/8 14:34

PChome: Very rude PchomeBot spider/robot




Q: Does PChomebot read and follow robots.txt?
A: No, it doesn't care about your robots.txt. In other words, it is a rude one.
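For comparison, here is a minimal sketch of the check a polite crawler would run before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URLs and the `is_allowed` helper are made up for illustration; nothing here comes from PChome.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the rules and paths are illustrative only.
ROBOTS_TXT = """\
User-agent: PChomebot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Pass the bot's short name ("PChomebot"), not the full User-Agent header.
print(is_allowed("PChomebot", "http://example.com/page"))
print(is_allowed("OtherBot", "http://example.com/index.html"))
```

A crawler that respects robots.txt would skip every URL for which this check returns False.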


Q: Does PChomebot provide a contact mailbox or contact form at the URL in its user agent?
A: No, there is NO contact info on that page. Again, a rude one.

=> http://www.pchome.com.tw/pchomebot.htm


Code:

"Mozilla/5.0 (compatible; PChomebot/1.0; http://www.pchome.com.tw/pchomebot.htm)"

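About PChomebot

PChomebot is a program developed by PChome to provide better online services and shopping experiences.
It automatically collects and analyzes data based on page content and links, and in principle does not affect the operation of websites.
If it does affect your website, please contact us.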






Q: What do you think?
A: If PChomebot puts abnormal load on your sites, just ban it.


Q: Which IP address ranges does it come from?
A: Ranges belonging to Seednet/SPARQ.

Code:

220.228.172.*
122.147.50.*
113.196.127.*
123.51.198.*
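If you want to test an address against the distinct ranges listed above, a rough sketch with Python's standard `ipaddress` module could look like this. It assumes each `*` stands for the whole last octet (a /24 network); `in_reported_range` is a hypothetical helper name.

```python
import ipaddress

# The distinct wildcard ranges from the post, written as /24 networks.
# Assumption: "*" covers the entire last octet.
PCHOMEBOT_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "220.228.172.0/24",
        "122.147.50.0/24",
        "113.196.127.0/24",
        "123.51.198.0/24",
    )
]


def in_reported_range(ip: str) -> bool:
    """Return True if ip falls inside one of the reported /24 ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PCHOMEBOT_NETWORKS)


print(in_reported_range("113.196.127.66"))
```

Keep in mind that a whole /24 may also contain hosts that are not the bot, so this is a coarse filter.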

Type(Type) 2014/7/14 15:45

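PChome's robot completely ignores robots.txt



... It crawls links that robots.txt disallows.

I thought YodaoBot (有道) was already an outrageous spider bot,
but PChome's turned out to be even worse!
The pchomebot.htm page has no contact information either!

It fetched 300,000+ pages per day for several days in a row, which left me speechless.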
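To put that figure in perspective, here is a quick back-of-the-envelope calculation of the average request rate. It assumes the fetches were spread evenly over 24 hours, which the post does not state; real traffic was likely burstier.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(pages_per_day: int) -> float:
    """Average requests per second, assuming an even spread over the day."""
    return pages_per_day / SECONDS_PER_DAY


# 300,000 pages/day works out to roughly 3.47 requests every second.
print(f"{average_rate(300_000):.2f} requests/second")
```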


Add this to your .htaccess or conf files:
Code:

 
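    # Flag every request whose User-Agent contains "PChomebot".
    BrowserMatch "PChomebot" is_crazy_bot

    #
    # Ban is_crazy_bot: PChome bot
    # (Apache 2.2 syntax; Apache 2.4 uses Require directives instead.)
    #
    <FilesMatch "^(.*)$">
        Order Allow,Deny
        Allow from all
        Deny from env=is_crazy_bot
    </FilesMatch>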
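To see what the rule above actually matches: `BrowserMatch` runs a case-sensitive regular expression against the User-Agent header. The sketch below mimics that test in Python so you can try strings against it; `is_crazy_bot` is just a helper named after the environment variable, not part of Apache.

```python
import re

# Same pattern as the BrowserMatch line; BrowserMatch is case-sensitive.
PATTERN = re.compile(r"PChomebot")


def is_crazy_bot(user_agent: str) -> bool:
    """Return True if the User-Agent would set the is_crazy_bot flag."""
    return PATTERN.search(user_agent) is not None


ua = "Mozilla/5.0 (compatible; PChomebot/1.0; http://www.pchome.com.tw/pchomebot.htm)"
print(is_crazy_bot(ua))
```

If the bot ever changed the capitalization of its name, the rule would stop matching; Apache's `BrowserMatchNoCase` covers that case.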


Type(Type) 2014/7/14 16:03


These PChomebot requests come from SPARQnet & SPARQ.com.tw



Code:

[07:50] > grep 'pchomebot.htm)' /var/log/apache2/access.log | awk '{print $1}' > /tmp/pchomebot.txt
[07:52] > grep 'pchomebot.htm)' /var/log/apache2/access.log.1 | awk '{print $1}' >> /tmp/pchomebot.txt
[07:54] > sort /tmp/pchomebot.txt | uniq



113.196.127.66
113.196.127.67
113.196.127.68
113.196.127.69
113.196.127.70
113.196.127.71
113.196.127.72
113.196.127.73
113.196.127.74
113.196.127.75
113.196.127.76
113.196.127.77
113.196.127.78
113.196.127.79
113.196.127.80
113.196.127.81
113.196.127.82

122.147.50.17
122.147.50.18
122.147.50.19
122.147.50.20
122.147.50.21
122.147.50.22
122.147.50.23
122.147.50.24
122.147.50.25
122.147.50.27
122.147.50.28
122.147.50.29
122.147.50.30

123.51.198.66
123.51.198.67
123.51.198.68
123.51.198.69
123.51.198.70
123.51.198.71
123.51.198.72
123.51.198.73
123.51.198.74
123.51.198.75
123.51.198.76
123.51.198.77
123.51.198.78
123.51.198.79
123.51.198.80
123.51.198.82

220.228.172.130
220.228.172.131
220.228.172.132
220.228.172.133
220.228.172.134
220.228.172.135
220.228.172.138
220.228.172.139
220.228.172.140
220.228.172.141
220.228.172.143
220.228.172.144
220.228.172.145
220.228.172.146
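The grep/awk/sort pipeline above only lists the unique addresses. A rough Python sketch of the same filter that also counts requests per address is shown below. The sample log lines are invented for illustration (combined log format with the client IP as the first field is an assumption), and `count_by_ip` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable

UA_MARKER = "pchomebot.htm)"  # same substring the grep commands look for

# Hypothetical access-log lines, for illustration only.
SAMPLE = [
    '113.196.127.66 - - [14/Jul/2014:07:00:01 +0800] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; PChomebot/1.0; http://www.pchome.com.tw/pchomebot.htm)"',
    '113.196.127.66 - - [14/Jul/2014:07:00:02 +0800] "GET /b HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; PChomebot/1.0; http://www.pchome.com.tw/pchomebot.htm)"',
    '122.147.50.17 - - [14/Jul/2014:07:00:03 +0800] "GET /c HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; PChomebot/1.0; http://www.pchome.com.tw/pchomebot.htm)"',
    '203.0.113.5 - - [14/Jul/2014:07:00:04 +0800] "GET /d HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows NT 6.1)"',
]


def count_by_ip(log_lines: Iterable[str]) -> Counter:
    """Count matching requests per client IP (first whitespace-separated field)."""
    counts: Counter = Counter()
    for line in log_lines:
        if UA_MARKER in line:
            fields = line.split()
            if fields:
                counts[fields[0]] += 1
    return counts


for ip, hits in count_by_ip(SAMPLE).most_common():
    print(ip, hits)
```

To run it on a real log, pass an open file object instead of `SAMPLE`, for example `count_by_ip(open("/var/log/apache2/access.log"))`.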



© Vovo2000.com Mobile Version (小哈手機版) 2024