fateame posted on 2016-12-27 08:02:23

Forcibly blocking Microsoft (Bing) in nginx: speechless...

Microsoft (Bing) completely ignores the robots rules.
Here is my robots file:

User-agent: *
Disallow: /


Yet my logs turned out to contain a large number of entries like these:

"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.115
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.155
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.137
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.207.95
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.159
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.211
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.227
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.227
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.232
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.182


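Judging from the logs, Bing's crawling algorithm is quite poor, and it crawls very frequently.
For a dynamic application like mine this is simply a nightmare, so I had no choice but to block it by force.
The server runs nginx.
I added the following to the configuration file: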

# Return 404 to any request whose User-Agent contains "msnbot"
if ($http_user_agent ~ (msnbot))
{
    return 404;
}
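
The same kind of block can also be expressed with a `map`, which keeps the matching rules in one place if more crawlers need to be added later. This is only a sketch under assumptions, not what the post above actually deployed: the variable name `$block_ms_crawler`, the `bingbot` pattern, and `example.com` are placeholders that do not appear in the original post, and the fragment has to be merged into an existing `http {}` block.

```nginx
# Goes inside the http {} context.
# Sets $block_ms_crawler to 1 when the User-Agent matches a listed pattern.
map $http_user_agent $block_ms_crawler {
    default     0;
    ~*msnbot    1;   # the crawler seen in the logs above (case-insensitive match)
    ~*bingbot   1;   # assumption: not in the original post; remove if not wanted
}

server {
    listen 80;
    server_name example.com;   # placeholder domain

    # "0" and the empty string count as false in an nginx "if"
    if ($block_ms_crawler) {
        return 404;            # same status code as the original rule
    }
}
```

Running `nginx -t` before reloading should confirm that the merged configuration parses.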


I never expected the famous Microsoft to behave this shamelessly.

I went back to bing.com
and searched for

site:<my server's domain>

The cached snapshots are gone now, although a large number of URLs are still listed...