Forcibly blocking Microsoft (Bing) in nginx: I'm speechless...
Microsoft (Bing) completely ignores the robots rules. Here is my robots file:
User-agent: *
Disallow: /
Yet my logs turned out to contain a large number of entries like these:
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.115
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.155
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.137
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.207.95
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.159
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.211
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.227
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.227
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.232
"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.182
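Judging from the logs, Bing's crawling algorithm is pretty poor and its crawl frequency is very high; the ten sample entries above alone come from nine different IP addresses.
For a dynamic application like mine this is simply a nightmare, so I had no choice but to block it by force.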
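To put a number on "very high", something like the following could tally the msnbot requests per client IP. This is only a rough sketch: the regular expression assumes the custom log layout seen in the sample lines above (quoted request, status, size, quoted referer, quoted User-Agent, client IP last), and the names `LINE_RE` and `count_bot_hits` are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred from the sample lines:
# "request" status size "referer" "user-agent" client-ip
LINE_RE = re.compile(
    r'^"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<ip>\S+)$'
)


def count_bot_hits(lines, needle="msnbot"):
    """Return a Counter mapping client IP -> requests whose User-Agent contains `needle`."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Lines that do not fit the assumed layout are silently skipped.
        if match and needle in match.group("agent").lower():
            hits[match.group("ip")] += 1
    return hits


if __name__ == "__main__":
    sample = [
        '"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.227',
        '"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.227',
        '"GET /xxxxxx HTTP/1.0" 302 165 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 65.55.106.232',
    ]
    for ip, count in count_bot_hits(sample).most_common():
        print(ip, count)
```

For a real log, pass an open file object instead of the sample list; a different `log_format` would need a different pattern.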
The server runs nginx.
In the configuration file, add the following:
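# Inside the server {} block: reject any request whose User-Agent contains "msnbot" (case-insensitive)
if ($http_user_agent ~* "msnbot") {
    return 404;
}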
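After reloading nginx, one way to confirm the rule took effect is to send a request that pretends to be msnbot and look at the status code. A minimal sketch using only the Python standard library; the function name `status_for_user_agent` and the URL in the usage comment are placeholders, not anything from my setup. Note that `urlopen` follows redirects, so for a normal User-Agent you get the status of the final page rather than the 302 seen in the logs.

```python
import urllib.error
import urllib.request


def status_for_user_agent(url, user_agent, timeout=10):
    """Fetch `url` with the given User-Agent header and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is what we want here.
        return err.code


# Hypothetical usage (replace the URL with your own site):
#   status_for_user_agent("http://example.com/", "msnbot/2.0b (+http://search.msn.com/msnbot.htm)")
# With the rule above in place this should come back as 404,
# while an ordinary browser User-Agent should still get a normal response.
```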
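I never expected the mighty Microsoft to behave this shamelessly.
I went back to bing.com
and entered
site:my server's domain
The cached snapshots are gone now, although a large number of URLs are still listed...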