Scrapy 1.2.1 released, a web crawler framework
Scrapy 1.2.1 has been released.
Scrapy is an asynchronous processing framework based on Twisted, a crawler framework implemented in pure Python. Users only need to customize a few modules to easily build a crawler for scraping web page content and images.
What's new:
New features
- New FEED_EXPORT_ENCODING setting to customize the encoding used when writing items to a file. This can be used to turn off \uXXXX escapes in JSON output. It is also useful for those wanting something other than UTF-8 for XML or CSV output (#2034).
- The startproject command now supports an optional destination directory to override the default one based on the project name (#2005).
- New SCHEDULER_DEBUG setting to log request serialization failures (#1610).
- The JSON encoder now supports serialization of set instances (#2058).
- Interpret application/json-amazonui-streaming as TextResponse (#1503).
- scrapy is imported by default when using shell tools (shell, inspect_response) (#2248).
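The FEED_EXPORT_ENCODING change addresses a common annoyance: by default, JSON output escapes every non-ASCII character as a \uXXXX sequence. The stdlib sketch below illustrates the underlying behavior the setting controls; the comparison with `json.dumps(ensure_ascii=...)` is illustrative, not Scrapy's internal code.

```python
import json

# A scraped item containing non-ASCII text (Chinese for "crawler").
item = {"title": "爬虫"}

# Default JSON encoding escapes non-ASCII characters as \uXXXX sequences.
escaped = json.dumps(item)
# escaped == '{"title": "\\u722c\\u866b"}'

# Disabling ASCII-only output keeps the raw UTF-8 text, which is the
# effect of setting FEED_EXPORT_ENCODING = 'utf-8' for Scrapy's JSON feed exports.
raw = json.dumps(item, ensure_ascii=False)
# raw == '{"title": "爬虫"}'
```

In a Scrapy project you would put `FEED_EXPORT_ENCODING = 'utf-8'` in settings.py to get the second form of output from the built-in JSON exporter.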
Bug fixes
- DefaultRequestHeaders middleware now runs before UserAgent middleware (#2088). Warning: this is technically backwards incompatible, though we consider this a bug fix.
- HTTP cache extension and plugins that use the .scrapy data directory now work outside projects (#1581). Warning: this is technically backwards incompatible, though we consider this a bug fix.
- Selector does not allow passing both response and text anymore (#2153).
- Fixed logging of wrong callback name with scrapy parse (#2169).
- Fix for an odd gzip decompression bug (#1606).
- Fix for selected callbacks when using CrawlSpider with scrapy parse (#2225).
- Fix for invalid JSON and XML files when a spider yields no items (#872).
- Implement flush() for StreamLogger, avoiding a warning in logs (#2125).
Refactoring
- canonicalize_url has been moved to w3lib.url (#2168).
Downloads:
- Source code (zip)
- Source code (tar.gz)