Scrapy 1.0 Released: Web Crawler Framework
Scrapy 1.0 has been released. The release brings a number of new features and bug fixes; highlights include:
New Features and Enhancements
[*] Python logging (:issue:`1060`, :issue:`1235`, :issue:`1236`, :issue:`1240`, :issue:`1259`, :issue:`1278`, :issue:`1286`)
[*] FEED_EXPORT_FIELDS option (:issue:`1159`, :issue:`1224`)
[*] Dns cache size and timeout options
[*] support namespace prefix in xmliter_lxml (:issue:`963`)
[*] Reactor threadpool max size setting
[*] Allow spiders to return dicts (:issue:`1081`) (see the sketch after this list)
[*] Add Response.urljoin() helper (:issue:`1086`)
[*] look in ~/.config/scrapy.cfg for user config (:issue:`1098`)
[*] handle TLS SNI (:issue:`1101`)
[*] Selectorlist extract first (:issue:`624`, :issue:`1145`)
[*] Added JmesSelect (:issue:`1016`)
[*] add gzip compression to filesystem http cache backend (:issue:`1020`)
[*] CSS support in link extractors (:issue:`983`)
[*] httpcache dont_cache meta #19 #689 (:issue:`821`)
[*] add signal to be sent when request is dropped by the scheduler (:issue:`961`)
[*] avoid downloading large responses (:issue:`946`)
[*] Allow to specify the quotechar in CSVFeedSpider (:issue:`882`)
[*] Add referer to "Spider error processing" log message (:issue:`795`)
[*] process robots.txt once (:issue:`896`)
[*] GSoC Per-spider settings (:issue:`854`)
[*] Add project name validation (:issue:`817`)
[*] GSoC API cleanup (:issue:`816`, :issue:`1128`, :issue:`1147`, :issue:`1148`, :issue:`1156`, :issue:`1185`, :issue:`1187`, :issue:`1258`, :issue:`1268`, :issue:`1276`, :issue:`1285`, :issue:`1284`)
[*] Be more responsive with IO operations (:issue:`1074` and :issue:`1075`)
[*] Do leveldb compaction for httpcache on closing (:issue:`1297`)
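Several of the items above change what day-to-day spider code looks like. Here is a minimal sketch (the site, CSS selectors, and field names are invented for illustration) exercising dict items, Response.urljoin(), extract_first(), per-spider settings with FEED_EXPORT_FIELDS, and the stdlib-based logging:

    import scrapy

    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = ["http://example.com/quotes"]

        # Per-spider settings (GSoC) override project settings;
        # FEED_EXPORT_FIELDS fixes which fields are exported and in what order.
        custom_settings = {"FEED_EXPORT_FIELDS": ["text", "author"]}

        def parse(self, response):
            # Logging is now plain stdlib logging; every spider has a logger.
            self.logger.info("parsing %s", response.url)

            for quote in response.css("div.quote"):
                # Spiders may now yield plain dicts instead of Item subclasses.
                yield {
                    # extract_first() returns the first match or None,
                    # replacing the IndexError-prone extract()[0] idiom.
                    "text": quote.css("span.text::text").extract_first(),
                    "author": quote.css("small.author::text").extract_first(),
                }

            next_href = response.css("li.next a::attr(href)").extract_first()
            if next_href:
                # Response.urljoin() resolves a relative href against response.url.
                yield scrapy.Request(response.urljoin(next_href))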
Deprecations and Removals:
[*] Deprecate htmlparser link extractor (:issue:`1205`)
[*] remove deprecated code from FeedExporter (:issue:`1155`)
[*] a leftover for .15 compatibility (:issue:`925`)
[*] drop support for CONCURRENT_REQUESTS_PER_SPIDER (:issue:`895`)
[*] Drop old engine code (:issue:`911`)
[*] Deprecate SgmlLinkExtractor (:issue:`777`) (see the migration sketch below)
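For code still on the deprecated SGML- or htmlparser-based extractors, the lxml-based LinkExtractor is the intended replacement, and it also gains the CSS support listed above. A sketch, with a made-up spider name and target site:

    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule

    class FollowSpider(CrawlSpider):
        name = "follow"
        start_urls = ["http://example.com/"]

        rules = (
            # restrict_css is the CSS counterpart of restrict_xpaths;
            # the lxml-based LinkExtractor replaces SgmlLinkExtractor.
            Rule(LinkExtractor(restrict_css="ul.pagination"),
                 callback="parse_page", follow=True),
        )

        def parse_page(self, response):
            yield {"url": response.url}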
For the full list of changes, see the release notes.
Download for this release: Source code (zip)
Scrapy is a crawler framework written in pure Python on top of the Twisted asynchronous processing framework. Users only need to write a few custom modules to get a working crawler for scraping web pages and images, which makes it very convenient.
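To show how little boilerplate that takes, here is a sketch of a one-off spider run from a plain script using the cleaned-up crawler API; the target URL and output file name are placeholders:

    import scrapy
    from scrapy.crawler import CrawlerProcess

    class TitleSpider(scrapy.Spider):
        name = "title"
        start_urls = ["http://example.com/"]

        def parse(self, response):
            yield {"title": response.css("title::text").extract_first()}

    # CrawlerProcess manages the Twisted reactor itself, so no extra
    # plumbing is needed; FEED_* settings export items as they arrive.
    process = CrawlerProcess({"FEED_FORMAT": "json", "FEED_URI": "titles.json"})
    process.crawl(TitleSpider)
    process.start()  # blocks until the crawl is done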