Scrapy Item Record Extension
os-scrapy-record
This project provides extensions that process a Response or Failure and generate a standard Item.
Install
pip install os-scrapy-record
You can run the example spider directly from the project root:
scrapy crawl example
APIs
- os_scrapy_record.ResponseCallback
  - The callback method of this extension replaces the default Request.callback, processes the Response, and generates a FetchRecord.
  - The callback method does not take effect when the request already has a callback function set.
  - The callback method overrides the parse method of the spider.
  - Enable the extension in the project's settings.py file:
    EXTENSIONS = { "os_scrapy_record.ResponseCallback": 1, }
- os_scrapy_record.ResponseErrback
  - The errback method of this extension replaces the default Request.errback, processes the Failure, and generates a FetchRecord.
  - The errback method does not take effect when the request already has an errback function set.
  - Enable the extension in the project's settings.py file:
    EXTENSIONS = { "os_scrapy_record.ResponseErrback": 1, }
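The precedence rule described for ResponseCallback and ResponseErrback (a handler set on the request wins; the extension only fills the gap) can be sketched as follows. This is an illustration of the documented behavior, not the extension's actual code: `FakeRequest`, `extension_callback`, `extension_errback`, and `resolve_handlers` are hypothetical names, and the sketch deliberately avoids importing Scrapy.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request with only the fields used here."""
    url: str
    callback: Optional[Callable[[Any], Any]] = None
    errback: Optional[Callable[[Any], Any]] = None


def extension_callback(response: Any) -> str:
    # Placeholder for the extension's callback (the real one builds a FetchRecord).
    return "record from response"


def extension_errback(failure: Any) -> str:
    # Placeholder for the extension's errback (the real one builds a FetchRecord).
    return "record from failure"


def resolve_handlers(request: FakeRequest) -> Tuple[Callable, Callable]:
    """Explicit handlers on the request win; the extension fills any gap."""
    callback = request.callback or extension_callback
    errback = request.errback or extension_errback
    return callback, errback


def custom_callback(response: Any) -> str:
    return "custom"


# A request without handlers falls through to the extension.
plain = FakeRequest("https://example.com/a")
# A request with its own callback keeps it; only the errback is filled in.
custom = FakeRequest("https://example.com/b", callback=custom_callback)
```

In a real spider this means that requests yielded without a callback or errback are the ones the extensions turn into FetchRecord items.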
- os_scrapy_record.FetchRecord
  This class is a subclass of Item. Its members are:
  - request: os_scrapy_record.items.RequestItem, with members url, method, headers, body
  - meta: dict, the request.meta; keys should preferably be lower case with '_' as the separator
  - response: os_scrapy_record.items.ResponseItem, with members headers, body, status, ip_address (Scrapy 2.1.0+), failure
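The key-naming recommendation for meta (lower case, '_' as separator) could be applied before a request is built with a small helper like the one below. This is a minimal sketch, not part of os-scrapy-record: `normalize_meta_key` and `normalize_meta` are hypothetical names, and the camelCase handling is an assumption about what "lower case with '_'" should cover.

```python
import re
from typing import Any, Dict


def normalize_meta_key(key: str) -> str:
    """Convert a key such as 'crawlDepth' or 'Batch-ID' to lower_snake_case."""
    # Mark camelCase boundaries: "crawlDepth" -> "crawl_Depth".
    key = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", key)
    # Collapse every run of non-alphanumeric characters into one underscore.
    key = re.sub(r"[^0-9A-Za-z]+", "_", key)
    return key.strip("_").lower()


def normalize_meta(meta: Dict[Any, Any]) -> Dict[Any, Any]:
    """Return a copy of meta with string keys normalized; other keys are kept."""
    return {
        (normalize_meta_key(k) if isinstance(k, str) else k): v
        for k, v in meta.items()
    }


meta = normalize_meta({"crawlDepth": 2, "Batch-ID": "b7", "source name": "seed"})
```

Keys that already follow the convention, such as download_timeout, pass through unchanged.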
- os_scrapy_record.fetch_status.FetchStatus
  A member of ResponseItem that covers HTTP, DNS, network, and user-defined statuses. It is a two-tuple of group and code, e.g. HTTP:200, DNS:-2, SERVER:111, RULE:16.
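To make the group/code shape concrete, here is a minimal stand-in for a two-part status written with the standard library only. It is not the FetchStatus class itself: `StatusSketch` and its `parse` method are hypothetical, and the `GROUP:code` text form is assumed from the examples above.

```python
from typing import NamedTuple


class StatusSketch(NamedTuple):
    """Hypothetical two-tuple status: a group name plus an integer code."""
    group: str
    code: int

    def __str__(self) -> str:
        # Render in the GROUP:code form used in the examples, e.g. "HTTP:200".
        return f"{self.group}:{self.code}"

    @classmethod
    def parse(cls, text: str) -> "StatusSketch":
        # Split on the first ':' only, so negative codes such as "-2" survive.
        group, _, code = text.partition(":")
        return cls(group, int(code))


http_ok = StatusSketch("HTTP", 200)
dns_error = StatusSketch.parse("DNS:-2")
group, code = StatusSketch.parse("SERVER:111")  # unpacks like any two-tuple
```

Keeping the group separate from the code lets a consumer branch on the failure class (HTTP, DNS, SERVER, RULE) before looking at the specific code.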
Unit Tests
sh scripts/test.sh
License
MIT licensed.