Scrapy extension to store info in storage service
Project description
A Scrapy extension that stores request and response information in a storage service.
Installation
You can install scrapy-pagestorage using pip:
pip install scrapy-pagestorage
You can then enable the middleware in your settings.py:
SPIDER_MIDDLEWARES = {
    ...
    'scrapy_pagestorage.PageStorageMiddleware': 900,
}
How to use it
Enable the extension in settings.py:
PAGE_STORAGE_ENABLED = True
PAGE_STORAGE_ON_ERROR_ENABLED = True
Configure the extension in settings.py:
PAGE_STORAGE_MODE = "VERSIONED_CACHE"
PAGE_STORAGE_LIMIT = 100
PAGE_STORAGE_ON_ERROR_LIMIT = 100
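The settings above can also be assembled programmatically, for example to reuse them across projects. The following is a minimal sketch, not part of scrapy-pagestorage: the helper name page_storage_settings and its parameters are hypothetical, and only the setting names and the middleware path come from this README.

```python
def page_storage_settings(limit=100, on_error_limit=100, versioned=False):
    """Build a settings dict for scrapy-pagestorage (hypothetical helper)."""
    if limit < 0 or on_error_limit < 0:
        raise ValueError("limits must be non-negative integers")

    settings = {
        # Middleware path and priority as shown in the Installation section.
        "SPIDER_MIDDLEWARES": {
            "scrapy_pagestorage.PageStorageMiddleware": 900,
        },
        "PAGE_STORAGE_ENABLED": True,
        "PAGE_STORAGE_ON_ERROR_ENABLED": True,
        "PAGE_STORAGE_LIMIT": limit,
        "PAGE_STORAGE_ON_ERROR_LIMIT": on_error_limit,
    }
    # PAGE_STORAGE_MODE defaults to None; set it only for the versioned store.
    if versioned:
        settings["PAGE_STORAGE_MODE"] = "VERSIONED_CACHE"
    return settings
```

The returned dict could, for instance, be assigned to a spider's custom_settings attribute, assuming the project does not need other spider middlewares merged into the same SPIDER_MIDDLEWARES entry.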
The extension is enabled automatically for auto-spiders, that is, when SHUB_SPIDER_TYPE is auto or portia.
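The auto-enable rule can be illustrated with a short check. This is a sketch only: it assumes SHUB_SPIDER_TYPE is read from the process environment, and the function name is_auto_spider is hypothetical rather than taken from the extension's code.

```python
import os

# Spider types for which the README says the extension is auto-enabled.
AUTO_SPIDER_TYPES = {"auto", "portia"}


def is_auto_spider(environ=None):
    """Return True if SHUB_SPIDER_TYPE marks the job as an auto-spider.

    Assumption: the value is exposed as an environment variable.
    """
    environ = os.environ if environ is None else environ
    return environ.get("SHUB_SPIDER_TYPE") in AUTO_SPIDER_TYPES
```

Passing a plain dict instead of os.environ makes the check easy to try out locally.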
Settings
PAGE_STORAGE_MODE
Default: None
A string that specifies whether the extension stores information in the cache store or in the versioned cache store. Set PAGE_STORAGE_MODE = "VERSIONED_CACHE" to use the versioned one.
PAGE_STORAGE_LIMIT
An integer that limits the number of visited pages to store.
PAGE_STORAGE_ON_ERROR_LIMIT
An integer that limits the number of error pages to store.
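To make the effect of the two limits concrete, here is a small sketch of a counter that stops accepting pages once a limit is reached. It assumes that pages beyond the limit are simply not stored; the class StorageBudget is hypothetical and does not reflect how the extension is implemented internally.

```python
class StorageBudget:
    """Track how many pages may still be stored under a fixed limit."""

    def __init__(self, limit):
        self.limit = limit
        self.stored = 0

    def try_store(self):
        """Return True and count the page if the limit is not yet reached."""
        if self.stored >= self.limit:
            return False
        self.stored += 1
        return True


# One budget per setting, mirroring the example values used in this README.
pages = StorageBudget(limit=100)   # PAGE_STORAGE_LIMIT
errors = StorageBudget(limit=100)  # PAGE_STORAGE_ON_ERROR_LIMIT
```

Keeping separate counters reflects that visited pages and error pages have independent limits.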
Hashes for scrapy_pagestorage-0.2.0-py2-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | da2bf15b92a70a9f3f9b0a59eabb2a76761e3e85d99d858d6b02bc4d4aa5080b
MD5 | 8b71ff41252b9583c11ae00bbefb137b
BLAKE2b-256 | 2279d5dd5f9764bcd0340531bed900414456fdf63a6239749b84c4e8d3a5aeaa
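A downloaded wheel can be checked against the SHA256 digest listed above using Python's standard hashlib module. This is a minimal sketch: the function names are illustrative, and the local file path is whatever location you downloaded the wheel to.

```python
import hashlib

# SHA256 digest listed above for scrapy_pagestorage-0.2.0-py2-none-any.whl.
EXPECTED_SHA256 = "da2bf15b92a70a9f3f9b0a59eabb2a76761e3e85d99d858d6b02bc4d4aa5080b"


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected one."""
    return sha256_of(path) == expected.lower()
```

For example, matches_expected("scrapy_pagestorage-0.2.0-py2-none-any.whl") should return True for an intact download in the current directory.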