A middleware to cache HTTP responses for Scrapy
Overview
scrapy-httpcache is a Scrapy middleware that stores the HTTP cache in MongoDB. It also includes two extra storage plugins, request_error_storage and banned_storage. request_error_storage saves requests that raised an error. banned_storage saves banned requests, and its block_checker can be overridden.
Requirements
Python 3.3+
Works on Linux, Windows, Mac OSX, BSD
Install
The quick way:
pip install scrapy-httpcache
Or copy this middleware into your Scrapy project.
Documentation
In settings.py, for example:
# -----------------------------------------------------------------------------
# SCRAPY HTTPCACHE SETTINGS
# -----------------------------------------------------------------------------
DOWNLOADER_MIDDLEWARES.update({
    'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': None,
    'scrapy_httpcache.downloadermiddlewares.httpcache.AsyncHttpCacheMiddleware': 900,
})
HTTPCACHE_ENABLED = True
HTTPCACHE_IGNORE_HTTP_CODES = [301, 302, 500, 503]
HTTPCACHE_STORAGE = 'scrapy_httpcache.extensions.httpcache_storage.MongoDBCacheStorage'
HTTPCACHE_MONGODB_HOST = '127.0.0.1'
HTTPCACHE_MONGODB_PORT = 27017
HTTPCACHE_MONGODB_USERNAME = 'root'
HTTPCACHE_MONGODB_PASSWORD = 'password'
HTTPCACHE_MONGODB_CONNECTION_POOL_KWARGS = {}
HTTPCACHE_MONGODB_AUTH_DB = 'admin'
HTTPCACHE_MONGODB_DB = 'cache_storage'
HTTPCACHE_MONGODB_COLL = 'cache'

# -----------------------------------------------------------------------------
# SCRAPY HTTPCACHE BANNED SETTINGS (optional)
# -----------------------------------------------------------------------------
BANNED_STORAGE = 'scrapy_httpcache.extensions.banned_storage.MongoBannedStorage'

# -----------------------------------------------------------------------------
# SCRAPY HTTPCACHE REQUEST ERROR SETTINGS (optional)
# -----------------------------------------------------------------------------
REQUEST_ERROR_STORAGE = 'scrapy_httpcache.extensions.request_error_storage.MongoRequestErrorStorage'
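Before starting a crawl, it can help to confirm that the MongoDB host, port, and credentials from the settings above are valid. The sketch below assembles a standard MongoDB connection URI from those values so it can be pasted into a MongoDB client. The helper name build_mongodb_uri is hypothetical, and this is not necessarily how scrapy-httpcache builds its own connection internally.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(host, port, username=None, password=None, auth_db="admin"):
    """Assemble a MongoDB connection URI from individual settings.

    Hypothetical helper for checking credentials by hand; it is not part
    of scrapy-httpcache.
    """
    if username and password:
        # Percent-encode credentials so characters such as '@' or ':' are safe.
        return "mongodb://{}:{}@{}:{}/?authSource={}".format(
            quote_plus(username), quote_plus(password), host, port, quote_plus(auth_db)
        )
    return "mongodb://{}:{}/".format(host, port)


# Values mirror the example settings (HTTPCACHE_MONGODB_HOST, ..._PORT, etc.).
uri = build_mongodb_uri("127.0.0.1", 27017, "root", "password", "admin")
print(uri)  # mongodb://root:password@127.0.0.1:27017/?authSource=admin
```

The resulting string can be passed to any MongoDB client that accepts connection URIs to verify that the server accepts the credentials.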
To remove a banned response, use send_catch_log_deferred to send the scrapy_httpcache.signals.remove_banned signal with the keyword arguments spider, response, and exception. The signal's callback function returns a Deferred.
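A minimal sketch of sending that signal is shown here. It assumes the signal is dispatched through the crawler's signal manager (crawler.signals), which in Scrapy provides send_catch_log_deferred(signal, **kwargs). The helper name request_ban_removal is hypothetical, and the signal object is passed in as a parameter so the snippet does not depend on the package's import layout.

```python
def request_ban_removal(crawler, signal, spider, response, exception=None):
    """Send a ban-removal signal and return the resulting Deferred.

    Hypothetical helper: `crawler` is expected to expose a Scrapy-style
    signal manager as `crawler.signals`, and `signal` is expected to be
    scrapy_httpcache.signals.remove_banned.
    """
    # The keyword names match the ones the description lists:
    # spider, response, exception.
    return crawler.signals.send_catch_log_deferred(
        signal=signal,
        spider=spider,
        response=response,
        exception=exception,
    )
```

Inside a spider callback this might be called as request_ban_removal(self.crawler, signals.remove_banned, self, response), with signals imported from scrapy_httpcache. The returned Deferred fires once the connected handlers have finished.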
Source Distribution
File details
Details for the file scrapy-httpcache-0.0.5.tar.gz.
File metadata
- Download URL: scrapy-httpcache-0.0.5.tar.gz
- Upload date:
- Size: 13.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 6df37c945e69a1374ebbcec188eea7c01b3640bcfce46bffc1d51c1fd03caa3d |
MD5 | c7eb4292d6fefe6fc254152d2a08ee0b |
BLAKE2b-256 | 7197103414578b9a7a58ad6e0aff38969844c6b50066aff3cdb7d79b631d896a |