A middleware to cache HTTP responses for Scrapy

Project description

Overview

scrapy-httpcache is a Scrapy middleware that stores the HTTP cache in MongoDB. It also ships two optional storage plugins: request_error_storage, which saves requests that raised an error, and banned_storage, which saves requests detected as banned. The block_checker used by banned_storage can be overridden.

Requirements

  • Python 3.3+

  • Works on Linux, Windows, macOS, BSD

Install

The quick way:

pip install scrapy-httpcache

Or copy this middleware into your Scrapy project.

Documentation

In settings.py, for example:

# -----------------------------------------------------------------------------
# SCRAPY HTTPCACHE SETTINGS
# -----------------------------------------------------------------------------
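# Assumes DOWNLOADER_MIDDLEWARES is already defined earlier in settings.py;
# otherwise assign the dict directly with DOWNLOADER_MIDDLEWARES = {...}.
DOWNLOADER_MIDDLEWARES.update({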
    'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': None,
    'scrapy_httpcache.downloadermiddlewares.httpcache.AsyncHttpCacheMiddleware': 900,
})

HTTPCACHE_ENABLED = True
HTTPCACHE_IGNORE_HTTP_CODES = [301, 302, 500, 503]
HTTPCACHE_STORAGE = 'scrapy_httpcache.extensions.httpcache_storage.MongoDBCacheStorage'
HTTPCACHE_MONGODB_HOST = '127.0.0.1'
HTTPCACHE_MONGODB_PORT = 27017
HTTPCACHE_MONGODB_USERNAME = 'root'
HTTPCACHE_MONGODB_PASSWORD = 'password'
HTTPCACHE_MONGODB_CONNECTION_POOL_KWARGS = {}
HTTPCACHE_MONGODB_AUTH_DB = 'admin'
HTTPCACHE_MONGODB_DB = 'cache_storage'
HTTPCACHE_MONGODB_COLL = 'cache'

# -----------------------------------------------------------------------------
# SCRAPY HTTPCACHE BANNED SETTINGS (optional)
# -----------------------------------------------------------------------------
BANNED_STORAGE = 'scrapy_httpcache.extensions.banned_storage.MongoBannedStorage'

# -----------------------------------------------------------------------------
# SCRAPY HTTPCACHE REQUEST ERROR SETTINGS (optional)
# -----------------------------------------------------------------------------
REQUEST_ERROR_STORAGE = 'scrapy_httpcache.extensions.request_error_storage.MongoRequestErrorStorage'
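As a rough illustration of how the HTTPCACHE_MONGODB_* values above relate to each other, the sketch below assembles them into a standard MongoDB connection URI. The helper name build_mongodb_uri and its fallback values are hypothetical and not part of scrapy-httpcache; the library's own connection logic may differ.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(settings):
    """Build a mongodb:// URI from HTTPCACHE_MONGODB_* settings.

    Hypothetical helper for illustration only; it is not part of
    scrapy-httpcache, and the fallback values below are assumptions.
    """
    host = settings.get("HTTPCACHE_MONGODB_HOST", "127.0.0.1")
    port = settings.get("HTTPCACHE_MONGODB_PORT", 27017)
    username = settings.get("HTTPCACHE_MONGODB_USERNAME")
    password = settings.get("HTTPCACHE_MONGODB_PASSWORD")
    auth_db = settings.get("HTTPCACHE_MONGODB_AUTH_DB", "admin")

    # Percent-encode credentials so characters such as '@' or ':' stay valid.
    credentials = ""
    if username and password:
        credentials = "{}:{}@".format(quote_plus(username), quote_plus(password))

    uri = "mongodb://{}{}:{}/".format(credentials, host, port)
    # authSource names the database that holds the user's credentials.
    if credentials:
        uri += "?authSource={}".format(quote_plus(auth_db))
    return uri
```

Passing the example values from the settings above would yield a URI that authenticates as root against the admin database on 127.0.0.1:27017.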

To remove a banned response, use send_catch_log_deferred to send the scrapy_httpcache.signals.remove_banned signal with the keyword arguments spider, response, and exception. The signal's callback returns a Deferred.
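A minimal sketch of sending that signal is shown below. The wrapper name send_remove_banned is hypothetical; it only forwards the three keyword arguments named above to the signal manager's send_catch_log_deferred method. In a real project the signal manager would typically be crawler.signals, and the signal object would be scrapy_httpcache.signals.remove_banned as named in this README.

```python
def send_remove_banned(signal_manager, remove_banned_signal,
                       spider, response, exception=None):
    """Send the remove_banned signal and return whatever the manager returns.

    Hypothetical wrapper, not part of scrapy-httpcache.
    signal_manager: an object exposing send_catch_log_deferred, for example
        Scrapy's crawler.signals.
    remove_banned_signal: the signal object, assumed here to be
        scrapy_httpcache.signals.remove_banned.
    """
    # With Scrapy's SignalManager the result is a Deferred that fires once
    # all connected handlers have finished.
    return signal_manager.send_catch_log_deferred(
        remove_banned_signal,
        spider=spider,
        response=response,
        exception=exception,
    )
```

Inside a spider callback this might be called as send_remove_banned(self.crawler.signals, signals.remove_banned, self, response), assuming signals was imported from scrapy_httpcache.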

Download files

Source Distribution

scrapy-httpcache-0.0.5.tar.gz (13.5 kB)

Uploaded Source

File details

Details for the file scrapy-httpcache-0.0.5.tar.gz.

File hashes

Hashes for scrapy-httpcache-0.0.5.tar.gz
Algorithm Hash digest
SHA256 6df37c945e69a1374ebbcec188eea7c01b3640bcfce46bffc1d51c1fd03caa3d
MD5 c7eb4292d6fefe6fc254152d2a08ee0b
BLAKE2b-256 7197103414578b9a7a58ad6e0aff38969844c6b50066aff3cdb7d79b631d896a
