
A middleware to cache HTTP responses for Scrapy

Project description


Overview

scrapy-httpcache is a Scrapy downloader middleware that stores the HTTP cache in MongoDB. It also ships two extra storage plugins: request_error_storage, which saves requests that raised an error, and banned_storage, which saves banned requests. The block_checker used by banned_storage (the check that decides what counts as banned) can be overridden.
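The exact block_checker hook is not documented here, so its real signature should be taken from the banned_storage source. As a hypothetical illustration of the kind of test an overridden checker might perform, this sketch flags a response as banned when its status code or body matches common anti-bot patterns. The function name, the status codes and the marker strings are all assumptions, not part of scrapy-httpcache.

```python
# Hypothetical ban check for illustration only: the real block_checker
# hook in scrapy_httpcache's banned_storage module may take different
# arguments and be wired in differently.
BAN_STATUS_CODES = {403, 429}
BAN_MARKERS = (b"captcha", b"access denied")


def looks_banned(status, body):
    """Return True if a response looks like an anti-bot block page."""
    if status in BAN_STATUS_CODES:
        return True
    # Case-insensitive search for block-page text in the raw body bytes.
    lowered = body.lower()
    return any(marker in lowered for marker in BAN_MARKERS)


print(looks_banned(200, b"<html>Please solve the CAPTCHA</html>"))  # True
print(looks_banned(200, b"<html>Welcome</html>"))  # False
```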

Requirements

  • Python 3.3+
  • Works on Linux, Windows, Mac OSX, BSD

Install

The quick way:

pip install scrapy-httpcache

Or copy the middleware into your Scrapy project.

Documentation

In settings.py, for example:

# -----------------------------------------------------------------------------
# SCRAPY HTTPCACHE SETTINGS
# -----------------------------------------------------------------------------
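# Disable Scrapy's built-in cache middleware and enable the async one.
# If DOWNLOADER_MIDDLEWARES is already defined, add these entries to it instead.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': None,
    'scrapy_httpcache.downloadermiddlewares.httpcache.AsyncHttpCacheMiddleware': 900,
}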

HTTPCACHE_ENABLED = True
HTTPCACHE_IGNORE_HTTP_CODES = [301, 302, 500, 503]
HTTPCACHE_STORAGE = 'scrapy_httpcache.extensions.httpcache_storage.MongoDBCacheStorage'
HTTPCACHE_MONGODB_HOST = '127.0.0.1'
HTTPCACHE_MONGODB_PORT = 27017
HTTPCACHE_MONGODB_USERNAME = 'root'
HTTPCACHE_MONGODB_PASSWORD = 'password'
HTTPCACHE_MONGODB_CONNECTION_POOL_KWARGS = {}
HTTPCACHE_MONGODB_AUTH_DB = 'admin'
HTTPCACHE_MONGODB_DB = 'cache_storage'
HTTPCACHE_MONGODB_COLL = 'cache'

# -----------------------------------------------------------------------------
# SCRAPY HTTPCACHE BANNED SETTINGS (optional)
# -----------------------------------------------------------------------------
BANNED_STORAGE = 'scrapy_httpcache.extensions.banned_storage.MongoBannedStorage'

# -----------------------------------------------------------------------------
# SCRAPY HTTPCACHE REQUEST ERROR SETTINGS (optional)
# -----------------------------------------------------------------------------
REQUEST_ERROR_STORAGE = 'scrapy_httpcache.extensions.request_error_storage.MongoRequestErrorStorage'
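To see what the HTTPCACHE_MONGODB_* values correspond to, here is a minimal sketch that assembles them into a standard MongoDB connection URI. The helper build_mongo_uri is hypothetical, and scrapy-httpcache may open its connection differently internally; only the URI format itself (percent-encoded credentials, authSource naming the authentication database) is MongoDB's standard one.

```python
from urllib.parse import quote_plus


def build_mongo_uri(settings):
    """Assemble a MongoDB URI from the HTTPCACHE_MONGODB_* settings.

    Hypothetical helper for illustration only; scrapy-httpcache may
    connect to MongoDB differently internally.
    """
    host = settings["HTTPCACHE_MONGODB_HOST"]
    port = settings["HTTPCACHE_MONGODB_PORT"]
    username = settings.get("HTTPCACHE_MONGODB_USERNAME")
    password = settings.get("HTTPCACHE_MONGODB_PASSWORD") or ""

    if not username:
        # No credentials configured: connect without authentication.
        return f"mongodb://{host}:{port}/"

    # Credentials must be percent-encoded, and authSource names the
    # database in which the user is defined (MongoDB's default is admin).
    auth_db = settings.get("HTTPCACHE_MONGODB_AUTH_DB", "admin")
    return (
        f"mongodb://{quote_plus(username)}:{quote_plus(password)}@"
        f"{host}:{port}/?authSource={quote_plus(auth_db)}"
    )


settings = {
    "HTTPCACHE_MONGODB_HOST": "127.0.0.1",
    "HTTPCACHE_MONGODB_PORT": 27017,
    "HTTPCACHE_MONGODB_USERNAME": "root",
    "HTTPCACHE_MONGODB_PASSWORD": "password",
    "HTTPCACHE_MONGODB_AUTH_DB": "admin",
}
print(build_mongo_uri(settings))
# mongodb://root:password@127.0.0.1:27017/?authSource=admin
```

The same URI can be passed to pymongo.MongoClient or the mongo shell to inspect the cache collection in the cache_storage database.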

To remove a banned response, use send_catch_log_deferred to send the scrapy_httpcache.signals.remove_banned signal with the keyword arguments spider, response and exception. The handler connected to this signal returns a Deferred, which is why send_catch_log_deferred (rather than send_catch_log) is needed.
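A minimal sketch of that call, assuming it runs where the Scrapy crawler object is available (for example in a downloader middleware built through from_crawler). SignalManager.send_catch_log_deferred is Scrapy's API and returns a Deferred that fires once all handlers' Deferreds have fired. The helper name remove_banned_response is hypothetical, and the signal object is passed in as an argument so the function itself needs no imports.

```python
def remove_banned_response(crawler, remove_banned, spider, response, exception=None):
    """Fire scrapy-httpcache's remove_banned signal and return its Deferred.

    Hypothetical helper: `crawler` is the running Scrapy crawler and
    `remove_banned` is the signal object from scrapy_httpcache.signals.
    send_catch_log_deferred waits for handlers that return Deferreds.
    """
    return crawler.signals.send_catch_log_deferred(
        signal=remove_banned,
        spider=spider,
        response=response,
        exception=exception,
    )
```

For example, after `from scrapy_httpcache.signals import remove_banned`, a middleware could call `d = remove_banned_response(self.crawler, remove_banned, spider, response, exception)` and attach follow-up work with `d.addCallback(...)`.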


Download files

Files for scrapy-httpcache, version 0.0.5:

  • scrapy-httpcache-0.0.5.tar.gz (13.5 kB), source distribution
