Scrapy middleware for submitting URLs to the Internet Archive Wayback Machine

Scrapy Wayback Middleware

Middleware for submitting all scraped response URLs to the Internet Archive Wayback Machine for archival.

Installation

pip install scrapy-wayback-middleware

Setup

Add scrapy_wayback_middleware.WaybackMiddleware to your project's SPIDER_MIDDLEWARES setting.
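
As a minimal sketch of that step: in Scrapy, SPIDER_MIDDLEWARES is a dict that maps a middleware's import path to an integer order, so the entry in your project's settings.py might look like the following. The order value 543 is an assumption chosen for illustration, not a value this project prescribes; pick one that fits alongside your other spider middlewares.

```python
# settings.py (sketch)
# Keys are import paths to middleware classes; values set the order in
# which Scrapy applies them. 543 is an arbitrary, assumed order value.
SPIDER_MIDDLEWARES = {
    "scrapy_wayback_middleware.WaybackMiddleware": 543,
}
```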

Configuration

To customize behavior, subclass WaybackMiddleware and override either of two methods: get_item_urls, to pull additional links to archive from individual items, or handle_wayback, to change how responses from the Wayback Machine are handled.
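
A rough sketch of such a subclass is shown here. The method signatures (get_item_urls taking a single item and returning a list of URLs, handle_wayback taking a single response) are assumptions, since this description does not spell them out; check the package source before relying on them. The "links" item field and the CustomWaybackMiddleware name are hypothetical, and the base class is a stand-in so the sketch runs without Scrapy installed.

```python
# Stand-in base class so this sketch runs on its own. In a real project,
# replace it with: from scrapy_wayback_middleware import WaybackMiddleware
class WaybackMiddleware:
    def get_item_urls(self, item):  # assumed signature
        return []

    def handle_wayback(self, response):  # assumed signature
        pass


class CustomWaybackMiddleware(WaybackMiddleware):
    """Archive links found on scraped items and report Wayback responses."""

    def get_item_urls(self, item):
        # Assumes items are dict-like with a hypothetical "links" field
        # holding a list of URL strings; non-HTTP links are skipped.
        return [url for url in item.get("links", []) if url.startswith("http")]

    def handle_wayback(self, response):
        # Assumes `response` exposes `status` and `url`, as Scrapy
        # Response objects do.
        print(f"Wayback Machine returned {response.status} for {response.url}")
```

To use a subclass like this, list its import path in SPIDER_MIDDLEWARES in place of scrapy_wayback_middleware.WaybackMiddleware.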


Download files

Files for scrapy-wayback-middleware, version 0.0.2:

Filename                                            Size     File type   Python version
scrapy_wayback_middleware-0.0.2-py3-none-any.whl    4.2 kB   Wheel       py3
scrapy-wayback-middleware-0.0.2.tar.gz              2.6 kB   Source      None
