Skip to main content

Scrapy middleware for submitting URLs to the Internet Archive Wayback Machine

Project description

Scrapy Wayback Middleware

Middleware for submitting all scraped response URLs to the Internet Archive Wayback Machine for archival.

Installation

pip install scrapy-wayback-middleware

Setup

Add scrapy_wayback_middleware.WaybackMiddleware to your project's SPIDER_MIDDLEWARES settings.

Configuration

To configure custom behavior for certain methods, subclass WaybackMiddleware and override the get_item_urls method to pull additional links to archive from individual items or handle_wayback to change how responses from the Wayback Machine are handled.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scrapy-wayback-middleware-0.0.1.tar.gz (2.6 kB view hashes)

Uploaded Source

Built Distribution

scrapy_wayback_middleware-0.0.1-py3-none-any.whl (4.2 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page