Scrapy middleware for submitting URLs to the Internet Archive Wayback Machine
Scrapy Wayback Middleware
Middleware for submitting all scraped response URLs to the Internet Archive Wayback Machine for archival.
Installation
pip install scrapy-wayback-middleware
Setup
Add scrapy_wayback_middleware.WaybackMiddleware to your project's SPIDER_MIDDLEWARES setting.
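As a minimal sketch, the entry in a project's settings.py might look like the following. The order value 543 is an assumption chosen for illustration, not a value documented by this package; pick whatever position suits your other spider middlewares.

```python
# settings.py (sketch)
# SPIDER_MIDDLEWARES maps a middleware's import path to its order value.
SPIDER_MIDDLEWARES = {
    # 543 is an arbitrary, assumed order value; adjust it for your project.
    "scrapy_wayback_middleware.WaybackMiddleware": 543,
}
```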
Configuration
To customize behavior, subclass WaybackMiddleware and override get_item_urls to pull additional links to archive from individual items, or handle_wayback to change how responses from the Wayback Machine are handled.
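The subclassing approach described above could be sketched roughly as follows. The method signatures (get_item_urls taking a single item and returning a list of URLs, handle_wayback taking a single response) are assumptions and should be checked against the installed version. The class name DocumentArchiveMiddleware and the item fields "documents" and "url" are hypothetical, and the fallback base class exists only so the sketch runs without the package installed.

```python
import logging

try:
    from scrapy_wayback_middleware import WaybackMiddleware
except ImportError:
    # Stand-in base class so this sketch runs without the package installed.
    # Its method signatures are assumptions, not the package's confirmed API.
    class WaybackMiddleware:
        def get_item_urls(self, item):
            return []

        def handle_wayback(self, response):
            return None


logger = logging.getLogger(__name__)


class DocumentArchiveMiddleware(WaybackMiddleware):
    """Hypothetical subclass that also archives links stored on items."""

    def get_item_urls(self, item):
        # Assumed signature: one scraped item in, a list of URLs out.
        # "documents" and "url" are hypothetical item fields.
        documents = item.get("documents", [])
        return [doc["url"] for doc in documents if doc.get("url")]

    def handle_wayback(self, response):
        # Assumed signature: receives the Wayback Machine's response.
        # This sketch only logs the status instead of acting on it.
        status = getattr(response, "status", None)
        logger.info("Wayback Machine responded with status %s", status)
        return None
```

To use a subclass like this, list its import path (for example a hypothetical myproject.middlewares.DocumentArchiveMiddleware) in SPIDER_MIDDLEWARES in place of the stock middleware.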
Hashes for scrapy-wayback-middleware-0.0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 52cfd62305eac1eede931dc922ecd89dc1f3dfd3ab5e40607cc0708c6ef9a4ce
MD5 | 1bb3120d5dbb0358ebe143bc604dc6a3
BLAKE2b-256 | 87e87275f89556ff3510e153776174d8ab9bcdc6ebf6ab2f0323bb208131eca9
Hashes for scrapy_wayback_middleware-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e0722278dad1cdae3e3164d9ef24aa45eaacf4bed581b39709663a744a766dc5
MD5 | f13ba7e4c7a25b2e994a98ee6c431ecb
BLAKE2b-256 | 1d35e4d78b1c23579e382897a6ff53fd2e7665f4c3aa411bd78f1eaf115155f7