A webarchive extension for Scrapy
Scrapy Webarchive
Scrapy Webarchive is a plugin for Scrapy that allows users to capture and export web archives in the WARC and WACZ formats during crawling.
Features
- Save web crawls in WACZ format (multiple storage backends supported, both local and cloud).
- Crawl against WACZ format archives.
- Integrate seamlessly with Scrapy’s spider request and response cycle.
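Because a WACZ file is a ZIP package (per the WACZ specification, it carries a `datapackage.json` manifest alongside directories such as `archive/` for the WARC data), an exported archive can be inspected with the Python standard library alone. The sketch below is only an illustration and does not use scrapy-webarchive's own API; the function name `summarize_wacz` and the path `example.wacz` are hypothetical.

```python
import json
import os
import zipfile


def summarize_wacz(source):
    """Return (member_names, manifest) for a WACZ archive.

    `source` may be a path or a binary file-like object.
    `manifest` is the parsed datapackage.json, or None if the archive has none.
    """
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        manifest = None
        if "datapackage.json" in names:
            manifest = json.loads(archive.read("datapackage.json"))
    return names, manifest


if __name__ == "__main__":
    # Hypothetical file name: assumes a crawl was already exported to this path.
    wacz_path = "example.wacz"
    if os.path.exists(wacz_path):
        members, manifest = summarize_wacz(wacz_path)
        for name in members:
            print(name)
        print("manifest present:", manifest is not None)
    else:
        print(f"{wacz_path} not found; export a crawl first")
```

Listing the members this way is a quick sanity check that an export produced WARC data and a manifest before pointing a crawl at the archive.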
Compatibility
- Python 3.7, 3.8, 3.9, 3.10, 3.11 and 3.12
Documentation
Documentation is available online at developers.thequestionmark.org/scrapy-webarchive/
Download files
Source Distribution
scrapy_webarchive-0.4.1.tar.gz (26.0 kB)
Built Distribution
scrapy_webarchive-0.4.1-py3-none-any.whl (29.7 kB)
File details
Details for the file scrapy_webarchive-0.4.1.tar.gz.
File metadata
- Download URL: scrapy_webarchive-0.4.1.tar.gz
- Upload date:
- Size: 26.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.7.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a3cb404041235332499346853c115b21df9ada911263806ebe490781d7f003cc |
| MD5 | 45bdfe9d262cd2f02ff3bd5ced175389 |
| BLAKE2b-256 | 4fee47dd00956d1854d52683cf6d4c5326bedfa8b7994241d07fae7894533c57 |
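A downloaded file can be checked against the published digests before installing it. The following is a minimal sketch using only Python's standard `hashlib`; the helper names `sha256_of` and `matches_expected` are hypothetical, and the script assumes the source distribution was saved to the current directory under its original file name.

```python
import hashlib
from pathlib import Path

# Digest copied from the SHA256 row for scrapy_webarchive-0.4.1.tar.gz.
EXPECTED_SHA256 = "a3cb404041235332499346853c115b21df9ada911263806ebe490781d7f003cc"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True when the file's digest equals `expected` (compared case-insensitively)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location: the current working directory.
    archive = Path("scrapy_webarchive-0.4.1.tar.gz")
    if archive.exists():
        print("OK" if matches_expected(archive) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same check works for the wheel by passing its path and its own SHA256 digest as `expected`.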
File details
Details for the file scrapy_webarchive-0.4.1-py3-none-any.whl.
File metadata
- Download URL: scrapy_webarchive-0.4.1-py3-none-any.whl
- Upload date:
- Size: 29.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.7.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f49b58e5ce6f3287883016c08a3d404c74a4c851681553a55e7ff18a595536d8 |
| MD5 | c7e0ede1c6569b48f1337ae99beb6d5e |
| BLAKE2b-256 | 6664833fadcc1847380ff7bddc9e7be51d198cc70285fd51266a83f7d907db63 |