A webarchive extension for Scrapy
Scrapy Webarchive
Scrapy Webarchive is a plugin for Scrapy that allows users to capture and export web archives in the WARC and WACZ formats during crawling.
Features
- Save web crawls in WACZ format, with support for multiple storage backends (local and cloud).
- Crawl against WACZ format archives.
- Integrate seamlessly with Scrapy’s spider request and response cycle.
Compatibility
- Python 3.7, 3.8, 3.9, 3.10, 3.11 and 3.12
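As a quick way to confirm that an interpreter falls within the versions listed above, a minimal sketch follows. The names `LISTED_VERSIONS` and `is_listed` are hypothetical helpers introduced here for illustration and are not part of the package; the check says nothing about versions outside the list beyond the fact that they are not mentioned.

```python
import sys

# (major, minor) pairs taken from the Compatibility list above.
# Hypothetical helper data, not something the package exposes.
LISTED_VERSIONS = {(3, minor) for minor in range(7, 13)}


def is_listed(version_info=sys.version_info):
    """Return True if the (major, minor) pair appears in the listed versions."""
    return (version_info[0], version_info[1]) in LISTED_VERSIONS


if __name__ == "__main__":
    status = "listed" if is_listed() else "not listed"
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is {status}")
```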
Documentation
Documentation is available online at developers.thequestionmark.org/scrapy-webarchive/
Download files
Source Distribution
scrapy_webarchive-0.2.0.tar.gz (21.1 kB)
Built Distribution
scrapy_webarchive-0.2.0-py3-none-any.whl (20.6 kB)
File details
Details for the file scrapy_webarchive-0.2.0.tar.gz.
File metadata
- Download URL: scrapy_webarchive-0.2.0.tar.gz
- Upload date:
- Size: 21.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 308978ae59a6d473a22ddcb5a7cf4b290fed2917c0e29b278fafcd3b64f4584f |
| MD5 | 19dfaf38a700dbb730d9ab3bfd024ba2 |
| BLAKE2b-256 | 824926eb82e5884e1fcab1ac01cb1054ea2d7ea038af85a0ef758dfaa0e90db1 |
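The digests above can be used to check a downloaded file before installing it. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches` are hypothetical, and the file path in the usage note assumes the source distribution was saved to the current directory.

```python
import hashlib
import hmac

# SHA256 digest listed above for scrapy_webarchive-0.2.0.tar.gz.
EXPECTED_SHA256 = "308978ae59a6d473a22ddcb5a7cf4b290fed2917c0e29b278fafcd3b64f4584f"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

For example, `matches("scrapy_webarchive-0.2.0.tar.gz", EXPECTED_SHA256)` should return `True` for an unmodified download. The same approach applies to the wheel by substituting its file name and its listed SHA256 digest.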
File details
Details for the file scrapy_webarchive-0.2.0-py3-none-any.whl.
File metadata
- Download URL: scrapy_webarchive-0.2.0-py3-none-any.whl
- Upload date:
- Size: 20.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0337a3f4cb7eed8e55af36fab041cce05f57469126002b50eeaf572eb094b50a |
| MD5 | bccd6f066de4a795038e7fb1af7f8573 |
| BLAKE2b-256 | 1d2f7ffec455c284ef80dd5f056efc0d1d521d9e4e342877e969e60583a0259d |
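The wheel file name encodes the interpreter, ABI, and platform tags, which is why this build is tagged "Python 3" and installs on any platform. A minimal sketch of splitting such a name into its parts is shown below; `parse_wheel_name` is a hypothetical helper written for illustration, and it assumes the standard `name-version[-build]-python-abi-platform.whl` layout.

```python
def parse_wheel_name(filename):
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # An optional build tag sits between the version and the Python tag.
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel file name layout: {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }
```

Applied to `scrapy_webarchive-0.2.0-py3-none-any.whl`, this yields the Python tag `py3`, the ABI tag `none`, and the platform tag `any`.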