A webarchive extension for Scrapy
Project description
Scrapy Webarchive
Scrapy Webarchive is a plugin for Scrapy that allows users to capture and export web archives in the WARC and WACZ formats during crawling.
Features
- Save web crawls in WACZ format, with support for multiple storage backends (local and cloud).
- Crawl against WACZ format archives.
- Integrate seamlessly with Scrapy’s spider request and response cycle.
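Because a WACZ file is a ZIP container, an exported archive can be inspected with the Python standard library alone. The sketch below does not use Scrapy Webarchive's own API. The function names `list_wacz_members` and `has_datapackage` and the file name `example.wacz` are hypothetical, and the `datapackage.json` manifest mentioned in the comments comes from the WACZ specification rather than from this project description.

```python
import zipfile


def list_wacz_members(path):
    """Return the sorted member names of a WACZ file (a ZIP container)."""
    with zipfile.ZipFile(path) as wacz:
        return sorted(wacz.namelist())


def has_datapackage(path):
    """Check for datapackage.json, the manifest the WACZ spec places at the archive root."""
    return "datapackage.json" in list_wacz_members(path)
```

Calling `list_wacz_members("example.wacz")` on an archive written during a crawl would be a quick way to confirm that the export produced a manifest alongside the WARC data.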
Compatibility
- Python 3.7, 3.8, 3.9, 3.10, 3.11 and 3.12
Documentation
Documentation is available online at developers.thequestionmark.org/scrapy-webarchive/
Download files
Source Distribution
scrapy_webarchive-0.1.0.tar.gz (21.2 kB)
Built Distribution
File details
Details for the file scrapy_webarchive-0.1.0.tar.gz.
File metadata
- Download URL: scrapy_webarchive-0.1.0.tar.gz
- Upload date:
- Size: 21.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 66168565100999fa29d50f796c8d77048195870fadb9a135a5efdffa9bc6146d |
| MD5 | 5957c623a6e2eb5fe8a16ac859925145 |
| BLAKE2b-256 | 0b84164268567d91715d4dabd56c72a05ea6f8c1d5e5eb2487096fd941cd2bb2 |
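A downloaded file can be checked against the published SHA256 digest before installing it. The following is a minimal sketch using only `hashlib` from the standard library; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the local path passed in is whatever location the file was saved to.

```python
import hashlib

# SHA256 digest published for scrapy_webarchive-0.1.0.tar.gz.
EXPECTED_SHA256 = "66168565100999fa29d50f796c8d77048195870fadb9a135a5efdffa9bc6146d"


def sha256_of(path, chunk_size=65536):
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash("scrapy_webarchive-0.1.0.tar.gz")` should return `True` only for an unmodified copy of the source distribution. The same helper works for the wheel if its own digest is passed as `expected`.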
File details
Details for the file scrapy_webarchive-0.1.0-py3-none-any.whl.
File metadata
- Download URL: scrapy_webarchive-0.1.0-py3-none-any.whl
- Upload date:
- Size: 20.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d8d45d2bbe4496e7a48a19ae73a78687b5d7ce2f21e6009e0c70b73224d703ef |
| MD5 | 1ce705332aa91670a1f9da0218e1d162 |
| BLAKE2b-256 | af84b1ee56077c90e78c6b46002e60264f3f3aa55fc0d62dd1ad51996f2128d9 |