A webarchive extension for Scrapy

Scrapy Webarchive

Scrapy Webarchive is a plugin for Scrapy that allows users to capture and export web archives in the WARC and WACZ formats during crawling.

Features

  • Save web crawls in WACZ format, with support for multiple storage backends (local and cloud).
  • Crawl against WACZ format archives.
  • Integrate seamlessly with Scrapy’s spider request and response cycle.

Compatibility

  • Python 3.7, 3.8, 3.9, 3.10, 3.11 and 3.12

Documentation

Documentation is available online at developers.thequestionmark.org/scrapy-webarchive/

Download files

Source Distribution

scrapy_webarchive-0.1.0.tar.gz (21.2 kB)

Uploaded Source

Built Distribution

scrapy_webarchive-0.1.0-py3-none-any.whl (20.6 kB)

Uploaded Python 3

File details

Details for the file scrapy_webarchive-0.1.0.tar.gz.

File metadata

  • Download URL: scrapy_webarchive-0.1.0.tar.gz
  • Size: 21.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.8

File hashes

Hashes for scrapy_webarchive-0.1.0.tar.gz
Algorithm Hash digest
SHA256 66168565100999fa29d50f796c8d77048195870fadb9a135a5efdffa9bc6146d
MD5 5957c623a6e2eb5fe8a16ac859925145
BLAKE2b-256 0b84164268567d91715d4dabd56c72a05ea6f8c1d5e5eb2487096fd941cd2bb2
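
The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical and not part of Scrapy Webarchive, and the local file path is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for scrapy_webarchive-0.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "66168565100999fa29d50f796c8d77048195870fadb9a135a5efdffa9bc6146d"
)


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed location: the sdist sits in the current working directory.
    archive = Path("scrapy_webarchive-0.1.0.tar.gz")
    if archive.exists():
        print(matches_published_hash(archive, EXPECTED_SDIST_SHA256))
    else:
        print(f"{archive} not found; download it first")
```

The same comparison works for the wheel by substituting its file name and the SHA256 digest listed for it.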

File details

Details for the file scrapy_webarchive-0.1.0-py3-none-any.whl.

File hashes

Hashes for scrapy_webarchive-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d8d45d2bbe4496e7a48a19ae73a78687b5d7ce2f21e6009e0c70b73224d703ef
MD5 1ce705332aa91670a1f9da0218e1d162
BLAKE2b-256 af84b1ee56077c90e78c6b46002e60264f3f3aa55fc0d62dd1ad51996f2128d9
