A webarchive extension for Scrapy

Project description

Scrapy Webarchive

Scrapy Webarchive is a plugin for Scrapy that allows users to capture and export web archives in the WARC and WACZ formats during crawling.

Features

  • Save web crawls in WACZ format (multiple storage backends supported, both local and cloud).
  • Crawl against WACZ format archives.
  • Integrate seamlessly with Scrapy’s spider request and response cycle.
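
As background for the features above: a WACZ file is a ZIP package that bundles WARC data together with indexes, page lists and a datapackage.json manifest. The following is a minimal sketch, using only the Python standard library, of how such an archive could be inspected. It does not use the scrapy-webarchive API; the function name summarize_wacz and the file name in the usage comment are hypothetical, chosen only for illustration.

```python
import zipfile
from collections import defaultdict


def summarize_wacz(path):
    """Group the member names of a WACZ (ZIP) file by top-level directory.

    `path` may be a file system path or a binary file-like object.
    Hypothetical helper for illustration; not part of scrapy-webarchive.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                # Skip explicit directory entries, keep only files.
                continue
            top, sep, _rest = name.partition("/")
            key = top + "/" if sep else "(root)"
            groups[key].append(name)
    return dict(groups)


# Usage (assumes an archive named "example.wacz" exists locally):
# for directory, members in summarize_wacz("example.wacz").items():
#     print(directory, len(members))
```

In a typical WACZ file one would expect to see datapackage.json at the root and the WARC files under archive/, but the exact layout of archives produced by this plugin is best confirmed against its documentation.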

Compatibility

  • Python 3.7, 3.8, 3.9, 3.10, 3.11 and 3.12

Documentation

Documentation is available online at developers.thequestionmark.org/scrapy-webarchive/

Download files

Source Distribution

scrapy_webarchive-0.4.1.tar.gz (26.0 kB)

Built Distribution

scrapy_webarchive-0.4.1-py3-none-any.whl (29.7 kB, Python 3)

File details

Details for the file scrapy_webarchive-0.4.1.tar.gz.

File metadata

  • Download URL: scrapy_webarchive-0.4.1.tar.gz
  • Upload date:
  • Size: 26.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.16

File hashes

Hashes for scrapy_webarchive-0.4.1.tar.gz

Algorithm    Hash digest
SHA256       a3cb404041235332499346853c115b21df9ada911263806ebe490781d7f003cc
MD5          45bdfe9d262cd2f02ff3bd5ced175389
BLAKE2b-256  4fee47dd00956d1854d52683cf6d4c5326bedfa8b7994241d07fae7894533c57
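
The digests listed above can be checked against a downloaded copy of the source distribution. The sketch below uses only Python's hashlib. The helper names file_digests and matches_published are hypothetical, and the usage comment assumes the file was saved under its original name in the current directory.

```python
import hashlib

# Digests published for scrapy_webarchive-0.4.1.tar.gz (copied from the listing above).
PUBLISHED_SDIST = {
    "SHA256": "a3cb404041235332499346853c115b21df9ada911263806ebe490781d7f003cc",
    "MD5": "45bdfe9d262cd2f02ff3bd5ced175389",
    "BLAKE2b-256": "4fee47dd00956d1854d52683cf6d4c5326bedfa8b7994241d07fae7894533c57",
}


def file_digests(path, chunk_size=65536):
    """Return SHA256, MD5 and BLAKE2b-256 hex digests of the file at `path`."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published(path, published):
    """Return True if every published digest matches the file's digest."""
    actual = file_digests(path)
    return all(actual[name] == digest.lower() for name, digest in published.items())


# Usage (assumes the sdist was downloaded to the current directory):
# print(matches_published("scrapy_webarchive-0.4.1.tar.gz", PUBLISHED_SDIST))
```

SHA256 is the digest that matters for integrity checking; MD5 is included only because it appears in the listing.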

File details

Details for the file scrapy_webarchive-0.4.1-py3-none-any.whl.

File hashes

Hashes for scrapy_webarchive-0.4.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       f49b58e5ce6f3287883016c08a3d404c74a4c851681553a55e7ff18a595536d8
MD5          c7e0ede1c6569b48f1337ae99beb6d5e
BLAKE2b-256  6664833fadcc1847380ff7bddc9e7be51d198cc70285fd51266a83f7d907db63
