Python library to work with ARC and WARC files

Project description

Note: This is a fork of the original warc repository, which is no longer maintained.

WARC (Web ARChive) is a file format for storing web crawls.

http://bibnum.bnf.fr/WARC/

The warc library makes it easy to read WARC files:

import warc
with warc.open("test.warc") as f:
    for record in f:
        # warcinfo records have no WARC-Target-URI, so look it up with .get()
        print(record.header.get('WARC-Target-URI'), record['Content-Length'])

It can also read WET files:

import warc
with warc.open("test.warc.wet") as f:
    for record in f:
        # the leading warcinfo record has no WARC-Target-URI, so use .get()
        print(record.header.get('WARC-Target-URI'), record['Content-Length'])
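Both examples rely on the library to split a file into records. For readers curious about what that involves, here is a minimal sketch using only the Python standard library; it is not the library's implementation. It assumes uncompressed, well-formed records laid out as in the WARC specification: a version line such as WARC/1.0, "Name: value" header lines, a blank line, exactly Content-Length bytes of content, then two CRLFs. WET files (as produced by Common Crawl) use the same layout, with the extracted text stored in records of type conversion, so the same reader applies. The function names iter_warc_records and read_record_at are hypothetical. Remembering each record's byte offset also shows the idea behind seeking: jumping straight back to a record without rescanning the file.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple

# (byte offset of the record, lower-cased headers, raw content block)
Record = Tuple[int, Dict[str, str], bytes]


def iter_warc_records(f: BinaryIO) -> Iterator[Record]:
    """Yield (offset, headers, block) for each record of an uncompressed WARC/WET stream.

    Hypothetical sketch: assumes well-formed records and lower-cases header
    names so that lookups are case-insensitive.
    """
    while True:
        offset = f.tell()  # byte position where the next line starts
        line = f.readline()
        if not line:
            return  # end of stream
        if line in (b"\r\n", b"\n"):
            continue  # skip the CRLF CRLF that terminates each record
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line at offset {offset}")
        headers: Dict[str, str] = {}
        while True:
            raw = f.readline()
            if raw in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = raw.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        # The record content is exactly Content-Length bytes long.
        block = f.read(int(headers.get("content-length", "0")))
        yield offset, headers, block


def read_record_at(f: BinaryIO, offset: int) -> Record:
    """Hypothetical helper: seek to a remembered offset and parse the record there."""
    f.seek(offset)
    return next(iter_warc_records(f))


# A tiny WET-style stream: a warcinfo record followed by one conversion record.
SAMPLE = (
    b"WARC/1.0\r\n"
    b"WARC-Type: warcinfo\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
    b"WARC/1.0\r\n"
    b"WARC-Type: conversion\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 11\r\n"
    b"\r\n"
    b"Hello world"
    b"\r\n\r\n"
)

if __name__ == "__main__":
    stream = io.BytesIO(SAMPLE)
    offsets = []
    for offset, headers, block in iter_warc_records(stream):
        # The warcinfo record has no target URI, so .get() returns None for it.
        print(headers["warc-type"], headers.get("warc-target-uri"), headers["content-length"])
        offsets.append(offset)
    # Jump straight back to the second record without rescanning the first.
    _, headers, block = read_record_at(stream, offsets[1])
    print(block.decode("utf-8"))
```

Lower-casing header names mirrors the case-insensitive header lookup in the examples above; gzip-compressed files would need to be decompressed first (for example with Python's gzip module) before a reader like this applies.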

Documentation

The documentation of the warc library is available at http://warc.readthedocs.org/.

The interface described there is unchanged; only its installation instructions do not apply, because the pip install command given there does not install this warc3 version.

License

This software is licensed under GPL v2. See LICENSE file for details.

Authors

Original Python 2 version:

  • Anand Chitipothu

  • Noufal Ibrahim

Python 3 port:

  • Ryan Chartier

  • Jan Pieter Bruins Slot

  • Almer S. Tigelaar

  • Sean MacAvaney (Python 3.10 compatibility)

WET and seek support:

  • Willian Zhang

Change Log

0.2.4 Python 3.10 compatibility (thanks to @seanmacavaney)

0.2.3 Support seeking in WARC/WET

0.2.2 Support parsing WET files

Older versions: see https://github.com/internetarchive/warc

Download files

Source distribution

warc3_wet-0.2.5.tar.gz (17.9 kB)

Built distribution

warc3_wet-0.2.5-py3-none-any.whl (18.7 kB, Python 3)

File details

Details for the file warc3_wet-0.2.5.tar.gz.

File metadata

  • Download URL: warc3_wet-0.2.5.tar.gz
  • Upload date:
  • Size: 17.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for warc3_wet-0.2.5.tar.gz
  • SHA256: 15e50402dabaa1e95307f1e2a6169cfd5f137b70761d9f0b16a10aa6de227970
  • MD5: 34564d8e7c0bc43b3763a5986753ff4a
  • BLAKE2b-256: 21c624c9b4a2b2b1741b57d7f44ff9790eb4ef28de898c17c2b1ca1efabf8c96

File details

Details for the file warc3_wet-0.2.5-py3-none-any.whl.

File metadata

  • Download URL: warc3_wet-0.2.5-py3-none-any.whl
  • Upload date:
  • Size: 18.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for warc3_wet-0.2.5-py3-none-any.whl
  • SHA256: 5a9a525383fb1af159734baa75f349a7c4ec7bccd1b938681b5748515d2bf624
  • MD5: 77424fa299f20991f70f8c4a6ed4737f
  • BLAKE2b-256: f4990a5582a106679fd9439af51631b6c6cb627fd96cbc85a02927e6812605b8
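
The SHA256 digests listed for the two distribution files can be checked after downloading. The following is a minimal standard-library sketch; the helper names sha256_of_file and verify_download are hypothetical, and it assumes each downloaded file keeps its original filename.

```python
import hashlib
import os

# SHA256 digests copied from the file details listed for this release.
EXPECTED_SHA256 = {
    "warc3_wet-0.2.5.tar.gz": "15e50402dabaa1e95307f1e2a6169cfd5f137b70761d9f0b16a10aa6de227970",
    "warc3_wet-0.2.5-py3-none-any.whl": "5a9a525383fb1af159734baa75f349a7c4ec7bccd1b938681b5748515d2bf624",
}


def sha256_of_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str) -> bool:
    """Hypothetical helper: compare a download against the digest listed for its filename.

    Raises KeyError if the filename is not one of the two listed files.
    """
    expected = EXPECTED_SHA256[os.path.basename(path)]
    return sha256_of_file(path) == expected
```

For example, verify_download("warc3_wet-0.2.5.tar.gz") returns True only if the local file matches the listed digest. pip can also enforce digests itself through its hash-checking mode, using --hash=sha256:... entries in a requirements file.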
