FastWARC

FastWARC is a high-performance WARC parsing library for Python written in C++/Cython. The API is inspired in large part by WARCIO, but FastWARC does not aim to be a drop-in replacement. FastWARC supports compressed and uncompressed WARC/1.0 and WARC/1.1 streams. Supported compression algorithms are GZip and LZ4.

FastWARC belongs to the ChatNoir Resiliparse toolkit for fast and robust web data processing.

Installing FastWARC

Pre-built FastWARC binaries for most Linux platforms can be installed from PyPI:

pip install fastwarc

Note, however, that the Linux binaries are provided solely for your convenience. Because they are built on the very old manylinux base system for better compatibility, their performance isn't optimal (though still better than WARCIO). For best performance, build FastWARC yourself as described in the next section.

Building FastWARC From Source

You can compile FastWARC either from the PyPI source package or directly from this repository. In either case, you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:

sudo apt install build-essential python3-dev zlib1g-dev liblz4-dev

To build and install FastWARC from PyPI, run:

pip install --no-binary fastwarc fastwarc

That's it. If you prefer to build and install directly from this repository instead, run:

pip install -e fastwarc

To build the wheels without installing them, run:

pip wheel -e fastwarc

# Or:
pip install build && python -m build --wheel fastwarc
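Whichever installation route you choose, it can be useful to confirm that the package actually ended up in the active environment. The following is a minimal sketch using only the Python standard library (Python 3.8+); the helper name `fastwarc_status` is hypothetical, and the distribution name `FastWARC` is taken from the package file names on PyPI.

```python
import importlib.util
from importlib import metadata


def fastwarc_status() -> str:
    """Describe whether FastWARC is importable in the current environment."""
    # find_spec() locates the module without importing it.
    if importlib.util.find_spec("fastwarc") is None:
        return "FastWARC is not installed"
    try:
        # Look up the version recorded in the installed distribution metadata.
        return "FastWARC " + metadata.version("FastWARC")
    except metadata.PackageNotFoundError:
        return "FastWARC is importable, but no distribution metadata was found"


if __name__ == "__main__":
    print(fastwarc_status())
```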

Usage Instructions

For detailed usage instructions, please consult the FastWARC User Manual.
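As a quick orientation, here is a minimal sketch of reading records from a GZip-compressed WARC stream. It assumes that `ArchiveIterator` is importable from `fastwarc.warc`, that it accepts a Python file-like object and detects the compression automatically, and that records expose `headers` and `reader`; please verify these against the User Manual for your installed version. The helper names and the sample record are hypothetical and exist only to keep the example self-contained.

```python
import gzip
import io


def build_sample_warc() -> bytes:
    """Build one GZip-compressed WARC/1.1 'resource' record (made-up sample data)."""
    payload = b"hello warc"
    headers = (
        b"WARC/1.1\r\n"
        b"WARC-Type: resource\r\n"
        b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000001>\r\n"
        b"WARC-Date: 2021-01-01T00:00:00Z\r\n"
        b"WARC-Target-URI: http://example.com/\r\n"
        b"Content-Type: text/plain\r\n"
        b"Content-Length: " + str(len(payload)).encode("ascii") + b"\r\n"
        b"\r\n"
    )
    # A WARC record is terminated by two CRLF pairs after the payload.
    return gzip.compress(headers + payload + b"\r\n\r\n")


def read_records(data: bytes):
    """Return (WARC-Type, payload) pairs for every record in the stream."""
    # Assumption: this import path and the record attributes below match
    # the installed FastWARC version.
    from fastwarc.warc import ArchiveIterator

    records = []
    for record in ArchiveIterator(io.BytesIO(data)):
        records.append((record.headers["WARC-Type"], record.reader.read()))
    return records


if __name__ == "__main__":
    sample = build_sample_warc()
    try:
        for warc_type, body in read_records(sample):
            print(warc_type, body)
    except ImportError:
        print("FastWARC is not installed; run `pip install fastwarc` first.")
```

For real archives, you would pass an opened file (for example `open(path, "rb")`) instead of the in-memory sample.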
