FastWARC

FastWARC is a high-performance WARC parsing library for Python written in C++/Cython. The API is largely inspired by WARCIO, but FastWARC does not aim to be a drop-in replacement. FastWARC supports compressed and uncompressed WARC/1.0 and WARC/1.1 streams. The supported compression algorithms are GZip and LZ4.
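To make the supported stream types concrete, the following minimal sketch classifies a binary stream by its leading bytes. It is an illustration only and uses no FastWARC code: the function name sniff_warc_stream and the returned labels are hypothetical, and the check relies on the standard gzip magic bytes, the LZ4 frame magic number, and the version line that opens an uncompressed WARC record. How FastWARC itself detects the stream type is not described here.

```python
import gzip
import io

# Well-known leading bytes of each container format.
GZIP_MAGIC = b"\x1f\x8b"
LZ4_FRAME_MAGIC = b"\x04\x22\x4d\x18"  # 0x184D2204, stored little-endian
WARC_VERSIONS = (b"WARC/1.0", b"WARC/1.1")


def sniff_warc_stream(stream):
    """Classify a binary stream as 'gzip', 'lz4', 'uncompressed', or 'unknown'.

    Hypothetical helper for illustration; not part of the FastWARC API.
    """
    head = stream.read(8)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(LZ4_FRAME_MAGIC):
        return "lz4"
    if head.startswith(WARC_VERSIONS):
        return "uncompressed"
    return "unknown"


# A tiny hand-written WARC/1.1 record with an empty body.
record = b"WARC/1.1\r\nWARC-Type: warcinfo\r\nContent-Length: 0\r\n\r\n\r\n\r\n"

print(sniff_warc_stream(io.BytesIO(record)))
print(sniff_warc_stream(io.BytesIO(gzip.compress(record))))
```

The sketch consumes the first eight bytes of the stream, so a real caller would need to seek back or buffer them before handing the stream to a parser.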

FastWARC belongs to the ChatNoir Resiliparse toolkit for fast and robust web data processing.

Installing FastWARC

Pre-built FastWARC binaries for most Linux platforms can be installed from PyPI:

pip install fastwarc

Note, however, that the Linux binaries are provided solely for your convenience. Because they are built on the very old manylinux base system for better compatibility, their performance is not optimal (though still better than WARCIO's). For best performance, build FastWARC yourself as described in the next section.
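After installing, it can be helpful to confirm which version pip actually put into the current environment. The sketch below uses only the standard library (importlib.metadata, available from Python 3.8); the helper name installed_version is hypothetical, and the distribution name "fastwarc" is taken from the pip command above.

```python
from importlib import metadata


def installed_version(dist_name="fastwarc"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("FastWARC is not installed in this environment")
else:
    print(f"FastWARC {version} is installed")
```

This reports only the installed version; it does not reveal whether the package came from a pre-built wheel or was compiled locally.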

Building FastWARC From Source

You can compile FastWARC either from the PyPI source package or directly from this repository. In either case, you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:

sudo apt install build-essential python3-dev zlib1g-dev liblz4-dev

To build and install FastWARC from PyPI, run:

pip install --no-binary fastwarc fastwarc

That's it. If you prefer to build and install directly from this repository instead, run:

pip install -e fastwarc

To build the wheels without installing them, run:

pip wheel -e fastwarc

# Or:
pip install build && python -m build --wheel fastwarc

Usage Instructions

For detailed usage instructions, please consult the FastWARC User Manual.

Download files

Download the file for your platform.

Source Distribution

  • FastWARC-0.6.0.tar.gz (333.5 kB): Source

Built Distributions

  • FastWARC-0.6.0-cp310-cp310-win_amd64.whl (226.1 kB): CPython 3.10, Windows x86-64
  • FastWARC-0.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
  • FastWARC-0.6.0-cp310-cp310-macosx_10_14_x86_64.whl (369.6 kB): CPython 3.10, macOS 10.14+ x86-64
  • FastWARC-0.6.0-cp39-cp39-win_amd64.whl (225.0 kB): CPython 3.9, Windows x86-64
  • FastWARC-0.6.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64
  • FastWARC-0.6.0-cp39-cp39-macosx_10_14_x86_64.whl (369.2 kB): CPython 3.9, macOS 10.14+ x86-64
  • FastWARC-0.6.0-cp38-cp38-win_amd64.whl (226.6 kB): CPython 3.8, Windows x86-64
  • FastWARC-0.6.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB): CPython 3.8, manylinux: glibc 2.17+ x86-64
  • FastWARC-0.6.0-cp38-cp38-macosx_10_14_x86_64.whl (368.5 kB): CPython 3.8, macOS 10.14+ x86-64
  • FastWARC-0.6.0-cp37-cp37m-win_amd64.whl (220.5 kB): CPython 3.7m, Windows x86-64
  • FastWARC-0.6.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB): CPython 3.7m, manylinux: glibc 2.17+ x86-64
  • FastWARC-0.6.0-cp37-cp37m-macosx_10_14_x86_64.whl (367.1 kB): CPython 3.7m, macOS 10.14+ x86-64

File details

Details for the file FastWARC-0.6.0.tar.gz.

File metadata

  • Download URL: FastWARC-0.6.0.tar.gz
  • Upload date:
  • Size: 333.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for FastWARC-0.6.0.tar.gz
Algorithm Hash digest
SHA256 c1497683002bdde2e5d2789511aa834e10769cad52bce64d6bf905bc33cc5184
MD5 5e0560d03420a875ce987f6abb5ea81c
BLAKE2b-256 d2124dac989a5a6af826f0ac3f7e51d522b70e39a99da1be7596664da3636457

See more details on using hashes here.

File details

Details for the file FastWARC-0.6.0-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: FastWARC-0.6.0-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 226.1 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for FastWARC-0.6.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 1b3d7f3350424161d06c5141a7ce4defd77aa1aa9b89f8a0833179ff48d720e4
MD5 a418bb40d1b616011533f6f4edf1f844
BLAKE2b-256 456b3fc14af7319f2efae45fd07224f94f961233adbfa9f8840db1b857d090e8

File details

Details for the file FastWARC-0.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for FastWARC-0.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 dd6ad8345ff3595c141d842d532f646d0096c3a5df2e4b971ea9f87fc42dd03d
MD5 15836fd4b92bf9f5eed52b266864aed6
BLAKE2b-256 fd1e15f0a9e7f7f517ca4a3acab733fa5e10b60e9dad25bd58592abce22de5fc

File details

Details for the file FastWARC-0.6.0-cp310-cp310-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: FastWARC-0.6.0-cp310-cp310-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 369.6 kB
  • Tags: CPython 3.10, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for FastWARC-0.6.0-cp310-cp310-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 1744a6318d77b4277eb2d941e1a06749b73fb06bc129ebeedd5e6bddb3dfb6f1
MD5 eb020178c548dcfa35010d4431efc93a
BLAKE2b-256 f415b86c6c96fd0eb303faeb25e23bd973749afdadf9714422ce166be82c9c78

File details

Details for the file FastWARC-0.6.0-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: FastWARC-0.6.0-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 225.0 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for FastWARC-0.6.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 d473ff64046660b2984a36f3e49223c804b384c336ae8b053e47d849370255e7
MD5 20d8317d24b25e44d91b259df5cfa355
BLAKE2b-256 3721d1b77391aedd879f3d4887bfab99f1248fca6d1d557de391e938ec7f6ed2

File details

Details for the file FastWARC-0.6.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for FastWARC-0.6.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 9148c6370a8b22d4278e82c358b8467702e554b46a0caeaddbe0738ce655430c
MD5 0d5c352d50c90551b3cc431f8df80536
BLAKE2b-256 cb25dcc510ee462e9570f10edaea1704b8806d0ee8691a1cc7ebfcd7511a69e3

File details

Details for the file FastWARC-0.6.0-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: FastWARC-0.6.0-cp39-cp39-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 369.2 kB
  • Tags: CPython 3.9, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for FastWARC-0.6.0-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 1a8859b3a111af5c865d95ccbdbb196b6f9973ab8961516de1f109450e9eab33
MD5 e1dbe697f2cbd10159aa9db5280d0df5
BLAKE2b-256 3a77893e01610b3d89c64abcefa9d0f1d15e3279a877da1c06fa217fbb8d101b

File details

Details for the file FastWARC-0.6.0-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: FastWARC-0.6.0-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 226.6 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for FastWARC-0.6.0-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 e93b9fa95fc9c661f28d7a54b2d3f42b0400bad6e90b87eee1fc0ec2af088b7e
MD5 c99f2022d67ec00d019f0866e5189168
BLAKE2b-256 d979404e9edc3fe91f064f16ede63f933854171c4e253930cbbb673896885bfd

File details

Details for the file FastWARC-0.6.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for FastWARC-0.6.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8b2f94fc96c04ed584d44a1acf7bd7d22e185790e8053de18238e567df7ddb17
MD5 8615e10826bd4d206ddd6201ea062d67
BLAKE2b-256 8c878900884ac9108afa2b65d01c86fe41af00672ab5857335838382cd81d83a

File details

Details for the file FastWARC-0.6.0-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: FastWARC-0.6.0-cp38-cp38-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 368.5 kB
  • Tags: CPython 3.8, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for FastWARC-0.6.0-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 704fde2214221af2f681ee95020c17a58449ee78db38199ebfaf3c61e750a498
MD5 1d4a95598a9b73c1d74ec321a4a91451
BLAKE2b-256 428e8f770d9123ff7da31f6e60c780b35de4adddce13f11179abc29d2f2cf7be

File details

Details for the file FastWARC-0.6.0-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: FastWARC-0.6.0-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 220.5 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for FastWARC-0.6.0-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 6da70d82fdbacf58a788912c4ebbcb11bb647eed2756d3e4c6e7928821724bbe
MD5 fbe5790dbd4ec2a9a34bf7d72c27b514
BLAKE2b-256 9cd1400f2fa8cb35bee81fc376101d11d6d24f7f03020ffe927afa008a23454b

File details

Details for the file FastWARC-0.6.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for FastWARC-0.6.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 99563a2cc74691932b13af2d0443df1066c188707a2517c48a8009f4227fadf2
MD5 63cafee804b52712c1f64a43a14980cf
BLAKE2b-256 335e26d9f755e35184ee7eaaaf6ceb11211cc5a7976e5e6c845d607f1c4009f3

File details

Details for the file FastWARC-0.6.0-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: FastWARC-0.6.0-cp37-cp37m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 367.1 kB
  • Tags: CPython 3.7m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for FastWARC-0.6.0-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 905ff968030acab059aaa4ecbbeba9538a568e5991d1a1224bd6290721b84373
MD5 f99df17576d57ff059521c4b237fed34
BLAKE2b-256 debe52ad70c7c1ae0670fbf641d32fce270b2354f04f90a0a4dc84d49ba2a15b
