FastWARC

FastWARC is a high-performance WARC parsing library for Python written in C++/Cython. The API is largely inspired by WARCIO, but is not intended as a drop-in replacement. FastWARC supports compressed and uncompressed WARC/1.0 and WARC/1.1 streams. The supported compression algorithms are GZip and LZ4.
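To illustrate what "compressed and uncompressed streams" means in practice, here is a minimal standard-library sketch that guesses the container format of a stream from its first bytes. It is not FastWARC's own detection logic. The function name `sniff_warc_stream` and the sample record are made up for this example, and the magic numbers are those of the GZip and LZ4 frame formats.

```python
import gzip
import io

# Leading bytes of each container format.
GZIP_MAGIC = b"\x1f\x8b"
LZ4_FRAME_MAGIC = b"\x04\x22\x4d\x18"
WARC_PREFIXES = (b"WARC/1.0", b"WARC/1.1")


def sniff_warc_stream(stream):
    """Guess the stream format from its first bytes, then rewind.

    Hypothetical helper for illustration only; requires a seekable
    binary stream.
    """
    start = stream.tell()
    head = stream.read(8)
    stream.seek(start)

    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(LZ4_FRAME_MAGIC):
        return "lz4"
    if head.startswith(WARC_PREFIXES):
        return "uncompressed"
    return "unknown"


# A tiny hand-written record, used here only as sample input.
plain = b"WARC/1.1\r\nWARC-Type: warcinfo\r\nContent-Length: 0\r\n\r\n\r\n\r\n"

print(sniff_warc_stream(io.BytesIO(plain)))                 # uncompressed
print(sniff_warc_stream(io.BytesIO(gzip.compress(plain))))  # gzip
```

A check like this only identifies the outer container. Whether the decompressed content is a valid WARC/1.0 or WARC/1.1 record still has to be decided by the parser.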

FastWARC belongs to the ChatNoir Resiliparse toolkit for fast and robust web data processing.

Installing FastWARC

Pre-built FastWARC binaries for most Linux platforms can be installed from PyPI:

pip install fastwarc

Note, however, that the Linux binaries are provided solely for your convenience. Because they are built on the very old manylinux base system for better compatibility, their performance is not optimal (though still better than WARCIO's). For best performance, see the next section on how to build FastWARC yourself.

Building FastWARC From Source

You can compile FastWARC either from the PyPI source package or directly from this repository. In either case, you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:

sudo apt install build-essential python3-dev zlib1g-dev liblz4-dev

To build and install FastWARC from PyPI, run:

pip install --no-binary fastwarc fastwarc

That's it. If you prefer to build and install directly from this repository instead, run:

pip install -e fastwarc

To build the wheels without installing them, run:

pip wheel -e fastwarc

# Or:
pip install build && python -m build --wheel fastwarc
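Whichever route you take, it can be handy to check which FastWARC version ended up in the active environment. The following is a small sketch using only the standard library (Python 3.8+). The helper name `installed_version` is made up for this example, and the check cannot tell a pre-built wheel from a local build.

```python
from importlib import metadata


def installed_version(dist_name="fastwarc"):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(f"FastWARC {version}" if version else "FastWARC is not installed")
```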

Usage Instructions

For detailed usage instructions, please consult the FastWARC User Manual.
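As a rough sketch of what basic usage tends to look like, the snippet below iterates over a WARC file and yields each record's ID. The names `ArchiveIterator` and `record_id` are not mentioned in this README, so treat them as assumptions and check them against the User Manual for your installed version. The helper `iter_record_ids` and the file name in the usage comment are made up for this example.

```python
def iter_record_ids(path):
    """Yield the record ID of every record in the WARC file at `path`.

    Illustrative helper; assumes FastWARC exposes `ArchiveIterator` in
    `fastwarc.warc` and that records have a `record_id` attribute.
    """
    # Imported lazily so this module can be loaded without FastWARC.
    from fastwarc.warc import ArchiveIterator

    with open(path, "rb") as stream:
        for record in ArchiveIterator(stream):
            yield record.record_id


# Usage (hypothetical file name):
# for record_id in iter_record_ids("example.warc.gz"):
#     print(record_id)
```

The file is opened in binary mode and handed to the iterator as-is, so the same call is meant to work for compressed and uncompressed input.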

Download files

Source Distribution

  • FastWARC-0.4.0.tar.gz (315.5 kB): Source

Built Distributions

  • FastWARC-0.4.0-cp310-cp310-win_amd64.whl (208.6 kB): CPython 3.10, Windows x86-64
  • FastWARC-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
  • FastWARC-0.4.0-cp310-cp310-macosx_10_14_x86_64.whl (349.4 kB): CPython 3.10, macOS 10.14+ x86-64
  • FastWARC-0.4.0-cp39-cp39-win_amd64.whl (207.7 kB): CPython 3.9, Windows x86-64
  • FastWARC-0.4.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64
  • FastWARC-0.4.0-cp39-cp39-macosx_10_14_x86_64.whl (349.0 kB): CPython 3.9, macOS 10.14+ x86-64
  • FastWARC-0.4.0-cp38-cp38-win_amd64.whl (209.1 kB): CPython 3.8, Windows x86-64
  • FastWARC-0.4.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB): CPython 3.8, manylinux: glibc 2.17+ x86-64
  • FastWARC-0.4.0-cp38-cp38-macosx_10_14_x86_64.whl (348.1 kB): CPython 3.8, macOS 10.14+ x86-64
  • FastWARC-0.4.0-cp37-cp37m-win_amd64.whl (203.3 kB): CPython 3.7m, Windows x86-64
  • FastWARC-0.4.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB): CPython 3.7m, manylinux: glibc 2.17+ x86-64
  • FastWARC-0.4.0-cp37-cp37m-macosx_10_14_x86_64.whl (347.1 kB): CPython 3.7m, macOS 10.14+ x86-64
