FastWARC

FastWARC is a high-performance WARC parsing library for Python written in C++/Cython. The API is inspired in large parts by WARCIO, but does not aim at being a drop-in replacement. FastWARC supports compressed and uncompressed WARC/1.0 and WARC/1.1 streams. Supported compression algorithms are GZip and LZ4.

FastWARC belongs to the ChatNoir Resiliparse toolkit for fast and robust web data processing.
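The three stream flavours mentioned above (uncompressed, GZip, LZ4) can be told apart by their leading bytes. The following is a minimal standard-library sketch of that idea, not FastWARC's own detection code, which may work differently. The function name `sniff_compression` and the returned labels are hypothetical; the magic numbers are those of the GZip and LZ4 frame formats.

```python
import gzip

# Leading "magic" bytes of the container formats.
GZIP_MAGIC = b"\x1f\x8b"               # GZip member header (RFC 1952)
LZ4_FRAME_MAGIC = b"\x04\x22\x4d\x18"  # LZ4 frame magic 0x184D2204, little-endian


def sniff_compression(head: bytes) -> str:
    """Guess the stream flavour from the first bytes of a file.

    Hypothetical helper for illustration only; it does not call FastWARC.
    """
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(LZ4_FRAME_MAGIC):
        return "lz4"
    if head.startswith(b"WARC/"):
        # Uncompressed records start with the version line, e.g. "WARC/1.1".
        return "uncompressed"
    return "unknown"


print(sniff_compression(gzip.compress(b"WARC/1.1\r\n")))
print(sniff_compression(b"WARC/1.0\r\n"))
```

In practice you would pass the first few bytes of a file, for example `open(path, "rb").read(8)`.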

Installing FastWARC

Pre-built FastWARC binaries for most Linux platforms can be installed from PyPI:

pip install fastwarc

Note, however, that the Linux binaries are provided solely for your convenience. Because they are built on the very old manylinux base system for better compatibility, their performance is not optimal (though still better than WARCIO's). For best performance, build FastWARC yourself as described in the next section.

Building FastWARC From Source

You can compile FastWARC either from the PyPI source package or directly from this repository. In either case, you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:

sudo apt install build-essential python3-dev zlib1g-dev liblz4-dev

To build and install FastWARC from PyPI, run:

pip install --no-binary fastwarc fastwarc

That's it. If you prefer to build and install directly from this repository instead, run:

pip install -e fastwarc

To build the wheels without installing them, run:

pip wheel -e fastwarc

# Or:
pip install build && python -m build --wheel fastwarc
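Whichever installation route you choose, you may want to confirm that the package is visible to your interpreter. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and it is assumed that the distribution is registered under the name used in the `pip` commands above.

```python
from importlib import metadata


def installed_version(dist_name: str = "fastwarc"):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


version = installed_version()
print(f"FastWARC {version}" if version else "FastWARC is not installed")
```

This only reports the installed version. It does not tell you whether the package came from a pre-built wheel or from a local source build.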

Usage Instructions

For detailed usage instructions, please consult the FastWARC User Manual.
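As a quick orientation, here is a minimal sketch of iterating over the records of a WARC stream. It assumes that `ArchiveIterator` from `fastwarc.warc` accepts a Python file-like object and that records expose a `record_id` property; please check the User Manual for the authoritative API. The helpers `make_sample_warc` and `list_record_ids` are hypothetical names introduced for illustration, and the FastWARC part only runs if the package is installed.

```python
import gzip
import importlib.util
import io


def make_sample_warc() -> bytes:
    """Build a single GZip-compressed WARC/1.1 'resource' record in memory."""
    payload = b"hello"
    header = (
        b"WARC/1.1\r\n"
        b"WARC-Type: resource\r\n"
        b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000001>\r\n"
        b"WARC-Date: 2021-10-01T00:00:00Z\r\n"
        b"WARC-Target-URI: https://example.com/\r\n"
        b"Content-Type: text/plain\r\n"
        b"Content-Length: " + str(len(payload)).encode("ascii") + b"\r\n"
        b"\r\n"
    )
    # A record is its header block, the payload, and two trailing CRLFs.
    return gzip.compress(header + payload + b"\r\n\r\n")


def list_record_ids(stream):
    """Collect the WARC-Record-ID of every record in the stream."""
    # Imported lazily so the rest of the sketch works without FastWARC.
    from fastwarc.warc import ArchiveIterator

    return [record.record_id for record in ArchiveIterator(stream)]


sample = make_sample_warc()
if importlib.util.find_spec("fastwarc") is not None:
    print(list_record_ids(io.BytesIO(sample)))
else:
    print("FastWARC is not installed; skipping the iteration example.")
```

To read a real archive instead of the in-memory sample, pass a file opened in binary mode, for example `list_record_ids(open("archive.warc.gz", "rb"))`, where the file name is a placeholder.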

Cite Us

If you use FastWARC, please consider citing our OSSYM 2021 abstract paper:

@InProceedings{bevendorff:2021,
  author =                {Janek Bevendorff and Martin Potthast and Benno Stein},
  booktitle =             {3rd International Symposium on Open Search Technology (OSSYM 2021)},
  editor =                {Andreas Wagner and Christian Guetl and Michael Granitzer and Stefan Voigt},
  month =                 oct,
  publisher =             {International Open Search Symposium},
  site =                  {CERN, Geneva, Switzerland},
  title =                 {{FastWARC: Optimizing Large-Scale Web Archive Analytics}},
  year =                  2021
}
