ChatNoir Resiliparse

A collection of robust and fast processing tools for parsing and analyzing (not only) web archive data.

Resiliparse is a part of the ChatNoir web analytics toolkit.

Installing Resiliparse

Pre-built Resiliparse binaries can be installed from PyPI:

pip install resiliparse
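
To confirm that the installation succeeded, you can query the installed package metadata. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and not part of Resiliparse, and the distribution name `Resiliparse` is taken from the file names on this page.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "Resiliparse") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("Resiliparse is not installed in this environment")
    else:
        print(f"Resiliparse {version} is installed")
```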

Building Resiliparse From Source

You can compile Resiliparse either from the PyPI source package or directly from this repository. In either case, you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:

# Add Lexbor repository
curl -sL https://lexbor.com/keys/lexbor_signing.key | \
  sudo gpg --dearmor --output /etc/apt/trusted.gpg.d/lexbor.gpg
echo "deb https://packages.lexbor.com/ubuntu/ $(lsb_release -sc) liblexbor" | \
    sudo tee /etc/apt/sources.list.d/lexbor.list

# Install build dependencies (requires libre2-dev>=2022-04-01)
sudo apt update
sudo apt install build-essential python3-dev libuchardet-dev liblexbor-dev libre2-dev
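
Before starting a source build, it can help to check whether the native libraries are visible on the system. The sketch below is only a rough heuristic: it uses `ctypes.util.find_library` from the standard library, which locates shared libraries but does not verify that the development headers or the required versions are present. The short library names (`uchardet`, `lexbor`, `re2`) are assumptions derived from the package names above, and `locate_libraries` is a hypothetical helper.

```python
from ctypes.util import find_library
from typing import Dict, Iterable, Optional

# Assumed short names of the shared libraries behind the -dev packages above.
REQUIRED_LIBS = ("uchardet", "lexbor", "re2")


def locate_libraries(names: Iterable[str] = REQUIRED_LIBS) -> Dict[str, Optional[str]]:
    """Map each library name to what find_library reports, or None if not found."""
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, location in locate_libraries().items():
        status = location if location else "NOT FOUND"
        print(f"{name}: {status}")
```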

To build and install Resiliparse from PyPI, run:

pip install --no-binary resiliparse resiliparse

That's it. If you prefer to build and install directly from this repository instead, run:

pip install -e resiliparse

To build the wheels without installing them, run:

pip wheel -e resiliparse

# Or:
pip install build && python -m build --wheel resiliparse

Usage Instructions

For detailed usage instructions, please consult the Resiliparse User Manual.

Cite Us

If you use ChatNoir or Resiliparse, please consider citing our ECIR 2018 demo paper:

@InProceedings{bevendorff:2018,
  address =             {Berlin Heidelberg New York},
  author =              {Janek Bevendorff and Benno Stein and Matthias Hagen and Martin Potthast},
  booktitle =           {Advances in Information Retrieval. 40th European Conference on IR Research (ECIR 2018)},
  editor =              {Leif Azzopardi and Allan Hanbury and Gabriella Pasi and Benjamin Piwowarski},
  month =               mar,
  publisher =           {Springer},
  series =              {Lecture Notes in Computer Science},
  site =                {Grenoble, France},
  title =               {{Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl}},
  year =                2018
}


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • Resiliparse-0.14.6.tar.gz (88.5 kB): Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • Resiliparse-0.14.6-cp312-cp312-win_amd64.whl (2.5 MB): CPython 3.12, Windows x86-64
  • Resiliparse-0.14.6-cp312-cp312-manylinux_2_28_x86_64.whl (6.1 MB): CPython 3.12, manylinux: glibc 2.28+ x86-64
  • Resiliparse-0.14.6-cp312-cp312-manylinux_2_28_aarch64.whl (6.0 MB): CPython 3.12, manylinux: glibc 2.28+ ARM64
  • Resiliparse-0.14.6-cp312-cp312-macosx_11_0_arm64.whl (2.7 MB): CPython 3.12, macOS 11.0+ ARM64
  • Resiliparse-0.14.6-cp312-cp312-macosx_10_9_x86_64.whl (2.7 MB): CPython 3.12, macOS 10.9+ x86-64
  • Resiliparse-0.14.6-cp311-cp311-win_amd64.whl (2.5 MB): CPython 3.11, Windows x86-64
  • Resiliparse-0.14.6-cp311-cp311-manylinux_2_28_x86_64.whl (6.1 MB): CPython 3.11, manylinux: glibc 2.28+ x86-64
  • Resiliparse-0.14.6-cp311-cp311-manylinux_2_28_aarch64.whl (6.0 MB): CPython 3.11, manylinux: glibc 2.28+ ARM64
  • Resiliparse-0.14.6-cp311-cp311-macosx_11_0_arm64.whl (2.7 MB): CPython 3.11, macOS 11.0+ ARM64
  • Resiliparse-0.14.6-cp311-cp311-macosx_10_9_x86_64.whl (2.7 MB): CPython 3.11, macOS 10.9+ x86-64
  • Resiliparse-0.14.6-cp310-cp310-win_amd64.whl (2.5 MB): CPython 3.10, Windows x86-64
  • Resiliparse-0.14.6-cp310-cp310-manylinux_2_28_x86_64.whl (5.9 MB): CPython 3.10, manylinux: glibc 2.28+ x86-64
  • Resiliparse-0.14.6-cp310-cp310-manylinux_2_28_aarch64.whl (5.8 MB): CPython 3.10, manylinux: glibc 2.28+ ARM64
  • Resiliparse-0.14.6-cp310-cp310-macosx_11_0_arm64.whl (2.7 MB): CPython 3.10, macOS 11.0+ ARM64
  • Resiliparse-0.14.6-cp310-cp310-macosx_10_9_x86_64.whl (2.7 MB): CPython 3.10, macOS 10.9+ x86-64
  • Resiliparse-0.14.6-cp39-cp39-win_amd64.whl (2.5 MB): CPython 3.9, Windows x86-64
  • Resiliparse-0.14.6-cp39-cp39-manylinux_2_28_x86_64.whl (5.9 MB): CPython 3.9, manylinux: glibc 2.28+ x86-64
  • Resiliparse-0.14.6-cp39-cp39-macosx_11_0_arm64.whl (2.7 MB): CPython 3.9, macOS 11.0+ ARM64
  • Resiliparse-0.14.6-cp39-cp39-macosx_10_9_x86_64.whl (2.7 MB): CPython 3.9, macOS 10.9+ x86-64
  • Resiliparse-0.14.6-cp38-cp38-win_amd64.whl (2.5 MB): CPython 3.8, Windows x86-64
  • Resiliparse-0.14.6-cp38-cp38-manylinux_2_28_x86_64.whl (5.9 MB): CPython 3.8, manylinux: glibc 2.28+ x86-64
  • Resiliparse-0.14.6-cp38-cp38-macosx_11_0_arm64.whl (2.7 MB): CPython 3.8, macOS 11.0+ ARM64
  • Resiliparse-0.14.6-cp38-cp38-macosx_10_9_x86_64.whl (2.7 MB): CPython 3.8, macOS 10.9+ x86-64
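
The wheel file names listed here follow the standard layout `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. As an illustration of how to read them, here is a small sketch that splits a file name into its components. The names `WheelTags` and `parse_wheel_filename` are hypothetical helpers written for this example, and the parser is deliberately simplified (it does not validate the individual tags).

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # Drop the optional build tag, which sits right after the version.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel file name layout: {filename!r}")
    return WheelTags(*parts)


if __name__ == "__main__":
    tags = parse_wheel_filename(
        "Resiliparse-0.14.6-cp312-cp312-manylinux_2_28_x86_64.whl"
    )
    print(tags.python_tag, tags.platform_tag)
```

In normal use, pip selects the matching wheel automatically, so parsing like this is only needed when picking a file by hand.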

File details

Details for the file Resiliparse-0.14.6.tar.gz.

File metadata

  • Download URL: Resiliparse-0.14.6.tar.gz
  • Upload date:
  • Size: 88.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for Resiliparse-0.14.6.tar.gz
Algorithm Hash digest
SHA256 bb0daea44d94613684c176c89482cc510b085937ba45adeb3fd12ef47521b4b4
MD5 314142af2b90aa47289db0f9232a889d
BLAKE2b-256 9f66ac157a6814dfc249914c0596fccf0762c1a2134a9fd16967b0261a1cb52b

See more details on using hashes here.
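
The digests listed on this page can be used to check a downloaded file for corruption or tampering. The following is a minimal sketch using the standard library's `hashlib`; the helper names `sha256_of` and `matches_expected` are hypothetical, the local file path in the usage example is an assumption, and the expected digest is the SHA256 value given above for Resiliparse-0.14.6.tar.gz.

```python
import hashlib

# SHA256 digest listed above for Resiliparse-0.14.6.tar.gz
EXPECTED_SHA256 = "bb0daea44d94613684c176c89482cc510b085937ba45adeb3fd12ef47521b4b4"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumes the source distribution was downloaded to the current directory.
    path = "Resiliparse-0.14.6.tar.gz"
    print("OK" if matches_expected(path) else "MISMATCH")
```

The same pattern applies to any of the wheels by substituting the SHA256 digest listed for that file.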

File details

Details for the file Resiliparse-0.14.6-cp312-cp312-win_amd64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 3eaec93818451a4cc94a3bad70693025dc97ff59786b39950fe3c96e1126c350
MD5 e041f946c81717c216b46dfaff56788a
BLAKE2b-256 7432cb422a50a82e6a792cc85cb6979b6a13a8fe094c0fdd36f72bb84c81efc2

File details

Details for the file Resiliparse-0.14.6-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 4e1ab1ce561df4cbbfb60ffa4af5ab5525f319ee0e9181c2ecd9682501045286
MD5 3530e6c363ad8581169e0f79dc457896
BLAKE2b-256 a9a13e7759d9733af14d1473443c429732fef044a230105af5640f1b070e3a72

File details

Details for the file Resiliparse-0.14.6-cp312-cp312-manylinux_2_28_aarch64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 c9db0032babeea20a289f99bd48c094f2e046025f5380c26636b45d2d26091ef
MD5 ee8234805de54364e7d8dc2a81ff311b
BLAKE2b-256 4a8737681d50b9235fc38d92269a678519cc1b01df6e2853ce44538933c5f097

File details

Details for the file Resiliparse-0.14.6-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 39e0b96d242314334de8b57d77d62b40085943b0342eef6f9f2aadd1165eb7c2
MD5 1b5265ed59c2ce4567d38a0707d07854
BLAKE2b-256 f70b1996f8742f5dc5ffc8f3ebe3092d45308391faf113322ce05baa00dd092a

File details

Details for the file Resiliparse-0.14.6-cp312-cp312-macosx_10_9_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp312-cp312-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 5909444cfd5b13f50257d68650db929b592010ec7dbfaf93743b3186e1103c1c
MD5 57490001141f77603bad77d6b564279e
BLAKE2b-256 65e953bbbc0532e6e1b7d4d2e3a183521f6e6d8a35258b87d120a017747bada9

File details

Details for the file Resiliparse-0.14.6-cp311-cp311-win_amd64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 7771666c11a78650ba79642bafb81a55337635dccb614cc04bd0834b47700c12
MD5 0abc9428e2fc9672d07e0628b65ba554
BLAKE2b-256 dad54cca0810ca8cb8a421007c583d73b0d7dcad9169bac4142a857cda4454d1

File details

Details for the file Resiliparse-0.14.6-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 0698670f731e9a7acab084585333f52a686da5c7afb8404070fec5397300a792
MD5 194b73a374c27de38c30abf8f3215e5f
BLAKE2b-256 63f09781c4ce5921e78afd2f62eadaa71521fbe2f243285afb0cf8d802d41161

File details

Details for the file Resiliparse-0.14.6-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 d5c5d1be7fcf4bf2ffa497b3b123d967c3b0d89290702cfd842cad181ece338c
MD5 be6c74239cdb8efc6a53659ec8e59d99
BLAKE2b-256 b27f07a9f1376a903977df372b2543a920f8152ca6846fde2a0461f51d9f2038

File details

Details for the file Resiliparse-0.14.6-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1f1a653145bcd1e59a21e7cee813350dc90074d2f0236e5a4e5247cc6202cd23
MD5 c09a26065e3293ab746c62bb9fdd3d80
BLAKE2b-256 d094254a5515a9a84bb3a758b63913c46352fd0b45d45cf492c2563ab7e40f1c

File details

Details for the file Resiliparse-0.14.6-cp311-cp311-macosx_10_9_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 43b5ac68333fa219e26aefec830f6fb09945c91ff37ffd39c2fcbf949115b173
MD5 198994aed59584605af18883519eb9e9
BLAKE2b-256 b798909698c1aa076002dfc7c0f5c10cbadf4ba38ef25f91512f0f8d840ace39

File details

Details for the file Resiliparse-0.14.6-cp310-cp310-win_amd64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 bddbd1146c388f7b2c4cfe3c5d8d1c73a4c9bb711ee936ae7bd274a0ae4756d3
MD5 417b547b88f226f4ff78d3d4f4cd2e2a
BLAKE2b-256 3c45f0f62d943168a2286f7ba81761836102268ae937ed65be58b3c30ebe4936

File details

Details for the file Resiliparse-0.14.6-cp310-cp310-manylinux_2_28_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 945f2dd8a228b9f906d2d977dea1bf03dd1133c5c39cae6f45e1395c3b538cf6
MD5 990aa4f1ab06bfb061a41ce17b308d71
BLAKE2b-256 8f673fccdd8f3dd37ce5d90e84d4bc9a78faec2c3c773b01b04868d0b130fbdf

File details

Details for the file Resiliparse-0.14.6-cp310-cp310-manylinux_2_28_aarch64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp310-cp310-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 279d68c943ad3ea5b1adc886dfb7d4106726e5b0257e3bae385985f7d117af57
MD5 1b3161b465e473ed2cc039547ce291e9
BLAKE2b-256 b605f27811820bdf468d4c5e3afee8e9c05ab0f2c062b49523191aa70c6ae50f

File details

Details for the file Resiliparse-0.14.6-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 23f4ff2f5efecbdf43ae7d154e4521b3eb57b57eb9332931738d1d1bf0fadfde
MD5 2b42df2e0aea4152197c4d683a82f9fb
BLAKE2b-256 f4c8cbf087184d2427b610e7652f978242fa99bb8e981f980490ff2d654b264b

File details

Details for the file Resiliparse-0.14.6-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 3869b6c4805215f357d1c1e2d6db1406dad29eda3260c675b0c692bbd2375fe7
MD5 24d596cde0ccee620404ade493663a57
BLAKE2b-256 253deaededa858e52f22c5baf0dad533df15f12c802e115705ee9d33aa1eb315

File details

Details for the file Resiliparse-0.14.6-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: Resiliparse-0.14.6-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 2.5 MB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for Resiliparse-0.14.6-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 77f9aa0b8d763cc13cb3193ac2ccb1f4c14b3d4229aa71c7b01563fb48444eba
MD5 350913526950ad7663eeb17602cc7ff0
BLAKE2b-256 e80232bbf4811e04bead422aa2e28c52acdf9000285bb444086d9f164018d47f

File details

Details for the file Resiliparse-0.14.6-cp39-cp39-manylinux_2_28_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp39-cp39-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 162a0eb895b4b4623318ee8207d331d6b78f7fb4c391219d5d5bcd4d8bd2c016
MD5 40a4c34b68a71231648e137bb8551a4f
BLAKE2b-256 89e1472a7847fd5cbd34018fa62f5ba80e5e2cec490d7515b7c0a6900a91644b

File details

Details for the file Resiliparse-0.14.6-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6ca0059a1e2b29490f8f34bd3121a9efd37c6f92f25c0ed0bae18bc50e88983a
MD5 36de7b4300225f760f79bdd4384d15cb
BLAKE2b-256 e725c0204e1f76bb94915dad1b332be104a925fb06f07d7fd3cce2fdc1356d78

File details

Details for the file Resiliparse-0.14.6-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 b741f3c25b54de620ab2965912979115e3c7b940ad2d3e657e7e8c36286c7580
MD5 f1e7242c1860a1c91c25de714f97cf8c
BLAKE2b-256 06dc6e153cd68032c95a3a0e69ef643a2e1386cafe12cf045a40a223e100cf68

File details

Details for the file Resiliparse-0.14.6-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: Resiliparse-0.14.6-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 2.5 MB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for Resiliparse-0.14.6-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 3981a9678998633e0974f681e9ab5ff42dd7b2570d81519cea00ffa88abc5467
MD5 5de552a9e3fb6d4405a908f2dd9cc496
BLAKE2b-256 4f1b02e2a429c9f6eb12b1d7f04eb4242f6a201e5bb84d3f88411b19aefe40e1

File details

Details for the file Resiliparse-0.14.6-cp38-cp38-manylinux_2_28_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp38-cp38-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 951dde9414b605897b761aec2a7ee2c9a6e6b8db3345e14a591e2a6ea71b4274
MD5 7c8c266ae93d5316fdb54a8f4473f08c
BLAKE2b-256 489128b7172853fed0fe2fba19349b61af952a1f56f3089fb330b2879b90dd82

File details

Details for the file Resiliparse-0.14.6-cp38-cp38-macosx_11_0_arm64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp38-cp38-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e075015c608e8b4c8a3fec15af9b065819e3e43d58df9b92895ec2d74593cd5c
MD5 588f35c49593418d913043f450ff5985
BLAKE2b-256 ced8d7b6f7944ba3b32f78ed209fcd834455494d24292b6d5247d87d1ea538fa

File details

Details for the file Resiliparse-0.14.6-cp38-cp38-macosx_10_9_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.6-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 ebd61fc0fb2a1f38c3de5d72025937016d6b51c613cc747d58c4ffb4bea10001
MD5 bdb237cae2b515ecff7ef01f98f58eff
BLAKE2b-256 7cbcfa35e027acee4ad8645c5d2df6bda44cfc1ffc275eb95d11baea0acb0ecf
