ChatNoir Resiliparse

A collection of robust and fast processing tools for parsing and analyzing (not only) web archive data.

Resiliparse is part of the ChatNoir web analytics toolkit.

Installing Resiliparse

Pre-built Resiliparse binaries (wheels) can be installed from PyPI:

pip install resiliparse

Building Resiliparse From Source

You can compile Resiliparse either from the PyPI source package or directly from this repository. In either case, you first need to install all required build-time dependencies. On Ubuntu, this is done as follows:

# Add Lexbor repository
curl -sL https://lexbor.com/keys/lexbor_signing.key | \
  sudo gpg --dearmor --output /etc/apt/trusted.gpg.d/lexbor.gpg
echo "deb https://packages.lexbor.com/ubuntu/ $(lsb_release -sc) liblexbor" | \
  sudo tee /etc/apt/sources.list.d/lexbor.list

# Install build dependencies (requires libre2-dev>=2022-04-01)
sudo apt update
sudo apt install build-essential python3-dev libuchardet-dev liblexbor-dev libre2-dev
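apt installs whatever libre2-dev version your Ubuntu release provides, so it can be worth checking it against the 2022-04-01 minimum before you start a build. The following is a minimal sketch. It assumes that the Debian/Ubuntu package version starts with the RE2 release date in YYYYMMDD form (e.g. 20220401+dfsg-1), optionally preceded by an epoch such as "1:". The function name re2_version_ok is hypothetical, and an RE2 built by hand outside of apt will not be detected.

```shell
#!/usr/bin/env bash
# Sketch: check whether the installed libre2-dev meets the 2022-04-01 minimum.
# Assumption: the package version begins with the RE2 release date as YYYYMMDD,
# e.g. "20220401+dfsg-1", optionally preceded by a Debian epoch like "1:".

MIN_RE2_DATE=20220401   # the "libre2-dev>=2022-04-01" requirement as YYYYMMDD

# Hypothetical helper: exit status 0 if the given version is new enough.
re2_version_ok() {
  local v="${1#*:}"       # drop a Debian epoch such as "1:" if present
  v="${v%%[!0-9]*}"       # keep only the leading run of digits (the date)
  [ -n "$v" ] && [ "$v" -ge "$MIN_RE2_DATE" ]
}

# Installed version, or an empty string if the package or dpkg-query is missing.
installed="$(dpkg-query -W -f='${Version}' libre2-dev 2>/dev/null || true)"

if re2_version_ok "$installed"; then
  echo "libre2-dev ${installed} looks recent enough"
elif [ -z "$installed" ]; then
  echo "libre2-dev does not appear to be installed" >&2
else
  echo "libre2-dev ${installed} looks older than ${MIN_RE2_DATE}" >&2
fi
```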

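To build and install Resiliparse from the PyPI source package, run the following (--no-binary resiliparse tells pip to ignore the pre-built wheels for this package and compile it from source):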

pip install --no-binary resiliparse resiliparse

That's it. If you prefer to build and install directly from a checkout of this repository instead, run the following from the repository root (-e installs the package in editable mode):

pip install -e resiliparse

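To build the wheels without installing them, run one of the following. pip wheel writes the wheels to the current directory, whereas python -m build writes them to resiliparse/dist: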

pip wheel -e resiliparse

# Or:
pip install build && python -m build --wheel resiliparse

Usage Instructions

For detailed usage instructions, please consult the Resiliparse User Manual.

Cite Us

If you use ChatNoir or Resiliparse, please consider citing our ECIR 2018 demo paper:

@InProceedings{bevendorff:2018,
  address =             {Berlin Heidelberg New York},
  author =              {Janek Bevendorff and Benno Stein and Matthias Hagen and Martin Potthast},
  booktitle =           {Advances in Information Retrieval. 40th European Conference on IR Research (ECIR 2018)},
  editor =              {Leif Azzopardi and Allan Hanbury and Gabriella Pasi and Benjamin Piwowarski},
  month =               mar,
  publisher =           {Springer},
  series =              {Lecture Notes in Computer Science},
  site =                {Grenoble, France},
  title =               {{Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl}},
  year =                2018
}

Download Files

Source distribution:

resiliparse-0.14.9.tar.gz (88.9 kB)

Built distributions:

Resiliparse-0.14.9-cp312-cp312-win_amd64.whl (2.5 MB): CPython 3.12, Windows x86-64
Resiliparse-0.14.9-cp312-cp312-manylinux_2_28_x86_64.whl (6.1 MB): CPython 3.12, manylinux glibc 2.28+ x86-64
Resiliparse-0.14.9-cp312-cp312-manylinux_2_28_aarch64.whl (6.0 MB): CPython 3.12, manylinux glibc 2.28+ ARM64
Resiliparse-0.14.9-cp312-cp312-macosx_11_0_arm64.whl (2.7 MB): CPython 3.12, macOS 11.0+ ARM64
Resiliparse-0.14.9-cp312-cp312-macosx_10_9_x86_64.whl (2.8 MB): CPython 3.12, macOS 10.9+ x86-64
Resiliparse-0.14.9-cp311-cp311-win_amd64.whl (2.5 MB): CPython 3.11, Windows x86-64
Resiliparse-0.14.9-cp311-cp311-manylinux_2_28_x86_64.whl (6.1 MB): CPython 3.11, manylinux glibc 2.28+ x86-64
Resiliparse-0.14.9-cp311-cp311-manylinux_2_28_aarch64.whl (6.0 MB): CPython 3.11, manylinux glibc 2.28+ ARM64
Resiliparse-0.14.9-cp311-cp311-macosx_11_0_arm64.whl (2.7 MB): CPython 3.11, macOS 11.0+ ARM64
Resiliparse-0.14.9-cp311-cp311-macosx_10_9_x86_64.whl (2.8 MB): CPython 3.11, macOS 10.9+ x86-64
Resiliparse-0.14.9-cp310-cp310-win_amd64.whl (2.5 MB): CPython 3.10, Windows x86-64
Resiliparse-0.14.9-cp310-cp310-manylinux_2_28_x86_64.whl (5.9 MB): CPython 3.10, manylinux glibc 2.28+ x86-64
Resiliparse-0.14.9-cp310-cp310-manylinux_2_28_aarch64.whl (5.8 MB): CPython 3.10, manylinux glibc 2.28+ ARM64
Resiliparse-0.14.9-cp310-cp310-macosx_11_0_arm64.whl (2.7 MB): CPython 3.10, macOS 11.0+ ARM64
Resiliparse-0.14.9-cp310-cp310-macosx_10_9_x86_64.whl (2.8 MB): CPython 3.10, macOS 10.9+ x86-64
Resiliparse-0.14.9-cp39-cp39-win_amd64.whl (2.5 MB): CPython 3.9, Windows x86-64
Resiliparse-0.14.9-cp39-cp39-manylinux_2_28_x86_64.whl (5.9 MB): CPython 3.9, manylinux glibc 2.28+ x86-64
Resiliparse-0.14.9-cp39-cp39-macosx_11_0_arm64.whl (2.7 MB): CPython 3.9, macOS 11.0+ ARM64
Resiliparse-0.14.9-cp39-cp39-macosx_10_9_x86_64.whl (2.8 MB): CPython 3.9, macOS 10.9+ x86-64
Resiliparse-0.14.9-cp38-cp38-win_amd64.whl (2.5 MB): CPython 3.8, Windows x86-64
Resiliparse-0.14.9-cp38-cp38-manylinux_2_28_x86_64.whl (5.9 MB): CPython 3.8, manylinux glibc 2.28+ x86-64
Resiliparse-0.14.9-cp38-cp38-macosx_11_0_arm64.whl (2.7 MB): CPython 3.8, macOS 11.0+ ARM64
Resiliparse-0.14.9-cp38-cp38-macosx_10_9_x86_64.whl (2.8 MB): CPython 3.8, macOS 10.9+ x86-64
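The file list above shows that pre-built 0.14.9 wheels cover only CPython 3.8 to 3.12 (and, for Linux ARM64, only 3.10 to 3.12). For any other interpreter, pip falls back to the source distribution, which needs the build-time dependencies described under Building Resiliparse From Source. The following is a minimal sketch that checks only the Python-version side of this, assuming the default python3 is the interpreter you install into. has_prebuilt_wheel is a hypothetical helper, the tag list is copied from the 0.14.9 files (newer releases may differ), and the OS, architecture, and glibc version are not checked.

```shell
#!/usr/bin/env bash
# Sketch: is there a pre-built Resiliparse 0.14.9 wheel for this Python version?
# Assumption: the tags below mirror the 0.14.9 file list; platform is NOT checked.

PREBUILT_TAGS="cp38 cp39 cp310 cp311 cp312"

# Hypothetical helper: exit status 0 if the CPython tag (e.g. "cp311") is listed.
has_prebuilt_wheel() {
  local tag
  for tag in $PREBUILT_TAGS; do
    [ "$tag" = "$1" ] && return 0
  done
  return 1
}

# CPython tag of the default python3, e.g. "cp311" (empty if python3 is missing).
py_tag="$(python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])' 2>/dev/null || true)"

if has_prebuilt_wheel "$py_tag"; then
  echo "${py_tag}: a pre-built 0.14.9 wheel exists for this Python version"
else
  echo "${py_tag:-python3 not found}: pip will likely build from the source distribution" >&2
fi
```

If you would rather have pip fail than silently compile from source, pip install --only-binary resiliparse resiliparse restricts Resiliparse to pre-built wheels.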

