ChatNoir Resiliparse

A collection of robust and fast processing tools for parsing and analyzing (not only) web archive data.

Resiliparse is a part of the ChatNoir web analytics toolkit.

Installing Resiliparse

Pre-built Resiliparse binaries can be installed from PyPI:

pip install resiliparse
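
To confirm that the installation succeeded, you can query the installed version from Python. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and not part of Resiliparse.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "resiliparse") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"resiliparse {found}" if found else "resiliparse is not installed")
```

Returning `None` instead of raising keeps the check usable in setup scripts that want to decide whether to run `pip install` at all.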

Building Resiliparse From Source

You can compile Resiliparse either from the PyPI source package or directly from this repository. In either case, you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:

# Add Lexbor repository
curl -sL https://lexbor.com/keys/lexbor_signing.key | \
  sudo gpg --dearmor --output /etc/apt/trusted.gpg.d/lexbor.gpg
echo "deb https://packages.lexbor.com/ubuntu/ $(lsb_release -sc) liblexbor" | \
    sudo tee /etc/apt/sources.list.d/lexbor.list

# Install build dependencies (requires libre2-dev>=2022-04-01)
sudo apt update
sudo apt install build-essential python3-dev libuchardet-dev liblexbor-dev libre2-dev

To build and install Resiliparse from the PyPI source package, run:

pip install --no-binary resiliparse resiliparse

That's it. If you prefer to build and install directly from this repository instead, run:

pip install -e resiliparse

To build the wheels without installing them, run:

pip wheel -e resiliparse

# Or:
pip install build && python -m build --wheel resiliparse

Usage Instructions

For detailed usage instructions, please consult the Resiliparse User Manual.

Cite Us

If you use ChatNoir or Resiliparse, please consider citing our ECIR 2018 demo paper:

@InProceedings{bevendorff:2018,
  address =             {Berlin Heidelberg New York},
  author =              {Janek Bevendorff and Benno Stein and Matthias Hagen and Martin Potthast},
  booktitle =           {Advances in Information Retrieval. 40th European Conference on IR Research (ECIR 2018)},
  editor =              {Leif Azzopardi and Allan Hanbury and Gabriella Pasi and Benjamin Piwowarski},
  month =               mar,
  publisher =           {Springer},
  series =              {Lecture Notes in Computer Science},
  site =                {Grenoble, France},
  title =               {{Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl}},
  year =                2018
}

Download files

Download the file for your platform.

Source Distribution

  • resiliparse-0.14.8.tar.gz (88.5 kB): Source

Built Distributions

  • Resiliparse-0.14.8-cp312-cp312-win_amd64.whl (2.5 MB): CPython 3.12, Windows x86-64
  • Resiliparse-0.14.8-cp312-cp312-manylinux_2_28_x86_64.whl (6.1 MB): CPython 3.12, manylinux: glibc 2.28+ x86-64
  • Resiliparse-0.14.8-cp312-cp312-manylinux_2_28_aarch64.whl (6.0 MB): CPython 3.12, manylinux: glibc 2.28+ ARM64
  • Resiliparse-0.14.8-cp312-cp312-macosx_11_0_arm64.whl (2.7 MB): CPython 3.12, macOS 11.0+ ARM64
  • Resiliparse-0.14.8-cp312-cp312-macosx_10_9_x86_64.whl (2.8 MB): CPython 3.12, macOS 10.9+ x86-64
  • Resiliparse-0.14.8-cp311-cp311-win_amd64.whl (2.5 MB): CPython 3.11, Windows x86-64
  • Resiliparse-0.14.8-cp311-cp311-manylinux_2_28_x86_64.whl (6.1 MB): CPython 3.11, manylinux: glibc 2.28+ x86-64
  • Resiliparse-0.14.8-cp311-cp311-manylinux_2_28_aarch64.whl (6.0 MB): CPython 3.11, manylinux: glibc 2.28+ ARM64
  • Resiliparse-0.14.8-cp311-cp311-macosx_11_0_arm64.whl (2.7 MB): CPython 3.11, macOS 11.0+ ARM64
  • Resiliparse-0.14.8-cp311-cp311-macosx_10_9_x86_64.whl (2.8 MB): CPython 3.11, macOS 10.9+ x86-64
  • Resiliparse-0.14.8-cp310-cp310-win_amd64.whl (2.5 MB): CPython 3.10, Windows x86-64
  • Resiliparse-0.14.8-cp310-cp310-manylinux_2_28_x86_64.whl (5.9 MB): CPython 3.10, manylinux: glibc 2.28+ x86-64
  • Resiliparse-0.14.8-cp310-cp310-manylinux_2_28_aarch64.whl (5.8 MB): CPython 3.10, manylinux: glibc 2.28+ ARM64
  • Resiliparse-0.14.8-cp310-cp310-macosx_11_0_arm64.whl (2.7 MB): CPython 3.10, macOS 11.0+ ARM64
  • Resiliparse-0.14.8-cp310-cp310-macosx_10_9_x86_64.whl (2.8 MB): CPython 3.10, macOS 10.9+ x86-64
  • Resiliparse-0.14.8-cp39-cp39-win_amd64.whl (2.5 MB): CPython 3.9, Windows x86-64
  • Resiliparse-0.14.8-cp39-cp39-manylinux_2_28_x86_64.whl (5.9 MB): CPython 3.9, manylinux: glibc 2.28+ x86-64
  • Resiliparse-0.14.8-cp39-cp39-macosx_11_0_arm64.whl (2.7 MB): CPython 3.9, macOS 11.0+ ARM64
  • Resiliparse-0.14.8-cp39-cp39-macosx_10_9_x86_64.whl (2.8 MB): CPython 3.9, macOS 10.9+ x86-64
  • Resiliparse-0.14.8-cp38-cp38-win_amd64.whl (2.5 MB): CPython 3.8, Windows x86-64
  • Resiliparse-0.14.8-cp38-cp38-manylinux_2_28_x86_64.whl (5.9 MB): CPython 3.8, manylinux: glibc 2.28+ x86-64
  • Resiliparse-0.14.8-cp38-cp38-macosx_11_0_arm64.whl (2.7 MB): CPython 3.8, macOS 11.0+ ARM64
  • Resiliparse-0.14.8-cp38-cp38-macosx_10_9_x86_64.whl (2.8 MB): CPython 3.8, macOS 10.9+ x86-64
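
The wheel file names listed here follow the standard wheel naming scheme, in which the distribution name, version, Python tag, ABI tag, and platform tag are separated by hyphens. As a rough sketch of how to read them, the hypothetical helper `parse_wheel_name` below splits such a name into its parts. It assumes the name carries no optional build tag, which holds for the files in this listing.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelTags:
    """Split a wheel file name (without a build tag) into its components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    # Components are hyphen-separated; platform tags use underscores internally.
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelTags(*parts)


if __name__ == "__main__":
    tags = parse_wheel_name("Resiliparse-0.14.8-cp312-cp312-manylinux_2_28_x86_64.whl")
    print(tags.python_tag, tags.platform_tag)
```

In practice `pip` performs this matching on its own; the sketch only illustrates which part of the name corresponds to which platform.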

File details

Details for the file resiliparse-0.14.8.tar.gz.

File metadata

  • Download URL: resiliparse-0.14.8.tar.gz
  • Upload date:
  • Size: 88.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for resiliparse-0.14.8.tar.gz
Algorithm Hash digest
SHA256 235659f3e8c9139ae9b498672e2f572d6c6e5fce0c4a51e67efe4f17b3b62592
MD5 8e7490a87b8a0399e809811ea589d3ed
BLAKE2b-256 9c25f5598908bdca1528bfb0f8c5ef3f55a1db773017ecd8e13715f105864b0d

See more details on using hashes here.
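
A downloaded file can be checked against the SHA256 digest published for it. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of` and `matches_sha256` are hypothetical, and the file path is assumed to point at a local copy of the download.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex-encoded SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large wheels are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, this would be called as `matches_sha256(Path("resiliparse-0.14.8.tar.gz"), "235659f3e8c9139ae9b498672e2f572d6c6e5fce0c4a51e67efe4f17b3b62592")`, using the SHA256 value listed above.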

File details

Details for the file Resiliparse-0.14.8-cp312-cp312-win_amd64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 569df9951aee1f6ad0b144649b6788d1a5f1ccf8fc71f51bcc37f63ed40e3d2b
MD5 f0edb926ad47091210732e9c482f8082
BLAKE2b-256 933cffcd9b4198cc49a7b3cbbd55f53cd070bcc6faf91ca02e73459b2d077b5f

File details

Details for the file Resiliparse-0.14.8-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 40964eaa3c0ccfa4033524a3fc0eccc5d6f208fa0618ed0de702f994e720173a
MD5 7be03fd16a3aa34195ef37cf64473544
BLAKE2b-256 54ca9b0e8b2340a566bc411d6b4c79427074e54e4f34c29daf6cc71918b57803

File details

Details for the file Resiliparse-0.14.8-cp312-cp312-manylinux_2_28_aarch64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 75579849d910f05adaf5c311bb239fc8b47daebf0ef64fef8017d99ca0d5e160
MD5 599112507448b9f9b3a7f26e5e0b84c9
BLAKE2b-256 299a67e741d2ad09e9c3cedac1770f2de361890533e7aad0f4e339222be2c771

File details

Details for the file Resiliparse-0.14.8-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e32f0b5ae258262d35c1903c2420e58ea9ad2aac7ac53b1b3636d94d5915ddac
MD5 823f4043e42973ff99886ef610afa235
BLAKE2b-256 96e442655a1669150a3f36f8642cbf66c69dd832ef2d2b69dfefa98a0cd288c7

File details

Details for the file Resiliparse-0.14.8-cp312-cp312-macosx_10_9_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp312-cp312-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 ec16c82a9fac782a2c1bbb219137da34b0df2928c4fa7e428507baec7f8c4c94
MD5 28a50f22433af5ccc31ad0098dd09355
BLAKE2b-256 37f48baf0c1a5644c34aca4952f8b71c6854d6b6194f7fa8aa1a3d3259f4b8e7

File details

Details for the file Resiliparse-0.14.8-cp311-cp311-win_amd64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 e7c9eb8258d8ae40b208a0cfb20f6afa5b5ffc30087c15d27530f9d6675a714c
MD5 be473177137f833808b12064904a10c0
BLAKE2b-256 49e60eae8406bf3c04148fdf52ce2fdd5321a4fc43c2f1a872288422b6a3ee23

File details

Details for the file Resiliparse-0.14.8-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 535c031199e72dc4c0d8bf701a62a72551c67f83c6cc1ee97fa50d59d352efe3
MD5 c1571cf7651d01bae340885821e66b69
BLAKE2b-256 2daaa85a668030b38622dd863f97986e81fa3ffa9dd9cae786acf3185d54a328

File details

Details for the file Resiliparse-0.14.8-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 dc69dac223d7280c99fdd48b168ed4b1eb4c84e798b3f1673922f9f25edaef4c
MD5 7451cdbd510b9c35959ea7675fd97da3
BLAKE2b-256 18c6e3d4509d8812464813d67ed4c25fd0d0a60937deebaad6d57d9787e45448

File details

Details for the file Resiliparse-0.14.8-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 389c938cd88701a7922e602a6e5c36154eedbb245a96d7a37a8255bbfa408087
MD5 fa7935da5e8efaad8bd3a512b2574e11
BLAKE2b-256 d441d0decc4d9103ffa1eeee3576ee659691566cf5efb9c78201226c70ae5522

File details

Details for the file Resiliparse-0.14.8-cp311-cp311-macosx_10_9_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 cd697cc584002ec753903dc49f6fe3e45526bd0370658322aca737711ccaeeb9
MD5 b3761fd5fa1a97ce35bd3ac947811efc
BLAKE2b-256 7f66d95703508f3ec279a8371b90871b7c376e3505223a5b5c2786cc121793b6

File details

Details for the file Resiliparse-0.14.8-cp310-cp310-win_amd64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 8c41d887cc70eb723ea0ddd004e72c5a3877dd887b2d56d3af88afd62e510e99
MD5 41af68fbde11aa509343daa3b0ef5833
BLAKE2b-256 cbd54d8be3c2442dc16b7fe6b38acf65e70be2e4816f01ae8b7741493a376ac7

File details

Details for the file Resiliparse-0.14.8-cp310-cp310-manylinux_2_28_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 2d7c79cf0abe158fbf8fa433e6c201903a20ef36dc9aebec898d9a02fc9cfc6b
MD5 582f6cbcadedd2d54dd8fa1ad097998a
BLAKE2b-256 76482904674148df7e61c095bacea96a41bdb6d2b34d4d806bc97d728b67a8a0

File details

Details for the file Resiliparse-0.14.8-cp310-cp310-manylinux_2_28_aarch64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp310-cp310-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 bb3f162a2a7f30b81fdf1231696f3a635f4d160a04fe2159ccbf004d5ee342c6
MD5 858f219e157a84da57df975dd17bfd78
BLAKE2b-256 2244008bcc1c0abaabe4667b29b7bc4957f63bb5441314cba6a24ca2530b895b

File details

Details for the file Resiliparse-0.14.8-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ef7563f04e2f01269e2d965124eac0273051f7a3c19df6742913795586cf6e1b
MD5 d0e1e9ac0c4c3f9817a90615e1ff4c60
BLAKE2b-256 d98aa6426c5c44f45a16aabde8ae4c9a430a461322b9a31f8d2d752e1ad91b52

File details

Details for the file Resiliparse-0.14.8-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 936668b28fcdff31f7a2f9dbc5fca397e9e4cead5b0f40962bf33641962bdf7f
MD5 9d2ddd37f1e624530b81fee1c23ed8ed
BLAKE2b-256 0473fdfded3287605bb12c5cc5e89244529311bffe06687457c50fbb877a2be0

File details

Details for the file Resiliparse-0.14.8-cp39-cp39-win_amd64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 5920242de45167dc0c340003dde095dcb2a28ff0f40d1c18dcc85be86e50feb9
MD5 f8b52030a831d1a1595df5d7feec1808
BLAKE2b-256 b4552f36279d503c10914cbea7390962fabe3dd20864ef3dbe5af1cb59645142

File details

Details for the file Resiliparse-0.14.8-cp39-cp39-manylinux_2_28_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp39-cp39-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 eff37d64b344ccad6da0c2216b27e23767093711713caedfc3cec7b66f8a45b0
MD5 aeb7ae1a63de667517aeb15bbe1bb2d8
BLAKE2b-256 12b3ebfa49be9b3d6b34456bb48c73d066bf04d5024f4977ff3c35494ab9cd94

File details

Details for the file Resiliparse-0.14.8-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 8cf2c8ad92c83e1156c1fe2546bba89d7dc82cb7ac6a035c0cdbbadbcdfbfd47
MD5 7a91b80759676ab9079f34d2db4a8dc6
BLAKE2b-256 70d9ee9d7a81c99a2b10aaab5ee225c3aa969a19581f5070db1681c719330f25

File details

Details for the file Resiliparse-0.14.8-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 f582da049625269f228b021632e1df6f52eb70fd1cffbce606110b87e736cb5d
MD5 4c517d52afd4a4e7a2b50658ff0e2bdf
BLAKE2b-256 ce0e6959ca5cb401dbb4d1040dc9b423f322c3a96b4765c8451a1b119b7622ba

File details

Details for the file Resiliparse-0.14.8-cp38-cp38-win_amd64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 7dbe7cada9672b9aa8b8e46453f53901a069dc6d69ed69079223bb6558915e5f
MD5 dfe65bde3bbfe7a30f3e48ae3449f6fa
BLAKE2b-256 0592328764e11036beb5126be193c67a6d70fa040892c9ca60ff5f1e5777f5c1

File details

Details for the file Resiliparse-0.14.8-cp38-cp38-manylinux_2_28_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp38-cp38-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 8fd4484841145d1ab9868da0df7b0499a808e16359b88e4a7b90af871fc5e190
MD5 80bdde3ff6816e30f4b9994aa92fa7e2
BLAKE2b-256 7f43506491a14601569ef70d131c8a782a557d51587f0153a4c36293575ca19f

File details

Details for the file Resiliparse-0.14.8-cp38-cp38-macosx_11_0_arm64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp38-cp38-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 78517631f43fa7c5c4d3bd649a9dca02f3da2c3e4da221979a3e19d99f876353
MD5 10c660c0241cebb90c18446b9334c0d2
BLAKE2b-256 d8e01ecb48e613c6a9298f793928f9f375529f62a1d65e9d0f95a54a045e6dca

File details

Details for the file Resiliparse-0.14.8-cp38-cp38-macosx_10_9_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.8-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 a7e835b8ec76a29d80f284daca45c4b7fdbbf95b54c9c31a60478b4ddd20fa22
MD5 f1b9f0ebf485fa92940f24d93a0a34f4
BLAKE2b-256 cd318d25bcccd75eb24e5eefce6d6f83ceb1dab978bcecad3488b6e9d284a952
