ChatNoir Resiliparse
A collection of robust and fast processing tools for parsing and analyzing (not only) web archive data.
Resiliparse is a part of the ChatNoir web analytics toolkit.
Installing Resiliparse
Pre-built Resiliparse binaries can be installed from PyPI:
pip install resiliparse
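To confirm that the install worked, you can ask Python for the installed distribution's version. This is a minimal sketch that uses only the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical and not part of Resiliparse.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "resiliparse") -> Optional[str]:
    """Return the installed version of *dist_name*, or None if it is absent.

    Hypothetical helper: it reads package metadata only and does not
    import Resiliparse itself.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "resiliparse is not installed")
```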
Building Resiliparse From Source
You can compile Resiliparse either from the PyPI source package or directly from this repository. In either case, you first need to install all required build-time dependencies. On Ubuntu, this is done as follows:
# Add Lexbor repository
curl -sL https://lexbor.com/keys/lexbor_signing.key | \
sudo gpg --dearmor --output /etc/apt/trusted.gpg.d/lexbor.gpg
echo "deb https://packages.lexbor.com/ubuntu/ $(lsb_release -sc) liblexbor" | \
sudo tee /etc/apt/sources.list.d/lexbor.list
# Install build dependencies (requires libre2-dev>=2022-04-01)
sudo apt update
sudo apt install build-essential python3-dev libuchardet-dev liblexbor-dev libre2-dev
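The `libre2-dev>=2022-04-01` requirement is easy to miss, so it can help to check the installed version before starting a build. The following sketch assumes a Debian/Ubuntu system with `dpkg-query` on the `PATH` and re2 package versions that begin with an eight-digit release date (for example `20220401+dfsg-1`). That version format and both helper names are assumptions for illustration, not part of Resiliparse.

```python
import re
import shutil
import subprocess
from typing import Optional

# Minimum re2 release from the build notes above, written as YYYYMMDD.
MIN_RE2_DATE = 20220401


def installed_deb_version(package: str) -> Optional[str]:
    """Return the installed Debian package version, or None if unavailable."""
    if shutil.which("dpkg-query") is None:
        return None  # not a dpkg-based system
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Version}", package],
        capture_output=True,
        text=True,
    )
    found = result.stdout.strip()
    return found if result.returncode == 0 and found else None


def re2_version_ok(deb_version: str) -> bool:
    """Check a date-based re2 package version such as '20220401+dfsg-1'.

    ASSUMPTION: the upstream part starts with YYYYMMDD, optionally
    preceded by a Debian epoch such as '1:'.
    """
    match = re.match(r"(?:\d+:)?(\d{8})", deb_version)
    if match is None:
        raise ValueError(f"unrecognised re2 version: {deb_version!r}")
    return int(match.group(1)) >= MIN_RE2_DATE


if __name__ == "__main__":
    found = installed_deb_version("libre2-dev")
    if found is None:
        print("libre2-dev not found (or not a dpkg-based system)")
    else:
        print(found, "OK" if re2_version_ok(found) else "too old")
```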
To build and install Resiliparse from the PyPI source package, run:
pip install --no-binary resiliparse resiliparse
That's it. If you prefer to build and install directly from this repository instead, run the following from the repository root (this performs an editable install):
pip install -e resiliparse
To build the wheels without installing them, run:
pip wheel -e resiliparse
# Or:
pip install build && python -m build --wheel resiliparse
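Because Resiliparse contains compiled extensions, a correctly built wheel should be platform-specific. One way to check this is to read the `WHEEL` metadata file inside the wheel's `.dist-info` directory (defined by the wheel format, PEP 427), which lists the wheel's `Tag` values and its `Root-Is-Purelib` flag. This is a stdlib-only sketch; the helper name `wheel_tags` is hypothetical.

```python
import zipfile
from email.parser import Parser
from typing import List, Tuple


def wheel_tags(wheel_path: str) -> Tuple[List[str], bool]:
    """Return (tags, root_is_purelib) from a wheel's *.dist-info/WHEEL file."""
    with zipfile.ZipFile(wheel_path) as zf:
        names = [n for n in zf.namelist() if n.endswith(".dist-info/WHEEL")]
        if len(names) != 1:
            raise ValueError(f"expected one .dist-info/WHEEL file, found {len(names)}")
        text = zf.read(names[0]).decode("utf-8")
    # The WHEEL file is a block of RFC 822-style "Key: value" headers.
    meta = Parser().parsestr(text)
    tags = meta.get_all("Tag") or []
    purelib = meta.get("Root-Is-Purelib", "").strip().lower() == "true"
    return tags, purelib


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        tags, purelib = wheel_tags(path)
        print(path, tags, "pure Python" if purelib else "platform-specific")
```

For a compiled build you would expect `Root-Is-Purelib` to be false and the tag to name your interpreter and platform.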
Usage Instructions
For detailed usage instructions, please consult the Resiliparse User Manual.
Cite Us
If you use ChatNoir or Resiliparse, please consider citing our ECIR 2018 demo paper:
@InProceedings{bevendorff:2018,
address = {Berlin Heidelberg New York},
author = {Janek Bevendorff and Benno Stein and Matthias Hagen and Martin Potthast},
booktitle = {Advances in Information Retrieval. 40th European Conference on IR Research (ECIR 2018)},
editor = {Leif Azzopardi and Allan Hanbury and Gabriella Pasi and Benjamin Piwowarski},
month = mar,
publisher = {Springer},
series = {Lecture Notes in Computer Science},
site = {Grenoble, France},
title = {{Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl}},
year = 2018
}
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
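Each wheel's filename encodes the interpreter, ABI, and platform it was built for, following the `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl` layout from PEP 427. `pip install resiliparse` does this matching for you, so decoding names by hand only matters when you download files manually. The sketch below is a stdlib-only illustration with hypothetical helper names; the `packaging` library's `packaging.utils.parse_wheel_filename` and `packaging.tags.sys_tags()` are the more complete tools for this.

```python
import re
import sys
from typing import Iterable, List, NamedTuple, Optional

# PEP 427 layout: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
_WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)(?:-(?P<build>\d[^-]*))?"
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)


class WheelName(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel filename into its tag components."""
    match = _WHEEL_RE.match(filename)
    if match is None:
        raise ValueError(f"not a wheel filename: {filename!r}")
    return WheelName(**match.groupdict())


def for_this_python(filenames: Iterable[str]) -> List[str]:
    """Keep wheels whose CPython tag matches the running interpreter.

    Only the interpreter tag (e.g. 'cp312') is compared; you still pick the
    platform tag (Windows, manylinux, or macOS; x86-64 or ARM64) yourself.
    """
    tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return [
        f for f in filenames
        if f.endswith(".whl") and parse_wheel_name(f).python == tag
    ]
```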
File details
Details for the file resiliparse-0.14.8.tar.gz
File metadata
- Download URL: resiliparse-0.14.8.tar.gz
- Size: 88.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 235659f3e8c9139ae9b498672e2f572d6c6e5fce0c4a51e67efe4f17b3b62592
MD5 | 8e7490a87b8a0399e809811ea589d3ed
BLAKE2b-256 | 9c25f5598908bdca1528bfb0f8c5ef3f55a1db773017ecd8e13715f105864b0d
File details
Details for the file Resiliparse-0.14.8-cp312-cp312-win_amd64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp312-cp312-win_amd64.whl
- Size: 2.5 MB
- Tags: CPython 3.12, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 569df9951aee1f6ad0b144649b6788d1a5f1ccf8fc71f51bcc37f63ed40e3d2b
MD5 | f0edb926ad47091210732e9c482f8082
BLAKE2b-256 | 933cffcd9b4198cc49a7b3cbbd55f53cd070bcc6faf91ca02e73459b2d077b5f
File details
Details for the file Resiliparse-0.14.8-cp312-cp312-manylinux_2_28_x86_64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp312-cp312-manylinux_2_28_x86_64.whl
- Size: 6.1 MB
- Tags: CPython 3.12, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 40964eaa3c0ccfa4033524a3fc0eccc5d6f208fa0618ed0de702f994e720173a
MD5 | 7be03fd16a3aa34195ef37cf64473544
BLAKE2b-256 | 54ca9b0e8b2340a566bc411d6b4c79427074e54e4f34c29daf6cc71918b57803
File details
Details for the file Resiliparse-0.14.8-cp312-cp312-manylinux_2_28_aarch64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp312-cp312-manylinux_2_28_aarch64.whl
- Size: 6.0 MB
- Tags: CPython 3.12, manylinux: glibc 2.28+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 75579849d910f05adaf5c311bb239fc8b47daebf0ef64fef8017d99ca0d5e160
MD5 | 599112507448b9f9b3a7f26e5e0b84c9
BLAKE2b-256 | 299a67e741d2ad09e9c3cedac1770f2de361890533e7aad0f4e339222be2c771
File details
Details for the file Resiliparse-0.14.8-cp312-cp312-macosx_11_0_arm64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp312-cp312-macosx_11_0_arm64.whl
- Size: 2.7 MB
- Tags: CPython 3.12, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | e32f0b5ae258262d35c1903c2420e58ea9ad2aac7ac53b1b3636d94d5915ddac
MD5 | 823f4043e42973ff99886ef610afa235
BLAKE2b-256 | 96e442655a1669150a3f36f8642cbf66c69dd832ef2d2b69dfefa98a0cd288c7
File details
Details for the file Resiliparse-0.14.8-cp312-cp312-macosx_10_9_x86_64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp312-cp312-macosx_10_9_x86_64.whl
- Size: 2.8 MB
- Tags: CPython 3.12, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | ec16c82a9fac782a2c1bbb219137da34b0df2928c4fa7e428507baec7f8c4c94
MD5 | 28a50f22433af5ccc31ad0098dd09355
BLAKE2b-256 | 37f48baf0c1a5644c34aca4952f8b71c6854d6b6194f7fa8aa1a3d3259f4b8e7
File details
Details for the file Resiliparse-0.14.8-cp311-cp311-win_amd64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp311-cp311-win_amd64.whl
- Size: 2.5 MB
- Tags: CPython 3.11, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | e7c9eb8258d8ae40b208a0cfb20f6afa5b5ffc30087c15d27530f9d6675a714c
MD5 | be473177137f833808b12064904a10c0
BLAKE2b-256 | 49e60eae8406bf3c04148fdf52ce2fdd5321a4fc43c2f1a872288422b6a3ee23
File details
Details for the file Resiliparse-0.14.8-cp311-cp311-manylinux_2_28_x86_64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp311-cp311-manylinux_2_28_x86_64.whl
- Size: 6.1 MB
- Tags: CPython 3.11, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 535c031199e72dc4c0d8bf701a62a72551c67f83c6cc1ee97fa50d59d352efe3
MD5 | c1571cf7651d01bae340885821e66b69
BLAKE2b-256 | 2daaa85a668030b38622dd863f97986e81fa3ffa9dd9cae786acf3185d54a328
File details
Details for the file Resiliparse-0.14.8-cp311-cp311-manylinux_2_28_aarch64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp311-cp311-manylinux_2_28_aarch64.whl
- Size: 6.0 MB
- Tags: CPython 3.11, manylinux: glibc 2.28+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | dc69dac223d7280c99fdd48b168ed4b1eb4c84e798b3f1673922f9f25edaef4c
MD5 | 7451cdbd510b9c35959ea7675fd97da3
BLAKE2b-256 | 18c6e3d4509d8812464813d67ed4c25fd0d0a60937deebaad6d57d9787e45448
File details
Details for the file Resiliparse-0.14.8-cp311-cp311-macosx_11_0_arm64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp311-cp311-macosx_11_0_arm64.whl
- Size: 2.7 MB
- Tags: CPython 3.11, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 389c938cd88701a7922e602a6e5c36154eedbb245a96d7a37a8255bbfa408087
MD5 | fa7935da5e8efaad8bd3a512b2574e11
BLAKE2b-256 | d441d0decc4d9103ffa1eeee3576ee659691566cf5efb9c78201226c70ae5522
File details
Details for the file Resiliparse-0.14.8-cp311-cp311-macosx_10_9_x86_64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp311-cp311-macosx_10_9_x86_64.whl
- Size: 2.8 MB
- Tags: CPython 3.11, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | cd697cc584002ec753903dc49f6fe3e45526bd0370658322aca737711ccaeeb9
MD5 | b3761fd5fa1a97ce35bd3ac947811efc
BLAKE2b-256 | 7f66d95703508f3ec279a8371b90871b7c376e3505223a5b5c2786cc121793b6
File details
Details for the file Resiliparse-0.14.8-cp310-cp310-win_amd64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp310-cp310-win_amd64.whl
- Size: 2.5 MB
- Tags: CPython 3.10, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8c41d887cc70eb723ea0ddd004e72c5a3877dd887b2d56d3af88afd62e510e99
MD5 | 41af68fbde11aa509343daa3b0ef5833
BLAKE2b-256 | cbd54d8be3c2442dc16b7fe6b38acf65e70be2e4816f01ae8b7741493a376ac7
File details
Details for the file Resiliparse-0.14.8-cp310-cp310-manylinux_2_28_x86_64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp310-cp310-manylinux_2_28_x86_64.whl
- Size: 5.9 MB
- Tags: CPython 3.10, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2d7c79cf0abe158fbf8fa433e6c201903a20ef36dc9aebec898d9a02fc9cfc6b
MD5 | 582f6cbcadedd2d54dd8fa1ad097998a
BLAKE2b-256 | 76482904674148df7e61c095bacea96a41bdb6d2b34d4d806bc97d728b67a8a0
File details
Details for the file Resiliparse-0.14.8-cp310-cp310-manylinux_2_28_aarch64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp310-cp310-manylinux_2_28_aarch64.whl
- Size: 5.8 MB
- Tags: CPython 3.10, manylinux: glibc 2.28+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | bb3f162a2a7f30b81fdf1231696f3a635f4d160a04fe2159ccbf004d5ee342c6
MD5 | 858f219e157a84da57df975dd17bfd78
BLAKE2b-256 | 2244008bcc1c0abaabe4667b29b7bc4957f63bb5441314cba6a24ca2530b895b
File details
Details for the file Resiliparse-0.14.8-cp310-cp310-macosx_11_0_arm64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp310-cp310-macosx_11_0_arm64.whl
- Size: 2.7 MB
- Tags: CPython 3.10, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | ef7563f04e2f01269e2d965124eac0273051f7a3c19df6742913795586cf6e1b
MD5 | d0e1e9ac0c4c3f9817a90615e1ff4c60
BLAKE2b-256 | d98aa6426c5c44f45a16aabde8ae4c9a430a461322b9a31f8d2d752e1ad91b52
File details
Details for the file Resiliparse-0.14.8-cp310-cp310-macosx_10_9_x86_64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp310-cp310-macosx_10_9_x86_64.whl
- Size: 2.8 MB
- Tags: CPython 3.10, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 936668b28fcdff31f7a2f9dbc5fca397e9e4cead5b0f40962bf33641962bdf7f
MD5 | 9d2ddd37f1e624530b81fee1c23ed8ed
BLAKE2b-256 | 0473fdfded3287605bb12c5cc5e89244529311bffe06687457c50fbb877a2be0
File details
Details for the file Resiliparse-0.14.8-cp39-cp39-win_amd64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp39-cp39-win_amd64.whl
- Size: 2.5 MB
- Tags: CPython 3.9, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5920242de45167dc0c340003dde095dcb2a28ff0f40d1c18dcc85be86e50feb9
MD5 | f8b52030a831d1a1595df5d7feec1808
BLAKE2b-256 | b4552f36279d503c10914cbea7390962fabe3dd20864ef3dbe5af1cb59645142
File details
Details for the file Resiliparse-0.14.8-cp39-cp39-manylinux_2_28_x86_64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp39-cp39-manylinux_2_28_x86_64.whl
- Size: 5.9 MB
- Tags: CPython 3.9, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | eff37d64b344ccad6da0c2216b27e23767093711713caedfc3cec7b66f8a45b0
MD5 | aeb7ae1a63de667517aeb15bbe1bb2d8
BLAKE2b-256 | 12b3ebfa49be9b3d6b34456bb48c73d066bf04d5024f4977ff3c35494ab9cd94
File details
Details for the file Resiliparse-0.14.8-cp39-cp39-macosx_11_0_arm64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp39-cp39-macosx_11_0_arm64.whl
- Size: 2.7 MB
- Tags: CPython 3.9, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8cf2c8ad92c83e1156c1fe2546bba89d7dc82cb7ac6a035c0cdbbadbcdfbfd47
MD5 | 7a91b80759676ab9079f34d2db4a8dc6
BLAKE2b-256 | 70d9ee9d7a81c99a2b10aaab5ee225c3aa969a19581f5070db1681c719330f25
File details
Details for the file Resiliparse-0.14.8-cp39-cp39-macosx_10_9_x86_64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp39-cp39-macosx_10_9_x86_64.whl
- Size: 2.8 MB
- Tags: CPython 3.9, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | f582da049625269f228b021632e1df6f52eb70fd1cffbce606110b87e736cb5d
MD5 | 4c517d52afd4a4e7a2b50658ff0e2bdf
BLAKE2b-256 | ce0e6959ca5cb401dbb4d1040dc9b423f322c3a96b4765c8451a1b119b7622ba
File details
Details for the file Resiliparse-0.14.8-cp38-cp38-win_amd64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp38-cp38-win_amd64.whl
- Size: 2.5 MB
- Tags: CPython 3.8, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7dbe7cada9672b9aa8b8e46453f53901a069dc6d69ed69079223bb6558915e5f
MD5 | dfe65bde3bbfe7a30f3e48ae3449f6fa
BLAKE2b-256 | 0592328764e11036beb5126be193c67a6d70fa040892c9ca60ff5f1e5777f5c1
File details
Details for the file Resiliparse-0.14.8-cp38-cp38-manylinux_2_28_x86_64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp38-cp38-manylinux_2_28_x86_64.whl
- Size: 5.9 MB
- Tags: CPython 3.8, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8fd4484841145d1ab9868da0df7b0499a808e16359b88e4a7b90af871fc5e190
MD5 | 80bdde3ff6816e30f4b9994aa92fa7e2
BLAKE2b-256 | 7f43506491a14601569ef70d131c8a782a557d51587f0153a4c36293575ca19f
File details
Details for the file Resiliparse-0.14.8-cp38-cp38-macosx_11_0_arm64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp38-cp38-macosx_11_0_arm64.whl
- Size: 2.7 MB
- Tags: CPython 3.8, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 78517631f43fa7c5c4d3bd649a9dca02f3da2c3e4da221979a3e19d99f876353
MD5 | 10c660c0241cebb90c18446b9334c0d2
BLAKE2b-256 | d8e01ecb48e613c6a9298f793928f9f375529f62a1d65e9d0f95a54a045e6dca
File details
Details for the file Resiliparse-0.14.8-cp38-cp38-macosx_10_9_x86_64.whl
File metadata
- Download URL: Resiliparse-0.14.8-cp38-cp38-macosx_10_9_x86_64.whl
- Size: 2.8 MB
- Tags: CPython 3.8, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | a7e835b8ec76a29d80f284daca45c4b7fdbbf95b54c9c31a60478b4ddd20fa22
MD5 | f1b9f0ebf485fa92940f24d93a0a34f4
BLAKE2b-256 | cd318d25bcccd75eb24e5eefce6d6f83ceb1dab978bcecad3488b6e9d284a952
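To check a downloaded file against the published digests, you can recompute them locally. This is a minimal sketch using Python's `hashlib`, where BLAKE2b-256 corresponds to `hashlib.blake2b` with a 32-byte digest. The helper names `file_digests` and `verify_sha256` are hypothetical. (pip's own `pip hash <file>` command prints the SHA256 digest as well.)

```python
import hashlib
from typing import Dict


def file_digests(path: str, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Compute SHA256, MD5 and BLAKE2b-256 hex digests of a file in one pass."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 256-bit BLAKE2b
    }
    with open(path, "rb") as fh:
        # Stream the file in chunks so large wheels need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path: str, expected: str) -> bool:
    """Compare a file's SHA256 digest with a published value (case-insensitive)."""
    return file_digests(path)["SHA256"] == expected.strip().lower()
```

For example, assuming the source distribution was saved as `resiliparse-0.14.8.tar.gz` in the current directory, `verify_sha256("resiliparse-0.14.8.tar.gz", "235659f3e8c9139ae9b498672e2f572d6c6e5fce0c4a51e67efe4f17b3b62592")` should return `True` for an intact download.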