ChatNoir Resiliparse

A collection of robust and fast processing tools for parsing and analyzing (not only) web archive data.

Resiliparse is a part of the ChatNoir web analytics toolkit.

Installing Resiliparse

Pre-built Resiliparse binaries can be installed from PyPI:

pip install resiliparse
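
To confirm that the installation succeeded, you can query the installed package metadata from Python. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of Resiliparse.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not a Resiliparse API.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("resiliparse")
    if version is None:
        print("Resiliparse is not installed in this environment")
    else:
        print(f"Resiliparse {version} is installed")
```

Run this with the same interpreter that `pip` installed into; a virtual environment mismatch is a common reason for the package not being found.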

Building Resiliparse From Source

You can compile Resiliparse either from the PyPI source package or directly from this repository. Either way, you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:

# Add Lexbor repository
curl -L https://lexbor.com/keys/lexbor_signing.key | sudo apt-key add -
echo "deb https://packages.lexbor.com/ubuntu/ $(lsb_release -sc) liblexbor" | \
    sudo tee /etc/apt/sources.list.d/lexbor.list

# Install build dependencies
sudo apt update
sudo apt install build-essential python3-dev libuchardet-dev liblexbor-dev libre2-dev

To build and install Resiliparse from the PyPI source package, run:

pip install --no-binary resiliparse resiliparse

That's it. If you prefer to build and install directly from this repository instead, run:

pip install -e resiliparse

To build the wheels without installing them, run:

pip wheel -e resiliparse

# Or:
pip install build && python -m build --wheel resiliparse
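
The resulting wheel files follow the standard wheel naming scheme (`name-version[-build]-python-abi-platform.whl`), so the file name tells you which interpreter and platform a wheel was built for. The sketch below splits such a name into its components; `WheelTags` and `parse_wheel_filename` are hypothetical helpers written for illustration, and the example file name is only a sample of what a build might produce.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    """Components of a wheel file name (illustrative type, not a library API)."""
    name: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No optional build tag present.
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between version and Python tag.
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelTags(name, version, build, python_tag, abi_tag, platform_tag)


if __name__ == "__main__":
    tags = parse_wheel_filename("Resiliparse-0.14.3-cp311-cp311-win_amd64.whl")
    print(tags.python_tag, tags.platform_tag)
```

For production use, the `packaging` library provides a more thorough parser; the manual split here is only meant to show how the name is structured.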

Usage Instructions

For detailed usage instructions, please consult the Resiliparse User Manual.

Cite Us

If you use ChatNoir or Resiliparse, please consider citing our ECIR 2018 demo paper:

@InProceedings{bevendorff:2018,
  address =             {Berlin Heidelberg New York},
  author =              {Janek Bevendorff and Benno Stein and Matthias Hagen and Martin Potthast},
  booktitle =           {Advances in Information Retrieval. 40th European Conference on IR Research (ECIR 2018)},
  editor =              {Leif Azzopardi and Allan Hanbury and Gabriella Pasi and Benjamin Piwowarski},
  month =               mar,
  publisher =           {Springer},
  series =              {Lecture Notes in Computer Science},
  site =                {Grenoble, France},
  title =               {{Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl}},
  year =                2018
}
