
ChatNoir Resiliparse

A collection of robust and fast processing tools for parsing and analyzing (not only) web archive data.

Resiliparse is a part of the ChatNoir web analytics toolkit.

Installing Resiliparse

Pre-built Resiliparse binaries can be installed from PyPI:

pip install resiliparse
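To confirm that the package is installed, here is a minimal sketch that uses the standard-library importlib.metadata module (Python 3.8+). The helper name installed_version is hypothetical. It only reads the installed package metadata and does not import Resiliparse's compiled extension modules.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Hypothetical check: prints the version string if the package is installed.
    found = installed_version("resiliparse")
    print(f"Resiliparse {found}" if found else "Resiliparse is not installed")
```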

Building Resiliparse From Source

You can compile Resiliparse either from the PyPI source package or directly from this repository. In either case, you first need to install all required build-time dependencies. On Ubuntu, this is done as follows:

# Add Lexbor repository
curl -L https://lexbor.com/keys/lexbor_signing.key | sudo apt-key add -
echo "deb https://packages.lexbor.com/ubuntu/ $(lsb_release -sc) liblexbor" | \
    sudo tee /etc/apt/sources.list.d/lexbor.list

# Install build dependencies
sudo apt update
sudo apt install build-essential python3-dev libuchardet-dev liblexbor-dev libre2-dev
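Before you build, you can check whether the system linker can see the native libraries behind these packages. The sketch below uses ctypes.util.find_library from the Python standard library. It assumes that the shared-library names lexbor, uchardet and re2 correspond to the -dev packages above. It is only a heuristic: it does not check for the header files that the build also needs. The function name find_build_libraries is hypothetical.

```python
import ctypes.util
from typing import Dict, Optional, Tuple

# Assumed shared-library names for liblexbor-dev, libuchardet-dev and libre2-dev.
REQUIRED_LIBS: Tuple[str, ...] = ("lexbor", "uchardet", "re2")


def find_build_libraries(names: Tuple[str, ...] = REQUIRED_LIBS) -> Dict[str, Optional[str]]:
    """Map each library name to the file found by the system linker, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    missing = [name for name, path in find_build_libraries().items() if path is None]
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All build libraries found.")
```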

To build and install Resiliparse from the PyPI source package, run:

pip install --no-binary resiliparse resiliparse

That's it. If you prefer to build and install directly from this repository instead, run the following from the repository root (the -e flag installs the package in editable mode):

pip install -e resiliparse

To build the wheels without installing them, run:

pip wheel -e resiliparse

# Or:
pip install build && python -m build --wheel resiliparse

Usage Instructions

For detailed usage instructions, please consult the Resiliparse User Manual.

Cite Us

If you use ChatNoir or Resiliparse, please consider citing our ECIR 2018 demo paper:

@InProceedings{bevendorff:2018,
  address =             {Berlin Heidelberg New York},
  author =              {Janek Bevendorff and Benno Stein and Matthias Hagen and Martin Potthast},
  booktitle =           {Advances in Information Retrieval. 40th European Conference on IR Research (ECIR 2018)},
  editor =              {Leif Azzopardi and Allan Hanbury and Gabriella Pasi and Benjamin Piwowarski},
  month =               mar,
  publisher =           {Springer},
  series =              {Lecture Notes in Computer Science},
  site =                {Grenoble, France},
  title =               {{Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl}},
  year =                2018
}

Download files

Download the file for your platform.

Source Distribution

  • Resiliparse-0.13.3.tar.gz (601.8 kB): Source

Built Distributions

  • Resiliparse-0.13.3-cp310-cp310-win_amd64.whl (2.1 MB): CPython 3.10, Windows x86-64
  • Resiliparse-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.1 MB): CPython 3.10, manylinux (glibc 2.17+) x86-64
  • Resiliparse-0.13.3-cp310-cp310-macosx_10_14_x86_64.whl (1.8 MB): CPython 3.10, macOS 10.14+ x86-64
  • Resiliparse-0.13.3-cp39-cp39-win_amd64.whl (2.1 MB): CPython 3.9, Windows x86-64
  • Resiliparse-0.13.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.2 MB): CPython 3.9, manylinux (glibc 2.17+) x86-64
  • Resiliparse-0.13.3-cp39-cp39-macosx_10_14_x86_64.whl (1.8 MB): CPython 3.9, macOS 10.14+ x86-64
  • Resiliparse-0.13.3-cp38-cp38-win_amd64.whl (2.1 MB): CPython 3.8, Windows x86-64
  • Resiliparse-0.13.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.2 MB): CPython 3.8, manylinux (glibc 2.17+) x86-64
  • Resiliparse-0.13.3-cp38-cp38-macosx_10_14_x86_64.whl (1.8 MB): CPython 3.8, macOS 10.14+ x86-64
  • Resiliparse-0.13.3-cp37-cp37m-win_amd64.whl (2.1 MB): CPython 3.7m, Windows x86-64
  • Resiliparse-0.13.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.0 MB): CPython 3.7m, manylinux (glibc 2.17+) x86-64
  • Resiliparse-0.13.3-cp37-cp37m-macosx_10_14_x86_64.whl (1.8 MB): CPython 3.7m, macOS 10.14+ x86-64
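The built distribution file names follow the wheel naming convention from PEP 427: {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl. A tag field may hold several tags joined by dots, as in the manylinux wheels above. The small parser below is an illustrative sketch only, and its function and field names are hypothetical. It makes the components explicit:

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: Tuple[str, ...]
    abi_tags: Tuple[str, ...]
    platform_tags: Tuple[str, ...]


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its PEP 427 components.

    Each tag field may be a compressed tag set joined by '.', e.g.
    'manylinux_2_17_x86_64.manylinux2014_x86_64'.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, ver, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        # Optional build tag sits between the version and the python tag.
        name, ver, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(
        name, ver, build,
        tuple(py.split(".")), tuple(abi.split(".")), tuple(plat.split(".")),
    )


w = parse_wheel_filename(
    "Resiliparse-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(w.platform_tags)  # ('manylinux_2_17_x86_64', 'manylinux2014_x86_64')
```

The two platform tags mean the same file is valid under both names, because manylinux2014 is an alias for manylinux_2_17 (PEP 600). Real installers should use a complete implementation such as packaging.utils.parse_wheel_filename from the packaging library.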

File details

Details for the file Resiliparse-0.13.3.tar.gz.

File metadata

  • Download URL: Resiliparse-0.13.3.tar.gz
  • Size: 601.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for Resiliparse-0.13.3.tar.gz
Algorithm     Hash digest
SHA256        13b028ee2a7aa562b219b3ec4884425d57f8dbde3a38c4197ea70aecb0084072
MD5           e9e912fd30d47811171f77e9b95dbab6
BLAKE2b-256   4f0c6d50a7ae73fe5698612696158207b284757dac6b59b630583251013f8158

File details

Details for the file Resiliparse-0.13.3-cp310-cp310-win_amd64.whl.

File hashes

Hashes for Resiliparse-0.13.3-cp310-cp310-win_amd64.whl
Algorithm     Hash digest
SHA256        6125374bf501c37c8476e9e5798f2034446519aa6875c20f4988c3ccf4938697
MD5           7029e5b7c80509d650c7b01f4707b578
BLAKE2b-256   00b17fb89c173f1923cc59c0fe6accbd66edbc7e6a4730f02be62357cba98597

File details

Details for the file Resiliparse-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for Resiliparse-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm     Hash digest
SHA256        c3e59fe5b98ca8eb9c9ae51defd2f11d6e6ceb676b47aec137003d35573d16b8
MD5           8dca71bc41ddbc9beb1023001c08a2fe
BLAKE2b-256   7f5d3feacb84d3dd6796b52dcc8487e7ac849725173f8e9fc065dbd3e3512e61

File details

Details for the file Resiliparse-0.13.3-cp310-cp310-macosx_10_14_x86_64.whl.

File hashes

Hashes for Resiliparse-0.13.3-cp310-cp310-macosx_10_14_x86_64.whl
Algorithm     Hash digest
SHA256        f4d7f36bc90dc95c9287427181373b6d7b82aaa17e2b95ce3c8f87864ca22ecd
MD5           508b09f4027b2c7ea017bd98222a8a9f
BLAKE2b-256   2caa648ae6441c9a4c32eb772ce11bb5258a46579effaf7fdbaa7797a2f60028

File details

Details for the file Resiliparse-0.13.3-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: Resiliparse-0.13.3-cp39-cp39-win_amd64.whl
  • Size: 2.1 MB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for Resiliparse-0.13.3-cp39-cp39-win_amd64.whl
Algorithm     Hash digest
SHA256        20d394ee3d9e75d2679d38df86ddbbad828c90c79c78c5b9cfe14a2cd3257bd2
MD5           3521b7dcb0955764e11796e2fc28dd04
BLAKE2b-256   9804df6972275b2c4bafdc953b85b0a8aa475da82c4771865ae1aa4b9a72c31e

File details

Details for the file Resiliparse-0.13.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for Resiliparse-0.13.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm     Hash digest
SHA256        803df54f826bfaa5c1bf0307ad768a73a8d7ffafe8f00cb0034114a53bc8ce77
MD5           7de52f617d4cb7edde36070d3b4ec71e
BLAKE2b-256   c8c8164d588731cb667583de6e87ccf1de4cdd56dc7b5f75b074c74f327d992d

File details

Details for the file Resiliparse-0.13.3-cp39-cp39-macosx_10_14_x86_64.whl.

File hashes

Hashes for Resiliparse-0.13.3-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm     Hash digest
SHA256        11b6a966ea6ab278ab60b60e18fe2f2279d5c527c57328cd9898cb5b20efd177
MD5           2d19295120776aef430f8073097d7ffd
BLAKE2b-256   7d315b5528eb7cfd8d1cb6adf563025d1ede5dd45ebc09a2e6bc932006f4c3e2

File details

Details for the file Resiliparse-0.13.3-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: Resiliparse-0.13.3-cp38-cp38-win_amd64.whl
  • Size: 2.1 MB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for Resiliparse-0.13.3-cp38-cp38-win_amd64.whl
Algorithm     Hash digest
SHA256        7f3546ee023ecb128fd2b28494396a6331aa5b512e6a34b4a48332c158b5be8c
MD5           63abc94227d0b76c4f41b1d869e96451
BLAKE2b-256   752138171e79429a67ff439993d9e32cce7dc3ea42dfc95caf3c7d6c77959703

File details

Details for the file Resiliparse-0.13.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for Resiliparse-0.13.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm     Hash digest
SHA256        e6d528aa24056880c229e7042dcf966137c6a17e0ef6228f1f5d1cac0b15e229
MD5           5608380f349128c1a148cba14ec5b6cb
BLAKE2b-256   c37975661f62c679f36168539f20cf0c9188e882966efc2b9acbe5528e70753e

File details

Details for the file Resiliparse-0.13.3-cp38-cp38-macosx_10_14_x86_64.whl.

File hashes

Hashes for Resiliparse-0.13.3-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm     Hash digest
SHA256        7ecab15592d635bf64057f68efe60133ac584a6c94af52f5a5939dee5b18e9ab
MD5           7c05ed30c06d927b4e84c83a9bde7320
BLAKE2b-256   fe8caa0b957b1639bb73c25f5c0e3feed890fd66b41d895617a0502129dc9f95

File details

Details for the file Resiliparse-0.13.3-cp37-cp37m-win_amd64.whl.

File hashes

Hashes for Resiliparse-0.13.3-cp37-cp37m-win_amd64.whl
Algorithm     Hash digest
SHA256        ec675d71d2707aced67f08da4d4c5435094e6478909272d0e7e0348df9230571
MD5           46fb91d03dbdef4310af7b2f8ee296d7
BLAKE2b-256   f2ae0540a84673635060aecc6512478f72d417f6416bf89a0f42cf5a6c9ea7d0

File details

Details for the file Resiliparse-0.13.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for Resiliparse-0.13.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm     Hash digest
SHA256        768d74d74c600b00fee19a587bacba9a7dbf38bbc798d6c8c6380853944e778d
MD5           bebe5f533514d1eaa7c0a23053a98920
BLAKE2b-256   57054b8b48938c672e1efcb82a2365bedb86a9b3d38481e768fb841f2159ed66

File details

Details for the file Resiliparse-0.13.3-cp37-cp37m-macosx_10_14_x86_64.whl.

File hashes

Hashes for Resiliparse-0.13.3-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm     Hash digest
SHA256        905e3852e8727f2566e9f718b6a55e9982f44ca34b27b9e3ad0ab54555454ee0
MD5           6ce5396aed56f81024a44baaa608f55c
BLAKE2b-256   d22d114478c8ba28e02efc7fb90d4e4d0ed23038a665275495b2ea28073db1fe
