ChatNoir Resiliparse

A collection of robust and fast processing tools for parsing and analyzing (not only) web archive data.

Resiliparse is a part of the ChatNoir web analytics toolkit.

Installing Resiliparse

Pre-built Resiliparse binaries can be installed from PyPI:

pip install resiliparse
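To confirm that the install landed in the environment you expect, a quick check like the following may help. This is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical and not part of Resiliparse.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "resiliparse" is the distribution name used in the pip command above.
    version = installed_version("resiliparse")
    if version is None:
        print("resiliparse is not installed in this environment")
    else:
        print(f"resiliparse {version} is installed")
```

Run it with the same interpreter that `pip` installed into; a mismatch between the two is a common reason for a package appearing to be missing.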

Building Resiliparse From Source

You can compile Resiliparse either from the PyPI source package or directly from this repository. Either way, you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:

# Add Lexbor repository
curl -L https://lexbor.com/keys/lexbor_signing.key | sudo apt-key add -
echo "deb https://packages.lexbor.com/ubuntu/ $(lsb_release -sc) liblexbor" | \
    sudo tee /etc/apt/sources.list.d/lexbor.list

# Install build dependencies
sudo apt update
sudo apt install build-essential python3-dev libuchardet-dev liblexbor-dev libre2-dev
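Before starting a build, it can be useful to check that the native libraries are visible to the system. The sketch below uses `ctypes.util.find_library` from the standard library. The short library names (`uchardet`, `lexbor`, `re2`) are assumptions inferred from the apt package names above, and the helper `locate_libraries` is hypothetical. Note that this only locates shared libraries; it does not verify that the development headers from the `-dev` packages are present.

```python
from ctypes.util import find_library
from typing import Dict, Iterable, Optional

# Assumed short names, inferred from libuchardet-dev, liblexbor-dev and libre2-dev.
NATIVE_LIBS = ("uchardet", "lexbor", "re2")


def locate_libraries(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each library name to what find_library reports, or None if not found."""
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, location in locate_libraries(NATIVE_LIBS).items():
        print(f"{name:10s} {location or 'NOT FOUND'}")
```

A `NOT FOUND` entry suggests that the corresponding package is missing or installed in a location the linker does not search.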

To build and install Resiliparse from PyPI, run:

pip install --no-binary resiliparse resiliparse

That's it. If you prefer to build and install directly from this repository instead, run:

pip install -e resiliparse

To build the wheels without installing them, run:

pip wheel -e resiliparse

# Or:
pip install build && python -m build --wheel resiliparse
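The resulting wheel file names encode the Python version, ABI, and platform they were built for, following the standard wheel naming convention (`{distribution}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl`). As a rough illustration, the following sketch splits such a name into its parts. The `WheelName` type and `parse_wheel_name` function are hypothetical helpers, not part of Resiliparse or pip.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


if __name__ == "__main__":
    info = parse_wheel_name("Resiliparse-0.10.3-cp310-cp310-win_amd64.whl")
    print(info.python_tag, info.abi_tag, info.platform_tag)
```

Comparing the Python and platform tags against your interpreter is a simple way to pick the right file when several wheels sit in the same output directory.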

Usage Instructions

For detailed usage instructions, please consult the Resiliparse User Manual.

Cite Us

If you use ChatNoir or Resiliparse, please consider citing our ECIR 2018 demo paper:

@InProceedings{bevendorff:2018,
  address =             {Berlin Heidelberg New York},
  author =              {Janek Bevendorff and Benno Stein and Matthias Hagen and Martin Potthast},
  booktitle =           {Advances in Information Retrieval. 40th European Conference on IR Research (ECIR 2018)},
  editor =              {Leif Azzopardi and Allan Hanbury and Gabriella Pasi and Benjamin Piwowarski},
  month =               mar,
  publisher =           {Springer},
  series =              {Lecture Notes in Computer Science},
  site =                {Grenoble, France},
  title =               {{Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl}},
  year =                2018
}
