ChatNoir Resiliparse

A collection of robust and fast processing tools for parsing and analyzing (not only) web archive data.

Resiliparse is a part of the ChatNoir web analytics toolkit.

Installing Resiliparse

Pre-built Resiliparse binaries can be installed from PyPI:

pip install resiliparse
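To confirm that the installation succeeded, one option is to query the installed distribution metadata. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is illustrative and not part of Resiliparse.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("resiliparse")
    if version is None:
        print("Resiliparse is not installed in this environment.")
    else:
        print(f"Resiliparse {version} is installed.")
```

Looking up the metadata rather than importing the package keeps the check independent of any particular Resiliparse module name.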

Building Resiliparse From Source

You can compile Resiliparse either from the PyPI source package or directly from this repository. In either case, you need to install all required build-time dependencies first. On Ubuntu, this is done as follows:

# Add Lexbor repository
curl -sL https://lexbor.com/keys/lexbor_signing.key | \
  sudo gpg --dearmor --output /etc/apt/trusted.gpg.d/lexbor.gpg
echo "deb https://packages.lexbor.com/ubuntu/ $(lsb_release -sc) liblexbor" | \
    sudo tee /etc/apt/sources.list.d/lexbor.list

# Install build dependencies (requires libre2-dev>=2022-04-01)
sudo apt update
sudo apt install build-essential python3-dev libuchardet-dev liblexbor-dev libre2-dev
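The `libre2-dev>=2022-04-01` requirement can be checked before building. The sketch below compares a version string against that date. It assumes the version string begins with the release date written as `YYYYMMDD` or `YYYY-MM-DD` (for example, as printed by `dpkg-query -W -f='${Version}' libre2-dev`); this format is an assumption and may differ on your distribution. The function names are illustrative.

```python
import re
from datetime import date
from typing import Optional

# Minimum RE2 release date stated in the build instructions above.
MIN_RE2_DATE = date(2022, 4, 1)


def parse_re2_date(version: str) -> Optional[date]:
    """Extract the leading release date from an RE2 version string.

    Assumes the string starts with YYYYMMDD or YYYY-MM-DD,
    e.g. "20220601+dfsg-1" or "2022-04-01". Returns None otherwise.
    """
    match = re.match(r"(\d{4})-?(\d{2})-?(\d{2})", version)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Digits were present but do not form a valid calendar date.
        return None


def re2_is_recent_enough(version: str) -> bool:
    """Return True if the version's date is on or after MIN_RE2_DATE."""
    parsed = parse_re2_date(version)
    return parsed is not None and parsed >= MIN_RE2_DATE
```

A version that cannot be parsed is treated as too old, so an unexpected format fails the check instead of passing silently.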

To build and install Resiliparse from PyPI, run:

pip install --no-binary resiliparse resiliparse

That's it. If you prefer to build and install directly from this repository instead, run:

pip install -e resiliparse

To build the wheels without installing them, run:

pip wheel -e resiliparse

# Or:
pip install build && python -m build --wheel resiliparse

Usage Instructions

For detailed usage instructions, please consult the Resiliparse User Manual.

Cite Us

If you use ChatNoir or Resiliparse, please consider citing our ECIR 2018 demo paper:

@InProceedings{bevendorff:2018,
  address =             {Berlin Heidelberg New York},
  author =              {Janek Bevendorff and Benno Stein and Matthias Hagen and Martin Potthast},
  booktitle =           {Advances in Information Retrieval. 40th European Conference on IR Research (ECIR 2018)},
  editor =              {Leif Azzopardi and Allan Hanbury and Gabriella Pasi and Benjamin Piwowarski},
  month =               mar,
  publisher =           {Springer},
  series =              {Lecture Notes in Computer Science},
  site =                {Grenoble, France},
  title =               {{Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl}},
  year =                2018
}

Download files

Source Distribution

  • resiliparse-0.14.7.tar.gz (88.6 kB): Source

Built Distributions

  • Resiliparse-0.14.7-cp312-cp312-win_amd64.whl (2.5 MB): CPython 3.12, Windows x86-64
  • Resiliparse-0.14.7-cp312-cp312-manylinux_2_28_x86_64.whl (6.1 MB): CPython 3.12, manylinux: glibc 2.28+ x86-64
  • Resiliparse-0.14.7-cp312-cp312-manylinux_2_28_aarch64.whl (6.0 MB): CPython 3.12, manylinux: glibc 2.28+ ARM64
  • Resiliparse-0.14.7-cp312-cp312-macosx_11_0_arm64.whl (2.7 MB): CPython 3.12, macOS 11.0+ ARM64
  • Resiliparse-0.14.7-cp312-cp312-macosx_10_9_x86_64.whl (2.7 MB): CPython 3.12, macOS 10.9+ x86-64
  • Resiliparse-0.14.7-cp311-cp311-win_amd64.whl (2.5 MB): CPython 3.11, Windows x86-64
  • Resiliparse-0.14.7-cp311-cp311-manylinux_2_28_x86_64.whl (6.1 MB): CPython 3.11, manylinux: glibc 2.28+ x86-64
  • Resiliparse-0.14.7-cp311-cp311-manylinux_2_28_aarch64.whl (6.0 MB): CPython 3.11, manylinux: glibc 2.28+ ARM64
  • Resiliparse-0.14.7-cp311-cp311-macosx_11_0_arm64.whl (2.7 MB): CPython 3.11, macOS 11.0+ ARM64
  • Resiliparse-0.14.7-cp311-cp311-macosx_10_9_x86_64.whl (2.7 MB): CPython 3.11, macOS 10.9+ x86-64
  • Resiliparse-0.14.7-cp310-cp310-win_amd64.whl (2.5 MB): CPython 3.10, Windows x86-64
  • Resiliparse-0.14.7-cp310-cp310-manylinux_2_28_x86_64.whl (5.9 MB): CPython 3.10, manylinux: glibc 2.28+ x86-64
  • Resiliparse-0.14.7-cp310-cp310-manylinux_2_28_aarch64.whl (5.8 MB): CPython 3.10, manylinux: glibc 2.28+ ARM64
  • Resiliparse-0.14.7-cp310-cp310-macosx_11_0_arm64.whl (2.7 MB): CPython 3.10, macOS 11.0+ ARM64
  • Resiliparse-0.14.7-cp310-cp310-macosx_10_9_x86_64.whl (2.7 MB): CPython 3.10, macOS 10.9+ x86-64
  • Resiliparse-0.14.7-cp39-cp39-win_amd64.whl (2.5 MB): CPython 3.9, Windows x86-64
  • Resiliparse-0.14.7-cp39-cp39-manylinux_2_28_x86_64.whl (5.9 MB): CPython 3.9, manylinux: glibc 2.28+ x86-64
  • Resiliparse-0.14.7-cp39-cp39-macosx_11_0_arm64.whl (2.7 MB): CPython 3.9, macOS 11.0+ ARM64
  • Resiliparse-0.14.7-cp39-cp39-macosx_10_9_x86_64.whl (2.7 MB): CPython 3.9, macOS 10.9+ x86-64
  • Resiliparse-0.14.7-cp38-cp38-win_amd64.whl (2.5 MB): CPython 3.8, Windows x86-64
  • Resiliparse-0.14.7-cp38-cp38-manylinux_2_28_x86_64.whl (5.9 MB): CPython 3.8, manylinux: glibc 2.28+ x86-64
  • Resiliparse-0.14.7-cp38-cp38-macosx_11_0_arm64.whl (2.7 MB): CPython 3.8, macOS 11.0+ ARM64
  • Resiliparse-0.14.7-cp38-cp38-macosx_10_9_x86_64.whl (2.7 MB): CPython 3.8, macOS 10.9+ x86-64
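The built distribution file names listed here follow the standard wheel naming convention, `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. As a rough illustration of how to read them, the sketch below splits a file name into those fields using only the standard library. The names `WheelName` and `parse_wheel_filename` are hypothetical helpers written for this example, and the parser does no validation beyond counting fields.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        # No optional build tag present.
        dist, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, version, build, python_tag, abi_tag, platform_tag)


if __name__ == "__main__":
    name = parse_wheel_filename(
        "Resiliparse-0.14.7-cp312-cp312-manylinux_2_28_x86_64.whl"
    )
    # python_tag and abi_tag identify the interpreter; platform_tag the OS and CPU.
    print(name.python_tag, name.abi_tag, name.platform_tag)
```

In practice, pip selects the matching wheel automatically, so this is only useful for inspecting or filtering file names by hand.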

File details

Details for the file resiliparse-0.14.7.tar.gz.

File metadata

  • Download URL: resiliparse-0.14.7.tar.gz
  • Upload date:
  • Size: 88.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for resiliparse-0.14.7.tar.gz
Algorithm Hash digest
SHA256 4d7b4bd14a2e01e9cc4db96e431a0675c436d6d387d6518346475aa64105e40a
MD5 5071fa16fac0c555f249b4ed1ed1c09f
BLAKE2b-256 cd4dc6573edd1f9a05a15f5c7ec7347bc12822bf14a88d1c6f81f4a79a5a80b7
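A downloaded file can be checked against the SHA256 digest listed for it. The following is a minimal sketch using Python's `hashlib`; the function names and the command-line handling are illustrative, and the path to the downloaded file is whatever you pass in.

```python
import hashlib
import sys


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python verify.py <downloaded file> <expected SHA256 hex digest>
    ok = matches_sha256(sys.argv[1], sys.argv[2])
    print("digest matches" if ok else "digest MISMATCH")
```

Reading in chunks keeps memory use constant regardless of file size. SHA256 is the digest to prefer here; MD5 is listed for completeness but is not suitable for integrity guarantees against tampering.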

File details

Details for the file Resiliparse-0.14.7-cp312-cp312-win_amd64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 f756656a8260b6c22ea754963d9b1567e8dd7a5fb348fa06ec4dfb8ed8c012d1
MD5 f322cd1aa7f9578a65598d01b8862fef
BLAKE2b-256 2561bdc535658f18fd863eee7a399f88d2ba50078b9926c6d4d85a579beae22c

File details

Details for the file Resiliparse-0.14.7-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 722e791f33a9526d9dac6383aea41f9d767b825ffe912701227145f0849eefe8
MD5 18f8e7fba525d995b234c73cb39a410d
BLAKE2b-256 7a3a27f96008b03f7d330f730c0d99e0c059702a36a38809b824f06320f16839

File details

Details for the file Resiliparse-0.14.7-cp312-cp312-manylinux_2_28_aarch64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 2a28f81bbe57af5706ba3e722420601c61cd0a8845cfe973a5db559693525e88
MD5 1f1e459804c4d1ce0df1a7d2e8d4fd4f
BLAKE2b-256 a781261f497ac5dea77c1ef56f184c06ee3f860938999ebfdc90ea88eb614638

File details

Details for the file Resiliparse-0.14.7-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 fb4f2a328eedf00422e0eac87cb8e8d3c33a31826d5875ae4b31e0c7d048efbd
MD5 2b4f1fb801468f2e5942b3c34ed21d0b
BLAKE2b-256 fac6616fbbea02eaa4cdc7ff4d357f4c39e91b0a00bbcd30df796fba3282c9a3

File details

Details for the file Resiliparse-0.14.7-cp312-cp312-macosx_10_9_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp312-cp312-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 ae18a14482de73be9867970972c08442584198e3250cfb87f7aa6c0bb05acc77
MD5 6da832a1d7effc4324412f5440fa800d
BLAKE2b-256 58fc027ad38413f586b95a6e711c8cd3c0cc956ecd548fef0fa6da36a95eaf02

File details

Details for the file Resiliparse-0.14.7-cp311-cp311-win_amd64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 b862bf49f0a614e43828d54fb789c2ad11fa99d6244fdda6e18bd9f9355ef9b0
MD5 f6877b59986fbf1096e0c2d5c14a6bbf
BLAKE2b-256 deba1ac45b4d46ee8867b61c76ccbef9d9633568535ca6badb11b0536aed600c

File details

Details for the file Resiliparse-0.14.7-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 69908a195661a330fb5384a8c156ccd5c0a1a21cf4069e78146a22b82aa35450
MD5 9161c5321d15a4face07692339fd64db
BLAKE2b-256 12bb77558c4940a7e75feac89dd2637303d1333c6c7febb17523630d1ce72970

File details

Details for the file Resiliparse-0.14.7-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 7718a328d489867610f13a0e05f7c3199f23bac15b7dcc6bea90a587cf70d0bc
MD5 e49524c503fb2ce3339700c96f217cae
BLAKE2b-256 becaf0d6dc3519e89552e3055b8e2a58966a04d85e1374886ccb71d9d7f187bc

File details

Details for the file Resiliparse-0.14.7-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 b36dd4d644e33333122bc1d41d244eacbc29d03215e6ca89b1a67ed96f696bda
MD5 595011da9de16375160e2dcbad4e2921
BLAKE2b-256 2003fe20c21c23889a1f412e996cb54f4f05d8e40c5032ae3daaf55228bb4a5f

File details

Details for the file Resiliparse-0.14.7-cp311-cp311-macosx_10_9_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 4d7ffa0b0c857392eefc17274a5318564b84dc66131642940cc73dfcc80f8c20
MD5 573af4158963946108f931b516089a42
BLAKE2b-256 9e5f5b3990755e56aacfa6f3009cf12abc30cf75d3a255fbde9c0819b988ef41

File details

Details for the file Resiliparse-0.14.7-cp310-cp310-win_amd64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 bd60c79218fa273ae2d785ff1f1dd15b9c43513edfad47d359ac17316c92b75c
MD5 c8a9ae556691e1c00192a4e7ab84b716
BLAKE2b-256 8819d7667c3215122099d3293902473494b7ec9a4d49305e4737bfa14b98387f

File details

Details for the file Resiliparse-0.14.7-cp310-cp310-manylinux_2_28_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 260d3f27421169c8eba758155d817bc44cdc8cf3edeecdda346f19fab71be9f3
MD5 d52bf0465a7fd617b0f0e51c61daa34b
BLAKE2b-256 2d5306321b030cf4463d6f1f622af866dd01e3d9b25dcd1d63738ec5b3f1badf

File details

Details for the file Resiliparse-0.14.7-cp310-cp310-manylinux_2_28_aarch64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp310-cp310-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 83cd459e8c07eb2b32e9d4931073476f2350e78dc42fcf6d5839a802a3a63098
MD5 0e1d1a1eee35115aed27435b853f55ab
BLAKE2b-256 367eaeacb44d55e14595cab82519b2295e81f3ff059ad7f471bdeb2d3e22107d

File details

Details for the file Resiliparse-0.14.7-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 988b273f554b7393d3b32473257394d840f987284018240d5613ec8291d5436e
MD5 a6dddd61b3b246491d0fbc3a8674ab19
BLAKE2b-256 058de8adf113063186ec78582dcf9250fd51309ebb123e326691a2700d22597a

File details

Details for the file Resiliparse-0.14.7-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 7634982554281eab8cff5f732c4890245f7664c61e5433babef8b7ba98464e3a
MD5 50a7e9a1c68472ba7b45e257e357f837
BLAKE2b-256 109a1866d59a0edeee162d8719dc13a791bfe598726d6c48295784e0a02b9179

File details

Details for the file Resiliparse-0.14.7-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: Resiliparse-0.14.7-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 2.5 MB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for Resiliparse-0.14.7-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 0664e40435239f332a83c128083da0ce7e020d82f361c2b218698a860a6264e6
MD5 febd6ae1df1243160d9dc87313f76f5c
BLAKE2b-256 b504fdf7efe7428184f79bd00749e6e8c4f3ca859841e68b2fea0fca667d0c4e

File details

Details for the file Resiliparse-0.14.7-cp39-cp39-manylinux_2_28_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp39-cp39-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 bd43a63a98d3c76ffb6ce89f6a73b3e30b1c74b145d599a354b642cc8bdecfd0
MD5 14ccf8f03f89773d2cc07d46327fda56
BLAKE2b-256 3262b38290047df20dde79df95a6c9a9633af202811059996c38e207bd275ffd

File details

Details for the file Resiliparse-0.14.7-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 fab18d174dea7639b690fc7f5b77760b4a7426c5997968e541827a6b7a963d0e
MD5 917a8029e69a114cfd0bbf9357e38600
BLAKE2b-256 e5f9bea542491498d9253087c97ae3cf588260acfaa393cee7063ed7b9f0046d

File details

Details for the file Resiliparse-0.14.7-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 bae966c2e7c5fc4073ce0d87f2475ccb4f2fdfcc369cd0428693fa3b9fa23c58
MD5 91f3ad8dc02d82b20ba507b308763b05
BLAKE2b-256 b112b7f34809e5f47bd2d93a813c5e0c1b80a5963c4d1966fc8ad92cd1ff4bc9

File details

Details for the file Resiliparse-0.14.7-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: Resiliparse-0.14.7-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 2.5 MB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for Resiliparse-0.14.7-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 d1fd29b5d3a8e19f5a79418d81196a6cc7506e29f8467fd5a366617f57baf525
MD5 f8b4e0b8d9039ed74a20dc96f4ad8536
BLAKE2b-256 6fdd334b78095e0aef1ea3f1926ab75c3bfd948894d1954cfa225245fce612e5

File details

Details for the file Resiliparse-0.14.7-cp38-cp38-manylinux_2_28_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp38-cp38-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 f17fecf90461276db1c142da6c8fd222e9a734238f0aa4bc009599eac3e03333
MD5 7b2f4830a6db4d37c85ef2b92ce98210
BLAKE2b-256 f9128888856584c71f8b00092dab754fbef28a98159c954477747d0ffeecefbc

File details

Details for the file Resiliparse-0.14.7-cp38-cp38-macosx_11_0_arm64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp38-cp38-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 128ad3bb3c65aa464bfc953a08b09478e488c454892b1a4ef236e6a350d95e8c
MD5 556d8d52325994674febeef726e4ff20
BLAKE2b-256 96301eba47edb95480f38443f75fe8da76b5f9c32353790e106bf21fb7b18f7a

File details

Details for the file Resiliparse-0.14.7-cp38-cp38-macosx_10_9_x86_64.whl.

File hashes

Hashes for Resiliparse-0.14.7-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 d08c19f4a0ee109017d6d4a6df56909cd8322aace649bb0b118a37139dba7ad2
MD5 eb4e36567f4cba7cd3e663705c9eecb4
BLAKE2b-256 b6976dbea11cbe29ba0214b4186249ba9fbf88e46c0841a8fce4bb7bb00099bf
