
Use the ChatNoir search engine in PyTerrier.


🔍 chatnoir-pyterrier

Use the ChatNoir REST-API in PyTerrier for retrieval/re-ranking against large corpora such as ClueWeb09, ClueWeb12, ClueWeb22, or MS MARCO.

Powered by the chatnoir-api package.

Installation

Install the package from PyPI:

pip install chatnoir-pyterrier

Usage

You can use the ChatNoirRetrieve transformer in any PyTerrier pipeline, just as you would use BatchRetrieve.

from chatnoir_pyterrier import ChatNoirRetrieve, Feature

chatnoir = ChatNoirRetrieve(index="msmarco-document-v2.1", features=Feature.SNIPPET_TEXT)
chatnoir.search("python library")

Features

ChatNoir provides a number of extra features for each result, such as the full text or, for some indices, the page rank and spam rank. Request them via the features parameter to include them in the result data frame for use in subsequent PyTerrier re-ranking stages:

from chatnoir_pyterrier import ChatNoirRetrieve, Feature

chatnoir_msmarco_snippet = ChatNoirRetrieve(index="msmarco-document-v2.1", features=Feature.SNIPPET_TEXT)
chatnoir_msmarco_snippet.search("python library")

chatnoir_cw09_page_spam_rank = ChatNoirRetrieve(index="clueweb09", features=Feature.PAGE_RANK | Feature.SPAM_RANK)
chatnoir_cw09_page_spam_rank.search("python library")
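
As a rough illustration of how such features might feed a re-ranking stage, the sketch below combines the retrieval score with a damped page rank prior. The function name, the weight, and the column names "score" and "page_rank" are assumptions made for this example, not documented behavior of the package; check the columns of your result data frame before relying on them.

```python
import math
from typing import Any, Mapping


def page_rank_boosted_score(row: Mapping[str, Any], weight: float = 0.1) -> float:
    """Add a log-damped page rank bonus to the retrieval score.

    `row` is one result row; the keys "score" and "page_rank" are
    assumed column names (hypothetical, verify against your data frame).
    """
    score = float(row["score"])
    page_rank = row.get("page_rank")
    # Not every index provides a page rank, so fall back to the plain score.
    if page_rank is None:
        return score
    page_rank = float(page_rank)
    if math.isnan(page_rank):
        return score
    # log1p dampens very large page rank values; negative values are clamped.
    return score + weight * math.log1p(max(page_rank, 0.0))
```

With PyTerrier imported as pt, a function like this could be attached to the retrieval stage via pt.apply.doc_score, for example chatnoir_cw09_page_spam_rank >> pt.apply.doc_score(page_rank_boosted_score).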

Advanced usage

Please check out our sample notebook or open it in Google Colab.

We also provide a hands-on guide for the Touché 2023 shared tasks here.

Experiments

With chatnoir-pyterrier, it is easy to run benchmarks on shared tasks that use large document collections. We demonstrate this by running ChatNoir retrieval on all supported TREC, CLEF, and NTCIR shared tasks available in ir_datasets.

First install the experiment dependencies:

pip install -e .[experiment]

To run the experiments, first create the runs:

ray job submit --runtime-env examples/ray-runtime-env.yml --no-wait -- python examples/experiment.py 

This creates the runs for all shared tasks in parallel and saves them to a cache.

After creating the runs, the experiment.ipynb notebook can be used to analyze the results.

Development

To build this package and contribute to its development, you need to install the build, setuptools, and wheel packages:

pip install build setuptools wheel

(On most systems, these packages are already pre-installed.)

Development installation

Install the package and its test dependencies:

pip install -e .[test]

Testing

Configure the API key for testing:

export CHATNOIR_API_KEY="<API_KEY>"
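
A missing key is easy to overlook until a test fails, so a small guard can catch it early. The following is a minimal sketch; the helper name require_api_key is hypothetical and not part of this package, and only the CHATNOIR_API_KEY variable name comes from the command above.

```python
import os


def require_api_key(name: str = "CHATNOIR_API_KEY") -> str:
    """Return the API key from the environment or raise a descriptive error.

    Hypothetical helper for illustration; not provided by chatnoir-pyterrier.
    """
    key = os.environ.get(name, "").strip()
    # Treat the unreplaced placeholder from the README as "not configured".
    if not key or key == "<API_KEY>":
        raise RuntimeError(f"Set the {name} environment variable before running the tests.")
    return key
```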

Verify your changes against the test suite:

ruff check .                   # Linting
mypy .                         # Static typing
bandit -c pyproject.toml -r .  # Security
pytest .                       # Unit tests
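
If you prefer a single entry point, the four checks above can be chained so that the first failure stops the run. This is only a sketch: the run_checks helper and its run parameter are assumptions for illustration and do not ship with the package; the commands themselves are the ones listed above.

```python
import subprocess
from typing import Callable, Sequence

# The same commands as listed above, in the same order.
CHECKS: tuple = (
    ("ruff", "check", "."),
    ("mypy", "."),
    ("bandit", "-c", "pyproject.toml", "-r", "."),
    ("pytest", "."),
)


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    run: Callable[[list], int] = subprocess.call,
) -> int:
    """Run each check in order and return the first non-zero exit code (or 0)."""
    for command in checks:
        print("$", " ".join(command))
        exit_code = run(list(command))
        if exit_code != 0:
            return exit_code
    return 0
```

Calling run_checks() from the repository root runs the tools in sequence; the run parameter exists only so the helper can be exercised without invoking the real tools.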

Please also add tests for your newly developed code.

Build wheels

Wheels for this package can be built with:

python -m build

Support

If you hit any problems using this package, please file an issue. We're happy to help!

License

This repository is released under the MIT license.
