Use the ChatNoir search engine in PyTerrier.
🔍 chatnoir-pyterrier
Use the ChatNoir REST-API in PyTerrier for retrieval/re-ranking against large corpora such as ClueWeb09, ClueWeb12, ClueWeb22, or MS MARCO.
Powered by the chatnoir-api package.
Installation
Install the package from PyPI:
pip install chatnoir-pyterrier
Usage
You can use the ChatNoirRetrieve PyTerrier module in any PyTerrier pipeline, just as you would use BatchRetrieve.
from chatnoir_pyterrier import ChatNoirRetrieve, Feature
chatnoir = ChatNoirRetrieve(index="msmarco-document-v2.1", features=Feature.SNIPPET_TEXT)
chatnoir.search("python library")
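`search` returns a PyTerrier result data frame with one row per retrieved document, so any later stage in a pipeline works on those rows. As a dependency-free illustration of what a re-ranking stage does with them, the sketch below re-scores rows, re-sorts them per query, and recomputes ranks starting at 0. It uses plain dictionaries instead of a pandas data frame; the `rerank` function, the re-scoring rule, and the sample rows are hypothetical, and in a real pipeline this job is done by a PyTerrier transformer.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def rerank(rows: List[dict], rescore: Callable[[dict], float]) -> List[dict]:
    """Re-score result rows, then re-sort and re-rank them per query.

    Rows are plain dicts with at least "qid", "docno", and "score" keys, standing
    in for the rows of a PyTerrier result data frame. Input rows are not modified.
    """
    by_query: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        # Copy each row so the caller's data stays untouched.
        by_query[row["qid"]].append({**row, "score": rescore(row)})

    reranked: List[dict] = []
    for results in by_query.values():  # queries in first-seen order
        results.sort(key=lambda r: r["score"], reverse=True)  # stable: ties keep input order
        for rank, row in enumerate(results):  # ranks start at 0
            row["rank"] = rank
        reranked.extend(results)
    return reranked


# Hypothetical first-stage results for one query.
rows = [
    {"qid": "1", "docno": "a", "score": 10.0},
    {"qid": "1", "docno": "b", "score": 8.0},
    {"qid": "1", "docno": "c", "score": 5.0},
]

# Hypothetical re-scorer: demote document "a" by 6 points.
result = rerank(rows, lambda r: r["score"] - 6.0 if r["docno"] == "a" else r["score"])
print([(r["docno"], r["rank"]) for r in result])  # [('b', 0), ('c', 1), ('a', 2)]
```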
Features
ChatNoir provides an extensive set of extra features, such as a document's full text or its page rank and spam rank (available for some indices only). These features can be included in the response data frame for use in subsequent PyTerrier re-ranking stages:
from chatnoir_pyterrier import ChatNoirRetrieve, Feature
chatnoir_msmarco_snippet = ChatNoirRetrieve(index="msmarco-document-v2.1", features=Feature.SNIPPET_TEXT)
chatnoir_msmarco_snippet.search("python library")
chatnoir_cw09_page_spam_rank = ChatNoirRetrieve(index="clueweb09", features=Feature.PAGE_RANK | Feature.SPAM_RANK)
chatnoir_cw09_page_spam_rank.search("python library")
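The `|` operator combines several features into a single argument, which is how values behave if `Feature` is an `enum.Flag` (an assumption; check the package for the actual definition and members). The stand-in enum below is hypothetical and only mirrors the member names used above; it shows how such a combination is built and inspected.

```python
from enum import Flag, auto


class DemoFeature(Flag):
    """Hypothetical stand-in for chatnoir_pyterrier.Feature; not the real class."""

    SNIPPET_TEXT = auto()
    PAGE_RANK = auto()
    SPAM_RANK = auto()


# Combine two features into one value, as in the clueweb09 example above.
requested = DemoFeature.PAGE_RANK | DemoFeature.SPAM_RANK

# Membership tests show which features a combined value includes.
print(DemoFeature.PAGE_RANK in requested)     # True
print(DemoFeature.SNIPPET_TEXT in requested)  # False
```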
Advanced usage
Please check out our sample notebook or open it in Google Colab.
We also provide a hands-on guide for the Touché 2023 shared tasks here.
Experiments
With chatnoir-pyterrier, it is easy to run benchmarks on shared tasks that use large document collections. We demonstrate this by running ChatNoir retrieval on all supported TREC, CLEF, and NTCIR shared tasks available in ir_datasets.
First install the experiment dependencies:
pip install -e ".[experiment]"
To run the experiments, first create the runs by submitting a Ray job:
ray job submit --runtime-env examples/ray-runtime-env.yml --no-wait -- python examples/experiment.py
This creates the runs for all shared tasks in parallel and saves them to a cache.
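Creating runs in parallel and caching them, so that finished tasks are not recomputed, can be sketched with the standard library alone. This is not how `examples/experiment.py` works (it is submitted as a Ray job); `compute_run`, the task names, and the JSON cache file are hypothetical placeholders.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Dict, List

calls: List[str] = []  # records which tasks were actually computed


def compute_run(task: str) -> list:
    """Hypothetical placeholder for retrieving the run of one shared task."""
    calls.append(task)
    return [[f"{task}-q1", "doc-1", 1.0]]  # (qid, docno, score) rows


def cached_runs(tasks: List[str], cache_file: Path) -> Dict[str, list]:
    # Load previously computed runs, if a cache file exists.
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    missing = [task for task in tasks if task not in cache]
    # Compute only the missing runs, in parallel threads.
    with ThreadPoolExecutor() as pool:
        for task, run in zip(missing, pool.map(compute_run, missing)):
            cache[task] = run
    # Persist the updated cache for the next invocation.
    cache_file.write_text(json.dumps(cache))
    return {task: cache[task] for task in tasks}


with TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "runs.json"
    first = cached_runs(["task-a", "task-b"], cache_file)
    second = cached_runs(["task-a", "task-b"], cache_file)  # served from the cache

print(len(calls))  # 2: each task was computed only once
```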
After creating the runs, use the experiment.ipynb notebook to analyze the results.
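Analyzing runs usually means computing standard effectiveness measures against relevance judgments. As an illustration, not the notebook's actual code, the sketch below computes nDCG@k for a single query using linear gains and a log2 rank discount; the judgments and rankings are made up.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    # Linear gain with a log2 discount: the item at 0-based position i is
    # divided by log2(i + 2), so the first item is not discounted.
    return sum(gain / math.log2(i + 2) for i, gain in enumerate(gains))


def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int) -> float:
    """nDCG@k of one ranked list of docnos against graded judgments.

    Unjudged documents and negative labels (e.g. spam) count as gain 0.
    """
    gains = [max(qrels.get(docno, 0), 0) for docno in ranking[:k]]
    ideal = sorted((max(rel, 0) for rel in qrels.values()), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments and ranking for a single query.
qrels = {"d1": 0, "d2": 2, "d3": 1}
print(round(ndcg_at_k(["d1", "d2", "d3"], qrels, k=3), 4))  # 0.6697
print(ndcg_at_k(["d2", "d3", "d1"], qrels, k=3))            # 1.0
```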
Development
To build this package and contribute to its development, you need to install the build, setuptools, and wheel packages:
pip install build setuptools wheel
(On most systems, these packages are already pre-installed.)
Development installation
Install package and test dependencies:
pip install -e ".[test]"
Testing
Configure the API key for testing:
export CHATNOIR_API_KEY="<API_KEY>"
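Tests that talk to the live API depend on this variable. The hypothetical helper below sketches how test code might read it and treat a missing or blank value as absent; it is not taken from the package's test suite.

```python
import os
from typing import Optional


def chatnoir_api_key() -> Optional[str]:
    """Return the ChatNoir API key from the environment, or None if unset or blank."""
    key = os.environ.get("CHATNOIR_API_KEY", "").strip()
    return key or None
```

In a pytest suite, such a helper could drive `pytest.mark.skipif(chatnoir_api_key() is None, reason="CHATNOIR_API_KEY not set")`, so tests that need the live API are skipped rather than failing when no key is configured.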
Verify your changes against the test suite:
ruff check . # Code style and linting
mypy . # Static typing
bandit -c pyproject.toml -r . # Security
pytest . # Unit tests
Please also add tests for your newly developed code.
Build wheels
Wheels for this package can be built with:
python -m build
Support
If you hit any problems using this package, please file an issue. We're happy to help!
License
This repository is released under the MIT license.
Download files
File details
Details for the file chatnoir_pyterrier-3.0.4.tar.gz.
File metadata
- Download URL: chatnoir_pyterrier-3.0.4.tar.gz
- Size: 34.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | f3e65d942ef1fd88f44180758f7820437757f070e62167589ad7db8a2ebfa183
MD5 | 21095ba5d132b3b63ebb7da192befabf
BLAKE2b-256 | 0473aebbe6c6e0589e24322991b5b21bc76e16c09ac60159a9795e61c945cc95
Provenance
The following attestation bundles were made for chatnoir_pyterrier-3.0.4.tar.gz:
- Publisher: ci.yml on chatnoir-eu/chatnoir-pyterrier
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: chatnoir_pyterrier-3.0.4.tar.gz
- Subject digest: f3e65d942ef1fd88f44180758f7820437757f070e62167589ad7db8a2ebfa183
- Sigstore transparency entry: 148679581
File details
Details for the file chatnoir_pyterrier-3.0.4-py3-none-any.whl.
File metadata
- Download URL: chatnoir_pyterrier-3.0.4-py3-none-any.whl
- Size: 31.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | f364bef43d88ad627a7c56839a1c047b29bc9afa40229e34a41e42246f597d1d
MD5 | 0a5f7ecde7f8f3d10f529270e5bb74c1
BLAKE2b-256 | 97dcbd543bdcefd8b6a923bf2434e4db10ef99831a5a7201992b282007a9047d
Provenance
The following attestation bundles were made for chatnoir_pyterrier-3.0.4-py3-none-any.whl:
- Publisher: ci.yml on chatnoir-eu/chatnoir-pyterrier
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: chatnoir_pyterrier-3.0.4-py3-none-any.whl
- Subject digest: f364bef43d88ad627a7c56839a1c047b29bc9afa40229e34a41e42246f597d1d
- Sigstore transparency entry: 148679582