chatnoir-api
Simple, type-safe access to the ChatNoir search API.
Working with PyTerrier? Check out the chatnoir-pyterrier package.
Installation
Install the package from PyPI:
pip install chatnoir-api
Usage
The ChatNoir API offers two main features: searching with BM25F and retrieving document contents.
Search
You can use our Python client to search for documents.
The results object is an iterable wrapper of the search results which handles pagination for you.
List-style indexing is supported to access individual results or sub-lists of results:
from chatnoir_api.v1 import search
results = search("python library", api_key="<YOUR_API_KEY>")
top10_results = results[:10]
print(top10_results)
result_1234 = results[1234]
print(result_1234)
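To make the pagination behavior more concrete, here is a minimal, self-contained sketch of how a lazily paginating wrapper with list-style indexing could work. It is not the chatnoir-api implementation: the PagedResults class, the fetch_page callable, and the fake backend are hypothetical names introduced only for illustration.

```python
from typing import Callable, Iterator, List, Union


class PagedResults:
    """Illustrative lazy result list; NOT the chatnoir-api implementation."""

    def __init__(
        self,
        fetch_page: Callable[[int, int], List[str]],  # hypothetical: (start, size) -> items
        page_size: int = 10,
    ) -> None:
        self._fetch_page = fetch_page
        self._page_size = page_size
        self._cache: List[str] = []
        self._exhausted = False

    def _fill_to(self, count: int) -> None:
        # Fetch pages until `count` items are cached or the backend runs dry.
        while len(self._cache) < count and not self._exhausted:
            page = self._fetch_page(len(self._cache), self._page_size)
            self._cache.extend(page)
            if len(page) < self._page_size:
                self._exhausted = True

    def __getitem__(self, key: Union[int, slice]) -> Union[str, List[str]]:
        if isinstance(key, slice):
            start = key.start or 0
            if key.stop is None or key.stop < 0 or start < 0:
                raise ValueError("this sketch only supports bounded, non-negative slices")
            self._fill_to(key.stop)
            return self._cache[key]
        if key < 0:
            raise IndexError("this sketch only supports non-negative indices")
        self._fill_to(key + 1)
        return self._cache[key]  # raises IndexError past the last result

    def __iter__(self) -> Iterator[str]:
        index = 0
        while True:
            self._fill_to(index + 1)
            if index >= len(self._cache):
                return
            yield self._cache[index]
            index += 1


# Fake backend standing in for a remote search API (assumption for the demo).
CORPUS = [f"doc-{i}" for i in range(25)]
calls: List[int] = []


def fake_fetch(start: int, size: int) -> List[str]:
    calls.append(start)  # record which offsets were requested
    return CORPUS[start:start + size]


results = PagedResults(fake_fetch, page_size=10)
top3 = results[:3]   # needs only the first page
item = results[12]   # triggers a fetch of the second page
```

The point of the design is that indexing or slicing only requests as many pages as are needed to cover the requested positions, so asking for the top results stays cheap even when the full result list is long.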
Search a specific index
To limit your search requests to a single index (e.g., ClueWeb22 category B), set the index parameter like this:
from chatnoir_api.v1 import search
results = search("python library", index="clueweb22/b", api_key="<YOUR_API_KEY>")
Phrase search
To search for phrases, use the search_phrases function in the same way as normal search:
from chatnoir_api.v1 import search_phrases
results = search_phrases("python library")
API key
The public, shared, default API key comes with a limited request budget. To use the ChatNoir API more extensively, please request a dedicated API key.
Then, use the api_key parameter to add it to your requests like this:
results = search("python library", api_key="<YOUR_API_KEY>")
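To avoid hard-coding a dedicated key in scripts, one option is to read it from an environment variable and fall back to the default key when it is unset. The helper below is a hypothetical sketch, not part of chatnoir-api. The variable name CHATNOIR_API_KEY is borrowed from the Testing section of this README; whether the library reads that variable on its own is not stated here, so the sketch passes the key explicitly.

```python
import os
from typing import Dict, Mapping, Optional


def api_key_kwargs(
    env: Optional[Mapping[str, str]] = None,
    variable: str = "CHATNOIR_API_KEY",  # assumption: name taken from the Testing section
) -> Dict[str, str]:
    """Return {'api_key': ...} if the variable is set, else {} (use the default key)."""
    env = os.environ if env is None else env
    key = env.get(variable, "").strip()
    return {"api_key": key} if key else {}


# Intended use (requires network access, so it is left as a comment):
#   from chatnoir_api.v1 import search
#   results = search("python library", **api_key_kwargs())
```

Returning keyword arguments instead of the key itself means the api_key parameter is simply omitted when no dedicated key is configured.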
Retrieve document contents
Often the title and ID of a document are not enough to effectively re-rank a list of search results.
To retrieve the full content or plain text of a given document, you can use the cache_contents helper function.
The cache_contents function expects a ChatNoir-internal UUID, shorthand UUID, or TREC ID, and the index from which to retrieve the document.
Retrieve by TREC ID
You can retrieve a document by its TREC ID like this:
from chatnoir_api import cache_contents
contents = cache_contents(
"clueweb09-en0051-90-00849",
index="clueweb09",
)
print(contents)
plain_contents = cache_contents(
"clueweb09-en0051-90-00849",
index="clueweb09",
plain=True,
)
print(plain_contents)
Retrieve by ChatNoir-internal short UUID
For newer ChatNoir versions, you can also retrieve a document by its ChatNoir-internal short UUID like this:
from chatnoir_api import cache_contents, ShortUUID
contents = cache_contents(
ShortUUID("MzOlTIayX9ub7c13GLPr_g"),
index="clueweb22/b",
)
print(contents)
plain_contents = cache_contents(
ShortUUID("MzOlTIayX9ub7c13GLPr_g"),
index="clueweb22/b",
plain=True,
)
print(plain_contents)
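As a rough illustration of the re-ranking use case mentioned above, the following sketch re-orders document IDs by how many query terms appear in their plain text. The function rerank_by_overlap and its fetch_text parameter are hypothetical names. The stub dictionary stands in for real document texts; in practice, fetch_text could wrap a call such as cache_contents(doc_id, index="clueweb09", plain=True), assuming that call returns the plain text as a string.

```python
from typing import Callable, Iterable, List, Tuple


def rerank_by_overlap(
    query: str,
    doc_ids: Iterable[str],
    fetch_text: Callable[[str], str],  # hypothetical: document ID -> plain text
) -> List[Tuple[str, float]]:
    """Order documents by the fraction of query terms found in their plain text."""
    terms = set(query.lower().split())
    scored = []
    for doc_id in doc_ids:
        words = set(fetch_text(doc_id).lower().split())
        overlap = len(terms & words) / len(terms) if terms else 0.0
        scored.append((doc_id, overlap))
    # sorted() is stable, so ties keep their original retrieval order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stub texts instead of real API calls (assumption for the demo).
texts = {
    "doc-a": "A cooking blog about pasta",
    "doc-b": "Python library for search",
    "doc-c": "The python snake",
}
ranking = rerank_by_overlap("python library", ["doc-a", "doc-b", "doc-c"], texts.__getitem__)
```

Term overlap is only a placeholder scoring function; any model that scores a query against document text could be substituted without changing the surrounding structure.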
Indexing
Head over to the ChatNoir ir_datasets indexer to learn more about how new ir_datasets-compatible datasets are indexed into ChatNoir.
Citation
If you use this package, please cite the paper from the ChatNoir authors. You can use the following BibTeX information for citation:
@InProceedings{bevendorff:2018,
address = {Berlin Heidelberg New York},
author = {Janek Bevendorff and Benno Stein and Matthias Hagen and Martin Potthast},
booktitle = {Advances in Information Retrieval. 40th European Conference on IR Research (ECIR 2018)},
editor = {Leif Azzopardi and Allan Hanbury and Gabriella Pasi and Benjamin Piwowarski},
month = mar,
publisher = {Springer},
series = {Lecture Notes in Computer Science},
site = {Grenoble, France},
title = {{Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl}},
year = 2018
}
@InProceedings{merker:2025a,
address = {Cham, Switzerland},
author = {Jan Heinrich Merker and Janek Bevendorff and Maik Fr{\"o}be and Tim Hagen and Harrisen Scells and Matti Wiegmann and Benno Stein and Matthias Hagen and Martin Potthast},
booktitle = {Advances in Information Retrieval. 47th European Conference on IR Research (ECIR 2025)},
doi = {10.1007/978-3-031-88720-8_17},
editor = {Claudia Hauff and Craig Macdonald and Dietmar Jannach and Gabriella Kazai and Franco Maria Nardini and Fabio Pinelli and Fabrizio Silvestri and Nicola Tonellotto},
month = apr,
pages = {96--104},
publisher = {Springer Nature},
series = {Lecture Notes in Computer Science},
site = {Lucca, Italy},
title = {{Web-scale Retrieval Experimentation with chatnoir-pyterrier}},
volume = 15576,
year = 2025
}
Development
To build this package and contribute to its development, you need to install the build, setuptools, and wheel packages:
pip install build setuptools wheel
(On most systems, these packages are already installed.)
Developer installation
Install the package and its test dependencies:
pip install -e .[tests]
Testing
Configure the API keys for testing:
export CHATNOIR_API_KEY="<API_KEY>"
export CHATNOIR_API_KEY_CHAT="<API_KEY>"
Verify your changes against the test suite:
ruff check . # Code format and linting
mypy . # Static typing
bandit -c pyproject.toml -r . # Security
pytest . # Unit tests
Please also add tests for your newly developed code.
Build wheels
Wheels for this package can be built with:
python -m build
Support
If you hit any problems using this package, please file an issue. We're happy to help!
License
This repository is released under the MIT license.