🔍 chatnoir-api
Simple, type-safe access to the ChatNoir search API.
Working with PyTerrier? Check out the chatnoir-pyterrier package.
Installation
Install the package from PyPI:
pip install chatnoir-api
Usage
The ChatNoir API offers two main features: searching with BM25F and retrieving document contents.
Search
To search with the ChatNoir API you need to request an API key.
Then you can use our Python client to search for documents.
The results object is an iterable wrapper of the search results which handles pagination for you.
List-style indexing is supported to access individual results or sub-lists of results:
from chatnoir_api.v1 import search
api_key: str = "<API_KEY>"
results = search(api_key, "python library")
top10_results = results[:10]
print(top10_results)
result_1234 = results[1234]
print(result_1234)
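To illustrate how a wrapper can offer list-style indexing while handling pagination, here is a minimal, self-contained sketch. It is not the implementation used by chatnoir-api: the class LazyResults, the fake_fetch backend, and the page size of 10 are all hypothetical names and values chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence


class LazyResults(Sequence):
    """Hypothetical lazily paginated result list (not chatnoir-api's class)."""

    def __init__(
        self,
        fetch_page: Callable[[int, int], List[str]],
        total: int,
        page_size: int = 10,
    ) -> None:
        self._fetch_page = fetch_page
        self._total = total
        self._page_size = page_size
        self._pages: Dict[int, List[str]] = {}  # cache of fetched pages

    def __len__(self) -> int:
        return self._total

    def _item(self, index: int) -> str:
        # Map the global index to a page number and an offset in that page.
        page, offset = divmod(index, self._page_size)
        if page not in self._pages:
            start = page * self._page_size
            self._pages[page] = self._fetch_page(start, self._page_size)
        return self._pages[page][offset]

    def __getitem__(self, key):
        if isinstance(key, slice):
            return [self._item(i) for i in range(*key.indices(self._total))]
        if key < 0:
            key += self._total
        if not 0 <= key < self._total:
            raise IndexError(key)
        return self._item(key)


# Stand-in for a remote API: records which page offsets were requested.
calls: List[int] = []


def fake_fetch(start: int, size: int) -> List[str]:
    calls.append(start)
    return [f"doc-{i}" for i in range(start, min(start + size, 2000))]


results = LazyResults(fake_fetch, total=2000)
top10 = results[:10]         # fetches only the first page
result_1234 = results[1234]  # fetches only the page containing item 1234
```

In this sketch, only the pages that are actually touched get requested, so accessing a single deep result does not require walking through all earlier pages.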
Search the new ChatNoir
A new ChatNoir version is available with the same API. To run your search requests against the new version (e.g., if you want to search the ClueWeb22), set staging=True like this:
from chatnoir_api import Index
from chatnoir_api.v1 import search
api_key: str = "<API_KEY>"
results = search(api_key, "python library", staging=True, index=Index.ClueWeb22)
Note for Touché 2023 participants: Set index=Index.ClueWeb22 to search the ClueWeb22 index. (Otherwise, results from the ClueWeb09 and ClueWeb12 indices will be included.)
Phrase Search
To search for phrases, use the search_phrases function in the same way as the normal search function:
from chatnoir_api.v1 import search_phrases
api_key: str = "<API_KEY>"
results = search_phrases(api_key, "python library", staging=True)
Retrieve Document Content
Often the title and ID of a document are not enough to effectively re-rank a list of search results.
To retrieve the full content or plain text for a given document you can use the cache_contents helper function.
The cache_contents function expects a ChatNoir-internal UUID, a short UUID, or a TREC ID and the index from which to retrieve the document.
Retrieve by TREC ID
You can retrieve a document by its TREC ID like this:
from chatnoir_api import cache_contents, Index
contents = cache_contents(
    "clueweb09-en0051-90-00849",
    Index.ClueWeb09,
)
print(contents)
plain_contents = cache_contents(
    "clueweb09-en0051-90-00849",
    Index.ClueWeb09,
    plain=True,
)
print(plain_contents)
Retrieve by ChatNoir-internal UUID
You can also retrieve a document by its ChatNoir-internal UUID like this:
from uuid import UUID
from chatnoir_api import cache_contents, Index
contents = cache_contents(
    UUID("e635baa8-7341-596a-b3cf-b33c05954361"),
    Index.CommonCrawl1511,
)
print(contents)
plain_contents = cache_contents(
    UUID("e635baa8-7341-596a-b3cf-b33c05954361"),
    Index.CommonCrawl1511,
    plain=True,
)
print(plain_contents)
Retrieve by ChatNoir-internal short UUID
For newer ChatNoir versions, you can also retrieve a document by its ChatNoir-internal short UUID like this:
from chatnoir_api import cache_contents, Index, ShortUUID
contents = cache_contents(
    ShortUUID("6svePe3PXteDeGPk1XqTLA"),
    Index.ClueWeb22,
    staging=True,
)
print(contents)
plain_contents = cache_contents(
    ShortUUID("6svePe3PXteDeGPk1XqTLA"),
    Index.ClueWeb22,
    plain=True,
    staging=True,
)
print(plain_contents)
Citation
If you use this package, please cite the paper from the ChatNoir authors. You can use the following BibTeX information for citation:
@InProceedings{bevendorff:2018,
address = {Berlin Heidelberg New York},
author = {Janek Bevendorff and Benno Stein and Matthias Hagen and Martin Potthast},
booktitle = {Advances in Information Retrieval. 40th European Conference on IR Research (ECIR 2018)},
editor = {Leif Azzopardi and Allan Hanbury and Gabriella Pasi and Benjamin Piwowarski},
ids = {potthast:2018c,stein:2018c},
month = mar,
publisher = {Springer},
series = {Lecture Notes in Computer Science},
site = {Grenoble, France},
title = {{Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl}},
year = 2018
}
Development
To build and develop this package you need to install the build package:
pip install build
Installation
Install package dependencies:
pip install -e .
Testing
Install test dependencies:
pip install -e .[test]
Verify your changes against the test suite:
flake8 chatnoir_api examples
pylint -E chatnoir_api examples
CHATNOIR_API_KEY="<API_KEY>" CHATNOIR_API_KEY_STAGING="<API_KEY>" pytest chatnoir_api examples
Please also add tests for your newly developed code.
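The test command above passes API keys through the CHATNOIR_API_KEY and CHATNOIR_API_KEY_STAGING environment variables. A new test or script could read them the same way instead of hardcoding a key. The following is a minimal sketch; the helper load_api_key is a hypothetical name and not part of chatnoir-api.

```python
import os


def load_api_key(staging: bool = False) -> str:
    """Read the API key from the environment (hypothetical helper)."""
    name = "CHATNOIR_API_KEY_STAGING" if staging else "CHATNOIR_API_KEY"
    key = os.environ.get(name)
    if not key:
        # Fail early with a clear message rather than sending an empty key.
        raise RuntimeError(f"Set the {name} environment variable.")
    return key
```

The returned string can then be passed as the api_key argument in the search examples above.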
Build wheels
Wheels for this package can be built with:
python -m build
License
This repository is released under the MIT license.