Simple, type-safe access to the ChatNoir search API.

Reason this release was yanked:

Includes no sources.

Project description

🔍 chatnoir-api

Simple, type-safe access to the ChatNoir search API.

Working with PyTerrier? Check out the chatnoir-pyterrier package.

Installation

Install the package from PyPI:

pip install chatnoir-api

Usage

The ChatNoir API offers two main features: search with BM25F and retrieving document contents.

Search

To search with the ChatNoir API, you first need to request an API key. You can then use the Python client to search for documents. The results object is an iterable wrapper around the search results that handles pagination for you. It also supports list-style indexing to access individual results or sub-lists of results:

from chatnoir_api.v1 import search

api_key: str = "<API_KEY>"
results = search(api_key, "python library")

top10_results = results[:10]
print(top10_results)

result_1234 = results[1234]
print(result_1234)
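
The on-demand pagination behind such a results object can be pictured with a small stand-alone sketch. It does not use chatnoir_api and is not the library's implementation: LazyResults, fake_fetch_page, the page size of 10, and the total of 2000 results are hypothetical names and values chosen for illustration only.

```python
from typing import Callable, Dict, Iterator, List, Union


class LazyResults:
    """Hypothetical sequence-like wrapper that fetches result pages on demand."""

    def __init__(
        self,
        fetch_page: Callable[[int, int], List[str]],
        total: int,
        page_size: int = 10,
    ) -> None:
        self._fetch_page = fetch_page  # called as fetch_page(start, size)
        self._total = total
        self._page_size = page_size
        self._pages: Dict[int, List[str]] = {}  # cache of pages already fetched

    def __len__(self) -> int:
        return self._total

    def _item(self, index: int) -> str:
        # Map the global result index to a page number and an offset in it.
        page_number, offset = divmod(index, self._page_size)
        if page_number not in self._pages:
            start = page_number * self._page_size
            self._pages[page_number] = self._fetch_page(start, self._page_size)
        return self._pages[page_number][offset]

    def __getitem__(self, key: Union[int, slice]) -> Union[str, List[str]]:
        if isinstance(key, slice):
            return [self._item(i) for i in range(*key.indices(self._total))]
        if key < 0:
            key += self._total
        if not 0 <= key < self._total:
            raise IndexError("result index out of range")
        return self._item(key)

    def __iter__(self) -> Iterator[str]:
        return (self._item(i) for i in range(self._total))


requested_pages: List[int] = []


def fake_fetch_page(start: int, size: int) -> List[str]:
    """Stand-in for one API request; records which page was asked for."""
    requested_pages.append(start // size)
    return [f"result {i}" for i in range(start, min(start + size, 2000))]


results = LazyResults(fake_fetch_page, total=2000)
top10_results = results[:10]  # fetches page 0 once
result_1234 = results[1234]   # fetches only page 123
print(requested_pages)
```

In this sketch only the pages that are actually touched get requested, which is why slicing the top results and jumping to a far-away index stay cheap.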

Search the new ChatNoir

There's a new ChatNoir version with the same API interface. To run your search requests against the new API (e.g., if you want to search the ClueWeb22), set staging=True like this:

from chatnoir_api import Index
from chatnoir_api.v1 import search

api_key: str = "<API_KEY>"
results = search(api_key, "python library", staging=True, index=Index.ClueWeb22)

Note for Touché 2023 participants: Set index=Index.ClueWeb22 to search the ClueWeb22 index. (Otherwise, results from the ClueWeb09 and ClueWeb12 indices will be included.)

Phrase Search

To search for phrases, use the search_phrases function in the same way as the normal search function:

from chatnoir_api.v1 import search_phrases

api_key: str = "<API_KEY>"
response = search_phrases(api_key, "python library", staging=True)

Chat

To generate text with the ChatNoir Chat API you need to request an API key from the admins. With your API key, you can chat with the cat, like this:

from chatnoir_api.chat import chat

api_key: str = "<API_KEY>"
answer = chat(api_key, "how are you?")

Retrieve Document Contents

Often the title and ID of a document are not enough to effectively re-rank a list of search results. To retrieve the full content or the plain text of a given document, you can use the cache_contents helper function. The cache_contents function expects a ChatNoir-internal UUID, a short UUID, or a TREC ID, together with the index from which to retrieve the document.

Retrieve by TREC ID

You can retrieve a document by its TREC ID like this:

from chatnoir_api import cache_contents, Index

contents = cache_contents(
    "clueweb09-en0051-90-00849",
    Index.ClueWeb09,
)
print(contents)

plain_contents = cache_contents(
    "clueweb09-en0051-90-00849",
    Index.ClueWeb09,
    plain=True,
)
print(plain_contents)

Retrieve by ChatNoir-internal UUID

You can also retrieve a document by its ChatNoir-internal UUID like this:

from uuid import UUID

from chatnoir_api import cache_contents, Index

contents = cache_contents(
    UUID("e635baa8-7341-596a-b3cf-b33c05954361"),
    Index.CommonCrawl1511,
)
print(contents)

plain_contents = cache_contents(
    UUID("e635baa8-7341-596a-b3cf-b33c05954361"),
    Index.CommonCrawl1511,
    plain=True,
)
print(plain_contents)

Retrieve by ChatNoir-internal short UUID

For newer ChatNoir versions, you can also retrieve a document by its ChatNoir-internal short UUID like this:

from chatnoir_api import cache_contents, Index, ShortUUID

contents = cache_contents(
    ShortUUID("6svePe3PXteDeGPk1XqTLA"),
    Index.ClueWeb22,
    staging=True,
)
print(contents)

plain_contents = cache_contents(
    ShortUUID("6svePe3PXteDeGPk1XqTLA"),
    Index.ClueWeb22,
    plain=True,
    staging=True,
)
print(plain_contents)
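
Once plain-text contents are available, a re-ranker can score the document text rather than the title alone. The following is a minimal sketch with a toy term-frequency scorer. The functions term_frequency_score and rerank, the document IDs, and the sample texts are hypothetical; in practice the texts would come from cache_contents(..., plain=True).

```python
from typing import Dict, List, Tuple


def term_frequency_score(query: str, text: str) -> int:
    """Count how often the query terms occur in the text (case-insensitive)."""
    tokens = text.lower().split()
    return sum(tokens.count(term) for term in query.lower().split())


def rerank(query: str, plain_texts: Dict[str, str]) -> List[Tuple[str, int]]:
    """Sort document IDs by descending toy score; ties keep their input order."""
    scored = [
        (doc_id, term_frequency_score(query, text))
        for doc_id, text in plain_texts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up IDs and texts standing in for retrieved plain-text contents.
plain_texts = {
    "doc-a": "A cooking blog about pasta",
    "doc-b": "Python library for search and a python tutorial",
    "doc-c": "A library catalogue",
}
ranking = rerank("python library", plain_texts)
print(ranking)
```

A real re-ranker would replace the toy scorer with a proper retrieval model; the sketch only shows where the retrieved plain text enters the pipeline.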

Citation

If you use this package, please cite the paper from the ChatNoir authors. You can use the following BibTeX information for citation:

@InProceedings{bevendorff:2018,
  address =               {Berlin Heidelberg New York},
  author =                {Janek Bevendorff and Benno Stein and Matthias Hagen and Martin Potthast},
  booktitle =             {Advances in Information Retrieval. 40th European Conference on IR Research (ECIR 2018)},
  editor =                {Leif Azzopardi and Allan Hanbury and Gabriella Pasi and Benjamin Piwowarski},
  ids =                   {potthast:2018c,stein:2018c},
  month =                 mar,
  publisher =             {Springer},
  series =                {Lecture Notes in Computer Science},
  site =                  {Grenoble, France},
  title =                 {{Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl}},
  year =                  2018
}

Development

To build and develop this package you need to install the build package:

pip install build

Installation

Install package dependencies:

pip install -e .

Testing

Install test dependencies:

pip install -e .[test]

Verify your changes against the test suite:

flake8 chatnoir_api examples
pylint -E chatnoir_api examples
CHATNOIR_API_KEY="<API_KEY>" CHATNOIR_API_KEY_STAGING="<API_KEY>" pytest chatnoir_api examples

Please also add tests for your newly developed code.
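
The test command above passes the keys through the CHATNOIR_API_KEY and CHATNOIR_API_KEY_STAGING environment variables. A minimal sketch of reading them in your own test code could look as follows; the helper name api_key_from_env is hypothetical and not part of the package.

```python
import os


def api_key_from_env(staging: bool = False) -> str:
    """Return the API key for the regular or the staging API from the environment."""
    name = "CHATNOIR_API_KEY_STAGING" if staging else "CHATNOIR_API_KEY"
    key = os.environ.get(name, "")
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Environment variable {name} is not set.")
    return key
```

Reading the key from the environment keeps it out of the source code and lets the same test run against either API by toggling the staging flag.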

Build wheels

Build wheels for this package with:

python -m build

License

This repository is released under the MIT license.

Download files

Source Distribution

chatnoir-api-2.1.4.tar.gz (23.8 kB)

Uploaded Source

Built Distribution

chatnoir_api-2.1.4-py3-none-any.whl (4.6 kB)

Uploaded Python 3

File details

Details for the file chatnoir-api-2.1.4.tar.gz.

File metadata

  • Download URL: chatnoir-api-2.1.4.tar.gz
  • Size: 23.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for chatnoir-api-2.1.4.tar.gz
Algorithm    Hash digest
SHA256       c42f7e1214be2e42d618a5ed12712ec9790b441fc03253d087297a1a390e593c
MD5          6a692bfffef191efa8b7386fdc3e8399
BLAKE2b-256  0ef57b52755129dcffaff0c2f97714b4bc8c1e34acfec1ac898ef91f0ada1665

File details

Details for the file chatnoir_api-2.1.4-py3-none-any.whl.

File hashes

Hashes for chatnoir_api-2.1.4-py3-none-any.whl
Algorithm    Hash digest
SHA256       7b90727ee1d0baf69050099b0e30ee662b6a4474706f81793b405044d2ba9e12
MD5          25ff3bb6cbaa8f99bed5cdd4173ac172
BLAKE2b-256  6161b6c63038efb2ca44d0d7c9acf11034e2157ac9539a286259886d6f6d3dec
