Anserini + PyTerrier

Anserini is an information retrieval toolkit built on top of Lucene. pyterrier-anserini provides a PyTerrier-compatible interface to Anserini, so you can run retrieval experiments with Anserini indexes and combine Anserini retrievers with other PyTerrier systems.

Quick Start

You can install pyterrier-anserini with pip:

$ pip install pyterrier-anserini

pyterrier_anserini.AnseriniIndex is the main class for working with Anserini. For instance, you can download a pre-built index from HuggingFace and retrieve with BM25 using the following snippet:

>>> from pyterrier_anserini import AnseriniIndex
>>> index = AnseriniIndex.from_hf('macavaney/msmarco-passage.anserini')
>>> bm25 = index.bm25(include_fields=['contents'])
>>> bm25.search('terrier breeds')
  qid           query    docno    score  rank                                      contents
0   1  terrier breeds  5785957  11.9588     0  The Jack Russell Terrier and the Russell ...
1   1  terrier breeds  7455374  11.9343     1  FCI, ANKC, and IKC recognize the shorts a...
2   1  terrier breeds  1406578  11.8640     2  Norfolk terrier (English breed of small t...
3   1  terrier breeds  3984886  11.7518     3  Terrier Group is the name of a breed Grou...
4   1  terrier breeds  7728131  11.5660     4  The Yorkshire Terrier didn't begin as the...
...
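
To give a rough sense of what the score column represents, here is a minimal, self-contained sketch of a textbook BM25 scoring function. It is not Anserini's or Lucene's implementation, which differs in details such as how document lengths are stored. The toy corpus, the bm25_score helper, and the parameter values k1=0.9 and b=0.4 are illustrative assumptions and are not taken from pyterrier-anserini.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avg_doc_len,
               k1=0.9, b=0.4):
    """Textbook BM25 score of one tokenized document for a tokenized query.

    Hypothetical helper for illustration only; k1 and b are assumed values.
    """
    term_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = term_counts.get(term, 0)
        if tf == 0:
            continue  # terms absent from the document contribute nothing
        df = doc_freqs[term]
        # Smoothed inverse document frequency: rarer terms weigh more.
        idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
        # Term-frequency saturation with document-length normalization.
        norm = tf + k1 * (1.0 - b + b * doc_len / avg_doc_len)
        score += idf * tf * (k1 + 1.0) / norm
    return score


# Toy corpus (made up for this sketch), already tokenized.
docs = {
    "d1": "jack russell terrier breeds".split(),
    "d2": "terrier group of dog breeds".split(),
    "d3": "cat breeds".split(),
}

num_docs = len(docs)
avg_doc_len = sum(len(terms) for terms in docs.values()) / num_docs

# Document frequency: the number of documents that contain each term.
doc_freqs = Counter(term for terms in docs.values() for term in set(terms))

query = ["terrier", "breeds"]
scores = {
    docno: bm25_score(query, terms, doc_freqs, num_docs, avg_doc_len)
    for docno, terms in docs.items()
}

# Sort by descending score, mirroring the rank column in the output above.
ranking = sorted(scores, key=scores.get, reverse=True)
for rank, docno in enumerate(ranking):
    print(rank, docno, round(scores[docno], 4))
```

In this sketch, a document that matches both query terms outscores one that matches only a single term, and of two documents with the same matches, the shorter one scores higher because of length normalization.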

Acknowledgements

This extension uses the Anserini package. If you use it, please be sure to cite Anserini:

@inproceedings{DBLP:conf/sigir/Yang0L17,
  author       = {Peilin Yang and
                  Hui Fang and
                  Jimmy Lin},
  title        = {Anserini: Enabling the Use of Lucene for Information Retrieval Research},
  booktitle    = {Proceedings of the 40th International {ACM} {SIGIR} Conference on
                  Research and Development in Information Retrieval, Shinjuku, Tokyo,
                  Japan, August 7-11, 2017},
  pages        = {1253--1256},
  publisher    = {{ACM}},
  year         = {2017},
  url          = {https://doi.org/10.1145/3077136.3080721},
  doi          = {10.1145/3077136.3080721}
}

This extension was built as part of the PyTerrier project. If you use it, please be sure to cite PyTerrier:

@inproceedings{DBLP:conf/cikm/MacdonaldTMO21,
  author       = {Craig Macdonald and
                  Nicola Tonellotto and
                  Sean MacAvaney and
                  Iadh Ounis},
  title        = {PyTerrier: Declarative Experimentation in Python from {BM25} to Dense
                  Retrieval},
  booktitle    = {{CIKM} '21: The 30th {ACM} International Conference on Information
                  and Knowledge Management, Virtual Event, Queensland, Australia, November
                  1 - 5, 2021},
  pages        = {4526--4533},
  publisher    = {{ACM}},
  year         = {2021},
  url          = {https://doi.org/10.1145/3459637.3482013},
  doi          = {10.1145/3459637.3482013}
}

This extension was written by Sean MacAvaney at the University of Glasgow and is based on an original implementation by Craig Macdonald that was part of PyTerrier. See the project's GitHub repository for a full list of contributors.