Python interface to the Anserini IR toolkit built on Lucene

Project description

Pyserini is a Python toolkit designed to support replicable information retrieval research. It provides sparse retrieval (e.g., BM25 ranking using bag-of-words representations), dense retrieval (e.g., nearest-neighbor search on transformer-encoded representations), as well as hybrid retrieval that integrates both approaches.

Installation

Install via PyPI:

pip install pyserini

Pyserini requires Python 3.6+.

Usage

The SimpleSearcher class is the entry point for sparse retrieval using bag-of-words representations. Anserini supports a number of pre-built indexes for common collections, which are downloaded automatically and stored in ~/.cache/pyserini/indexes/. Here's how to use the pre-built index for the MS MARCO passage ranking task and issue a query interactively (using BM25 ranking):

from pyserini.search import SimpleSearcher

searcher = SimpleSearcher.from_prebuilt_index('msmarco-passage')
hits = searcher.search('what is a lobster roll?')

# Print rank, docid, and BM25 score for up to the top 10 hits
for i, hit in enumerate(hits[:10], start=1):
    print(f'{i:2} {hit.docid:7} {hit.score:.5f}')

The results should be as follows:

 1 7157707 11.00830
 2 6034357 10.94310
 3 5837606 10.81740
 4 7157715 10.59820
 5 6034350 10.48360
 6 2900045 10.31190
 7 7157713 10.12300
 8 1584344 10.05290
 9 533614  9.96350
10 6234461 9.92200
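
The third column above is a BM25 score. As a rough illustration of what such a score measures, here is a minimal, self-contained sketch of a textbook BM25 formula over whitespace-tokenized text. It is not the Lucene/Anserini implementation that Pyserini actually calls, so it will not reproduce the numbers above. The function name bm25_scores, the toy documents, and the k1 and b values are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=0.9, b=0.4):
    """Score each tokenized document against a tokenized query (textbook BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed inverse document frequency; stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy collection (hypothetical, not MS MARCO passages).
docs = [
    "a lobster roll is a sandwich filled with lobster meat".split(),
    "bm25 is a ranking function".split(),
    "the sandwich shop is closed".split(),
]
query = "lobster roll".split()

scores = bm25_scores(query, docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
for rank, i in enumerate(ranking, start=1):
    print(f'{rank:2} doc{i} {scores[i]:.5f}')
```

Only the first toy document contains the query terms, so it is the only one with a non-zero score. A real index also applies tokenization, stemming, and stopword handling that this sketch leaves out.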

For information on dense and hybrid retrieval, as well as complete documentation, please refer to the Pyserini repository.
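
To give a sense of what "hybrid" can mean, one common way to combine a sparse and a dense ranking is to normalize each score list and interpolate linearly. The sketch below illustrates that general idea under stated assumptions: it is not taken from Pyserini, the function names min_max and fuse are hypothetical, and the docids and scores are invented toy values.

```python
def min_max(scores):
    """Rescale a {docid: score} map to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {docid: 0.0 for docid in scores}
    return {docid: (s - lo) / (hi - lo) for docid, s in scores.items()}


def fuse(sparse, dense, alpha=0.5):
    """Linearly interpolate two normalized score maps; alpha weights the sparse side."""
    sparse_n, dense_n = min_max(sparse), min_max(dense)
    fused = {}
    # A document missing from one ranking contributes 0 from that side.
    for docid in set(sparse_n) | set(dense_n):
        fused[docid] = (alpha * sparse_n.get(docid, 0.0)
                        + (1 - alpha) * dense_n.get(docid, 0.0))
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Invented toy scores: BM25-like values on one side, similarities on the other.
sparse_hits = {'d1': 11.0, 'd2': 10.5, 'd3': 10.0}
dense_hits = {'d2': 0.9, 'd1': 0.6, 'd4': 0.5}

fused = fuse(sparse_hits, dense_hits)
for rank, (docid, score) in enumerate(fused, start=1):
    print(f'{rank:2} {docid} {score:.5f}')
```

Normalizing first matters because sparse and dense scores live on different scales. The hybrid retrieval method that Pyserini actually uses is described in the repository documentation.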

Download files

Source Distribution

pyserini-0.11.0.0.tar.gz (67.0 MB)

Uploaded Source

Built Distribution

pyserini-0.11.0.0-py3-none-any.whl (67.1 MB)

Uploaded Python 3

File details

Details for the file pyserini-0.11.0.0.tar.gz.

File metadata

  • Download URL: pyserini-0.11.0.0.tar.gz
  • Size: 67.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.25.1 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.6.12

File hashes

Hashes for pyserini-0.11.0.0.tar.gz
Algorithm Hash digest
SHA256 4ff93e607f570f0643321cb073aa571ee298a7eec20b21c4e13812e16a75a65b
MD5 36fb1791a2ce6d0b503c7b693f4ded21
BLAKE2b-256 94439a0354a34e3a243cb6cff814e8fc6572b85231477d6675f1f3b01805d777

File details

Details for the file pyserini-0.11.0.0-py3-none-any.whl.

File metadata

  • Download URL: pyserini-0.11.0.0-py3-none-any.whl
  • Size: 67.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.25.1 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.6.12

File hashes

Hashes for pyserini-0.11.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7ba4ef6da27733a36583389d7ad952dc738597f733de7ec0148f92852bda712d
MD5 124b889dd059d11fc2544f2ae9fdd781
BLAKE2b-256 8b1825db00ae1690e5d4f1d6f9c4528d27e31d060d3a468807b33d1bb000c002
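
The digests listed for each file can be used to check a downloaded copy before installing it. The following is a minimal sketch using Python's standard hashlib module. The helper names sha256_of_file and matches_published_digest are hypothetical, and the file path in the usage note assumes the wheel was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_digest('pyserini-0.11.0.0-py3-none-any.whl', '7ba4ef6da27733a36583389d7ad952dc738597f733de7ec0148f92852bda712d') should return True for an intact copy of the wheel listed above.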
