Python interface to the Anserini IR toolkit built on Lucene
Project description
Pyserini is a Python toolkit designed to support replicable information retrieval research. It provides sparse retrieval (e.g., BM25 ranking using bag-of-words representations), dense retrieval (e.g., nearest-neighbor search on transformer-encoded representations), as well as hybrid retrieval that integrates both approaches.
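To make "BM25 ranking using bag-of-words representations" concrete, here is a minimal sketch of a textbook BM25 scorer over a toy in-memory corpus. It is not Pyserini's or Lucene's implementation: the function name `bm25_scores`, the whitespace tokenizer, and the parameter values `k1=0.9` and `b=0.4` are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=0.9, b=0.4):
    """Score each document against the query with a textbook BM25 formula.

    Illustrative only: whitespace tokenization, no stemming or stopwords.
    k1 and b are assumed values, not settings read from Pyserini.
    """
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = query.lower().split()
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["lobster roll sandwich", "weather today", "lobster bisque soup"]
scores = bm25_scores("lobster roll", docs)
```

In this toy corpus, the first document matches both query terms and so scores highest, while the second matches neither and scores zero. A real sparse retriever uses an inverted index rather than scanning every document.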
Installation
Install via PyPI:
pip install pyserini
Pyserini requires Python 3.6+.
Usage
The SimpleSearcher class provides the entry point for sparse retrieval using bag-of-words representations.
Anserini supports a number of pre-built indexes for common collections, which it automatically downloads and stores in ~/.cache/pyserini/indexes/.
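To see which pre-built indexes have already been downloaded, one option is simply to list that cache directory. The sketch below uses only the standard library. The helper name `list_cached_indexes` is hypothetical (it is not part of Pyserini), and it assumes the default cache location mentioned above.

```python
from pathlib import Path

# Default cache location as stated above; adjust if your setup differs.
DEFAULT_CACHE = Path.home() / '.cache' / 'pyserini' / 'indexes'


def list_cached_indexes(cache_dir=DEFAULT_CACHE):
    """Return the sorted names of entries in the index cache directory.

    Hypothetical helper, not a Pyserini API. Returns an empty list if the
    directory does not exist (for example, before any index is downloaded).
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    return sorted(entry.name for entry in cache_dir.iterdir())
```

Calling `list_cached_indexes()` with no arguments inspects the default location. Passing another path is mainly useful for testing.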
Here's how to use a pre-built index for the MS MARCO passage ranking task and issue a query interactively (using BM25 ranking):
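from pyserini.search import SimpleSearcher

# Downloads the pre-built index on first use, then loads it from the cache.
searcher = SimpleSearcher.from_prebuilt_index('msmarco-passage')
hits = searcher.search('what is a lobster roll?')

# Print rank, document id, and BM25 score for the top 10 hits.
for i in range(0, 10):
    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')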
The results should be as follows:
 1 7157707 11.00830
 2 6034357 10.94310
 3 5837606 10.81740
 4 7157715 10.59820
 5 6034350 10.48360
 6 2900045 10.31190
 7 7157713 10.12300
 8 1584344 10.05290
 9 533614  9.96350
10 6234461 9.92200
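The loop in the example indexes `hits[0]` through `hits[9]`, so it assumes the query returns at least ten results. A slightly more defensive sketch is shown below. `format_hits` is a hypothetical helper, and `Hit` is a stand-in used only so the snippet runs without Pyserini. The only assumption about real hit objects is that they expose `docid` and `score` attributes, as in the example.

```python
from collections import namedtuple
from itertools import islice

# Stand-in for a search result; real hits come from searcher.search(...).
Hit = namedtuple('Hit', ['docid', 'score'])


def format_hits(hits, k=10):
    """Format up to k hits as 'rank docid score' lines.

    Works with fewer than k hits and with any iterable of objects that
    have `docid` (str) and `score` (float) attributes.
    """
    return [
        f'{rank:2} {hit.docid:7} {hit.score:.5f}'
        for rank, hit in enumerate(islice(hits, k), start=1)
    ]


sample = [Hit('7157707', 11.0083), Hit('533614', 9.9635)]
lines = format_hits(sample)
```

With real results, `print('\n'.join(format_hits(hits)))` would print the same columns as the loop in the example, without raising an error when fewer than ten hits come back.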
For information on dense and hybrid retrieval, as well as complete documentation, please refer to the Pyserini repository.
Download files
Source Distribution
Built Distribution
File details
Details for the file pyserini-0.11.0.0.tar.gz.
File metadata
- Download URL: pyserini-0.11.0.0.tar.gz
- Upload date:
- Size: 67.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.25.1 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.6.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4ff93e607f570f0643321cb073aa571ee298a7eec20b21c4e13812e16a75a65b
MD5 | 36fb1791a2ce6d0b503c7b693f4ded21
BLAKE2b-256 | 94439a0354a34e3a243cb6cff814e8fc6572b85231477d6675f1f3b01805d777
File details
Details for the file pyserini-0.11.0.0-py3-none-any.whl.
File metadata
- Download URL: pyserini-0.11.0.0-py3-none-any.whl
- Upload date:
- Size: 67.1 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.25.1 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.6.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7ba4ef6da27733a36583389d7ad952dc738597f733de7ec0148f92852bda712d
MD5 | 124b889dd059d11fc2544f2ae9fdd781
BLAKE2b-256 | 8b1825db00ae1690e5d4f1d6f9c4528d27e31d060d3a468807b33d1bb000c002