Python interface to the Anserini IR toolkit built on Lucene
Pyserini provides a simple Python interface to the Anserini IR toolkit via pyjnius.
Installation
Install via PyPI
pip install pyserini
Usage
Here's a sample pre-built index of TREC Disks 4 & 5 to play with (used in the TREC 2004 Robust Track):
wget https://git.uwaterloo.ca/jimmylin/anserini-indexes/raw/master/index-robust04-20191213.tar.gz
tar xvfz index-robust04-20191213.tar.gz
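If `wget` or `tar` isn't available (for example on Windows), the same download-and-extract step can be done from Python with the standard library. The sketch below assumes the URL above is still reachable; `fetch_index` and `extract_index` are hypothetical helper names and are not part of Pyserini.

```python
import os
import tarfile
import urllib.request

# URL of the sample Robust04 index, copied from the wget command above.
INDEX_URL = ('https://git.uwaterloo.ca/jimmylin/anserini-indexes/raw/master/'
             'index-robust04-20191213.tar.gz')


def extract_index(tar_path: str, dest_dir: str = '.') -> list:
    """Extract a .tar.gz archive (like `tar xvfz`) and return its member names.

    Only extract archives from sources you trust.
    """
    with tarfile.open(tar_path, 'r:gz') as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names


def fetch_index(url: str = INDEX_URL, dest_dir: str = '.') -> list:
    """Download the archive (like `wget`) unless it is already present, then extract it."""
    os.makedirs(dest_dir, exist_ok=True)
    tar_path = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(tar_path):
        urllib.request.urlretrieve(url, tar_path)
    return extract_index(tar_path, dest_dir)
```

Calling `fetch_index()` in the working directory should leave an `index-robust04-20191213/` directory next to the downloaded archive, matching what the two shell commands produce.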
Use the SimpleSearcher for searching:
from pyserini.search import pysearch
searcher = pysearch.SimpleSearcher('index-robust04-20191213/')
hits = searcher.search('hubble space telescope')
# Print the first 10 hits:
for i in range(0, 10):
    print(f'{i+1:2} {hits[i].docid:15} {hits[i].score:.5f}')
# Grab the actual text of the top hit:
print(hits[0].raw)
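The loop above assumes at least ten hits come back; a query that matches fewer documents would fail when indexing past the end of `hits`. A slightly more defensive way to print the ranked list is sketched below. `format_hits` is a hypothetical helper, and the `Hit` named tuple is only a stand-in for Pyserini's result objects, assuming (as in the snippet above) that they expose `docid` and `score` attributes.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """Stand-in for a search result: only the attributes used in the loop above."""
    docid: str
    score: float


def format_hits(hits: Iterable, k: int = 10) -> List[str]:
    """Format at most k hits as 'rank docid score' lines, stopping early if fewer exist."""
    lines = []
    for rank, hit in enumerate(hits, start=1):
        if rank > k:
            break
        lines.append(f'{rank:2} {hit.docid:15} {hit.score:.5f}')
    return lines
```

With a real searcher, `print('\n'.join(format_hits(searcher.search('hubble space telescope'))))` should print the same lines as the loop above, without assuming how many hits were returned.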
For additional information, please refer to the Pyserini repository.
Download files
Source Distribution
pyserini-0.9.1.0.tar.gz (57.7 MB)
Built Distribution
pyserini-0.9.1.0-py3-none-any.whl (57.8 MB)
Hashes for pyserini-0.9.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | adffdd7f3a49397c851c6750f829563b90aa6cfe3a994e43bf3cfe9bb98cd577
MD5 | e89b0f8f6d1eb5188c02a6af806da54b
BLAKE2b-256 | 8644ca41ea4f53ac7ae7a671c1266bf0b8c387c14febc6e5c24e4d7242f6e4ed
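To check a downloaded wheel against the digests listed above, Python's standard `hashlib` module is enough. A minimal sketch follows; `file_digest_hex` and `verify_wheel` are hypothetical helper names, and BLAKE2b-256 is taken to mean BLAKE2b with a 32-byte digest, which is how PyPI reports it.

```python
import hashlib

# Expected SHA256 for pyserini-0.9.1.0-py3-none-any.whl, copied from the table above.
EXPECTED_SHA256 = 'adffdd7f3a49397c851c6750f829563b90aa6cfe3a994e43bf3cfe9bb98cd577'


def file_digest_hex(path: str, algorithm: str = 'sha256', chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a ~58 MB wheel need not be read into memory at once.

    algorithm: 'sha256', 'md5', or 'blake2b-256' (BLAKE2b with a 32-byte digest).
    """
    if algorithm == 'blake2b-256':
        h = hashlib.blake2b(digest_size=32)
    else:
        h = hashlib.new(algorithm)
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def verify_wheel(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the published digest."""
    return file_digest_hex(path, 'sha256') == expected.lower()
```

For example, `verify_wheel('pyserini-0.9.1.0-py3-none-any.whl')` should return `True` for an intact download. pip can also enforce this itself through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).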