Python interface to the Anserini IR toolkit built on Lucene

Project description

Pyserini provides a simple Python interface to the Anserini IR toolkit via pyjnius.

Installation

Install via PyPI

pip install pyserini
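
Because pyjnius bridges Python to a Java Virtual Machine, a Java installation has to be discoverable before Pyserini can be imported. The check below is only a rough preflight sketch: the helper name `java_available` is hypothetical, and it assumes that either a `JAVA_HOME` environment variable or a `java` executable on the `PATH` is a reasonable sign of a usable installation. It does not verify the Java version.

```python
import os
import shutil


def java_available():
    """Return True if JAVA_HOME is set or a `java` executable is on PATH.

    Hypothetical helper for illustration; it does not check the Java version.
    """
    has_java_home = bool(os.environ.get("JAVA_HOME"))
    has_java_on_path = shutil.which("java") is not None
    return has_java_home or has_java_on_path


if __name__ == "__main__":
    if java_available():
        print("Java appears to be installed.")
    else:
        print("No Java installation found; pyjnius will likely fail to start a JVM.")
```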

Usage

Here's a sample pre-built index on TREC Disks 4 & 5 to play with (used in the TREC 2004 Robust Track):

wget https://git.uwaterloo.ca/jimmylin/anserini-indexes/raw/master/index-robust04-20191213.tar.gz
tar xvfz index-robust04-20191213.tar.gz
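
If `wget` and `tar` are not available (for example on Windows), the same two steps can be approximated with the Python standard library. This is a minimal sketch rather than part of Pyserini: the function name `fetch_and_extract` and the `dest_dir` parameter are hypothetical, and the download is skipped when the archive already exists locally.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL of the sample index, taken from the wget command above.
INDEX_URL = (
    "https://git.uwaterloo.ca/jimmylin/anserini-indexes/raw/master/"
    "index-robust04-20191213.tar.gz"
)


def fetch_and_extract(url=INDEX_URL, dest_dir="."):
    """Download a .tar.gz archive into dest_dir and unpack it there.

    Hypothetical helper for illustration. Returns the path of the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Name the local file after the last path component of the URL.
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)

    # Equivalent of `tar xvfz`, minus the verbose listing.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return archive


if __name__ == "__main__":
    fetch_and_extract()
```

Calling `fetch_and_extract()` with no arguments should leave an `index-robust04-20191213/` directory in the current working directory, matching the path used in the search example.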

Use the SimpleSearcher for searching:

from pyserini.search import pysearch

searcher = pysearch.SimpleSearcher('index-robust04-20191213/')
hits = searcher.search('hubble space telescope')

# Print up to the first 10 hits (fewer if the query returns fewer results)
for i in range(min(10, len(hits))):
    print('{} {} {}'.format(i + 1, hits[i].docid, hits[i].score))

# Grab the actual text of the top hit
hits[0].content

For additional information, please refer to the Pyserini repository.

Download files

Source Distribution

pyserini-0.7.1.0.tar.gz (53.7 MB, source)

Built Distribution

pyserini-0.7.1.0-py3-none-any.whl (53.7 MB, Python 3)
