Python interface to the Anserini IR toolkit built on Lucene

Pyserini provides a simple Python interface to the Anserini IR toolkit via pyjnius.

Installation

Install via PyPI:

pip install pyserini

Fetch the Anserini fatjar from Maven Central:

wget -O anserini-0.6.0-fatjar.jar "https://search.maven.org/remotecontent?filepath=io/anserini/anserini/0.6.0/anserini-0.6.0-fatjar.jar"

Set the environment variable ANSERINI_CLASSPATH to the directory where the fatjar is located:

export ANSERINI_CLASSPATH="/path/to/fatjar/directory"
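Before importing pyserini, it can help to confirm that ANSERINI_CLASSPATH points at a directory that actually contains the fatjar. The following is a minimal sketch of such a check; the helper name find_fatjars and the filename pattern anserini-*-fatjar.jar are assumptions made for illustration and are not part of Pyserini.

```python
import glob
import os


def find_fatjars(classpath_dir):
    """Return sorted paths of files matching anserini-*-fatjar.jar in classpath_dir.

    Hypothetical helper: the filename pattern is an assumption based on the
    name of the jar downloaded above.
    """
    pattern = os.path.join(classpath_dir, "anserini-*-fatjar.jar")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    classpath = os.environ.get("ANSERINI_CLASSPATH")
    if not classpath:
        print("ANSERINI_CLASSPATH is not set")
    elif not find_fatjars(classpath):
        print("No fatjar found in {}".format(classpath))
    else:
        print("Found: {}".format(find_fatjars(classpath)[0]))
```

If the check reports no fatjar, the usual cause is that the variable points at the jar file itself rather than at its directory.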

Here's a sample pre-built index on TREC Disks 4 & 5 to play with (used in the TREC 2004 Robust Track):

wget https://www.dropbox.com/s/mdoly9sjdalh44x/lucene-index.robust04.pos%2Bdocvectors%2Brawdocs.tar.gz
tar xvfz lucene-index.robust04.pos+docvectors+rawdocs.tar.gz

Usage

Use the SimpleSearcher for searching:

from pyserini.search import pysearch

searcher = pysearch.SimpleSearcher('lucene-index.robust04.pos+docvectors+rawdocs')
hits = searcher.search('hubble space telescope')

# Print the first 10 hits (or fewer, if the query returns fewer)
for i in range(min(10, len(hits))):
    print('{} {} {}'.format(i+1, hits[i].docid, hits[i].score))

# Grab the actual text of the top hit
hits[0].content
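The printing loop above can be wrapped in a small helper so that it is reusable and does not fail when a query returns fewer than 10 hits. This is only a sketch: format_hits and FakeHit are hypothetical names, and the only assumption about real hits is that each one exposes docid and score attributes, as in the example above.

```python
from collections import namedtuple


def format_hits(hits, k=10):
    """Return up to k lines of 'rank docid score', one line per hit."""
    lines = []
    # list(...) lets this accept any iterable of hits, not only a list.
    for rank, hit in enumerate(list(hits)[:k], start=1):
        lines.append('{} {} {:.5f}'.format(rank, hit.docid, hit.score))
    return lines


# Stand-in for the objects returned by searcher.search(); the values are made up.
FakeHit = namedtuple('FakeHit', ['docid', 'score'])
sample = [FakeHit('DOC-A', 2.5), FakeHit('DOC-B', 1.25)]

for line in format_hits(sample):
    print(line)
```

With a real searcher, the call would be format_hits(searcher.search('hubble space telescope')).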

Configure BM25 parameters and use RM3 query expansion:

searcher.set_bm25_similarity(0.9, 0.4)   # BM25 k1, b
searcher.set_rm3_reranker(10, 10, 0.5)   # feedback terms, feedback docs, original query weight

hits2 = searcher.search('hubble space telescope')

# Print the first 10 hits (or fewer, if the query returns fewer)
for i in range(min(10, len(hits2))):
    print('{} {} {}'.format(i+1, hits2[i].docid, hits2[i].score))
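To give a feel for what the two BM25 parameters control, here is a sketch of the textbook BM25 weight for a single query term, assuming the arguments to set_bm25_similarity are k1 and b in that order. The function name bm25_term_score and its arguments are hypothetical, and Lucene's implementation differs in its details, so these numbers are not expected to match the scores Pyserini returns.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=0.9, b=0.4):
    """Textbook BM25 contribution of one query term to one document."""
    # Inverse document frequency; the +1 inside the log keeps it non-negative.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b=0 ignores document length, b=1 normalizes fully.
    norm = 1 - b + b * doc_len / avg_doc_len
    # k1 controls how quickly repeated occurrences of the term saturate.
    return idf * tf * (k1 + 1) / (tf + k1 * norm)


# Same term frequency in a short and a long document (illustrative numbers).
short_doc = bm25_term_score(tf=3, doc_len=50, avg_doc_len=100, n_docs=1000, doc_freq=10)
long_doc = bm25_term_score(tf=3, doc_len=200, avg_doc_len=100, n_docs=1000, doc_freq=10)
print(short_doc > long_doc)
```

Raising b penalizes long documents more strongly, while raising k1 lets additional occurrences of a term keep adding to the score for longer before it levels off.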
