A Python toolkit for retrieval with relational database management systems.

QuackIR

QuackIR is a toolkit for reproducible information retrieval research with relational database management systems. Sparse retrieval is available with DuckDB, SQLite, and PostgreSQL. Dense and hybrid retrieval are available with DuckDB and PostgreSQL. Text analysis with the Porter tokenizer is provided by wrapping Pyserini's Lucene analyzer.
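
Hybrid retrieval combines a sparse ranking and a dense ranking into one list. As an illustration only, the sketch below uses reciprocal rank fusion (RRF), a widely used fusion method. Whether QuackIR fuses rankings this way is an assumption; the function name, the constant k=60, and the document ids are hypothetical and not part of QuackIR's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_k=10):
    """Fuse several ranked lists of document ids with reciprocal rank fusion.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. This is a generic sketch, not QuackIR's code.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc id for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_k]


# Hypothetical rankings, best hit first.
sparse_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Documents that appear near the top of both lists (here d1 and d3) end up ahead of documents that appear in only one list.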

Installation

Clone Repository

git clone https://github.com/castorini/quackir.git --recurse-submodules

Install Dependencies

conda create -n quackir python=3.10
conda activate quackir
conda install -c conda-forge postgresql pgvector openjdk=21 maven -y
pip install -r requirements.txt

Initialize PostgreSQL

initdb -D mydb
pg_ctl -D mydb -l logfile start &
createdb quackir
psql quackir
create user postgres superuser;
create extension vector;
\q

Quick Start

To create a sparse index with DuckDB:

from quackir.index import DuckDBIndexer
from quackir import IndexType

table_name = "corpus"
corpus_file = "path/to/corpus_file"  # placeholder: replace with the path to your corpus file
index_type = IndexType.SPARSE

indexer = DuckDBIndexer()
indexer.init_table(table_name, index_type)
indexer.load_table(table_name, corpus_file)
indexer.fts_index(table_name)

indexer.close()

To perform sparse retrieval:

from quackir.search import DuckDBSearcher
from quackir import SearchType

table_name = "corpus"
query = "what is a lobster roll"
search_type = SearchType.SPARSE

searcher = DuckDBSearcher()
results = searcher.search(
    search_type, query_string=query, table_names=[table_name]
)
print(results)

searcher.close()
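
To give a feel for what sparse retrieval inside a relational database involves, here is a minimal sketch that uses only Python's built-in sqlite3 module with SQLite's FTS5 extension and its porter tokenizer. It is not QuackIR's implementation, and it assumes your SQLite build has FTS5 enabled; the table layout, function names, and sample documents are hypothetical.

```python
import sqlite3


def build_fts_index(docs):
    """Create an in-memory FTS5 table and load (id, contents) pairs into it."""
    conn = sqlite3.connect(":memory:")
    # 'id' is stored but not indexed; 'contents' is tokenized with porter stemming.
    conn.execute(
        "CREATE VIRTUAL TABLE corpus "
        "USING fts5(id UNINDEXED, contents, tokenize='porter')"
    )
    conn.executemany("INSERT INTO corpus (id, contents) VALUES (?, ?)", docs)
    conn.commit()
    return conn


def sparse_search(conn, terms, k=10):
    """Return up to k (id, score) rows matching any of the query terms."""
    # Quote each term and join with OR so a document needs only one term.
    match_expr = " OR ".join('"{}"'.format(t.replace('"', '""')) for t in terms)
    # SQLite's bm25() assigns numerically lower scores to better matches,
    # so ascending order puts the best hits first.
    return conn.execute(
        "SELECT id, bm25(corpus) FROM corpus "
        "WHERE corpus MATCH ? ORDER BY bm25(corpus) LIMIT ?",
        (match_expr, k),
    ).fetchall()


docs = [
    ("d1", "A lobster roll is a sandwich filled with lobster meat."),
    ("d2", "PostgreSQL is a relational database."),
    ("d3", "Rolling hills and lobsters."),
    ("d4", "DuckDB is an in-process analytical database."),
    ("d5", "Dense retrieval compares embedding vectors."),
]
conn = build_fts_index(docs)
hits = sparse_search(conn, ["lobster", "roll"])
print(hits)
conn.close()
```

Because of stemming, "Rolling" and "lobsters" in d3 match the query terms "roll" and "lobster", so both d1 and d3 are returned.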

For command-line usage, see the documentation.

Reproduce

For step-by-step reproduction of BEIR experiments, see these docs.

To reproduce all BEIR experiments, run the following command and find the results in logs:

bash ./scripts/beir/run.sh
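
BEIR results are commonly reported as nDCG@10. Assuming that is the metric you want to check in the output, the sketch below shows how it can be computed for a single query, using linear gains as in trec_eval. This is an illustration, not QuackIR's evaluation script; the function names and the sample relevance judgments are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain at rank i (1-based) is divided by log2(i + 1)."""
    return sum(gain / math.log2(i + 2) for i, gain in enumerate(gains))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document ids, best hit first.
    qrels: dict mapping document id to graded relevance (0 = not relevant).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal_gains = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal_gains)
    # A query with no relevant documents gets a score of 0.
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments and ranking for one query.
qrels = {"d1": 1, "d3": 1}
ranking = ["d1", "d2", "d3"]
score = ndcg_at_k(ranking, qrels)
print(round(score, 4))
```

Averaging this value over all queries of a dataset gives the dataset-level nDCG@10.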
