A Python toolkit for retrieval with relational database management systems.
QuackIR
QuackIR is a toolkit for reproducible information retrieval research with relational database management systems. Sparse retrieval is available with DuckDB, SQLite, and PostgreSQL; dense and hybrid retrieval are available with DuckDB and PostgreSQL. Text analysis with Porter stemming is provided by wrapping Pyserini's Lucene analyzer.
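QuackIR's own SQL is not reproduced here. As a rough illustration of what BM25-style sparse retrieval inside a relational engine looks like, the sketch below uses SQLite's FTS5 extension through Python's standard-library `sqlite3` module, assuming your SQLite build includes FTS5. FTS5's `porter` tokenizer applies Porter stemming, similar in spirit to the analysis described above. The helper names `to_or_query` and `bm25_search` are hypothetical; this is not QuackIR's implementation.

```python
import sqlite3


def to_or_query(text):
    """Quote each whitespace-separated term and join with OR, so a document
    only needs one query term to match (FTS5 otherwise ANDs bare terms)."""
    return " OR ".join('"' + term.replace('"', '""') + '"' for term in text.split())


def bm25_search(docs, query, k=10):
    """Rank (doc_id, contents) pairs for `query` with SQLite FTS5's bm25()."""
    match_expr = to_or_query(query)
    if not match_expr:
        return []
    conn = sqlite3.connect(":memory:")
    try:
        # unicode61 splits and lowercases text; porter then stems each token.
        conn.execute(
            "CREATE VIRTUAL TABLE corpus USING "
            "fts5(doc_id UNINDEXED, contents, tokenize='porter unicode61')"
        )
        conn.executemany("INSERT INTO corpus (doc_id, contents) VALUES (?, ?)", docs)
        # bm25() is more negative for better matches, so ascending order ranks best first.
        return conn.execute(
            "SELECT doc_id, bm25(corpus) AS score FROM corpus "
            "WHERE corpus MATCH ? ORDER BY score LIMIT ?",
            (match_expr, k),
        ).fetchall()
    finally:
        conn.close()


docs = [
    ("d1", "A lobster roll is a sandwich filled with lobster meat."),
    ("d2", "Rolling hills of Maine."),
    ("d3", "Database systems store relations."),
]
for doc_id, score in bm25_search(docs, "what is a lobster roll"):
    print(doc_id, round(score, 3))
```

Here `d2` is retrieved only because stemming reduces "Rolling" to "roll"; `d3` shares no terms with the query and is not returned.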
Installation
Clone Repository
```bash
git clone https://github.com/castorini/quackir.git --recurse-submodules
```
Install Dependencies
```bash
conda create -n quackir python=3.10
conda activate quackir
conda install -c conda-forge postgresql pgvector openjdk=21 maven -y
pip install -r requirements.txt
```
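Before initializing PostgreSQL, it can help to confirm that the executables used in the next steps are on your `PATH`. This is a minimal sketch using the standard library's `shutil.which`; the tool list is an assumption based on the conda packages installed above, and `missing_tools` is a hypothetical helper, not part of QuackIR.

```python
import shutil

# Executables expected from the conda packages above: PostgreSQL server and
# client tools, the Java runtime (openjdk), and Maven. pgvector is a
# PostgreSQL extension rather than an executable, so it is not checked here.
REQUIRED_TOOLS = ("initdb", "pg_ctl", "createdb", "psql", "java", "mvn")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH, in the given order."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("all tools found")
```

Run it inside the activated `quackir` environment; an empty result means every listed tool was found.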
Initialize PostgreSQL
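```bash
initdb -D mydb                      # create a database cluster in ./mydb
pg_ctl -D mydb -l logfile start &   # start the server, logging to ./logfile
createdb quackir                    # create the quackir database
psql quackir                        # open an interactive session on it
```
Then, at the `psql` prompt, create a `postgres` superuser role, enable the pgvector extension, and exit:
```sql
create user postgres superuser;
create extension vector;
\q
```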
Quick Start
To create a sparse index with DuckDB:
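```python
from quackir import IndexType
from quackir.index import DuckDBIndexer

table_name = "corpus"
corpus_file = "path/to/corpus"  # placeholder: replace with the path to your corpus file
index_type = IndexType.SPARSE

indexer = DuckDBIndexer()
indexer.init_table(table_name, index_type)   # create the table for a sparse index
indexer.load_table(table_name, corpus_file)  # load the documents from the corpus file
indexer.fts_index(table_name)                # build the full-text search index
indexer.close()
```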
To perform sparse retrieval with DuckDB:
```python
from quackir import SearchType
from quackir.search import DuckDBSearcher

table_name = "corpus"  # the table indexed in the previous step
query = "what is a lobster roll"
search_type = SearchType.SPARSE

searcher = DuckDBSearcher()
results = searcher.search(
    search_type, query_string=query, table_names=[table_name]
)
print(results)
searcher.close()
```
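The example above runs sparse search. How QuackIR fuses sparse and dense results for hybrid retrieval is not described here; one widely used fusion method is reciprocal rank fusion (RRF), sketched below in plain Python purely as an illustration. The function name `reciprocal_rank_fusion` is hypothetical, and `k = 60` is the constant commonly used with RRF, not a QuackIR setting.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list (best first).

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. Ties are broken by doc id for determinism.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


sparse_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d2", "d4", "d1"]
print(reciprocal_rank_fusion([sparse_ranking, dense_ranking]))
```

RRF uses only ranks, so it needs no normalization between BM25 scores and vector similarities; the trade-off is that score magnitudes are ignored entirely.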
For command-line usage, see the documentation.
Reproduce
For step-by-step reproduction of the BEIR experiments, see the documentation.
To reproduce all BEIR experiments, run the following command; the results can then be found in the logs:
```bash
bash ./scripts/beir/run.sh
```
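BEIR results are conventionally reported as nDCG@10; the exact metrics and log format produced by the script are not described here. As a minimal sketch of how that metric is computed, assuming a run stored as `query_id -> ranked doc ids` and judgments as `query_id -> {doc_id: relevance}` with linear gains (as in `trec_eval`'s `ndcg_cut`), the hypothetical helpers below score one query and average over all judged queries.

```python
import math


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query with linear gains.

    ranked_doc_ids: document ids in ranked order (best first).
    qrels: dict mapping doc_id -> graded relevance; unjudged ids count as 0.
    """
    # Discounted cumulative gain of the system ranking, truncated at k.
    dcg = sum(
        max(qrels.get(doc_id, 0), 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_doc_ids[:k])
    )
    # Ideal DCG: the best possible ordering of the judged relevant documents.
    ideal = sorted((rel for rel in qrels.values() if rel > 0), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def mean_ndcg_at_k(run, qrels_by_query, k=10):
    """Average nDCG@k over every judged query; a query with no results scores 0."""
    scores = [ndcg_at_k(run.get(qid, []), qrels, k) for qid, qrels in qrels_by_query.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

Averaging conventions differ between tools (for example, whether judged queries missing from the run count as zero), so small discrepancies against reported numbers may come from that choice rather than from retrieval.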
Download files
Source distribution: quackir-0.0.1.tar.gz
Built distribution: quackir-0.0.1-py3-none-any.whl
File details
Details for the file quackir-0.0.1.tar.gz.
File metadata
- Download URL: quackir-0.0.1.tar.gz
- Upload date:
- Size: 22.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9f0d787fb9acc5984526c87a3b8e044867821a359dde70e0bbb829c75c6b2670 |
| MD5 | 959afce4d8225a6fe824ceccf3367497 |
| BLAKE2b-256 | a9507640e3f33f17baca284aafe177adcecf1fc9fc3b8edee9599682104910c9 |
File details
Details for the file quackir-0.0.1-py3-none-any.whl.
File metadata
- Download URL: quackir-0.0.1-py3-none-any.whl
- Upload date:
- Size: 39.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ad9ee212e053eff1fb8f14148bcaca7a9c519d617eba780e4b8b76a0c58ca07a |
| MD5 | f639aca65cd56f304ff5e4552ac51bc3 |
| BLAKE2b-256 | 6868e72b8a56108107387051964bfd233d9afedbd66dc90dedd4faa126f27da3 |
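To check a downloaded file against the SHA256 digests listed above, a minimal sketch using Python's standard-library `hashlib` could look like this. The helper names `sha256_of` and `verify_download` are hypothetical and not part of QuackIR; the expected digests are copied from the file details for release 0.0.1.

```python
import hashlib
import os

# SHA256 digests copied from the file details for release 0.0.1.
EXPECTED_SHA256 = {
    "quackir-0.0.1.tar.gz": "9f0d787fb9acc5984526c87a3b8e044867821a359dde70e0bbb829c75c6b2670",
    "quackir-0.0.1-py3-none-any.whl": "ad9ee212e053eff1fb8f14148bcaca7a9c519d617eba780e4b8b76a0c58ca07a",
}


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True only if the file name is known and its SHA256 matches."""
    expected = EXPECTED_SHA256.get(os.path.basename(path))
    return expected is not None and sha256_of(path) == expected
```

For example, `verify_download("quackir-0.0.1.tar.gz")` returns `True` only for an unmodified copy of the source distribution. For installs, pip's hash-checking mode (`--require-hashes`) performs a similar check automatically.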