License: MIT

bsparse

bsparse is a toolkit for creating and searching learned sparse representations.

Usage examples

# Recommended: install uv first
# (plain pip also works, but uv is much faster)
pipx install uv
# Create virtual environment
uv venv venv
# Activate
source venv/bin/activate
# Install requirements
uv pip install -r requirements.txt
# Request access to splade-v3: https://huggingface.co/naver/splade-v3
# Get your huggingface API token and then:
export HF_TOKEN="the token"

# activate the Python virtual environment (if it is not already active)
source venv/bin/activate

# optional: spot check output from a model
python -m bsparse.cli check --text "tesla net worth"

# create query representations:
python -m bsparse.cli encode --out nfcorpus-queries.jsonl \
  --dataset irds --type query --name beir/nfcorpus --batch-size 64

# create doc representations:
python -m bsparse.cli encode --out nfcorpus-docs.jsonl \
  --dataset irds --type doc --name beir/nfcorpus --batch-size 64

# search and evaluate without building an index:
python -m bsparse.cli memsearch --out nfcorpus.run \
  --docs nfcorpus-docs.jsonl --queries nfcorpus-queries.jsonl \
  --qrels beir/nfcorpus/test
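
To make the idea of searching without an index more concrete, here is a minimal sketch of in-memory sparse retrieval: documents and queries are term-to-weight maps, and a document's score is the dot product over shared terms. This is not bsparse's implementation. The JSONL field names (`id`, `vector`), the helper names, and the `sketch` run tag are assumptions made for illustration; the actual schema written by `encode` may differ.

```python
import json
from collections import defaultdict


def load_vectors(lines):
    """Parse JSONL lines into {id: {term: weight}}.

    Assumes each record has "id" and "vector" fields (hypothetical schema).
    """
    vectors = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        vectors[record["id"]] = record["vector"]
    return vectors


def build_inverted_index(doc_vectors):
    """Map each term to the list of (doc_id, weight) pairs that contain it."""
    index = defaultdict(list)
    for doc_id, vector in doc_vectors.items():
        for term, weight in vector.items():
            index[term].append((doc_id, weight))
    return index


def search(index, query_vector, topk=10):
    """Score documents by sparse dot product and return the top-k (doc_id, score)."""
    scores = defaultdict(float)
    for term, q_weight in query_vector.items():
        for doc_id, d_weight in index.get(term, ()):
            scores[doc_id] += q_weight * d_weight
    # Sort by descending score, breaking ties by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:topk]


def format_run(query_id, ranked, tag="sketch"):
    """Render a ranking as TREC run lines: qid Q0 docid rank score tag."""
    return [
        f"{query_id} Q0 {doc_id} {rank} {score:.4f} {tag}"
        for rank, (doc_id, score) in enumerate(ranked, start=1)
    ]


if __name__ == "__main__":
    docs = load_vectors([
        '{"id": "d1", "vector": {"tesla": 2.0, "worth": 1.0}}',
        '{"id": "d2", "vector": {"tesla": 0.5, "car": 3.0}}',
    ])
    ranked = search(build_inverted_index(docs), {"tesla": 1.0, "worth": 2.0})
    print("\n".join(format_run("q1", ranked)))
```

Holding everything in memory like this is practical for a small collection such as nfcorpus; for larger collections, an on-disk index is the usual choice.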


# alternatively, you can build an index and search it

# 1) setup: compile ScaledJsonVectorCollection.java and add it to a copy of anserini-1.0.0-fatjar.jar
wget -c https://repo1.maven.org/maven2/io/anserini/anserini/1.0.0/anserini-1.0.0-fatjar.jar
cd java
javac -cp ../anserini-1.0.0-fatjar.jar io/anserini/collection/*.java
cp ../anserini-1.0.0-fatjar.jar ../anserini-1.0.0-fatjar-bsparse.jar
jar uf ../anserini-1.0.0-fatjar-bsparse.jar io/anserini/collection/*.class
# return to the directory that contains the jars
cd ..

# 2) build index
java -cp anserini-1.0.0-fatjar-bsparse.jar io.anserini.index.IndexCollection \
  -generator DefaultLuceneDocumentGenerator -impact -pretokenized \
  -threads 16 -collection ScaledJsonVectorCollection \
  -input /path/to/encoded-text -index /path/to/encoded-text-index

# 3) search index
# Set $QUERY_VECTORS to the file of sparse query representations and $INDEX to the index directory, then:
python -m bsparse.cli search --index "$INDEX" --queries "$QUERY_VECTORS" --out test.run --topk 1000
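
Impact indexes store integer term weights, so floating-point weights from a learned sparse model are typically scaled and rounded before indexing. The name ScaledJsonVectorCollection suggests it plays a role in that step, but that is a guess; the class's behavior is not described here. The sketch below only illustrates the general idea. The `quantize` helper and the scale factor of 100 are assumptions, not values taken from bsparse.

```python
def quantize(vector, scale=100):
    """Scale float weights to positive integer impacts.

    Terms whose scaled weight rounds to zero (or below) are dropped,
    since they would contribute nothing to an impact score.
    The default scale of 100 is an illustrative assumption.
    """
    quantized = {}
    for term, weight in vector.items():
        impact = int(round(weight * scale))
        if impact > 0:
            quantized[term] = impact
    return quantized


if __name__ == "__main__":
    print(quantize({"tesla": 1.234, "net": 0.004, "worth": 0.5}))
```

A larger scale preserves more of the original weight resolution at the cost of larger stored integers.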

