LSHRS


Redis-backed locality-sensitive hashing toolkit that stores bucket membership in Redis while keeping the heavy vector payloads in your primary datastore.

Overview

LSHRS orchestrates the full locality-sensitive hashing (LSH) workflow:

  1. Hash incoming vectors into stable banded signatures via random projections.
  2. Store only bucket membership in Redis for low-latency candidate enumeration.
  3. Optionally rerank candidates using cosine similarity with vectors fetched from your system of record.

The out-of-the-box configuration chooses bands/rows automatically, pipelines Redis operations, and exposes hooks for streaming data ingestion, persistence, and operational maintenance.
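The signature step can be sketched in plain NumPy (a minimal illustration of banded random-projection hashing; LSHHasher's actual implementation may differ):

```python
import numpy as np

def banded_signature(vec, projections, num_bands):
    """Hash a vector into per-band bucket keys via random hyperplanes.

    `projections` is a (num_perm, dim) matrix; the sign of each projection
    contributes one bit, and the bits are split into `num_bands` bands.
    """
    bits = (projections @ vec > 0).astype(np.uint8)  # num_perm sign bits
    bands = bits.reshape(num_bands, -1)              # num_bands x rows_per_band
    # Each band's bit pattern becomes a hashable bucket key.
    return [hash(band.tobytes()) for band in bands]

rng = np.random.default_rng(0)
proj = rng.standard_normal((16, 8))  # 16 hyperplanes for dim-8 vectors
v = rng.standard_normal(8)
sig = banded_signature(v, proj, num_bands=4)
# A near-duplicate vector will usually collide with `sig` in most bands:
sig_close = banded_signature(v + 1e-6, proj, num_bands=4)
```

Two vectors are candidates for each other whenever any one of their band keys matches.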

Architecture Snapshot

| Concern | Component | Description |
| --- | --- | --- |
| Hashing | LSHHasher | Generates banded random-projection signatures. |
| Storage | RedisStorage | Persists bucket membership using Redis sets and pipelines for batch writes. |
| Ingestion | LSHRS.create_signatures() | Streams vectors from PostgreSQL or Parquet via pluggable loaders. |
| Reranking | top_k_cosine() | Computes cosine similarity for candidate reranking. |
| Configuration | get_optimal_config() | Picks band/row counts that match a target similarity threshold. |
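As a rough illustration of the storage pattern (the real RedisStorage key layout is internal and may differ), each band signature can become a namespaced Redis set key, and candidate lookup is then a union over the query's band buckets:

```python
# Hypothetical key scheme -- illustrative only, not lshrs's actual layout.
def bucket_key(prefix: str, band: int, band_hash: int) -> str:
    # Mask to 32 bits so keys stay short and stable across platforms.
    return f"{prefix}:band:{band}:{band_hash & 0xFFFFFFFF:08x}"

# A vector whose band signatures hash to these values would be SADD-ed to:
keys = [bucket_key("demo", b, h) for b, h in enumerate([0x1A2B3C4D, 0xDEADBEEF])]
# At query time, candidates come from the SUNION of the query's own band keys.
```

Keeping only ids in these sets is what keeps the Redis footprint small; the full vectors never leave the primary datastore.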

Installation

PyPI

pip install lshrs

Or, with PostgreSQL support:

pip install 'lshrs[postgres]'

Or with Parquet ingestion support:

pip install 'lshrs[parquet]'

From source checkout

git clone https://github.com/mxngjxa/lshrs.git
cd lshrs
uv sync --dev

[!NOTE] The project requires Python >= 3.10 as defined in pyproject.toml.

Optional extras

  • PostgreSQL streaming requires psycopg. Install with pip install 'lshrs[postgres]'.
  • Parquet ingestion requires pyarrow. Install with pip install 'lshrs[parquet]'.

Quick Start

import numpy as np
from lshrs import LSHRS

def fetch_vectors(indices: list[int]) -> np.ndarray:
    # Replace with your vector store retrieval (PostgreSQL, disk, object store, etc.)
    embeddings = np.load("vectors.npy")
    return embeddings[indices]

lsh = LSHRS(
    dim=768,
    num_perm=256,
    redis_host="localhost",
    redis_prefix="demo",
    vector_fetch_fn=fetch_vectors,
)

# Stream index construction from PostgreSQL
lsh.create_signatures(
    format="postgres",
    dsn="postgresql://user:pass@localhost/db",
    table="documents",
    index_column="doc_id",
    vector_column="embedding",
)

# Insert an ad-hoc document
lsh.ingest(42, np.random.randn(768).astype(np.float32))

# Retrieve candidates
query = np.random.randn(768).astype(np.float32)
top10 = lsh.get_top_k(query, topk=10)
reranked = lsh.get_above_p(query, p=0.2)

The code above exercises LSHRS.create_signatures(), LSHRS.ingest(), LSHRS.get_top_k(), and LSHRS.get_above_p().

Ingestion Pipelines

Streaming from PostgreSQL

iter_postgres_vectors() yields (indices, vectors) batches using server-side cursors:

lsh.create_signatures(
    format="postgres",
    dsn="postgresql://reader:secret@analytics.db/search",
    table="embeddings",
    index_column="item_id",
    vector_column="embedding",
    batch_size=5_000,
    where_clause="updated_at >= NOW() - INTERVAL '1 day'",
)

[!TIP] Provide a custom connection_factory if you need pooled connections or TLS configuration.

Streaming from Parquet

iter_parquet_vectors() supports memory-friendly batch loads from Parquet files:

for ids, batch in iter_parquet_vectors(
    "captures/2024-01-embeddings.parquet",
    index_column="document_id",
    vector_column="embedding",
    batch_size=8_192,
):
    lsh.index(ids, batch)

[!IMPORTANT] Install pyarrow prior to using the Parquet loader; otherwise iter_parquet_vectors() raises ImportError.

Manual or Buffered Ingestion

Querying Modes

LSHRS.query() provides two complementary retrieval patterns:

| Mode | When to use | Result |
| --- | --- | --- |
| Top-k (top_p=None) | Latency-critical scenarios that only require coarse candidates. | List[int] ordered by band collisions. |
| Top-p (top_p=0.0–1.0) | Precision-sensitive flows that can rerank using original vectors. | List[Tuple[int, float]] of (index, cosine_similarity) pairs. |

[!CAUTION] Reranking requires configuring vector_fetch_fn when instantiating LSHRS; otherwise top-p queries raise RuntimeError.
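The reranking path can be mimicked with a small NumPy sketch; `rerank_cosine` and the toy fetch function below are illustrative stand-ins for the library's reranking step, with `fetch_fn` playing the role of vector_fetch_fn:

```python
import numpy as np

def rerank_cosine(query, candidate_ids, fetch_fn, top_p=0.0):
    """Rerank LSH candidates by cosine similarity, keeping scores >= top_p."""
    vecs = fetch_fn(candidate_ids)                               # (n, dim)
    q = query / np.linalg.norm(query)
    v = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = v @ q
    pairs = [(i, float(s)) for i, s in zip(candidate_ids, sims) if s >= top_p]
    return sorted(pairs, key=lambda p: p[1], reverse=True)

# Toy in-memory "system of record":
store = {1: np.array([1.0, 0.0]), 2: np.array([0.0, 1.0]), 3: np.array([1.0, 1.0])}
fetch = lambda ids: np.stack([store[i] for i in ids])
result = rerank_cosine(np.array([1.0, 0.0]), [1, 2, 3], fetch, top_p=0.2)
# Candidate 2 (orthogonal to the query) is filtered out by the threshold.
```

This is why top-p queries need vector_fetch_fn: the full vectors must be retrievable to compute exact cosine scores.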

Persistence & Lifecycle

| Operation | Purpose | Reference |
| --- | --- | --- |
| Snapshot configuration | Inspect runtime parameters and Redis namespace. | LSHRS.stats() |
| Flush & clear | Remove all Redis buckets for the configured prefix. | LSHRS.clear() |
| Hard delete members | Remove specific indices across all buckets. | LSHRS.delete() |
| Persist projections | Save configuration and projection matrices to disk. | LSHRS.save_to_disk() |
| Restore projections | Rebuild an instance using saved matrices. | LSHRS.load_from_disk() |

[!WARNING] LSHRS.clear() is irreversible—every key with the configured prefix is deleted. Back up state with LSHRS.save_to_disk() beforehand if you need to rebuild.
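Conceptually, persisting an index boils down to saving the configuration plus the projection matrix; the actual on-disk format is defined by save_to_disk()/load_from_disk(), so the sketch below (save_state/load_state are hypothetical names) only illustrates the idea:

```python
import json
import tempfile
from pathlib import Path
import numpy as np

def save_state(path, config, projections):
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    (path / "config.json").write_text(json.dumps(config))   # runtime parameters
    np.save(path / "projections.npy", projections)          # random hyperplanes

def load_state(path):
    path = Path(path)
    config = json.loads((path / "config.json").read_text())
    projections = np.load(path / "projections.npy")
    return config, projections

with tempfile.TemporaryDirectory() as d:
    proj = np.random.default_rng(1).standard_normal((256, 768))
    save_state(d, {"dim": 768, "num_perm": 256}, proj)
    cfg, restored = load_state(d)
```

The projections must round-trip exactly: an instance rebuilt with different hyperplanes would hash the same vectors into different buckets.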

Performance & Scaling Guidelines

  • Choose sensible hash parameters: get_optimal_config() finds bands/rows that approximate your target similarity threshold. Inspect S-curve behavior with compute_collision_probability().
  • Normalize inputs: Pre-normalize vectors or rely on l2_norm() for consistent cosine scores.
  • Batch ingestion: When indexing large volumes, route operations through LSHRS.index() to let RedisStorage.batch_add() coalesce writes.
  • Monitor bucket sizes: Large buckets indicate low selectivity. Adjust num_perm, num_bands, or the similarity threshold to trade precision vs. recall.
  • Pipeline warmup: Flush outstanding operations before measuring latency or persisting state; LSHRS._flush_buffer() is invoked internally by the indexing paths.
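The band/row trade-off follows the standard LSH S-curve, where s is the per-row collision probability (for random hyperplanes, 1 − θ/π for vectors at angle θ). A pure-Python sketch of the relationship that get_optimal_config() and compute_collision_probability() reason about:

```python
def collision_probability(s: float, bands: int, rows: int) -> float:
    """Probability that two items with per-row collision probability s
    share at least one of `bands` bands of `rows` rows each."""
    return 1.0 - (1.0 - s ** rows) ** bands

# With num_perm = 256 split into 32 bands x 8 rows, the curve has a sharp
# threshold near s ~ (1/bands)**(1/rows) ~ 0.65: dissimilar pairs almost
# never collide, similar pairs almost always do.
p_low = collision_probability(0.4, bands=32, rows=8)    # well below threshold
p_high = collision_probability(0.9, bands=32, rows=8)   # well above threshold
```

Raising rows steepens the curve (fewer false positives); raising bands shifts it left (fewer false negatives) at the cost of more Redis keys per item.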

Troubleshooting

| Symptom | Likely Cause | Resolution |
| --- | --- | --- |
| ImportError: psycopg is required | PostgreSQL loader invoked without the optional dependency. | Install psycopg[binary] or avoid format="postgres". |
| ValueError: Vectors must have shape (n, dim) | Supplied batch dimension mismatched the configured dim. | Ensure all vectors match the dim passed to LSHRS.__init__(). |
| ValueError: Cannot normalize zero vector | Zero-length vectors were passed to cosine scoring utilities. | Filter zero vectors before reranking or normalize upstream. |
| Empty search results | Buckets never flushed to Redis. | Call LSHRS.index() (auto-flushes) or explicitly invoke LSHRS._flush_buffer() before querying. |
| Extremely large buckets | Similarity threshold too low / insufficient hash bits. | Increase num_perm or tweak the target threshold via get_optimal_config(). |

[!TIP] Use Redis SCAN commands (e.g., SCAN 0 MATCH lsh:*) to inspect bucket distribution during tuning.

API Surface Summary

| Area | Description | Primary Entry Point |
| --- | --- | --- |
| Ingestion orchestration | Bulk streaming with source-aware loaders. | LSHRS.create_signatures() |
| Batch ingestion | Hash and store vectors already in memory. | LSHRS.index() |
| Single ingestion | Add or update one vector id on the fly. | LSHRS.ingest() |
| Candidate enumeration | General-purpose search with optional reranking. | LSHRS.query() |
| Hash persistence | Save and restore LSH projection matrices. | LSHRS.save_to_disk() / LSHRS.load_from_disk() |
| Redis maintenance | Prefix-aware key deletion and batch removal. | RedisStorage.clear() / RedisStorage.remove_indices() |
| Probability utilities | Analyze band/row trade-offs and false rates. | compute_collision_probability() / compute_false_rates() |

Development & Testing

  1. Clone and install development dependencies:

    git clone https://github.com/mxngjxa/lshrs.git
    cd lshrs
    uv sync --dev
    
  2. Run the test suite:

    uv run pytest
    
  3. Lint and format check:

    uv run ruff check .
    uv run ruff format --check .
    

[!NOTE] Example snippets in this README are intended to be run under Python >= 3.10 with NumPy >= 1.24 and Redis >= 7 as specified in pyproject.toml.

License

Licensed under the MIT License; see LICENSE for the full terms.
