High-performance BM25 + HNSW vector search using category theory, written in Rust

Vajra Search Engine (vajra-search)

Rust-backed search framework with a Python interface for:

  • lexical BM25 search,
  • vector ANN search (HNSW),
  • hybrid BM25 + vector fusion.

The package ships with a compiled Rust extension (vajra_search._vajra_search) and Python orchestration layers for embeddings, vector indexing, and hybrid fusion.

Installation

Base install:

pip install vajra-search
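
To smoke-test the compiled extension mentioned above, a minimal check (the __version__ attribute is an assumption, not documented here):

# Importing the native module raises ImportError if no matching
# wheel exists for this platform/Python version.
import vajra_search._vajra_search  # noqa: F401

import vajra_search

# Assumption: the package exposes __version__ like most distributions.
print(getattr(vajra_search, "__version__", "unknown"))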

Optional embedding-model dependency (for TextEmbeddingMorphism):

pip install "vajra-search[vector]"

Search Modes

1) Lexical Search (BM25)

from vajra_search import Document, DocumentCorpus, VajraSearch

docs = [
    Document("1", "Rust for Search", "Rust enables predictable low-level performance."),
    Document("2", "BM25 Overview", "BM25 is a lexical ranking algorithm for keyword search."),
    Document("3", "Hybrid Retrieval", "Hybrid retrieval combines lexical and vector signals."),
]

corpus = DocumentCorpus(docs)
engine = VajraSearch(corpus, k1=1.5, b=0.75)

results = engine.search("bm25 keyword ranking", top_k=3)
for r in results:
    # BM25 rank from the Rust layer is zero-based; display as one-based.
    print(f"rank={r.rank + 1} id={r.doc_id} score={r.score:.4f} title={r.title}")

2) Vector Search (HNSW)

The example below uses a tiny deterministic embedder so it runs without external model downloads.

from typing import List

import numpy as np

from vajra_search import (
    Document,
    NativeHNSWIndex,
    VajraVectorSearch,
)
from vajra_search.embeddings import EmbeddingMorphism


class TinyEmbedding(EmbeddingMorphism[str]):
    """Very small keyword-count embedder for demos/tests."""

    VOCAB = ("rust", "search", "bm25", "vector")

    @property
    def dimension(self) -> int:
        return len(self.VOCAB)

    def embed(self, text: str) -> np.ndarray:
        t = text.lower()
        vec = np.array([t.count(tok) for tok in self.VOCAB], dtype=np.float32)
        norm = np.linalg.norm(vec)
        return vec / norm if norm > 0 else vec

    def embed_batch(self, texts: List[str]) -> np.ndarray:
        return np.vstack([self.embed(t) for t in texts]).astype(np.float32)


docs = [
    Document("1", "Rust Search", "Rust vector search with HNSW."),
    Document("2", "Lexical BM25", "BM25 is strong for exact keyword matching."),
    Document("3", "Vector Retrieval", "Vector search captures semantic similarity."),
]

embedder = TinyEmbedding()
index = NativeHNSWIndex(dimension=embedder.dimension, metric="cosine", max_elements=100)
vsearch = VajraVectorSearch(embedder, index)
vsearch.index_documents(docs, show_progress=False)

results = vsearch.search("vector search in rust", top_k=3)
for r in results:
    print(f"rank={r.rank} id={r.id} score={r.score:.4f} title={r.document.title}")

3) Hybrid Search (BM25 + Vector)

from typing import List

import numpy as np

from vajra_search import (
    Document,
    DocumentCorpus,
    HybridSearchEngine,
    NativeHNSWIndex,
    VajraSearch,
    VajraVectorSearch,
)
from vajra_search.embeddings import EmbeddingMorphism


class TinyEmbedding(EmbeddingMorphism[str]):
    VOCAB = ("rust", "search", "bm25", "vector")

    @property
    def dimension(self) -> int:
        return len(self.VOCAB)

    def embed(self, text: str) -> np.ndarray:
        t = text.lower()
        vec = np.array([t.count(tok) for tok in self.VOCAB], dtype=np.float32)
        norm = np.linalg.norm(vec)
        return vec / norm if norm > 0 else vec

    def embed_batch(self, texts: List[str]) -> np.ndarray:
        return np.vstack([self.embed(t) for t in texts]).astype(np.float32)


docs = [
    Document("1", "Rust HNSW", "Rust implementation of HNSW vector search."),
    Document("2", "BM25 Fundamentals", "BM25 ranks documents by lexical relevance."),
    Document("3", "Hybrid Ranking", "Hybrid ranking combines BM25 and vector signals."),
]

corpus = DocumentCorpus(docs)
bm25 = VajraSearch(corpus)

embedder = TinyEmbedding()
index = NativeHNSWIndex(dimension=embedder.dimension, metric="cosine", max_elements=100)
vector = VajraVectorSearch(embedder, index)
vector.index_documents(docs, show_progress=False)

hybrid = HybridSearchEngine(bm25, vector, alpha=0.5, method="rrf")
results = hybrid.search("rust vector search ranking", top_k=3)
for r in results:
    print(f"rank={r.rank} id={r.id} score={r.score:.4f} title={r.document.title}")

Runnable Examples

These scripts are included in the repository:

  • examples/lexical_search.py
  • examples/vector_search.py
  • examples/hybrid_search.py

Run:

python examples/lexical_search.py
python examples/vector_search.py
python examples/hybrid_search.py

API Surface (Python)

Main exports:

  • BM25: Document, DocumentCorpus, BM25Params, VajraSearch, VajraSearchParallel (batch usage sketched after this list)
  • HNSW: HnswIndex (raw Rust binding), NativeHNSWIndex (Python wrapper)
  • Vector layer: VajraVectorSearch, VectorSearchResult
  • Hybrid layer: HybridSearchEngine
  • Embeddings: TextEmbeddingMorphism, PrecomputedEmbeddingMorphism, IdentityEmbeddingMorphism
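
A sketch of the batch path referenced in the BM25 bullet above. This assumes VajraSearchParallel mirrors the VajraSearch constructor and that search_batch (named in the benchmark notes below) accepts a list of query strings and returns one result list per query:

from vajra_search import Document, DocumentCorpus, VajraSearchParallel

docs = [
    Document("1", "Rust for Search", "Rust enables predictable low-level performance."),
    Document("2", "BM25 Overview", "BM25 is a lexical ranking algorithm."),
]
corpus = DocumentCorpus(docs)

# Assumption: the parallel engine takes the same arguments as VajraSearch.
engine = VajraSearchParallel(corpus)

# Assumption: search_batch returns a list of per-query result lists.
batches = engine.search_batch(["rust performance", "bm25 ranking"], top_k=2)
for query_results in batches:
    for r in query_results:
        print(r.doc_id, f"{r.score:.4f}")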

Persistence

  • Vector index persistence is exposed via the following methods (see the sketch after this list):
    • NativeHNSWIndex.save(path)
    • NativeHNSWIndex.load(path)
    • VajraVectorSearch.save(path) / VajraVectorSearch.load(path, embedder, index_class)
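
A minimal persistence sketch, continuing the vector-search example above (index, vsearch, and embedder as built there); the paths are illustrative, and treating load as a classmethod is an assumption:

from vajra_search import NativeHNSWIndex, VajraVectorSearch

# Persist and restore the raw HNSW index.
index.save("vajra_index.hnsw")
restored_index = NativeHNSWIndex.load("vajra_index.hnsw")  # assumed classmethod

# Persist and restore the full vector-search layer; the embedder and
# index class are supplied again on load, matching the signature above.
vsearch.save("vajra_vsearch")
restored = VajraVectorSearch.load("vajra_vsearch", embedder, NativeHNSWIndex)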

Reproducibility and Benchmarks

  • Repro steps: reproduction.md
  • Benchmark harness and datasets are documented in the companion benchmark repos referenced from the project documentation.

Benchmark Snapshot (Python Interface)

Measured on 2026-03-02 on Darwin arm64 (Python 3.13.7) with:

  • query protocol: 10 warmup + 100 measured queries (top_k=10)
  • corpus: deterministic synthetic topic-keyword documents with mixed selectivity queries (broad + selective)
  • modes benchmarked through the Python API (VajraSearch, VajraSearchParallel, VajraVectorSearch, HybridSearchEngine)
  • lexical_parallel measures per-query latency from batched search_batch execution
  • vector numbers use a tiny deterministic embedder, so they represent index-path latency (not transformer inference latency)

Size    Mode              Build (s)  p50 (ms)  p95 (ms)  p99 (ms)      QPS
1,000   lexical               0.006     0.019     0.113     0.114  25799.8
1,000   lexical_parallel      0.004     0.020     0.024     0.024  49387.8
1,000   vector                0.055     0.016     0.019     0.019  62366.4
1,000   hybrid                0.060     0.057     0.158     0.223  11873.4
10,000  lexical               0.048     0.244     1.410     1.865   2018.7
10,000  lexical_parallel      0.044     0.157     0.171     0.171   6664.5
10,000  vector                0.508     0.017     0.018     0.019  61373.2
10,000  hybrid                0.566     0.279     1.651     1.740   2039.0
20,000  lexical               0.091     0.517     4.436     6.013    759.2
20,000  lexical_parallel      0.089     0.281     0.331     0.331   3512.9
20,000  vector                1.005     0.017     0.039     0.060  47738.4
20,000  hybrid                1.192     0.659     4.042     4.402    785.4
50,000  lexical               0.238     1.859    12.796    13.658    237.9
50,000  lexical_parallel      0.276     1.115     1.257     1.257    882.2
50,000  vector                2.625     0.017     0.026     0.035  55942.5
50,000  hybrid                2.815     1.837    12.550    13.674    237.9
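
Two patterns stand out: vector p50 latency stays essentially flat (~0.017 ms) as the corpus grows, consistent with HNSW's roughly logarithmic search cost, while lexical latency scales with corpus size; and lexical_parallel cuts p50 and delivers roughly 2-4x the QPS of single-query lexical search by batching queries.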

Re-run this benchmark:

./.venv/bin/python scripts/benchmark_python_modes.py --sizes 1000 10000 20000 50000

Raw outputs are written to:

  • scripts/benchmark_python_modes_latest.json
  • scripts/benchmark_python_modes_latest.md

Wikipedia Vector Benchmark Snapshot (Companion Harness)

For production-style vector benchmarking against Wikipedia embeddings (1k/10k/20k/50k) and ZVec comparison, use the companion harness documented in reproduction.md.

One 50k snapshot from that track:

Engine/Profile   Build (s)  p50 (ms)     QPS  Recall@10
ZVec                 2.481     0.774  1305.2      1.000
Vajra quality       81.343     0.235  4241.8      0.998
Vajra fast          28.512     0.184  5325.9      0.948
Vajra instant        6.777     0.113  8624.7      0.706
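
The three Vajra profiles trade build time for recall: quality spends ~81 s on index construction to reach 0.998 recall@10, while instant builds in under 7 s but drops to 0.706, so the right profile depends on whether indexing cost or result fidelity dominates the workload.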

Release Checks

Before publishing to PyPI:

./scripts/release_check.sh

This validates:

  • Python tests
  • coverage threshold (>=80%)
  • Rust workspace tests (with pinned PyO3 interpreter)
  • wheel build + clean-venv install/import smoke test

Release Process

  • Tag push (v*) builds cross-platform wheels/sdist and creates a GitHub Release.
  • PyPI upload is a separate manual action via GitHub workflow (Publish PyPI), using the chosen tag.
  • Full runbook: RELEASING.md

License

MIT
