A tiny, SQLite-backed search library for small, local projects

sqlitesearch

A tiny, SQLite-backed search library for small-scale projects with up to 100,000 documents. It provides text search, vector search, and hybrid search - all stored in a single .db file with zero infrastructure.

sqlitesearch is a persistent sibling of minsearch - same API, but stores data on disk.

Installation

uv add sqlitesearch

Text Search

Text search uses SQLite's FTS5 (Full-Text Search) extension with BM25 ranking.
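For orientation, the underlying SQLite feature can be tried directly with the standard-library sqlite3 module. This is a minimal sketch, not sqlitesearch's internal schema: the table name docs and its columns are made up for illustration, and it assumes your Python's SQLite build includes FTS5 (the common case).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical FTS5 table; sqlitesearch's real schema may differ
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, description)")
conn.executemany(
    "INSERT INTO docs (title, description) VALUES (?, ?)",
    [
        ("Python Tutorial", "Learn Python basics"),
        ("Java Guide", "Java programming guide"),
        ("Advanced Python", "Deep dive into Python"),
    ],
)

# bm25() returns lower (more negative) values for better matches,
# so ascending order puts the best match first
rows = conn.execute(
    "SELECT title, bm25(docs) FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)",
    ("python",),
).fetchall()

matched_titles = [title for title, _score in rows]
for title, score in rows:
    print(title, score)
```

The default FTS5 tokenizer is case-insensitive, so the query "python" matches "Python" in both the title and description columns.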

Basic Usage

from sqlitesearch import TextSearchIndex

# Create an index
index = TextSearchIndex(
    text_fields=["title", "description"],
    keyword_fields=["category"],
    db_path="search.db"
)

# Index documents in bulk
documents = [
    {"id": 1, "title": "Python Tutorial", "description": "Learn Python basics", "category": "tutorial"},
    {"id": 2, "title": "Java Guide", "description": "Java programming guide", "category": "guide"},
]
index.fit(documents)

# Or add one at a time
index.add({"id": 3, "title": "Advanced Python", "description": "Deep dive into Python", "category": "tutorial"})

# Search
results = index.search("python programming")
for result in results:
    print(result["title"], result["score"])

Filtering

# Filter by keyword fields
results = index.search("python", filter_dict={"category": "tutorial"})

# Filter by numeric range
results = index.search("python", filter_dict={"price": [('>=', 50), ('<', 200)]})

# Exact numeric match
results = index.search("python", filter_dict={"price": 100})

# Filter by date range
from datetime import date
results = index.search("python", filter_dict={
    "created_at": [('>=', date(2024, 1, 1)), ('<', date(2024, 12, 31))]
})
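
The filter shapes above (a scalar for an exact match, a list of (operator, value) pairs for a range) can be read as a predicate over each document. The sketch below only illustrates those semantics in plain Python; it is not how sqlitesearch evaluates filters. The matches helper is hypothetical, only '>=' and '<' appear in the examples above (the other operators are assumed), and treating a missing field as a non-match is also an assumption.

```python
import operator
from datetime import date

# Comparison operators for range filters (an assumed set)
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}

def matches(doc, filter_dict):
    """Return True if doc satisfies every condition in filter_dict."""
    for field, condition in filter_dict.items():
        value = doc.get(field)
        if value is None:
            return False
        if isinstance(condition, list):
            # Range filter: all (operator, bound) pairs must hold
            if not all(OPS[op](value, bound) for op, bound in condition):
                return False
        elif value != condition:
            # Scalar filter: exact match
            return False
    return True

doc = {"category": "tutorial", "price": 100, "created_at": date(2024, 6, 1)}
in_price_range = matches(doc, {"price": [(">=", 50), ("<", 200)]})
in_2025 = matches(doc, {"created_at": [(">=", date(2025, 1, 1))]})
print(in_price_range, in_2025)
```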

Field Boosting

# Boost title matches higher than description
results = index.search("python", boost_dict={"title": 2.0, "description": 1.0})
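
One way to express this kind of boosting in raw FTS5 is to pass per-column weights to bm25(). Whether sqlitesearch maps boost_dict onto these weights internally is an assumption; the sketch below, with a made-up docs table, only shows the SQLite mechanism and assumes FTS5 is available in your SQLite build.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, description)")
conn.executemany(
    "INSERT INTO docs (title, description) VALUES (?, ?)",
    [
        ("Python Tutorial", "Learn the basics"),      # match in title
        ("Beginner Guide", "Learn Python basics"),    # match in description
        ("Java Guide", "Java programming guide"),
        ("Rust Handbook", "Systems programming notes"),
        ("Go Primer", "Concurrency made simple"),
    ],
)

# Weights follow column order: title=2.0, description=1.0
rows = conn.execute(
    "SELECT title FROM docs WHERE docs MATCH ? ORDER BY bm25(docs, 2.0, 1.0)",
    ("python",),
).fetchall()
boosted_order = [title for (title,) in rows]
print(boosted_order)
```

Both matching rows have the same length and one occurrence of the term, so the title weight alone decides which one ranks first.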

Tokenizer & Stemming

sqlitesearch uses a Tokenizer class for query processing (same interface as minsearch.Tokenizer). By default, English stop words are removed.

from sqlitesearch import TextSearchIndex, Tokenizer

# Built-in Porter stemming: "running" matches "run", "courses" matches "course"
index = TextSearchIndex(
    text_fields=["title", "description"],
    stemming=True,  # disabled by default to match minsearch behavior
    db_path="search.db"
)

# Custom tokenizer: no stop words
index = TextSearchIndex(
    text_fields=["title", "description"],
    tokenizer=Tokenizer(),
    db_path="search.db"
)

# Custom tokenizer: custom stop words + custom stemmer (any callable(str) -> str)
from minsearch.stemmers import porter_stemmer  # pip install minsearch

index = TextSearchIndex(
    text_fields=["title", "description"],
    tokenizer=Tokenizer(stop_words={"custom", "words"}, stemmer=porter_stemmer),
    db_path="search.db"
)
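
To make the roles of stop words and the stemmer callable concrete, here is a rough stand-in for what a tokenizer of this shape does. It is not the sqlitesearch or minsearch Tokenizer: simple_tokenize and strip_plural are hypothetical names, and the splitting rule (lowercase, alphanumeric runs) is an assumption.

```python
import re

def simple_tokenize(text, stop_words=frozenset(), stemmer=None):
    """Lowercase, split on non-alphanumerics, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [t for t in tokens if t not in stop_words]
    if stemmer is not None:
        tokens = [stemmer(t) for t in tokens]
    return tokens

def strip_plural(token):
    # Toy stemmer for illustration only; a real one (e.g. Porter) handles far more cases
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

tokens = simple_tokenize("The Python courses", stop_words={"the"}, stemmer=strip_plural)
print(tokens)
```

Any callable that maps one string to another can play the stemmer role, which is why the custom-tokenizer example above can pass in a function from another package.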

Custom ID Field

index = TextSearchIndex(
    text_fields=["title", "description"],
    id_field="doc_id",
    db_path="search.db"
)

results = index.search("python", output_ids=True)
# Results will include 'id' field with the doc_id value

Vector Search

Vector search supports three modes for approximate nearest neighbor search, all followed by exact cosine similarity reranking:

Mode	Best for	How it works
LSH (default)	Up to 100K vectors	Random hyperplane projections + bucket lookup
IVF	10K-500K vectors	K-means clustering + nearest-cluster probe
HNSW	10K-1M+ vectors	Hierarchical proximity graph traversal

LSH (default)

Each vector is hashed into one bucket per table using random hyperplane projections. At query time, LSH looks up the buckets that match the query's hash to collect candidates, then reranks them by exact cosine similarity. With n_probe > 0 (multi-probe), it also checks neighboring buckets whose hashes differ from the query's by 1 or 2 bits. This improves recall because a similar vector that landed in an adjacent bucket (one projection fell on the other side of a hyperplane) is still found.

import numpy as np
from sqlitesearch import VectorSearchIndex

index = VectorSearchIndex(
    keyword_fields=["category"],
    n_tables=8,      # Number of hash tables (more = better recall)
    hash_size=16,    # Bits per hash (more = better precision)
    n_probe=2,       # Multi-probe bit flips (0-2, higher = better recall)
    db_path="vectors.db"
)

vectors = np.random.rand(100, 384)
documents = [{"category": "test"} for _ in range(100)]
index.fit(vectors, documents)

query = np.random.rand(384)
results = index.search(query)
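
The hashing and multi-probe steps described above can be sketched in pure Python for a single table. This is an illustration of the general technique, not sqlitesearch's code: make_hyperplanes, lsh_hash, and probe_hashes are hypothetical helpers, and the library itself works with NumPy arrays across several tables.

```python
import random
from itertools import combinations

def make_hyperplanes(dim, hash_size, seed=0):
    """One random Gaussian hyperplane normal per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(hash_size)]

def lsh_hash(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

def probe_hashes(query_hash, hash_size, n_probe):
    """The query's bucket plus every bucket within n_probe bit flips."""
    hashes = [query_hash]
    for n_flips in range(1, n_probe + 1):
        for positions in combinations(range(hash_size), n_flips):
            flipped = query_hash
            for pos in positions:
                flipped ^= 1 << pos
            hashes.append(flipped)
    return hashes

planes = make_hyperplanes(dim=8, hash_size=16)
vec = [0.3, -1.2, 0.5, 0.9, -0.1, 0.7, -0.4, 1.1]
h = lsh_hash(vec, planes)
buckets = probe_hashes(h, hash_size=16, n_probe=2)
print(len(buckets))  # 1 + 16 + 120 = 137 buckets per table
```

With hash_size=16 and n_probe=2, each table lookup grows from 1 bucket to 137, which is the recall-versus-work trade-off the n_probe parameter controls.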

IVF (Inverted File Index)

Clusters vectors using k-means, then searches only the nearest clusters at query time. Good balance of build speed and recall.

index = VectorSearchIndex(
    mode="ivf",
    n_clusters=None,        # Auto-scales (sqrt(n), capped at 256)
    n_probe_clusters=8,     # Clusters to search (more = better recall, slower)
    db_path="vectors.db"
)
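
As a rough picture of the two knobs above, the sketch below computes an auto-scaled cluster count and picks which clusters to probe for a query. Both helpers are hypothetical: rounding sqrt(n) down and comparing by Euclidean distance are assumptions, and the k-means training step is left out.

```python
import math

def auto_n_clusters(n_vectors, cap=256):
    """sqrt(n) clusters, capped at cap; rounding down is an assumption."""
    return max(1, min(cap, math.isqrt(n_vectors)))

def nearest_clusters(query, centroids, n_probe_clusters):
    """Indices of the n_probe_clusters centroids closest to the query."""
    def sq_dist(centroid):
        return sum((q - x) ** 2 for q, x in zip(query, centroid))
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))
    return order[:n_probe_clusters]

centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [9.0, 9.0]]
probed = nearest_clusters([0.9, 1.2], centroids, n_probe_clusters=2)
print(auto_n_clusters(10_000), probed)
```

Only vectors assigned to the probed clusters are scored, so a larger n_probe_clusters trades speed for recall.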

HNSW (Hierarchical Navigable Small World)

Builds a multi-layer proximity graph. Highest recall and fastest search, but slower to build.

index = VectorSearchIndex(
    mode="hnsw",
    m=16,                   # Max connections per node (more = better recall)
    ef_construction=200,    # Build-time beam width (more = better graph)
    ef_search=50,           # Search-time beam width (more = better recall)
    db_path="vectors.db"
)
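
To show what the search-time beam width does, here is a sketch of the best-first search on a single graph layer, in the style of the published HNSW algorithm. It is not sqlitesearch's implementation: search_layer is a hypothetical helper, the graph is a hand-built chain, and a real index adds the upper layers and graph construction.

```python
import heapq

def search_layer(graph, vectors, query, entry, ef):
    """Best-first search keeping the ef closest nodes seen so far."""
    def dist(node):
        return sum((a - b) ** 2 for a, b in zip(vectors[node], query))

    visited = {entry}
    d_entry = dist(entry)
    candidates = [(d_entry, entry)]   # min-heap: closest unexpanded node first
    results = [(-d_entry, entry)]     # max-heap via negation: worst result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # closest candidate is worse than the worst kept result
        for neighbor in graph[node]:
            if neighbor in visited:
                continue
            visited.add(neighbor)
            d_n = dist(neighbor)
            if len(results) < ef or d_n < -results[0][0]:
                heapq.heappush(candidates, (d_n, neighbor))
                heapq.heappush(results, (-d_n, neighbor))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst
    return sorted((-neg_d, node) for neg_d, node in results)

# Ten points on a line, each linked to its immediate neighbors
vectors = {i: (float(i),) for i in range(10)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
found = [node for _, node in search_layer(graph, vectors, (7.2,), entry=0, ef=2)]
print(found)
```

A larger ef keeps more candidates alive during traversal, which is why raising ef_search tends to improve recall at the cost of more distance computations.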

Filtering works the same as text search - see the Filtering section.

Hybrid Search

Text and vector indexes can share the same database file, enabling hybrid search.

from sqlitesearch import TextSearchIndex, VectorSearchIndex

# Both indexes point at the same database file
text_index = TextSearchIndex(text_fields=["title", "description"], db_path="hybrid.db")
vector_index = VectorSearchIndex(db_path="hybrid.db")

# Assumes both indexes were populated earlier with fit(), and that
# query_vector is the embedding of the query (a 1-D NumPy array)
text_results = text_index.search("python tutorial")
vector_results = vector_index.search(query_vector)

# Combine and deduplicate results based on your ranking strategy
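
One common way to combine the two result lists is reciprocal rank fusion (RRF). The sketch below is a generic technique, not something sqlitesearch provides: reciprocal_rank_fusion is a hypothetical helper, k=60 is the conventional constant, and it assumes every result dict carries an "id" key (for example, by searching with output_ids=True).

```python
def reciprocal_rank_fusion(result_lists, k=60, id_key="id"):
    """Merge ranked lists; each document scores sum(1 / (k + rank))."""
    scores = {}
    docs = {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            doc_id = doc[id_key]
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
            docs.setdefault(doc_id, doc)  # keep the first copy of each document
    ranked_ids = sorted(scores, key=scores.get, reverse=True)
    return [{**docs[i], "rrf_score": scores[i]} for i in ranked_ids]

# Stand-in result lists shaped like search output
text_results = [{"id": 1, "title": "Python Tutorial"}, {"id": 3, "title": "Advanced Python"}]
vector_results = [{"id": 3, "title": "Advanced Python"}, {"id": 2, "title": "Java Guide"}]

fused = reciprocal_rank_fusion([text_results, vector_results])
fused_ids = [doc["id"] for doc in fused]
print(fused_ids)
```

RRF uses only ranks, so it sidesteps the problem that BM25 scores and cosine similarities are on different scales.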

Index Management

Both index types automatically persist to disk. Reopen an existing index by creating it with the same db_path - it's ready to search immediately. Use index.clear() to remove all documents.

When to Use

sqlitesearch is ideal when you want:

Zero infrastructure (no external services)
Data persistence across restarts
Real search functionality for pet projects, demos, or prototypes
Simple deployment (just a Python file and a .db file)

Use case	Recommendation
In-memory / experiments	minsearch (e.g., in notebooks)
Local projects, up to 100K docs	sqlitesearch
Production / high traffic / 1M+	Elasticsearch, Qdrant, Milvus, etc.

Benchmarks

We benchmarked sqlitesearch on Simple English Wikipedia (291K articles) for text search and the Cohere-1M dataset (768d vectors) for vector search.

Metric	1K	10K	100K
Text search QPS	970	604	179
Text search latency	1ms	2ms	6ms
Vector search QPS	333	39	6
Vector search latency	3ms	26ms	181ms
Vector recall@100	0.65	0.97	0.89

Vector search uses multi-probe LSH (n_probe=2) with an in-memory vector cache for reranking. At 100K vectors, recall (0.89) is competitive with cloud vector databases such as ElasticCloud (0.90). For higher recall, use n_tables=16 (0.95 recall). See benchmark/WRITEUP.md for full results, recall tuning, and a comparison with the VDBBench leaderboard.
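
For readers unfamiliar with the recall@100 row, the metric is usually computed as the share of the true nearest neighbors that the approximate search returns. The helper below is a generic sketch with made-up IDs; the exact definition used in the benchmark is documented in the write-up, not here.

```python
def recall_at_k(retrieved_ids, true_neighbor_ids, k):
    """Fraction of the true top-k neighbors present in the retrieved top-k."""
    truth = set(true_neighbor_ids[:k])
    if not truth:
        return 0.0
    hits = truth.intersection(retrieved_ids[:k])
    return len(hits) / len(truth)

exact = [4, 8, 15, 16, 23]    # ground truth from a brute-force search
approx = [4, 15, 42, 8, 99]   # what an approximate index returned
recall = recall_at_k(approx, exact, k=5)
print(recall)  # 3 of 5 true neighbors found -> 0.6
```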

Architecture

Everything lives in a single SQLite database file. Text search uses FTS5 with BM25 ranking. By default, vector search uses Locality-Sensitive Hashing (LSH) with random projections for fast candidate retrieval, followed by exact cosine similarity reranking via NumPy. There is no separate server process and no network communication: SQLite runs inside your Python process, reading and writing directly to the file.
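
The reranking step can be pictured as scoring each candidate against the query and keeping the best few. The sketch below uses plain Python to stay dependency-free, whereas the library does this with NumPy; cosine_similarity and rerank are hypothetical helpers and the vectors are made up.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)

def rerank(query, candidates, top_k):
    """candidates maps doc id -> vector; returns (id, similarity), best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

candidates = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [-1.0, 0.0]}
top = rerank([1.0, 0.1], candidates, top_k=2)
top_ids = [doc_id for doc_id, _ in top]
print(top_ids)
```

Because the candidate set is small compared to the full collection, the exact pass stays cheap while correcting the ordering errors of the approximate lookup.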

Release history

Version	Released
0.0.5	May 18, 2026
0.0.4	May 11, 2026
0.0.3	Feb 26, 2026
0.0.2	Feb 7, 2026
0.0.1	Feb 3, 2026
