Embedded, local, open-source hybrid search for AI agents — SQLite + FTS5 + ChromaDB with self-healing journal

HybridDB

Purpose-built for AI agents. HybridDB gives agents persistent, searchable memory: every conversation turn is indexed and retrievable via keyword, vector, or hybrid search. Used in production by the Executive Assistant agent system.

Embedded. Local. Open source. No cloud APIs, no vector DB services, no internet connection required. Runs entirely on-device with SQLite + ChromaDB + your choice of local embedding model. Ships as a single Python package with zero external infrastructure dependencies.

SQLite + FTS5 + ChromaDB with a self-healing journal. One Python class that gives you keyword search, vector search, SQL queries, and structured filtering — all kept in sync automatically.

from hybriddb import HybridDB, LONGTEXT, TEXT

db = HybridDB("./my_data")
db.create_table("docs", {"title": TEXT, "body": LONGTEXT})

db.insert("docs", {"title": "Getting Started", "body": "A guide to using HybridDB..."})
db.insert("docs", {"title": "API Reference", "body": "Full API documentation..."})

# Search every text column
db.search("docs", "getting started")

# Search one column
db.search("docs", "body", "how do I begin", mode="hybrid")

# Structured query with parameters
db.query("docs", where="title LIKE ?", params=("%start%",))

Why HybridDB?

Projects that need both keyword and semantic search often end up wiring SQLite, FTS5, and ChromaDB together by hand. That means handling schema creation, FTS5 triggers, ChromaDB collection management, keeping the stores in sync, recovering from crashes, and rebuilding indexes.

HybridDB does all of that once, done right.

Feature Status
SQL CRUD (insert, update, delete, get, query)
FTS5 keyword search with BM25 scoring
ChromaDB semantic/vector search with HNSW
Hybrid search (RRF fusion of keyword + semantic)
Recency-weighted scoring
Schema management (create, add/drop/rename columns)
Self-healing journal (crash recovery)
Sync + async APIs
No external API dependencies (works offline)
Embedding model pluggable (sentence-transformers, OpenAI, custom)

Documentation

  • API reference — stable public methods, sync/async examples, graph and OLAP facades
  • Benchmarks — smoke vs full benchmark commands and expected runtime behavior
  • Release guide — local build, wheel smoke test, TestPyPI/PyPI publishing

Installation

pip install hybriddb

HybridDB uses ChromaDB's bundled local MiniLM embedding by default. No API key required.

Core Concepts

Column Types

HybridDB maps Python-friendly types to SQLite storage and automatically sets up the right search indexes:

Type      SQLite   FTS5  ChromaDB  Use for
TEXT      TEXT     yes   -         Names, titles, short strings
LONGTEXT  TEXT     yes   yes       Documents, messages, memory content
INTEGER   INTEGER  -     -         Counts, ages, IDs
REAL      REAL     -     -         Prices, scores, confidence values
BOOLEAN   INTEGER  -     -         Flags, status indicators
JSON      TEXT     -     -         Tags, metadata, structured data

TEXT columns get automated FTS5 keyword search. LONGTEXT columns get FTS5 + ChromaDB semantic search.
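
To make the FTS5 column concrete, the sketch below shows roughly what a hand-wired keyword index looks like with Python's built-in `sqlite3`. It assumes the SQLite build has FTS5 enabled. The table and trigger names are illustrative only; HybridDB's internal schema is not described here and may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
-- External-content FTS5 index over the text columns of docs.
CREATE VIRTUAL TABLE docs_fts USING fts5(
    title, body, content='docs', content_rowid='id'
);
-- Keep the index in sync on insert (update/delete triggers omitted).
CREATE TRIGGER docs_ai AFTER INSERT ON docs BEGIN
    INSERT INTO docs_fts(rowid, title, body)
    VALUES (new.id, new.title, new.body);
END;
""")

with conn:
    conn.execute(
        "INSERT INTO docs (title, body) VALUES (?, ?)",
        ("Getting Started", "A guide to using HybridDB..."),
    )
    conn.execute(
        "INSERT INTO docs (title, body) VALUES (?, ?)",
        ("API Reference", "Full API documentation..."),
    )

# bm25() returns lower (more negative) values for better matches,
# so ascending order puts the best hit first.
rows = conn.execute(
    "SELECT rowid, bm25(docs_fts) AS score FROM docs_fts "
    "WHERE docs_fts MATCH ? ORDER BY score",
    ("guide",),
).fetchall()
matched_ids = [row[0] for row in rows]
```

A complete setup would also need update and delete triggers, which is part of the bookkeeping the library is meant to take over.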

Search Modes

from hybriddb import HYBRID, LONGTEXT, TEXT, Column, SearchMode

db.create_table("docs", {"title": Column(TEXT), "body": LONGTEXT})

# Keyword only — fast, exact, great for names and titles
db.search("contacts", "name", "Alice", mode="keyword")

# Semantic only — finds "9am standup" when searching for "morning meetings"
db.search("memories", "content", "team rituals", mode=SearchMode.SEMANTIC)

# Hybrid — best of both, RRF fusion, the default
db.search("docs", "body", "getting started guide", mode=SearchMode.HYBRID)
db.search("docs", "body", "getting started guide", mode=HYBRID)

# Search across ALL text columns at once
db.search("contacts", "engineering manager")
db.search_columns("contacts", "engineering manager")
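
Hybrid mode is described above as RRF fusion of keyword and semantic results. As a rough illustration of reciprocal rank fusion (not HybridDB's actual implementation), the sketch below merges two ranked ID lists. The function name `rrf_fuse` and the constant `k=60` are assumptions; 60 is a commonly used default for RRF, and HybridDB's value may differ.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of IDs with reciprocal rank fusion.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Hypothetical helper, not a HybridDB API.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

keyword_hits = ["a", "b", "c"]   # e.g. BM25 order
semantic_hits = ["c", "a", "d"]  # e.g. vector-distance order
fused = rrf_fuse([keyword_hits, semantic_hits])
# "a" and "c" appear in both lists, so they outrank "b" and "d".
```

Because RRF works on ranks rather than raw scores, it needs no normalization between BM25 values and vector distances.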

Async API

All core operations have async wrappers that run blocking SQLite/ChromaDB work in a worker thread:

await db.acreate_table("messages", {"content": LONGTEXT})
await db.ainsert("messages", {"content": "async-safe memory"})
results = await db.asearch("messages", "content", "memory")
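
The worker-thread pattern described above can be sketched with `asyncio.to_thread` (Python 3.9+). `TinyStore` is a hypothetical stand-in for a blocking store, not HybridDB itself, and the locking strategy shown is one assumption about how shared SQLite access could be serialized.

```python
import asyncio
import sqlite3
import threading

class TinyStore:
    """Hypothetical blocking store with an async wrapper."""

    def __init__(self):
        # check_same_thread=False lets worker threads use the connection;
        # the lock serializes access to it.
        self._conn = sqlite3.connect(":memory:", check_same_thread=False)
        self._lock = threading.Lock()
        self._conn.execute("CREATE TABLE messages (content TEXT)")

    def insert(self, content):
        with self._lock, self._conn:
            cur = self._conn.execute(
                "INSERT INTO messages (content) VALUES (?)", (content,)
            )
            return cur.lastrowid

    def count(self):
        with self._lock:
            row = self._conn.execute("SELECT COUNT(*) FROM messages").fetchone()
            return row[0]

    async def ainsert(self, content):
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.insert, content)

async def main():
    store = TinyStore()
    await asyncio.gather(*(store.ainsert(f"turn {i}") for i in range(5)))
    return store.count()

total = asyncio.run(main())
```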

Public Cursor

For small custom SQL reads or migrations, use the public cursor context manager:

with db.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM messages")
    count = cur.fetchone()[0]

Namespaced Advanced APIs

Graph and OLAP helpers remain available on HybridDB, with namespaced facades for discovery:

node_id = db.graph.add_node("Alice", type="person")
rows = db.olap.query("SELECT COUNT(*) AS total FROM messages")

Recency Scoring

Boost recent content over older content:

results = db.search(
    "messages", "content", "project update",
    recency_weight=0.3,        # 30% weight to recency
    recency_column="timestamp"
)
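
One plausible reading of `recency_weight=0.3` is a linear blend of a relevance score and a recency score. The sketch below assumes that blend and an exponential decay with a hypothetical 7-day half-life. The formula HybridDB actually uses is not stated here, so treat every name and constant as illustrative.

```python
WEEK = 7 * 24 * 3600  # seconds

def blend_with_recency(relevance, age_seconds, recency_weight=0.3,
                       half_life_seconds=WEEK):
    """Blend a relevance score in [0, 1] with an exponential recency score.

    Assumed formula (not taken from HybridDB):
        final = (1 - w) * relevance + w * 0.5 ** (age / half_life)
    """
    recency = 0.5 ** (age_seconds / half_life_seconds)
    return (1.0 - recency_weight) * relevance + recency_weight * recency

# A brand-new message with lower relevance...
fresh = blend_with_recency(relevance=0.6, age_seconds=0)
# ...versus a four-week-old message with higher relevance.
stale = blend_with_recency(relevance=0.7, age_seconds=4 * WEEK)
```

Under these assumptions the fresh message ranks first even though its relevance is lower, which is the effect the recency weight is meant to produce.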

Self-Healing Journal

All ChromaDB mutations (adds, updates, deletes) are journaled in SQLite. With sync=True (the default), an insert processes the journal immediately. With sync=False, journal entries are deferred until you call process_journal():

# Batch insert — defer ChromaDB sync for speed
db.insert_batch("contacts", big_list_of_rows, sync=False)
db.process_journal()  # Sync everything at once

If your process crashes mid-write, the journal replays pending entries on next startup. No ghosts, no drift.
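
The journal pattern described above can be sketched with the standard library alone. This is a simplified illustration, not HybridDB's code: the `journal` table layout, the `insert` and `process_journal` helpers, and the dict standing in for a ChromaDB collection are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, bio TEXT);
CREATE TABLE journal (
    seq    INTEGER PRIMARY KEY AUTOINCREMENT,
    op     TEXT NOT NULL,              -- 'add' or 'delete'
    row_id INTEGER NOT NULL,
    done   INTEGER NOT NULL DEFAULT 0
);
""")
vector_store = {}  # stand-in for a ChromaDB collection

def process_journal():
    """Replay pending entries in order; safe to call again after a crash."""
    pending = conn.execute(
        "SELECT seq, op, row_id FROM journal WHERE done = 0 ORDER BY seq"
    ).fetchall()
    for seq, op, row_id in pending:
        if op == "add":
            row = conn.execute(
                "SELECT bio FROM contacts WHERE id = ?", (row_id,)
            ).fetchone()
            if row is not None:
                vector_store[row_id] = row[0]  # idempotent upsert
        elif op == "delete":
            vector_store.pop(row_id, None)
        with conn:
            conn.execute("UPDATE journal SET done = 1 WHERE seq = ?", (seq,))
    return len(pending)

def insert(bio, sync=True):
    # The row and its journal entry commit in one transaction, so a crash
    # cannot leave a committed row without a pending journal entry.
    with conn:
        row_id = conn.execute(
            "INSERT INTO contacts (bio) VALUES (?)", (bio,)
        ).lastrowid
        conn.execute(
            "INSERT INTO journal (op, row_id) VALUES ('add', ?)", (row_id,)
        )
    if sync:
        process_journal()
    return row_id

insert("Engineering manager", sync=False)
insert("Staff designer", sync=False)
docs_before = len(vector_store)  # nothing synced yet
replayed = process_journal()     # both deferred entries applied
```

Because each replayed operation is idempotent, re-running an entry that was applied but not yet marked done leaves the vector store unchanged.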

Health & Maintenance

# Check if SQLite and ChromaDB are in sync
health = db.health("contacts")
# {"sqlite_rows": 5000, "chroma_docs": {"contacts_bio": 5000}, "status": "ok"}

# Reconcile: delete ghosts, add missing docs
result = db.reconcile("contacts")
# {"ghosts_deleted": 0, "missing_added": 3, "metadata_updated": 0}
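
Conceptually, reconciliation is a set difference between the IDs in SQLite and the IDs in the vector store. The sketch below shows that idea only; `plan_reconcile` is a hypothetical helper, and how HybridDB gathers the IDs or detects stale metadata is not covered.

```python
def plan_reconcile(sqlite_ids, chroma_ids):
    """Return which vector-store docs to delete and which rows to re-add.

    ghosts:  present in the vector store but no longer in SQLite
    missing: present in SQLite but absent from the vector store
    """
    sqlite_ids, chroma_ids = set(sqlite_ids), set(chroma_ids)
    return {
        "ghosts": sorted(chroma_ids - sqlite_ids),
        "missing": sorted(sqlite_ids - chroma_ids),
    }

plan = plan_reconcile(sqlite_ids=[1, 2, 3, 4], chroma_ids=[2, 3, 9])
# Doc 9 has no backing row; rows 1 and 4 were never embedded.
```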

Custom Embedding Models

By default, HybridDB uses ChromaDB's bundled local MiniLM embedding. Plug in any embedding function if you want a specific model or provider:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
db = HybridDB("./data", embedding_fn=lambda text: model.encode(text).tolist())

Works with any embedding provider — OpenAI, Cohere, Hugging Face, local models.
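
For tests or fully offline runs, a deterministic stand-in can be handy. The sketch below assumes, as the lambda above suggests, that `embedding_fn` receives one string and returns a list of floats. `hash_embed` is a hypothetical helper, and its vectors carry no real semantic meaning.

```python
import hashlib
import math

def hash_embed(text, dim=32):
    """Map text to a unit-length vector by hashing lowercase tokens into buckets."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # Empty input has no tokens, so return the zero vector unchanged.
    return [x / norm for x in vec] if norm else vec

vector = hash_embed("getting started guide")
```

Under the same assumption it could be passed as `HybridDB("./data", embedding_fn=hash_embed)`.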

License

MIT — see LICENSE.

Author

Eddy Xu

Inspired by claude-mem by Matt Mack.

Status

Alpha — actively developed, API may evolve. Core CRUD and search are stable with full test coverage (35+ tests). Currently used in production in the Executive Assistant agent system.
