Embedded, local, open-source hybrid search for AI agents — SQLite + FTS5 + ChromaDB with self-healing journal
HybridDB
Purpose-built for AI agents. HybridDB gives agents persistent, searchable memory: every conversation turn is indexed and retrievable via keyword, vector, or hybrid search. Used in production by the Executive Assistant agent system.
Embedded. Local. Open source. No cloud APIs, no vector DB services, no internet connection required. Runs entirely on-device with SQLite + ChromaDB + your choice of local embedding model. Ships as a single Python package with zero external infrastructure dependencies.
SQLite + FTS5 + ChromaDB with a self-healing journal. One Python class that gives you keyword search, vector search, SQL queries, and structured filtering — all kept in sync automatically.
from hybriddb import HybridDB, LONGTEXT, TEXT
db = HybridDB("./my_data")
db.create_table("docs", {"title": TEXT, "body": LONGTEXT})
db.insert("docs", {"title": "Getting Started", "body": "A guide to using HybridDB..."})
db.insert("docs", {"title": "API Reference", "body": "Full API documentation..."})
# Search every text column
db.search("docs", "getting started")
# Search one column
db.search("docs", "body", "how do I begin", mode="hybrid")
# Structured query with parameters
db.query("docs", where="title LIKE ?", params=("%start%",))
Why HybridDB?
Every serious project that needs both keyword and semantic search ends up wiring SQLite + FTS5 + ChromaDB together. You handle schema creation, FTS5 triggers, ChromaDB collection management, keeping them in sync, recovering from crashes, rebuilding indexes...
HybridDB does all of that for you, once and correctly.
| Feature | Status |
|---|---|
| SQL CRUD (insert, update, delete, get, query) | ✅ |
| FTS5 keyword search with BM25 scoring | ✅ |
| ChromaDB semantic/vector search with HNSW | ✅ |
| Hybrid search (RRF fusion of keyword + semantic) | ✅ |
| Recency-weighted scoring | ✅ |
| Schema management (create, add/drop/rename columns) | ✅ |
| Self-healing journal (crash recovery) | ✅ |
| Sync + async APIs | ✅ |
| No external API dependencies (works offline) | ✅ |
| Embedding model pluggable (sentence-transformers, OpenAI, custom) | ✅ |
Documentation
- API reference — stable public methods, sync/async examples, graph and OLAP facades
- Benchmarks — smoke vs full benchmark commands and expected runtime behavior
- Release guide — local build, wheel smoke test, TestPyPI/PyPI publishing
Installation
pip install hybriddb
HybridDB uses ChromaDB's bundled local MiniLM embedding by default. No API key required.
Core Concepts
Column Types
HybridDB maps Python-friendly types to SQLite storage and automatically sets up the right search indexes:
| Type | SQLite | FTS5 | ChromaDB | Use for |
|---|---|---|---|---|
| TEXT | TEXT | ✅ | — | Names, titles, short strings |
| LONGTEXT | TEXT | ✅ | ✅ | Documents, messages, memory content |
| INTEGER | INTEGER | — | — | Counts, ages, IDs |
| REAL | REAL | — | — | Prices, scores, confidence values |
| BOOLEAN | INTEGER | — | — | Flags, status indicators |
| JSON | TEXT | — | — | Tags, metadata, structured data |
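TEXT columns are indexed for FTS5 keyword search automatically. LONGTEXT columns are indexed for both FTS5 keyword search and ChromaDB semantic search. The other types are stored in SQLite only and are reachable through SQL queries and structured filters.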
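To make the FTS5 column of the table above more concrete, here is a minimal sketch of keyword indexing with BM25 ranking, using only Python's standard `sqlite3` module. It is not HybridDB's actual implementation: the table names (`docs`, `docs_fts`), the trigger, and the `keyword_search` helper are assumptions for illustration, and it needs an SQLite build with FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
-- External-content FTS5 index over the text columns of docs.
CREATE VIRTUAL TABLE docs_fts USING fts5(
    title, body, content='docs', content_rowid='id'
);
-- Keep the index in step with inserts (update/delete triggers omitted).
CREATE TRIGGER docs_ai AFTER INSERT ON docs BEGIN
    INSERT INTO docs_fts(rowid, title, body)
    VALUES (new.id, new.title, new.body);
END;
""")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("Getting Started", "A guide to using the database"),
        ("API Reference", "Full API documentation"),
    ],
)
conn.commit()


def keyword_search(query: str) -> list[tuple[int, str]]:
    """Return (rowid, title) pairs, best BM25 match first.

    SQLite's bm25() returns lower values for better matches,
    so an ascending ORDER BY puts the best hit on top.
    """
    return conn.execute(
        "SELECT rowid, title FROM docs_fts "
        "WHERE docs_fts MATCH ? ORDER BY bm25(docs_fts)",
        (query,),
    ).fetchall()
```

Calling `keyword_search("guide")` returns only the "Getting Started" row, because "guide" appears in that document's body and nowhere else.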
Search Modes
from hybriddb import HYBRID, LONGTEXT, TEXT, Column, SearchMode
db.create_table("docs", {"title": Column(TEXT), "body": LONGTEXT})
# Keyword only — fast, exact, great for names and titles
db.search("contacts", "name", "Alice", mode="keyword")
# Semantic only — finds "9am standup" when searching for "morning meetings"
db.search("memories", "content", "team rituals", mode=SearchMode.SEMANTIC)
# Hybrid — best of both, RRF fusion, the default
db.search("docs", "body", "getting started guide", mode=SearchMode.HYBRID)
db.search("docs", "body", "getting started guide", mode=HYBRID)
# Search across ALL text columns at once
db.search("contacts", "engineering manager")
db.search_columns("contacts", "engineering manager")
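Hybrid mode is described above as RRF fusion of keyword and semantic results. The sketch below shows Reciprocal Rank Fusion in plain Python so the idea is easy to follow. The function name `rrf_fuse`, the constant `k=60` (a commonly used default for RRF), and the tie-breaking rule are assumptions; HybridDB's internal parameters may differ.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an ID's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["a", "b", "c"]    # e.g. IDs ordered by BM25
semantic_hits = ["c", "a", "d"]   # e.g. IDs ordered by vector similarity
fused = rrf_fuse([keyword_hits, semantic_hits])
```

Documents that appear in both lists ("a" and "c") rise above documents found by only one retriever, which is the property that makes RRF attractive: it needs only ranks, not comparable scores.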
Async API
All core operations have async wrappers that run blocking SQLite/ChromaDB work in a worker thread:
await db.acreate_table("messages", {"content": LONGTEXT})
await db.ainsert("messages", {"content": "async-safe memory"})
results = await db.asearch("messages", "content", "memory")
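The worker-thread pattern mentioned above can be sketched with the standard library alone. This is an illustration of the general technique, not HybridDB's code: `count_rows_blocking` and `acount_rows` are hypothetical names, and the use of `asyncio.to_thread` (Python 3.9+) is an assumption about how such wrappers could be written.

```python
import asyncio
import sqlite3


def count_rows_blocking(db_path: str) -> int:
    """A blocking call, standing in for SQLite/ChromaDB work."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS messages (content TEXT)")
        conn.execute("INSERT INTO messages VALUES ('async-safe memory')")
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    finally:
        conn.close()


async def acount_rows(db_path: str) -> int:
    """Async wrapper: run the blocking function in a worker thread
    so the event loop stays free for other tasks."""
    return await asyncio.to_thread(count_rows_blocking, db_path)
```

The connection is opened and closed inside the worker thread, which sidesteps `sqlite3`'s default restriction on sharing a connection across threads.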
Public Cursor
For small custom SQL reads or migrations, use the public cursor context manager:
with db.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM messages")
    count = cur.fetchone()[0]
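If you want to see what a cursor context manager of this kind can look like, here is a small sketch on top of the standard `sqlite3` module. The commit-on-success and rollback-on-error policy is an assumption for illustration; the source does not state how `db.cursor()` handles transactions.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def cursor(conn: sqlite3.Connection):
    """Yield a cursor; commit on success, roll back on error (assumed policy)."""
    cur = conn.cursor()
    try:
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()


conn = sqlite3.connect(":memory:")
with cursor(conn) as cur:
    cur.execute("CREATE TABLE messages (content TEXT)")
    cur.execute("INSERT INTO messages VALUES ('hello')")

with cursor(conn) as cur:
    cur.execute("SELECT COUNT(*) FROM messages")
    count = cur.fetchone()[0]
```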
Namespaced Advanced APIs
Graph and OLAP helpers remain available on HybridDB, with namespaced facades for discovery:
node_id = db.graph.add_node("Alice", type="person")
rows = db.olap.query("SELECT COUNT(*) AS total FROM messages")
Recency Scoring
Boost recent content over older content:
results = db.search(
    "messages", "content", "project update",
    recency_weight=0.3,  # 30% weight to recency
    recency_column="timestamp",
)
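One way to picture a recency weight is as a linear blend between a relevance score and a time-decay score. The sketch below is an assumption, not HybridDB's documented formula: the exponential decay, the 30-day half-life, and the names `recency_score` and `blend` are all hypothetical.

```python
import math


def recency_score(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 for brand-new content, 0.5 after one half-life."""
    return math.exp(-math.log(2) * age_days / half_life_days)


def blend(relevance: float, age_days: float, recency_weight: float = 0.3) -> float:
    """Linear blend of a relevance score in [0, 1] with the recency score."""
    return (1 - recency_weight) * relevance + recency_weight * recency_score(age_days)


old_but_relevant = blend(relevance=0.9, age_days=365)
new_but_weaker = blend(relevance=0.7, age_days=0)
```

With a weight of 0.3, a fresh result with relevance 0.7 outranks a year-old result with relevance 0.9, which is the trade-off the weight controls.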
Self-Healing Journal
All ChromaDB mutations (adds, updates, deletes) are journaled in SQLite. With sync=True (the default), an insert processes its journal entries immediately. With sync=False, the entries are deferred until you call process_journal():
# Batch insert — defer ChromaDB sync for speed
db.insert_batch("contacts", big_list_of_rows, sync=False)
db.process_journal() # Sync everything at once
If your process crashes mid-write, the journal replays pending entries on next startup. No ghosts, no drift.
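The journaling idea can be sketched with a plain SQLite table and a dictionary standing in for the vector store. Everything here is illustrative and assumed: the `journal` schema, the `insert` and `process_journal` functions, and the `vector_index` dict are not HybridDB's internals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, bio TEXT);
CREATE TABLE journal (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT NOT NULL,               -- 'add' or 'delete'
    doc_id INTEGER NOT NULL,
    done INTEGER NOT NULL DEFAULT 0
);
""")
vector_index: dict[int, str] = {}  # stand-in for a ChromaDB collection


def process_journal() -> int:
    """Apply pending entries in order; safe to re-run after a crash."""
    pending = conn.execute(
        "SELECT seq, op, doc_id FROM journal WHERE done = 0 ORDER BY seq"
    ).fetchall()
    for seq, op, doc_id in pending:
        if op == "add":
            row = conn.execute(
                "SELECT bio FROM contacts WHERE id = ?", (doc_id,)
            ).fetchone()
            if row is not None:
                vector_index[doc_id] = row[0]  # idempotent upsert
        elif op == "delete":
            vector_index.pop(doc_id, None)
        with conn:  # mark the entry done only after it was applied
            conn.execute("UPDATE journal SET done = 1 WHERE seq = ?", (seq,))
    return len(pending)


def insert(bio: str, sync: bool = True) -> int:
    """Write the row and its journal entry in one SQLite transaction."""
    with conn:  # both statements commit together, or neither does
        doc_id = conn.execute(
            "INSERT INTO contacts (bio) VALUES (?)", (bio,)
        ).lastrowid
        conn.execute(
            "INSERT INTO journal (op, doc_id) VALUES ('add', ?)", (doc_id,)
        )
    if sync:
        process_journal()
    return doc_id


insert("Engineering manager", sync=False)
insert("Staff designer", sync=False)
replayed = process_journal()  # what a startup hook could call after a crash
```

Because the row and its journal entry are committed atomically, and each entry is applied idempotently before being marked done, replaying after an interruption cannot lose or duplicate work in this sketch.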
Health & Maintenance
# Check if SQLite and ChromaDB are in sync
health = db.health("contacts")
# {"sqlite_rows": 5000, "chroma_docs": {"contacts_bio": 5000}, "status": "ok"}
# Reconcile: delete ghosts, add missing docs
result = db.reconcile("contacts")
# {"ghosts_deleted": 0, "missing_added": 3, "metadata_updated": 0}
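Conceptually, reconciliation is a set comparison between the IDs in SQLite and the IDs in the vector index. The sketch below shows that comparison in isolation; the function name `plan_reconcile` and its return shape are assumptions and do not mirror the `db.reconcile` result format shown above.

```python
def plan_reconcile(sqlite_ids: set[int], chroma_ids: set[int]) -> dict[str, list[int]]:
    """Compare ID sets from the two stores and report what needs fixing."""
    return {
        "ghosts": sorted(chroma_ids - sqlite_ids),   # in the vector index only
        "missing": sorted(sqlite_ids - chroma_ids),  # in SQLite only
    }


plan = plan_reconcile(sqlite_ids={1, 2, 3, 4}, chroma_ids={2, 3, 9})
```

Here ID 9 is a ghost that would be deleted from the vector index, and IDs 1 and 4 are rows that would be embedded and added.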
Custom Embedding Models
By default, HybridDB uses ChromaDB's bundled local MiniLM embedding. Plug in any embedding function if you want a specific model or provider:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
db = HybridDB("./data", embedding_fn=lambda text: model.encode(text).tolist())
Works with any embedding provider — OpenAI, Cohere, Hugging Face, local models.
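Judging by the lambda in the example above, an embedding function takes a string and returns a list of floats. For offline tests where loading a real model is too slow, a deterministic stand-in with the same shape can be handy. The sketch below is a hypothetical helper, not part of HybridDB, and its vectors carry no semantic meaning.

```python
import hashlib
import math


def toy_embedding(text: str, dim: int = 8) -> list[float]:
    """Hash each token into one of `dim` buckets, then L2-normalise.

    Deterministic and dependency-free, but NOT semantically meaningful.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))
```

Assuming `embedding_fn` accepts any callable with that shape, such a function could be passed in place of the lambda during tests, then swapped for a real model in production.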
License
MIT — see LICENSE.
Author
Eddy Xu
Inspired by claude-mem by Matt Mack.
Status
Alpha — actively developed, API may evolve. Core CRUD and search are stable with full test coverage (35+ tests). Currently used in production in the Executive Assistant agent system.
File details
Details for the file hybriddb-0.4.1.tar.gz.
File metadata
- Download URL: hybriddb-0.4.1.tar.gz
- Upload date:
- Size: 248.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 01799168fd0e3b0a75e879eb68481b8ffeb0aeea5587e5bb2af4081e8054a1a1 |
| MD5 | 408aa3a6e2e10310cf40894c6e89491d |
| BLAKE2b-256 | ff8ee1a30feded8fa0a4732a1345f5d5ecebf51768e51c08dc3d861e148ff19b |
File details
Details for the file hybriddb-0.4.1-py3-none-any.whl.
File metadata
- Download URL: hybriddb-0.4.1-py3-none-any.whl
- Upload date:
- Size: 38.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fb458b14064bcf751fb11b3c4a73cdc4e95aa9349b77cb31dc32745aa96dd686 |
| MD5 | 94c94a2a091755cc9d65f0507a10a828 |
| BLAKE2b-256 | 0683b218fcd86db6e1051d977defc6ed3e8dccdd419df99351c95eb6c386c3c7 |