
Embedded, local, open-source hybrid search for AI agents — SQLite + FTS5 + ChromaDB with self-healing journal


HybridDB

Purpose-built for AI agents. HybridDB gives agents persistent, searchable memory: every conversation turn is indexed and retrievable via keyword, vector, or hybrid search. Used in production by the Executive Assistant agent system.

Embedded. Local. Open source. No cloud APIs, no vector DB services, no internet connection required. Runs entirely on-device with SQLite + ChromaDB + your choice of local embedding model. Ships as a single Python package with zero external infrastructure dependencies.

SQLite + FTS5 + ChromaDB with a self-healing journal. One Python class that gives you keyword search, vector search, SQL queries, and structured filtering — all kept in sync automatically.

from hybriddb import HybridDB, LONGTEXT, TEXT

db = HybridDB("./my_data")
db.create_table("docs", {"title": TEXT, "body": LONGTEXT})

db.insert("docs", {"title": "Getting Started", "body": "A guide to using HybridDB..."})
db.insert("docs", {"title": "API Reference", "body": "Full API documentation..."})

# Search every text column
db.search("docs", "getting started")

# Search one column
db.search("docs", "body", "how do I begin", mode="hybrid")

# Structured query with parameters
db.query("docs", where="title LIKE ?", params=("%start%",))

Why HybridDB?

Projects that need both keyword and semantic search often end up wiring SQLite + FTS5 + ChromaDB together by hand. That means handling schema creation, FTS5 triggers, ChromaDB collection management, keeping the stores in sync, recovering from crashes, and rebuilding indexes.

HybridDB does all of that once, done right.

Feature Status

  • SQL CRUD (insert, update, delete, get, query)
  • FTS5 keyword search with BM25 scoring
  • ChromaDB semantic/vector search with HNSW
  • Hybrid search (RRF fusion of keyword + semantic)
  • Recency-weighted scoring
  • Schema management (create, add/drop/rename columns)
  • Self-healing journal (crash recovery)
  • Sync + async APIs
  • No external API dependencies (works offline)
  • Pluggable embedding model (sentence-transformers, OpenAI, custom)

Documentation

  • API reference — stable public methods, sync/async examples, graph and OLAP facades
  • Benchmarks — smoke vs full benchmark commands and expected runtime behavior
  • Release guide — local build, wheel smoke test, TestPyPI/PyPI publishing

Installation

pip install hybriddb

HybridDB uses ChromaDB's bundled local MiniLM embedding by default. No API key required.

Core Concepts

Column Types

HybridDB maps Python-friendly types to SQLite storage and automatically sets up the right search indexes:

Type      SQLite   FTS5  ChromaDB  Use for
TEXT      TEXT     yes   -         Names, titles, short strings
LONGTEXT  TEXT     yes   yes       Documents, messages, memory content
INTEGER   INTEGER  -     -         Counts, ages, IDs
REAL      REAL     -     -         Prices, scores, confidence values
BOOLEAN   INTEGER  -     -         Flags, status indicators
JSON      TEXT     -     -         Tags, metadata, structured data

TEXT columns get automatic FTS5 keyword search. LONGTEXT columns get both FTS5 keyword search and ChromaDB semantic search.
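To make the mapping concrete, here is a minimal sketch of how a schema could be turned into SQLite DDL plus a list of columns to index. It is not HybridDB's internal code: `COLUMN_TYPES` and `plan_table` are hypothetical names, and the assumption that only TEXT and LONGTEXT columns are indexed is inferred from the sentence above.

```python
import sqlite3

# Hypothetical mapping: type name -> (SQLite storage type, keyword index?, vector index?)
# Assumption: only TEXT and LONGTEXT columns are indexed for search.
COLUMN_TYPES = {
    "TEXT": ("TEXT", True, False),
    "LONGTEXT": ("TEXT", True, True),
    "INTEGER": ("INTEGER", False, False),
    "REAL": ("REAL", False, False),
    "BOOLEAN": ("INTEGER", False, False),
    "JSON": ("TEXT", False, False),
}


def plan_table(name, schema):
    """Return the CREATE TABLE statement and the columns that need each index."""
    cols = ", ".join(f'"{col}" {COLUMN_TYPES[t][0]}' for col, t in schema.items())
    ddl = f'CREATE TABLE "{name}" (id INTEGER PRIMARY KEY, {cols})'
    keyword_cols = [c for c, t in schema.items() if COLUMN_TYPES[t][1]]
    vector_cols = [c for c, t in schema.items() if COLUMN_TYPES[t][2]]
    return ddl, keyword_cols, vector_cols


ddl, keyword_cols, vector_cols = plan_table(
    "docs", {"title": "TEXT", "body": "LONGTEXT", "views": "INTEGER"}
)

# The generated DDL is plain SQLite and can be executed directly.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
```

In this sketch, `title` and `body` would both get a keyword index, while only `body` would also be sent to the vector store.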

Search Modes

from hybriddb import HYBRID, LONGTEXT, TEXT, Column, SearchMode

db.create_table("docs", {"title": Column(TEXT), "body": LONGTEXT})

# Keyword only — fast, exact, great for names and titles
db.search("contacts", "name", "Alice", mode="keyword")

# Semantic only: matches by meaning, e.g. finds "9am standup" when searching for "morning meetings"
db.search("memories", "content", "morning meetings", mode=SearchMode.SEMANTIC)

# Hybrid — best of both, RRF fusion, the default
db.search("docs", "body", "getting started guide", mode=SearchMode.HYBRID)
db.search("docs", "body", "getting started guide", mode=HYBRID)

# Search across ALL text columns at once
db.search("contacts", "engineering manager")
db.search_columns("contacts", "engineering manager")
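Hybrid mode is described as RRF (reciprocal rank fusion) of the keyword and semantic results. The sketch below shows the general technique, not HybridDB's actual implementation: `rrf_fuse` is a hypothetical helper, and the constant `k=60` is the value commonly used for RRF, assumed here rather than taken from the library.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. Higher total score means a better fused rank.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["a", "b", "c"]   # e.g. ranked by BM25
semantic_hits = ["c", "a", "d"]  # e.g. ranked by vector distance
fused = rrf_fuse([keyword_hits, semantic_hits])
print(fused)  # ['a', 'c', 'b', 'd']
```

Documents that appear in both lists ("a" and "c") rise above documents found by only one method, which is why fusion tends to be a sensible default.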

Async API

All core operations have async wrappers that run blocking SQLite/ChromaDB work in a worker thread:

async def remember_and_search(db):
    await db.acreate_table("messages", {"content": LONGTEXT})
    await db.ainsert("messages", {"content": "async-safe memory"})
    return await db.asearch("messages", "content", "memory")
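The worker-thread pattern mentioned above can be sketched with the standard library alone. This is an illustration of the general approach, assuming `asyncio.to_thread` (Python 3.9+); the function names are hypothetical and this is not HybridDB's wrapper code.

```python
import asyncio
import os
import sqlite3
import tempfile


def count_rows_blocking(db_path):
    # Open the connection inside the worker thread: sqlite3 connections
    # are bound to the thread that created them by default.
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    finally:
        conn.close()


async def acount_rows(db_path):
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(count_rows_blocking, db_path)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE messages (content TEXT)")
        conn.execute("INSERT INTO messages VALUES ('async-safe memory')")
        conn.commit()
        conn.close()
        return asyncio.run(acount_rows(path))


row_count = demo()
```

The design trade-off is that the work is still blocking; it simply happens off the event loop, so other coroutines can make progress while SQLite runs.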

Public Cursor

For small custom SQL reads or migrations, use the public cursor context manager:

with db.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM messages")
    count = cur.fetchone()[0]

Namespaced Advanced APIs

Graph and OLAP helpers remain available on HybridDB, with namespaced facades for discovery:

node_id = db.graph.add_node("Alice", type="person")
rows = db.olap.query("SELECT COUNT(*) AS total FROM messages")

Recency Scoring

Boost recent content over older content:

results = db.search(
    "messages", "content", "project update",
    recency_weight=0.3,        # 30% weight to recency
    recency_column="timestamp"
)
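One plausible way to read `recency_weight` is as a linear blend between a relevance score and a time-decay score. The source does not state the formula, so everything in this sketch is an assumption: the exponential half-life decay, the one-week half-life, and the helper names are all hypothetical.

```python
ONE_WEEK = 7 * 24 * 3600  # assumed half-life, in seconds


def recency_score(age_seconds, half_life_seconds=ONE_WEEK):
    """Exponential decay: 1.0 for brand-new content, 0.5 after one half-life."""
    return 0.5 ** (age_seconds / half_life_seconds)


def blend(relevance, age_seconds, recency_weight=0.3):
    """Weighted mix of relevance (0..1) and recency (0..1)."""
    return (1.0 - recency_weight) * relevance + recency_weight * recency_score(age_seconds)


fresh = blend(relevance=0.8, age_seconds=0)             # 0.7 * 0.8 + 0.3 * 1.0
stale = blend(relevance=0.9, age_seconds=4 * ONE_WEEK)  # 0.7 * 0.9 + 0.3 * 0.0625
```

Under these assumptions the fresh message outranks the slightly more relevant four-week-old one, which is the behavior a recency boost is meant to produce.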

Self-Healing Journal

All ChromaDB mutations (adds, updates, deletes) are journaled in SQLite. On insert with sync=True (the default), the journal is processed immediately after the write. With sync=False, journal entries are deferred until you call process_journal():

# Batch insert — defer ChromaDB sync for speed
db.insert_batch("contacts", big_list_of_rows, sync=False)
db.process_journal()  # Sync everything at once

If your process crashes mid-write, the journal replays pending entries on next startup. No ghosts, no drift.
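The journal idea can be sketched as a write-ahead table in SQLite that is replayed against the vector store. This is a toy model under stated assumptions, not HybridDB's schema: the `journal` table layout, the `replay_journal` helper, and the dict standing in for ChromaDB are all hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute(
    "CREATE TABLE journal ("
    "seq INTEGER PRIMARY KEY AUTOINCREMENT, op TEXT, doc_id INTEGER, "
    "payload TEXT, done INTEGER DEFAULT 0)"
)

vector_index = {}  # stand-in for the vector store


def insert(body):
    # One transaction: the row and its journal entry commit together,
    # so a crash can never leave a row without a pending journal entry.
    with conn:
        cur = conn.execute("INSERT INTO docs (body) VALUES (?)", (body,))
        conn.execute(
            "INSERT INTO journal (op, doc_id, payload) VALUES ('add', ?, ?)",
            (cur.lastrowid, json.dumps({"body": body})),
        )
    return cur.lastrowid


def replay_journal():
    """Apply every pending journal entry to the vector index, oldest first."""
    pending = conn.execute(
        "SELECT seq, op, doc_id, payload FROM journal WHERE done = 0 ORDER BY seq"
    ).fetchall()
    for seq, op, doc_id, payload in pending:
        if op == "add":
            vector_index[doc_id] = json.loads(payload)["body"]
        elif op == "delete":
            vector_index.pop(doc_id, None)
        with conn:  # mark done only after the vector store was updated
            conn.execute("UPDATE journal SET done = 1 WHERE seq = ?", (seq,))
    return len(pending)


insert("first")
insert("second")
# Simulated crash: the vector index was never updated. Replay on "startup":
replayed = replay_journal()
```

Because each operation here is idempotent (setting or removing a key), replaying an entry twice after a crash between the two steps is harmless in this model.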

Health & Maintenance

# Check if SQLite and ChromaDB are in sync
health = db.health("contacts")
# {"sqlite_rows": 5000, "chroma_docs": {"contacts_bio": 5000}, "status": "ok"}

# Reconcile: delete ghosts, add missing docs
result = db.reconcile("contacts")
# {"ghosts_deleted": 0, "missing_added": 3, "metadata_updated": 0}
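Conceptually, reconciliation is a set comparison between the ids stored in SQLite and the ids stored in the vector index. The sketch below illustrates that idea only; `reconcile_ids` is a hypothetical helper and its return keys are not the library's.

```python
def reconcile_ids(sqlite_ids, vector_ids):
    """Compare two id collections and report what each side is missing."""
    sqlite_ids, vector_ids = set(sqlite_ids), set(vector_ids)
    return {
        # In the vector index but no longer in SQLite: should be deleted.
        "ghosts": sorted(vector_ids - sqlite_ids),
        # In SQLite but absent from the vector index: should be re-embedded.
        "missing": sorted(sqlite_ids - vector_ids),
    }


plan = reconcile_ids(sqlite_ids=[1, 2, 3, 4], vector_ids=[2, 3, 9])
print(plan)  # {'ghosts': [9], 'missing': [1, 4]}
```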

Custom Embedding Models

By default, HybridDB uses ChromaDB's bundled local MiniLM embedding. Plug in any embedding function if you want a specific model or provider:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
db = HybridDB("./data", embedding_fn=lambda text: model.encode(text).tolist())

Works with any embedding provider — OpenAI, Cohere, Hugging Face, local models.
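For tests or offline experiments, a deterministic stand-in can replace a real model. The sketch below assumes the callable shape shown in the lambda above (one string in, one list of floats out); `toy_embedding` is a hypothetical helper with no semantic quality, useful only for wiring things up.

```python
import hashlib
import math


def toy_embedding(text, dims=8):
    """Hash each token into one of `dims` buckets, then L2-normalize.

    Deterministic and dependency-free, but NOT semantically meaningful.
    """
    vec = [0.0] * dims
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


vector = toy_embedding("Getting started with HybridDB")
```

If the callable shape assumed here matches your version, it could be passed as `embedding_fn=toy_embedding` in place of the sentence-transformers lambda.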

License

MIT — see LICENSE.

Author

Eddy Xu

Inspired by claude-mem by Matt Mack.

Status

Alpha — actively developed, API may evolve. Core CRUD and search are stable with full test coverage (35+ tests). Currently used in production in the Executive Assistant agent system.
