
Incremental hybrid retrieval (lexical + vector, RRF-fused) with identical semantics on server Postgres and on-device SQLite


faro-embedded-search

Incremental hybrid retrieval with identical semantics on the server and on-device.

faro-embedded-search is a small, dependency-light library for searching a continuously growing collection of heterogeneous objects — notes, contacts, emails, tasks, tools, anything — without ever rebuilding an index. It is built for the pattern modern assistant apps need:

Index server-side, retrieve on-device. Embeddings are computed once, centrally. Each user's slice of the index is replicated into a local SQLite shard, and the device answers queries locally — fast, offline, and private — with exactly the same ranking the server would produce.

Built and dogfooded by Faro; also the embedded retrieval engine of Scope.

Why another retrieval library?

Most RAG tooling assumes a batch world: ingest a corpus, build a tree/graph/index, query it. Real applications have object-level CRUD — a contact edited, a task created, an email deleted — and small/on-device context windows that punish irrelevant results. faro-embedded-search makes two opinionated choices:

  1. Incremental-first. One upsert per object write. Postgres (pgvector HNSW + tsvector GIN) and SQLite (FTS5 + vector blobs) both accept per-row inserts natively, so nothing in the design ever needs a rebuild. Hierarchical enrichment (summary and cluster nodes) lives in the same flat pool as extra rows and is never required on the write path.
  2. Rank-based fusion. Lexical and semantic retrievers run in parallel and are fused with Reciprocal Rank Fusion (RRF, K=60). Because RRF consumes ranks, not raw scores, the incomparable scoring scales of ts_rank_cd (Postgres) and bm25() (SQLite FTS5) don't matter: the same corpus and query rank identically on both backends. That is what makes "same engine, server and device" honest rather than aspirational.
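The fusion step fits in a few lines of plain Python. This is a minimal illustration of standard RRF, not the library's code. `rrf_fuse` is a hypothetical helper, and it assumes each retriever hands back object IDs ordered best first. Each list contributes 1 / (K + rank) per ID, so only positions enter the sum. The raw `ts_rank_cd`, `bm25()`, or cosine scores never do.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

K = 60  # the RRF constant named above


def rrf_fuse(*rankings: List[str], k: int = K) -> List[Tuple[str, float]]:
    """Fuse best-first lists of object IDs with Reciprocal Rank Fusion.

    Only each ID's position in each list is used, so the scoring scale
    behind a list (ts_rank_cd, bm25(), cosine) has no effect.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


lexical = ["n1", "n7", "n3"]   # e.g. FTS5 bm25() or ts_rank_cd order, best first
semantic = ["n3", "n1", "n9"]  # e.g. cosine-similarity order, best first
fused = rrf_fuse(lexical, semantic)
print([doc_id for doc_id, _ in fused])  # ['n1', 'n3', 'n7', 'n9']
```

IDs that both retrievers return (n1, n3) rise above IDs that only one retriever found (n7, n9). The tie-break by ID is an assumption added here to keep the output stable.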

Quick start

import os

from faro_embedded_search import IndexDoc, SearchIndex, OpenAICompatibleEmbedder
from faro_embedded_search.backends.sqlite import SQLiteBackend

# The OpenAI-compatible embedder needs faro-embedded-search[http].
api_key = os.environ["OPENAI_API_KEY"]

index = SearchIndex(
    SQLiteBackend("search.db"),
    OpenAICompatibleEmbedder("https://api.openai.com/v1", api_key, "text-embedding-3-small"),
)

# upsert() and search() are coroutines: run them inside an async function
# (e.g. via asyncio.run) or an async-aware REPL/notebook.
await index.upsert(IndexDoc(
    object_type="note", object_id="n1",
    title="Quantum entanglement notes",
    body="spooky action at a distance",
    partition="user-42",                      # isolation + shard key
))

results = await index.search("entanglement", partition="user-42", k=5)
# SearchResult(object_id='n1', match_type='hybrid', score=..., ...)

Updates and deletes are first-class. An update is just another upsert of the same object; a delete writes a tombstone, so it propagates through shard sync:

await index.delete("note", "n1")
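Tombstones matter for sync because a delta query of the form "rows changed since cursor" can only see rows that still exist. A physically deleted row would silently survive in every replicated shard. The sketch below illustrates the idea with the standard-library `sqlite3` module. The `docs` table, its `seq` change counter, the `partition_key` column, and the `write`/`delta_sync` helpers are all hypothetical and do not describe the library's real shard schema or its `replicate()` implementation.

```python
import sqlite3

# Hypothetical schema. Every write bumps a table-wide sequence number (`seq`);
# a delete keeps the row, clears its body, and sets deleted = 1 (a tombstone).
SCHEMA = """
CREATE TABLE IF NOT EXISTS docs (
    partition_key TEXT NOT NULL,
    object_type   TEXT NOT NULL,
    object_id     TEXT NOT NULL,
    body          TEXT,
    deleted       INTEGER NOT NULL DEFAULT 0,
    seq           INTEGER NOT NULL,
    PRIMARY KEY (partition_key, object_type, object_id)
)
"""


def write(db, partition_key, object_type, object_id, body=None, deleted=False):
    """Upsert one row (or its tombstone) with a fresh sequence number."""
    (seq,) = db.execute("SELECT COALESCE(MAX(seq), 0) + 1 FROM docs").fetchone()
    # UPSERT syntax needs SQLite 3.24 or newer.
    db.execute(
        "INSERT INTO docs VALUES (?, ?, ?, ?, ?, ?) "
        "ON CONFLICT (partition_key, object_type, object_id) DO UPDATE SET "
        "body = excluded.body, deleted = excluded.deleted, seq = excluded.seq",
        (partition_key, object_type, object_id, body, int(deleted), seq),
    )
    db.commit()


def delta_sync(server, shard, partition_key, cursor=0):
    """Copy rows of one partition written after `cursor`, tombstones included.

    Returns the new cursor: the highest `seq` copied, or the old cursor if
    nothing changed.
    """
    rows = server.execute(
        "SELECT * FROM docs WHERE partition_key = ? AND seq > ? ORDER BY seq",
        (partition_key, cursor),
    ).fetchall()
    shard.executemany("INSERT OR REPLACE INTO docs VALUES (?, ?, ?, ?, ?, ?)", rows)
    shard.commit()
    return rows[-1][5] if rows else cursor


server, shard = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for db in (server, shard):
    db.execute(SCHEMA)

write(server, "user-42", "note", "n1", "spooky action at a distance")
cursor = delta_sync(server, shard, "user-42")          # initial copy
write(server, "user-42", "note", "n1", deleted=True)   # delete becomes a tombstone
cursor = delta_sync(server, shard, "user-42", cursor)  # the delete propagates
# Queries against the shard would then filter with `WHERE deleted = 0`.
```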

Server-side: Postgres

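from faro_embedded_search.backends.postgres import PostgresBackend  # pip install faro-embedded-search[postgres]

# dim must match the embedder's output size (text-embedding-3-small produces 1536-dimensional vectors).
backend = PostgresBackend("postgresql+asyncpg://...", table="faro_embedded_search_index", dim=1536)
await backend.create_schema()     # idempotent; or transcribe the DDL into your migrations
index = SearchIndex(backend, embedder)  # embedder: e.g. the OpenAICompatibleEmbedder from the quick start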

On-device: replicate a shard, query locally

from faro_embedded_search import export_shard, replicate

# Full export of one user's partition into a SQLite file:
shard = await export_shard(server_backend, "user-42.db", partition="user-42")

# Later, incremental delta sync (inserts, updates, AND deletes). Pass the cursor
# from the previous sync and keep the returned one for the next call:
cursor = await replicate(server_backend, shard, partition="user-42", cursor=cursor)

The shard is a plain SQLite file whose schema is the interchange format — any runtime that can read SQLite (a future Swift/Kotlin reader, for instance) can retrieve against it. Embeddings travel with the shard; the device never needs to re-embed the corpus. Query embedding on-device can use a local model, a cached vector, or one tiny server round-trip for the query alone.
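The README does not document how vectors are laid out inside the shard, so the following is an assumption for illustration. If embeddings are stored as packed little-endian float32 blobs, then any runtime that can read a SQLite BLOB and decode IEEE 754 floats gets the identical vector back, and nothing has to be re-embedded. The table name `vecs` and the helpers `to_blob`/`from_blob` are hypothetical.

```python
import sqlite3
import struct
from typing import List


def to_blob(vector: List[float]) -> bytes:
    # Explicit little-endian float32 ("<f"), independent of the writer's CPU.
    return struct.pack(f"<{len(vector)}f", *vector)


def from_blob(blob: bytes) -> List[float]:
    # 4 bytes per float32 component.
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


shard = sqlite3.connect(":memory:")
shard.execute("CREATE TABLE vecs (object_id TEXT PRIMARY KEY, embedding BLOB)")
shard.execute("INSERT INTO vecs VALUES (?, ?)", ("n1", to_blob([0.5, -1.0, 0.25])))
(blob,) = shard.execute("SELECT embedding FROM vecs WHERE object_id = 'n1'").fetchone()
print(from_blob(blob))  # [0.5, -1.0, 0.25] (these values are exact in float32)
```

Fixing the byte order explicitly matters for an interchange format: the native byte order of whichever machine wrote the file must not leak into the bytes a Swift or Kotlin reader later decodes.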

Tiering without batch trees

RAPTOR-style hierarchies help a lot when the context window is small, but a recursive tree can't be rebuilt on every insert. faro-embedded-search keeps the index flat and gets the benefit through node kinds:

  • leaf — the object itself (default).
  • summary — an optional one-line abstract of an object, indexed alongside it. O(1) per object, generated on your write path or a background pass.
  • cluster — optional theme summaries produced by a periodic background sweep over existing embeddings.

All kinds live in the same pool and are retrieved by the same top-k query ("collapsed" retrieval). By default, multiple hits on one object collapse into its best-scoring row, and matched_node_kinds tells you which node kinds matched:

await index.upsert_many([
    IndexDoc(object_type="note", object_id="n9", title="Meeting notes", body=transcript),
    IndexDoc(object_type="note", object_id="n9", node_kind="summary",
             title="Summary: hiring sync", body="decided to open two roles"),
])
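The collapse step can be pictured as a group-by over fused hits. This is a minimal sketch, assuming each hit carries its object's type and ID plus a node kind and a fused score. `Hit`, `Collapsed`, and `collapse` are hypothetical names; only `matched_node_kinds` comes from the text above. Each object keeps its single best row, and the kinds that matched are recorded in first-seen order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Hit:
    object_type: str
    object_id: str
    node_kind: str  # "leaf", "summary", or "cluster"
    score: float    # fused (e.g. RRF) score


@dataclass
class Collapsed:
    best: Hit
    matched_node_kinds: List[str] = field(default_factory=list)


def collapse(hits: List[Hit]) -> List[Collapsed]:
    """Keep each object's best-scoring row and record which node kinds matched."""
    by_object: Dict[Tuple[str, str], Collapsed] = {}
    for hit in hits:
        key = (hit.object_type, hit.object_id)
        group = by_object.get(key)
        if group is None:
            by_object[key] = Collapsed(best=hit, matched_node_kinds=[hit.node_kind])
            continue
        if hit.node_kind not in group.matched_node_kinds:
            group.matched_node_kinds.append(hit.node_kind)
        if hit.score > group.best.score:
            group.best = hit
    # Re-rank the collapsed objects by their best row.
    return sorted(by_object.values(), key=lambda c: c.best.score, reverse=True)


hits = [
    Hit("note", "n9", "leaf", 0.020),
    Hit("note", "n9", "summary", 0.030),  # the summary row matched better
    Hit("note", "n1", "leaf", 0.025),
]
results = collapse(hits)
# results[0].best.node_kind == "summary"; results[0].matched_node_kinds == ["leaf", "summary"]
```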

Heterogeneous objects

Per-type behavior lives in code, not schema, via a tiny registry:

from faro_embedded_search import register, docs_for

@register("contact")
def index_contact(c) -> IndexDoc:
    return IndexDoc(object_type="contact", object_id=str(c.id),
                    title=c.name, body=f"{c.role} at {c.company}",
                    payload={"avatar": c.avatar_url})

await index.upsert_many(docs_for("contact", some_contact))

payload is carried into results (and shards), so result lists render without joining back to your application database.

Design notes

  • Embedding failure is non-fatal. A row written without a vector still serves lexical queries and gains semantic retrieval after a backfill — availability over completeness.
  • Exact semantic scan on SQLite. Per-user shards are small (tens of thousands of rows); an exact cosine scan (numpy-accelerated when present) costs no index maintenance and returns exact results. ANN acceleration (e.g. sqlite-vec) can be added without changing the file format.
  • Diversity, not padding. An optional per-group cap (diversity_key) drops near-duplicate siblings beyond the cap outright instead of deferring them to later positions in the results.
  • No framework. Plain SQL on both backends, zero required dependencies in the core.
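Three of these notes (non-fatal embedding failure, the exact scan, and the diversity cap) meet in the semantic retriever. The sketch below shows one way they could fit together in pure Python. `exact_top_k`, the `(row_id, group, vector)` row shape, and the `max_per_group` parameter are assumptions, with `group` standing in for whatever a `diversity_key` would compute. The README notes numpy acceleration when available; a plain loop keeps this self-contained.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# (row_id, group, vector); vector is None when embedding failed and no backfill ran yet.
Row = Tuple[str, str, Optional[Sequence[float]]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query: Sequence[float], rows: List[Row], k: int,
                max_per_group: Optional[int] = None) -> List[Tuple[str, float]]:
    """Score every row exactly (no ANN index), best first, with an optional group cap."""
    scored = sorted(
        ((cosine(query, vec), row_id, group)
         for row_id, group, vec in rows if vec is not None),  # vectorless rows: lexical only
        key=lambda t: (-t[0], t[1]),
    )
    taken: List[Tuple[str, float]] = []
    per_group: Dict[str, int] = {}
    for score, row_id, group in scored:
        if len(taken) >= k:
            break
        if max_per_group is not None and per_group.get(group, 0) >= max_per_group:
            continue  # over the cap: dropped outright, not moved further down
        per_group[group] = per_group.get(group, 0) + 1
        taken.append((row_id, score))
    return taken


rows: List[Row] = [
    ("a1", "thread-a", [1.0, 0.0]),
    ("a2", "thread-a", [0.9, 0.1]),  # near-duplicate sibling of a1
    ("b1", "thread-b", [0.6, 0.8]),
    ("c1", "thread-c", None),        # embedding failed; still reachable lexically
]
print([rid for rid, _ in exact_top_k([1.0, 0.0], rows, k=3)])                   # ['a1', 'a2', 'b1']
print([rid for rid, _ in exact_top_k([1.0, 0.0], rows, k=3, max_per_group=1)])  # ['a1', 'b1']
```

Because capped rows are dropped rather than deferred, a capped query can return fewer than k results, as the second call does.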

Installation

pip install faro-embedded-search                  # core (SQLite backend, stdlib only)
pip install "faro-embedded-search[postgres]"      # + Postgres/pgvector backend
pip install "faro-embedded-search[http]"          # + OpenAI-compatible embedder
pip install "faro-embedded-search[numpy]"         # + fast cosine on SQLite

License

MIT



Download files


Source Distribution

faro_embedded_search-0.1.0.tar.gz (20.1 kB)

Uploaded Source

Built Distribution


faro_embedded_search-0.1.0-py3-none-any.whl (22.1 kB)

Uploaded Python 3

File details

Details for the file faro_embedded_search-0.1.0.tar.gz.

File metadata

  • Download URL: faro_embedded_search-0.1.0.tar.gz
  • Size: 20.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for faro_embedded_search-0.1.0.tar.gz
Algorithm Hash digest
SHA256 4e8eef9cc8db978178e25e1ef2cfa1ce7918c633e04b20ac3c8dabc89bd2b51d
MD5 d261e3ddcb2c65572065b84215b40bbb
BLAKE2b-256 f6a7182c59d0842fbb8d69073a0ef1de92f18f27e61141609c876469b5604460


File details

Details for the file faro_embedded_search-0.1.0-py3-none-any.whl.


File hashes

Hashes for faro_embedded_search-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 dcb8188b9ea57b040f340423abc8dbd147eebfd0d41406a965ba01d02789985a
MD5 223f4128c2a07e9952b66a3d4ad70406
BLAKE2b-256 a89f611e8af65c3bc151c9f30b6570ee0d9133b17adb1b65fc9ab0b3e34c53b4

