chromaroute

Requires Python 3.11+. License: MIT.

Provider-agnostic embedding functions for ChromaDB with automatic fallback support.

Features

  • ChromaDB-native interface: Drop-in EmbeddingFunction implementations
  • Provider fallback chain: OpenRouter → Local (SentenceTransformers)
  • OpenRouter integration: Full support for OpenRouter's embedding API with provider routing
  • Production-ready: Comprehensive error handling, configurable timeouts, actionable error messages
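
The fallback chain listed above can be pictured with a small, library-independent sketch. `FallbackEmbedder`, `flaky_remote`, and `tiny_local` are hypothetical names invented for this illustration; this is not chromaroute's actual implementation, and whether the library falls back per call or only when the embedding function is built is not stated here.

```python
from typing import Callable, Sequence

# An embedder maps a batch of texts to one vector per text.
Embedder = Callable[[Sequence[str]], list[list[float]]]


class FallbackEmbedder:
    """Try each provider in order and return the first successful result.

    Hypothetical illustration only; not part of chromaroute.
    """

    def __init__(self, providers: Sequence[tuple[str, Embedder]]) -> None:
        if not providers:
            raise ValueError("at least one provider is required")
        self._providers = list(providers)

    def __call__(self, texts: Sequence[str]) -> list[list[float]]:
        errors: list[str] = []
        for name, embed in self._providers:
            try:
                return embed(texts)
            except Exception as exc:  # a real chain would catch narrower errors
                errors.append(f"{name}: {exc}")
        # Every provider failed: surface all reasons in one message.
        raise RuntimeError("all embedding providers failed: " + "; ".join(errors))


def flaky_remote(texts: Sequence[str]) -> list[list[float]]:
    """Stand-in for a remote provider that is currently unavailable."""
    raise TimeoutError("remote API timed out")


def tiny_local(texts: Sequence[str]) -> list[list[float]]:
    """Stand-in for a local model: a 2-d vector based on text length."""
    return [[float(len(t)), 0.0] for t in texts]


chain = FallbackEmbedder([("remote", flaky_remote), ("local", tiny_local)])
vectors = chain(["hello", "hi"])  # served by the local stand-in
```

Collecting each provider's error before raising keeps the final message actionable when every provider fails.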

Installation

pip install chromaroute

# With local embeddings (SentenceTransformers)
pip install "chromaroute[local]"

Quick Start

from chromaroute import build_embedding_function, load_config

# Load settings explicitly, then build the embedding function
config = load_config()
embed_fn = build_embedding_function(config)

# Or omit the config and rely on environment auto-detection
embed_fn = build_embedding_function()

# Use with ChromaDB
import chromadb
client = chromadb.EphemeralClient()
collection = client.create_collection(
    name="my_collection",
    embedding_function=embed_fn,
)
collection.add(documents=["Hello world"], ids=["doc1"])

Configuration

Set environment variables:

# OpenRouter (primary)
OPENROUTER_API_KEY=sk-or-...
OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small
OPENROUTER_EMBED_PROVIDER_JSON='{"order":["openai","mistral"],"allow_fallbacks":true}'

# Local fallback uses sentence-transformers/all-MiniLM-L6-v2 by default
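
Because OPENROUTER_EMBED_PROVIDER_JSON carries JSON inside an environment variable, it is easy to break with shell quoting. The sketch below shows one way to validate such a value before use. `read_provider_routing` is a hypothetical helper written for this example, not a chromaroute function, and the library's own parsing and error messages may differ.

```python
import json
import os
from typing import Any, Mapping, Optional


def read_provider_routing(
    env: Mapping[str, str] = os.environ,
) -> Optional[dict[str, Any]]:
    """Parse OPENROUTER_EMBED_PROVIDER_JSON; return None when it is unset.

    Hypothetical helper for illustration; not part of chromaroute.
    """
    raw = env.get("OPENROUTER_EMBED_PROVIDER_JSON", "").strip()
    if not raw:
        return None
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"OPENROUTER_EMBED_PROVIDER_JSON is not valid JSON: {exc}"
        ) from exc
    if not isinstance(parsed, dict):
        raise ValueError("OPENROUTER_EMBED_PROVIDER_JSON must be a JSON object")
    return parsed


# Same value as in the configuration example, passed as a plain mapping.
routing = read_provider_routing(
    {"OPENROUTER_EMBED_PROVIDER_JSON": '{"order":["openai","mistral"],"allow_fallbacks":true}'}
)
```

Passing the environment as a mapping keeps the helper easy to test without touching the real process environment.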

Direct OpenRouter Usage

from chromaroute import OpenRouterEmbeddingFunction

embed_fn = OpenRouterEmbeddingFunction(
    model="openai/text-embedding-3-small",
    api_key="sk-or-...",
)

# Call it directly to embed a batch of texts
embeddings = embed_fn(["text to embed"])
# Returns list[list[float]] with one embedding per input text.
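
Since the return value is a plain list of float vectors, it can be compared without any extra dependencies. The following is a minimal sketch of cosine similarity over such vectors; `cosine_similarity` is a helper defined here for illustration, and the `embeddings` list is made-up stand-in data rather than real model output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Stand-in vectors shaped like an embedding function's output.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
score = cosine_similarity(embeddings[0], embeddings[2])
```

In practice ChromaDB computes distances itself at query time; a helper like this is mainly useful for spot checks.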

VectorStore (Optional)

For simplified collection management with automatic batching:

from chromaroute import VectorStore

store = VectorStore("my_docs", persist_path="./chroma_db")
store.add_documents(
    documents=["Hello world", "Goodbye world"],
    metadatas=[{"source_id": "doc_a"}, {"source_id": "doc_b"}],
)

# Standard query result: one list per field, nested per query
result = store.query(
    query_texts="greeting",  # Accepts string or list[str]
    n_results=2,
    include=["documents", "metadatas", "distances"],
)
# Access first (and only) query's rows from nested results
top_docs = result["documents"][0]

# Row-like convenience result
records = store.query_one_records(
    "greeting",
    n_results=2,
    where={"source_id": "doc_a"},  # Filter by metadata
    include=["documents", "metadatas", "distances"],
)

query()/get() pass include, where, and where_document directly to ChromaDB. If include is omitted, ChromaDB defaults are used.
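
ChromaDB-style query results hold one list per field, nested per query, and fields left out of include come back as None. The sketch below shows one way to turn the first query's hits into row-like dictionaries. `rows_for_first_query` is a hypothetical helper and `sample_result` is made-up data; this is not the implementation of query_one_records(), whose record shape may differ.

```python
from typing import Any


def rows_for_first_query(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Bundle the first query's hits into one dict per hit.

    Hypothetical helper for illustration; not part of chromaroute.
    """
    field_names = (
        ("documents", "document"),
        ("metadatas", "metadata"),
        ("distances", "distance"),
    )
    rows: list[dict[str, Any]] = []
    for i, doc_id in enumerate(result["ids"][0]):
        row: dict[str, Any] = {"id": doc_id}
        for key, name in field_names:
            field = result.get(key)
            if field is not None:  # skip fields that were not included
                row[name] = field[0][i]
        rows.append(row)
    return rows


# Made-up result in the nested shape, with "distances" not included.
sample_result = {
    "ids": [["doc1", "doc2"]],
    "documents": [["Hello world", "Goodbye world"]],
    "metadatas": [[{"source_id": "doc_a"}, {"source_id": "doc_b"}]],
    "distances": None,
}
rows = rows_for_first_query(sample_result)
```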

Common Recipes

  1. Provenance-aware retrieval: Store source_id in metadata during ingest and include "metadatas" at query time. Filter by it using where={"source_id": "..."}.
  2. Application-ready rows: Use query_one_records() when you want id/document/distance/metadata bundled into one object per hit.
  3. Stale data removal: Use store.delete(where={"source_id": "..."}) before re-ingesting a previously processed document source to prevent duplicates.
  4. Advanced ChromaDB features: For partial updates or upserts, use the underlying Chroma collection directly via store.collection.
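
Recipes 1 and 3 combine naturally into a delete-then-add helper. The sketch below assumes only the two calls shown in this README, store.delete(where=...) and store.add_documents(documents=..., metadatas=...). `reingest_source` and `InMemoryStore` are hypothetical names; the in-memory class is a stand-in so the example runs without ChromaDB and does not reflect how VectorStore works internally.

```python
from typing import Any, Sequence


def reingest_source(store: Any, source_id: str, documents: Sequence[str]) -> None:
    """Drop every row tagged with source_id, then add the fresh documents."""
    store.delete(where={"source_id": source_id})
    store.add_documents(
        documents=list(documents),
        metadatas=[{"source_id": source_id} for _ in documents],
    )


class InMemoryStore:
    """Hypothetical stand-in exposing the same two methods as the store."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, dict[str, Any]]] = []

    def delete(self, *, where: dict[str, Any]) -> None:
        # Keep only rows whose metadata does not match every key in `where`.
        self.rows = [
            (doc, meta)
            for doc, meta in self.rows
            if not all(meta.get(k) == v for k, v in where.items())
        ]

    def add_documents(
        self, *, documents: Sequence[str], metadatas: Sequence[dict[str, Any]]
    ) -> None:
        self.rows.extend(zip(documents, metadatas))


store = InMemoryStore()
reingest_source(store, "doc_a", ["v1 text"])
reingest_source(store, "doc_b", ["other"])
reingest_source(store, "doc_a", ["v2 text", "v2 more"])  # replaces the v1 row
remaining = [doc for doc, _ in store.rows]
```

Deleting by source_id first means a re-run replaces the old rows rather than accumulating duplicates.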

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| OPENROUTER_API_KEY | (none) | OpenRouter API key (enables OpenRouter provider) |
| OPENROUTER_BASE_URL | https://openrouter.ai/api/v1 | Override OpenRouter base URL (advanced) |
| OPENROUTER_EMBEDDINGS_MODEL | openai/text-embedding-3-small | Model for OpenRouter embeddings |
| OPENROUTER_EMBED_PROVIDER_JSON | (none) | Provider routing config (JSON) |
| LOCAL_EMBEDDINGS_MODEL | sentence-transformers/all-MiniLM-L6-v2 | Model for local embeddings |
| EMBED_PROVIDER | auto | Force provider: auto, openrouter, or local |
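
How EMBED_PROVIDER and OPENROUTER_API_KEY might interact can be sketched as a small resolution function. `resolve_provider` is a hypothetical helper, and the rules it encodes (auto picks OpenRouter when a key is present, otherwise local; forcing openrouter without a key is an error) are assumptions inferred from the variable descriptions, not confirmed chromaroute behavior.

```python
from typing import Mapping

VALID_PROVIDERS = {"auto", "openrouter", "local"}


def resolve_provider(env: Mapping[str, str]) -> str:
    """Return "openrouter" or "local" for the given environment mapping.

    Hypothetical helper; the selection rules are assumptions.
    """
    choice = env.get("EMBED_PROVIDER", "").strip().lower() or "auto"
    if choice not in VALID_PROVIDERS:
        raise ValueError(
            f"EMBED_PROVIDER must be one of {sorted(VALID_PROVIDERS)}, got {choice!r}"
        )
    has_key = bool(env.get("OPENROUTER_API_KEY", "").strip())
    if choice == "auto":
        # Assumed rule: prefer OpenRouter whenever a key is configured.
        return "openrouter" if has_key else "local"
    if choice == "openrouter" and not has_key:
        raise ValueError("EMBED_PROVIDER=openrouter requires OPENROUTER_API_KEY")
    return choice


provider = resolve_provider({"OPENROUTER_API_KEY": "sk-or-..."})
```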

Advanced Usage (Best-Effort)

chromaroute is optimized for OpenRouter, but includes a few intentional escape hatches for custom setups. These are not the primary path and are supported on a best-effort basis. See docs/advanced.md.

License

MIT
