A ChromaDB embeddings plugin for OVOS

ovos-chromadb-embeddings-plugin

ChromaDB-backed EmbeddingsDB vector store plugin for OpenVoiceOS.

Install

pip install ovos-chromadb-embeddings-plugin

What is an EmbeddingsDB?

EmbeddingsDB is the abstract base class for vector stores in ovos-plugin-manager (OPM). Plugins implementing it are discovered automatically by OPM under the entry-point group opm.embeddings. This plugin registers as:

opm.embeddings → ovos-chromadb-embeddings-plugin → ChromaEmbeddingsDB

OVOS subsystems call OVOSPluginFactory.get_plugin("opm.embeddings") to obtain a configured store without coupling to a specific backend, so any EmbeddingsDB plugin (ChromaDB here, or e.g. qdrant) is a drop-in swap.
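Entry-point discovery itself is standard Python packaging machinery. The sketch below lists whatever installed distributions advertise under the opm.embeddings group, using only the standard library's importlib.metadata. It is not OPM's own loader, and the helper name list_embeddings_plugins is hypothetical. It only reports names and targets; nothing is imported until you call load() on an entry point.

```python
from importlib.metadata import entry_points
from typing import Dict

GROUP = "opm.embeddings"  # entry-point group named in this README


def list_embeddings_plugins(group: str = GROUP) -> Dict[str, str]:
    """Map entry-point name -> "module:attr" target for one group.

    Hypothetical helper, not part of ovos-plugin-manager. The plugins are not
    imported here; ep.load() would do that.
    """
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9 return a plain dict of group -> entry points
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    for name, target in list_embeddings_plugins().items():
        print(f"{name} -> {target}")
```

With this plugin installed, the output should include an ovos-chromadb-embeddings-plugin entry.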

Where this fits in OVOS

This plugin is the vector store half of the stack — it stores and searches vectors but does not produce them. Pair it with an embedding producer such as ovos-gguf-embeddings-plugin (text → vectors), or the face / voice embedders.

Concrete consumers that can be backed by this store:

| Consumer | Uses the store for |
|---|---|
| ovos-persona-server | RAG: the OpenAI-compatible Files / Vector-Stores / /search endpoints |
| ovos-memory-plugins | long-term semantic memory for a persona |
| face / voice recognition | nearest-neighbour identity lookup over enrolment vectors |

It is local-first: in persistent mode it runs fully offline on a CPU with no server.
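Because the store only ranks vectors it is handed, the calling code owns the producer. A minimal sketch of that split, assuming only the add_embeddings(key, embedding) and query(embedding, top_k=...) calls documented in this README: index_texts and search accept any object with those two methods, so a ChromaEmbeddingsDB should be able to stand in, although you may need to wrap vectors in np.array as the Quickstart does. The keyword-count embedder and the InMemoryStore class are hypothetical stand-ins so the example runs without ChromaDB or a model; in a real setup, embed would come from something like ovos-gguf-embeddings-plugin.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Embedder = Callable[[str], Vector]


def index_texts(db, embed: Embedder, texts: Sequence[str]) -> None:
    """Producer half: embed each text, then hand the vector to the store."""
    for text in texts:
        db.add_embeddings(text, embed(text))


def search(db, embed: Embedder, question: str, top_k: int = 2) -> List[Tuple[str, float]]:
    """Embed the question with the SAME producer used at index time, then rank."""
    return db.query(embed(question), top_k=top_k)


# --- hypothetical stand-ins so the sketch runs without ChromaDB or a model ---

VOCAB = ["weather", "music", "timer"]


def keyword_embed(text: str) -> Vector:
    """Toy producer: count occurrences of a few fixed keywords."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in VOCAB]


class InMemoryStore:
    """Brute-force store exposing add_embeddings/query with cosine distance."""

    def __init__(self) -> None:
        self._vectors: Dict[str, Vector] = {}

    def add_embeddings(self, key: str, embedding: Sequence[float]) -> None:
        self._vectors[key] = list(embedding)  # upsert: same key overwrites

    @staticmethod
    def _cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        if norm_a == 0.0 or norm_b == 0.0:
            return 1.0  # treat an all-zero vector as unrelated
        dot = sum(x * y for x, y in zip(a, b))
        return 1.0 - dot / (norm_a * norm_b)

    def query(self, embedding: Sequence[float], top_k: int = 5) -> List[Tuple[str, float]]:
        scored = [(key, self._cosine_distance(embedding, vec)) for key, vec in self._vectors.items()]
        scored.sort(key=lambda item: item[1])  # nearest first
        return scored[:top_k]


store = InMemoryStore()
index_texts(store, keyword_embed, ["what is the weather today", "play some music", "set a timer"])
print(search(store, keyword_embed, "will the weather be nice"))
# → [('what is the weather today', 0.0), ('play some music', 1.0)]
```

The important design point is that indexing and querying go through the same embed function; vectors from two different producers are not comparable, even if their dimensions happen to match.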

Quickstart

import tempfile

import numpy as np
from ovos_chromadb_embeddings import ChromaEmbeddingsDB

with tempfile.TemporaryDirectory() as tmp:
    # Persistent (local, on-disk) mode inside a throwaway directory
    db = ChromaEmbeddingsDB(config={"path": tmp})

    # Store a few 4-d vectors
    db.add_embeddings("apple",  np.array([0.9, 0.1, 0.0, 0.0]))
    db.add_embeddings("banana", np.array([0.0, 0.9, 0.1, 0.0]))
    db.add_embeddings("cherry", np.array([0.0, 0.0, 0.9, 0.1]))

    # Nearest-neighbour query
    query = np.array([0.85, 0.15, 0.0, 0.0])
    results = db.query(query, top_k=2)
    # → approx. [("apple", 0.002), ("banana", 0.83)] (cosine distances)
    print(results[0][0])  # "apple"

query returns (id, distance) tuples ordered nearest-first. The score is a distance, not a similarity — lower is closer for the default cosine metric (and for l2). Change the metric with hnsw:space (see Configuration). The query vector must have the same dimensionality as the stored vectors.
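To see why apple wins in the Quickstart, the three metrics can be computed by hand. The sketch below uses the formulas hnswlib documents for its cosine, l2 (squared Euclidean) and ip spaces; it assumes ChromaDB's HNSW index reports the same quantities, and the function names are illustrative, not part of the plugin. For cosine, a distance d converts back to a similarity as 1 - d.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (no square root is taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - dot product; unlike cosine, vector length affects the result."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


query = [0.85, 0.15, 0.0, 0.0]
stored = {
    "apple": [0.9, 0.1, 0.0, 0.0],
    "banana": [0.0, 0.9, 0.1, 0.0],
    "cherry": [0.0, 0.0, 0.9, 0.1],
}

for name, vec in stored.items():
    print(f"{name:7s} cosine={cosine_distance(query, vec):.4f} "
          f"l2={squared_l2(query, vec):.4f} ip={ip_distance(query, vec):.4f}")
```

All three metrics rank apple first for this query, but the absolute values differ a lot, so a distance threshold tuned for one metric does not carry over to another.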

Configuration

Pass a config dict to ChromaEmbeddingsDB(config=...) or set it in your OVOS configuration under the plugin key.

| Key | Type | Default | Description |
|---|---|---|---|
| path | str | "./chromadb_storage" | Local persistence directory (PersistentClient mode). |
| host | str | (unset) | Remote ChromaDB server host. When set, the plugin uses HttpClient instead of PersistentClient. |
| port | int | 8000 | Port for the remote ChromaDB server (HttpClient mode only). |
| default_collection_name | str | "embeddings" | Name of the collection created/used on init. |
| hnsw:space | str | "cosine" | Distance metric for the HNSW index. Accepted: "cosine", "l2", "ip". Set via collection metadata. |

Local (persistent) mode

db = ChromaEmbeddingsDB(config={"path": "/var/lib/ovos/chromadb"})

Remote server mode

db = ChromaEmbeddingsDB(config={"host": "192.168.1.10", "port": 8000})
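The table implies a simple rule: a host switches the plugin to HttpClient, otherwise it persists to path. A minimal sketch of that rule, assuming unspecified keys fall back to the defaults listed above; resolve_settings and open_collection are hypothetical helpers, not the plugin's code. open_collection uses chromadb's PersistentClient(path=...), HttpClient(host=..., port=...) and get_or_create_collection(name=..., metadata=...) calls and imports chromadb lazily, so resolve_settings can be exercised without ChromaDB installed.

```python
from typing import Any, Dict, Optional, Tuple

# Defaults copied from the Configuration table above.
DEFAULTS: Dict[str, Any] = {
    "path": "./chromadb_storage",
    "port": 8000,
    "default_collection_name": "embeddings",
    "hnsw:space": "cosine",
}
ACCEPTED_SPACES = {"cosine", "l2", "ip"}


def resolve_settings(
    config: Optional[Dict[str, Any]] = None,
) -> Tuple[str, Dict[str, Any], str, Dict[str, str]]:
    """Return (mode, client_kwargs, collection_name, collection_metadata)."""
    cfg = {**DEFAULTS, **(config or {})}
    space = cfg["hnsw:space"]
    if space not in ACCEPTED_SPACES:
        # Failing early is this sketch's choice; the plugin may behave differently.
        raise ValueError(f"unsupported hnsw:space {space!r}")
    if cfg.get("host"):
        mode, kwargs = "http", {"host": cfg["host"], "port": int(cfg["port"])}
    else:
        mode, kwargs = "persistent", {"path": cfg["path"]}
    return mode, kwargs, cfg["default_collection_name"], {"hnsw:space": space}


def open_collection(config: Optional[Dict[str, Any]] = None):
    """Create the ChromaDB client and get (or create) the default collection."""
    import chromadb  # imported lazily; only needed when actually connecting

    mode, kwargs, name, metadata = resolve_settings(config)
    if mode == "http":
        client = chromadb.HttpClient(**kwargs)
    else:
        client = chromadb.PersistentClient(**kwargs)
    return client.get_or_create_collection(name=name, metadata=metadata)
```

For example, resolve_settings({"host": "192.168.1.10"}) selects HTTP mode on port 8000, while an empty config persists to ./chromadb_storage.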

API overview

| Method | Description |
|---|---|
| add_embeddings(key, embedding, metadata, collection_name) | Upsert a single vector. |
| add_embeddings_batch(keys, embeddings, metadata, collection_name) | Upsert a list of vectors. |
| get_embeddings(key, collection_name, return_metadata) | Retrieve a vector by key. |
| get_embeddings_batch(keys, collection_name, return_metadata) | Retrieve multiple vectors. |
| delete_embeddings(key, collection_name) | Delete a vector by key. |
| delete_embeddings_batch(keys, collection_name) | Delete multiple vectors. |
| query(embedding, top_k, return_metadata, collection_name) | ANN search; returns [(id, distance)]. |
| create_collection(name, metadata) | Create (or get) a named collection. |
| get_collection(name) | Retrieve a collection handle (raises ValueError if absent). |
| delete_collection(name) | Drop a collection. |
| list_collections() | List all collections. |
| count_embeddings_in_collection(collection_name) | Count stored vectors. |
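For the face / voice recognition use case, query results usually need one more step: deciding whether the nearest enrolment vector is close enough to count as a match at all. A minimal sketch, assuming results in the [(id, distance)] shape that query returns; identify and the 0.25 threshold are hypothetical, and a usable threshold depends on the embedder and the metric, so it has to be tuned on real enrolment data.

```python
from typing import Optional, Sequence, Tuple

Result = Tuple[str, float]  # (id, distance), as returned by query()


def identify(results: Sequence[Result], max_distance: float) -> Optional[str]:
    """Return the nearest enrolled id, or None if nothing is close enough.

    Works for any metric where a lower distance means closer. Uses min()
    rather than results[0] so it does not rely on the input ordering.
    """
    if not results:
        return None
    best_id, best_distance = min(results, key=lambda r: r[1])
    return best_id if best_distance <= max_distance else None


THRESHOLD = 0.25  # hypothetical; tune per embedder and metric

print(identify([("alice", 0.12), ("bob", 0.41)], THRESHOLD))  # alice
print(identify([("alice", 0.38), ("bob", 0.41)], THRESHOLD))  # None (unknown face/voice)
```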

Testing

pip install -e ".[test]"
pytest test/ -v

The test suite uses a temporary PersistentClient with no network access. test/test_e2e.py runs a real end-to-end flow (add → query → verify nearest neighbour) using a small deterministic local embedder so it passes in CI without model downloads.
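A deterministic embedder of that kind can be as small as a hashing-trick bag of words. The sketch below is hypothetical and not necessarily what test/test_e2e.py uses. It relies on hashlib rather than the built-in hash(), because Python randomises str hashes per process (see PYTHONHASHSEED), which would give different vectors on every run.

```python
import hashlib
import math
import re
from typing import List


def hash_embed(text: str, dim: int = 64) -> List[float]:
    """Map text to a fixed-size, L2-normalised vector without any model.

    Each lowercase word token is hashed into one of `dim` buckets with a
    +1/-1 sign, so identical text always yields the identical vector.
    """
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "little") % dim
        sign = 1.0 if digest[4] & 1 else -1.0
        vec[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return vec  # empty text (or full sign cancellation) stays all-zero
    return [x / norm for x in vec]


v = hash_embed("turn on the lights")
print(len(v), v == hash_embed("turn on the lights"))  # 64 True
```

Texts that share words land in shared buckets and therefore tend to score as nearer neighbours. Different words can collide in the same bucket, which is acceptable for a CI fixture but not for real retrieval.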


Credits

Originally developed by TigreGótico for OpenVoiceOS, sponsored by VisioLab. Modernized under the NGI0 Commons Fund / NLnet.

VisioLab

This work was sponsored by VisioLab, part of Royal Dutch Visio. VisioLab is the test, education, and research center in the field of (innovative) assistive technology for blind and visually impaired people and professionals. We explore (new) technological developments such as Voice, VR and AI and make the knowledge and expertise we gain available to everyone.

NGI0 Commons Fund

This project was funded through the NGI0 Commons Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 101135429.

Download files

Source Distribution

ovos_chromadb_embeddings_plugin-0.3.0a4.tar.gz (15.3 kB)

Built Distribution

ovos_chromadb_embeddings_plugin-0.3.0a4-py3-none-any.whl

File details

Details for the file ovos_chromadb_embeddings_plugin-0.3.0a4.tar.gz.

File hashes

Hashes for ovos_chromadb_embeddings_plugin-0.3.0a4.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2a82e1c2ac300097097ef23f35473b71b635908fd22bec07c780ca30b1b089ea |
| MD5 | e274030bae532f36a6f6edf1dca4cf0b |
| BLAKE2b-256 | 44f945dc82346a09834b2c9e242ee79173784d09bfdee26bd32a31f3167428ab |

File details

Details for the file ovos_chromadb_embeddings_plugin-0.3.0a4-py3-none-any.whl.

File hashes

Hashes for ovos_chromadb_embeddings_plugin-0.3.0a4-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 88ec0579b908ee3c33d59f8b35aad37084099cb547ff81bed297635821fc142d |
| MD5 | 805103837d7a4e5f9b2f5c2419260cbb |
| BLAKE2b-256 | 7a2204cf3eb1b436452b4c32a9b94b41c98d4ce3f402bed73a1d44534690fc65 |
