ChromaDB VectorStore plugin for refinire-rag

refinire-rag-chroma

ChromaDB VectorStore plugin for refinire-rag, providing seamless integration with ChromaDB for vector storage and similarity search.

Features

  • Zero Configuration: Works out of the box with sensible defaults
  • Environment Variable Configuration: Configure via REFINIRE_RAG_CHROMA_* environment variables
  • Full refinire-rag v0.1.1+ Compatibility: Implements the complete VectorStore interface
  • DocumentProcessor Integration: Supports refinire-rag processing pipelines
  • Persistent and In-Memory Storage: Choose between persistent on-disk storage and in-memory storage
  • Multiple Distance Metrics: Support for cosine, L2, and inner product distance
  • Production Ready: Comprehensive error handling, logging, and validation

Quick Start

Zero Configuration Usage

from refinire_rag_chroma import ChromaVectorStore

# Works immediately with default settings
vector_store = ChromaVectorStore()

# Add documents (requires an embedder).
# `your_embedder` and `documents` are placeholders for your own embedder and Document list.
vector_store.set_embedder(your_embedder)
processed_docs = list(vector_store.process(documents))

Environment Variable Configuration

# Set environment variables
export REFINIRE_RAG_CHROMA_COLLECTION_NAME="my_documents"
export REFINIRE_RAG_CHROMA_PERSIST_DIRECTORY="/data/chroma"
export REFINIRE_RAG_CHROMA_DISTANCE_METRIC="cosine"

# Then, in Python:
from refinire_rag_chroma import ChromaVectorStore

# Automatically uses environment variables
vector_store = ChromaVectorStore()

Installation

pip install refinire-rag-chroma

Or with uv:

uv add refinire-rag-chroma

Configuration

Environment Variables

Variable                                     Default                Description
REFINIRE_RAG_CHROMA_COLLECTION_NAME          "refinire_documents"   Collection name
REFINIRE_RAG_CHROMA_PERSIST_DIRECTORY        None                   Storage directory (None = in-memory)
REFINIRE_RAG_CHROMA_DISTANCE_METRIC          "cosine"               Distance metric ("cosine", "l2", "ip")
REFINIRE_RAG_CHROMA_BATCH_SIZE               100                    Batch size for operations
REFINIRE_RAG_CHROMA_MAX_RETRIES              3                      Maximum retry attempts
REFINIRE_RAG_CHROMA_AUTO_CREATE_COLLECTION   "true"                 Auto-create collection
REFINIRE_RAG_CHROMA_AUTO_CLEAR_ON_INIT       "false"                Clear on initialization
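
The variables above are all strings in the environment, so integer and boolean settings have to be converted somewhere. The following is a minimal sketch of how such parsing could look, using the defaults from the table. The helper names `parse_bool` and `load_chroma_settings` are hypothetical and are not part of the plugin's API, and the set of accepted truthy spellings is an assumption.

```python
import os
from typing import Mapping, Optional

PREFIX = "REFINIRE_RAG_CHROMA_"


def parse_bool(value: str) -> bool:
    """Assumed rule: common truthy spellings are True, anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_chroma_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: read the documented variables and apply the table's defaults."""
    env = os.environ if env is None else env

    def get(name: str, default: str) -> str:
        return env.get(PREFIX + name, default)

    return {
        "collection_name": get("COLLECTION_NAME", "refinire_documents"),
        # Unset or empty is treated as None, which the table maps to in-memory storage.
        "persist_directory": env.get(PREFIX + "PERSIST_DIRECTORY") or None,
        "distance_metric": get("DISTANCE_METRIC", "cosine"),
        "batch_size": int(get("BATCH_SIZE", "100")),
        "max_retries": int(get("MAX_RETRIES", "3")),
        "auto_create_collection": parse_bool(get("AUTO_CREATE_COLLECTION", "true")),
        "auto_clear_on_init": parse_bool(get("AUTO_CLEAR_ON_INIT", "false")),
    }


# Passing a plain dict instead of os.environ keeps the example deterministic.
settings = load_chroma_settings({"REFINIRE_RAG_CHROMA_BATCH_SIZE": "200"})
print(settings["batch_size"], settings["persist_directory"])
```

Accepting a mapping argument makes the parsing easy to exercise in tests without touching the real process environment.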

Parameter-based Configuration

from refinire_rag_chroma import ChromaVectorStore

# Override specific settings with parameters
vector_store = ChromaVectorStore(
    collection_name="custom_collection",
    persist_directory="/path/to/storage",
    distance_metric="l2"
)

Usage Examples

Basic Vector Operations

import numpy as np
from refinire_rag_chroma import ChromaVectorStore
from refinire_rag.storage import VectorEntry

vector_store = ChromaVectorStore()

# Add a vector
entry = VectorEntry(
    document_id="doc1",
    content="Sample document",
    embedding=np.array([0.1, 0.2, 0.3, 0.4, 0.5]),
    metadata={"source": "example"}
)
vector_store.add_vector(entry)

# Search similar vectors
query_vector = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
results = vector_store.search_similar(query_vector, limit=10)
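
For intuition about what the "cosine", "l2", and "ip" settings rank by, here is a dependency-free sketch of the three distances. It assumes the definitions given in ChromaDB's documentation (squared L2, one minus the inner product, and one minus cosine similarity); check them against the ChromaDB version you run. The function names are illustrative only and do not come from the plugin.

```python
import math
from typing import Sequence


def l2_squared(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance: sum of squared component differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the dot product; most meaningful for normalized vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus cosine similarity; ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


query = [0.1, 0.2, 0.3, 0.4, 0.5]
other = [0.5, 0.4, 0.3, 0.2, 0.1]

for name, fn in [("l2", l2_squared), ("ip", inner_product_distance), ("cosine", cosine_distance)]:
    print(name, round(fn(query, other), 4))
```

With all three definitions a smaller value means a closer match, and an identical vector gives a distance of zero under "l2" and "cosine".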

Document Processing Pipeline

from refinire_rag_chroma import ChromaVectorStore
from refinire_rag.models.document import Document

# Set up vector store with embedder
vector_store = ChromaVectorStore()
vector_store.set_embedder(your_embedder)

# Process documents
documents = [
    Document(id="1", content="First document", metadata={}),
    Document(id="2", content="Second document", metadata={})
]

# Documents are automatically embedded and stored
processed_docs = list(vector_store.process(documents))

Metadata Filtering

# Search by metadata
results = vector_store.search_by_metadata(
    filters={"source": "wikipedia"},
    limit=50
)

# Count vectors with filters
count = vector_store.count_vectors(filters={"category": "science"})
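
The filter semantics are not spelled out here. The sketch below shows one plausible reading, in which every key in `filters` must be present in an entry's metadata with an equal value. The function `matches_filters` and the sample records are hypothetical and only illustrate that assumption; the plugin may support richer operators.

```python
from typing import Any, Dict, List


def matches_filters(metadata: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """Assumed semantics: all filter keys must exist and compare equal."""
    return all(key in metadata and metadata[key] == value for key, value in filters.items())


# Hypothetical sample data standing in for stored entries.
records: List[Dict[str, Any]] = [
    {"id": "1", "metadata": {"source": "wikipedia", "category": "science"}},
    {"id": "2", "metadata": {"source": "wikipedia", "category": "history"}},
    {"id": "3", "metadata": {"source": "blog", "category": "science"}},
]

wikipedia_ids = [r["id"] for r in records if matches_filters(r["metadata"], {"source": "wikipedia"})]
science_count = sum(matches_filters(r["metadata"], {"category": "science"}) for r in records)
print(wikipedia_ids, science_count)
```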

Docker Usage

FROM python:3.10

ENV REFINIRE_RAG_CHROMA_COLLECTION_NAME=production_docs
ENV REFINIRE_RAG_CHROMA_PERSIST_DIRECTORY=/data/chroma
ENV REFINIRE_RAG_CHROMA_DISTANCE_METRIC=cosine

COPY . /app
WORKDIR /app
RUN pip install refinire-rag-chroma

CMD ["python", "app.py"]

Docker Compose

version: '3.8'
services:
  app:
    image: my-app:latest
    environment:
      - REFINIRE_RAG_CHROMA_COLLECTION_NAME=production_docs
      - REFINIRE_RAG_CHROMA_PERSIST_DIRECTORY=/data/chroma
      - REFINIRE_RAG_CHROMA_BATCH_SIZE=200
    volumes:
      - chroma_data:/data/chroma

volumes:
  chroma_data:

API Reference

ChromaVectorStore

The main vector store implementation that supports both the VectorStore and DocumentProcessor interfaces.

Methods

  • add_vector(entry: VectorEntry) -> str: Add a single vector
  • add_vectors(entries: List[VectorEntry]) -> List[str]: Add multiple vectors
  • get_vector(document_id: str) -> Optional[VectorEntry]: Retrieve a vector
  • update_vector(entry: VectorEntry) -> bool: Update a vector
  • delete_vector(document_id: str) -> bool: Delete a vector
  • search_similar(query_vector: np.ndarray, limit: int, threshold: Optional[float], filters: Optional[Dict]) -> List[VectorSearchResult]: Search similar vectors
  • search_by_metadata(filters: Dict, limit: int) -> List[VectorSearchResult]: Search by metadata only
  • count_vectors(filters: Optional[Dict]) -> int: Count vectors
  • get_stats() -> VectorStoreStats: Get store statistics
  • clear() -> bool: Clear all vectors
  • set_embedder(embedder: Any) -> None: Set embedder for processing
  • process(documents: Iterable[Document], config: Optional[Any]) -> Iterator[Document]: Process documents
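
When many entries are passed to add_vectors, a caller may want to send them in fixed-size chunks. Whether the plugin already does this internally using REFINIRE_RAG_CHROMA_BATCH_SIZE is not stated here, so treat the following as a caller-side sketch under that uncertainty. The `batched` helper is hypothetical and not part of the plugin.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for a list of VectorEntry objects.
entries = list(range(250))
sizes = [len(batch) for batch in batched(entries, 100)]
print(sizes)
```

In real use, each batch would be passed to `vector_store.add_vectors(batch)` in place of the size calculation.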

Configuration

ChromaVectorStore automatically reads configuration from environment variables with sensible defaults. You can override specific settings by passing parameters to the constructor.
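
The precedence described above (constructor parameter first, then environment variable, then default) can be sketched as follows. The function `resolve_setting` is a hypothetical illustration of that order, not the plugin's actual implementation.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[str],
    env_name: str,
    default: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the explicit value if given, else the environment value, else the default."""
    env = os.environ if env is None else env
    if explicit is not None:
        return explicit
    return env.get(env_name, default)


# A plain dict stands in for the process environment.
fake_env = {"REFINIRE_RAG_CHROMA_DISTANCE_METRIC": "ip"}

from_param = resolve_setting("l2", "REFINIRE_RAG_CHROMA_DISTANCE_METRIC", "cosine", fake_env)
from_env = resolve_setting(None, "REFINIRE_RAG_CHROMA_DISTANCE_METRIC", "cosine", fake_env)
from_default = resolve_setting(None, "REFINIRE_RAG_CHROMA_DISTANCE_METRIC", "cosine", {})
print(from_param, from_env, from_default)
```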

Development

Setup

git clone https://github.com/your-repo/refinire-rag-chroma
cd refinire-rag-chroma
uv sync

Testing

uv run pytest

With Coverage

uv run pytest --cov=src

Requirements

  • Python 3.8+
  • refinire-rag >=0.1.1
  • chromadb >=0.4.0
  • numpy

License

MIT License - see LICENSE file for details.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Run tests and ensure they pass
  6. Submit a pull request

Changelog

v0.0.1

  • Initial release
  • Full refinire-rag v0.1.1+ compatibility
  • Environment variable configuration system
  • Zero-configuration deployment support
  • DocumentProcessor integration
  • Comprehensive test suite
