ChromaDB VectorStore plugin for refinire-rag

refinire-rag-chroma

ChromaDB VectorStore plugin for refinire-rag, providing seamless integration with ChromaDB for vector storage and similarity search.

Features

  • Zero Configuration: Works out of the box with sensible defaults
  • Environment Variable Configuration: Configure via REFINIRE_RAG_CHROMA_* environment variables
  • Full refinire-rag v0.1.1+ Compatibility: Implements the complete VectorStore interface
  • DocumentProcessor Integration: Supports refinire-rag processing pipelines
  • Persistent and In-Memory Storage: Choose between persistent disk storage or in-memory
  • Multiple Distance Metrics: Support for cosine, L2, and inner product distance
  • Production Ready: Comprehensive error handling, logging, and validation

Quick Start

Zero Configuration Usage

from refinire_rag_chroma import ChromaVectorStore

# Works immediately with default settings
vector_store = ChromaVectorStore()

# Attach an embedder, then embed and store documents.
# `your_embedder` and `documents` are placeholders for your own objects.
vector_store.set_embedder(your_embedder)
processed_docs = list(vector_store.process(documents))

Environment Variable Configuration

# Set environment variables (shell)
export REFINIRE_RAG_CHROMA_COLLECTION_NAME="my_documents"
export REFINIRE_RAG_CHROMA_PERSIST_DIRECTORY="/data/chroma"
export REFINIRE_RAG_CHROMA_DISTANCE_METRIC="cosine"

# Then, in Python
from refinire_rag_chroma import ChromaVectorStore

# Automatically uses environment variables
vector_store = ChromaVectorStore()

Installation

pip install refinire-rag-chroma

Or with uv:

uv add refinire-rag-chroma

Configuration

Environment Variables

Variable                                    Default               Description
REFINIRE_RAG_CHROMA_COLLECTION_NAME         "refinire_documents"  Collection name
REFINIRE_RAG_CHROMA_PERSIST_DIRECTORY       None                  Storage directory (None = in-memory)
REFINIRE_RAG_CHROMA_DISTANCE_METRIC         "cosine"              Distance metric ("cosine", "l2", "ip")
REFINIRE_RAG_CHROMA_BATCH_SIZE              100                   Batch size for operations
REFINIRE_RAG_CHROMA_MAX_RETRIES             3                     Maximum retry attempts
REFINIRE_RAG_CHROMA_AUTO_CREATE_COLLECTION  "true"                Auto-create collection
REFINIRE_RAG_CHROMA_AUTO_CLEAR_ON_INIT      "false"               Clear on initialization
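
Environment variables are always strings, so the numeric and boolean settings above have to be parsed before use. The sketch below shows one way such parsing could work. It is an illustration only, not the plugin's actual code: `DEFAULTS`, `PREFIX`, `load_settings`, and the accepted boolean spellings are all assumptions.

```python
import os

# Hypothetical defaults mirroring the table above; not the plugin's real internals.
DEFAULTS = {
    "COLLECTION_NAME": "refinire_documents",
    "PERSIST_DIRECTORY": None,
    "DISTANCE_METRIC": "cosine",
    "BATCH_SIZE": 100,
    "MAX_RETRIES": 3,
    "AUTO_CREATE_COLLECTION": True,
    "AUTO_CLEAR_ON_INIT": False,
}

PREFIX = "REFINIRE_RAG_CHROMA_"


def load_settings(environ=None):
    """Return settings parsed from the environment, falling back to DEFAULTS."""
    environ = os.environ if environ is None else environ
    settings = {}
    for key, default in DEFAULTS.items():
        raw = environ.get(PREFIX + key)
        if raw is None:
            settings[key] = default
        elif isinstance(default, bool):
            # bool is checked before int because bool is a subclass of int.
            # The accepted spellings here are an assumption.
            settings[key] = raw.strip().lower() in ("1", "true", "yes", "on")
        elif isinstance(default, int):
            settings[key] = int(raw)
        else:
            settings[key] = raw
    return settings
```

Passing a plain dictionary as `environ` makes the function easy to try without touching the real process environment, for example `load_settings({"REFINIRE_RAG_CHROMA_BATCH_SIZE": "200"})`.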

Parameter-based Configuration

from refinire_rag_chroma import ChromaVectorStore

# Override specific settings with parameters
vector_store = ChromaVectorStore(
    collection_name="custom_collection",
    persist_directory="/path/to/storage",
    distance_metric="l2"
)
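
The `distance_metric` setting chooses between three standard measures. As a rough guide to how they differ, the sketch below computes the textbook definitions in plain Python. The function names are made up for this example, and a vector database may report a variant of each quantity (for instance a squared L2 distance), so treat the exact values as illustrative.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance; sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine of the angle; ignores vector magnitude."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / norm


a = [1.0, 2.0]
b = [2.0, 4.0]  # same direction as a, twice the length

same_direction = cosine_distance(a, b)  # close to 0: the direction is identical
euclidean_gap = l2_distance(a, b)       # greater than 0: the lengths differ
```

Cosine distance is a common choice for text embeddings because it compares direction only, while L2 also penalizes differences in length.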

Usage Examples

Basic Vector Operations

import numpy as np
from refinire_rag_chroma import ChromaVectorStore
from refinire_rag.storage import VectorEntry

vector_store = ChromaVectorStore()

# Add a vector
entry = VectorEntry(
    document_id="doc1",
    content="Sample document",
    embedding=np.array([0.1, 0.2, 0.3, 0.4, 0.5]),
    metadata={"source": "example"}
)
vector_store.add_vector(entry)

# Search similar vectors
query_vector = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
results = vector_store.search_similar(query_vector, limit=10)
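
To make the roles of `limit` and `threshold` concrete, here is a minimal brute-force sketch of a similarity search over an in-memory dictionary. It is not how ChromaDB searches internally, and treating `threshold` as a minimum cosine similarity is an assumption; `search_similar_sketch` and `cosine_similarity` are names invented for this example.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search_similar_sketch(
    store: Dict[str, List[float]],
    query: List[float],
    limit: int = 10,
    threshold: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Score every stored vector, drop low scores, return the best `limit`."""
    scored = [(doc_id, cosine_similarity(vec, query)) for doc_id, vec in store.items()]
    if threshold is not None:
        # Assumed semantics: keep only results at least this similar.
        scored = [item for item in scored if item[1] >= threshold]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


store = {
    "doc1": [0.1, 0.2, 0.3, 0.4, 0.5],
    "doc2": [0.5, 0.4, 0.3, 0.2, 0.1],
    "doc3": [-0.1, -0.2, -0.3, -0.4, -0.5],
}
top_two = search_similar_sketch(store, [0.1, 0.2, 0.3, 0.4, 0.5], limit=2)
```

Here `top_two` ranks `doc1` first, since it is identical to the query, followed by `doc2`. The opposite-pointing `doc3` is cut off by `limit`.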

Document Processing Pipeline

from refinire_rag_chroma import ChromaVectorStore
from refinire_rag.models.document import Document

# Set up vector store with embedder
vector_store = ChromaVectorStore()
vector_store.set_embedder(your_embedder)

# Process documents
documents = [
    Document(id="1", content="First document", metadata={}),
    Document(id="2", content="Second document", metadata={})
]

# Documents are automatically embedded and stored
processed_docs = list(vector_store.process(documents))
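
The `list(...)` wrapper matters if `process` is implemented as a generator, which its `Iterator[Document]` return type suggests but which this README does not confirm. The toy sketch below shows the general pattern: a generator-based `process` stores nothing until its result is consumed. `TinyStore` and `toy_embed` are hypothetical stand-ins, not part of refinire-rag.

```python
from typing import Callable, Dict, Iterable, Iterator, List


class TinyStore:
    """Toy stand-in for a vector store; not ChromaVectorStore."""

    def __init__(self, embed: Callable[[str], List[float]]):
        self.embed = embed
        self.vectors: Dict[str, List[float]] = {}

    def process(self, documents: Iterable[dict]) -> Iterator[dict]:
        # Generator: each document is embedded and stored only when requested.
        for doc in documents:
            self.vectors[doc["id"]] = self.embed(doc["content"])
            yield doc


def toy_embed(text: str) -> List[float]:
    # Hypothetical embedder: character count and word count only.
    return [float(len(text)), float(len(text.split()))]


store = TinyStore(toy_embed)
docs = [
    {"id": "1", "content": "First document"},
    {"id": "2", "content": "Second document"},
]

pending = store.process(docs)        # no work has happened yet
stored_before = len(store.vectors)   # 0
processed = list(pending)            # consuming the iterator does the work
stored_after = len(store.vectors)    # 2
```

If the real implementation behaves the same way, calling `process` without iterating over the result would leave the collection empty.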

Metadata Filtering

# Search by metadata
results = vector_store.search_by_metadata(
    filters={"source": "wikipedia"},
    limit=50
)

# Count vectors with filters
count = vector_store.count_vectors(filters={"category": "science"})
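
The filters above look like exact key/value matches. Assuming that is the intended semantics (the README does not spell it out), the behavior can be sketched in plain Python as follows. All names below are illustrative and operate on plain dictionaries rather than on the real store.

```python
from typing import Any, Dict, List, Optional


def matches(metadata: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """True when every filter key is present with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in filters.items())


def search_by_metadata_sketch(
    entries: List[Dict[str, Any]], filters: Dict[str, Any], limit: int = 100
) -> List[Dict[str, Any]]:
    hits = [entry for entry in entries if matches(entry["metadata"], filters)]
    return hits[:limit]


def count_vectors_sketch(
    entries: List[Dict[str, Any]], filters: Optional[Dict[str, Any]] = None
) -> int:
    if not filters:
        return len(entries)
    return sum(1 for entry in entries if matches(entry["metadata"], filters))


entries = [
    {"id": "a", "metadata": {"source": "wikipedia", "category": "science"}},
    {"id": "b", "metadata": {"source": "wikipedia", "category": "history"}},
    {"id": "c", "metadata": {"source": "blog", "category": "science"}},
]
wikipedia_hits = search_by_metadata_sketch(entries, {"source": "wikipedia"}, limit=50)
science_count = count_vectors_sketch(entries, {"category": "science"})
```

Multiple keys in one filter are combined with a logical AND in this sketch, so `{"source": "wikipedia", "category": "science"}` would match only entry `a`.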

Docker Usage

FROM python:3.10

ENV REFINIRE_RAG_CHROMA_COLLECTION_NAME=production_docs
ENV REFINIRE_RAG_CHROMA_PERSIST_DIRECTORY=/data/chroma
ENV REFINIRE_RAG_CHROMA_DISTANCE_METRIC=cosine

COPY . /app
WORKDIR /app
RUN pip install refinire-rag-chroma

CMD ["python", "app.py"]

Docker Compose

version: '3.8'
services:
  app:
    image: my-app:latest
    environment:
      - REFINIRE_RAG_CHROMA_COLLECTION_NAME=production_docs
      - REFINIRE_RAG_CHROMA_PERSIST_DIRECTORY=/data/chroma
      - REFINIRE_RAG_CHROMA_BATCH_SIZE=200
    volumes:
      - chroma_data:/data/chroma

volumes:
  chroma_data:

API Reference

ChromaVectorStore

The main vector store implementation that supports both the VectorStore and DocumentProcessor interfaces.

Methods

  • add_vector(entry: VectorEntry) -> str: Add a single vector
  • add_vectors(entries: List[VectorEntry]) -> List[str]: Add multiple vectors
  • get_vector(document_id: str) -> Optional[VectorEntry]: Retrieve a vector
  • update_vector(entry: VectorEntry) -> bool: Update a vector
  • delete_vector(document_id: str) -> bool: Delete a vector
  • search_similar(query_vector: np.ndarray, limit: int, threshold: Optional[float], filters: Optional[Dict]) -> List[VectorSearchResult]: Search similar vectors
  • search_by_metadata(filters: Dict, limit: int) -> List[VectorSearchResult]: Search by metadata only
  • count_vectors(filters: Optional[Dict]) -> int: Count vectors
  • get_stats() -> VectorStoreStats: Get store statistics
  • clear() -> bool: Clear all vectors
  • set_embedder(embedder: Any) -> None: Set embedder for processing
  • process(documents: Iterable[Document], config: Optional[Any]) -> Iterator[Document]: Process documents
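
`add_vectors` takes a whole list, and the configuration exposes a batch size and a retry count. How the plugin applies them internally is not documented here, so the following is only a sketch of the general technique: split a list into fixed-size batches and retry a failing operation a bounded number of times. `batched` and `with_retries` are hypothetical helpers.

```python
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def with_retries(operation: Callable[[], T], max_retries: int = 3, delay: float = 0.0) -> T:
    """Call operation once, then retry up to max_retries times on failure."""
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            attempt += 1
            time.sleep(delay)


# 250 items with a batch size of 100 give batches of 100, 100, and 50.
batch_sizes = [len(batch) for batch in batched(list(range(250)), 100)]
```

In a real pipeline each batch would be wrapped in its own retry, so that a transient failure repeats only the affected batch rather than the whole upload.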

Configuration

ChromaVectorStore automatically reads configuration from environment variables with sensible defaults. You can override specific settings by passing parameters to the constructor.
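
The precedence described here (constructor parameter first, then environment variable, then built-in default) can be sketched as a small resolver. This is an illustration of the rule, not the plugin's source. `resolve` and `_UNSET` are invented names.

```python
import os

# Sentinel so that an explicit None can still be told apart from "not passed".
_UNSET = object()


def resolve(name, explicit=_UNSET, default=None, environ=None):
    """Pick a setting: explicit argument, else environment variable, else default."""
    environ = os.environ if environ is None else environ
    if explicit is not _UNSET:
        return explicit
    return environ.get("REFINIRE_RAG_CHROMA_" + name, default)


env = {"REFINIRE_RAG_CHROMA_COLLECTION_NAME": "my_documents"}

from_env = resolve("COLLECTION_NAME", default="refinire_documents", environ=env)
from_param = resolve("COLLECTION_NAME", explicit="custom_collection", environ=env)
from_default = resolve("DISTANCE_METRIC", default="cosine", environ=env)
```

With this rule, `from_env` is `"my_documents"`, `from_param` is `"custom_collection"`, and `from_default` is `"cosine"`.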

Development

Setup

git clone https://github.com/your-repo/refinire-rag-chroma
cd refinire-rag-chroma
uv sync

Testing

uv run pytest

With Coverage

uv run pytest --cov=src

Requirements

  • Python 3.8+
  • refinire-rag >=0.1.1
  • chromadb >=0.4.0
  • numpy

License

MIT License - see LICENSE file for details.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Run tests and ensure they pass
  6. Submit a pull request

Changelog

v0.0.1

  • Initial release
  • Full refinire-rag v0.1.1+ compatibility
  • Environment variable configuration system
  • Zero-configuration deployment support
  • DocumentProcessor integration
  • Comprehensive test suite
