ChromaDB VectorStore plugin for refinire-rag
refinire-rag-chroma
ChromaDB VectorStore plugin for refinire-rag, providing seamless integration with ChromaDB for vector storage and similarity search.
Features
- ✅ Zero Configuration: Works out of the box with sensible defaults
- ✅ Environment Variable Configuration: Configure via REFINIRE_RAG_CHROMA_* environment variables
- ✅ Full refinire-rag v0.1.1+ Compatibility: Implements the complete VectorStore interface
- ✅ DocumentProcessor Integration: Supports refinire-rag processing pipelines
- ✅ Persistent and In-Memory Storage: Choose between persistent disk storage or in-memory
- ✅ Multiple Distance Metrics: Support for cosine, L2, and inner product distance
- ✅ Production Ready: Comprehensive error handling, logging, and validation
Quick Start
Zero Configuration Usage
from refinire_rag_chroma import ChromaVectorStore
# Works immediately with default settings
vector_store = ChromaVectorStore()
# Add documents (requires an embedder; your_embedder and documents are placeholders for your own objects)
vector_store.set_embedder(your_embedder)
processed_docs = list(vector_store.process(documents))
Environment Variable Configuration
# Set environment variables (in the shell)
export REFINIRE_RAG_CHROMA_COLLECTION_NAME="my_documents"
export REFINIRE_RAG_CHROMA_PERSIST_DIRECTORY="/data/chroma"
export REFINIRE_RAG_CHROMA_DISTANCE_METRIC="cosine"
# Then, in Python:
from refinire_rag_chroma import ChromaVectorStore
# The constructor picks up the environment variables automatically
vector_store = ChromaVectorStore()
Installation
pip install refinire-rag-chroma
Or with uv:
uv add refinire-rag-chroma
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| REFINIRE_RAG_CHROMA_COLLECTION_NAME | "refinire_documents" | Collection name |
| REFINIRE_RAG_CHROMA_PERSIST_DIRECTORY | None | Storage directory (None = in-memory) |
| REFINIRE_RAG_CHROMA_DISTANCE_METRIC | "cosine" | Distance metric ("cosine", "l2", "ip") |
| REFINIRE_RAG_CHROMA_BATCH_SIZE | 100 | Batch size for operations |
| REFINIRE_RAG_CHROMA_MAX_RETRIES | 3 | Maximum retry attempts |
| REFINIRE_RAG_CHROMA_AUTO_CREATE_COLLECTION | "true" | Auto-create collection |
| REFINIRE_RAG_CHROMA_AUTO_CLEAR_ON_INIT | "false" | Clear on initialization |
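To make the table concrete, here is a minimal sketch of how REFINIRE_RAG_CHROMA_* variables could be parsed into typed settings with the listed defaults. ChromaSettings, load_settings, and the accepted boolean spellings are hypothetical and chosen for illustration only; the plugin's own parsing may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "REFINIRE_RAG_CHROMA_"


@dataclass
class ChromaSettings:
    """Hypothetical settings holder; defaults mirror the table above."""
    collection_name: str = "refinire_documents"
    persist_directory: Optional[str] = None  # None means in-memory storage
    distance_metric: str = "cosine"
    batch_size: int = 100
    max_retries: int = 3
    auto_create_collection: bool = True
    auto_clear_on_init: bool = False


def _to_bool(raw: str) -> bool:
    # Assumption: only these spellings count as true; everything else is false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> ChromaSettings:
    """Build settings from prefixed variables, keeping defaults for unset ones."""
    settings = ChromaSettings()
    for field_name, default in vars(settings).copy().items():
        raw = env.get(PREFIX + field_name.upper())
        if raw is None:
            continue  # variable not set: keep the default
        if isinstance(default, bool):  # check bool first, since bool is an int subclass
            value = _to_bool(raw)
        elif isinstance(default, int):
            value = int(raw)
        else:
            value = raw
        setattr(settings, field_name, value)
    if settings.distance_metric not in {"cosine", "l2", "ip"}:
        raise ValueError(f"unsupported distance metric: {settings.distance_metric!r}")
    return settings


settings = load_settings({"REFINIRE_RAG_CHROMA_BATCH_SIZE": "200"})
print(settings.batch_size, settings.collection_name)
```

Passing the environment as a mapping keeps the sketch easy to test without mutating the real process environment.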
Parameter-based Configuration
from refinire_rag_chroma import ChromaVectorStore
# Override specific settings with parameters
vector_store = ChromaVectorStore(
    collection_name="custom_collection",
    persist_directory="/path/to/storage",
    distance_metric="l2",
)
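The precedence described here (constructor parameter, then environment variable, then default) can be sketched as a small lookup. resolve_setting is a hypothetical helper written for this illustration and is not part of the plugin.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    explicit: Optional[str],
    default: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the explicit value if given, else the env value, else the default."""
    if explicit is not None:
        return explicit
    return env.get("REFINIRE_RAG_CHROMA_" + name.upper(), default)


env = {
    "REFINIRE_RAG_CHROMA_COLLECTION_NAME": "my_documents",
    "REFINIRE_RAG_CHROMA_DISTANCE_METRIC": "cosine",
}

# An explicit argument wins over the environment.
metric = resolve_setting("distance_metric", "l2", "cosine", env)
# With no argument, the environment value is used.
collection = resolve_setting("collection_name", None, "refinire_documents", env)
# With neither, the default applies (None here, meaning in-memory storage).
directory = resolve_setting("persist_directory", None, None, env)

print(metric, collection, directory)  # l2 my_documents None
```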
Usage Examples
Basic Vector Operations
import numpy as np
from refinire_rag_chroma import ChromaVectorStore
from refinire_rag.storage import VectorEntry
vector_store = ChromaVectorStore()
# Add a vector
entry = VectorEntry(
    document_id="doc1",
    content="Sample document",
    embedding=np.array([0.1, 0.2, 0.3, 0.4, 0.5]),
    metadata={"source": "example"},
)
vector_store.add_vector(entry)
# Search similar vectors
query_vector = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
results = vector_store.search_similar(query_vector, limit=10)
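For intuition about what the three distance metrics measure, the sketch below computes them in plain Python for the example vector. The formulas follow the definitions given in ChromaDB's documentation (squared L2, one minus the inner product, one minus cosine similarity); treat them as an assumption to verify against the installed ChromaDB version. The function names are illustrative and not part of the plugin.

```python
import math
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """"l2": sum of squared component differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """"ip": one minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """"cosine": one minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


query = [0.1, 0.2, 0.3, 0.4, 0.5]
stored = [0.1, 0.2, 0.3, 0.4, 0.5]
scaled = [2 * x for x in stored]

# Identical vectors are at distance zero under l2 and cosine.
print(squared_l2(query, stored), cosine_distance(query, stored))
# Cosine ignores magnitude, so a scaled copy is still (nearly) at distance zero,
# while the inner-product distance changes with vector length.
print(cosine_distance(query, scaled), inner_product_distance(query, scaled))
```

Because the inner-product distance depends on vector length, it is usually paired with normalized embeddings.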
Document Processing Pipeline
from refinire_rag_chroma import ChromaVectorStore
from refinire_rag.models.document import Document
# Set up vector store with embedder
vector_store = ChromaVectorStore()
vector_store.set_embedder(your_embedder)
# Process documents
documents = [
    Document(id="1", content="First document", metadata={}),
    Document(id="2", content="Second document", metadata={}),
]
# Documents are automatically embedded and stored
processed_docs = list(vector_store.process(documents))
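The examples wrap process(...) in list(...). The Iterator return type suggests, though does not prove, that process is lazy, in which case nothing is embedded or stored until the iterator is consumed. The sketch below illustrates that pattern with hypothetical stand-ins (Doc, ToyEmbedder, ToyStore); none of these are refinire-rag classes, and the real embedder interface may look different.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for refinire-rag's Document."""
    id: str
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class ToyEmbedder:
    """Hypothetical embedder that maps text to a tiny deterministic vector."""

    def embed(self, text: str) -> List[float]:
        return [float(len(text)), float(text.count(" "))]


class ToyStore:
    """Hypothetical in-memory store with a lazy process() step."""

    def __init__(self) -> None:
        self._embedder: Optional[ToyEmbedder] = None
        self.vectors: Dict[str, List[float]] = {}

    def set_embedder(self, embedder: ToyEmbedder) -> None:
        self._embedder = embedder

    def process(self, documents: Iterable[Doc]) -> Iterator[Doc]:
        # Generator: the body runs only while the caller iterates.
        if self._embedder is None:
            raise RuntimeError("call set_embedder() before process()")
        for doc in documents:
            self.vectors[doc.id] = self._embedder.embed(doc.content)
            yield doc  # pass the document on to the next pipeline stage


store = ToyStore()
store.set_embedder(ToyEmbedder())
pending = store.process([Doc("1", "First document"), Doc("2", "Second document")])
stored_before = len(store.vectors)  # still 0: the generator has not run yet
processed = list(pending)           # consuming the iterator stores both vectors
stored_after = len(store.vectors)
```

With a lazy implementation, dropping the list(...) call would silently store nothing, which is a good reason to keep it.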
Metadata Filtering
# Search by metadata
results = vector_store.search_by_metadata(
    filters={"source": "wikipedia"},
    limit=50,
)
# Count vectors with filters
count = vector_store.count_vectors(filters={"category": "science"})
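As a mental model for these calls, the sketch below applies exact-match filters to an in-memory list. It assumes that every key in filters must match (AND semantics), which the examples suggest but do not state. filter_entries and count_matching are hypothetical helpers, not plugin methods.

```python
from typing import Any, Dict, List, Optional

Entry = Dict[str, Any]


def matches(metadata: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """True when every filter key is present with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in filters.items())


def filter_entries(entries: List[Entry], filters: Dict[str, Any], limit: int) -> List[Entry]:
    """Return at most `limit` entries whose metadata satisfies the filters."""
    hits = [entry for entry in entries if matches(entry["metadata"], filters)]
    return hits[:limit]


def count_matching(entries: List[Entry], filters: Optional[Dict[str, Any]] = None) -> int:
    """Count entries; with no filters, count everything."""
    if not filters:
        return len(entries)
    return sum(1 for entry in entries if matches(entry["metadata"], filters))


entries = [
    {"document_id": "doc1", "metadata": {"source": "wikipedia", "category": "science"}},
    {"document_id": "doc2", "metadata": {"source": "wikipedia", "category": "history"}},
    {"document_id": "doc3", "metadata": {"source": "example", "category": "science"}},
]

wiki = filter_entries(entries, {"source": "wikipedia"}, limit=50)
science_count = count_matching(entries, {"category": "science"})
print([e["document_id"] for e in wiki], science_count)  # ['doc1', 'doc2'] 2
```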
Docker Usage
FROM python:3.10
ENV REFINIRE_RAG_CHROMA_COLLECTION_NAME=production_docs
ENV REFINIRE_RAG_CHROMA_PERSIST_DIRECTORY=/data/chroma
ENV REFINIRE_RAG_CHROMA_DISTANCE_METRIC=cosine
COPY . /app
WORKDIR /app
RUN pip install refinire-rag-chroma
CMD ["python", "app.py"]
Docker Compose
version: '3.8'
services:
  app:
    image: my-app:latest
    environment:
      - REFINIRE_RAG_CHROMA_COLLECTION_NAME=production_docs
      - REFINIRE_RAG_CHROMA_PERSIST_DIRECTORY=/data/chroma
      - REFINIRE_RAG_CHROMA_BATCH_SIZE=200
    volumes:
      - chroma_data:/data/chroma
volumes:
  chroma_data:
API Reference
ChromaVectorStore
The main vector store implementation that supports both the VectorStore and DocumentProcessor interfaces.
Methods
- add_vector(entry: VectorEntry) -> str: Add a single vector
- add_vectors(entries: List[VectorEntry]) -> List[str]: Add multiple vectors
- get_vector(document_id: str) -> Optional[VectorEntry]: Retrieve a vector
- update_vector(entry: VectorEntry) -> bool: Update a vector
- delete_vector(document_id: str) -> bool: Delete a vector
- search_similar(query_vector: np.ndarray, limit: int, threshold: Optional[float], filters: Optional[Dict]) -> List[VectorSearchResult]: Search similar vectors
- search_by_metadata(filters: Dict, limit: int) -> List[VectorSearchResult]: Search by metadata only
- count_vectors(filters: Optional[Dict]) -> int: Count vectors
- get_stats() -> VectorStoreStats: Get store statistics
- clear() -> bool: Clear all vectors
- set_embedder(embedder: Any) -> None: Set embedder for processing
- process(documents: Iterable[Document], config: Optional[Any]) -> Iterator[Document]: Process documents
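When loading many entries through add_vectors, a caller may want to send them in fixed-size chunks. It is not stated here whether the plugin already splits large lists according to REFINIRE_RAG_CHROMA_BATCH_SIZE, so the helper below is only a caller-side sketch; batched is a hypothetical name, and the default of 100 simply mirrors the documented batch size.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Stand-in data; in real use these would be VectorEntry objects, and each
# chunk would be passed to vector_store.add_vectors(chunk).
entries = list(range(250))
chunk_sizes = [len(chunk) for chunk in batched(entries, batch_size=100)]
print(chunk_sizes)  # [100, 100, 50]
```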
Configuration
ChromaVectorStore automatically reads configuration from environment variables with sensible defaults. You can override specific settings by passing parameters to the constructor.
Development
Setup
git clone https://github.com/your-repo/refinire-rag-chroma
cd refinire-rag-chroma
uv sync
Testing
uv run pytest
With Coverage
uv run pytest --cov=src
Requirements
- Python 3.8+
- refinire-rag >=0.1.1
- chromadb >=0.4.0
- numpy
License
MIT License - see LICENSE file for details.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Run tests and ensure they pass
- Submit a pull request
Changelog
v0.0.1
- Initial release
- Full refinire-rag v0.1.1+ compatibility
- Environment variable configuration system
- Zero-configuration deployment support
- DocumentProcessor integration
- Comprehensive test suite