ChromaDB VectorStore plugin for refinire-rag
refinire-rag-chroma
ChromaDB VectorStore plugin for refinire-rag - A production-ready vector database integration for retrieval-augmented generation systems.
Overview
refinire-rag-chroma provides a seamless ChromaDB integration for the refinire-rag ecosystem. This plugin implements the VectorStore interface, enabling efficient vector storage, similarity search, and metadata filtering using ChromaDB as the backend.
Features
- Complete refinire-rag VectorStore Implementation: Inherits from refinire_rag.VectorStore
- Document Storage & Retrieval: Save and search documents with embeddings
- Multiple Search Types:
  - Similarity search with query embeddings
  - Document-to-document similarity search
  - Metadata-based filtering and search
- Flexible Storage Options: In-memory or persistent storage
- Production Ready: Comprehensive error handling and logging
- Full Test Coverage: Unit tests and integration examples
Installation
Requirements
- Python 3.10+
- refinire-rag >= 0.0.1
- chromadb >= 0.4.0
Install Dependencies
# Clone the repository
git clone https://github.com/your-org/refinire-rag-chroma.git
cd refinire-rag-chroma
# Install with uv (recommended)
uv sync
# Or with pip
pip install -e .
Quick Start
Basic Usage
from refinire_rag import Document, TFIDFEmbedder, TFIDFEmbeddingConfig
from src.chroma_vector_store import ChromaVectorStore
# Initialize embedder
embedding_config = TFIDFEmbeddingConfig(max_features=1000, ngram_range=(1, 2))
embedder = TFIDFEmbedder(config=embedding_config)
# Initialize ChromaDB vector store
vector_store = ChromaVectorStore(
    collection_name="my_documents",
    persist_directory="./chroma_db",  # None for in-memory
    distance_metric="cosine"
)
# Create documents
documents = [
    Document(
        id="doc_001",
        # "Machine learning is a part of artificial intelligence."
        content="機械学習は人工知能の一部です",
        metadata={"category": "ai", "language": "japanese"}
    ),
    Document(
        id="doc_002",
        # "ChromaDB is a high-performance vector database."
        content="ChromaDBは高性能なベクトルデータベースです",
        metadata={"category": "database", "language": "japanese"}
    )
]
# Train embedder and generate embeddings
embedder.fit([doc.content for doc in documents])
embedding_results = embedder.embed_documents(documents)
embeddings = [result.vector.tolist() for result in embedding_results]
# Store documents with embeddings
vector_store.add_documents_with_embeddings(documents, embeddings)
# Search similar documents ("Tell me about AI")
query_result = embedder.embed_text("AIについて教えて")
search_results = vector_store.search_similar(
    query_embedding=query_result.vector.tolist(),
    top_k=5
)
# Print results
for result in search_results:
    print(f"Document: {result.document_id}")
    print(f"Score: {result.score:.4f}")
    print(f"Content: {result.content}")
    print("---")
Advanced Search with Metadata Filtering
# Reuse the query vector computed in Basic Usage
query_embedding = query_result.vector.tolist()
# Search with metadata filter
filtered_results = vector_store.search_similar(
    query_embedding=query_embedding,
    top_k=3,
    metadata_filter={"category": "ai", "language": "japanese"}
)
# Document-to-document similarity
similar_docs = vector_store.search_similar_to_document(
    document_id="doc_001",
    top_k=3
)
# Metadata-only search
metadata_results = vector_store.search_by_metadata(
    metadata_filter={"category": "database"}
)
API Reference
ChromaVectorStore
Constructor
ChromaVectorStore(
    collection_name: str = "refinire_documents",
    persist_directory: Optional[str] = None,
    distance_metric: str = "cosine"
)
Parameters:
- collection_name: Name of the ChromaDB collection
- persist_directory: Directory for persistent storage (None for in-memory)
- distance_metric: Distance metric ("cosine", "l2", "ip")
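To make the three distance_metric values concrete, here is a minimal pure-Python sketch of the formulas they are commonly understood to name in ChromaDB: squared Euclidean distance for "l2", one minus the dot product for "ip", and one minus cosine similarity for "cosine". The function names are hypothetical and chosen for illustration; this is not the plugin's or ChromaDB's code, and the zero-vector handling is an assumption.

```python
import math
from typing import Sequence


def l2_squared(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance: sum of squared component differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the dot product; lower means more similar for unit vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus cosine similarity; independent of vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        # Assumption for this sketch: treat a zero vector as maximally dissimilar.
        return 1.0
    return 1.0 - dot / norm
```

With all three, a smaller value means a closer match. Cosine distance ignores vector magnitude, which tends to suit TF-IDF vectors whose length varies with document size.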
Methods
add_documents_with_embeddings(documents, embeddings)
Store documents with precomputed embeddings.
search_similar(query_embedding, top_k=10, metadata_filter=None)
Search for similar vectors using query embedding.
search_similar_to_document(document_id, top_k=10, metadata_filter=None)
Find documents similar to a specific document.
search_by_metadata(metadata_filter)
Search documents by metadata conditions only.
get_stats()
Get vector store statistics (total vectors, dimensions, etc.).
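The three search methods above can be pictured with a brute-force, in-memory stand-in. The sketch below is hypothetical: ToyVectorStore, its add method, and the returned (id, score) tuples are illustration-only names, not the plugin's API. It also assumes that document-to-document search excludes the source document and that the metadata filter is plain equality, neither of which is confirmed here for ChromaVectorStore.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


class ToyVectorStore:
    """Hypothetical brute-force store used only to illustrate the search modes."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}
        self._metadata: Dict[str, dict] = {}

    def add(self, doc_id: str, vector: Sequence[float], metadata: dict) -> None:
        self._vectors[doc_id] = list(vector)
        self._metadata[doc_id] = dict(metadata)

    @staticmethod
    def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def _passes(self, doc_id: str, metadata_filter: Optional[dict]) -> bool:
        # Equality-only filtering; every key in the filter must match.
        if not metadata_filter:
            return True
        meta = self._metadata[doc_id]
        return all(meta.get(key) == value for key, value in metadata_filter.items())

    def search_by_metadata(self, metadata_filter: dict) -> List[str]:
        """Return ids whose metadata matches, without any vector comparison."""
        return [d for d in self._vectors if self._passes(d, metadata_filter)]

    def search_similar(
        self,
        query_embedding: Sequence[float],
        top_k: int = 10,
        metadata_filter: Optional[dict] = None,
        exclude: Optional[str] = None,
    ) -> List[Tuple[str, float]]:
        """Score every stored vector against the query and keep the best top_k."""
        scored = [
            (doc_id, self._cosine(query_embedding, vector))
            for doc_id, vector in self._vectors.items()
            if doc_id != exclude and self._passes(doc_id, metadata_filter)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

    def search_similar_to_document(
        self, document_id: str, top_k: int = 10, metadata_filter: Optional[dict] = None
    ) -> List[Tuple[str, float]]:
        """Use the stored vector of document_id as the query, skipping itself."""
        query = self._vectors[document_id]
        return self.search_similar(query, top_k, metadata_filter, exclude=document_id)
```

A real backend replaces the linear scan with an approximate nearest-neighbour index, but the relationship between the three calls stays the same: document-to-document search is a similarity search whose query is an already stored vector.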
Metadata Filter Syntax
ChromaDB supports the following filter operators:
# Basic equality
{"category": "ai"}
# Comparison operators
{"size_bytes": {"$gt": 100}} # Greater than
{"size_bytes": {"$gte": 100}} # Greater than or equal
{"size_bytes": {"$lt": 1000}} # Less than
{"size_bytes": {"$lte": 1000}} # Less than or equal
{"status": {"$ne": "deleted"}} # Not equal
# Array operators
{"tags": {"$in": ["ai", "ml"]}} # In array
{"category": {"$nin": ["test", "demo"]}} # Not in array
# Logical operators
{"$and": [{"category": "ai"}, {"language": "japanese"}]}
{"$or": [{"category": "ai"}, {"category": "ml"}]}
{"$not": {"category": "test"}}
# Multiple conditions (automatically combined with $and)
{"category": "ai", "language": "japanese"}
Examples
Complete Integration Example
See src/examples/real_refinire_rag_example.py for a comprehensive example demonstrating:
- Document embedding with TF-IDF
- Multiple search scenarios
- Metadata filtering
- Error handling
Run the example:
uv run python src/examples/real_refinire_rag_example.py
Legacy Integration Example
See src/examples/refinire_rag_integration.py for a mock integration example.
Development
Project Structure
refinire-rag-chroma/
├── src/
│   ├── chroma_vector_store.py   # Main implementation
│   ├── examples/                # Usage examples
│   └── ...
├── tests/
│   ├── unit/                    # Unit tests
│   └── e2e/                     # E2E tests
├── docs/                        # Documentation
├── pyproject.toml               # Project configuration
└── README.md                    # This file
Running Tests
# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov=src --cov-report=html
# Run specific test category
uv run pytest tests/unit/
Code Quality
The project follows strict code quality guidelines:
- Single Responsibility Principle: Each class has one responsibility
- DRY Principle: No code duplication
- Domain-Driven Design: Clear separation between models, services, and controllers
- Comprehensive Testing: Unit tests with high coverage
Contributing
- Fork the repository
- Create a feature branch:
git checkout -b feature/your-feature-name
- Make your changes and add tests
- Run tests:
uv run pytest
- Commit your changes:
git commit -am 'Add some feature'
- Push to the branch:
git push origin feature/your-feature-name
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Related Projects
- refinire-rag - The main RAG framework
- ChromaDB - The vector database backend
Support
For questions and support:
- Check the documentation
- Review existing issues
- Create a new issue if needed
Changelog
v0.0.1
- Initial release
- Complete refinire-rag VectorStore implementation
- Support for all major search operations
- Comprehensive test suite
- Production-ready error handling