refinire-rag-chroma

ChromaDB VectorStore plugin for refinire-rag - A production-ready vector database integration for retrieval-augmented generation systems.

Overview

refinire-rag-chroma integrates ChromaDB into the refinire-rag ecosystem. The plugin implements the VectorStore interface and uses ChromaDB as the backend for vector storage, similarity search, and metadata filtering.

Features

  •  Complete refinire-rag VectorStore Implementation: Inherits from refinire_rag.VectorStore
  •  Document Storage & Retrieval: Save and search documents with embeddings
  •  Multiple Search Types:
    • Similarity search with query embeddings
    • Document-to-document similarity search
    • Metadata-based filtering and search
  •  Flexible Storage Options: In-memory or persistent storage
  •  Production Ready: Comprehensive error handling and logging
  •  Full Test Coverage: Unit tests and integration examples

Installation

Requirements

  • Python 3.10+
  • refinire-rag >= 0.0.1
  • chromadb >= 0.4.0

Install Dependencies

# Clone the repository
git clone https://github.com/your-org/refinire-rag-chroma.git
cd refinire-rag-chroma

# Install with uv (recommended)
uv sync

# Or with pip
pip install -e .

Quick Start

Basic Usage

from refinire_rag import Document, TFIDFEmbedder, TFIDFEmbeddingConfig
from src.chroma_vector_store import ChromaVectorStore

# Initialize embedder
embedding_config = TFIDFEmbeddingConfig(max_features=1000, ngram_range=(1, 2))
embedder = TFIDFEmbedder(config=embedding_config)

# Initialize ChromaDB vector store
vector_store = ChromaVectorStore(
    collection_name="my_documents",
    persist_directory="./chroma_db",  # None for in-memory
    distance_metric="cosine"
)

# Create documents
documents = [
    Document(
        id="doc_001",
        content="機械学習は人工知能の一部です",  # "Machine learning is a part of artificial intelligence"
        metadata={"category": "ai", "language": "japanese"}
    ),
    Document(
        id="doc_002",
        content="ChromaDBは高性能なベクトルデータベースです",  # "ChromaDB is a high-performance vector database"
        metadata={"category": "database", "language": "japanese"}
    )
]

# Train embedder and generate embeddings
embedder.fit([doc.content for doc in documents])
embedding_results = embedder.embed_documents(documents)
embeddings = [result.vector.tolist() for result in embedding_results]

# Store documents with embeddings
vector_store.add_documents_with_embeddings(documents, embeddings)

# Search similar documents
query_result = embedder.embed_text("AIについて教えて")  # "Tell me about AI"
search_results = vector_store.search_similar(
    query_embedding=query_result.vector.tolist(),
    top_k=5
)

# Print results
for result in search_results:
    print(f"Document: {result.document_id}")
    print(f"Score: {result.score:.4f}")
    print(f"Content: {result.content}")
    print("---")

Advanced Search with Metadata Filtering

# Reuse the query vector computed in Basic Usage
query_embedding = query_result.vector.tolist()

# Search with metadata filter
filtered_results = vector_store.search_similar(
    query_embedding=query_embedding,
    top_k=3,
    metadata_filter={"category": "ai", "language": "japanese"}
)

# Document-to-document similarity
similar_docs = vector_store.search_similar_to_document(
    document_id="doc_001",
    top_k=3
)

# Metadata-only search
metadata_results = vector_store.search_by_metadata(
    metadata_filter={"category": "database"}
)

API Reference

ChromaVectorStore

Constructor

ChromaVectorStore(
    collection_name: str = "refinire_documents",
    persist_directory: Optional[str] = None,
    distance_metric: str = "cosine"
)

Parameters:

  • collection_name: Name of the ChromaDB collection
  • persist_directory: Directory for persistent storage (None for in-memory)
  • distance_metric: Distance metric: "cosine" (cosine distance), "l2" (squared Euclidean distance), or "ip" (inner product)
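
To make the three distance_metric values concrete, the sketch below computes each distance in plain Python, following the formulas ChromaDB documents for its index spaces. The function names are illustrative and are not part of this plugin. In all three cases a smaller value means a closer match.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, the meaning of "l2" in ChromaDB."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner-product distance: 1 minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance: 1 minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm
```

Cosine distance ignores vector length, so it is a common default for text embeddings; "ip" only behaves like a similarity ranking when the vectors are normalized.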

Methods

add_documents_with_embeddings(documents, embeddings)

Store documents with precomputed embeddings.
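
Because documents and embeddings are passed as two parallel lists, it can help to check them before storing. The helper below is a hypothetical pre-flight check, not part of the plugin's API; whether ChromaVectorStore performs equivalent validation internally is not stated here.

```python
from typing import Sequence


def validate_embedding_batch(
    document_ids: Sequence[str],
    embeddings: Sequence[Sequence[float]],
) -> int:
    """Return the shared embedding dimension, or raise ValueError."""
    if not embeddings:
        raise ValueError("the batch is empty")
    if len(document_ids) != len(embeddings):
        raise ValueError(
            f"{len(document_ids)} documents but {len(embeddings)} embeddings"
        )
    if len(set(document_ids)) != len(document_ids):
        raise ValueError("document ids must be unique within a batch")

    # Every vector must have the same, non-zero dimension as the first one.
    dimension = len(embeddings[0])
    if dimension == 0:
        raise ValueError("embeddings must not be empty vectors")
    for doc_id, vector in zip(document_ids, embeddings):
        if len(vector) != dimension:
            raise ValueError(
                f"{doc_id}: expected dimension {dimension}, got {len(vector)}"
            )
    return dimension
```

With the Quick Start objects, this could be called as validate_embedding_batch([doc.id for doc in documents], embeddings) before add_documents_with_embeddings.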

search_similar(query_embedding, top_k=10, metadata_filter=None)

Search for similar vectors using query embedding.

search_similar_to_document(document_id, top_k=10, metadata_filter=None)

Find documents similar to a specific document.
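
Conceptually, document-to-document search looks up the stored embedding of the given document and uses it as the query. The brute-force sketch below illustrates that idea over a plain dictionary; it is not the plugin's implementation (ChromaDB uses an approximate index), and excluding the source document from the results is an assumption made here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def similar_to_document(
    store: Dict[str, Sequence[float]],
    document_id: str,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Rank all other documents by similarity to the given one."""
    query = store[document_id]  # raises KeyError for an unknown id
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in store.items()
        if other_id != document_id  # assumption: skip the source document
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy two-dimensional embeddings keyed by document id
store = {
    "doc_001": [1.0, 0.0],
    "doc_002": [0.9, 0.1],
    "doc_003": [0.0, 1.0],
}
ranked = similar_to_document(store, "doc_001", top_k=2)
```

Here doc_002 ranks first because its vector points in nearly the same direction as doc_001, while doc_003 is orthogonal to it.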

search_by_metadata(metadata_filter)

Search documents by metadata conditions only.

get_stats()

Get vector store statistics (total vectors, dimensions, etc.).

Metadata Filter Syntax

Metadata filters use ChromaDB-style operators:

# Basic equality
{"category": "ai"}

# Comparison operators
{"size_bytes": {"$gt": 100}}     # Greater than
{"size_bytes": {"$gte": 100}}    # Greater than or equal
{"size_bytes": {"$lt": 1000}}    # Less than
{"size_bytes": {"$lte": 1000}}   # Less than or equal
{"status": {"$ne": "deleted"}}   # Not equal

# Array operators
{"tags": {"$in": ["ai", "ml"]}}           # In array
{"category": {"$nin": ["test", "demo"]}}  # Not in array

# Logical operators
{"$and": [{"category": "ai"}, {"language": "japanese"}]}
{"$or": [{"category": "ai"}, {"category": "ml"}]}
{"$not": {"category": "test"}}

# Multiple conditions (automatically combined with $and)
{"category": "ai", "language": "japanese"}
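
To show how these operators combine, the following sketch evaluates a filter against a single metadata dictionary in plain Python. It only illustrates the intended semantics of the syntax above: the matches function is hypothetical, filtering in the plugin is delegated to ChromaDB, and the treatment of missing keys here (a missing key never satisfies an ordering comparison) is an assumption.

```python
from typing import Any, Callable, Dict

# Comparison and array operators from the syntax listing
_OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, target: value == target,
    "$ne": lambda value, target: value != target,
    "$gt": lambda value, target: value is not None and value > target,
    "$gte": lambda value, target: value is not None and value >= target,
    "$lt": lambda value, target: value is not None and value < target,
    "$lte": lambda value, target: value is not None and value <= target,
    "$in": lambda value, target: value in target,
    "$nin": lambda value, target: value not in target,
}


def matches(metadata: Dict[str, Any], metadata_filter: Dict[str, Any]) -> bool:
    """Return True if the metadata satisfies every condition in the filter."""
    for key, condition in metadata_filter.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$not":
            if matches(metadata, condition):
                return False
        elif isinstance(condition, dict):
            # Field with one or more operators, e.g. {"$gt": 100, "$lt": 1000}
            value = metadata.get(key)
            for operator, target in condition.items():
                if operator not in _OPERATORS:
                    raise ValueError(f"unsupported operator: {operator}")
                if not _OPERATORS[operator](value, target):
                    return False
        elif metadata.get(key) != condition:
            # Plain value: basic equality
            return False
    # Top-level keys are implicitly combined with AND
    return True
```

For example, matches({"category": "ai", "size_bytes": 500}, {"size_bytes": {"$gt": 100}}) is True, while adding "language": "english" to that filter makes it False because the key is absent.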

Examples

Complete Integration Example

See src/examples/real_refinire_rag_example.py for a comprehensive example demonstrating:

  • Document embedding with TF-IDF
  • Multiple search scenarios
  • Metadata filtering
  • Error handling

Run the example:

uv run python src/examples/real_refinire_rag_example.py

Legacy Integration Example

See src/examples/refinire_rag_integration.py for a mock integration example.

Development

Project Structure

refinire-rag-chroma/
├── src/
│   ├── chroma_vector_store.py    # Main implementation
│   ├── examples/                 # Usage examples
│   └── ...
├── tests/
│   ├── unit/                     # Unit tests
│   └── e2e/                      # E2E tests
├── docs/                         # Documentation
├── pyproject.toml                # Project configuration
└── README.md                     # This file

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src --cov-report=html

# Run specific test category
uv run pytest tests/unit/

Code Quality

The project follows strict code quality guidelines:

  • Single Responsibility Principle: Each class has one responsibility
  • DRY Principle: No code duplication
  • Domain-Driven Design: Clear separation between models, services, and controllers
  • Comprehensive Testing: Unit tests with high coverage

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/your-feature-name
  3. Make your changes and add tests
  4. Run tests: uv run pytest
  5. Commit your changes: git commit -am 'Add some feature'
  6. Push to the branch: git push origin feature/your-feature-name
  7. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For questions and support:

  1. Check the documentation
  2. Review existing issues
  3. Create a new issue if needed

Changelog

v0.0.1

  • Initial release
  • Complete refinire-rag VectorStore implementation
  • Support for all major search operations
  • Comprehensive test suite
  • Production-ready error handling
