Retriever

Unified retrieval module for RAG systems with support for multiple vector databases.

Features

  • Multiple vector database backends: Qdrant, ChromaDB, Milvus
  • Filename search: Separate collection for efficient filename-based search
  • Context enrichment: Fetch neighboring chunks for better context
  • Category filtering: Filter results by accessible categories
  • Unified interface: Single API for all vector stores
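As an illustration of the category filtering feature, the following is a minimal sketch in plain Python. It does not use the library; the `Chunk` class and the `filter_by_categories` function are hypothetical names chosen for this example, and the package's own filtering may work differently (for instance, by pushing the filter down to the vector database).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for a retrieved document chunk."""
    text: str
    category: str


def filter_by_categories(chunks, accessible):
    """Keep only chunks whose category the caller may access, preserving order."""
    allowed = set(accessible)
    return [c for c in chunks if c.category in allowed]


chunks = [
    Chunk("Quarterly revenue summary", "finance"),
    Chunk("Onboarding checklist", "hr"),
    Chunk("Deployment runbook", "engineering"),
]
visible = filter_by_categories(chunks, accessible=["hr", "engineering"])
print([c.text for c in visible])
```

Filtering after retrieval, as done here, is the simplest approach, but it can return fewer results than requested; filtering inside the database query avoids that at the cost of backend-specific code.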

Installation

poetry add donkit-retriever

Usage

Basic Setup

import asyncio

from donkit.retriever import create_vectorstore_service, RetrievalConfig
# On recent LangChain releases this class lives in the langchain_openai package
from langchain.embeddings import OpenAIEmbeddings

# Configure retrieval options
config = RetrievalConfig(
    vector_database="qdrant",
    retriever_options={
        "filename_search": True,
        "partial_search": True,
        "max_retrieved_docs": 10,
    },
)

# Create service
embeddings = OpenAIEmbeddings()
service = create_vectorstore_service(
    db_type="qdrant",
    embeddings=embeddings,
    config=config,
    collection_name="my_collection",
    database_uri="http://localhost:6333",
)


async def main():
    # search_documents is awaited, so it has to run inside async code
    documents = await service.search_documents(
        query="What is RAG?",
        k=5,
    )
    print(documents)


# Search documents
asyncio.run(main())

Supported Vector Databases

Qdrant

service = create_vectorstore_service(
    db_type="qdrant",
    embeddings=embeddings,
    config=config,
    database_uri="http://localhost:6333",
)

ChromaDB

service = create_vectorstore_service(
    db_type="chroma",
    embeddings=embeddings,
    config=config,
    database_uri="http://localhost:8000",
)

Milvus

service = create_vectorstore_service(
    db_type="milvus",
    embeddings=embeddings,
    config=config,
    database_uri="http://localhost:19530",
)
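The three snippets above differ only in `db_type` and `database_uri`. A small helper can centralize that choice. The sketch below is an assumption for illustration only: `resolve_backend` and `DEFAULT_URIS` are hypothetical names, not part of the package, and the URIs are the local defaults used in the examples above.

```python
# Local endpoints copied from the examples above; adjust for your deployment.
DEFAULT_URIS = {
    "qdrant": "http://localhost:6333",
    "chroma": "http://localhost:8000",
    "milvus": "http://localhost:19530",
}


def resolve_backend(db_type, database_uri=None):
    """Hypothetical helper: validate db_type and choose the URI to connect to."""
    key = db_type.strip().lower()
    if key not in DEFAULT_URIS:
        supported = ", ".join(sorted(DEFAULT_URIS))
        raise ValueError(
            f"Unsupported db_type {db_type!r}; expected one of: {supported}"
        )
    # An explicit URI wins; otherwise fall back to the local default
    return key, database_uri or DEFAULT_URIS[key]


print(resolve_backend("Milvus"))
print(resolve_backend("qdrant", "http://qdrant.internal:6333"))
```

The returned pair could then be passed as `db_type` and `database_uri` to `create_vectorstore_service`.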

Configuration Options

from donkit.retriever import RetrievalConfig, RetrieverOptions

config = RetrievalConfig(
    vector_database="qdrant",  # qdrant | chroma | milvus
    retriever_options=RetrieverOptions(
        filename_search=True,  # Enable filename-based search
        partial_search=True,   # Fetch neighboring chunks
        max_retrieved_docs=10, # Max documents to retrieve
    ),
    ranker="http://ranker-service:8000",  # Optional reranker URL
)

Architecture

VectorstoreModule

Each database has its own module implementing VectorstoreModuleAbstract:

  • QdrantVectorstoreModule
  • ChromaVectorstoreModule
  • MilvusVectorstoreModule
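The pattern behind these modules can be sketched with Python's `abc` module. The example below is not the library's code: `VectorstoreModuleSketch`, `InMemoryModule`, and the `search` signature are assumptions made for illustration, and the real `VectorstoreModuleAbstract` may define different methods.

```python
from abc import ABC, abstractmethod


class VectorstoreModuleSketch(ABC):
    """Illustrative base class; the real VectorstoreModuleAbstract may differ."""

    @abstractmethod
    def search(self, query_vector, k):
        """Return the ids of the k stored vectors most similar to query_vector."""


class InMemoryModule(VectorstoreModuleSketch):
    """Toy backend that keeps vectors in a dict instead of a database."""

    def __init__(self, vectors):
        self.vectors = vectors  # maps document id -> list of floats

    def search(self, query_vector, k):
        def score(doc_id):
            # Dot product as a simple similarity measure
            return sum(a * b for a, b in zip(self.vectors[doc_id], query_vector))

        return sorted(self.vectors, key=score, reverse=True)[:k]


def top_ids(module: VectorstoreModuleSketch, query_vector, k=1):
    """Caller code depends only on the abstract interface, not on a backend."""
    return module.search(query_vector, k)


backend = InMemoryModule({"intro": [1.0, 0.0], "setup": [0.0, 1.0]})
print(top_ids(backend, [0.9, 0.1]))
```

Because callers see only the base class, adding a backend means writing one new subclass without touching the code that performs searches.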

VectorstoreService

Orchestrates search operations:

  1. Filename search (if enabled)
  2. Vector search
  3. Neighbor fetching (if partial_search enabled)
  4. Document combination and deduplication
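The four steps above can be sketched in plain Python. This is a simplified model under stated assumptions, not the service's implementation: the `Chunk` record, the helper names, and the word-overlap ranking (used here in place of real embedding similarity) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk record: a slice of one file at a given position."""
    filename: str
    index: int
    text: str


def filename_search(query, corpus):
    """Step 1: return chunks whose filename is mentioned in the query."""
    q = query.lower()
    return [c for c in corpus if c.filename.lower() in q]


def toy_vector_search(query, corpus, k):
    """Step 2 stand-in: rank by shared words instead of embedding similarity."""
    words = set(query.lower().split())
    scored = [(len(words & set(c.text.lower().split())), c) for c in corpus]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def with_neighbors(hits, corpus):
    """Step 3: follow each hit with the chunks directly before and after it."""
    by_key = {(c.filename, c.index): c for c in corpus}
    expanded = []
    for hit in hits:
        for offset in (0, -1, 1):
            neighbor = by_key.get((hit.filename, hit.index + offset))
            if neighbor is not None:
                expanded.append(neighbor)
    return expanded


def retrieve(query, corpus, k=5, use_filenames=True, partial_search=True):
    """Run the four steps and return deduplicated chunks in first-seen order."""
    results = filename_search(query, corpus) if use_filenames else []
    hits = toy_vector_search(query, corpus, k)
    results += with_neighbors(hits, corpus) if partial_search else hits
    # Step 4: frozen dataclasses are hashable, so dict.fromkeys removes
    # duplicates while keeping insertion order
    return list(dict.fromkeys(results))


corpus = [
    Chunk("guide.md", 0, "Introduction to retrieval"),
    Chunk("guide.md", 1, "RAG combines retrieval with generation"),
    Chunk("guide.md", 2, "Choosing a vector database"),
    Chunk("faq.md", 0, "Frequently asked questions"),
]
for chunk in retrieve("What is RAG in faq.md?", corpus, k=1):
    print(chunk.filename, chunk.index)
```

Deduplication matters because the same chunk can arrive through several routes: as a filename match, as a vector hit, and as the neighbor of another hit.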

Development

# Install dependencies
poetry install

# Run tests
poetry run pytest

# Run linter
poetry run ruff check .

License

Proprietary
