A production-ready, provider-agnostic Python SDK for End-to-End RAG pipelines.

Vectra (Python)

Vectra is a production-grade, provider-agnostic Python SDK for building end-to-end Retrieval-Augmented Generation (RAG) systems. It is designed for teams that need correctness, extensibility, async performance, and observability across embeddings, vector databases, retrieval strategies, and LLM providers.


1. Overview

Vectra implements a fully modular RAG pipeline:

Load → Chunk → Embed → Store → Retrieve → Rerank → Plan → Ground → Generate → Stream

Figure: Vectra SDK – End-to-End RAG Architecture

All stages are explicitly configured, async-first, and observable.

Key Characteristics

  • Async-first API (asyncio)
  • Provider-agnostic embeddings & LLMs
  • Multiple vector backends (Postgres, Chroma, Qdrant, Milvus)
  • Advanced retrieval (HyDE, Multi-Query, Hybrid RRF, MMR)
  • Unified streaming interface
  • Built-in evaluation and observability
  • CLI + SDK parity

2. Design Goals & Philosophy

Explicitness over Magic

Vectra avoids hidden defaults. Chunking, retrieval, grounding, memory, and generation behavior are always explicit and validated.

Production-First

Index helpers, rate limiting, embedding cache, observability, and evaluation are first-class features.

Provider Neutrality

Switching providers (OpenAI ↔ Gemini ↔ Anthropic ↔ Ollama) is a configuration change and requires no application code changes.

Extensibility

All major subsystems are interface-driven and designed to be extended safely.


3. Feature Matrix

Providers

  • Embeddings: OpenAI, Gemini, Ollama, HuggingFace
  • Generation: OpenAI, Gemini, Anthropic, Ollama, OpenRouter, HuggingFace
  • Streaming: Async generators with normalized output

Vector Stores

  • PostgreSQL (Prisma + pgvector)
  • ChromaDB
  • Qdrant
  • Milvus

Retrieval Strategies

  • Naive cosine similarity
  • HyDE (Hypothetical Document Embeddings)
  • Multi-Query expansion (RRF)
  • Hybrid semantic + lexical (RRF)
  • MMR diversification
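
Two of the strategies above fuse results with Reciprocal Rank Fusion (RRF). The following is a minimal, self-contained sketch of the general RRF technique, not Vectra's internal implementation. The function name, the document IDs, and the constant k=60 (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1 for the top result of each list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a semantic and a lexical retriever.
semantic = ["doc_a", "doc_b", "doc_c"]
lexical = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([semantic, lexical])
print(fused)
```

Documents that rank well in both lists (here doc_a and doc_c) rise above documents that appear in only one, which is why RRF suits combining semantic and lexical results whose raw scores are not comparable.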

4. Installation

Library

pip install vectra-py
# or
uv pip install vectra-py

Backends

# Prisma Client Python – https://prisma.brendonovich.dev (published on PyPI as "prisma")
pip install prisma
# ChromaDB – https://docs.trychroma.com
pip install chromadb
# Qdrant Python Client – https://qdrant.tech/documentation
pip install qdrant-client
# Milvus Python SDK – https://milvus.io/docs
pip install pymilvus

CLI

vectra --help
# alternative
python -m vectra.cli --help

Requirements

Vectra depends on: pydantic, prisma-client-py, chromadb, openai, google-generativeai, anthropic, pypdf, mammoth, openpyxl. It also uses asyncio, which is part of the Python standard library and needs no separate install.


5. Quick Start

import asyncio
import os

import asyncpg
from vectra import VectraClient, VectraConfig, ProviderType


async def main():
    # Connection pool used by the Postgres vector store
    pool = await asyncpg.create_pool(os.getenv('DATABASE_URL'))

    config = VectraConfig(
        embedding={
            'provider': ProviderType.OPENAI,
            'api_key': os.getenv('OPENAI_API_KEY'),
            'model_name': 'text-embedding-3-small'
        },
        llm={
            'provider': ProviderType.GEMINI,
            'api_key': os.getenv('GOOGLE_API_KEY'),
            'model_name': 'gemini-2.5-flash'
        },
        database={
            'type': 'postgres',
            'client_instance': pool,
            'table_name': 'document',
            'column_map': { 'content': 'content', 'metadata': 'metadata', 'vector': 'vector' }
        }
    )

    client = VectraClient(config)
    # Index the documents, then ask a question against them
    await client.ingest_documents('./docs')
    result = await client.query_rag('What is the vacation policy?')
    print(result['answer'])

    await pool.close()


# await is only valid inside a coroutine, so run main() on an event loop
asyncio.run(main())

6. Core Concepts

Providers

Providers implement embeddings, generation, or both. Vectra normalizes responses and streaming across providers.

Vector Stores

Vector stores persist embeddings and metadata. Backends are swappable via configuration.

Chunking

  • Recursive: Token-aware, separator-aware splitting
  • Agentic: LLM-driven semantic propositions

Retrieval

Configurable strategies to balance recall, precision, and latency.
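
To make the recall/precision trade-off concrete, here is a small sketch of Maximal Marginal Relevance (MMR), the diversification technique listed among the retrieval strategies. It is a generic illustration under assumed names (mmr, lambda_mult) and toy 2-dimensional vectors, not Vectra's own code.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query: Sequence[float], candidates: List[Sequence[float]],
        top_n: int = 2, lambda_mult: float = 0.5) -> List[int]:
    """Return candidate indices chosen greedily by MMR.

    lambda_mult=1.0 ranks purely by relevance to the query; lower values
    penalize candidates that resemble ones already selected.
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < top_n:
        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.2]
# Candidates 0 and 1 are near-duplicates; candidate 2 covers a different direction.
candidates = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
diverse = mmr(query, candidates, top_n=2, lambda_mult=0.5)
relevance_only = mmr(query, candidates, top_n=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With lambda_mult=0.5 the second pick skips the near-duplicate in favor of the more distinct candidate, while lambda_mult=1.0 returns the two most similar vectors regardless of overlap.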

Reranking

Optional LLM-based reordering of candidate chunks.

Metadata Enrichment

Optional per-chunk summaries, keywords, and hypothetical questions generated during ingestion.

Query Planning & Grounding

Controls context assembly and factual grounding constraints.

Conversation Memory

Persist multi-turn chat history across sessions.


7. Configuration Reference (Usage-Driven)

All configuration is validated using Pydantic at runtime.

Embedding

embedding={
  'provider': ProviderType.OPENAI,
  'api_key': os.getenv('OPENAI_API_KEY'),
  'model_name': 'text-embedding-3-small',
  'dimensions': 1536
}

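Set dimensions explicitly when using pgvector so that the embedding size matches the size of the vector column; otherwise the mismatch only surfaces as an error at runtime.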
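
A pgvector column declared with a fixed size, such as vector(1536), rejects vectors of any other length. The sketch below shows the kind of early check an application could run before writing. The helper name check_dimensions is hypothetical and is not part of Vectra.

```python
from typing import Sequence


def check_dimensions(vector: Sequence[float], expected: int) -> None:
    """Raise early if an embedding does not match the configured size."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, expected {expected}"
        )


# A 1536-dimensional vector passes silently against dimensions=1536.
check_dimensions([0.0] * 1536, expected=1536)

# A vector from a different model size is caught before it reaches the database.
try:
    check_dimensions([0.0] * 768, expected=1536)
except ValueError as exc:
    print(exc)
```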


LLM

llm={
  'provider': ProviderType.GEMINI,
  'api_key': os.getenv('GOOGLE_API_KEY'),
  'model_name': 'gemini-2.5-flash',
  'temperature': 0.3,
  'max_tokens': 1024
}

The llm block configures the model used for generation.


Database

Supports PostgreSQL (native asyncpg or Prisma), Chroma, Qdrant, and Milvus.

# PostgreSQL (native asyncpg)
database={
  'type': 'postgres',
  'client_instance': pg_pool,  # asyncpg.Pool or Connection
  'table_name': 'document',
  'column_map': { 'content': 'content', 'metadata': 'metadata', 'vector': 'vector' }
}
# Prisma (Postgres via prisma-client-py)
database={
  'type': 'prisma',
  'client_instance': prisma,
  'table_name': 'Document',
  'column_map': { 'content': 'content', 'metadata': 'metadata', 'vector': 'embedding' }
}
# ChromaDB
database={
  'type': 'chroma',
  'client_instance': chroma_client,  # chromadb.Client or PersistentClient
  'table_name': 'rag_collection',
  'column_map': { 'content': 'content', 'metadata': 'metadata', 'vector': 'embedding' }
}
# Qdrant
database={
  'type': 'qdrant',
  'client_instance': qdrant_client,  # qdrant_client.QdrantClient
  'table_name': 'rag_collection',
  'column_map': { 'content': 'content', 'metadata': 'metadata', 'vector': 'embedding' }
}
# Milvus
database={
  'type': 'milvus',
  'client_instance': milvus_client,  # pymilvus client
  'table_name': 'rag_collection',
  'column_map': { 'content': 'content', 'metadata': 'metadata', 'vector': 'embedding' }
}

Chunking

chunking={
  'strategy': ChunkingStrategy.RECURSIVE,
  'chunk_size': 1000,
  'chunk_overlap': 200
}
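
To show how chunk_size and chunk_overlap interact, here is a simplified sliding-window splitter. It is deliberately not the recursive, token-aware splitter described under Core Concepts: it counts characters and ignores separators, and the function name sliding_chunks is an assumption made for this sketch.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so each new window
    starts chunk_size - chunk_overlap characters after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing and embedding some text twice.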

Agentic:

chunking={
  'strategy': ChunkingStrategy.AGENTIC,
  'agentic_llm': {
    'provider': ProviderType.OPENAI,
    'api_key': os.getenv('OPENAI_API_KEY'),
    'model_name': 'gpt-4o-mini'
  }
}

Retrieval

retrieval={ 'strategy': RetrievalStrategy.HYBRID }

Hybrid is recommended for production workloads.


Reranking

reranking={
  'enabled': True,
  'window_size': 20,
  'top_n': 5
}

Memory

memory={ 'enabled': True, 'type': 'in-memory', 'max_messages': 20 }

Redis and Postgres are also supported as memory backends:

# Redis
memory={
  'enabled': True,
  'type': 'redis',
  'max_messages': 20,
  'redis': {
    'client_instance': redis_client,
    'key_prefix': 'vectra:chat:'
  }
}
# Postgres
memory={
  'enabled': True,
  'type': 'postgres',
  'max_messages': 20,
  'postgres': {
    'client_instance': pg_pool,  # asyncpg.Pool or Connection
    'table_name': 'ChatMessage',
    'column_map': {
      'sessionId': 'sessionId',
      'role': 'role',
      'content': 'content',
      'createdAt': 'createdAt'
    }
  }
}

Observability

observability={
  'enabled': True,
  'sqlite_path': 'vectra-observability.db'
}

8. Ingestion Pipeline

await client.ingest_documents('./documents')

  • Files or directories supported
  • Recursive traversal
  • Embedding cache via SHA256
  • Optional rate limiting
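
The idea behind a SHA256-keyed embedding cache can be sketched as follows: hash the chunk text, and only call the embedding provider when that hash has not been seen. The class EmbeddingCache and the stand-in fake_embed function are hypothetical. Vectra's actual cache layout and storage are not described here.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """In-memory cache keyed by the SHA256 digest of the chunk text."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self._embed_fn = embed_fn
        self._store: Dict[str, List[float]] = {}
        self.misses = 0  # number of times the provider was actually called

    def get(self, text: str) -> List[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]


def fake_embed(text: str) -> List[float]:
    # Stand-in for a real provider call; returns a 1-dimensional "embedding".
    return [float(len(text))]


cache = EmbeddingCache(fake_embed)
cache.get("hello")
cache.get("hello")   # served from the cache, no second provider call
cache.get("world!")
print(cache.misses)
```

Because the key depends only on content, re-ingesting an unchanged document does not trigger new embedding requests.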

Supported formats: PDF, DOCX, XLSX, TXT, Markdown


9. Querying & Streaming

Standard:

res = await client.query_rag('Refund policy?')

Streaming:

stream = await client.query_rag('Draft email', stream=True)
async for chunk in stream:
    print(chunk.get('delta', ''), end='')
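
The consumption pattern above can be tried without any provider by substituting a stand-in async generator. In this sketch, fake_stream and the trailing chunk without a 'delta' key are assumptions. They only illustrate why chunk.get('delta', '') is a safe way to read each chunk.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def fake_stream() -> AsyncIterator[Dict[str, str]]:
    """Stand-in for a streamed response: yields dicts carrying text deltas."""
    for piece in ["Dear ", "team", ","]:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield {"delta": piece}
    # Hypothetical final chunk that carries no text.
    yield {"status": "done"}


async def collect(stream: AsyncIterator[Dict[str, str]]) -> str:
    """Accumulate the text deltas of a stream into one string."""
    parts: List[str] = []
    async for chunk in stream:
        parts.append(chunk.get("delta", ""))
    return "".join(parts)


text = asyncio.run(collect(fake_stream()))
print(text)
```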

10. Conversation Memory

Pass a session_id to preserve multi-turn context.
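
As a rough mental model of what a session-scoped memory with max_messages does, consider the sketch below. SessionMemory is a hypothetical class written for illustration. It does not show Vectra's API or how session_id is passed to the client.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)


class SessionMemory:
    """Keeps the most recent max_messages messages per session_id."""

    def __init__(self, max_messages: int = 20) -> None:
        self._max_messages = max_messages
        self._history: Dict[str, Deque[Message]] = {}

    def add(self, session_id: str, role: str, content: str) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        history = self._history.setdefault(
            session_id, deque(maxlen=self._max_messages)
        )
        history.append((role, content))

    def get(self, session_id: str) -> List[Message]:
        return list(self._history.get(session_id, ()))


memory = SessionMemory(max_messages=2)
memory.add("s1", "user", "What is the refund policy?")
memory.add("s1", "assistant", "Refunds are issued within 14 days.")
memory.add("s1", "user", "Does that include sale items?")
print(memory.get("s1"))
```

Histories are isolated per session, and the oldest turn is evicted once the limit is reached, which bounds the context passed to the model.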


11. Evaluation & Quality Measurement

await client.evaluate([
  { 'question': 'Capital of France?', 'expected_ground_truth': 'Paris' }
])

Metrics: Faithfulness, Relevance


12. CLI

Ingest & Query

vectra ingest ./docs --config=./config.json
vectra query "What are the payment terms?" --config=./config.json --stream

WebConfig (Config Generator UI)

vectra webconfig

Launches a local web UI to interactively generate and validate vectra.config.json.


Observability Dashboard

vectra dashboard

Launches a local dashboard for metrics, traces, and session analysis.


13. Observability & Callbacks

Tracks metrics, traces, and chat sessions when enabled.

Callbacks allow hooking into ingestion, retrieval, reranking, and generation stages.


14. Telemetry

Vectra collects anonymous usage data to help us improve the SDK, prioritize features, and detect broken versions.

What we track

  • Identity: A random UUID (distinct_id) stored locally in ~/.vectra/telemetry.json. No PII, emails, IPs, or hostnames.
  • Events:
    • sdk_initialized: Config shape (providers used), OS/Runtime version, session type (api/cli/chat).
    • ingest_started/completed: Source type, chunking strategy, duration bucket, chunk count bucket.
    • query_executed: Retrieval strategy, query mode (rag), result count, latency bucket.
    • feature_used: WebConfig/Dashboard usage.
    • evaluation_run: Dataset size bucket.
    • error_occurred: Error type and stage (no stack traces).
    • cli_command_used: Command name and flags.

Why we track it

  • Detect broken versions: Spikes in error_occurred help us find bugs.
  • Measure adoption: Helps us understand which providers (OpenAI vs Gemini) and vector stores are most popular.
  • Drop support safely: We can see if anyone is still using Python 3.8 before dropping it.

How to opt-out

Telemetry is enabled by default. To disable it:

Option 1: Config

client = VectraClient(
    VectraConfig(
        # ...
        telemetry={'enabled': False}
    )
)

Option 2: Environment Variable

Set VECTRA_TELEMETRY_DISABLED=1 or DO_NOT_TRACK=1.
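
When the environment cannot be set from the shell, the variable can be set from Python before the client is constructed. The helper telemetry_opted_out below is hypothetical. It merely mirrors the two documented variables so that a script or CI job can assert the opt-out is in place, and it does not reflect how Vectra reads them internally.

```python
import os
from typing import Mapping

# Set the opt-out before any Vectra client is created.
os.environ["DO_NOT_TRACK"] = "1"


def telemetry_opted_out(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical check: True if either documented opt-out variable is '1'."""
    return (
        env.get("VECTRA_TELEMETRY_DISABLED") == "1"
        or env.get("DO_NOT_TRACK") == "1"
    )


print(telemetry_opted_out())
```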


15. Database Schemas & Indexing

model Document {
  id        String   @id @default(uuid())
  content   String
  metadata  Json
  embedding Unsupported("vector")?
  createdAt DateTime @default(now())
}

16. Extending Vectra

Implement custom vector stores by extending VectorStore.
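
The shape of such an extension might look like the sketch below. The base class VectorStoreSketch, its add and search methods, and their signatures are assumptions standing in for Vectra's real VectorStore interface, whose method names may differ. Consult the package source before implementing against it.

```python
import asyncio
import math
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class VectorStoreSketch(ABC):
    """Hypothetical stand-in for Vectra's VectorStore base class."""

    @abstractmethod
    async def add(self, content: str, vector: List[float],
                  metadata: Dict[str, Any]) -> None: ...

    @abstractmethod
    async def search(self, vector: List[float],
                     top_k: int) -> List[Tuple[float, str]]: ...


def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore(VectorStoreSketch):
    """Toy backend: keeps rows in a list and scans them on every search."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, List[float], Dict[str, Any]]] = []

    async def add(self, content, vector, metadata) -> None:
        self._rows.append((content, vector, metadata))

    async def search(self, vector, top_k):
        scored = [(_cosine(vector, v), content) for content, v, _ in self._rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


async def demo() -> List[Tuple[float, str]]:
    store = InMemoryStore()
    await store.add("Refunds are issued within 14 days.", [1.0, 0.0], {"source": "policy.md"})
    await store.add("Vacation requests need manager approval.", [0.0, 1.0], {"source": "hr.md"})
    return await store.search([0.9, 0.1], top_k=1)


hits = asyncio.run(demo())
print(hits[0][1])
```

Keeping the methods async matches the SDK's async-first design, so a real backend can await network calls without blocking the event loop.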


17. Architecture Overview

Vectra follows a modular, provider-agnostic RAG architecture with clear separation of ingestion, retrieval, and generation pipelines.


18. Development & Contribution Guide

  • Python 3.8+
  • Async-first (asyncio)
  • Pydantic-based configuration

19. Production Best Practices

  • Match embedding dimensions to pgvector
  • Prefer Hybrid retrieval
  • Enable observability in staging
  • Evaluate before changing chunk sizes

Vectra (Python) scales cleanly from local prototypes to production-grade RAG platforms.
