A standalone RAG package for Pydantic AI with support for multiple vector stores.

Project description

Pydantic AI RAG Stack

pydantic-ai-ragstack is a structured framework for Retrieval-Augmented Generation (RAG) systems. It uses the Pydantic library to enforce strict data schemas across the whole pipeline: document ingestion, embedding, storage, and retrieval. Validating data at every stage makes AI applications built on proprietary knowledge bases more reliable and predictable.

🚀 Features

Structured Data Modeling: Uses Pydantic models (Document, DocumentPage, etc.) to define explicit schemas for all data components (metadata, content, images).
End-to-End Ingestion Pipeline: Handles document loading, chunking, image extraction, and metadata enrichment.
Vector Store Integration: Manages interaction with various vector databases (pydantic_ai_ragstack/vectorstore.py).
Advanced Retrieval: Includes capabilities for content re-ranking (pydantic_ai_ragstack/reranker.py) to improve search relevance.
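
To make the structured-modeling idea concrete, here is a minimal sketch of what such schemas could look like. The field names (`page_number`, `text`, `source`, `metadata`, `pages`) are assumptions chosen for illustration; they are not taken from the package's actual models.

```python
from typing import Dict, List

from pydantic import BaseModel, Field, ValidationError


class DocumentPage(BaseModel):
    """One page of a parsed document (hypothetical fields)."""

    page_number: int = Field(ge=1)  # pages are 1-indexed in this sketch
    text: str


class Document(BaseModel):
    """A parsed document with its metadata (hypothetical fields)."""

    source: str
    metadata: Dict[str, str] = Field(default_factory=dict)
    pages: List[DocumentPage] = Field(default_factory=list)


doc = Document(
    source="path/to/document.pdf",
    pages=[DocumentPage(page_number=1, text="Hello, RAG.")],
)

# Invalid data is rejected at construction time rather than deep in the pipeline.
try:
    DocumentPage(page_number=0, text="bad page")
    rejected = False
except ValidationError:
    rejected = True
```

The point of the design is that a malformed page fails immediately with a validation error, so later stages (chunking, embedding, storage) can rely on the shape of their inputs.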

🛠️ Installation

This project requires Python 3.8+ and several dependencies (e.g., LangChain, PyMuPDF, vector store clients).

# Install the released package from PyPI
pip install pydantic-ai-ragstack
# Or, from a source checkout, assuming a requirements.txt is present
pip install -r requirements.txt

📂 Project Structure Overview

The core logic is organized within the pydantic_ai_ragstack package:

models.py: Defines all core data structures (e.g., Document, DocumentPageChunk, SearchResult) using Pydantic models.
ingestion.py: Manages the entire lifecycle of ingesting raw documents into structured chunks and vector embeddings.
documents.py: Handles document loading and parsing (PDF, etc.).
embeddings.py: Manages the generation of vector embeddings for content chunks.
retrieval.py: Contains logic for querying the vector store and retrieving relevant documents/chunks.
vectorstore.py: Abstraction layer for interacting with the underlying vector databases (e.g., Chroma, Pinecone).
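
The role of a vector store abstraction can be sketched as follows. This is an illustration only: the `VectorStore` interface, its `add` and `query` methods, and the `InMemoryVectorStore` class are hypothetical names, not the contents of vectorstore.py. A real backend such as Chroma or Pinecone would sit behind the same kind of interface.

```python
import math
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Tuple


class VectorStore(ABC):
    """Hypothetical minimal interface; the real vectorstore.py may differ."""

    @abstractmethod
    def add(self, item_id: str, vector: Sequence[float]) -> None: ...

    @abstractmethod
    def query(self, vector: Sequence[float], top_k: int) -> List[Tuple[str, float]]: ...


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore(VectorStore):
    """Toy backend that keeps vectors in a dict and scans them all per query."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, item_id: str, vector: Sequence[float]) -> None:
        self._vectors[item_id] = list(vector)

    def query(self, vector: Sequence[float], top_k: int) -> List[Tuple[str, float]]:
        scored = [
            (item_id, cosine_similarity(vector, stored))
            for item_id, stored in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
        return scored[:top_k]


store = InMemoryVectorStore()
store.add("chunk-a", [1.0, 0.0])
store.add("chunk-b", [0.0, 1.0])
store.add("chunk-c", [0.7, 0.7])
hits = store.query([1.0, 0.1], top_k=2)
```

Because callers depend only on the interface, swapping the toy backend for a hosted database would not require changes to ingestion or retrieval code.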

🧠 Usage Example: Ingestion and Retrieval Flow

The typical workflow involves three main stages: Load -> Index -> Query.

Step 1: Document Loading & Processing

Raw documents are loaded and parsed into structured Document objects.

from pydantic_ai_ragstack import documents

# Load and parse the PDF; assumes 'path/to/document.pdf' exists
raw_docs = documents.load("path/to/document.pdf")
structured_data = raw_docs  # list[Document]

Step 2: Indexing (Embedding and Storing)

The structured data is chunked, embedded, and stored in the vector store.

from pydantic_ai_ragstack import ingestion

# MyEmbeddings is a placeholder for whichever embedding model you have configured;
# structured_data is the list[Document] produced in Step 1
status = ingestion.index_documents(structured_data, embedder=MyEmbeddings())

if status == "done":
    print("Indexing complete. Data is ready for retrieval.")
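
The chunking part of this step can be illustrated with a small, self-contained sketch. It uses fixed-size character windows with overlap, which is one common strategy; whether the package chunks this way is not stated here, and `chunk_text`, `chunk_size`, and `overlap` are hypothetical names.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
# chunks == ["abcd", "cdef", "efgh", "ghij"]
```

Overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of embedding some text twice.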

Step 3: Retrieval and Question Answering (RAG)

A query is executed against the indexed documents to retrieve relevant context before generating a final answer using an LLM (not included in this core module).

from pydantic_ai_ragstack import retrieval
# Query the system
query = "What are the main features of the RAG framework?"
search_results = retrieval.search(query, collection_name="my_knowledge_base")

if search_results:
    print("--- Retrieved Context ---")
    for result in search_results:
        print(f"Score: {result.score:.2f}\nContent: {result.content[:100]}...")
    
    # Pass context and original query to an LLM call here
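
As a rough sketch of that final hand-off, the retrieved chunks can be folded into a single prompt string before calling an LLM. The `RetrievedChunk` dataclass and `build_prompt` function are hypothetical stand-ins; only the `score` and `content` attributes mirror the fields used in the snippet above, and the prompt wording is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievedChunk:
    """Stand-in for a search result exposing score and content."""

    score: float
    content: str


def build_prompt(query: str, results: List[RetrievedChunk], max_chunks: int = 3) -> str:
    """Keep the highest-scoring chunks and place them ahead of the question."""
    ranked = sorted(results, key=lambda r: r.score, reverse=True)[:max_chunks]
    context = "\n\n".join(
        f"[{i}] {r.content}" for i, r in enumerate(ranked, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


prompt = build_prompt(
    "What does the package validate?",
    [
        RetrievedChunk(0.42, "Chunks are embedded."),
        RetrievedChunk(0.91, "Schemas are validated."),
    ],
)
```

The resulting string can be sent to any LLM client; numbering the chunks makes it possible to ask the model to cite which passage it relied on.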

🧪 Testing

To run unit tests, use pytest:

pip install pytest
pytest tests/

Project details

This version: 0.1.0 (May 30, 2026)

Download files

Source Distribution

pydantic_ai_ragstack-0.1.0.tar.gz (339.9 kB)

Uploaded May 30, 2026 Source

Built Distribution

pydantic_ai_ragstack-0.1.0-py3-none-any.whl (43.9 kB)

Uploaded May 30, 2026 Python 3

File details

Details for the file pydantic_ai_ragstack-0.1.0.tar.gz.

File metadata

Download URL: pydantic_ai_ragstack-0.1.0.tar.gz
Upload date: May 30, 2026
Size: 339.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.17

File hashes

Hashes for pydantic_ai_ragstack-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e656672599fae303a8537f8ab0234ac51ca4403baa79a68ecefee72ad8908754`
MD5	`9598674c10a80c5fd1fab537cc21d053`
BLAKE2b-256	`6b6d62ae8912793f46187712c64bf90bd5d0faaf561f14be897f705fc23e017b`
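
A downloaded file can be checked against the SHA256 digest listed above with a few lines of standard-library Python. This is a minimal sketch; `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the file is assumed to be in the current directory when the check is run.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, block_size: int = 65536) -> str:
    """Hash the file in blocks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare the computed digest with the published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, after downloading the source distribution, `matches_published_hash(Path("pydantic_ai_ragstack-0.1.0.tar.gz"), "e656672599fae303a8537f8ab0234ac51ca4403baa79a68ecefee72ad8908754")` should return True for an intact file.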

File details

Details for the file pydantic_ai_ragstack-0.1.0-py3-none-any.whl.

File metadata

Download URL: pydantic_ai_ragstack-0.1.0-py3-none-any.whl
Upload date: May 30, 2026
Size: 43.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.17

File hashes

Hashes for pydantic_ai_ragstack-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`88966734089a403e480fa58184312120e3b5dbd29e6a4cda97194272d2844ad5`
MD5	`077d026fdd585349c81f96c888c45c99`
BLAKE2b-256	`2f288e481f8062f2cb880f3e1c294465b5d6a13912ca4f0f3d5df28021ddef11`
