A standalone RAG package for Pydantic AI with support for multiple vector stores.
Pydantic AI RAG Stack
pydantic-ai-ragstack is a structured framework for building Retrieval-Augmented Generation (RAG) systems. It uses Pydantic models to enforce strict data schemas at every stage of the pipeline (document ingestion, embedding, storage, and retrieval), so malformed data is rejected at validation time instead of surfacing later as hard-to-trace errors. This makes AI applications built on proprietary knowledge bases more reliable and predictable.
🚀 Features
- Structured Data Modeling: Uses Pydantic models (Document, DocumentPage, etc.) to define explicit schemas for all data components (metadata, content, images).
- End-to-End Ingestion Pipeline: Handles document loading, chunking, image extraction, and metadata enrichment.
- Vector Store Integration: Manages interaction with various vector databases (pydantic_ai_ragstack/vectorstore.py).
- Advanced Retrieval: Includes capabilities for content re-ranking (pydantic_ai_ragstack/reranker.py) to improve search relevance.
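Re-ranking is a second pass over the candidates returned by vector search: a separate scorer re-scores each (query, passage) pair and the list is re-sorted. The sketch below is not the package's reranker.py. It swaps in a simple lexical-overlap scorer as a stand-in for the cross-encoder or LLM-based scorer a real re-ranker would typically use, and the Candidate type, lexical_overlap, and rerank names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical search hit: passage text plus its first-stage (vector) score."""
    content: str
    score: float


def _tokens(text: str) -> set[str]:
    # Lowercased word tokens; a crude stand-in for a real tokenizer
    return set(re.findall(r"\w+", text.lower()))


def lexical_overlap(query: str, passage: str) -> float:
    # Fraction of query terms that also appear in the passage (0.0 to 1.0)
    q = _tokens(query)
    if not q:
        return 0.0
    return len(q & _tokens(passage)) / len(q)


def rerank(query: str, candidates: list[Candidate], top_n: int = 3,
           alpha: float = 0.5) -> list[Candidate]:
    """Blend the first-stage score with the second-stage score, re-sort, keep top_n.

    alpha weights the second-stage scorer; alpha=1.0 ignores the vector score.
    """
    rescored = [
        Candidate(c.content, (1 - alpha) * c.score + alpha * lexical_overlap(query, c.content))
        for c in candidates
    ]
    return sorted(rescored, key=lambda c: c.score, reverse=True)[:top_n]
```

Blending rather than replacing the first-stage score keeps a passage that was a strong semantic match from being discarded just because it shares few literal terms with the query.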
🛠️ Installation
This project requires Python 3.8+ and several dependencies (e.g., LangChain, PyMuPDF, vector store clients).
# Install the published package from PyPI
pip install pydantic-ai-ragstack
# Or, from a source checkout that includes requirements.txt
pip install -r requirements.txt
📂 Project Structure Overview
The core logic is organized within the pydantic_ai_ragstack package:
- models.py: Defines all core data structures (e.g., Document, DocumentPageChunk, SearchResult) using Pydantic models.
- ingestion.py: Manages the entire lifecycle of ingesting raw documents into structured chunks and vector embeddings.
- documents.py: Handles document loading and parsing (PDF, etc.).
- embeddings.py: Manages the generation of vector embeddings for content chunks.
- retrieval.py: Contains logic for querying the vector store and retrieving relevant documents/chunks.
- vectorstore.py: Abstract layer for interacting with underlying vector databases (e.g., Chroma, Pinecone).
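An abstraction layer over several vector databases is commonly expressed in Python as a typing.Protocol that each backend satisfies. The sketch below assumes a minimal add/search interface; the VectorStore protocol, InMemoryVectorStore class, and their method names are hypothetical and not the package's actual vectorstore.py API. The in-memory backend uses brute-force cosine similarity.

```python
from __future__ import annotations

import math
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical minimal interface that a Chroma or Pinecone adapter could satisfy."""

    def add(self, ids: list[str], vectors: list[list[float]], texts: list[str]) -> None: ...

    def search(self, query_vector: list[float], k: int = 4) -> list[tuple[str, float, str]]: ...


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical brute-force backend: fine for tests, O(n) work per query."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[list[float], str]] = {}

    def add(self, ids: list[str], vectors: list[list[float]], texts: list[str]) -> None:
        for id_, vec, text in zip(ids, vectors, texts):
            self._rows[id_] = (vec, text)  # re-adding an existing id overwrites it

    def search(self, query_vector: list[float], k: int = 4) -> list[tuple[str, float, str]]:
        # Score every stored vector, then return the k best as (id, score, text)
        scored = [(id_, cosine(query_vector, vec), text) for id_, (vec, text) in self._rows.items()]
        scored.sort(key=lambda row: row[1], reverse=True)
        return scored[:k]
```

Because Protocol uses structural typing, a backend class does not need to inherit from VectorStore; any class with matching add and search methods type-checks against it.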
🧠 Usage Example: Ingestion and Retrieval Flow
The typical workflow involves three main stages: Load -> Index -> Query.
Step 1: Document Loading & Processing
Raw documents are loaded and parsed into structured Document objects.
from pydantic_ai_ragstack import documents
# Assumes 'path/to/document.pdf' exists
structured_data = documents.load("path/to/document.pdf")  # list[Document]
Step 2: Indexing (Embedding and Storing)
The structured data is chunked, embedded, and stored in the vector store.
from pydantic_ai_ragstack import ingestion
# structured_data comes from Step 1; MyEmbeddings is a placeholder for your configured embedding model
status = ingestion.index_documents(structured_data, embedder=MyEmbeddings())
if status == "done":
    print("Indexing complete. Data is ready for retrieval.")
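Indexing comes down to splitting text into overlapping chunks and turning each chunk into a fixed-length vector. The sketch below shows both steps using only the standard library. chunk_text is a hypothetical character-window chunker, and HashingEmbedder is a toy feature-hashing embedder standing in for MyEmbeddings. Neither is the package's ingestion.index_documents; in practice a learned embedding model would replace the toy embedder.

```python
from __future__ import annotations

import hashlib
import math
import re


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Hypothetical chunker: windows of `size` characters overlapping by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


class HashingEmbedder:
    """Hypothetical toy embedder: hashes word tokens into `dim` buckets, then L2-normalizes."""

    def __init__(self, dim: int = 256) -> None:
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in re.findall(r"\w+", text.lower()):
            # blake2b is stable across runs, unlike the per-process salted built-in hash()
            digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
            vec[int.from_bytes(digest, "big") % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk when it is shorter than the overlap, which helps retrieval at the cost of storing some text twice.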
Step 3: Retrieval and Question Answering (RAG)
A query is executed against the indexed documents to retrieve relevant context before generating a final answer using an LLM (not included in this core module).
from pydantic_ai_ragstack import retrieval
# Query the system
query = "What are the main features of the RAG framework?"
search_results = retrieval.search(query, collection_name="my_knowledge_base")
if search_results:
    print("--- Retrieved Context ---")
    for result in search_results:
        print(f"Score: {result.score:.2f}\nContent: {result.content[:100]}...")
    # Pass context and original query to an LLM call here
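The retrieval example leaves the LLM call open. A common pattern is to number the retrieved chunks, stop at a rough context budget, and place them above the question in the prompt. The sketch below assumes only that each result exposes .score and .content, as the loop above does; SearchHit and build_prompt are hypothetical helpers, and the returned string could be passed to whatever LLM client you use, such as a Pydantic AI agent.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SearchHit:
    """Hypothetical stand-in for a search result exposing .score and .content."""
    score: float
    content: str


def build_prompt(query: str, hits: list[SearchHit], max_chars: int = 2000,
                 min_score: float = 0.0) -> str:
    """Hypothetical helper: format retrieved chunks as numbered context above the question."""
    blocks: list[str] = []
    used = 0
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.score < min_score:
            break  # hits are sorted, so every remaining hit scores lower
        if used + len(hit.content) > max_chars:
            break  # crude character budget as a proxy for a token limit
        blocks.append(f"[{len(blocks) + 1}] {hit.content}")
        used += len(hit.content)
    context = "\n\n".join(blocks) if blocks else "(no relevant context found)"
    return (
        "Answer the question using only the context below. "
        "Cite sources by their [number].\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

Numbering the chunks lets the model cite which passage supports each claim, and the explicit empty-context message gives it a clear signal to say it does not know instead of guessing.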
🧪 Testing
To run unit tests, use pytest:
pip install pytest
pytest tests/
Download files
File details
Details for the file pydantic_ai_ragstack-0.1.0.tar.gz.
File metadata
- Download URL: pydantic_ai_ragstack-0.1.0.tar.gz
- Size: 339.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e656672599fae303a8537f8ab0234ac51ca4403baa79a68ecefee72ad8908754 |
| MD5 | 9598674c10a80c5fd1fab537cc21d053 |
| BLAKE2b-256 | 6b6d62ae8912793f46187712c64bf90bd5d0faaf561f14be897f705fc23e017b |
File details
Details for the file pydantic_ai_ragstack-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pydantic_ai_ragstack-0.1.0-py3-none-any.whl
- Size: 43.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 88966734089a403e480fa58184312120e3b5dbd29e6a4cda97194272d2844ad5 |
| MD5 | 077d026fdd585349c81f96c888c45c99 |
| BLAKE2b-256 | 2f288e481f8062f2cb880f3e1c294465b5d6a13912ca4f0f3d5df28021ddef11 |
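The SHA256 digests listed for each file let you check that a downloaded archive was not corrupted or altered. A minimal sketch using Python's hashlib, reading the file in blocks so large archives are not loaded into memory at once; file_sha256 and verify are illustrative helper names, and the script assumes the wheel sits in the current directory.

```python
import hashlib
from pathlib import Path


def file_sha256(path, block_size=1 << 16):
    """Illustrative helper: hex SHA256 digest of a file, read in 64 KiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Illustrative helper: True if the file matches the published digest (case-insensitive)."""
    return file_sha256(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumes the wheel was downloaded into the current working directory
    wheel = Path("pydantic_ai_ragstack-0.1.0-py3-none-any.whl")
    expected = "88966734089a403e480fa58184312120e3b5dbd29e6a4cda97194272d2844ad5"
    if wheel.exists():
        print("OK" if verify(wheel, expected) else "MISMATCH")
```

pip can enforce the same check at install time through its hash-checking mode, using --hash=sha256:... entries in a requirements file together with --require-hashes.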