A standalone RAG package for Pydantic AI with support for multiple vector stores.

Pydantic AI RAG Stack

pydantic-ai-ragstack is a structured framework for building Retrieval Augmented Generation (RAG) systems. It uses the Pydantic library to enforce strict data schemas across the whole pipeline: document ingestion, embedding, storage, and retrieval. This keeps the data passed between stages predictable when building AI applications on proprietary knowledge bases.

🚀 Features

  • Structured Data Modeling: Uses Pydantic models (Document, DocumentPage, etc.) to define explicit schemas for all data components (metadata, content, images).
  • End-to-End Ingestion Pipeline: Handles document loading, chunking, image extraction, and metadata enrichment.
  • Vector Store Integration: Manages interaction with various vector databases (pydantic_ai_ragstack/vectorstore.py).
  • Advanced Retrieval: Includes capabilities for content re-ranking (pydantic_ai_ragstack/reranker.py) to improve search relevance.
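
To illustrate what a re-ranking step does in general, here is a minimal sketch using only the standard library. It is not the code in pydantic_ai_ragstack/reranker.py: the Candidate class, the rerank function, and the weight parameter are hypothetical names chosen for this example, and the term-overlap heuristic stands in for whatever scoring the package actually applies.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical retrieval hit: text plus the vector-store similarity score."""
    content: str
    score: float


def rerank(query: str, candidates: list[Candidate], weight: float = 0.5) -> list[Candidate]:
    """Re-order candidates by blending the original score with query-term overlap."""
    query_terms = set(query.lower().split())

    def blended(c: Candidate) -> float:
        terms = set(c.content.lower().split())
        # Fraction of query terms that also appear in the candidate text.
        overlap = len(query_terms & terms) / len(query_terms) if query_terms else 0.0
        return (1 - weight) * c.score + weight * overlap

    return sorted(candidates, key=blended, reverse=True)


hits = [
    Candidate("Vector stores hold embeddings.", score=0.80),
    Candidate("Re-ranking improves search relevance.", score=0.75),
]
reranked = rerank("how does re-ranking improve relevance", hits)
```

In this toy run the second hit moves ahead of the first, because it shares a term with the query even though its raw similarity score is lower.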

🛠️ Installation

This project requires Python 3.8+ and several dependencies (e.g., LangChain, PyMuPDF, vector store clients).

# Install the published package from PyPI
pip install pydantic-ai-ragstack

# Or, from a source checkout that includes a requirements.txt
pip install -r requirements.txt

📂 Project Structure Overview

The core logic is organized within the pydantic_ai_ragstack package:

  • models.py: Defines all core data structures (e.g., Document, DocumentPageChunk, SearchResult) using Pydantic models.
  • ingestion.py: Manages the entire lifecycle of ingesting raw documents into structured chunks and vector embeddings.
  • documents.py: Handles document loading and parsing (PDF, etc.).
  • embeddings.py: Manages the generation of vector embeddings for content chunks.
  • retrieval.py: Contains logic for querying the vector store and retrieving relevant documents/chunks.
  • vectorstore.py: Abstraction layer for interacting with the underlying vector databases (e.g., Chroma, Pinecone).
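
The idea of an abstraction layer over vector databases can be sketched as follows. This is an assumption-laden illustration, not the interface defined in vectorstore.py: the VectorStore protocol, its add and query methods, and the InMemoryStore backend are all hypothetical names, and the brute-force cosine search stands in for a real database client.

```python
from __future__ import annotations

import math
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical interface; the package's real abstraction may differ."""

    def add(self, doc_id: str, vector: list[float]) -> None: ...

    def query(self, vector: list[float], top_k: int) -> list[tuple[str, float]]: ...


def _cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Toy backend that satisfies VectorStore with brute-force cosine search."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def add(self, doc_id: str, vector: list[float]) -> None:
        self._vectors[doc_id] = vector

    def query(self, vector: list[float], top_k: int) -> list[tuple[str, float]]:
        scored = [(doc_id, _cosine(vector, v)) for doc_id, v in self._vectors.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Calling code depends only on the protocol, so backends can be swapped.
store: VectorStore = InMemoryStore()
store.add("a", [1.0, 0.0])
store.add("b", [0.0, 1.0])
top = store.query([0.9, 0.1], top_k=1)
```

Typing the caller against a protocol rather than a concrete client is one way to keep retrieval code unchanged when moving between backends such as Chroma and Pinecone.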

🧠 Usage Example: Ingestion and Retrieval Flow

The typical workflow involves three main stages: Load -> Index -> Query.

Step 1: Document Loading & Processing

Raw documents are loaded and parsed into structured Document objects.

from pydantic_ai_ragstack import documents

# Assumes 'path/to/document.pdf' exists.
raw_docs = documents.load("path/to/document.pdf")
structured_data = raw_docs  # list[Document]

Step 2: Indexing (Embedding and Storing)

The structured data is chunked, embedded, and stored in the vector store.

from pydantic_ai_ragstack import ingestion

# MyEmbeddings is a placeholder for your configured embedding model class;
# it is not defined in this snippet.
status = ingestion.index_documents(structured_data, embedder=MyEmbeddings())

if status == "done":
    print("Indexing complete. Data is ready for retrieval.")
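
The chunking part of this step can be pictured with a small standalone sketch. It assumes fixed-size character windows with overlap, which is a common strategy but not necessarily the one ingestion.py uses; chunk_text, chunk_size, and overlap are hypothetical names introduced here.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
# chunks == ["abcd", "cdef", "efgh", "ghij"]
```

Overlap keeps a sentence that straddles a window boundary visible in both neighbouring chunks, at the cost of storing some text twice.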

Step 3: Retrieval and Question Answering (RAG)

A query is executed against the indexed documents to retrieve relevant context, which is then passed to an LLM to generate the final answer. The LLM call itself is not included in this core module.

from pydantic_ai_ragstack import retrieval
# Query the system
query = "What are the main features of the RAG framework?"
search_results = retrieval.search(query, collection_name="my_knowledge_base")

if search_results:
    print("--- Retrieved Context ---")
    for result in search_results:
        print(f"Score: {result.score:.2f}\nContent: {result.content[:100]}...")
    
    # Pass context and original query to an LLM call here
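
One way to prepare the retrieved context for the LLM call is sketched below. The SearchResult dataclass here is a stand-in that only mirrors the score and content attributes used in the snippet above; build_prompt, max_chars, and the prompt wording are hypothetical and not part of the package.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SearchResult:
    """Stand-in with the two attributes used above (score, content)."""
    score: float
    content: str


def build_prompt(query: str, results: list[SearchResult], max_chars: int = 2000) -> str:
    """Place the highest-scoring passages, within a character budget, ahead of the question."""
    parts = []
    used = 0
    ranked = sorted(results, key=lambda r: r.score, reverse=True)
    for i, result in enumerate(ranked, start=1):
        passage = f"[{i}] {result.content.strip()}"
        if used + len(passage) > max_chars:
            break  # stop once the context budget is exhausted
        parts.append(passage)
        used += len(passage)
    context = "\n".join(parts)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


results = [
    SearchResult(0.42, "Chunks are embedded."),
    SearchResult(0.91, "Documents are parsed into pages."),
]
prompt = build_prompt("How are documents processed?", results)
```

The resulting string can be sent to whichever LLM client you use; numbering the passages makes it easier to ask the model to cite which one it relied on.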

🧪 Testing

To run unit tests, use pytest:

pip install pytest
pytest tests/

Download files

Source Distribution: pydantic_ai_ragstack-0.1.0.tar.gz (339.9 kB)

Built Distribution: pydantic_ai_ragstack-0.1.0-py3-none-any.whl (43.9 kB, Python 3)