Practical, robust structured generation with retries using Pydantic schemas

Project description

schema_agent

Practical, robust structured generation for LLMs using Pydantic schemas. Provide a schema, a prompt, and a model; get back a validated BaseModel instance with automatic retries when validation fails.

Note: This is a minimalist, experimental package that does nearly the same thing as Instructor, with some slight differences in implementation design. If you need this for production, I recommend using Instructor.

Features

  • Schema-first: define your output as a Pydantic model
  • Automatic retries: validates via a tool call and re-prompts on failure
  • Provider-agnostic: accepts LangChain-compatible models or a provider string (e.g. "openai:gpt-4o-mini")
  • Strong typing: returns a Pydantic instance alongside raw agent traces
  • Simple API: one function generate_with_schema(...)
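
The retry behaviour in the list above can be pictured as a validate-and-re-prompt loop. The following is a minimal, dependency-free sketch of that general idea, not the package's actual implementation: generate_validated, parse_person, and the fake_llm stand-in are hypothetical names, and plain json checks take the place of Pydantic validation and the tool call.

```python
import json
from typing import Callable


def parse_person(raw: str) -> dict:
    """Hypothetical validator: parse JSON and check the expected fields."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("'name' must be a string")
    if not isinstance(data.get("age"), int):
        raise ValueError("'age' must be an integer")
    return data


def generate_validated(
    prompt: str,
    llm: Callable[[str], str],
    validate: Callable[[str], dict],
    max_retries: int = 2,
) -> dict:
    """Call `llm`, validate the reply, and re-prompt with the error on failure."""
    current_prompt = prompt
    for attempt in range(max_retries + 1):
        raw = llm(current_prompt)
        try:
            return {"output": validate(raw), "success": True, "retries": attempt}
        except ValueError as err:
            # Feed the validation error back so the next attempt can correct it.
            current_prompt = f"{prompt}\n\nPrevious answer was invalid: {err}"
    return {"output": None, "success": False, "retries": max_retries}


# Stand-in for a model: the first reply has a wrong type, the second is valid.
replies = iter(['{"name": "John Doe", "age": "42"}', '{"name": "John Doe", "age": 42}'])


def fake_llm(prompt: str) -> str:
    return next(replies)


result = generate_validated(
    "His name was John Doe and he was 42 years old", fake_llm, parse_person
)
print(result)
```

The returned dict mirrors the output, success, and retries keys used in the usage examples, but how the real package counts retries is an assumption here.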

Usage

Install from PyPI:

pip install schema_agent
# Optional OpenAI support (needed to run scripts/demo.py as-is)
pip install "schema_agent[openai]"

Basic example:

from pydantic import BaseModel, Field
from schema_agent import generate_with_schema

class Person(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years")

resp = generate_with_schema(
    user_prompt="His name was John Doe and he was 42 years old",
    llm="openai:gpt-4o-mini",   # or pass a LangChain model instance
    schema=Person,
    max_retries=2,
)

# Validated Pydantic instance
print(resp["output"])        # -> Person(name='John Doe', age=42)
print(resp["success"])      # -> True/False
print(resp["retries"])      # -> number of retries performed
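
Because resp["success"] can be False, it may be worth checking it before using resp["output"]. The helper below is a small sketch of that pattern; unwrap_output is a hypothetical name, not part of schema_agent, and it relies only on the three keys shown above.

```python
from typing import Any, Mapping


def unwrap_output(resp: Mapping[str, Any]) -> Any:
    """Return resp["output"], or raise if the run did not succeed.

    Hypothetical helper; assumes only the "output", "success", and
    "retries" keys shown in the basic example.
    """
    if not resp["success"]:
        raise RuntimeError(
            f"structured generation failed after {resp['retries']} retries"
        )
    return resp["output"]


# Plain dicts stand in for real return values here.
ok = {"output": {"name": "John Doe", "age": 42}, "success": True, "retries": 0}
failed = {"output": None, "success": False, "retries": 2}

print(unwrap_output(ok))
try:
    unwrap_output(failed)
except RuntimeError as err:
    print(err)
```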

With a LangChain model object:

from langchain_openai import ChatOpenAI
from schema_agent import generate_with_schema

llm = ChatOpenAI(model="gpt-4o-mini")
resp = generate_with_schema(
    user_prompt="His name was John Doe and he was 42 years old",
    llm=llm,
    schema=Person,
    max_retries=2,
)

With a validation callback (example that extracts a phone number from a large text):

# A callback rejects an output by raising; this one would suit the Person schema
def validate_output(x: str | dict) -> None:
    if x["name"] != "John Doe":
        raise ValueError("Name is not John Doe")

class PhoneNumber(BaseModel):
    phone_number: str = Field(description="Phone number")

# Stand-in for a large text that contains a phone number
large_text = "... call John Doe at 555-0100 ..."

def check_phone_number_in_data(x: str | dict) -> None:
    if x["phone_number"] not in large_text:
        raise ValueError("Extracted attribute 'phone_number' not found in data")

resp = generate_with_schema(
    user_prompt=large_text,
    llm=llm,
    schema=PhoneNumber,  # must match the fields the callback inspects
    max_retries=2,
    validation_callback=check_phone_number_in_data,
)
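
A plain substring test like the one above can reject a correct answer if the model reformats the number (for instance, drops spaces or dashes). One possible workaround is to compare digits only. This is a sketch, assuming the callback receives a dict-like output and signals failure by raising, as in the example above; make_phone_check and digits_only are hypothetical names.

```python
import re
from typing import Callable, Mapping


def digits_only(text: str) -> str:
    """Strip everything except ASCII digits."""
    return re.sub(r"[^0-9]", "", text)


def make_phone_check(source_text: str) -> Callable[[Mapping[str, str]], None]:
    """Build a callback that accepts a number if its digits occur in the source."""
    source_digits = digits_only(source_text)

    def check(x: Mapping[str, str]) -> None:
        candidate = digits_only(x["phone_number"])
        if not candidate or candidate not in source_digits:
            raise ValueError("Extracted attribute 'phone_number' not found in data")

    return check


check = make_phone_check("Reach John Doe at (555) 010-0199 after 9 am.")
check({"phone_number": "555-010-0199"})  # passes silently
```

Such a callback could then be passed as validation_callback=make_phone_check(large_text). Note the trade-off: concatenating all digits in the source is more lenient than a substring match and can accept a number that spans two adjacent numeric tokens.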

Run the demo script:

pixi run demo

Notes:

  • Set OPENAI_API_KEY in your environment if using OpenAI (e.g., via a .env file when installing the openai extra).
  • On unexpected tool errors the call raises an exception; expected validation failures are retried up to max_retries.
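
A script can check for the key before calling the model, so that a missing variable fails with a clear message. A minimal sketch; require_env is a hypothetical helper, not part of schema_agent.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running")
    return value


# Typical use at the top of a script:
#     require_env("OPENAI_API_KEY")
```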

Project Structure

  • schema_agent/: Package logic
    • llm.py: generate_with_schema agent orchestration and validation tool
    • str.py: schema-to-example string utilities
    • utils.py, errors.py, consts.py, types.py: helpers, exceptions, prompts, typings
  • tests/: Unit tests for all modules
  • scripts/: demo.py script

Development

This package has been created with pymc-labs/project-starter. It features:

  • 📦 pixi for dependency and environment management.
  • 🧹 pre-commit for formatting, spellcheck, etc. If everyone uses the same standard formatting, then PRs won't have flaky formatting updates that distract from the actual contribution. Reviewing code will be much easier.
  • 🧪 pytest for testing.
  • 🔄 GitHub Actions for running the pre-commit checks on each PR, automated testing, and dependency management (Dependabot). Merges to main publish to PyPI via trusted publishing.

Prerequisites

Get started

  1. Run pixi install to install the dependencies.
  2. Run pixi r test to run the tests.
  3. Run pre-commit install to set up pre-commit hooks.


Download files

Source Distribution

schema_agent-0.1.4.tar.gz (55.7 kB)

Uploaded Source

Built Distribution

schema_agent-0.1.4-py3-none-any.whl (16.9 kB)

Uploaded Python 3

File details

Details for the file schema_agent-0.1.4.tar.gz.

File metadata

  • Download URL: schema_agent-0.1.4.tar.gz
  • Upload date:
  • Size: 55.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for schema_agent-0.1.4.tar.gz
  • SHA256: 19cb8fad144b7b2bd36dc5fdf7739d6bf375ed229e33cc9a3ce4c33f213614d4
  • MD5: 49ccafd94aec7351dde6e6d47c1f9ceb
  • BLAKE2b-256: 5ec4b49f62b4fdc967c1ab688985bc4dfbedf75d94a188a52e0d63e81b56ab6f
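
The published SHA256 digest can be checked against a downloaded copy using the standard library. A short sketch; sha256_of and verify are hypothetical helpers, and the file path is assumed to be wherever you saved the archive.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for schema_agent-0.1.4.tar.gz
EXPECTED_SHA256 = "19cb8fad144b7b2bd36dc5fdf7739d6bf375ed229e33cc9a3ce4c33f213614d4"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's digest matches the expected value."""
    return sha256_of(path) == expected.lower()


# Example (path is an assumption):
#     verify(Path("schema_agent-0.1.4.tar.gz"))
```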

Provenance

The following attestation bundles were made for schema_agent-0.1.4.tar.gz:

Publisher: publish.yml on ulfaslak/schema-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file schema_agent-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: schema_agent-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 16.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for schema_agent-0.1.4-py3-none-any.whl
  • SHA256: 0c88ef2eb75efb6a58c7f96a83dea26278c03cf71859535a153591e9e16ad9cb
  • MD5: 772a4bb5e047812484b9fa97defbe074
  • BLAKE2b-256: 025174aef56914891632206702dd225393ed97c769ed2fac3909339bdd016596

Provenance

The following attestation bundles were made for schema_agent-0.1.4-py3-none-any.whl:

Publisher: publish.yml on ulfaslak/schema-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
