Batteries-included agent harness for Python — tool-calling, sandboxed execution, multi-agent teams, and unlimited context on Pydantic AI

Pydantic Deep Agents

The batteries-included deep agent harness for Python.
Terminal AI assistant out of the box — or build production agents with one function call.

Docs · PyPI · CLI · Framework · DeepResearch · Examples

Python 3.10+ · License: MIT


What's New

  • 2026-04-12  v0.3.8 — Stuck loop detection, context limit warnings for the model, expanded context file discovery (CLAUDE.md, .cursorrules, etc.), eviction & orphan repair migrated to capabilities hooks.
  • 2026-04-11  v0.3.6 — One-command installer + self-update: curl -fsSL .../install.sh | bash installs everything automatically. New pydantic-deep update command. Startup update notifications with 24-hour PyPI cache.
  • 2026-04-10  v0.3.5 — Headless runner (pydantic-deep run), Docker sandbox with named workspaces, browser automation via Playwright, Harbor adapter for Terminal Bench evaluation.

Full history: CHANGELOG.md


The Agent Harness

Pydantic Deep Agents is an agent harness — the complete infrastructure that wraps an LLM and makes it a functional autonomous agent. The model provides intelligence; the harness provides planning, tools, memory, sandboxed execution, and unlimited context.

🔧 Tool-calling: File read/write/edit, shell execution, glob, grep, web search, web fetch, browser automation, all wired up and ready.
🧠 Persistent memory: MEMORY.md persists across sessions. Auto-injected into the system prompt. Each agent has isolated memory by default.
♾️ Unlimited context: Auto-summarization when approaching the token budget. LLM-based or zero-cost sliding window. Never hits a context wall.
🤝 Multi-agent / swarm: Spawn subagents for parallel workstreams. Shared TODO lists with claiming. Peer-to-peer message bus. Full team coordination.
🐳 Sandboxed execution: Docker sandbox with named workspaces. Installed packages persist between sessions. Project dir mounted at /workspace.
🗂️ Plan Mode: Dedicated planner subagent asks clarifying questions and structures the work before execution begins. Headless-compatible.
🔖 Checkpoints: Save conversation state at any point. Rewind to any checkpoint. Fork sessions to explore alternative approaches.
📚 Skills system: Domain-specific knowledge loaded on demand from SKILL.md files. Built-in skills: code-review, refactor, test-writer, git-workflow, and more.
🔌 MCP: Connect any Model Context Protocol server via pydantic-ai's native MCP capability.
⚡ Lifecycle hooks: Claude Code-style PRE_TOOL_USE / POST_TOOL_USE hooks. Shell commands or Python handlers. Audit logging, safety gates.
📐 Structured output: Type-safe Pydantic model responses via output_type. No JSON parsing. No dict["key"]. Full IDE autocomplete.
🔄 Stuck loop detection: Detects repeated identical tool calls, A-B-A-B alternating patterns, and no-op calls. Warns the model or stops the run.
⚠️ Context limit warnings: Model receives URGENT/CRITICAL warnings when approaching context limits (70%), well before auto-compression (90%).
💰 Cost tracking: Real-time token and USD cost tracking per run and cumulative. Hard budget limits with BudgetExceededError.
✨ Self-improving: /improve analyzes past sessions and proposes updates to MEMORY.md, SOUL.md, and AGENTS.md.
🏷️ 100% type-safe: Pyright strict + MyPy strict. 100% test coverage. Every public API is fully typed and safe to use in production.

Built natively on pydantic-ai — uses the Capabilities API directly, inherits all pydantic-ai streaming, multi-model support, and Pydantic validation automatically.


🖥️ CLI — Terminal AI Assistant

A Claude Code-style terminal AI assistant that works with any model and any provider.

Install (macOS & Linux)

curl -fsSL https://raw.githubusercontent.com/vstorm-co/pydantic-deep/main/install.sh | bash

No Python setup required — the script installs uv and the CLI automatically. Then:

export ANTHROPIC_API_KEY=sk-ant-...
pydantic-deep

Windows / manual: pip install "pydantic-deep[cli]"  ·  Update: pydantic-deep update

Model & Provider Support

Works with any model that supports tool-calling:

Provider               Example models
Anthropic              anthropic:claude-opus-4-6, claude-sonnet-4-6
OpenAI                 openai:gpt-5.4, gpt-4.1
OpenRouter             openrouter:anthropic/claude-opus-4-6 (200+ models)
Google Gemini          google-gla:gemini-2.5-pro
Ollama (local)         ollama:qwen3, ollama:llama3.3
Any OpenAI-compatible  Custom base URL via env

Switch model anytime: pydantic-deep config set model openai:gpt-5.4 or /model in the TUI.

What you get in the TUI

Feature
💬 Streaming chat with tool call visualization
📁 File read / write / edit, shell execution, glob, grep
🧠 Persistent memory and self-improvement across sessions
🗂️ Task planning, plan mode, and subagent delegation
♾️ Context compression for unlimited conversations
🔖 Checkpoints — save, rewind, and fork any session
🌐 Web search & fetch built-in
🖥️ Browser automation via Playwright (--browser)
🐳 Docker sandbox — sandboxed execution with named workspaces
💭 Extended thinking — minimal / low / medium / high / xhigh
💰 Real-time cost and token tracking per session
🛡️ Tool approval dialogs — approve, auto-approve, or deny per tool call
@ @filename file references · !command shell passthrough
/improve, /skills, /diff, /model, /theme, /compact, and more

Usage

# Interactive TUI (default)
pydantic-deep
pydantic-deep tui --model openrouter:anthropic/claude-opus-4-6

# Headless deep agent — benchmarks, CI/CD, scripted automation
pydantic-deep run "Fix the failing test in test_auth.py"
pydantic-deep run --task-file task.md --json
pydantic-deep run "Refactor utils.py" --no-web-search --thinking false

# Docker sandbox — sandboxed execution, project dir mounted at /workspace
pydantic-deep tui --sandbox docker
pydantic-deep tui --workspace ml-env     # named workspace, packages persist

# Browser automation (requires pydantic-deep[browser])
pydantic-deep tui --browser
pydantic-deep run "Go to example.com and summarize the content" --browser

# Config & skills
pydantic-deep config set model anthropic:claude-sonnet-4-6
pydantic-deep skills list
pydantic-deep update                     # update to latest version

See CLI docs for the full reference.


🐍 Framework — Build Your Own Agent

pip install pydantic-deep

One function call gives you a production deep agent with planning, tool-calling, multi-agent delegation, persistent memory, unlimited context, and cost tracking. Everything is a toggle:

from pydantic_ai_backends import StateBackend
from pydantic_deep import create_deep_agent, create_default_deps

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    include_todo=True,          # Task planning with subtasks and dependencies
    include_subagents=True,     # Multi-agent swarm — delegate to subagents
    include_skills=True,        # Domain-specific skills from SKILL.md files
    include_memory=True,        # Persistent memory across sessions
    include_plan=True,          # Structured planning before execution
    include_teams=True,         # Agent teams with shared TODO lists + message bus
    web_search=True,            # Tool-calling: web search
    web_fetch=True,             # Tool-calling: web fetch
    thinking="high",            # Extended thinking / reasoning effort
    context_manager=True,       # Unlimited context via auto-summarization
    cost_tracking=True,         # Token/USD budget enforcement
    include_checkpoints=True,   # Save, rewind, and fork conversations
)

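deps = create_default_deps(StateBackend())
# `await` needs an async context: call this from inside an async function
# started with asyncio.run(), or from a REPL/notebook with top-level await.
result = await agent.run("Build a REST API for user auth", deps=deps)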

Structured Output

Type-safe responses with Pydantic models — no JSON parsing, no dict["key"]:

from pydantic import BaseModel
from pydantic_deep import create_deep_agent

class CodeReview(BaseModel):
    summary: str
    issues: list[str]
    score: int

agent = create_deep_agent(output_type=CodeReview)
result = await agent.run("Review the auth module", deps=deps)
print(result.output.score)  # fully typed

Multi-Agent Swarm

Spawn isolated subagents for parallel workstreams. Each subagent is a full deep agent with its own tool-calling, memory, and context:

from pydantic_deep import create_deep_agent

agent = create_deep_agent(
    subagents=[
        {
            "name": "researcher",
            "description": "Researches topics using web search",
            "instructions": "Search the web, synthesize findings, cite sources.",
        },
        {
            "name": "code-reviewer",
            "description": "Reviews code for quality, security, and performance",
            "instructions": "Check for security issues, N+1 queries, missing tests...",
        },
    ],
)
# Main agent delegates: task(description="Review auth.py", subagent_type="code-reviewer")

Unlimited Context

Auto-summarization keeps long-running agents within the token budget:

from pydantic_deep import create_deep_agent, create_summarization_processor

processor = create_summarization_processor(
    trigger=("tokens", 100000),  # compress at 100k tokens
    keep=("messages", 20),       # keep last 20 messages verbatim
)
agent = create_deep_agent(history_processors=[processor])
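To make the trigger/keep idea concrete, here is a minimal stdlib-only sketch of the zero-cost sliding-window variant. It is not the library's implementation: the helper names `estimate_tokens` and `compact`, the 4-characters-per-token heuristic, and the placeholder text are all assumptions made for illustration.

```python
from typing import List


def estimate_tokens(messages: List[str]) -> int:
    """Rough token estimate: about 4 characters per token (an assumption)."""
    return sum(len(m) for m in messages) // 4


def compact(messages: List[str], trigger_tokens: int, keep_messages: int) -> List[str]:
    """Drop older messages once the estimated token count reaches the trigger.

    The most recent `keep_messages` messages are kept verbatim; everything
    older is replaced by a single placeholder line.
    """
    if estimate_tokens(messages) < trigger_tokens or len(messages) <= keep_messages:
        return messages
    dropped = len(messages) - keep_messages
    placeholder = f"[{dropped} earlier messages omitted]"
    return [placeholder] + messages[dropped:]


# 50 messages of roughly 100 tokens each, compacted with a small trigger.
history = [f"message {i}: " + "x" * 400 for i in range(50)]
compacted = compact(history, trigger_tokens=1000, keep_messages=20)
print(len(history), "->", len(compacted))
```

An LLM-based variant would replace the placeholder with a generated summary of the dropped messages, at the cost of one extra model call per compaction.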

Claude Code-Style Lifecycle Hooks

from pydantic_deep import Hook, HookEvent, create_deep_agent

agent = create_deep_agent(
    hooks=[
        Hook(
            event=HookEvent.PRE_TOOL_USE,
            # Double quotes inside the shell command so $TOOL_NAME and $TOOL_INPUT can expand
            command='echo "Tool: $TOOL_NAME args: $TOOL_INPUT" >> /tmp/audit.log',
        ),
    ],
)
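The README also mentions Python handlers for audit logging and safety gates, without showing their signature. The sketch below illustrates the general pre-tool-use pattern in plain Python. The `PreToolHandler` type, `run_pre_tool_hooks`, and `deny_recursive_delete` are hypothetical names, not part of pydantic-deep.

```python
from typing import Callable, Dict, List, Optional

# A handler looks at a pending tool call and returns a reason to block it,
# or None to let it through. This signature is hypothetical.
PreToolHandler = Callable[[str, Dict[str, str]], Optional[str]]


def deny_recursive_delete(tool_name: str, tool_input: Dict[str, str]) -> Optional[str]:
    """Example safety gate: refuse shell commands containing 'rm -rf'."""
    if tool_name == "execute" and "rm -rf" in tool_input.get("command", ""):
        return "recursive delete is not allowed"
    return None


def run_pre_tool_hooks(
    handlers: List[PreToolHandler],
    tool_name: str,
    tool_input: Dict[str, str],
    audit_log: List[str],
) -> bool:
    """Log the call, then return True only if no handler blocks it."""
    audit_log.append(f"Tool: {tool_name} args: {tool_input}")
    for handler in handlers:
        reason = handler(tool_name, tool_input)
        if reason is not None:
            audit_log.append(f"Blocked {tool_name}: {reason}")
            return False
    return True


log: List[str] = []
allowed = run_pre_tool_hooks([deny_recursive_delete], "execute", {"command": "ls -la"}, log)
blocked = run_pre_tool_hooks([deny_recursive_delete], "execute", {"command": "rm -rf /"}, log)
print(allowed, blocked, len(log))
```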

MCP Servers

from pydantic_ai.capabilities import MCP
from pydantic_deep import create_deep_agent

agent = create_deep_agent(
    capabilities=[MCP(url="https://mcp.example.com/api")],
)

Context Files

Pydantic Deep Agents auto-discovers and injects project-specific context into every conversation:

File                             Purpose                                              Who Sees It
AGENTS.md                        Project conventions, architecture, instructions      Main agent + all subagents
CLAUDE.md                        Claude Code project instructions                     Main agent + all subagents
SOUL.md                          Agent personality, style, communication preferences  Main agent only
.cursorrules                     Cursor editor conventions                            Main agent only
.github/copilot-instructions.md  GitHub Copilot instructions                          Main agent only
CONVENTIONS.md                   Project coding conventions                           Main agent only
CODING_GUIDELINES.md             Coding guidelines                                    Main agent only
MEMORY.md                        Persistent memory (read/write/update tools)          Per-agent (isolated)

Compatible with Claude Code, Cursor, GitHub Copilot, and other agent frameworks. AGENTS.md follows the agents.md spec.
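As a rough illustration of what auto-discovery could look like, the sketch below scans a project directory for the file names listed in the table and applies the "who sees it" rule for subagents. The function name `discover_context`, the scan order, and the choice to look only in the project root are assumptions. MEMORY.md is left out because it is handled per agent.

```python
import tempfile
from pathlib import Path
from typing import Dict

# File names taken from the table above; the scan order is an assumption.
CONTEXT_FILES = [
    "AGENTS.md",
    "CLAUDE.md",
    "SOUL.md",
    ".cursorrules",
    ".github/copilot-instructions.md",
    "CONVENTIONS.md",
    "CODING_GUIDELINES.md",
]

# Per the table, only these files are shared with subagents.
SHARED_WITH_SUBAGENTS = {"AGENTS.md", "CLAUDE.md"}


def discover_context(root: Path, for_subagent: bool = False) -> Dict[str, str]:
    """Return {relative name: file text} for the context files present under root."""
    found: Dict[str, str] = {}
    for name in CONTEXT_FILES:
        if for_subagent and name not in SHARED_WITH_SUBAGENTS:
            continue
        path = root / name
        if path.is_file():
            found[name] = path.read_text(encoding="utf-8")
    return found


# Demo against a throwaway project directory.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "AGENTS.md").write_text("Use pytest.", encoding="utf-8")
    (project / "SOUL.md").write_text("Be concise.", encoding="utf-8")
    main_ctx = discover_context(project)
    sub_ctx = discover_context(project, for_subagent=True)

print(sorted(main_ctx))
print(sorted(sub_ctx))
```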

See the full API reference for all options.


🔬 DeepResearch — Reference App

A full-featured research deep agent with web UI — built entirely on Pydantic Deep Agents.

  • Plan Mode: planner asks clarifying questions
  • Multi-Agent Swarm: 5 subagents researching in parallel
  • Excalidraw Canvas: live diagrams synced with agent
  • File Browser: workspace files with inline preview

Web search (Tavily, Brave, Jina), sandboxed code execution, Excalidraw diagrams, plan mode, report export.

cd apps/deepresearch && uv sync && cp .env.example .env
uv run deepresearch    # → http://localhost:8080

See apps/deepresearch/README.md for full setup.


Architecture

Pydantic Deep Agents uses pydantic-ai's native Capabilities API for all cross-cutting concerns — hooks, memory, skills, context files, teams, and plan mode are all first-class pydantic-ai capabilities.

Capabilities

Capability                Package                    What It Does
CostTracking              pydantic-ai-shields        Token/USD budget enforcement and real-time cost callbacks
ContextManagerCapability  summarization-pydantic-ai  Unlimited context via auto-summarization
LimitWarnerCapability     summarization-pydantic-ai  URGENT/CRITICAL warnings when context limits approach
StuckLoopDetection        pydantic-deep              Detects and breaks repetitive agent loops
EvictionCapability        pydantic-deep              Intercepts large tool outputs before they enter history
PatchToolCallsCapability  pydantic-deep              Fixes orphaned tool calls/results in history
HooksCapability           pydantic-deep              Claude Code-style PRE/POST_TOOL_USE lifecycle hooks
CheckpointMiddleware      pydantic-deep              Save, rewind, and fork conversation state
WebSearch / WebFetch      pydantic-ai built-in       Tool-calling: web search and URL fetching
SkillsCapability          pydantic-deep              Domain-specific skills from SKILL.md files
MemoryCapability          pydantic-deep              Persistent memory across sessions
TeamCapability            pydantic-deep              Multi-agent swarm: shared TODOs, message bus
PlanCapability            pydantic-deep              Structured planning before execution
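To show what "fixing orphaned tool calls/results" can mean in practice, here is a small stand-alone sketch. The message shape, the function name `repair_orphans`, and the repair policy (drop results with no call, add a placeholder result for calls with no result) are assumptions for illustration and may differ from what PatchToolCallsCapability actually does.

```python
from typing import Dict, List

# Simplified message shape (hypothetical): "kind" is "call", "result", or "text";
# calls and results carry an "id" that links them together.
Message = Dict[str, str]


def repair_orphans(history: List[Message]) -> List[Message]:
    """Return a copy of history where every tool call has a result and vice versa."""
    call_ids = {m["id"] for m in history if m["kind"] == "call"}
    result_ids = {m["id"] for m in history if m["kind"] == "result"}
    repaired: List[Message] = []
    for msg in history:
        if msg["kind"] == "result" and msg["id"] not in call_ids:
            continue  # drop a result whose call is missing
        repaired.append(msg)
        if msg["kind"] == "call" and msg["id"] not in result_ids:
            # add a placeholder result right after an unanswered call
            repaired.append(
                {"kind": "result", "id": msg["id"], "content": "[no result recorded]"}
            )
    return repaired


history = [
    {"kind": "text", "content": "Fix the failing test"},
    {"kind": "call", "id": "1", "content": "read_file test_auth.py"},
    {"kind": "result", "id": "1", "content": "def test_login(): ..."},
    {"kind": "call", "id": "2", "content": "execute pytest"},   # never answered
    {"kind": "result", "id": "9", "content": "stale output"},   # call is missing
]
fixed = repair_orphans(history)
print([(m["kind"], m.get("id")) for m in fixed])
```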

Modular Packages

Every component is a standalone package — use only what you need:

Package                    What It Does
pydantic-ai-backend        File storage, Docker sandbox, console toolset
pydantic-ai-todo           Task planning with subtasks and dependencies
subagents-pydantic-ai      Sync/async delegation, background tasks, cancellation
summarization-pydantic-ai  LLM summaries or zero-cost sliding window
pydantic-ai-shields        Cost tracking, input/output/tool blocking

                         Pydantic Deep Agents
+---------------------------------------------------------------------+
|                                                                     |
|   +----------+ +----------+ +----------+ +----------+ +---------+   |
|   | Planning | |Filesystem| | Subagents| |  Skills  | |  Teams  |   |
|   +----+-----+ +----+-----+ +----+-----+ +----+-----+ +----+----+   |
|        |            |            |            |            |        |
|        +------------+-----+------+------------+------------+        |
|                           |                                         |
|                           v                                         |
|  Summarization --> +------------------+ <-- Capabilities            |
|  Checkpointing --> |    Deep Agent    | <-- Hooks                   |
|  Cost Tracking --> |   (pydantic-ai)  | <-- Memory                  |
|  Loop Detect   --> |                  | <-- Limit Warner            |
|                    +--------+---------+                             |
|                             |                                       |
|           +-----------------+-----------------+                     |
|           v                 v                 v                     |
|    +------------+    +------------+    +------------+               |
|    |   State    |    |   Local    |    |   Docker   |               |
|    |  Backend   |    |  Backend   |    |  Sandbox   |               |
|    +------------+    +------------+    +------------+               |
|                                                                     |
+---------------------------------------------------------------------+

Full Feature List

Tool-Calling

  • ls, read_file, write_file, edit_file, glob, grep, execute — full filesystem access
  • Docker sandbox with named workspaces — sandboxed execution, packages persist between sessions
  • Web search (DuckDuckGo, Tavily, Brave) and web fetch
  • Browser automation via Playwright — navigate, click, type_text, screenshot, execute_js, and more

Deep Agent Architecture

  • Planning — Task tracking with subtasks, dependencies, and cycle detection
  • Subagents / Multi-agent swarm — Sync/async delegation, background task management, soft/hard cancellation
  • Agent Teams — Shared TODO lists with claiming and dependency tracking, peer-to-peer message bus
  • Plan Mode — Dedicated planner subagent for structured planning before execution
  • Persistent memory — MEMORY.md that persists across sessions, auto-injected into system prompt
  • Self-improving: /improve analyzes past sessions and proposes updates to context files
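The planning bullet mentions dependencies and cycle detection. One standard way to get both is a topological sort, sketched here with Python's stdlib `graphlib`. This illustrates the technique only, not pydantic-ai-todo's code; `execution_order` and the task names are made up.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def execution_order(tasks: Dict[str, Set[str]]) -> List[str]:
    """Return task ids in an order that respects dependencies.

    `tasks` maps each task id to the ids it depends on. Raises ValueError
    if the dependencies contain a cycle.
    """
    try:
        return list(TopologicalSorter(tasks).static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes that form the cycle
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


todo = {
    "write-tests": {"implement-api"},
    "implement-api": {"design-schema"},
    "design-schema": set(),
}
order = execution_order(todo)
print(order)
```

Rejecting a cycle when a task is added, instead of when the plan runs, keeps an agent from waiting on tasks that can never become unblocked.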

Context & Memory

  • Unlimited context — Auto-summarization when approaching token budget (LLM-based or sliding window)
  • Context limit warnings — Model receives URGENT/CRITICAL messages when approaching 70% context usage
  • Eviction capability — Intercepts large tool outputs via after_tool_execute before they enter history
  • Context files — Auto-discover and inject AGENTS.md, CLAUDE.md, SOUL.md, .cursorrules, copilot-instructions, CONVENTIONS.md, CODING_GUIDELINES.md
  • Checkpoints — Save state, rewind or fork conversations. In-memory and file-based stores. Per-run isolation via for_run()
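The two thresholds quoted in this README (warnings around 70% usage, auto-compression at 90%) can be pictured as a simple decision function. This is a sketch under that assumption only. `context_action` is a hypothetical name, and how the library splits URGENT from CRITICAL is not stated here, so the sketch does not model it.

```python
from typing import Optional

WARN_AT = 0.70      # warning threshold quoted in this README
COMPRESS_AT = 0.90  # auto-compression threshold quoted in this README


def context_action(used_tokens: int, max_tokens: int) -> Optional[str]:
    """Return "compress", "warn", or None for the current context usage."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    usage = used_tokens / max_tokens
    if usage >= COMPRESS_AT:
        return "compress"
    if usage >= WARN_AT:
        return "warn"
    return None


# Usage at 25%, 75%, and 92.5% of a 200k-token window.
for used in (50_000, 150_000, 185_000):
    print(used, context_action(used, max_tokens=200_000))
```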

Reliability

  • Stuck loop detection — Detects repeated identical calls, A-B-A-B alternating, and no-op patterns. Warns or stops the agent
  • Orphan repair — Fixes orphaned tool calls/results in conversation history before each model request
  • Context limit warnings — Injects URGENT/CRITICAL messages so the model knows to wrap up
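Two of the patterns named above, repeated identical calls and A-B-A-B alternation, are easy to check against a list of recent tool calls. The sketch below is one possible approach, not the library's detector. The `ToolCall` shape, the default of three repeats, and the function name are assumptions, and no-op detection is left out.

```python
from typing import List, Optional, Tuple

ToolCall = Tuple[str, str]  # (tool name, serialized arguments)


def detect_stuck_loop(calls: List[ToolCall], repeats: int = 3) -> Optional[str]:
    """Return a label for the loop pattern found at the end of `calls`, if any."""
    if repeats < 2:
        raise ValueError("repeats must be at least 2")
    # Pattern 1: the same call repeated `repeats` times in a row.
    if len(calls) >= repeats and len(set(calls[-repeats:])) == 1:
        return "repeated"
    # Pattern 2: two distinct calls alternating A-B-A-B.
    if len(calls) >= 4:
        a, b, c, d = calls[-4:]
        if a == c and b == d and a != b:
            return "alternating"
    return None


same = [("read_file", "a.py")] * 3
flip = [("grep", "x"), ("ls", "."), ("grep", "x"), ("ls", ".")]
fine = [("ls", "."), ("read_file", "a.py"), ("edit_file", "a.py")]
print(detect_stuck_loop(same), detect_stuck_loop(flip), detect_stuck_loop(fine))
```

A caller could react to a non-None result by injecting a warning message first and stopping the run only if the pattern persists.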

Production Features

  • MCP — Connect any Model Context Protocol server
  • Lifecycle hooks — Claude Code-style PRE/POST_TOOL_USE. Shell commands or Python handlers
  • Structured output — Type-safe responses with Pydantic models via output_type
  • Cost tracking — Token/USD budgets with automatic enforcement and real-time callbacks
  • Streaming — Full streaming support for real-time responses
  • Image support — Multi-modal analysis with image inputs
  • Human-in-the-loop — Confirmation workflows for sensitive operations
  • Output styles — Built-in (concise, explanatory, formal, conversational) or custom
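Budget enforcement of the kind listed under cost tracking boils down to accumulating a per-request cost and raising once a limit is crossed. The sketch below shows that mechanism in isolation. `CostTracker` and the per-1k-token prices are hypothetical, and the `BudgetExceededError` defined here is a local stand-in, not the library's class.

```python
from dataclasses import dataclass


class BudgetExceededError(RuntimeError):
    """Local stand-in for illustration; not imported from any library."""


@dataclass
class CostTracker:
    budget_usd: float
    usd_per_1k_input: float   # hypothetical price
    usd_per_1k_output: float  # hypothetical price
    total_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one request's cost to the running total and enforce the budget."""
        cost = (
            (input_tokens / 1000) * self.usd_per_1k_input
            + (output_tokens / 1000) * self.usd_per_1k_output
        )
        self.total_usd += cost
        if self.total_usd > self.budget_usd:
            raise BudgetExceededError(
                f"spent ${self.total_usd:.4f} of ${self.budget_usd:.2f} budget"
            )
        return cost


tracker = CostTracker(budget_usd=0.05, usd_per_1k_input=0.003, usd_per_1k_output=0.015)
first = tracker.record(input_tokens=2000, output_tokens=1000)
print(f"first request: ${first:.3f}, running total: ${tracker.total_usd:.3f}")
```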

CLI

  • Interactive TUI (Textual) with streaming, tool visualization, session management
  • Headless runner (pydantic-deep run) for CI/CD, benchmarks, scripted automation
  • 20+ slash commands: /improve, /compact, /diff, /model, /provider, /skills, /theme, and more
  • @filename file references, !command shell passthrough
  • Tool approval dialogs with auto-approve
  • Debug logging per session

Contributing

git clone https://github.com/vstorm-co/pydantic-deepagents.git
cd pydantic-deepagents
make install
make test   # 100% coverage required
make all    # lint + typecheck + test

License

MIT — see LICENSE


Need help shipping AI agents in production?

We're Vstorm — an Applied Agentic AI Engineering Consultancy
with 30+ production agent implementations. Pydantic Deep Agents is what we build them with.

Talk to us



Made with care by Vstorm
