Batteries-included agent harness for Python — tool-calling, sandboxed execution, multi-agent teams, and unlimited context on Pydantic AI
Pydantic Deep Agents
The batteries-included deep agent harness for Python.
Terminal AI assistant out of the box — or build production agents with one function call.
Docs · PyPI · CLI · Framework · DeepResearch · Examples
What's New
- 2026-04-12 v0.3.8 — Stuck loop detection, context limit warnings for the model, expanded context file discovery (CLAUDE.md, .cursorrules, etc.), eviction & orphan repair migrated to capabilities hooks.
- 2026-04-11 v0.3.6 — One-command installer + self-update: curl -fsSL .../install.sh | bash installs everything automatically. New pydantic-deep update command. Startup update notifications with 24-hour PyPI cache.
- 2026-04-10 v0.3.5 — Headless runner (pydantic-deep run), Docker sandbox with named workspaces, browser automation via Playwright, Harbor adapter for Terminal Bench evaluation.
Full history: CHANGELOG.md
The Agent Harness
Pydantic Deep Agents is an agent harness — the complete infrastructure that wraps an LLM and makes it a functional autonomous agent. The model provides intelligence; the harness provides planning, tools, memory, sandboxed execution, and unlimited context.
| Feature | Description |
|---|---|
| 🔧 Tool-calling | File read/write/edit, shell execution, glob, grep, web search, web fetch, browser automation — wired up and ready. |
| 🧠 Persistent memory | MEMORY.md persists across sessions. Auto-injected into the system prompt. Each agent has isolated memory by default. |
| ♾️ Unlimited context | Auto-summarization when approaching the token budget. LLM-based or zero-cost sliding window. Never hits a context wall. |
| 🤝 Multi-agent / swarm | Spawn subagents for parallel workstreams. Shared TODO lists with claiming. Peer-to-peer message bus. Full team coordination. |
| 🐳 Sandboxed execution | Docker sandbox with named workspaces. Installed packages persist between sessions. Project dir mounted at /workspace. |
| 🗂️ Plan Mode | Dedicated planner subagent asks clarifying questions and structures the work before execution begins. Headless-compatible. |
| 🔖 Checkpoints | Save conversation state at any point. Rewind to any checkpoint. Fork sessions to explore alternative approaches. |
| 📚 Skills system | Domain-specific knowledge loaded on demand from SKILL.md files. Built-in skills: code-review, refactor, test-writer, git-workflow, and more. |
| 🔌 MCP | Connect any Model Context Protocol server via pydantic-ai's native MCP capability. |
| ⚡ Lifecycle hooks | Claude Code-style PRE_TOOL_USE / POST_TOOL_USE hooks. Shell commands or Python handlers. Audit logging, safety gates. |
| 📐 Structured output | Type-safe Pydantic model responses via output_type. No JSON parsing. No dict["key"]. Full IDE autocomplete. |
| 🔄 Stuck loop detection | Detects repeated identical tool calls, A-B-A-B alternating patterns, and no-op calls. Warns the model or stops the run. |
| ⚠️ Context limit warnings | Model receives URGENT/CRITICAL warnings when approaching context limits (70%), well before auto-compression (90%). |
| 💰 Cost tracking | Real-time token and USD cost tracking per run and cumulative. Hard budget limits with BudgetExceededError. |
| ✨ Self-improving | /improve analyzes past sessions and proposes updates to MEMORY.md, SOUL.md, and AGENTS.md. |
| 🏷️ 100% type-safe | Pyright strict + MyPy strict. 100% test coverage. Every public API is fully typed — safe to use in production. |
Built natively on pydantic-ai — uses the Capabilities API directly, inherits all pydantic-ai streaming, multi-model support, and Pydantic validation automatically.
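The 70% warning and 90% auto-compression figures in the context limit row above can be pictured as a small classifier over context usage. This is an illustrative sketch only: the function name `context_status`, its return labels, and the idea of a single warning band are assumptions, not the library's implementation.

```python
def context_status(
    used_tokens: int,
    max_tokens: int,
    warn_at: float = 0.70,      # threshold quoted in the feature table
    compress_at: float = 0.90,  # threshold quoted in the feature table
) -> str:
    """Classify context usage as "ok", "warn", or "compress"."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    usage = used_tokens / max_tokens
    if usage >= compress_at:
        return "compress"  # history would be summarized at this point
    if usage >= warn_at:
        return "warn"      # the model would be told to start wrapping up
    return "ok"


for used in (50_000, 75_000, 95_000):
    print(used, context_status(used, max_tokens=100_000))
```

Warning well before compression gives the model a chance to finish its current step while the full history is still available.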
🖥️ CLI — Terminal AI Assistant
A Claude Code-style terminal AI assistant that works with any model and any provider.
Install (macOS & Linux)
curl -fsSL https://raw.githubusercontent.com/vstorm-co/pydantic-deep/main/install.sh | bash
No Python setup required — the script installs uv and the CLI automatically. Then:
export ANTHROPIC_API_KEY=sk-ant-...
pydantic-deep
Windows / manual: pip install "pydantic-deep[cli]" · Update: pydantic-deep update
Model & Provider Support
Works with any model that supports tool-calling:
| Provider | Example models |
|---|---|
| Anthropic | anthropic:claude-opus-4-6, claude-sonnet-4-6 |
| OpenAI | openai:gpt-5.4, gpt-4.1 |
| OpenRouter | openrouter:anthropic/claude-opus-4-6 (200+ models) |
| Google Gemini | google-gla:gemini-2.5-pro |
| Ollama (local) | ollama:qwen3, ollama:llama3.3 |
| Any OpenAI-compatible | Custom base URL via env |
Switch model anytime: pydantic-deep config set model openai:gpt-5.4 or /model in the TUI.
What you get in the TUI
| | Feature |
|---|---|
| 💬 | Streaming chat with tool call visualization |
| 📁 | File read / write / edit, shell execution, glob, grep |
| 🧠 | Persistent memory and self-improvement across sessions |
| 🗂️ | Task planning, plan mode, and subagent delegation |
| ♾️ | Context compression for unlimited conversations |
| 🔖 | Checkpoints — save, rewind, and fork any session |
| 🌐 | Web search & fetch built-in |
| 🖥️ | Browser automation via Playwright (--browser) |
| 🐳 | Docker sandbox — sandboxed execution with named workspaces |
| 💭 | Extended thinking — minimal / low / medium / high / xhigh |
| 💰 | Real-time cost and token tracking per session |
| 🛡️ | Tool approval dialogs — approve, auto-approve, or deny per tool call |
| @ | @filename file references · !command shell passthrough |
| ✨ | /improve, /skills, /diff, /model, /theme, /compact, and more |
Usage
# Interactive TUI (default)
pydantic-deep
pydantic-deep tui --model openrouter:anthropic/claude-opus-4-6
# Headless deep agent — benchmarks, CI/CD, scripted automation
pydantic-deep run "Fix the failing test in test_auth.py"
pydantic-deep run --task-file task.md --json
pydantic-deep run "Refactor utils.py" --no-web-search --thinking false
# Docker sandbox — sandboxed execution, project dir mounted at /workspace
pydantic-deep tui --sandbox docker
pydantic-deep tui --workspace ml-env # named workspace, packages persist
# Browser automation (requires pydantic-deep[browser])
pydantic-deep tui --browser
pydantic-deep run "Go to example.com and summarize the content" --browser
# Config & skills
pydantic-deep config set model anthropic:claude-sonnet-4-6
pydantic-deep skills list
pydantic-deep update # update to latest version
See CLI docs for the full reference.
🐍 Framework — Build Your Own Agent
pip install pydantic-deep
One function call gives you a production deep agent with planning, tool-calling, multi-agent delegation, persistent memory, unlimited context, and cost tracking. Everything is a toggle:
from pydantic_ai_backends import StateBackend
from pydantic_deep import create_deep_agent, create_default_deps

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    include_todo=True,         # Task planning with subtasks and dependencies
    include_subagents=True,    # Multi-agent swarm — delegate to subagents
    include_skills=True,       # Domain-specific skills from SKILL.md files
    include_memory=True,       # Persistent memory across sessions
    include_plan=True,         # Structured planning before execution
    include_teams=True,        # Agent teams with shared TODO lists + message bus
    web_search=True,           # Tool-calling: web search
    web_fetch=True,            # Tool-calling: web fetch
    thinking="high",           # Extended thinking / reasoning effort
    context_manager=True,      # Unlimited context via auto-summarization
    cost_tracking=True,        # Token/USD budget enforcement
    include_checkpoints=True,  # Save, rewind, and fork conversations
)

deps = create_default_deps(StateBackend())
# `await` must run inside an async function (or be wrapped with asyncio.run)
result = await agent.run("Build a REST API for user auth", deps=deps)
Structured Output
Type-safe responses with Pydantic models — no JSON parsing, no dict["key"]:
from pydantic import BaseModel

class CodeReview(BaseModel):
    summary: str
    issues: list[str]
    score: int

agent = create_deep_agent(output_type=CodeReview)
result = await agent.run("Review the auth module", deps=deps)
print(result.output.score)  # fully typed
Multi-Agent Swarm
Spawn isolated subagents for parallel workstreams. Each subagent is a full deep agent with its own tool-calling, memory, and context:
agent = create_deep_agent(
    subagents=[
        {
            "name": "researcher",
            "description": "Researches topics using web search",
            "instructions": "Search the web, synthesize findings, cite sources.",
        },
        {
            "name": "code-reviewer",
            "description": "Reviews code for quality, security, and performance",
            "instructions": "Check for security issues, N+1 queries, missing tests...",
        },
    ],
)
# Main agent delegates: task(description="Review auth.py", subagent_type="code-reviewer")
Unlimited Context
Auto-summarization keeps long-running agents within the token budget:
from pydantic_deep import create_summarization_processor

processor = create_summarization_processor(
    trigger=("tokens", 100000),  # compress at 100k tokens
    keep=("messages", 20),       # keep last 20 messages verbatim
)
agent = create_deep_agent(history_processors=[processor])
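To make the trigger/keep idea concrete, here is a minimal sketch of the zero-cost sliding-window variant: once the estimated size reaches a trigger, only the newest messages are kept. The helper names and the rough "4 characters per token" estimate are assumptions for illustration; this is not how `create_summarization_processor` is implemented.

```python
def estimate_tokens(messages: list[str]) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return sum(len(m) for m in messages) // 4


def sliding_window(messages: list[str], trigger_tokens: int, keep_last: int) -> list[str]:
    """Drop the oldest messages once the estimated size reaches the trigger."""
    if keep_last <= 0:
        raise ValueError("keep_last must be positive")
    if estimate_tokens(messages) < trigger_tokens:
        return messages           # under budget: leave history untouched
    return messages[-keep_last:]  # over budget: keep only the newest messages


history = [f"message {i}: " + "x" * 400 for i in range(50)]
trimmed = sliding_window(history, trigger_tokens=1_000, keep_last=20)
print(len(history), len(trimmed))
```

An LLM-based variant would replace the dropped prefix with a generated summary instead of discarding it, at the cost of an extra model call.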
Claude Code-Style Lifecycle Hooks
from pydantic_deep import Hook, HookEvent

agent = create_deep_agent(
    hooks=[
        Hook(
            event=HookEvent.PRE_TOOL_USE,
            command="echo 'Tool: $TOOL_NAME args: $TOOL_INPUT' >> /tmp/audit.log",
        ),
    ],
)
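Hooks can also be Python handlers. The sketch below shows the general shape of a pre-tool-use chain with one audit handler and one safety gate. Everything in it (the handler signature, `run_tool_with_hooks`, `fake_execute`) is hypothetical and written for illustration; it does not use or mirror the `Hook` / `HookEvent` API.

```python
from typing import Any, Callable, Optional

# Hypothetical contract: return None to allow the call, or a reason string to block it.
PreToolHandler = Callable[[str, dict[str, Any]], Optional[str]]

audit_log: list[str] = []


def audit(tool_name: str, tool_input: dict[str, Any]) -> Optional[str]:
    """Audit logging: record every attempted call and never block."""
    audit_log.append(f"Tool: {tool_name} args: {tool_input}")
    return None


def deny_rm_rf(tool_name: str, tool_input: dict[str, Any]) -> Optional[str]:
    """Safety gate: block an obviously destructive shell command."""
    if tool_name == "execute" and "rm -rf" in str(tool_input.get("command", "")):
        return "destructive command blocked"
    return None


def run_tool_with_hooks(
    tool_name: str,
    tool_input: dict[str, Any],
    tool: Callable[..., str],
    handlers: list[PreToolHandler],
) -> str:
    """Run each handler in order; the first one that returns a reason stops the call."""
    for handler in handlers:
        reason = handler(tool_name, tool_input)
        if reason is not None:
            return f"BLOCKED: {reason}"
    return tool(**tool_input)


def fake_execute(command: str) -> str:
    """Stand-in for a real shell tool so the example has no side effects."""
    return f"ran: {command}"


handlers = [audit, deny_rm_rf]
allowed = run_tool_with_hooks("execute", {"command": "ls"}, fake_execute, handlers)
blocked = run_tool_with_hooks("execute", {"command": "rm -rf /"}, fake_execute, handlers)
print(allowed)
print(blocked)
```

Placing the audit handler first means blocked calls are still logged.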
MCP Servers
from pydantic_ai.capabilities import MCP

agent = create_deep_agent(
    capabilities=[MCP(url="https://mcp.example.com/api")],
)
Context Files
Pydantic Deep Agents auto-discovers and injects project-specific context into every conversation:
| File | Purpose | Who Sees It |
|---|---|---|
| AGENTS.md | Project conventions, architecture, instructions | Main agent + all subagents |
| CLAUDE.md | Claude Code project instructions | Main agent + all subagents |
| SOUL.md | Agent personality, style, communication preferences | Main agent only |
| .cursorrules | Cursor editor conventions | Main agent only |
| .github/copilot-instructions.md | GitHub Copilot instructions | Main agent only |
| CONVENTIONS.md | Project coding conventions | Main agent only |
| CODING_GUIDELINES.md | Coding guidelines | Main agent only |
| MEMORY.md | Persistent memory — read/write/update tools | Per-agent (isolated) |
Compatible with Claude Code, Cursor, GitHub Copilot, and other agent frameworks. AGENTS.md follows the agents.md spec.
See the full API reference for all options.
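As a rough picture of what auto-discovery involves, the sketch below scans a project root for the file names in the table above and reports who would see each one. The function name and the lookup strategy (checking only the project root) are assumptions; the actual discovery logic may search differently.

```python
import tempfile
from pathlib import Path

# File names and audiences copied from the context files table.
CONTEXT_FILES = {
    "AGENTS.md": "main agent + all subagents",
    "CLAUDE.md": "main agent + all subagents",
    "SOUL.md": "main agent only",
    ".cursorrules": "main agent only",
    ".github/copilot-instructions.md": "main agent only",
    "CONVENTIONS.md": "main agent only",
    "CODING_GUIDELINES.md": "main agent only",
    "MEMORY.md": "per-agent (isolated)",
}


def discover_context_files(root: Path) -> dict[str, str]:
    """Return {relative path: audience} for every known context file under root."""
    return {
        name: audience
        for name, audience in CONTEXT_FILES.items()
        if (root / name).is_file()
    }


# Demo in a throwaway directory so nothing on disk is assumed.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "AGENTS.md").write_text("Use pytest for all tests.")
    (root / ".github").mkdir()
    (root / ".github" / "copilot-instructions.md").write_text("Prefer type hints.")
    found = discover_context_files(root)

print(found)
```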
🔬 DeepResearch — Reference App
A full-featured research deep agent with web UI — built entirely on Pydantic Deep Agents.
Web search (Tavily, Brave, Jina), sandboxed code execution, Excalidraw diagrams, plan mode, report export.
cd apps/deepresearch && uv sync && cp .env.example .env
uv run deepresearch # → http://localhost:8080
See apps/deepresearch/README.md for full setup.
Architecture
Pydantic Deep Agents uses pydantic-ai's native Capabilities API for all cross-cutting concerns — hooks, memory, skills, context files, teams, and plan mode are all first-class pydantic-ai capabilities.
Capabilities
| Capability | Package | What It Does |
|---|---|---|
| CostTracking | pydantic-ai-shields | Token/USD budget enforcement and real-time cost callbacks |
| ContextManagerCapability | summarization-pydantic-ai | Unlimited context via auto-summarization |
| LimitWarnerCapability | summarization-pydantic-ai | URGENT/CRITICAL warnings when context limits approach |
| StuckLoopDetection | pydantic-deep | Detects and breaks repetitive agent loops |
| EvictionCapability | pydantic-deep | Intercepts large tool outputs before they enter history |
| PatchToolCallsCapability | pydantic-deep | Fixes orphaned tool calls/results in history |
| HooksCapability | pydantic-deep | Claude Code-style PRE/POST_TOOL_USE lifecycle hooks |
| CheckpointMiddleware | pydantic-deep | Save, rewind, and fork conversation state |
| WebSearch / WebFetch | pydantic-ai built-in | Tool-calling: web search and URL fetching |
| SkillsCapability | pydantic-deep | Domain-specific skills from SKILL.md files |
| MemoryCapability | pydantic-deep | Persistent memory across sessions |
| TeamCapability | pydantic-deep | Multi-agent swarm — shared TODOs, message bus |
| PlanCapability | pydantic-deep | Structured planning before execution |
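The eviction row in the table describes intercepting large tool outputs before they enter history. One way to picture that is a function that swaps an oversized output for a short preview and keeps the full text elsewhere. The names, the character thresholds, and the in-memory side store are all assumptions made for this sketch, not the behavior of `EvictionCapability`.

```python
evicted: dict[str, str] = {}  # hypothetical side store for full outputs


def evict_if_large(
    call_id: str,
    output: str,
    max_chars: int = 2_000,
    preview_chars: int = 200,
) -> str:
    """Return small outputs unchanged; replace large ones with a preview stub."""
    if len(output) <= max_chars:
        return output
    evicted[call_id] = output  # keep the full text retrievable by id
    return (
        f"{output[:preview_chars]}\n"
        f"[output truncated: {len(output)} chars stored under id {call_id!r}]"
    )


small = evict_if_large("call-1", "hello")
large = evict_if_large("call-2", "x" * 5_000)
print(small)
print(large.splitlines()[-1])
```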
Modular Packages
Every component is a standalone package — use only what you need:
| Package | What It Does |
|---|---|
| pydantic-ai-backend | File storage, Docker sandbox, console toolset |
| pydantic-ai-todo | Task planning with subtasks and dependencies |
| subagents-pydantic-ai | Sync/async delegation, background tasks, cancellation |
| summarization-pydantic-ai | LLM summaries or zero-cost sliding window |
| pydantic-ai-shields | Cost tracking, input/output/tool blocking |
Pydantic Deep Agents
+---------------------------------------------------------------------+
| |
| +----------+ +----------+ +----------+ +----------+ +---------+ |
| | Planning | |Filesystem| | Subagents| | Skills | | Teams | |
| +----+-----+ +----+-----+ +----+-----+ +----+-----+ +----+----+ |
| | | | | | |
| +------------+-----+------+------------+------------+ |
| | |
| v |
| Summarization --> +------------------+ <-- Capabilities |
| Checkpointing --> | Deep Agent | <-- Hooks |
| Cost Tracking --> | (pydantic-ai) | <-- Memory |
| Loop Detect --> | | <-- Limit Warner |
| +--------+---------+ |
| | |
| +-----------------+-----------------+ |
| v v v |
| +------------+ +------------+ +------------+ |
| | State | | Local | | Docker | |
| | Backend | | Backend | | Sandbox | |
| +------------+ +------------+ +------------+ |
| |
+---------------------------------------------------------------------+
Full Feature List
Tool-Calling
- ls, read_file, write_file, edit_file, glob, grep, execute — full filesystem access
- Docker sandbox with named workspaces — sandboxed execution, packages persist between sessions
- Web search (DuckDuckGo, Tavily, Brave) and web fetch
- Browser automation via Playwright — navigate, click, type_text, screenshot, execute_js, and more
Deep Agent Architecture
- Planning — Task tracking with subtasks, dependencies, and cycle detection
- Subagents / Multi-agent swarm — Sync/async delegation, background task management, soft/hard cancellation
- Agent Teams — Shared TODO lists with claiming and dependency tracking, peer-to-peer message bus
- Plan Mode — Dedicated planner subagent for structured planning before execution
- Persistent memory — MEMORY.md that persists across sessions, auto-injected into system prompt
- Self-improving — /improve analyzes past sessions, proposes updates to context files
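The planning bullet above mentions dependencies and cycle detection. A minimal way to sketch that idea with the standard library is a topological sort that fails on cycles. The function name and task names are made up for the example, and this is not the planner's actual code.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(tasks: dict[str, set[str]]) -> list[str]:
    """Order tasks so each comes after its dependencies; raise ValueError on a cycle.

    `tasks` maps a task name to the set of task names it depends on.
    """
    try:
        return list(TopologicalSorter(tasks).static_order())
    except CycleError as exc:
        # exc.args[1] is the list of nodes that form the detected cycle
        raise ValueError(f"dependency cycle: {' -> '.join(exc.args[1])}") from exc


plan = {"design": set(), "implement": {"design"}, "test": {"implement"}}
print(execution_order(plan))

try:
    execution_order({"a": {"b"}, "b": {"a"}})
except ValueError as err:
    print(err)
```

Rejecting cycles when a task is added, rather than at execution time, keeps an agent from committing to a plan it can never finish.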
Context & Memory
- Unlimited context — Auto-summarization when approaching token budget (LLM-based or sliding window)
- Context limit warnings — Model receives URGENT/CRITICAL messages when approaching 70% context usage
- Eviction capability — Intercepts large tool outputs via after_tool_execute before they enter history
- Context files — Auto-discover and inject AGENTS.md, CLAUDE.md, SOUL.md, .cursorrules, copilot-instructions, CONVENTIONS.md, CODING_GUIDELINES.md
- Checkpoints — Save state, rewind or fork conversations. In-memory and file-based stores. Per-run isolation via for_run()
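The save / rewind / fork behavior in the checkpoints bullet can be illustrated with a tiny in-memory store. The class and method names here are hypothetical and the "conversation" is just a list of strings; the real stores and `for_run()` are not modeled.

```python
import copy


class CheckpointStore:
    """In-memory store: each checkpoint is a deep copy of the message list."""

    def __init__(self) -> None:
        self._snapshots: dict[str, list[str]] = {}

    def save(self, label: str, messages: list[str]) -> None:
        self._snapshots[label] = copy.deepcopy(messages)

    def rewind(self, label: str) -> list[str]:
        """Return a fresh copy of the saved state to continue from."""
        return copy.deepcopy(self._snapshots[label])

    def fork(self, label: str) -> list[str]:
        """Start an independent branch; the saved checkpoint is left untouched."""
        return self.rewind(label)


store = CheckpointStore()
history = ["user: plan the API", "agent: here is a plan"]
store.save("after-plan", history)

history.append("agent: went down a bad path")
history = store.rewind("after-plan")   # discard the bad turn

branch = store.fork("after-plan")      # explore an alternative separately
branch.append("agent: alternative approach")
print(len(history), len(branch))
```

Copying on both save and restore is what keeps branches from mutating each other.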
Reliability
- Stuck loop detection — Detects repeated identical calls, A-B-A-B alternating, and no-op patterns. Warns or stops the agent
- Orphan repair — Fixes orphaned tool calls/results in conversation history before each model request
- Context limit warnings — Injects URGENT/CRITICAL messages so the model knows to wrap up
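The first two patterns in the stuck loop bullet above (repeated identical calls and A-B-A-B alternation) can be detected from a list of call fingerprints. The sketch below is one simple way to do it; the function name, the fingerprint format, and the `repeat_limit` default are assumptions, and no-op detection is left out.

```python
from typing import Optional


def detect_loop(calls: list[str], repeat_limit: int = 3) -> Optional[str]:
    """Return a short reason if the tail of `calls` looks stuck, else None.

    Each entry is a fingerprint of one tool call, for example "grep:TODO".
    """
    if repeat_limit < 2:
        raise ValueError("repeat_limit must be at least 2")

    # Pattern 1: the last `repeat_limit` calls are all identical.
    tail = calls[-repeat_limit:]
    if len(tail) == repeat_limit and len(set(tail)) == 1:
        return "repeated identical call"

    # Pattern 2: the last 2 * repeat_limit calls strictly alternate between two calls.
    window = calls[-2 * repeat_limit:]
    if len(window) == 2 * repeat_limit:
        a, b = window[0], window[1]
        if a != b and all(c == (a if i % 2 == 0 else b) for i, c in enumerate(window)):
            return "alternating A-B-A-B pattern"

    return None


print(detect_loop(["grep:TODO"] * 3))
print(detect_loop(["read:a.py", "edit:a.py"] * 3))
print(detect_loop(["ls:.", "read:a.py", "edit:a.py", "ls:."]))
```

A caller could warn the model on the first detection and stop the run if the pattern persists.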
Production Features
- MCP — Connect any Model Context Protocol server
- Lifecycle hooks — Claude Code-style PRE/POST_TOOL_USE. Shell commands or Python handlers
- Structured output — Type-safe responses with Pydantic models via output_type
- Cost tracking — Token/USD budgets with automatic enforcement and real-time callbacks
- Streaming — Full streaming support for real-time responses
- Image support — Multi-modal analysis with image inputs
- Human-in-the-loop — Confirmation workflows for sensitive operations
- Output styles — Built-in (concise, explanatory, formal, conversational) or custom
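Budget enforcement, as described in the cost tracking bullet above, boils down to accumulating per-request cost and failing once a limit is crossed. In the sketch below, `BudgetExceededError` is a local stand-in class rather than an import from the library, `CostTracker` is hypothetical, and the per-token prices are placeholders rather than real rates.

```python
class BudgetExceededError(RuntimeError):
    """Local stand-in for illustration; not imported from any library."""


class CostTracker:
    """Accumulate USD cost per request and enforce a hard budget."""

    def __init__(self, budget_usd: float, usd_per_1k_input: float, usd_per_1k_output: float) -> None:
        self.budget_usd = budget_usd
        self.usd_per_1k_input = usd_per_1k_input
        self.usd_per_1k_output = usd_per_1k_output
        self.total_usd = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one request's cost; raise once the cumulative total passes the budget."""
        cost = (
            input_tokens / 1000 * self.usd_per_1k_input
            + output_tokens / 1000 * self.usd_per_1k_output
        )
        self.total_usd += cost
        if self.total_usd > self.budget_usd:
            raise BudgetExceededError(
                f"spent ${self.total_usd:.4f} of ${self.budget_usd:.4f} budget"
            )
        return cost


# Placeholder prices, chosen only to make the arithmetic easy to follow.
tracker = CostTracker(budget_usd=0.05, usd_per_1k_input=0.003, usd_per_1k_output=0.015)
first_cost = tracker.record(input_tokens=1_000, output_tokens=1_000)
try:
    tracker.record(input_tokens=2_000, output_tokens=2_000)
except BudgetExceededError as err:
    print("stopped:", err)
```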
CLI
- Interactive TUI (Textual) with streaming, tool visualization, session management
- Headless runner (pydantic-deep run) for CI/CD, benchmarks, scripted automation
- 20+ slash commands: /improve, /compact, /diff, /model, /provider, /skills, /theme, and more
- @filename file references, !command shell passthrough
- Tool approval dialogs with auto-approve
- Debug logging per session
Contributing
git clone https://github.com/vstorm-co/pydantic-deepagents.git
cd pydantic-deepagents
make install
make test # 100% coverage required
make all # lint + typecheck + test
Vstorm OSS Ecosystem
pydantic-deepagents is part of a broader open-source ecosystem for production AI agents:
| Project | Description |
|---|---|
| full-stack-ai-agent-template | Zero to production AI app in 30 minutes. FastAPI + Next.js 15, 6 AI frameworks (incl. pydantic-deep), RAG pipeline, 75+ config options. |
| pydantic-ai-shields | Drop-in guardrails for Pydantic AI agents. 5 infra + 5 content shields. |
| pydantic-ai-subagents | Declarative multi-agent orchestration with token tracking. |
| pydantic-ai-summarization | Smart context compression for long-running agents. |
| pydantic-ai-backend | Sandboxed execution for AI agents. Docker + Daytona. |
| content-skills | Claude Code content studio — blog, social, slides, video, infographics — all brand-aware. |
| production-stack-skills | Claude Code skills for production-grade FastAPI, PostgreSQL, Docker, and observability. |
Want the full stack? Use full-stack-ai-agent-template — it ships pydantic-deep integrated with FastAPI, Next.js, auth, WebSocket streaming, and RAG out of the box.
Browse all projects at oss.vstorm.co
License
MIT — see LICENSE