Skip to main content

Automatic Conversation Summarization and History Management for Pydantic AI

Project description

Context Management for Pydantic AI

Automatic Conversation Summarization and History Management

PyPI version Python 3.10+ License: MIT CI Pydantic AI

Intelligent Summarization — LLM-powered context compression  •  Sliding Window — zero-cost message trimming  •  Limit Warnings — finish-soon guidance before hard caps  •  Context Manager — real-time token tracking + tool truncation  •  Safe Cutoff — preserves tool call pairs


Context Management for Pydantic AI helps your Pydantic AI agents handle long conversations without exceeding model context limits. Choose between intelligent LLM summarization or fast sliding window trimming.

Full framework? Check out Pydantic Deep Agents — complete agent framework with planning, filesystem, subagents, and skills.

Use Cases

What You Want to Build How This Library Helps
Long-Running Agent Automatically compress history when context fills up
Customer Support Bot Preserve key details while discarding routine exchanges
Code Assistant Keep recent code context, summarize older discussions
High-Throughput App Zero-cost sliding window for maximum speed
Cost-Sensitive App Choose between quality (summarization) or free (sliding window)

Installation

pip install summarization-pydantic-ai

Or with uv:

uv add summarization-pydantic-ai

For accurate token counting:

pip install summarization-pydantic-ai[tiktoken]

Quick Start — Capabilities (Recommended)

The recommended way to add context management is via pydantic-ai's native Capabilities API:

from pydantic_ai import Agent
from pydantic_ai_summarization import ContextManagerCapability

agent = Agent(
    "openai:gpt-4.1",
    capabilities=[ContextManagerCapability(max_tokens=100_000)],
)

result = await agent.run("Hello!")

That's it. Your agent now:

  • Tracks token usage on every turn
  • Auto-compresses when approaching the limit (90% by default)
  • Truncates large tool outputs
  • Auto-detects context window size from the model
  • Preserves tool call/response pairs (never breaks them)

Combine with Limit Warnings

from pydantic_ai_summarization import ContextManagerCapability, LimitWarnerCapability

agent = Agent(
    "openai:gpt-4.1",
    capabilities=[
        LimitWarnerCapability(max_iterations=40, max_context_tokens=100_000),
        ContextManagerCapability(max_tokens=100_000),
    ],
)

Alternative: Processor API

For standalone use without capabilities:

from pydantic_ai import Agent
from pydantic_ai_summarization import create_summarization_processor

processor = create_summarization_processor(
    trigger=("tokens", 100000),
    keep=("messages", 20),
)

agent = Agent("openai:gpt-4.1", history_processors=[processor])

Available Processors

Processor LLM Cost Latency Context Preservation
ContextManagerCapability Per compression Low tracking Intelligent summary + tool truncation
SummarizationProcessor High High Intelligent summary
SlidingWindowProcessor Zero ~0ms Discards old messages
LimitWarnerProcessor Zero ~0ms Full history + warning injection

Intelligent Summarization

Uses an LLM to create summaries of older messages:

from pydantic_ai_summarization import create_summarization_processor

processor = create_summarization_processor(
    trigger=("tokens", 100000),  # When to summarize
    keep=("messages", 20),       # What to keep
)

Zero-Cost Sliding Window

Simply discards old messages — no LLM calls:

from pydantic_ai_summarization import create_sliding_window_processor

processor = create_sliding_window_processor(
    trigger=("messages", 100),  # When to trim
    keep=("messages", 50),      # What to keep
)

Limit Warnings

Warn the agent before requests, context usage, or total tokens hit a cap:

from pydantic_ai_summarization import create_limit_warner_processor

processor = create_limit_warner_processor(
    max_iterations=40,
    max_context_tokens=100000,
    max_total_tokens=200000,
)

Context Manager Capability

Full context management with token tracking, auto-compression, and tool output truncation:

from pydantic_ai import Agent
from pydantic_ai_summarization import ContextManagerCapability

agent = Agent(
    "openai:gpt-4.1",
    capabilities=[ContextManagerCapability(
        max_tokens=100_000,
        compress_threshold=0.9,
        max_tool_output_tokens=5000,
    )],
)

Trigger Types

Type Example Description
messages ("messages", 50) Trigger when message count exceeds threshold
tokens ("tokens", 100000) Trigger when token count exceeds threshold
fraction ("fraction", 0.8) Trigger at percentage of max_input_tokens

Keep Types

Type Example Description
messages ("messages", 20) Keep last N messages
tokens ("tokens", 10000) Keep last N tokens worth
fraction ("fraction", 0.2) Keep last N% of context

Advanced Configuration

Multiple Triggers

from pydantic_ai_summarization import SummarizationProcessor

processor = SummarizationProcessor(
    model="openai:gpt-4o",
    trigger=[
        ("messages", 50),    # OR 50+ messages
        ("tokens", 100000),  # OR 100k+ tokens
    ],
    keep=("messages", 10),
)

Fraction-Based

processor = SummarizationProcessor(
    model="openai:gpt-4o",
    trigger=("fraction", 0.8),  # 80% of context window
    keep=("fraction", 0.2),     # Keep last 20%
    max_input_tokens=128000,    # GPT-4's context window
)

Custom Token Counter

def my_token_counter(messages):
    return sum(len(str(msg)) for msg in messages) // 4

processor = create_summarization_processor(
    token_counter=my_token_counter,
)

Custom Model (e.g., Azure OpenAI)

from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider
from pydantic_ai_summarization import create_summarization_processor

azure_model = OpenAIModel(
    "gpt-4o",
    provider=OpenAIProvider(
        base_url="https://my-resource.openai.azure.com/openai/deployments/gpt-4o",
        api_key="your-azure-api-key",
    ),
)

processor = create_summarization_processor(
    model=azure_model,
    trigger=("tokens", 100000),
    keep=("messages", 20),
)

Custom Summary Prompt

processor = create_summarization_processor(
    summary_prompt="""
    Extract key information from this conversation.
    Focus on: decisions made, code written, pending tasks.

    Conversation:
    {messages}
    """,
)

Why Choose This Library?

Feature Description
Two Strategies Intelligent summarization or fast sliding window
Flexible Triggers Message count, token count, or fraction-based
Safe Cutoff Never breaks tool call/response pairs
Auto max_tokens Auto-detect context window from genai-prices
Message Persistence Save all messages to JSON for session resume
Guided Compaction Focus summaries on specific topics
Callbacks on_before/after_compress with instruction re-injection
Async Token Counting Sync or async token counter support
Token Tracking Real-time usage monitoring with callbacks
Tool Truncation Automatic truncation of large tool outputs
Custom Models Use any pydantic-ai Model (Azure, custom providers)
Lightweight Only requires pydantic-ai-slim (no extra model SDKs)

Related Projects

Package Description
Pydantic Deep Agents Full agent framework (uses this library)
pydantic-ai-backend File storage and Docker sandbox
pydantic-ai-todo Task planning toolset
subagents-pydantic-ai Multi-agent orchestration
pydantic-ai The foundation — agent framework by Pydantic

Contributing

git clone https://github.com/vstorm-co/summarization-pydantic-ai.git
cd summarization-pydantic-ai
make install
make test  # 100% coverage required

License

MIT — see LICENSE


Need help implementing this in your company?

We're Vstorm — an Applied Agentic AI Engineering Consultancy
with 30+ production AI agent implementations.

Talk to us



Made with ❤️ by Vstorm

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

summarization_pydantic_ai-0.1.1.tar.gz (146.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

summarization_pydantic_ai-0.1.1-py3-none-any.whl (25.4 kB view details)

Uploaded Python 3

File details

Details for the file summarization_pydantic_ai-0.1.1.tar.gz.

File metadata

File hashes

Hashes for summarization_pydantic_ai-0.1.1.tar.gz
Algorithm Hash digest
SHA256 183ec04f03bf3250d1f7a1ed143dbe44bfd7b8c677d89320a32b41d9824fe8fc
MD5 108d8eeb39370c308a946150b6bfb57e
BLAKE2b-256 353c1a7ac15b8cea92fe8e5a7db65aa0902fab4563a0b38d4a838eb8080a6481

See more details on using hashes here.

Provenance

The following attestation bundles were made for summarization_pydantic_ai-0.1.1.tar.gz:

Publisher: publish.yml on vstorm-co/summarization-pydantic-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file summarization_pydantic_ai-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for summarization_pydantic_ai-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9360f1cb971a94beb2c5cedd1fb368ba76ca552d80f94a3d298fe7c5208b01cb
MD5 6ec95b78f4f37d8abcb14b04d28c345c
BLAKE2b-256 2f131581777e9dee31923b28e4beaaa322a76de2fda117f502f8bc534a3c8bed

See more details on using hashes here.

Provenance

The following attestation bundles were made for summarization_pydantic_ai-0.1.1-py3-none-any.whl:

Publisher: publish.yml on vstorm-co/summarization-pydantic-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page