Automatic Conversation Summarization and History Management for Pydantic AI
Project description
Context Management for Pydantic AI
Automatic Conversation Summarization and History Management
Intelligent Summarization — LLM-powered context compression • Sliding Window — zero-cost message trimming • Limit Warnings — finish-soon guidance before hard caps • Context Manager — real-time token tracking + tool truncation • Safe Cutoff — preserves tool call pairs
Context Management for Pydantic AI helps your Pydantic AI agents handle long conversations without exceeding model context limits. Choose between intelligent LLM summarization or fast sliding window trimming.
Full framework? Check out Pydantic Deep Agents — complete agent framework with planning, filesystem, subagents, and skills.
Use Cases
| What You Want to Build | How This Library Helps |
|---|---|
| Long-Running Agent | Automatically compress history when context fills up |
| Customer Support Bot | Preserve key details while discarding routine exchanges |
| Code Assistant | Keep recent code context, summarize older discussions |
| High-Throughput App | Zero-cost sliding window for maximum speed |
| Cost-Sensitive App | Choose between quality (summarization) or free (sliding window) |
Installation
pip install summarization-pydantic-ai
Or with uv:
uv add summarization-pydantic-ai
For accurate token counting:
pip install summarization-pydantic-ai[tiktoken]
For real-time token tracking and tool output truncation:
pip install summarization-pydantic-ai[hybrid]
Quick Start
from pydantic_ai import Agent
from pydantic_ai_summarization import create_summarization_processor
processor = create_summarization_processor(
trigger=("tokens", 100000),
keep=("messages", 20),
)
agent = Agent(
"openai:gpt-4o",
history_processors=[processor],
)
result = await agent.run("Hello!")
That's it. Your agent now:
- Monitors conversation size on every turn
- Summarizes older messages when limits are reached
- Preserves tool call/response pairs (never breaks them)
- Keeps recent context intact
Available Processors
| Processor | LLM Cost | Latency | Context Preservation |
|---|---|---|---|
SummarizationProcessor |
High | High | Intelligent summary |
SlidingWindowProcessor |
Zero | ~0ms | Discards old messages |
LimitWarnerProcessor |
Zero | ~0ms | Full history + warning injection |
ContextManagerMiddleware |
Per compression | Low tracking / High compression | Intelligent summary |
Intelligent Summarization
Uses an LLM to create summaries of older messages:
from pydantic_ai_summarization import create_summarization_processor
processor = create_summarization_processor(
trigger=("tokens", 100000), # When to summarize
keep=("messages", 20), # What to keep
)
Zero-Cost Sliding Window
Simply discards old messages — no LLM calls:
from pydantic_ai_summarization import create_sliding_window_processor
processor = create_sliding_window_processor(
trigger=("messages", 100), # When to trim
keep=("messages", 50), # What to keep
)
Limit Warnings
Warn the agent before requests, context usage, or total tokens hit a cap:
from pydantic_ai_summarization import create_limit_warner_processor
processor = create_limit_warner_processor(
max_iterations=40,
max_context_tokens=100000,
max_total_tokens=200000,
)
Real-Time Context Manager
Dual-protocol middleware combining token tracking, auto-compression, message persistence, and tool output truncation:
from pydantic_ai import Agent
from pydantic_ai_summarization import create_context_manager_middleware
middleware = create_context_manager_middleware(
model_name="openai:gpt-4.1", # auto-detect max_tokens from genai-prices
compress_threshold=0.9,
messages_path="messages.json", # persist all messages
on_usage_update=lambda pct, cur, mx: print(f"{pct:.0%} used ({cur:,}/{mx:,})"),
on_after_compress=lambda msgs: "Re-inject critical instructions here",
)
agent = Agent(
"openai:gpt-4.1",
history_processors=[middleware],
)
Requires pip install summarization-pydantic-ai[hybrid]
Trigger Types
| Type | Example | Description |
|---|---|---|
messages |
("messages", 50) |
Trigger when message count exceeds threshold |
tokens |
("tokens", 100000) |
Trigger when token count exceeds threshold |
fraction |
("fraction", 0.8) |
Trigger at percentage of max_input_tokens |
Keep Types
| Type | Example | Description |
|---|---|---|
messages |
("messages", 20) |
Keep last N messages |
tokens |
("tokens", 10000) |
Keep last N tokens worth |
fraction |
("fraction", 0.2) |
Keep last N% of context |
Advanced Configuration
Multiple Triggers
from pydantic_ai_summarization import SummarizationProcessor
processor = SummarizationProcessor(
model="openai:gpt-4o",
trigger=[
("messages", 50), # OR 50+ messages
("tokens", 100000), # OR 100k+ tokens
],
keep=("messages", 10),
)
Fraction-Based
processor = SummarizationProcessor(
model="openai:gpt-4o",
trigger=("fraction", 0.8), # 80% of context window
keep=("fraction", 0.2), # Keep last 20%
max_input_tokens=128000, # GPT-4's context window
)
Custom Token Counter
def my_token_counter(messages):
return sum(len(str(msg)) for msg in messages) // 4
processor = create_summarization_processor(
token_counter=my_token_counter,
)
Custom Model (e.g., Azure OpenAI)
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider
from pydantic_ai_summarization import create_summarization_processor
azure_model = OpenAIModel(
"gpt-4o",
provider=OpenAIProvider(
base_url="https://my-resource.openai.azure.com/openai/deployments/gpt-4o",
api_key="your-azure-api-key",
),
)
processor = create_summarization_processor(
model=azure_model,
trigger=("tokens", 100000),
keep=("messages", 20),
)
Custom Summary Prompt
processor = create_summarization_processor(
summary_prompt="""
Extract key information from this conversation.
Focus on: decisions made, code written, pending tasks.
Conversation:
{messages}
""",
)
Why Choose This Library?
| Feature | Description |
|---|---|
| Two Strategies | Intelligent summarization or fast sliding window |
| Flexible Triggers | Message count, token count, or fraction-based |
| Safe Cutoff | Never breaks tool call/response pairs |
| Auto max_tokens | Auto-detect context window from genai-prices |
| Message Persistence | Save all messages to JSON for session resume |
| Guided Compaction | Focus summaries on specific topics |
| Callbacks | on_before/after_compress with instruction re-injection |
| Async Token Counting | Sync or async token counter support |
| Token Tracking | Real-time usage monitoring with callbacks |
| Tool Truncation | Automatic truncation of large tool outputs |
| Custom Models | Use any pydantic-ai Model (Azure, custom providers) |
| Lightweight | Only requires pydantic-ai-slim (no extra model SDKs) |
Related Projects
| Package | Description |
|---|---|
| Pydantic Deep Agents | Full agent framework (uses this library) |
| pydantic-ai-backend | File storage and Docker sandbox |
| pydantic-ai-todo | Task planning toolset |
| subagents-pydantic-ai | Multi-agent orchestration |
| pydantic-ai | The foundation — agent framework by Pydantic |
Contributing
git clone https://github.com/vstorm-co/summarization-pydantic-ai.git
cd summarization-pydantic-ai
make install
make test # 100% coverage required
License
MIT — see LICENSE
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file summarization_pydantic_ai-0.0.5.tar.gz.
File metadata
- Download URL: summarization_pydantic_ai-0.0.5.tar.gz
- Upload date:
- Size: 306.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
68a05340e789bd9cc8911b9063e9ae5142622f0baa10337f1504ee306fd0f9d8
|
|
| MD5 |
61e1db561112395b4aeb7f14e09836a2
|
|
| BLAKE2b-256 |
fccb677084c8345dc11380b5897566c97e5aa186652aafeb75d747e559692c7b
|
Provenance
The following attestation bundles were made for summarization_pydantic_ai-0.0.5.tar.gz:
Publisher:
publish.yml on vstorm-co/summarization-pydantic-ai
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
summarization_pydantic_ai-0.0.5.tar.gz -
Subject digest:
68a05340e789bd9cc8911b9063e9ae5142622f0baa10337f1504ee306fd0f9d8 - Sigstore transparency entry: 1153368425
- Sigstore integration time:
-
Permalink:
vstorm-co/summarization-pydantic-ai@fbf7756bce72a9ba0537014945f32377c8e71fad -
Branch / Tag:
refs/tags/0.0.5 - Owner: https://github.com/vstorm-co
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@fbf7756bce72a9ba0537014945f32377c8e71fad -
Trigger Event:
release
-
Statement type:
File details
Details for the file summarization_pydantic_ai-0.0.5-py3-none-any.whl.
File metadata
- Download URL: summarization_pydantic_ai-0.0.5-py3-none-any.whl
- Upload date:
- Size: 28.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b6dbd58057a780daf9c2da5497295809ea1a4978804449f5779ba47b92cbcc48
|
|
| MD5 |
2855f7aa9f252960f39af1ac6e12f6af
|
|
| BLAKE2b-256 |
2ac2765f416bfff235238b785c7280771193c734d39cf6e4d4494473f94df3cc
|
Provenance
The following attestation bundles were made for summarization_pydantic_ai-0.0.5-py3-none-any.whl:
Publisher:
publish.yml on vstorm-co/summarization-pydantic-ai
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
summarization_pydantic_ai-0.0.5-py3-none-any.whl -
Subject digest:
b6dbd58057a780daf9c2da5497295809ea1a4978804449f5779ba47b92cbcc48 - Sigstore transparency entry: 1153368428
- Sigstore integration time:
-
Permalink:
vstorm-co/summarization-pydantic-ai@fbf7756bce72a9ba0537014945f32377c8e71fad -
Branch / Tag:
refs/tags/0.0.5 - Owner: https://github.com/vstorm-co
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@fbf7756bce72a9ba0537014945f32377c8e71fad -
Trigger Event:
release
-
Statement type: