Automatic conversation summarization for pydantic-ai agents

summarization-pydantic-ai

Requires Python 3.10+. License: MIT.

Automatic conversation summarization and context management for pydantic-ai agents.

Looking for a complete agent framework? Check out pydantic-deep - a full-featured deep agent framework with planning, subagents, and a skills system.

Need file operations? Check out pydantic-ai-backend - file storage and sandbox backends for AI agents.

Documentation

Full Documentation - Installation, concepts, examples, and API reference.

Installation

pip install summarization-pydantic-ai

# With tiktoken for accurate token counting
pip install summarization-pydantic-ai[tiktoken]

Available Processors

This library provides two history processors for managing conversation context:

| Processor | LLM Cost | Latency | Context Preservation | Use Case |
|---|---|---|---|---|
| SummarizationProcessor | High | High | Intelligent summary | When context quality matters |
| SlidingWindowProcessor | Zero | ~0ms | Discards old messages | When speed/cost matters |

Quick Start

SummarizationProcessor - Intelligent Summarization

Uses an LLM to create intelligent summaries of older messages:

from pydantic_ai import Agent
from pydantic_ai_summarization import create_summarization_processor

# Create a processor that triggers at 100k tokens and keeps 20 messages
processor = create_summarization_processor(
    trigger=("tokens", 100000),
    keep=("messages", 20),
)

agent = Agent(
    "openai:gpt-4.1",
    history_processors=[processor],
)

# The processor will automatically summarize older messages
# when the conversation grows too long.
# In async code, use `await agent.run("Hello!")` instead.
result = agent.run_sync("Hello!")

SlidingWindowProcessor - Zero-Cost Trimming

Simply discards old messages without LLM calls - fastest and cheapest option:

from pydantic_ai import Agent
from pydantic_ai_summarization import create_sliding_window_processor

# Keep last 50 messages when reaching 100
processor = create_sliding_window_processor(
    trigger=("messages", 100),
    keep=("messages", 50),
)

agent = Agent(
    "openai:gpt-4.1",
    history_processors=[processor],
)

# Old messages are simply discarded - no LLM cost.
# In async code, use `await agent.run("Hello!")` instead.
result = agent.run_sync("Hello!")

Multiple Triggers

Both processors support triggering based on multiple conditions:

from pydantic_ai_summarization import SummarizationProcessor, SlidingWindowProcessor

# Summarization with multiple triggers
processor = SummarizationProcessor(
    model="openai:gpt-4.1",
    trigger=[
        ("messages", 50),    # Trigger at 50+ messages...
        ("tokens", 100000),  # ...OR at 100k+ tokens, whichever comes first
    ],
    keep=("messages", 10),
)

# Sliding window with multiple triggers
processor = SlidingWindowProcessor(
    trigger=[
        ("messages", 100),
        ("tokens", 50000),
    ],
    keep=("messages", 30),
)
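When several triggers are configured, processing starts as soon as any one of them is met. The sketch below illustrates that OR check in isolation. It is not the library's implementation: `should_trigger` and the `Trigger` alias are hypothetical names, fraction triggers are left out, and firing at (rather than strictly above) the threshold is an assumption.

```python
from typing import Literal, Union

# Hypothetical alias mirroring the (kind, value) tuples used in this README.
Trigger = tuple[Literal["messages", "tokens"], int]


def should_trigger(
    message_count: int,
    token_count: int,
    triggers: Union[Trigger, list[Trigger], None],
) -> bool:
    """Return True if any configured threshold has been reached (OR semantics)."""
    if triggers is None:
        return False
    # Accept a single tuple as well as a list of tuples.
    if isinstance(triggers, tuple):
        triggers = [triggers]
    current = {"messages": message_count, "tokens": token_count}
    # Assumption: a trigger fires at or above its threshold.
    return any(current[kind] >= threshold for kind, threshold in triggers)


triggers = [("messages", 50), ("tokens", 100_000)]
print(should_trigger(10, 120_000, triggers))  # token threshold reached
print(should_trigger(10, 5_000, triggers))    # neither threshold reached
```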

Fraction-Based Configuration

Trigger when reaching a percentage of the model's context window:

from pydantic_ai_summarization import SummarizationProcessor, SlidingWindowProcessor

# Summarization at 80% of context
processor = SummarizationProcessor(
    model="openai:gpt-4.1",
    trigger=("fraction", 0.8),  # 80% of context window
    keep=("fraction", 0.2),     # Keep last 20%
    max_input_tokens=128000,    # Context window budget in tokens; set to your model's limit
)

# Sliding window at 80% of context
processor = SlidingWindowProcessor(
    trigger=("fraction", 0.8),
    keep=("fraction", 0.3),
    max_input_tokens=128000,
)
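Fraction values only become meaningful once they are multiplied by `max_input_tokens`. As a worked example, with a 128000-token budget a trigger of 0.8 corresponds to 102400 tokens and a keep of 0.2 to 25600 tokens. The helper below is a hypothetical illustration of that arithmetic; rounding down is an assumption, not documented library behavior.

```python
def fraction_to_tokens(fraction: float, max_input_tokens: int) -> int:
    """Convert a ("fraction", x) setting into an absolute token count."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the interval (0, 1]")
    # Assumption: the result is rounded down to a whole number of tokens.
    return int(fraction * max_input_tokens)


max_input_tokens = 128_000
trigger_at = fraction_to_tokens(0.8, max_input_tokens)  # 102400
keep_up_to = fraction_to_tokens(0.2, max_input_tokens)  # 25600
print(trigger_at, keep_up_to)
```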

Custom Token Counter

Use a custom token counting function with either processor:

from pydantic_ai_summarization import (
    create_summarization_processor,
    create_sliding_window_processor,
)

def my_token_counter(messages):
    # Your custom token counting logic
    return sum(len(str(msg)) for msg in messages) // 4

# With summarization
processor = create_summarization_processor(
    token_counter=my_token_counter,
)

# With sliding window
processor = create_sliding_window_processor(
    token_counter=my_token_counter,
)
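A custom counter can also wrap a real tokenizer. The following is a sketch, assuming the counter receives a sequence of messages and returns an integer, as in the example above. It uses tiktoken's `cl100k_base` encoding when available and falls back to the four-characters-per-token heuristic otherwise. The name `tiktoken_counter` is hypothetical, and counting over `str(msg)` includes field names and other representation overhead, so treat the result as an estimate.

```python
from typing import Any, Sequence

try:
    import tiktoken  # optional: pip install tiktoken

    _ENCODING = tiktoken.get_encoding("cl100k_base")
except Exception:
    # tiktoken is not installed, or its encoding files could not be loaded.
    _ENCODING = None


def tiktoken_counter(messages: Sequence[Any]) -> int:
    """Estimate the token count of messages from their string representation."""
    text = "\n".join(str(msg) for msg in messages)
    if _ENCODING is None:
        return len(text) // 4  # rough heuristic: about four characters per token
    # disallowed_special=() makes special-token text count as ordinary text
    # instead of raising an error.
    return len(_ENCODING.encode(text, disallowed_special=()))


print(tiktoken_counter(["Hello, how can I help you today?"]))
```

Passing it as `token_counter=tiktoken_counter` would then follow the same pattern as `my_token_counter`.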

Custom Summary Prompt

Customize how summaries are generated (SummarizationProcessor only):

from pydantic_ai_summarization import create_summarization_processor

processor = create_summarization_processor(
    summary_prompt="""
    Extract the key information from this conversation.
    Focus on: decisions made, code written, and pending tasks.

    Conversation:
    {messages}
    """,
)
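Because the prompt above is an indented triple-quoted string, the indentation becomes part of the prompt text. A small sketch of one way to avoid that uses `textwrap.dedent`. The `str.format` call at the end only demonstrates how a `{messages}` placeholder could be filled; that the library substitutes it this way is an assumption, and the transcript is made-up sample data.

```python
import textwrap

# Strip the common leading indentation so the model sees a clean prompt.
SUMMARY_PROMPT = textwrap.dedent(
    """\
    Extract the key information from this conversation.
    Focus on: decisions made, code written, and pending tasks.

    Conversation:
    {messages}
    """
)

# Assumption: the placeholder is filled with str.format-style substitution.
transcript = "user: Add retry logic\nassistant: Done, see fetch_with_retry()"
rendered = SUMMARY_PROMPT.format(messages=transcript)
print(rendered)
```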

Trigger Types

| Type | Example | Description |
|---|---|---|
| messages | ("messages", 50) | Trigger when message count exceeds threshold |
| tokens | ("tokens", 100000) | Trigger when token count exceeds threshold |
| fraction | ("fraction", 0.8) | Trigger at percentage of max_input_tokens |

Keep Types

| Type | Example | Description |
|---|---|---|
| messages | ("messages", 20) | Keep last N messages after processing |
| tokens | ("tokens", 10000) | Keep last N tokens worth of messages |
| fraction | ("fraction", 0.2) | Keep last N% of max_input_tokens |
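To make the `tokens` keep type concrete, here is a minimal sketch of selecting the most recent messages that fit a token budget. It is an illustration, not the library's code: `keep_last_tokens` and `rough_counter` are hypothetical names, plain strings stand in for pydantic-ai messages, and always retaining the newest message is an assumption.

```python
from typing import Callable, Sequence


def keep_last_tokens(
    messages: Sequence[str],
    budget: int,
    token_counter: Callable[[Sequence[str]], int],
) -> list[str]:
    """Return the longest suffix of messages whose token total fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest, stopping once the budget would be exceeded.
    for msg in reversed(messages):
        cost = token_counter([msg])
        # Assumption: the newest message is kept even if it alone exceeds the budget.
        if kept and used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def rough_counter(msgs: Sequence[str]) -> int:
    return sum(len(m) for m in msgs) // 4


history = ["a" * 40, "b" * 40, "c" * 40, "d" * 40]  # about 10 tokens each
recent = keep_last_tokens(history, budget=25, token_counter=rough_counter)
print(len(recent))  # the two newest messages fit within 25 tokens
```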

How It Works

SummarizationProcessor

  1. Monitoring: Tracks message and token counts on every call
  2. Trigger Check: When any trigger condition is met, summarization begins
  3. Safe Cutoff: Finds a safe point to cut that doesn't split tool call pairs
  4. Summarization: Uses an LLM to generate a summary of older messages
  5. Replacement: Older messages are replaced with a summary message
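The trigger, summarization, and replacement steps above can be sketched as follows. This is a simplified illustration under stated assumptions: plain strings stand in for pydantic-ai messages, `fake_summarize` is a stub replacing the LLM call, the safe-cutoff step is omitted, and `summarize_history` is a hypothetical name rather than part of the library.

```python
from typing import Callable


def summarize_history(
    messages: list[str],
    trigger_messages: int,
    keep_messages: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """Replace older messages with one summary once the trigger is reached."""
    # Trigger check: do nothing while the history is still short.
    if len(messages) < trigger_messages:
        return messages
    # Split into the part to summarize and the part to keep verbatim.
    cutoff = max(len(messages) - keep_messages, 0)
    older, recent = messages[:cutoff], messages[cutoff:]
    if not older:
        return messages
    # Summarization and replacement: one summary message, then recent messages.
    summary = summarize(older)
    return [f"Summary of earlier conversation: {summary}"] + recent


def fake_summarize(older: list[str]) -> str:
    # Stub standing in for an LLM call.
    return f"{len(older)} earlier messages condensed"


history = [f"message {i}" for i in range(6)]
compact = summarize_history(
    history, trigger_messages=5, keep_messages=2, summarize=fake_summarize
)
print(compact)
```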

SlidingWindowProcessor

  1. Monitoring: Tracks message/token count on every call
  2. Trigger Check: When any trigger condition is met, trimming begins
  3. Safe Cutoff: Finds a safe point to cut that doesn't split tool call pairs
  4. Trimming: Older messages are simply discarded (no LLM call)
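The safe-cutoff step is the least obvious one: a tool result must not be kept if the tool call it answers has been cut away. The sketch below shows one way to enforce this, by moving the cutoff earlier until no kept tool result is orphaned. The `Msg` dataclass, its `role` values, and `find_safe_cutoff` are hypothetical stand-ins for illustration. The library's actual strategy may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    role: str  # "user", "assistant", "tool_call", or "tool_return"
    tool_call_id: Optional[str] = None


def find_safe_cutoff(messages: list[Msg], desired: int) -> int:
    """Return an index <= desired such that messages[index:] has no orphaned tool_return."""
    cutoff = max(0, min(desired, len(messages)))
    while cutoff > 0:
        kept = messages[cutoff:]
        call_ids = {m.tool_call_id for m in kept if m.role == "tool_call"}
        orphaned = any(
            m.role == "tool_return" and m.tool_call_id not in call_ids for m in kept
        )
        if not orphaned:
            break
        cutoff -= 1  # keep one more message and check again
    return cutoff


history = [
    Msg("user"),
    Msg("tool_call", "a"),
    Msg("tool_return", "a"),
    Msg("assistant"),
    Msg("user"),
]
# Cutting at index 2 would keep the tool_return without its tool_call,
# so the cutoff moves back to index 1.
cutoff = find_safe_cutoff(history, desired=2)
trimmed = history[cutoff:]
print(cutoff, len(trimmed))
```

This version rescans the kept messages on every step, which is simple but quadratic in the worst case; for a sliding window over a few hundred messages that cost is negligible.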

API Reference

SummarizationProcessor

@dataclass
class SummarizationProcessor:
    model: str                           # Model for generating summaries
    trigger: ContextSize | list[ContextSize] | None  # When to trigger
    keep: ContextSize                    # How much to keep
    token_counter: TokenCounter          # Token counting function
    summary_prompt: str                  # Prompt template
    max_input_tokens: int | None         # Required for fraction-based
    trim_tokens_to_summarize: int | None # Limit summary input

SlidingWindowProcessor

@dataclass
class SlidingWindowProcessor:
    trigger: ContextSize | list[ContextSize] | None  # When to trigger
    keep: ContextSize                    # How much to keep
    token_counter: TokenCounter          # Token counting function
    max_input_tokens: int | None         # Required for fraction-based

Factory Functions

# Summarization with defaults
def create_summarization_processor(
    model: str = "openai:gpt-4.1",
    trigger: ContextSize | list[ContextSize] | None = ("tokens", 170000),
    keep: ContextSize = ("messages", 20),
    max_input_tokens: int | None = None,
    token_counter: TokenCounter | None = None,
    summary_prompt: str | None = None,
) -> SummarizationProcessor

# Sliding window with defaults
def create_sliding_window_processor(
    trigger: ContextSize | list[ContextSize] | None = ("messages", 100),
    keep: ContextSize = ("messages", 50),
    max_input_tokens: int | None = None,
    token_counter: TokenCounter | None = None,
) -> SlidingWindowProcessor

Choosing a Processor

Use SummarizationProcessor when:

  • Context quality is important
  • You need to preserve key information from long conversations
  • LLM cost is acceptable

Use SlidingWindowProcessor when:

  • Speed and cost are priorities
  • Recent context is most important
  • You're running many parallel conversations
  • You want deterministic, predictable behavior

Development

git clone https://github.com/vstorm-co/summarization-pydantic-ai.git
cd summarization-pydantic-ai
make install
make test

License

MIT License - see LICENSE for details.
