Skip to main content

Claude Code's compaction engine, repackaged as a drop-in DeepAgents middleware

Project description


compact-middleware

Claude Code's compaction engine, as a drop-in DeepAgents middleware.

License Python DeepAgents

ProblemHow it worksQuick startConfigurationComparison


Long-running AI agents hit the context window wall. The built-in SummarizationMiddleware does a single-pass summary with a generic prompt — it works, but it loses critical details and has no lightweight fallbacks.

compact-middleware takes the battle-tested compaction pipeline from Claude Code and makes it a composable DeepAgents middleware. One import, and your agents handle 10x longer conversations without blowing the context window.


The Problem

What goes wrong Built-in middleware compact-middleware
Summary loses file paths, code, user feedback Generic prompt 9-section structured prompt
Every compaction = expensive LLM call Yes 3 free levels tried first
Recently-read files vanish after compaction Yes Auto-restores top 5 files + active plan
Compaction fails and retries forever Yes Circuit breaker after 3 failures
No way to clear stale tool results cheaply Correct Time-based microcompaction (free)
Token counting is pure heuristic Yes Hybrid: real API usage + heuristic tail

How It Works

The middleware runs a multi-level cascade — cheapest fix first, LLM call only if needed:

flowchart TD
    A["Context growing..."] --> B{"1. COLLAPSE"}
    B -->|"Group consecutive read/search\ninto badge summaries"| C{"2. TRUNCATE"}
    C -->|"Shorten large tool args\nin old messages"| D{"3. MICROCOMPACT"}
    D -->|"Clear stale tool results\n(time gap > 60 min)"| E{"Still over\nthreshold?"}
    E -->|No| F["Done — no LLM call needed"]
    E -->|Yes| G{"4. SUMMARIZE"}
    G -->|"9-section structured summary\n+ restore files & plan"| H["Done"]

    style F fill:#2d6a4f,color:#fff
    style H fill:#2d6a4f,color:#fff
    style B fill:#1a1a2e,color:#fff
    style C fill:#1a1a2e,color:#fff
    style D fill:#1a1a2e,color:#fff
    style G fill:#e63946,color:#fff

Levels 1-3 are free (no LLM call). In many cases they're enough to stay within budget.


Quick Start

Install

# From source (not yet on PyPI)
pip install git+https://github.com/emanueleielo/compact-middleware.git

# Or local development
git clone https://github.com/emanueleielo/compact-middleware.git
cd compact-middleware
pip install -e ".[dev]"

Minimal — zero config

from deepagents import create_deep_agent
from compact_middleware import CompactionMiddleware, CompactionToolMiddleware

mw = CompactionMiddleware(
    model="anthropic:claude-sonnet-4-6",
    backend=backend,
)
tool_mw = CompactionToolMiddleware(mw)  # optional: lets the agent compact manually

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[read_file, edit_file, execute],
    system_prompt="You are a coding assistant.",
    backend=backend,
    middleware=[mw, tool_mw],
)

# That's it — compaction triggers automatically at ~85% context usage
result = agent.invoke({"messages": [("human", "Refactor the auth module")]})

With create_deep_agent — full setup

from deepagents import create_deep_agent
from deepagents.backends import FilesystemBackend
from langchain.chat_models import init_chat_model

from compact_middleware import (
    CompactionConfig,
    CompactionMiddleware,
    CompactionToolMiddleware,
)

backend = FilesystemBackend(root_dir="/data/workspace")
model = init_chat_model("anthropic:claude-sonnet-4-6")

mw = CompactionMiddleware(model=model, backend=backend)
tool_mw = CompactionToolMiddleware(mw)

agent = create_deep_agent(
    model=model,
    tools=[search_tool, execute_tool, edit_file_tool],
    system_prompt="You are a senior engineer.",
    backend=backend,
    middleware=[mw, tool_mw],
    memory=["/memory/AGENTS.md"],
    interrupt_on={"edit_file": True},
)

# Async works too — all operations have async variants
result = await agent.ainvoke({"messages": [("human", "Add pagination to the API")]})

Configuration

The defaults match Claude Code's production settings. Override what you need:

Custom triggers and budgets

from compact_middleware import CompactionConfig, CompactionMiddleware
from compact_middleware.config import (
    CollapseConfig,
    MicrocompactConfig,
    RestorationConfig,
    TokenBudgetConfig,
    TruncateArgsConfig,
)

config = CompactionConfig(
    # --- When to compact ---
    trigger=("fraction", 0.80),          # at 80% of context window (default: 0.85)
    keep=("messages", 10),               # keep last 10 messages after compaction

    # --- Circuit breaker ---
    max_consecutive_failures=5,          # default: 3

    # --- Custom summary instructions ---
    custom_instructions="Focus on code diffs and test output. Include file paths verbatim.",
    suppress_follow_up_questions=True,   # resume without asking "where were we?"
)

mw = CompactionMiddleware(model=model, backend=backend, config=config)

Microcompaction (free tool result clearing)

config = CompactionConfig(
    microcompact=MicrocompactConfig(
        enabled=True,                       # default
        gap_threshold_minutes=30,           # clear after 30 min gap (default: 60)
        keep_recent=3,                      # always keep last 3 results (default: 5)
        compactable_tools={                 # which tools' results can be cleared
            "read_file", "execute", "grep", "glob",
            "web_search", "web_fetch", "edit_file", "write_file",
        },
    ),
)

Argument truncation

config = CompactionConfig(
    truncate_args=TruncateArgsConfig(
        trigger=("fraction", 0.80),         # when to start truncating
        max_length=1_000,                   # chars per arg value (default: 2000)
        truncate_all_tools=True,            # all tools, not just write/edit (default)
    ),
)

Message collapsing

config = CompactionConfig(
    collapse=CollapseConfig(
        enabled=True,
        min_group_size=3,                   # need 3+ consecutive reads to collapse (default: 2)
        collapse_tools={"read_file", "grep", "glob", "web_search"},
    ),
)

Post-compaction restoration

config = CompactionConfig(
    restoration=RestorationConfig(
        enabled=True,
        max_files=3,                        # re-read top 3 recent files (default: 5)
        file_budget_chars=30_000,           # total budget for restored content
        per_file_chars=10_000,              # max per file
        restore_plans=True,                 # re-attach active plan state
    ),
)

Token budgets

config = CompactionConfig(
    token_budget=TokenBudgetConfig(
        per_tool_chars=50_000,              # max chars per tool result
        per_message_chars=200_000,          # aggregate max per message turn
    ),
)

Trigger formats

The trigger and keep parameters accept three formats:

# Absolute token count
trigger=("tokens", 170_000)

# Fraction of context window (requires model with known context size)
trigger=("fraction", 0.85)

# Message count
trigger=("messages", 50)

# Multiple triggers (any fires)
trigger=[("fraction", 0.85), ("messages", 100)]

Comparison

Feature SummarizationMiddleware compact-middleware
Summary prompt Generic 9-section structured
Pre-summarization optimization Collapse + Truncate + Microcompact
Partial compaction (prefix/suffix) Yes
Post-compaction restoration Files + Plans
Circuit breaker Configurable
PTL error recovery Head truncation + retry
Token counting Heuristic only Hybrid (real API + heuristic)
Argument truncation write_file, edit_file All tools
Time-based clearing Configurable gap threshold
Message collapsing Consecutive read/search
Custom summary instructions Yes
Async support Yes Yes (concurrent offload + summary)

The 9-Section Summary Prompt

Unlike a generic "summarize this conversation", the compaction prompt enforces 9 sections that preserve what agents actually need:

  1. Primary Request & Intent — what the user asked for
  2. Key Technical Concepts — frameworks, patterns, technologies
  3. Files & Code Sections — paths, snippets, why they matter
  4. Errors & Fixes — what broke and how it was resolved
  5. Problem Solving — debugging strategies and troubleshooting
  6. All User Messages — verbatim non-tool messages (catches intent drift)
  7. Pending Tasks — what's still open
  8. Current Work — exactly where things left off
  9. Optional Next Step — with direct quotes to prevent task drift

An <analysis> scratchpad is used during generation for quality, then stripped from the final summary.


Hybrid Token Counting

Most middleware estimates tokens with len(text) / 4. That's a guess.

compact-middleware uses a hybrid approach ported from Claude Code:

Messages: [Human] [AI] [Tool] [AI with usage={input:45000, output:1200}] [Human] [AI]
                                              ^                              ^      ^
                                       real: 46,200 tokens               heuristic only
                                       (from API response)               (for these 2)

It walks messages backwards, finds the last AIMessage with real API token usage (response_metadata.usage), and estimates only the messages after it. Falls back to pure heuristic when no API response is available.

Works with Anthropic, OpenAI, and any LangChain-compatible provider.


Architecture

compact_middleware/
├── middleware.py      CompactionMiddleware + CompactionToolMiddleware
├── decision.py        Multi-level cascade engine
├── compaction.py      LLM summarization (full + partial)
├── tokens.py          Hybrid token counting (real API + heuristic)
├── prompts.py         9-section prompt templates
├── microcompact.py    Time-based tool result clearing
├── collapse.py        Message collapsing
├── truncation.py      Argument truncation
├── restoration.py     Post-compaction file/plan restoration
├── config.py          All configuration dataclasses
└── state.py           State schema (TypedDict events)

Development

git clone https://github.com/emanueleielo/compact-middleware
cd compact-middleware
pip install -e ".[dev]"

pytest -v              # run tests
ruff check .           # lint
mypy compact_middleware # typecheck

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

compact_middleware-0.1.0.tar.gz (38.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

compact_middleware-0.1.0-py3-none-any.whl (40.8 kB view details)

Uploaded Python 3

File details

Details for the file compact_middleware-0.1.0.tar.gz.

File metadata

  • Download URL: compact_middleware-0.1.0.tar.gz
  • Upload date:
  • Size: 38.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for compact_middleware-0.1.0.tar.gz
Algorithm Hash digest
SHA256 cb3e48c3870fed68989aa6e08ec668ac08382feb5a7812e97a7ab800de657355
MD5 a40b43f9e5ffa6cc446d89654774d921
BLAKE2b-256 1987736626db08234f59e27a27957cad41cadf7bb53c82af5e8c5734e21baf81

See more details on using hashes here.

File details

Details for the file compact_middleware-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for compact_middleware-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3437ff46d9948f0f29f708b7a18470a2f9c7a932e0f02b4102f62ef4f5c769e7
MD5 cd248f1600aa075e14f876a40bb58a5a
BLAKE2b-256 f54427594a38c71615bce76a419aa1cd7ae94d1768a37df1e81a7554c4f58f66

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page