
Claw Compactor

14-Stage Fusion Pipeline for LLM Token Compression


15–82% compression depending on content · Zero LLM inference cost · Reversible · 1600+ tests

Documentation · Architecture · Benchmarks · Quick Start · API


What is Claw Compactor?

Claw Compactor is an open-source LLM token compression engine built around a 14-stage Fusion Pipeline. Each stage is a specialized compressor — from AST-aware code analysis to JSON statistical sampling to simhash-based deduplication — chained through an immutable data flow architecture where each stage's output feeds the next.

Demo

$ claw-compactor benchmark ./my-workspace

  Claw Compactor v7.0 — Fusion Pipeline Benchmark
  ─────────────────────────────────────────────────

  Scanning workspace... 47 files, 234,891 tokens

  Stage Results:
  ┌──────────────────┬──────────┬───────────┬──────────┐
  │ Stage            │ Applied  │ Reduction │ Time     │
  ├──────────────────┼──────────┼───────────┼──────────┤
  │ Cortex           │ 47/47    │ —         │ 12ms     │
  │ Photon           │ 3/47     │ 2.1%      │ 4ms      │
  │ RLE              │ 41/47    │ 8.3%      │ 6ms      │
  │ SemanticDedup    │ 47/47    │ 12.7%     │ 18ms     │
  │ Ionizer          │ 8/47     │ 71.2%     │ 9ms      │
  │ Neurosyntax      │ 23/47    │ 18.4%     │ 31ms     │
  │ TokenOpt         │ 47/47    │ 4.1%      │ 3ms      │
  │ Abbrev           │ 12/47    │ 6.8%      │ 5ms      │
  └──────────────────┴──────────┴───────────┴──────────┘

  Summary:
    Before:  234,891 tokens ($2.35 at GPT-4 rates)
    After:   108,250 tokens ($1.08)
    Saved:   126,641 tokens (53.9%) — $1.27/run
    Time:    88ms total

  Estimated monthly savings at 100 runs/day: $3,810
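
The summary figures follow directly from the demo's own numbers; a quick sanity check of the arithmetic:

```python
# Reproduce the benchmark summary math from the numbers shown above.
before_tokens = 234_891
after_tokens = 108_250
cost_before, cost_after = 2.35, 1.08   # GPT-4-rate estimates from the demo

saved_tokens = before_tokens - after_tokens           # 126,641
reduction_pct = 100 * saved_tokens / before_tokens    # ~53.9%
saved_per_run = cost_before - cost_after              # ~$1.27
monthly = saved_per_run * 100 * 30                    # 100 runs/day, 30 days

print(saved_tokens, round(reduction_pct, 1), round(monthly))
# 126641 53.9 3810
```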

How It Compares

| Feature | Claw Compactor | LLMLingua-2 | SelectiveContext | gzip + base64 |
|---|---|---|---|---|
| Compression rate | 15–82% | 30–70% | 10–40% | 60–80% |
| ROUGE-L @ 0.3 | 0.653 | 0.346 | ~0.4 | N/A |
| ROUGE-L @ 0.5 | 0.723 | 0.570 | ~0.6 | N/A |
| LLM inference cost | $0 | ~$0.02/call | $0 | $0 |
| Latency | <50ms | ~300ms | ~200ms | <10ms |
| Reversible | Yes | No | No | Yes (manual) |
| Content-aware routing | 14 stages | 1 (perplexity) | 1 (self-info) | None |
| AST-aware code handling | Yes (tree-sitter) | No | No | No |
| JSON schema sampling | Yes | No | No | No |
| Log/diff/search stages | Yes | No | No | No |
| Required dependencies | 0 | torch, transformers | torch | zlib |
| LLM-readable output | Yes | Partial | Partial | No |

Why Claw Compactor wins: LLMLingua-2 drops tokens by perplexity score — effective for natural language, but destroys code identifiers, JSON keys, and log patterns. Claw Compactor uses content-type-aware stages that understand the structure of what they're compressing.


Input
  |
  v
┌─────────────────────────────────────────────────────────────────────────┐
│                         FUSION PIPELINE                                 │
│                                                                         │
│  QuantumLock ─> Cortex ─> Photon ─> RLE ─> SemanticDedup ─> Ionizer    │
│       |            |         |        |          |              |        │
│   KV-cache    auto-detect  base64   path     simhash       JSON         │
│   alignment   16 languages  strip  shorten   dedup        sampling      │
│                                                                         │
│  ─> LogCrunch ─> SearchCrunch ─> DiffCrunch ─> StructuralCollapse      │
│        |              |              |                |                  │
│    log folding    result dedup   context fold    import merge            │
│                                                                         │
│  ─> Neurosyntax ─> Nexus ─> TokenOpt ─> Abbrev ─────────> Output       │
│        |             |          |           |                            │
│    AST compress   ML token   format     NL shorten                      │
│    (tree-sitter)  classify   optimize   (text only)                     │
│                                                                         │
│  [ RewindStore ] ── hash-addressed LRU for reversible retrieval         │
└─────────────────────────────────────────────────────────────────────────┘

Key design principles:

  • Immutable data flow — FusionContext is a frozen dataclass. Every stage produces a new FusionResult; nothing is mutated in place.
  • Gate-before-compress — Each stage has should_apply() that inspects context type, language, and role before doing any work. Stages that don't apply are skipped at zero cost.
  • Content-aware routing — Cortex auto-detects content type (code, JSON, logs, diffs, search results) and language (Python, Go, Rust, TypeScript, etc.), then downstream stages make type-aware compression decisions.
  • Reversible compression — Ionizer stores originals in a hash-addressed RewindStore. The LLM can call a tool to retrieve any compressed section by its marker ID.
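
In outline, the execution model described above looks like the following. This is a minimal illustration of the frozen-dataclass flow and gate-before-compress pattern, not the real FusionEngine internals; the toy stage and `run` helper are hypothetical:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class FusionContext:
    """Immutable snapshot that flows forward through the pipeline."""
    content: str
    content_type: str = "text"

class Stage:
    def should_apply(self, ctx: FusionContext) -> bool:  # the gate
        return True
    def apply(self, ctx: FusionContext) -> str:
        return ctx.content

class Upper(Stage):
    """Toy stage: only fires on plain text."""
    def should_apply(self, ctx: FusionContext) -> bool:
        return ctx.content_type == "text"
    def apply(self, ctx: FusionContext) -> str:
        return ctx.content.upper()

def run(stages, ctx: FusionContext) -> FusionContext:
    for stage in stages:
        if not stage.should_apply(ctx):   # gate-before-compress: skip at zero cost
            continue
        # never mutate in place: each stage yields a *new* frozen context
        ctx = replace(ctx, content=stage.apply(ctx))
    return ctx

out = run([Upper()], FusionContext("hello"))
print(out.content)  # HELLO
```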

Benchmarks

Real-World Compression (FusionEngine v7 vs Legacy Regex)

| Content Type | Legacy | FusionEngine | Improvement |
|---|---|---|---|
| Python source | 7.3% | 25.0% | 3.4x |
| JSON (100 items) | 12.6% | 81.9% | 6.5x |
| Build logs | 5.5% | 24.1% | 4.4x |
| Agent conversation | 5.7% | 31.0% | 5.4x |
| Git diff | 6.2% | 15.0% | 2.4x |
| Search results | 5.3% | 40.7% | 7.7x |
| **Weighted average** | **9.2%** | **36.3%** | **3.9x** |

SWE-bench Real Tasks

Tested on real SWE-bench instances with actual repository code:

| Instance | Size | Compression |
|---|---|---|
| django__django-11620 | 4.5K | 14.5% |
| sympy__sympy-14396 | 5.5K | 19.1% |
| scikit-learn-25747 | 11.8K | 15.9% |
| scikit-learn-13554 | 73K | 11.8% |
| scikit-learn-25308 | 81K | 14.4% |

vs LLMLingua-2 (ROUGE-L Fidelity)

| Compression Rate | Claw Compactor | LLMLingua-2 | Delta |
|---|---|---|---|
| 0.3 (aggressive) | 0.653 | 0.346 | +88.2% |
| 0.5 (balanced) | 0.723 | 0.570 | +26.8% |

Claw Compactor preserves more semantic content at the same compression ratio, with zero LLM inference cost.


Quick Start

Install from PyPI

pip install claw-compactor

Or clone from source

git clone https://github.com/open-compress/claw-compactor.git
cd claw-compactor
pip install -e .

Run

# Benchmark your workspace (non-destructive)
claw-compactor benchmark /path/to/workspace

# Full compression pipeline
claw-compactor compress /path/to/workspace

Requirements: Python 3.9+. Optional: pip install claw-compactor[accurate] for exact token counts via tiktoken.


API

FusionEngine — Single Text

from scripts.lib.fusion.engine import FusionEngine

engine = FusionEngine()

result = engine.compress(
    text="def hello():\n    # greeting function\n    print('hello')",
    content_type="code",    # or let Cortex auto-detect
    language="python",      # optional hint
)

print(result["compressed"])     # compressed output
print(result["stats"])          # per-stage timing + token counts
print(result["markers"])        # Rewind markers for reversibility

FusionEngine — Chat Messages

messages = [
    {"role": "system", "content": "You are a coding assistant..."},
    {"role": "user", "content": "Fix the auth bug in login.py"},
    {"role": "assistant", "content": "I found the issue. Here's the fix:\n```python\n..."},
    {"role": "tool", "content": '{"results": [{"file": "login.py", ...}, ...]}'},
]

result = engine.compress_messages(messages)

# Cross-message dedup runs first, then per-message pipeline
print(result["stats"]["reduction_pct"])   # aggregate compression %
print(result["per_message"])              # per-message breakdown

Rewind — Reversible Retrieval

engine = FusionEngine(enable_rewind=True)
result = engine.compress(large_json, content_type="json")

# LLM sees compressed output with markers like [rewind:abc123...]
# When the LLM needs the original, it calls the Rewind tool:
original = engine.rewind_store.retrieve("abc123def456...")
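
Conceptually, the RewindStore is just a hash-addressed map with LRU eviction. A minimal sketch of that idea follows; only `retrieve` matches the documented API, and the `store` method and internals here are illustrative assumptions, not the shipped implementation:

```python
import hashlib
from collections import OrderedDict
from typing import Optional

class RewindStore:
    """Hash-addressed LRU: store originals, retrieve by marker ID (sketch)."""

    def __init__(self, max_entries: int = 1024):
        self._entries: "OrderedDict[str, str]" = OrderedDict()
        self._max = max_entries

    def store(self, original: str) -> str:
        key = hashlib.sha256(original.encode()).hexdigest()[:12]  # marker ID
        self._entries[key] = original
        self._entries.move_to_end(key)          # mark as most recently used
        while len(self._entries) > self._max:   # evict least recently used
            self._entries.popitem(last=False)
        return key

    def retrieve(self, key: str) -> Optional[str]:
        if key in self._entries:
            self._entries.move_to_end(key)
        return self._entries.get(key)

store = RewindStore()
marker = store.store('{"users": [1, 2, 3]}')   # compressed text would carry [rewind:<marker>]
assert store.retrieve(marker) == '{"users": [1, 2, 3]}'
```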

Custom Stage

from scripts.lib.fusion.base import FusionStage, FusionContext, FusionResult

class MyStage(FusionStage):
    name = "my_compressor"
    order = 22  # runs between StructuralCollapse (20) and Neurosyntax (25)

    def should_apply(self, ctx: FusionContext) -> bool:
        return ctx.content_type == "log"

    def apply(self, ctx: FusionContext) -> FusionResult:
        compressed = my_compression_logic(ctx.content)
        return FusionResult(
            content=compressed,
            original_tokens=estimate_tokens(ctx.content),
            compressed_tokens=estimate_tokens(compressed),
        )

# Add to pipeline
pipeline = engine.pipeline.add(MyStage())

The 14 Stages

| # | Stage | Order | Purpose | Applies To |
|---|---|---|---|---|
| 1 | QuantumLock | 3 | Isolates dynamic content in system prompts to maximize KV-cache hit rate | system messages |
| 2 | Cortex | 5 | Auto-detects content type and programming language (16 languages) | untyped content |
| 3 | Photon | 8 | Detects and compresses base64-encoded images | all |
| 4 | RLE | 10 | Path shorthand ($WS), IP prefix compression, enum compaction | all |
| 5 | SemanticDedup | 12 | SimHash fingerprint deduplication across content blocks | all |
| 6 | Ionizer | 15 | JSON array statistical sampling with schema discovery + error preservation | json |
| 7 | LogCrunch | 16 | Folds repeated log lines with occurrence counts | log |
| 8 | SearchCrunch | 17 | Deduplicates search/grep results | search |
| 9 | DiffCrunch | 18 | Folds unchanged context lines in git diffs | diff |
| 10 | StructuralCollapse | 20 | Merges import blocks, collapses repeated assertions/patterns | code |
| 11 | Neurosyntax | 25 | AST-aware code compression via tree-sitter (safe regex fallback). Never shortens identifiers. | code |
| 12 | Nexus | 35 | ML token-level compression (stopword removal fallback without model) | text |
| 13 | TokenOpt | 40 | Tokenizer format optimization — strips bold/italic markers, normalizes whitespace | all |
| 14 | Abbrev | 45 | Natural language abbreviation. Only fires on text — never touches code, JSON, or structured data. | text |

Each stage is independent and stateless. Stages communicate only through the immutable FusionContext that flows forward through the pipeline.
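
For intuition on what SemanticDedup's fingerprints do, here is a rough sketch of the classic SimHash algorithm (not the tuned implementation inside the stage): near-duplicate texts hash to fingerprints with a small Hamming distance.

```python
import hashlib

def simhash(text: str, bits: int = 64) -> int:
    """Classic SimHash: sum signed per-bit votes from each token's hash."""
    weights = [0] * bits
    for token in text.lower().split():
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # The fingerprint keeps a 1 wherever the vote was positive.
    return sum(1 << i for i in range(bits) if weights[i] > 0)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

a = simhash("connection refused retrying in 5s")
b = simhash("connection refused retrying in 10s")
c = simhash("user logged in successfully")
# The near-duplicate pair should show the smaller distance:
print(hamming(a, b), hamming(a, c))
```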


Workspace Commands

python3 scripts/mem_compress.py <workspace> <command> [options]

| Command | Description |
|---|---|
| full | Run complete compression pipeline |
| benchmark | Dry-run compression report |
| compress | Rule-based compression only |
| dict | Dictionary encoding with auto-learned codebook |
| observe | Session transcript JSONL to structured observations |
| tiers | Generate L0/L1/L2 tiered summaries |
| dedup | Cross-file duplicate detection |
| estimate | Token count report |
| audit | Workspace health check |
| optimize | Tokenizer-level format optimization |
| auto | Watch mode — compress on file changes |

Options: --json, --dry-run, --since YYYY-MM-DD, --quiet


Architecture

See ARCHITECTURE.md for the full technical deep-dive:

  • Immutable data flow design
  • Stage execution model and gating
  • Rewind reversible compression protocol
  • Cross-message semantic deduplication
  • How to extend the pipeline

12,000+ lines of Python  ·  1,600+ tests  ·  14 fusion stages  ·  0 external ML dependencies

Installation

# Clone
git clone https://github.com/open-compress/claw-compactor.git
cd claw-compactor

# Optional: exact token counting
pip install tiktoken

# Optional: AST-aware code compression (Neurosyntax)
pip install tree-sitter-language-pack

# Development
pip install -e ".[dev,accurate]"

Zero required dependencies. tiktoken and tree-sitter are optional enhancements — the pipeline runs with built-in heuristic fallbacks for both.
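
A common heuristic for the token-count fallback is roughly four characters per token for English-like text; the sketch below shows the pattern (the function names are illustrative, not the shipped estimator), falling back when tiktoken is unavailable:

```python
def estimate_tokens(text: str) -> int:
    """Heuristic fallback: ~4 characters per token for English-like text."""
    return max(1, len(text) // 4)

try:
    import tiktoken  # optional: exact counts via the `accurate` extra
    _enc = tiktoken.get_encoding("cl100k_base")
    def count_tokens(text: str) -> int:
        return len(_enc.encode(text))
except Exception:
    # tiktoken missing (or encoding unavailable): use the heuristic
    count_tokens = estimate_tokens

print(count_tokens("def hello():\n    print('hello')"))
```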


Who Uses This

| Project | How |
|---|---|
| OpenClaw | Built-in skill for all OpenClaw AI agents — compresses workspace context before every LLM call |
| OpenCompress | Production compression engine powering the OpenCompress API |

Using Claw Compactor? Open a PR to add yourself here.


Project Stats

| Metric | Value |
|---|---|
| Tests | 1,600+ passed |
| Python source | 12,000+ lines |
| Fusion stages | 14 |
| Languages detected | 16 |
| Required dependencies | 0 |
| Compression (code) | 15–25% |
| Compression (JSON peak) | 81.9% |
| ROUGE-L @ 0.3 rate | 0.653 |
| License | MIT |

Contributing

See CONTRIBUTING.md for guidelines on:

  • Setting up the development environment
  • Adding new Fusion stages
  • Running the test suite
  • Submitting PRs


License

MIT
