Skip to main content

Shrink LLM context windows by removing noise, redundancy, and long-tail detail — without losing signal.

Project description

Context Compressor

Shrink LLM context windows — removing noise, redundancy, and long-tail detail without losing the signal. Typically 40–80% fewer tokens depending on how repetitive the input is (benchmarks).

CI Python License: MIT Zero dependencies

Long agent loops, verbose tool output, and 10,000-line security scans blow past even a 200K context window — and every redundant token you send costs latency and money. Context Compressor is a small, fast, rule-based pipeline that strips the fat out of any text before it reaches the model.

from context_compressor import ContextCompressor

compressor = ContextCompressor()
result = compressor.compress(noisy_log_text)

print(result.stats.summary())   # 136 -> 65 tokens (52.2% smaller, backend=tiktoken)
print(result.compressed)        # the cleaned text, ready to send to your LLM

Why

Problem What happens What this does
Context overflow Multi-turn agents accumulate history until the window overflows and the run breaks. Collapse repeated turns and boilerplate phrasing.
Verbose tool output A vuln scan or SELECT * dumps thousands of near-identical lines, 90% noise. Drop noise, dedupe rows, trim long-tail detail.
Token cost Every wasted token is latency + dollars on every call. 40–80% token reduction on noisy input, measured with tiktoken.

It works on anything text: chat transcripts, application logs, JSON blobs, SQL result dumps, and security scanner output.

Highlights

  • Zero required dependencies. Pure Python standard library. tiktoken is optional — without it a built-in heuristic counter is used automatically.
  • Lossless-leaning by default. Removals are high-precision; counts are preserved (port 22 open [x3]) rather than silently dropped.
  • Composable pipeline. Toggle each stage, tune thresholds, or add your own noise patterns via plain dataclass config.
  • Measured, not guessed. Every run returns before/after token counts and a per-stage breakdown.
  • Security-aware. A dedicated summarizer turns raw scanner output into a severity-ranked brief.
  • CLI included. cat scan.log | context-compress --stats.

Install

pip install llm-context-compressor                 # zero dependencies
pip install "llm-context-compressor[tiktoken]"     # exact OpenAI/Anthropic-style token counts

Install name is llm-context-compressor; the import is import context_compressor (like scikit-learnimport sklearn).

Or from source:

git clone https://github.com/uninhibited-scholar/context-compressor
cd context-compressor
pip install -e ".[dev]"
pytest

How it works

The pipeline runs cheap, high-precision stages first, then optionally falls back to extractive summarization only if a target_ratio is requested and the rules didn't get there:

raw text
   │
   ▼  1. NoiseFilter        drop timestamps, progress bars, status chatter, separators
   ▼  2. DetailTrimmer      shorten long strings, cap JSON depth, collapse table runs, strip log metadata
   ▼  3. PatternRemover     collapse repeated agent phrasing ("as I mentioned…")
   ▼  4. RedundancyFilter   dedupe identical/near-identical lines, keep counts
   ▼  5. ExtractiveSummarizer   (optional) TextRank-style, only if still over target
   │
   ▼
compressed text  +  full token/stage metrics

Usage

Presets

from context_compressor import ContextCompressor, CompressionConfig

ContextCompressor(CompressionConfig.conservative())  # safe, lossless-ish
ContextCompressor()                                  # balanced default
ContextCompressor(CompressionConfig.aggressive())    # smallest output

Hit a target size

cfg = CompressionConfig(target_ratio=0.3)   # aim for 30% of the original
result = ContextCompressor(cfg).compress(long_transcript)

Tune any stage

from context_compressor import CompressionConfig, NoiseConfig

cfg = CompressionConfig()
cfg.noise.drop_log_levels = True
cfg.redundancy.near_duplicate = True
cfg.trim.max_list_items = 10
cfg.noise.extra_patterns.append((r"^TRACE:.*$", "trace_line"))  # your own rule

Security scan brief

brief = ContextCompressor().compress_security_scan(nessus_output, examples_per_type=3)
print(brief)
【Security Scan Summary】

Detected 11 findings across 6 categories.

🔴 SQL Injection [CRITICAL]: 2  (CVE-2024-2117)
    1. https://shop.local/search?q=test
    2. https://shop.local/item?id=42
🟠 Cross-Site Scripting (XSS) [HIGH]: 2
    ...
🟢 Open Port / Service [LOW]: 3
    1. 10.0.0.5
    … and 2 more

Command line

context-compress scan.log --stats
cat transcript.txt | context-compress --preset aggressive
context-compress nessus.txt --security
context-compress big.log --target 0.3 > small.log

Compressed text goes to stdout; metrics go to stderr, so pipes stay clean.

Reading the metrics

result = compressor.compress(text)
s = result.stats

s.original_tokens      # 136
s.compressed_tokens    # 65
s.reduction_pct        # 52.2
s.token_backend        # "tiktoken" or "heuristic"
for stage in s.stages:
    print(stage.name, stage.chars_removed, stage.details)

Benchmarks

Reproducible with python benchmarks/benchmark.py (token counts via tiktoken):

Dataset Tokens before Tokens after Reduction Time
Application log 13,479 7,864 41.7% 30 ms
Security scan 4,625 3,863 16.5% 11 ms
Agent transcript 3,172 2,665 16.0% 8 ms
JSON result dump 1,124 220 80.4% 1 ms

Reduction scales with how repetitive the input is — heavily duplicated logs and scan output compress much further than already-unique prose. See benchmarks/BENCHMARKS.md for the chart.

JSON blobs

result = compressor.compress_json(huge_json_string)   # caps depth, lists, long strings
print(result.compressed)

RAG: LangChain & LlamaIndex

Drop-in adapters compress retrieved chunks before they reach the model. They are dependency-free (they duck-type the document objects), so installing this package never pulls in either framework.

# LangChain — implements the BaseDocumentTransformer interface
from context_compressor.integrations import CompressorDocumentTransformer

transformer = CompressorDocumentTransformer()
smaller_docs = transformer.transform_documents(retrieved_docs)
# each doc.metadata["compression"] now records the token savings

# LlamaIndex — works on nodes / Documents
from context_compressor.integrations import compress_nodes

nodes = compress_nodes(retriever.retrieve("my query"))

Integrating with an agent loop

from context_compressor import ContextCompressor, CompressionConfig

compressor = ContextCompressor(CompressionConfig(target_ratio=0.4))

def before_model_call(history: str) -> str:
    # Compress accumulated context before each turn to stay under the window.
    return compressor.compress(history).compressed

API at a glance

Object Purpose
ContextCompressor The pipeline. .compress(text) -> CompressionResult.
CompressionConfig All knobs; .aggressive() / .conservative() presets.
CompressionResult .compressed text + .stats.
NoiseFilter, RedundancyFilter, DetailTrimmer, PatternRemover Stages, usable standalone.
ExtractiveSummarizer Dependency-free TextRank-style summarizer.
SecuritySummarizer Scanner output → severity-ranked brief.
TokenCounter tiktoken-backed counter with heuristic fallback.

Development

pip install -e ".[dev]"
pytest --cov=context_compressor      # 24 tests, ~94% coverage

中文简介

Context Compressor 是一个零依赖的 Python 库,用于在把文本送入大模型之前 压缩上下文:去除噪音(时间戳、进度条、状态消息)、合并重复行、裁剪长字符串/ 深层 JSON、折叠 Agent 重复话术,并可选地做抽取式摘要。典型可减少 50–80% 的 Token,配合 tiktoken 可获得与 OpenAI/Anthropic 对齐的精确计数。内置面向网络 安全扫描结果的专属摘要器,可将上万行扫描日志归纳为按风险等级排序的简报。

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm_context_compressor-0.2.0.tar.gz (29.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llm_context_compressor-0.2.0-py3-none-any.whl (27.4 kB view details)

Uploaded Python 3

File details

Details for the file llm_context_compressor-0.2.0.tar.gz.

File metadata

  • Download URL: llm_context_compressor-0.2.0.tar.gz
  • Upload date:
  • Size: 29.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llm_context_compressor-0.2.0.tar.gz
Algorithm Hash digest
SHA256 cf715af3cb842b093180826371f5478b4ea7bb9f162ffc78b8c9b80c8b619462
MD5 fcc9a45260f074711ae5a67d55819d35
BLAKE2b-256 eb84a7a527cb1ea3c6a7c621b1a0db860ebfacac3f44e54b5fb79e75b1859305

See more details on using hashes here.

Provenance

The following attestation bundles were made for llm_context_compressor-0.2.0.tar.gz:

Publisher: publish.yml on uninhibited-scholar/context-compressor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file llm_context_compressor-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for llm_context_compressor-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 97d6390264d02101934ff2675cb402850930db467d62a1abfbed002cf213e98d
MD5 46cfad7ae597bc19988d738ed9021cf6
BLAKE2b-256 b6564002838b030686a7b3d5b7709aace0bd093112b7765367a8ea5f029db1f4

See more details on using hashes here.

Provenance

The following attestation bundles were made for llm_context_compressor-0.2.0-py3-none-any.whl:

Publisher: publish.yml on uninhibited-scholar/context-compressor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page