# Kompact

Multi-layer context optimization proxy for LLM agents.
A context compression proxy for LLM agents. It sits between your agent and the LLM provider, compresses context on the fly, and cuts your token bill by 40-70% — with zero code changes.
## Save real money

For a team running 1,000 agentic requests/day with ~10K-token contexts:
| Model | Without Kompact | With Kompact | Monthly Savings |
|---|---|---|---|
| Sonnet ($3/M) | $900/mo | $405/mo | $495/mo |
| Opus ($15/M) | $4,500/mo | $2,025/mo | $2,475/mo |
| GPT-4o ($2.50/M) | $750/mo | $338/mo | $412/mo |
Savings scale linearly. 10K requests/day = 10x the numbers above.
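The table's arithmetic is easy to reproduce. A minimal sketch (all inputs are illustrative, and the ~55% savings rate is the one implied by the table, not a guarantee):

```python
# Estimate monthly LLM input-token spend with and without compression.
# Inputs mirror the table above; the 55% savings rate is illustrative.

def monthly_cost(requests_per_day: int, tokens_per_request: int,
                 price_per_million: float, days: int = 30) -> float:
    """Monthly input-token cost in dollars."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1_000_000 * price_per_million

baseline = monthly_cost(1_000, 10_000, 3.00)   # Sonnet: $900/mo
compressed = baseline * (1 - 0.55)             # ~55% fewer tokens: ~$405/mo
print(f"saved: ${baseline - compressed:,.0f}/mo")
```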
## Get started in 30 seconds

```bash
pip install kompact   # or: uv add kompact
kompact proxy --port 7878
export ANTHROPIC_BASE_URL=http://localhost:7878
# That's it. Your agent now uses fewer tokens.
```
No SDK changes. No prompt rewriting. Just point your base URL at the proxy.
## Quality stays intact
Evaluated on BFCL (1,431 real API schemas) — the standard benchmark for tool-calling agents. End-to-end through Claude, scored with context-bench.
Quality impact vs no compression (closer to 0% = better):
| Model | Kompact | Headroom | LLMLingua-2 |
|---|---|---|---|
| Haiku | -2.6% | -3.0% | -23.4% |
| Sonnet | -3.9% | -3.5% | -20.6% |
| Opus | -0.5% | -0.5% | -27.3% |
Kompact and Headroom both stay within ~3% of baseline. LLMLingua-2 destroys tool schemas regardless of model (-20 to -27%).
## Compression across content types
Measured offline on 12,795 examples across 3 datasets:
| Dataset | Examples | Kompact | Headroom | LLMLingua-2 |
|---|---|---|---|---|
| BFCL (tool schemas) | 1,431 | 55.3% | ~0% | 55.4% |
| Glaive (tool calling) | 3,959 | 56.6% | ~0% | ~50% |
| HotpotQA (prose QA) | 7,405 | 17.9% | ~0% | 49.9% |
Headroom's SmartCrusher doesn't compress JSON — it's designed for prose. LLMLingua-2 compresses aggressively but destroys information (see quality table above).
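To see why tool-call payloads compress so well, here is a toy illustration of the general idea behind tabular re-encodings (this is not Kompact's actual TOON wire format, just the principle): a JSON array of uniform objects repeats every key in every element, while a header-plus-rows layout states the keys once.

```python
import json

# Uniform JSON arrays repeat keys per element; a header + rows layout
# states them once. Toy illustration of the principle, not Kompact's format.
records = [{"name": f"user{i}", "role": "admin", "active": True} for i in range(50)]

verbose = json.dumps(records)
keys = list(records[0])
compact = json.dumps({"keys": keys, "rows": [[r[k] for k in keys] for r in records]})

print(len(compact) / len(verbose))  # well under 1: the rows layout is smaller
```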
## How it works

Kompact is a transparent HTTP proxy. It intercepts LLM API requests, compresses the context, then forwards to the provider.

```
        ┌──────────────────────────────────────────────┐
        │            Kompact Proxy (:7878)             │
        │                                              │
Agent ─>│ 1. Schema Optimizer (TF-IDF selection)       │─> LLM Provider
        │ 2. Content Compressors (TOON, JSON, code)    │
        │ 3. Extractive Compress (TF-IDF sentences)    │
        │ 4. Observation Masker (history mgmt)         │
        │ 5. Cache Aligner (prefix caching)            │
        │                                              │
        └──────────────────────────────────────────────┘
```
Eight transforms, grouped into the five pipeline stages above, each targeting a different content type. The pipeline adapts automatically: short contexts get light compression, long contexts get aggressive optimization. Sub-millisecond overhead.
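As a rough sketch of what a TF-IDF extractive stage does (Kompact's actual scorer is internal; this shows only the general technique): split into sentences, score each by rarity-weighted term frequency, and keep the top scorers in their original order.

```python
import math
import re
from collections import Counter

def extractive_compress(text: str, keep: int = 2) -> str:
    """Keep the `keep` highest TF-IDF-scoring sentences, in original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    docs = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in d)  # document frequency per term
    def score(d):
        # term frequency weighted by inverse document frequency
        return sum(tf * math.log(n / df[w]) for w, tf in d.items())
    top = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)[:keep]
    return " ".join(sentences[i] for i in sorted(top))

text = ("Rockets burn liquid oxygen. Rockets burn liquid oxygen. "
        "Unique payload telemetry data stream.")
print(extractive_compress(text, keep=1))  # the information-dense sentence wins
```

Sentences made of common, repeated words score low and are dropped first; sentences carrying rare, information-dense terms survive.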
## Per-request control

Use the `X-Kompact-Disable` header to disable transforms for a single request without affecting other clients:

```python
# Anthropic SDK
client.messages.create(..., extra_headers={"X-Kompact-Disable": "toon,code_compressor"})

# OpenAI SDK
client.chat.completions.create(..., extra_headers={"X-Kompact-Disable": "toon,code_compressor"})
```

Comma-separated transform names: `toon`, `json_crusher`, `code_compressor`, `log_compressor`, `content_compressor`, `observation_masker`, `cache_aligner`, `schema_optimizer`.
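If you set the header programmatically, a tiny helper can catch typos in transform names early. The name list is copied from above; the helper itself (`disable_header`) is just a sketch, not part of Kompact:

```python
# Build an X-Kompact-Disable header value, rejecting unknown transform names.
# KNOWN_TRANSFORMS is taken from the README; disable_header is illustrative.
KNOWN_TRANSFORMS = {
    "toon", "json_crusher", "code_compressor", "log_compressor",
    "content_compressor", "observation_masker", "cache_aligner", "schema_optimizer",
}

def disable_header(*names: str) -> dict[str, str]:
    unknown = set(names) - KNOWN_TRANSFORMS
    if unknown:
        raise ValueError(f"unknown transform(s): {sorted(unknown)}")
    return {"X-Kompact-Disable": ",".join(names)}

headers = disable_header("toon", "code_compressor")
```

Pass the result as `extra_headers` in either SDK call shown above.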
## Monitoring

Kompact exports OpenTelemetry metrics (on by default; disable with `--no-otel`). A Prometheus + Grafana stack is included:

```bash
cd monitoring
docker compose up -d
```
- Grafana dashboard: http://localhost:9473 (pre-built "Kompact" dashboard)
- Prometheus: http://localhost:9090
- Metrics endpoint: http://localhost:9464/metrics
The dashboard shows request rate, token savings, compression ratio, pipeline latency percentiles, and per-transform breakdowns.
## Running benchmarks

```bash
# Offline compression (no LLM calls; measures compression + needle preservation)
uv run python benchmarks/run_dataset_eval.py --dataset bfcl

# End-to-end quality (sends through the proxy chain; measures LLM answer quality)
# Requires: claude-relay running on :8084, kompact on :7878
uv run python benchmarks/run_e2e_eval.py --dataset bfcl --model haiku --workers 20
```

See `benchmarks/README.md` for full methodology.
## Development

```bash
uv sync --extra dev
uv run pytest              # 48 tests
uv run ruff check src/ tests/
```

## License

MIT
## File details: kompact-0.3.0.tar.gz

- Size: 84.1 kB
- Tags: Source
- Uploaded using Trusted Publishing: Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | `65adfc7c291a8c540b606fcfb8b5933f25545c90c9ab868282e8f54ecf1433c8` |
| MD5 | `632d30123351f66c62fdada45364c666` |
| BLAKE2b-256 | `efbec477b24db6af53117ea60448f12303d3437e9da9a861eb54fbfdfc9ecf9f` |

Provenance: attested by the `publish.yml` workflow on npow/kompact (commit `fd608ddf0a2af99d7881499f876fb94baab80edd`, tag `refs/tags/v0.3.0`, owner https://github.com/npow, public access). Statement type: https://in-toto.io/Statement/v1; predicate type: https://docs.pypi.org/attestations/publish/v1. Token issuer: https://token.actions.githubusercontent.com; runner environment: github-hosted; trigger event: release. Sigstore transparency entry: 1154700221.
## File details: kompact-0.3.0-py3-none-any.whl

- Size: 42.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing: Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | `0771083d9866ebebe4a8b8575943bda35909c70862ff6b00c91a1d35552c11e4` |
| MD5 | `cddd6eef86a156c6fe3a4cbebd464729` |
| BLAKE2b-256 | `ae6202910e91940fa8af2ad837252bf537cc3a20cf2f2ac3da35edb75352cf1e` |

Provenance: attested by the `publish.yml` workflow on npow/kompact (commit `fd608ddf0a2af99d7881499f876fb94baab80edd`, tag `refs/tags/v0.3.0`, owner https://github.com/npow, public access). Statement type: https://in-toto.io/Statement/v1; predicate type: https://docs.pypi.org/attestations/publish/v1. Token issuer: https://token.actions.githubusercontent.com; runner environment: github-hosted; trigger event: release. Sigstore transparency entry: 1154700226.