Kompact

Multi-layer context optimization proxy for LLM agents

Requires Python 3.10+.

Context compression proxy for LLM agents. Sits between your agent and the LLM provider, compresses context on the fly, and cuts your token bill 40-70% — with zero code changes.

Save real money

For a team running 1,000 agentic requests/day with ~10K token contexts:

| Model | Without Kompact | With Kompact | Monthly savings |
|---|---|---|---|
| Sonnet ($3/M) | $900/mo | $405/mo | $495/mo |
| Opus ($15/M) | $4,500/mo | $2,025/mo | $2,475/mo |
| GPT-4o ($2.50/M) | $750/mo | $338/mo | $412/mo |

Savings scale linearly. 10K requests/day = 10x the numbers above.
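The table's numbers follow from simple arithmetic. As a sketch (assuming a ~55% compression ratio, in line with the BFCL figure reported below, so you pay for ~45% of your input tokens):

```python
def monthly_cost(price_per_million: float, requests_per_day: int = 1_000,
                 tokens_per_request: int = 10_000, days: int = 30,
                 compression: float = 0.0) -> float:
    """Monthly input-token cost in dollars, before or after compression."""
    tokens = requests_per_day * tokens_per_request * days  # 300M tokens/month
    return tokens / 1_000_000 * price_per_million * (1 - compression)

for name, price in [("Sonnet", 3.0), ("Opus", 15.0), ("GPT-4o", 2.50)]:
    before = monthly_cost(price)
    after = monthly_cost(price, compression=0.55)
    print(f"{name}: ${before:,.0f}/mo -> ${after:,.0f}/mo (saves ${before - after:,.0f})")
```

At 10K requests/day, pass `requests_per_day=10_000` and every figure scales by 10x.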

Get started in 30 seconds

pip install kompact   # or: uv add kompact
kompact proxy --port 7878
export ANTHROPIC_BASE_URL=http://localhost:7878
# That's it. Your agent now uses fewer tokens.

No SDK changes. No prompt rewriting. Just point your base URL at the proxy.
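A minimal sketch of what "point your base URL at the proxy" means: the SDK (or any HTTP client) resolves its endpoint from `ANTHROPIC_BASE_URL`, so requests flow through localhost:7878 instead of going straight to the provider. The `/v1/messages` path is the standard Anthropic Messages API shape; the model id is just an example, and nothing is actually sent here.

```python
import json
import os
import urllib.request

# With the proxy running, this env var redirects all API traffic through it.
os.environ["ANTHROPIC_BASE_URL"] = "http://localhost:7878"

base = os.environ.get("ANTHROPIC_BASE_URL", "https://api.anthropic.com")
req = urllib.request.Request(
    f"{base}/v1/messages",
    data=json.dumps({
        "model": "claude-sonnet-4-20250514",  # example model id
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "hello"}],
    }).encode(),
    headers={"content-type": "application/json"},
    method="POST",
)
print(req.full_url)  # http://localhost:7878/v1/messages
```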

Quality stays intact

Evaluated on BFCL (1,431 real API schemas) — the standard benchmark for tool-calling agents. End-to-end through Claude, scored with context-bench.

Quality impact vs no compression (closer to 0% = better):

| Model | Kompact | Headroom | LLMLingua-2 |
|---|---|---|---|
| Haiku | -2.6% | -3.0% | -23.4% |
| Sonnet | -3.9% | -3.5% | -20.6% |
| Opus | -0.5% | -0.5% | -27.3% |

Kompact and Headroom both stay within ~3% of baseline. LLMLingua-2 destroys tool schemas regardless of model (-20 to -27%).

Compression across content types

Measured offline on 12,795 examples across 3 datasets:

| Dataset | Examples | Kompact | Headroom | LLMLingua-2 |
|---|---|---|---|---|
| BFCL (tool schemas) | 1,431 | 55.3% | ~0% | 55.4% |
| Glaive (tool calling) | 3,959 | 56.6% | ~0% | ~50% |
| HotpotQA (prose QA) | 7,405 | 17.9% | ~0% | 49.9% |

Headroom's SmartCrusher doesn't compress JSON — it's designed for prose. LLMLingua-2 compresses aggressively but destroys information (see quality table above).

How it works

Kompact is a transparent HTTP proxy. It intercepts LLM API requests, compresses the context, then forwards to the provider.

        ┌──────────────────────────────────────────────┐
        │           Kompact Proxy (:7878)              │
        │                                              │
Agent ─>│  1. Schema Optimizer    (TF-IDF selection)   │─> LLM Provider
        │  2. Content Compressors (TOON, JSON, code)   │
        │  3. Extractive Compress (TF-IDF sentences)   │
        │  4. Observation Masker  (history mgmt)       │
        │  5. Cache Aligner       (prefix caching)     │
        │                                              │
        └──────────────────────────────────────────────┘

Eight transforms, each targeting a different content type. The pipeline adapts automatically: short contexts get light compression, long contexts get aggressive optimization. Sub-millisecond overhead.
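The pipeline shape can be sketched as an ordered list of transforms, each fed the previous one's output, with short contexts passed through untouched. The transform functions and the threshold below are illustrative stand-ins, not Kompact's actual internals:

```python
from typing import Callable

Transform = Callable[[str], str]

def drop_blank_lines(text: str) -> str:
    # Stand-in transform: remove empty lines.
    return "\n".join(line for line in text.splitlines() if line.strip())

def collapse_whitespace(text: str) -> str:
    # Stand-in transform: squeeze runs of whitespace to single spaces.
    return " ".join(text.split())

# Ordered pipeline: each transform sees the previous one's output.
PIPELINE: list[Transform] = [drop_blank_lines, collapse_whitespace]

def compress(context: str, min_chars: int = 200) -> str:
    # Adaptive: short contexts aren't worth touching.
    if len(context) < min_chars:
        return context
    for transform in PIPELINE:
        context = transform(context)
    return context
```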

Per-request control

Use the X-Kompact-Disable header to disable transforms for a single request without affecting other clients:

# Anthropic SDK
client.messages.create(..., extra_headers={"X-Kompact-Disable": "toon,code_compressor"})

# OpenAI SDK
client.chat.completions.create(..., extra_headers={"X-Kompact-Disable": "toon,code_compressor"})

Comma-separated transform names: toon, json_crusher, code_compressor, log_compressor, content_compressor, observation_masker, cache_aligner, schema_optimizer.

Monitoring

Kompact exports OpenTelemetry metrics (on by default, disable with --no-otel). A Prometheus + Grafana stack is included:

cd monitoring
docker compose up -d

The dashboard shows request rate, token savings, compression ratio, pipeline latency percentiles, and per-transform breakdowns.

Running benchmarks

# Offline compression (no LLM calls, measures compression + needle preservation)
uv run python benchmarks/run_dataset_eval.py --dataset bfcl

# End-to-end quality (sends through proxy chain, measures LLM answer quality)
# Requires: claude-relay running on :8084, kompact on :7878
uv run python benchmarks/run_e2e_eval.py --dataset bfcl --model haiku --workers 20

See benchmarks/README.md for full methodology.

Development

uv sync --extra dev
uv run pytest          # 48 tests
uv run ruff check src/ tests/

License

MIT
