
Multi-layer context optimization proxy for LLM agents


Kompact

Requires Python 3.10+.

Context compression proxy for LLM agents. Sits between your agent and the LLM provider, compresses context on the fly, and cuts your token bill by 40-70%, with zero code changes.

Save real money

For a team running 1,000 agentic requests/day with ~10K-token contexts (about 300M input tokens per 30-day month), with roughly 55% of those tokens compressed away:

Model              Without Kompact   With Kompact   Monthly savings
Sonnet ($3/M)      $900/mo           $405/mo        $495/mo
Opus ($15/M)       $4,500/mo         $2,025/mo      $2,475/mo
GPT-4o ($2.50/M)   $750/mo           $338/mo        $412/mo

Savings scale linearly. 10K requests/day = 10x the numbers above.
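
The arithmetic behind the table can be reproduced with a short script. This is only a sketch: the function names are made up for illustration, and the 30-day month, input-token-only pricing, and 55% reduction are assumptions inferred from the figures above, not documented Kompact settings.

```python
def monthly_cost(requests_per_day, tokens_per_request, usd_per_million, days=30):
    """Input-token spend per month in USD (assumes a 30-day month)."""
    tokens = requests_per_day * tokens_per_request * days
    return tokens / 1_000_000 * usd_per_million


def with_compression(cost, reduction):
    """Cost after a fraction `reduction` of the tokens is compressed away."""
    return cost * (1 - reduction)


# USD per million input tokens, taken from the table above.
PRICES = {"Sonnet": 3.00, "Opus": 15.00, "GPT-4o": 2.50}

for model, price in PRICES.items():
    before = monthly_cost(1_000, 10_000, price)
    after = with_compression(before, 0.55)  # assumed 55% reduction
    print(f"{model}: ${before:,.2f} -> ${after:,.2f} (saves ${before - after:,.2f})")
```

Swap in your own request volume, context size, and measured reduction to get an estimate for your workload.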

Get started in 30 seconds

pip install kompact   # or: uv add kompact
kompact proxy --port 7878
export ANTHROPIC_BASE_URL=http://localhost:7878
# That's it. Your agent now uses fewer tokens.

No SDK changes. No prompt rewriting. Just point your base URL at the proxy.
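
If you would rather not export the variable for your whole shell, you can scope it to a single agent process. A minimal sketch, assuming the proxy is already running on port 7878 as above; env_with_proxy is a hypothetical helper, and the child command here only echoes the variable where your agent would normally run.

```python
import os
import subprocess
import sys


def env_with_proxy(port=7878, host="localhost", base_env=None):
    """Return a copy of the environment with ANTHROPIC_BASE_URL set to the proxy."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = f"http://{host}:{port}"
    return env


if __name__ == "__main__":
    # The child process sees the proxy URL; the parent environment is not modified.
    child = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['ANTHROPIC_BASE_URL'])"],
        env=env_with_proxy(),
        capture_output=True,
        text=True,
        check=True,
    )
    print(child.stdout.strip())
```

Replace the child command with whatever launches your agent.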

Quality stays intact

Evaluated on BFCL (1,431 real API schemas) — the standard benchmark for tool-calling agents. End-to-end through Claude, scored with context-bench.

Quality impact vs no compression (closer to 0% = better):

Model    Kompact   Headroom   LLMLingua-2
Haiku    -2.6%     -3.0%      -23.4%
Sonnet   -3.9%     -3.5%      -20.6%
Opus     -0.5%     -0.5%      -27.3%

Kompact and Headroom both stay within ~4% of baseline on every model. LLMLingua-2 destroys tool schemas regardless of model (-20.6% to -27.3%).

Compression across content types

Measured offline on 12,795 examples across 3 datasets:

Dataset                 Examples   Kompact   Headroom   LLMLingua-2
BFCL (tool schemas)     1,431      55.3%     ~0%        55.4%
Glaive (tool calling)   3,959      56.6%     ~0%        ~50%
HotpotQA (prose QA)     7,405      17.9%     ~0%        49.9%

Headroom's SmartCrusher doesn't compress JSON — it's designed for prose. LLMLingua-2 compresses aggressively but destroys information (see quality table above).

How it works

Kompact is a transparent HTTP proxy. It intercepts LLM API requests, compresses the context, then forwards to the provider.

        ┌──────────────────────────────────────────────┐
        │           Kompact Proxy (:7878)              │
        │                                              │
Agent ─>│  1. Schema Optimizer    (TF-IDF selection)   │─> LLM Provider
        │  2. Content Compressors (TOON, JSON, code)   │
        │  3. Extractive Compress (TF-IDF sentences)   │
        │  4. Observation Masker  (history mgmt)       │
        │  5. Cache Aligner       (prefix caching)     │
        │                                              │
        └──────────────────────────────────────────────┘

8 transforms, each targeting a different content type. The pipeline adapts automatically — short contexts get light compression, long contexts get aggressive optimization. Sub-millisecond overhead.
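
To give a feel for what "TF-IDF selection" means in stages 1 and 3, here is a generic sketch of TF-IDF sentence selection. It is not Kompact's implementation: the function names, the tokenizer, the smoothed IDF formula, and the keep parameter are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def select_sentences(sentences, query, keep=2):
    """Keep the `keep` sentences scoring highest against `query`, in original order."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    # Document frequency: how many sentences contain each term.
    df = Counter(term for doc in docs for term in doc)
    query_terms = set(tokenize(query))

    def score(doc):
        total = sum(doc.values()) or 1
        # Sum of (term frequency * smoothed IDF) over query terms in the sentence.
        return sum(
            (doc[t] / total) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t in query_terms
            if t in doc
        )

    ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)[:keep]
    return [sentences[i] for i in sorted(ranked)]


sentences = [
    "The deploy failed because the database migration timed out.",
    "Lunch was served at noon.",
    "Retrying the migration with a longer timeout fixed the deploy.",
    "The office plants were watered.",
]
for line in select_sentences(sentences, "why did the deploy migration fail"):
    print(line)
```

Sentences that share rare terms with the query are kept and the rest are dropped, which is the general idea behind extractive compression.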

Per-request control

To disable transforms for a single request without affecting other clients, set the X-Kompact-Disable header:

# Anthropic SDK
client.messages.create(..., extra_headers={"X-Kompact-Disable": "toon,code_compressor"})

# OpenAI SDK
client.chat.completions.create(..., extra_headers={"X-Kompact-Disable": "toon,code_compressor"})

Comma-separated transform names: toon, json_crusher, code_compressor, log_compressor, content_compressor, observation_masker, cache_aligner, schema_optimizer.
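
A typo in a transform name is easy to miss, so it can help to build the header value in one place. The helper below is a hypothetical client-side convenience, not part of Kompact; it checks names against the list above and returns a dict suitable for extra_headers. How the proxy itself treats unknown names is not covered here.

```python
KNOWN_TRANSFORMS = frozenset({
    "toon", "json_crusher", "code_compressor", "log_compressor",
    "content_compressor", "observation_masker", "cache_aligner",
    "schema_optimizer",
})


def disable_header(*names):
    """Build an extra_headers dict for X-Kompact-Disable, rejecting unknown names."""
    unknown = [name for name in names if name not in KNOWN_TRANSFORMS]
    if unknown:
        raise ValueError(f"unknown transform(s): {', '.join(unknown)}")
    if not names:
        return {}  # nothing to disable, so send no header at all
    # dict.fromkeys drops duplicates while keeping the caller's order.
    return {"X-Kompact-Disable": ",".join(dict.fromkeys(names))}


print(disable_header("toon", "code_compressor"))
```

Pass the result as extra_headers=disable_header("toon", "code_compressor") in either SDK call shown above.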

Monitoring

Kompact exports OpenTelemetry metrics (on by default, disable with --no-otel). A Prometheus + Grafana stack is included:

cd monitoring
docker compose up -d

The dashboard shows request rate, token savings, compression ratio, pipeline latency percentiles, and per-transform breakdowns.

Running benchmarks

# Offline compression (no LLM calls, measures compression + needle preservation)
uv run python benchmarks/run_dataset_eval.py --dataset bfcl

# End-to-end quality (sends through proxy chain, measures LLM answer quality)
# Requires: claude-relay running on :8084, kompact on :7878
uv run python benchmarks/run_e2e_eval.py --dataset bfcl --model haiku --workers 20

See benchmarks/README.md for full methodology.
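
As a rough illustration of the two offline measurements named above, the sketch below computes a compression percentage and a needle-preservation rate for one example. This is an assumed reading of those terms, not the benchmark's code: both function names are hypothetical, tokens are approximated by whitespace splitting, and a needle counts as preserved if it appears verbatim in the compressed text.

```python
def compression_pct(original, compressed):
    """Percent of whitespace-delimited tokens removed (crude tokenizer stand-in)."""
    before = len(original.split())
    after = len(compressed.split())
    if before == 0:
        return 0.0
    return 100.0 * (before - after) / before


def needle_preservation(compressed, needles):
    """Fraction of required substrings that still appear in the compressed text."""
    if not needles:
        return 1.0
    kept = sum(1 for needle in needles if needle in compressed)
    return kept / len(needles)


original = "call get_weather with city set to Paris and units set to metric please"
compressed = "get_weather city=Paris units=metric"
print(round(compression_pct(original, compressed), 1))
print(needle_preservation(compressed, ["get_weather", "Paris", "metric"]))
```

High compression is only useful if the needles survive, which is why the two numbers are reported together.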

Development

uv sync --extra dev
uv run pytest          # 48 tests
uv run ruff check src/ tests/

License

MIT
