
Multi-layer context optimization proxy for LLM agents


Kompact

Requires Python 3.10+.

Context compression proxy for LLM agents. Sits between your agent and the LLM provider, compresses context on the fly, and cuts your token bill 40-70% — with zero code changes.

Save real money

For a team running 1,000 agentic requests/day with ~10K-token contexts (30-day month, input tokens only):

| Model | Without Kompact | With Kompact | Monthly savings |
|---|---|---|---|
| Sonnet ($3/M) | $900/mo | $405/mo | $495/mo |
| Opus ($15/M) | $4,500/mo | $2,025/mo | $2,475/mo |
| GPT-4o ($2.50/M) | $750/mo | $338/mo | $412/mo |

Savings scale linearly. 10K requests/day = 10x the numbers above.
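The table's arithmetic can be reproduced in a few lines of Python. This sketch assumes a 30-day month, input tokens only, and a ~55% token reduction (Kompact's BFCL compression rate, see below); actual savings depend on your content mix.

```python
def monthly_cost(requests_per_day: int, tokens_per_request: int,
                 price_per_million: float, compression: float = 0.0) -> float:
    """Monthly input-token cost in dollars, assuming a 30-day month."""
    tokens = requests_per_day * tokens_per_request * 30
    return round(tokens * (1 - compression) / 1_000_000 * price_per_million, 2)

baseline = monthly_cost(1_000, 10_000, 3.00)            # Sonnet, no proxy
with_kompact = monthly_cost(1_000, 10_000, 3.00, 0.55)  # ~55% token reduction
print(baseline, with_kompact, baseline - with_kompact)  # 900.0 405.0 495.0
```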

Get started in 30 seconds

```shell
pip install kompact   # or: uv add kompact
kompact proxy --port 7878
export ANTHROPIC_BASE_URL=http://localhost:7878
# That's it. Your agent now uses fewer tokens.
```

No SDK changes. No prompt rewriting. Just point your base URL at the proxy.
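If your agent talks to an OpenAI-compatible endpoint instead, the same pattern applies, assuming your client honors the OpenAI SDK's standard base-URL environment variable:

```shell
# Hedged: assumes your agent/SDK reads OPENAI_BASE_URL (the OpenAI
# Python SDK does; other clients may use a different variable).
export OPENAI_BASE_URL=http://localhost:7878
```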

Quality stays intact

Evaluated on BFCL (1,431 real API schemas) — the standard benchmark for tool-calling agents. End-to-end through Claude, scored with context-bench.

Quality impact vs. no compression (closer to 0% is better):

| Model | Kompact | Headroom | LLMLingua-2 |
|---|---|---|---|
| Haiku | -2.6% | -3.0% | -23.4% |
| Sonnet | -3.9% | -3.5% | -20.6% |
| Opus | -0.5% | -0.5% | -27.3% |

Kompact and Headroom both stay within ~4% of baseline. LLMLingua-2 destroys tool schemas regardless of model (-20% to -27%).

Compression across content types

Measured offline on 12,795 examples across 3 datasets (token reduction; higher means more compression):

| Dataset | Examples | Kompact | Headroom | LLMLingua-2 |
|---|---|---|---|---|
| BFCL (tool schemas) | 1,431 | 55.3% | ~0% | 55.4% |
| Glaive (tool calling) | 3,959 | 56.6% | ~0% | ~50% |
| HotpotQA (prose QA) | 7,405 | 17.9% | ~0% | 49.9% |

Headroom's SmartCrusher doesn't compress JSON — it's designed for prose. LLMLingua-2 compresses aggressively but destroys information (see quality table above).

How it works

Kompact is a transparent HTTP proxy. It intercepts LLM API requests, compresses the context, then forwards to the provider.

```text
        ┌──────────────────────────────────────────────┐
        │           Kompact Proxy (:7878)              │
        │                                              │
Agent ─>│  1. Schema Optimizer    (TF-IDF selection)   │─> LLM Provider
        │  2. Content Compressors (TOON, JSON, code)   │
        │  3. Extractive Compress (TF-IDF sentences)   │
        │  4. Observation Masker  (history mgmt)       │
        │  5. Cache Aligner       (prefix caching)     │
        │                                              │
        └──────────────────────────────────────────────┘
```

8 transforms, each targeting a different content type. The pipeline adapts automatically — short contexts get light compression, long contexts get aggressive optimization. Sub-millisecond overhead.
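The adaptive-pipeline idea can be sketched as follows. This is an illustrative toy, not Kompact's implementation: the transform names, the 500-character threshold, and the word-rarity sentence scoring (a crude stand-in for TF-IDF) are all invented here.

```python
import re
from collections import Counter

def drop_blank_lines(text: str) -> str:
    """Cheap transform: remove empty lines."""
    return "\n".join(line for line in text.splitlines() if line.strip())

def keep_top_sentences(text: str, k: int = 3) -> str:
    """Toy extractive step: score sentences by average word rarity
    (a crude stand-in for TF-IDF) and keep the top-k, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= k:
        return text
    freq = Counter(w for s in sentences for w in re.findall(r"\w+", s.lower()))
    def rarity(s: str) -> float:
        words = re.findall(r"\w+", s.lower())
        return sum(1.0 / freq[w] for w in words) / max(len(words), 1)
    top = set(sorted(sentences, key=rarity, reverse=True)[:k])
    return " ".join(s for s in sentences if s in top)

def compress(text: str) -> str:
    """Adapt the pipeline to context size: short contexts get light
    compression, long contexts get the aggressive extractive step too."""
    transforms = [drop_blank_lines]          # always applied
    if len(text) > 500:                      # invented threshold
        transforms.append(keep_top_sentences)
    for transform in transforms:
        text = transform(text)
    return text
```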

Per-request control

Disable transforms for a single request without affecting other clients using the X-Kompact-Disable header:

```python
# Anthropic SDK
client.messages.create(..., extra_headers={"X-Kompact-Disable": "toon,code_compressor"})

# OpenAI SDK
client.chat.completions.create(..., extra_headers={"X-Kompact-Disable": "toon,code_compressor"})
```

Transform names (comma-separated): `toon`, `json_crusher`, `code_compressor`, `log_compressor`, `content_compressor`, `observation_masker`, `cache_aligner`, `schema_optimizer`.
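For reference, a hedged sketch of how such a header value might be parsed on the proxy side. The function below is illustrative, not Kompact's code, and silently ignores unknown names; how Kompact itself handles them is not specified here.

```python
# The eight transform names documented above.
KNOWN_TRANSFORMS = {
    "toon", "json_crusher", "code_compressor", "log_compressor",
    "content_compressor", "observation_masker", "cache_aligner",
    "schema_optimizer",
}

def disabled_transforms(header_value: str) -> set[str]:
    """Parse a comma-separated X-Kompact-Disable value into a set of
    transform names, tolerating whitespace and empty entries."""
    requested = {part.strip() for part in header_value.split(",") if part.strip()}
    return requested & KNOWN_TRANSFORMS

print(disabled_transforms("toon, code_compressor"))
```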

Running benchmarks

```shell
# Offline compression (no LLM calls; measures compression + needle preservation)
uv run python benchmarks/run_dataset_eval.py --dataset bfcl

# End-to-end quality (sends through the proxy chain, measures LLM answer quality)
# Requires: claude-relay running on :8084, kompact on :7878
uv run python benchmarks/run_e2e_eval.py --dataset bfcl --model haiku --workers 20
```

See benchmarks/README.md for full methodology.

Development

```shell
uv sync --extra dev
uv run pytest          # 48 tests
uv run ruff check src/ tests/
```

License

MIT
