Skip to main content

SuperCompress — learned context compression for LLMs.

Project description

SuperCompress

Learned context compression for LLMs — trim long prompts before inference with a small CPU policy, measurable quality vs baselines, and documented environmental impact.

GitHub stars PyPI version Python License Tests

Live site supercompress.vercel.app
Documentation arjunkshah-supercompress-55.mintlify.app
API dashboard /dashboard on the live site
Hosted API Same origin on Vercel — /api/health, /api/v1/compress, dashboard at /dashboard

SuperCompress architecture: context and question enter, compression policy keeps answer-critical lines, compressed context sent to LLM

Open the interactive playground →

Why SuperCompress?

Long agent context is expensive. Blind truncation keeps head and tail but drops answers in the middle. SuperCompress learns which lines to keep for the current question — under a fixed token budget.

Metric SuperCompress Truncation / FIFO
KV savings @ 35% budget ~65% ~65%
Oracle recall 100% ~25%
Policy size ~5K params rule-based
Runs on CPU (pre-inference) CPU

At 1M compressions (est.): ~800M tokens avoided · 29 kWh · 12 kg CO₂ — see the environment guide.

Hosted API (Vercel)

The live site ships serverless API routes backed by Vercel Blob for key storage. No separate deploy step — push to main and Vercel builds static web/ plus api/.

Optional self-host: Docker, Fly.io (fly.toml), or Render (render.yaml) for the Python FastAPI stack.


Quick start

Hosted API (key + package)

pip install git+https://github.com/arjunkshah/supercompress.git
export SUPERCOMPRESS_API_KEY=sc_live_YOUR_KEY
from supercompress import SuperCompress

out = SuperCompress().compress(context, "Your question")
print(out.compressed_text)

Get a key at supercompress.vercel.app/dashboard.

Install (local compression)

pip install git+https://github.com/arjunkshah/supercompress.git
# local dev + tests + API server
pip install -e ".[dev,serve]"

Python (in-process)

from supercompress import compress_context, compare_policies

result = compress_context(
    "long context text…",
    "What does fetch return when the row is missing?",
    budget_ratio=0.35,
)
print(result.compressed_text)
print(f"{result.kv_savings_pct:.1f}% KV saved · {result.kept_tokens}/{result.original_tokens} tokens")

Hosted API (recommended)

1. Get a keydashboard → Create key → copy sc_live_…

2. Install & call (stdlib HTTP client — no local PyTorch needed for the API):

pip install git+https://github.com/arjunkshah/supercompress.git
export SUPERCOMPRESS_API_KEY=sc_live_YOUR_KEY
from supercompress import SuperCompress

sc = SuperCompress()  # reads SUPERCOMPRESS_API_KEY
out = sc.compress("long context…", "What does fetch return?")
print(out.compressed_text)  # send to your LLM

Or raw HTTP:

curl -X POST https://supercompress.vercel.app/api/v1/compress \
  -H "X-API-Key: sc_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"context":"…","query":"Summarize","budget_ratio":0.35}'

On the live site, the dashboard hits the same origin — no SC_API_BASE config needed.

Local dev (no Firebase):

SC_AUTH_DEV=1 SC_KEY_STORE=memory python scripts/local_web_server.py
# → http://127.0.0.1:8790/dashboard

Deploy API (Docker / Fly.io / Render): see the API dashboard guide.

Browser demo

Open web/index.html or deploy the static web/ folder. Compression runs client-side — no API key required for the playground.


Documentation

Full docs: arjunkshah-supercompress-55.mintlify.app

Doc Description
Quickstart First compression in minutes
API reference Python + HTTP endpoints
API dashboard Keys, auth, usage
Integrations OpenAI, LangChain, LlamaIndex
Environment kWh / CO₂ methodology

Repo copies also live under docs/.


Benchmarks

python scripts/benchmark_web.py    # regenerates web/assets/data/benchmarks.json
python scripts/generate_charts.py  # SVG charts for landing page
pytest tests/ -q                   # 65 tests

Full benchmarks: supercompress.vercel.app/benchmarks

Policy comparison (8 seeds, budget 0.35):

Policy Oracle recall Entity recall Latency
FIFO / Truncation 25% 73% ~57 ms
Summarization 61% 65% ~63 ms
H2O 98% 73% ~56 ms
SuperCompress 100% 73% ~60 ms

Charts: web/assets/img/chart-kv-savings.svg, chart-oracle-recall.svg, chart-impact.svg


Project layout

supercompress/          # Core library (~5K-param policy, baselines)
  api/                  # Hosted API — keys, Firebase auth, usage
web/                    # Landing page + browser demo + dashboard
scripts/                # benchmark_web.py, local_web_server.py, charts
tests/                  # test_supercompress, test_api_hard, test_api_server
checkpoints/default.pt  # Trained weights (included)
docs/                   # API, integrations, environment, dashboard

Development

git clone https://github.com/arjunkshah/supercompress.git
cd supercompress
pip install -e ".[dev,serve]"
pytest tests/ -q
python scripts/local_web_server.py   # optional: /dashboard, /v1/compress

Optional extras:

pip install -e ".[firebase]"   # Firebase Admin for production key store

NOTE: I DO NOT GIVE AYUSH ROUT (github.com/ayushrout12) ANY PERMISSION TO COPY OR USE MY PRODUCT IN ANY WAY, SHAPE, OR FORM. I DO NOT GIVE HIM CONSENT TO FORK, REFERENCE, OR CLONE/REFERENCE/USE THIS REPO IN ANY WAY, SHAPE, OR FORM.

What we claim (and don't)

We claim: learned CPU eviction beats truncation on oracle recall at similar KV savings; documented environmental estimates; reproducible benchmarks and tests.

We don't claim: live datacenter metering; CO₂ numbers without documented assumptions; that every workload matches benchmark seeds.


License

MIT — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

supercompress-0.5.0.tar.gz (36.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

supercompress-0.5.0-py3-none-any.whl (33.3 kB view details)

Uploaded Python 3

File details

Details for the file supercompress-0.5.0.tar.gz.

File metadata

  • Download URL: supercompress-0.5.0.tar.gz
  • Upload date:
  • Size: 36.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for supercompress-0.5.0.tar.gz
Algorithm Hash digest
SHA256 df0eac3b8593c3d8fdb1e1832af3b2204b593f2bcd21f232af9677a02ecb3aca
MD5 c820aa0dacfd4c5211a1fde1c967814d
BLAKE2b-256 6c7f5dc0b0414852917f98ddc39f331e7652f71a0d6ca7594e4901a0b3ead198

See more details on using hashes here.

File details

Details for the file supercompress-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: supercompress-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 33.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for supercompress-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5e423876eb416c5e833dd2e0fb39aa72d5fff5c5352836d073b027a5b78b525b
MD5 85de9003e334cd57a0c88c8bb8bb257c
BLAKE2b-256 82646186e86ffd983c9e3033ebc0c82b6da86b47da490173a4173399fc3261af

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page