SuperCompress — learned context compression for LLMs.
SuperCompress
Learned context compression for LLMs — trim long prompts before inference with a small CPU policy, measurable quality vs baselines, and documented environmental impact.
| Resource | Location |
|---|---|
| Live site | supercompress.vercel.app |
| Documentation | arjunkshah-supercompress-55.mintlify.app |
| API dashboard | /dashboard on the live site |
| Hosted API | Same origin on Vercel: /api/health, /api/v1/compress, dashboard at /dashboard |
Open the interactive playground →
Why SuperCompress?
Long agent context is expensive. Blind truncation keeps head and tail but drops answers in the middle. SuperCompress learns which lines to keep for the current question — under a fixed token budget.
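To make the failure mode concrete, here is a toy sketch (not the project's code) that compares head-and-tail truncation with a query-aware selection under the same token budget. The keyword-overlap scorer is a deliberately simple stand-in for the learned policy, and the whitespace token count, function names, and sample lines are all assumptions made for illustration.

```python
# Toy illustration only: a keyword-overlap scorer stands in for the learned policy.

def count_tokens(line: str) -> int:
    """Crude token count: whitespace-separated words (an assumption)."""
    return len(line.split())


def head_tail_truncate(lines: list[str], budget: int) -> list[str]:
    """Keep lines alternately from the head and the tail until the budget is spent."""
    keep: set[int] = set()
    used = 0
    lo, hi = 0, len(lines) - 1
    take_head = True
    while lo <= hi:
        idx = lo if take_head else hi
        cost = count_tokens(lines[idx])
        if used + cost > budget:
            break
        keep.add(idx)
        used += cost
        if take_head:
            lo += 1
        else:
            hi -= 1
        take_head = not take_head
    return [lines[i] for i in sorted(keep)]


def query_aware_select(lines: list[str], query: str, budget: int) -> list[str]:
    """Greedily keep the lines that share the most words with the query."""
    q = set(query.lower().split())
    order = sorted(
        range(len(lines)),
        key=lambda i: -len(q & set(lines[i].lower().split())),
    )
    keep: set[int] = set()
    used = 0
    for i in order:
        cost = count_tokens(lines[i])
        if used + cost <= budget:
            keep.add(i)
            used += cost
    return [lines[i] for i in sorted(keep)]  # preserve original order


lines = [
    "def connect(url): open a pooled connection",
    "def close(conn): release the connection",
    "fetch returns None when the row is missing",
    "def retry(fn): call fn up to three times",
    "def log(msg): write msg to stderr",
]
query = "what does fetch return when the row is missing"

truncated = head_tail_truncate(lines, budget=12)   # keeps the first and last line
selected = query_aware_select(lines, query, budget=12)  # keeps the middle line
```

With a 12-token budget (roughly 36% of the 33-token toy context), truncation keeps the first and last lines and loses the one line that answers the question, while the query-aware pass keeps it.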
| Metric | SuperCompress | Truncation / FIFO |
|---|---|---|
| KV savings @ 35% budget | ~65% | ~65% |
| Oracle recall | 100% | ~25% |
| Policy size | ~5K params | rule-based |
| Runs on | CPU (pre-inference) | CPU |
At 1M compressions (est.): ~800M tokens avoided · 29 kWh · 12 kg CO₂ — see the environment guide.
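The headline estimate can be broken down into per-unit figures with simple arithmetic. The sketch below uses only the numbers quoted above and reads the 29 kWh and 12 kg CO₂ as the energy and emissions associated with the avoided tokens; that reading is an assumption, and the environment guide remains the source for the actual methodology.

```python
# Figures quoted in the estimate above (per 1M compressions).
COMPRESSIONS = 1_000_000
TOKENS_AVOIDED = 800_000_000
ENERGY_KWH = 29
CO2_KG = 12

# Average tokens avoided by a single compression.
tokens_per_compression = TOKENS_AVOIDED / COMPRESSIONS

# Implied energy per 1,000 avoided tokens, in watt-hours.
wh_per_1k_tokens = (ENERGY_KWH * 1000) / (TOKENS_AVOIDED / 1000)

# Implied carbon intensity of the electricity, in kg CO2 per kWh.
kg_co2_per_kwh = CO2_KG / ENERGY_KWH

print(f"{tokens_per_compression:.0f} tokens avoided per compression")
print(f"{wh_per_1k_tokens:.5f} Wh per 1k tokens")
print(f"{kg_co2_per_kwh:.3f} kg CO2 per kWh")
```

These work out to about 800 tokens per compression, about 0.036 Wh per 1,000 tokens, and about 0.41 kg CO₂ per kWh.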
Hosted API (Vercel)
The live site ships serverless API routes backed by Vercel Blob for key storage. No separate deploy step — push to main and Vercel builds static web/ plus api/.
Optional self-host: Docker, Fly.io (fly.toml), or Render (render.yaml) for the Python FastAPI stack.
Quick start
Hosted API (key + package)
pip install git+https://github.com/arjunkshah/supercompress.git
export SUPERCOMPRESS_API_KEY=sc_live_YOUR_KEY
from supercompress import SuperCompress
context = "long context text…"  # the text you want to compress
out = SuperCompress().compress(context, "Your question")
print(out.compressed_text)
Get a key at supercompress.vercel.app/dashboard.
Install (local compression)
pip install git+https://github.com/arjunkshah/supercompress.git
# local dev + tests + API server
pip install -e ".[dev,serve]"
Python (in-process)
from supercompress import compress_context, compare_policies
result = compress_context(
"long context text…",
"What does fetch return when the row is missing?",
budget_ratio=0.35,
)
print(result.compressed_text)
print(f"{result.kv_savings_pct:.1f}% KV saved · {result.kept_tokens}/{result.original_tokens} tokens")
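The relationship between budget_ratio and the reported savings can be sketched with plain arithmetic. The helpers below are hypothetical (they are not part of the supercompress package), and whether the library rounds the budget or computes kv_savings_pct exactly this way is an assumption; the result is at least consistent with the ~65% savings quoted at a 35% budget.

```python
def token_budget(original_tokens: int, budget_ratio: float) -> int:
    """Tokens allowed after compression (hypothetical helper; rounding is assumed)."""
    if not 0.0 < budget_ratio <= 1.0:
        raise ValueError("budget_ratio must be in (0, 1]")
    return round(original_tokens * budget_ratio)


def savings_pct(original_tokens: int, kept_tokens: int) -> float:
    """Percentage of tokens removed, relative to the original context."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 100.0 * (1 - kept_tokens / original_tokens)


original = 2000
kept = token_budget(original, budget_ratio=0.35)
saved = savings_pct(original, kept)
print(f"{saved:.1f}% KV saved · {kept}/{original} tokens")
```

For a 2,000-token context this keeps 700 tokens and reports 65.0% saved.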
Hosted API (recommended)
1. Get a key — dashboard → Create key → copy sc_live_…
2. Install & call (stdlib HTTP client — no local PyTorch needed for the API):
pip install git+https://github.com/arjunkshah/supercompress.git
export SUPERCOMPRESS_API_KEY=sc_live_YOUR_KEY
from supercompress import SuperCompress
sc = SuperCompress() # reads SUPERCOMPRESS_API_KEY
out = sc.compress("long context…", "What does fetch return?")
print(out.compressed_text) # send to your LLM
Or raw HTTP:
curl -X POST https://supercompress.vercel.app/api/v1/compress \
-H "X-API-Key: sc_live_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"context":"…","query":"Summarize","budget_ratio":0.35}'
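The same request can be made from Python with only the standard library. This is a minimal sketch that mirrors the curl call above: the URL, header names, and JSON fields come from that call, while the function names are hypothetical and the response is returned as a plain dict because its exact fields are not described here.

```python
import json
import os
import urllib.request

API_URL = "https://supercompress.vercel.app/api/v1/compress"


def build_compress_request(context, query, budget_ratio=0.35, api_key=None):
    """Build (but do not send) the POST request shown in the curl example."""
    key = api_key or os.environ.get("SUPERCOMPRESS_API_KEY", "")
    body = json.dumps(
        {"context": context, "query": query, "budget_ratio": budget_ratio}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": key, "Content-Type": "application/json"},
        method="POST",
    )


def compress_via_http(context, query, budget_ratio=0.35, timeout=30.0):
    """Send the request and return the decoded JSON body as a dict."""
    req = build_compress_request(context, query, budget_ratio)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request needs no network access, which makes it easy to inspect.
example = build_compress_request("…", "Summarize", api_key="sc_live_YOUR_KEY")
```

Separating request construction from sending keeps the payload testable offline; calling compress_via_http requires a valid key and network access.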
On the live site, the dashboard hits the same origin — no SC_API_BASE config needed.
Local dev (no Firebase):
SC_AUTH_DEV=1 SC_KEY_STORE=memory python scripts/local_web_server.py
# → http://127.0.0.1:8790/dashboard
Deploy API (Docker / Fly.io / Render): see the API dashboard guide.
Browser demo
Open web/index.html or deploy the static web/ folder. Compression runs client-side — no API key required for the playground.
Documentation
Full docs: arjunkshah-supercompress-55.mintlify.app
| Doc | Description |
|---|---|
| Quickstart | First compression in minutes |
| API reference | Python + HTTP endpoints |
| API dashboard | Keys, auth, usage |
| Integrations | OpenAI, LangChain, LlamaIndex |
| Environment | kWh / CO₂ methodology |
Repo copies also live under docs/.
Benchmarks
python scripts/benchmark_web.py # regenerates web/assets/data/benchmarks.json
python scripts/generate_charts.py # SVG charts for landing page
pytest tests/ -q # 65 tests
Full benchmarks: supercompress.vercel.app/benchmarks
Policy comparison (8 seeds, budget 0.35):
| Policy | Oracle recall | Entity recall | Latency |
|---|---|---|---|
| FIFO / Truncation | 25% | 73% | ~57 ms |
| Summarization | 61% | 65% | ~63 ms |
| H2O | 98% | 73% | ~56 ms |
| SuperCompress | 100% | 73% | ~60 ms |
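The table does not spell out how oracle recall is computed. One plausible reading, sketched below as an assumption rather than the project's definition, is the fraction of answer-bearing ("oracle") lines that survive compression, averaged over runs. The function name and the sample line indices are hypothetical.

```python
from statistics import mean


def oracle_recall(kept: set[int], oracle: set[int]) -> float:
    """Fraction of oracle line indices still present after compression."""
    if not oracle:
        return 1.0  # nothing to recall, so nothing was lost
    return len(kept & oracle) / len(oracle)


# Each pair is (line indices kept by a policy, line indices holding the answer).
runs = [
    ({0, 2, 5, 9}, {2, 5}),  # both answer lines kept
    ({0, 1, 9}, {2, 5}),     # both answer lines dropped
    ({2, 7}, {2, 5}),        # one of two kept
]

per_run = [oracle_recall(kept, oracle) for kept, oracle in runs]
average = mean(per_run)
print(f"oracle recall: {100 * average:.0f}%")
```

Under this reading, a policy that always keeps head and tail scores poorly whenever the answer sits in the middle of the context.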
Charts: web/assets/img/chart-kv-savings.svg, chart-oracle-recall.svg, chart-impact.svg
Project layout
supercompress/ # Core library (~5K-param policy, baselines)
api/ # Hosted API — keys, Firebase auth, usage
web/ # Landing page + browser demo + dashboard
scripts/ # benchmark_web.py, local_web_server.py, charts
tests/ # test_supercompress, test_api_hard, test_api_server
checkpoints/default.pt # Trained weights (included)
docs/ # API, integrations, environment, dashboard
Development
git clone https://github.com/arjunkshah/supercompress.git
cd supercompress
pip install -e ".[dev,serve]"
pytest tests/ -q
python scripts/local_web_server.py # optional: /dashboard, /v1/compress
Optional extras:
pip install -e ".[firebase]" # Firebase Admin for production key store
NOTE: I DO NOT GIVE AYUSH ROUT (github.com/ayushrout12) ANY PERMISSION TO COPY OR USE MY PRODUCT IN ANY WAY, SHAPE, OR FORM. I DO NOT GIVE HIM CONSENT TO FORK, REFERENCE, OR CLONE/REFERENCE/USE THIS REPO IN ANY WAY, SHAPE, OR FORM.
What we claim (and don't)
We claim: learned CPU eviction beats truncation on oracle recall at similar KV savings; documented environmental estimates; reproducible benchmarks and tests.
We don't claim: live datacenter metering; CO₂ numbers without documented assumptions; that every workload matches benchmark seeds.
License
MIT — see LICENSE.
File details
Details for the file supercompress-0.5.0.tar.gz.
File metadata
- Download URL: supercompress-0.5.0.tar.gz
- Upload date:
- Size: 36.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | df0eac3b8593c3d8fdb1e1832af3b2204b593f2bcd21f232af9677a02ecb3aca |
| MD5 | c820aa0dacfd4c5211a1fde1c967814d |
| BLAKE2b-256 | 6c7f5dc0b0414852917f98ddc39f331e7652f71a0d6ca7594e4901a0b3ead198 |
File details
Details for the file supercompress-0.5.0-py3-none-any.whl.
File metadata
- Download URL: supercompress-0.5.0-py3-none-any.whl
- Upload date:
- Size: 33.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5e423876eb416c5e833dd2e0fb39aa72d5fff5c5352836d073b027a5b78b525b |
| MD5 | 85de9003e334cd57a0c88c8bb8bb257c |
| BLAKE2b-256 | 82646186e86ffd983c9e3033ebc0c82b6da86b47da490173a4173399fc3261af |