Local proxy that optimizes LLM context by 55-97%. Same quality, fraction of the cost.
Brevia
Cut LLM tokens by 55-97% and costs by 55-93%. Same quality.
Brevia is a local proxy that sits between your tools and the Anthropic API. It algorithmically optimizes context before it reaches Claude — cutting tokens by 55-97% on large contexts while maintaining full recall and producing more precise answers.
Works with Claude Code, Cursor, Continue, aider, and any tool that uses the Anthropic SDK.
Quick Start
pip install brevia
brevia login # Opens browser — sign in with GitHub or Google
brevia serve # Starts proxy on localhost:8420
Then add to your shell profile (~/.zshrc, ~/.bashrc):
export ANTHROPIC_BASE_URL=http://localhost:8420
That's it. Everything works exactly as before — but cheaper and often smarter.
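Before pointing tools at the proxy, it can help to confirm that something is actually listening at the configured address. The sketch below uses only the Python standard library. The helper names `proxy_endpoint` and `is_listening` are hypothetical and are not part of Brevia.

```python
import os
import socket
from urllib.parse import urlsplit


def proxy_endpoint(env=os.environ):
    """Return (host, port) parsed from ANTHROPIC_BASE_URL, or None if unset."""
    url = env.get("ANTHROPIC_BASE_URL")
    if not url:
        return None
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_listening(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    endpoint = proxy_endpoint()
    if endpoint is None:
        print("ANTHROPIC_BASE_URL is not set")
    else:
        host, port = endpoint
        state = "reachable" if is_listening(host, port) else "not reachable"
        print(f"Proxy {host}:{port} is {state}")
```

Run it in the same shell session that launches your tool, so it sees the same environment variable the tool will see.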
How It Works
Your Tool (Claude Code, Cursor, etc.)
        │
        │  ANTHROPIC_BASE_URL=http://localhost:8420
        ▼
┌─────────────────────────────────┐
│  Brevia Proxy                   │  ← Runs locally
│                                 │    Zero LLM cost
│  1. Analyze context structure   │    ~5ms latency
│  2. Score relevance by query    │
│  3. Extract key sections        │
│  4. Inject liberation prompt    │
└─────────────────────────────────┘
        │
        │  Optimized payload (55-97% smaller)
        ▼
┌─────────────────────────────────┐
│  api.anthropic.com              │  ← Your API key
│                                 │    Your account
│  Claude processes focused       │    You pay less
│  context = better answers       │
└─────────────────────────────────┘
        │
        │  Response streams back
        ▼
Your Tool (unchanged behavior)
Key insight: Less noise = better answers. When Claude sees 2.4k tokens of the RIGHT code instead of 95k tokens of everything, it produces more precise diagnoses.
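To make steps 1-3 of the diagram concrete, here is a toy illustration of query-driven extraction: split the context into sections, score each section by keyword overlap with the query, and keep the best-scoring sections that fit a token budget. This is not Brevia's actual algorithm, which this README does not describe. All function names are hypothetical, and counting word tokens is only a crude stand-in for a real tokenizer.

```python
import re


def tokenize(text):
    """Lowercased word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def split_sections(context):
    """Step 1: treat blank-line-separated blocks as sections."""
    return [block.strip() for block in re.split(r"\n\s*\n", context) if block.strip()]


def score(section, query_terms):
    """Step 2: fraction of query terms that appear in the section."""
    if not query_terms:
        return 0.0
    words = set(tokenize(section))
    return len(words & query_terms) / len(query_terms)


def extract(context, query, budget_tokens):
    """Step 3: keep the best-scoring sections that fit the budget, in original order."""
    query_terms = set(tokenize(query))
    sections = split_sections(context)
    ranked = sorted(
        range(len(sections)),
        key=lambda i: score(sections[i], query_terms),
        reverse=True,
    )
    kept, used = [], 0
    for i in ranked:
        cost = len(tokenize(sections[i]))
        # Skip sections with no overlap at all, and anything over budget.
        if score(sections[i], query_terms) > 0 and used + cost <= budget_tokens:
            kept.append(i)
            used += cost
    return "\n\n".join(sections[i] for i in sorted(kept))


if __name__ == "__main__":
    context = (
        "def connect(host):\n    return open_socket(host)\n\n"
        "LICENSE: MIT, all rights reserved\n\n"
        "def retry(fn, attempts):\n    return fn()"
    )
    print(extract(context, "why does connect fail for this host", budget_tokens=20))
```

In this example only the `connect` section shares terms with the query, so the license text and the unrelated `retry` helper are dropped.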
Benchmarks
Tested against Claude Opus 4.6/4.7 on real codebases (Django, FastAPI, psf/requests):
| Context Size | Cost Savings | Quality Impact |
|---|---|---|
| < 2k tokens | 0% (passthrough) | None |
| 10k tokens | ~55% | None |
| 50k tokens | 76-93% | None |
| 95k tokens | 76% | Improved (more precise) |
Real Code Analysis (4-Path Comparison)
| Path | Total Cost | vs Direct |
|---|---|---|
| Direct Opus (full context) | $0.836 | baseline |
| Brevia + Opus | $0.278 | 67% cheaper |
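The "vs Direct" figure is a plain relative saving, which can be checked with a few lines of Python. The helper name `percent_saved` is made up for this illustration.

```python
def percent_saved(baseline, optimized):
    """Relative saving of `optimized` versus `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - optimized) / baseline * 100


# Figures from the table above (USD).
direct_cost = 0.836
brevia_cost = 0.278
print(f"{percent_saved(direct_cost, brevia_cost):.1f}% cheaper")  # prints 66.7% cheaper
```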
Structured Data (50k token billing report)
| Metric | Value |
|---|---|
| Token reduction | 97.4% |
| Cost savings | 93.3% |
| Recall | 1.0/1.0 (perfect) |
Full benchmark methodology and raw data: benchmarks/BENCHMARKS.md
Commands
| Command | Description |
|---|---|
| brevia login | Authenticate (opens browser) |
| brevia serve | Start the proxy |
| brevia serve -p 9000 | Start on custom port |
| brevia stats | Show your savings stats |
| brevia stats -d 30 | Show last 30 days |
| brevia logout | Remove credentials |
What You'll See
When brevia serve is running:
╭─ 🏛️ Brevia ──────────────────────────────────╮
│ Brevia is running │
│ │
│ Proxy: http://127.0.0.1:8420 │
│ User: @yourname │
│ Status: Optimizing all Anthropic API calls │
│ │
│ Set this in your shell: │
│ export ANTHROPIC_BASE_URL=http://127.0.0.1:8420 │
╰────────────────────────────────────────────────╯
Run brevia stats anytime:
╭─ 📊 Brevia Stats ─────────────────────────────╮
│ All-time savings │
│ │
│ Days active: 12 │
│ Total requests: 847 │
│ Tokens saved: 4,230,000 │
│ Avg reduction: 71% │
│ Est. $ saved: $63.45 │
╰────────────────────────────────────────────────╯
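As a sanity check on the sample numbers, the dollar estimate and the token count together imply a per-token rate. The sketch below assumes the estimate is a flat rate applied to tokens saved. The README does not state how the figure is computed, and `implied_rate_per_million` is a hypothetical helper.

```python
def implied_rate_per_million(dollars_saved, tokens_saved):
    """USD per one million tokens implied by a savings estimate."""
    return dollars_saved / tokens_saved * 1_000_000


# Values copied from the sample `brevia stats` output above.
rate = implied_rate_per_million(63.45, 4_230_000)
print(f"${rate:.2f} per million tokens")  # prints $15.00 per million tokens
```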
Where Brevia Helps Most
- Large contexts (50k+ tokens): 76-93% cost savings (up to 97% fewer tokens) with equal or better quality
- Noisy contexts: Relevant info buried in boilerplate — Brevia extracts what matters
- Multi-file contexts: Only sends relevant files to Claude
Where Brevia Does NOT Help
- Tiny contexts (< 2k tokens): Passed through unchanged (no overhead)
- Already-focused queries: If you're already sending only relevant code, nothing to cut
- Full-file reasoning tasks: Some tasks need the entire file flow
Privacy & Security
- Your API key is passed through — Brevia never stores it
- Optimization happens locally — your code never leaves your machine
- Telemetry is aggregated stats only: token counts, savings, request count
- No content is ever sent to Brevia servers
- Credentials are stored in ~/.brevia/ with restricted permissions
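On POSIX systems, "restricted permissions" usually means owner-only access. The sketch below shows one common way to write a secrets file so that it is never readable by other users. It is an illustration only: the README does not say how Brevia does this, `write_private_file` and the file name are hypothetical, and on Windows these mode bits are largely ignored.

```python
import os
import stat
import tempfile


def write_private_file(path, data):
    """Create `path` readable and writable by the owner only, then write `data`."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # Passing the mode to os.open means a new file never exists with wider permissions.
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(data)
    # os.open applies the mode only on creation; tighten pre-existing files too.
    os.chmod(path, 0o600)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "credentials.json")  # hypothetical file name
        write_private_file(target, '{"token": "example"}')
        print(oct(stat.S_IMODE(os.stat(target).st_mode)))
```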
Platform Support
- macOS (Intel + Apple Silicon)
- Linux (x86_64 + ARM64)
- Windows 10+
Requires Python 3.10+.
Enterprise
Need team-wide deployment, custom optimization rules, or priority support?
Contact us: enterprise@brevia.dev
How It's Different
| | Brevia | Prompt caching | Summarization |
|---|---|---|---|
| Approach | Algorithmic extraction | Cache repeated prefixes | LLM summarizes context |
| Cost | Zero (local CPU) | Reduced on cache hit | Adds an LLM call |
| Latency | ~5ms | None on hit | +1-3s per call |
| Quality | Equal or better | Same | Often degrades |
| Works with | Any Anthropic tool | SDK only | Custom code only |
Brevia stacks with prompt caching — use both for maximum savings.
License
MIT
Built by engineers who got tired of paying for noise.