Save 55-93% on Claude API tokens. Same quality, fraction of the cost.
Brevia
Use Claude smarter. Save 55-93% on tokens without losing quality.
Brevia is a local proxy that makes your Claude API calls cheaper and often better. It runs between your tools and Anthropic, automatically trimming unnecessary context so Claude focuses on what actually matters.
Works with Claude Code, Cursor, Continue, aider, and anything that uses the Anthropic SDK. No code changes needed.
Quick Start
pip install breviadev
brevia login # Opens browser — sign in with GitHub or Google
brevia serve # Starts on localhost:8420
Then add to your shell profile (~/.zshrc, ~/.bashrc):
export ANTHROPIC_BASE_URL=http://localhost:8420
That's it. Everything works exactly as before — but cheaper and often better.
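For scripts that build their own HTTP requests instead of going through a tool that already honors the variable, the redirect comes down to reading one environment variable. The following is a minimal sketch, not part of Brevia: the helper name `resolve_base_url` is hypothetical, and the fallback assumes the direct endpoint is `https://api.anthropic.com`.

```python
import os

# Direct endpoint, used when no proxy is configured (assumed default).
DEFAULT_BASE_URL = "https://api.anthropic.com"


def resolve_base_url(env=None):
    """Return the API base URL, preferring ANTHROPIC_BASE_URL when it is set.

    Hypothetical helper for illustration; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    # Treat an empty value like an unset one, then drop any trailing slash
    # so callers can append request paths consistently.
    return (env.get("ANTHROPIC_BASE_URL") or DEFAULT_BASE_URL).rstrip("/")


if __name__ == "__main__":
    print(resolve_base_url())
```

With the export from the Quick Start in place, the helper returns the local address; without it, requests go straight to Anthropic.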
How It Works
Your Tool (Claude Code, Cursor, etc.)
│
│ ANTHROPIC_BASE_URL=http://localhost:8420
▼
┌─────────────────────────────────┐
│ Brevia │ ← Runs on your machine
│ │
│ Reads your request, figures │
│ out what's relevant, removes │
│ the noise, and enhances the │
│ prompt for better results. │
└─────────────────────────────────┘
│
│ Smaller, focused payload
▼
┌─────────────────────────────────┐
│ api.anthropic.com │ ← Your API key
│ │ Your account
│ Claude gets less noise, │ You pay less
│ gives better answers. │
└─────────────────────────────────┘
│
│ Response streams back
▼
Your Tool (unchanged behavior)
Why it works: When Claude gets 2k tokens of the right code instead of 95k tokens of everything, it gives more precise answers. Less noise in, better signal out.
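To make the idea concrete, here is a deliberately naive sketch of context trimming. It is not Brevia's algorithm, which this README does not describe. Everything in it is an assumption for illustration: the function names, the four-characters-per-token estimate, and the word-overlap relevance score. The only detail taken from this document is the 2k-token passthrough threshold.

```python
import re

PASSTHROUGH_TOKENS = 2_000  # small requests are forwarded unchanged


def estimate_tokens(text):
    """Rough token estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def words(text):
    """Lowercased identifier-like words, used for a naive relevance score."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def trim_context(question, chunks, budget_tokens):
    """Keep the chunks most relevant to the question within a token budget.

    Hypothetical heuristic: relevance is the number of words a chunk
    shares with the question.
    """
    total = sum(estimate_tokens(c) for c in chunks)
    if total <= PASSTHROUGH_TOKENS:
        return list(chunks)  # already small, nothing worth trimming

    query = words(question)
    scores = [len(query & words(c)) for c in chunks]
    # Rank chunk indices by overlap with the question, highest first.
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)

    kept, used = [], 0
    for i in ranked:
        if scores[i] == 0:
            break  # remaining chunks share nothing with the question
        cost = estimate_tokens(chunks[i])
        if used + cost <= budget_tokens:
            kept.append(i)
            used += cost
    # Restore the original order so the surviving context still reads naturally.
    return [chunks[i] for i in sorted(kept)]
```

A real implementation would need a far better relevance signal than shared words, but the shape is the same: score, rank, fill a budget, forward the rest.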
Benchmarks
Tested on real codebases (Django, FastAPI, psf/requests) with Claude Opus:
| Context Size | Cost Savings | Quality |
|---|---|---|
| < 2k tokens | 0% (passthrough) | Same |
| 10k tokens | ~55% | Same |
| 50k tokens | 76-93% | Same |
| 95k tokens | 76% | Better (more precise) |
Real-World Example
| Setup | Total Cost | Compared to Direct |
|---|---|---|
| Direct Claude (full context) | $0.836 | — |
| With Brevia | $0.278 | 67% cheaper |
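The percentage in the last column follows directly from the two totals. A short check, using only the figures from the table above (the function name is illustrative):

```python
def percent_saved(direct_cost, proxied_cost):
    """Share of the direct cost that was avoided, as a percentage."""
    return (direct_cost - proxied_cost) / direct_cost * 100


if __name__ == "__main__":
    # Totals from the table: $0.836 direct, $0.278 with Brevia.
    print(f"{percent_saved(0.836, 0.278):.0f}% cheaper")  # prints "67% cheaper"
```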
Large Data (50k token input)
| Metric | Value |
|---|---|
| Token reduction | 97% |
| Cost savings | 93% |
| Accuracy | Perfect (found all issues) |
Full benchmark details: benchmarks/BENCHMARKS.md
Commands
| Command | What it does |
|---|---|
| brevia login | Sign in (opens browser) |
| brevia serve | Start Brevia |
| brevia serve -p 9000 | Start on a different port |
| brevia stats | See how much you've saved |
| brevia stats -d 30 | See last 30 days |
| brevia logout | Sign out |
What You'll See
When brevia serve is running:
╭─ Brevia ─────────────────────────────────────╮
│ Brevia is running │
│ │
│ Address: http://127.0.0.1:8420 │
│ User: @yourname │
│ Status: Active │
│ │
│ Add to your shell: │
│ export ANTHROPIC_BASE_URL=http://127.0.0.1:8420 │
╰───────────────────────────────────────────────╯
Check your savings anytime with brevia stats:
╭─ Brevia Stats ────────────────────────────────╮
│ │
│ Days active: 12 │
│ Total requests: 847 │
│ Tokens saved: 4,230,000 │
│ Avg reduction: 71% │
│ Est. $ saved: $63.45 │
╰───────────────────────────────────────────────╯
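The dollar estimate is presumably the number of tokens saved multiplied by a per-token price. How Brevia actually computes it is not documented here; the sketch below assumes a flat rate of $15 per million input tokens, chosen only because it reproduces the sample figure above. The function name is hypothetical.

```python
def estimated_savings_usd(tokens_saved, usd_per_million_tokens):
    """Estimate dollars saved from tokens that were never sent.

    The price is an input, not a constant: rates differ by model.
    """
    return tokens_saved / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # Sample values from the stats panel; the $15 rate is an assumption.
    print(f"${estimated_savings_usd(4_230_000, 15.0):.2f}")  # prints "$63.45"
```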
Where It Helps Most
- Big contexts (50k+ tokens): The more noise, the more Brevia saves
- Multi-file projects: Keeps only the files that matter for your question
- Repetitive code: Strips boilerplate so Claude focuses on the real problem
Where It Doesn't Help
- Short prompts (< 2k tokens): Already small — Brevia passes these through unchanged
- Already focused: If you're manually sending only relevant code, there's nothing to trim
Privacy & Security
- Your API key stays yours — Brevia passes it through, never stores it
- Trimming happens locally: your code is processed on your machine, and only the reduced request is forwarded to Anthropic
- We only collect usage stats — token counts and savings, never content
- Credentials are stored locally in ~/.brevia/ with restricted file permissions
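For readers wondering what "restricted file permissions" typically means on macOS and Linux, the usual pattern is a file that only its owner can read or write (mode 0600). The sketch below shows that pattern in general terms. It is not Brevia's code, and the helper name and the `credentials.json` file name are hypothetical.

```python
import os
import stat
import tempfile


def write_private_file(path, data):
    """Create or overwrite a file that only the owner can read and write."""
    # Request mode 0600 at creation time so the file is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # The creation mode is ignored for a file that already existed,
        # so tighten the permissions explicitly before writing anything.
        os.chmod(path, 0o600)
        handle.write(data)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "credentials.json")  # hypothetical name
    write_private_file(target, "{}")
    print(oct(stat.S_IMODE(os.stat(target).st_mode)))
```

On Windows, POSIX mode bits are mostly not enforced, so an equivalent restriction would rely on file ACLs instead.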
Platforms
- macOS (Intel + Apple Silicon)
- Linux (x86_64 + ARM64)
- Windows 10+
Requires Python 3.10+.
Enterprise
Need team-wide deployment, custom rules, or dedicated support?
Contact us: enterprise@brevia.dev
How It Compares
| | Brevia | Prompt Caching | Manual Trimming |
|---|---|---|---|
| Setup | One command | Built into SDK | You do it yourself |
| Effort | Zero — automatic | Zero — automatic | High — manual work |
| Savings | 55-93% | Varies (cache hits only) | Depends on you |
| Quality | Same or better | Same | Risk of cutting too much |
| Works with | Any Anthropic tool | SDK only | Your code only |
Brevia works alongside prompt caching — use both for maximum savings.
License
Proprietary. Free for individual use. See LICENSE for details.
Built for developers who'd rather spend money on building, not on sending noise to an API.
File details
Details for the file breviadev-0.1.1.tar.gz.
File metadata
- Download URL: breviadev-0.1.1.tar.gz
- Upload date:
- Size: 19.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9525de966ea4851a3d559eff0881f56a106c8d54cb51d23b864ea7df5b1d797c |
| MD5 | 1d82b6b4323f22c3cc30aa1c02480932 |
| BLAKE2b-256 | 224c7e05fffc4e6fa0a1f53f5aad1c4a0eab9c288918e1038fe138c5c7c8e604 |
File details
Details for the file breviadev-0.1.1-py3-none-any.whl.
File metadata
- Download URL: breviadev-0.1.1-py3-none-any.whl
- Upload date:
- Size: 17.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ec05949d1be936d17b3035acbfbd683c7baca676468908ee6af8b491b3c08d6f |
| MD5 | f1d8d34316c78635f5e87c756852bbde |
| BLAKE2b-256 | a354708c6a609d0d453b8141610f3157243e8d8095117ff2b5b49bb3c2f26c73 |