Skip to main content

Local proxy that optimizes LLM context by 55-97%. Same quality, fraction of the cost.

Project description

Brevia

Save 55-93% on LLM tokens. Same quality, fraction of the cost.

Brevia is a local proxy that sits between your tools and the Anthropic API. It algorithmically optimizes context before it reaches Claude — cutting tokens by 55-97% on large contexts while maintaining full recall and producing more precise answers.

Works with Claude Code, Cursor, Continue, aider, and any tool that uses the Anthropic SDK.


Quick Start

pip install brevia
brevia login          # Opens browser — sign in with GitHub or Google
brevia serve          # Starts proxy on localhost:8420

Then add to your shell profile (~/.zshrc, ~/.bashrc):

export ANTHROPIC_BASE_URL=http://localhost:8420

That's it. Everything works exactly as before — but cheaper and often smarter.


How It Works

Your Tool (Claude Code, Cursor, etc.)
    │
    │  ANTHROPIC_BASE_URL=http://localhost:8420
    ▼
┌─────────────────────────────────┐
│         Brevia Proxy            │  ← Runs locally
│                                 │     Zero LLM cost
│  1. Analyze context structure   │     ~5ms latency
│  2. Score relevance by query    │
│  3. Extract key sections        │
│  4. Inject liberation prompt    │
└─────────────────────────────────┘
    │
    │  Optimized payload (55-97% smaller)
    ▼
┌─────────────────────────────────┐
│      api.anthropic.com          │  ← Your API key
│                                 │     Your account
│  Claude processes focused       │     You pay less
│  context = better answers       │
└─────────────────────────────────┘
    │
    │  Response streams back
    ▼
Your Tool (unchanged behavior)

Key insight: Less noise = better answers. When Claude sees 2.4k tokens of the RIGHT code instead of 95k tokens of everything, it produces more precise diagnoses.


Benchmarks

Tested against Claude Opus 4.6/4.7 on real codebases (Django, FastAPI, psf/requests):

Context Size Cost Savings Quality Impact
< 2k tokens 0% (passthrough) None
10k tokens ~55% None
50k tokens 76-93% None
95k tokens 76% Improved (more precise)

Real Code Analysis (4-Path Comparison)

Path Total Cost vs Direct
Direct Opus (full context) $0.836 baseline
Brevia + Opus $0.278 67% cheaper

Structured Data (50k token billing report)

Metric Value
Token reduction 97.4%
Cost savings 93.3%
Recall 1.0/1.0 (perfect)

Full benchmark methodology and raw data: benchmarks/BENCHMARKS.md


Commands

Command Description
brevia login Authenticate (opens browser)
brevia serve Start the proxy
brevia serve -p 9000 Start on custom port
brevia stats Show your savings stats
brevia stats -d 30 Show last 30 days
brevia logout Remove credentials

What You'll See

When brevia serve is running:

╭─ 🏛️  Brevia ──────────────────────────────────╮
│ Brevia is running                              │
│                                                │
│   Proxy:    http://127.0.0.1:8420              │
│   User:     @yourname                          │
│   Status:   Optimizing all Anthropic API calls │
│                                                │
│   Set this in your shell:                      │
│   export ANTHROPIC_BASE_URL=http://127.0.0.1:8420 │
╰────────────────────────────────────────────────╯

Run brevia stats anytime:

╭─ 📊 Brevia Stats ─────────────────────────────╮
│ All-time savings                               │
│                                                │
│   Days active:    12                           │
│   Total requests: 847                          │
│   Tokens saved:   4,230,000                    │
│   Avg reduction:  71%                          │
│   Est. $ saved:   $63.45                       │
╰────────────────────────────────────────────────╯

Where Brevia Helps Most

  • Large contexts (50k+ tokens): 76-97% savings with equal or better quality
  • Noisy contexts: Relevant info buried in boilerplate — Brevia extracts what matters
  • Multi-file contexts: Only sends relevant files to Claude

Where Brevia Does NOT Help

  • Tiny contexts (< 2k tokens): Passed through unchanged (no overhead)
  • Already-focused queries: If you're already sending only relevant code, nothing to cut
  • Full-file reasoning tasks: Some tasks need the entire file flow

Privacy & Security

  • Your API key is passed through — Brevia never stores it
  • Optimization happens locally — your code never leaves your machine
  • Telemetry is aggregated stats only: token counts, savings, request count
  • No content is ever sent to Brevia servers
  • Credentials stored in ~/.brevia/ with restricted permissions

Platform Support

  • macOS (Intel + Apple Silicon)
  • Linux (x86_64 + ARM64)
  • Windows 10+

Requires Python 3.10+.


Enterprise

Need team-wide deployment, custom optimization rules, or priority support?

Contact us: enterprise@brevia.dev


How It's Different

Brevia Prompt caching Summarization
Approach Algorithmic extraction Cache repeated prefixes LLM summarizes context
Cost Zero (local CPU) Reduced on cache hit Adds an LLM call
Latency ~5ms None on hit +1-3s per call
Quality Equal or better Same Often degrades
Works with Any Anthropic tool SDK only Custom code only

Brevia stacks with prompt caching — use both for maximum savings.


License

MIT


Built by engineers who got tired of paying for noise.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

breviadev-0.1.0.tar.gz (20.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

breviadev-0.1.0-py3-none-any.whl (18.5 kB view details)

Uploaded Python 3

File details

Details for the file breviadev-0.1.0.tar.gz.

File metadata

  • Download URL: breviadev-0.1.0.tar.gz
  • Upload date:
  • Size: 20.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for breviadev-0.1.0.tar.gz
Algorithm Hash digest
SHA256 01aaeeac72d69f253e09cac02ad62ff64fd0ce78beb1e357b31dd9f9c4ad6a4e
MD5 c1b258563686ea95ce3f74f8e3696d78
BLAKE2b-256 9708241c6407dfabc11ca0b37823f0548cb6210c355f0f49607334cd22542c98

See more details on using hashes here.

File details

Details for the file breviadev-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: breviadev-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 18.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for breviadev-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e6cbe239d3afd9f90216b9a08d0c1a1d9be65f6a61073078c1ca8cec98e446ef
MD5 c79207067303bb8df08942c7be0c2b89
BLAKE2b-256 2b8a3761314c5888ee13b610f6a6988439c528b793c98ad6e8080f5724ca337c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page