Brevix
Compress LLM output safely. Save tokens without breaking your code.
Install · Usage · Cost Routing · Accuracy Guard · Benchmarks · Roadmap · Contributing
Cuts response tokens 40–75% with a deterministic rule engine. Routes each task to the cheapest capable Claude tier — saving up to ~80% of API spend. Verifies meaning is preserved before emit. Works across 20+ AI coding tools.
Why Brevix
Brevix is two cost-saving layers in one tool:
- Output compression — cuts response tokens 40–75% with a rule engine that's verified by a local Accuracy Guard. No silent meaning loss on dense technical prose.
- Cost management & smart model routing — picks the cheapest Claude tier (Haiku / Sonnet / Opus) per task, escalates only on low-confidence answers, and enforces token + USD budget caps. Typical savings: ~80% of API spend vs always running Opus.
Works with Claude Code · Cursor · Windsurf · OpenAI Codex CLI · Google Antigravity · Gemini CLI · GitHub Copilot Chat · Aider · Continue.dev · Cline · Roo Code · Zed AI · Augment · Kilo · OpenHands · Tabnine · Warp · Replit · Sourcegraph Amp — plus any tool reading AGENTS.md.
💰 Cost Management & Model Routing
What it does: Stops you from paying Opus prices for tasks Haiku can solve in a single shot.
The problem
Most LLM coding tools call the most expensive model for every prompt. A task like "classify this support ticket" hits Opus at $0.05 per call, when Haiku could answer it for $0.0002 — 250× cheaper, identical answer.
The fix
Brevix Route is a thin layer in front of the Anthropic SDK that:
- Classifies the task (regex/keyword-based, ~6μs, no API call).
- Picks the cheapest capable model from a task → model rule table you control.
- Optionally scores the response for confidence (hedge phrases, validity, length, optional semantic check).
- Escalates to the next tier (Haiku → Sonnet → Opus) only when confidence is below threshold.
- Records every call to a local JSONL log so you can audit exactly what was saved.
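The classify-then-route step can be sketched in a few lines of plain Python. This is an illustrative stand-in, not Brevix's actual rule table or pattern set (the real table lives in ~/.brevix/route.json and is user-editable):

```python
import re

# Illustrative rule table: task pattern -> cheapest capable model.
RULES = [
    (re.compile(r"\b(classify|label|categorize)\b", re.I), "claude-haiku-4-5"),
    (re.compile(r"\b(review|refactor|debug|explain)\b", re.I), "claude-sonnet-4-6"),
    (re.compile(r"\b(architect|design|multi-region)\b", re.I), "claude-opus-4-7"),
]
DEFAULT = "claude-sonnet-4-6"  # mid tier when nothing matches

def route(prompt: str) -> str:
    """Return the first matching model, or the mid-tier default."""
    for pattern, model in RULES:
        if pattern.search(prompt):
            return model
    return DEFAULT

print(route("classify this ticket: app crashes on login"))  # claude-haiku-4-5
print(route("architect a multi-region payment system"))     # claude-opus-4-7
```

Because this is regex matching with no API call, classification stays in the microsecond range, which is what makes the routing layer effectively free.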
How to use it
Option A — From your code (Python)
from brevix import RoutedClient
client = RoutedClient(log_enabled=True)
# Cheap task -> auto-routed to Haiku.
r = client.call("Classify this ticket: app crashes on login")
print(r.text, r.model, r.cost_usd)
# > "bug" claude-haiku-4-5 0.000028
# Hard task -> auto-routed to Opus.
r = client.call("Architect a multi-region payment system with sub-100ms p99 latency")
print(r.model)
# > claude-opus-4-7
# Confidence-guarded mode: retries on next tier only if the first answer hedges.
r = client.call("Review this auth middleware for token expiry bugs", confidence_check=True)
print(r.model, r.escalations, r.confidence)
# > claude-opus-4-7 1 0.94
Option B — From the CLI
brevix route --init # one-time: write default config
brevix route "classify ticket: app crashes" --explain # dry-run, show decision + cost
brevix route "..." --call # actually call the model
brevix route "..." --call --confidence # call + escalate on low conf
brevix route --budget-show # current spend
brevix stats --routing --since 7d # weekly cost-saved report
brevix route --learn-suggest # recommend rule changes
brevix route --learn-apply # apply them
Option C — From Claude Code (slash commands)
After /plugin install brevix@brevix:
/brevix-route classify this support ticket: "app crashes on login"
↓
[brevix] task=classify tier=haiku escalated=0
bug
/brevix-route architect a multi-region payment system with sub-100ms p99 latency
↓
[brevix] task=architecture tier=opus escalated=0
<full Opus response>
/brevix-route-stats --since 7d # cost saved vs Opus-only baseline
/brevix-learn # suggested rule changes
/brevix-learn apply # apply them
Hard budget cap
Edit ~/.brevix/route.json:
{
"budget": { "tokens": 0, "cost_usd": 50.00 }
}
When the cap is hit, the next call raises BudgetExceededError instead of silently overspending.
Or per-run from the CLI:
brevix route --budget-cost 5.00 --call "your prompt"
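The cap mechanics reduce to a little bookkeeping before each call. A minimal sketch (the exception name matches the docs above; the tracker class and fields are illustrative, not Brevix's internals):

```python
class BudgetExceededError(RuntimeError):
    """Raised before a call that would push spend past the cap."""

class BudgetTracker:
    def __init__(self, cost_cap_usd: float):
        self.cost_cap_usd = cost_cap_usd
        self.spent_usd = 0.0

    def charge(self, call_cost_usd: float) -> None:
        # Refuse the call *before* spending, rather than after.
        if self.spent_usd + call_cost_usd > self.cost_cap_usd:
            raise BudgetExceededError(
                f"cap ${self.cost_cap_usd:.2f} would be exceeded "
                f"(spent ${self.spent_usd:.4f})"
            )
        self.spent_usd += call_cost_usd

budget = BudgetTracker(cost_cap_usd=0.01)
budget.charge(0.004)  # ok
budget.charge(0.004)  # ok
try:
    budget.charge(0.004)  # would total > $0.01 -> raises
except BudgetExceededError as e:
    print("blocked:", e)
```

The key design point is checking the projected total before the call is made, so the cap is a hard stop rather than an after-the-fact warning.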
Real-world savings
| Workload | Naive (Opus only) | Brevix routed | Saved |
|---|---|---|---|
| Solo dev, 50 calls/day, mostly easy tasks | $0.40/day | $0.06/day | 85% |
| Team of 5, 500 calls/day, mixed | $4.00/day | $0.80/day | 80% |
| Customer-support bot, 10k tickets/day, 90% classify | $80/day | $1.50/day | 98% |
Same answer quality — confidence guard ensures hard cases still hit Opus.
See the full Routing tutorial in ⚡ Usage for advanced flags, force-tier overrides, and the auto-tuning learn loop.
Features
- Compression Engine
- Safety & Verification
- Integrations
- Insights
- 💰 Cost Management & Model Routing — new
🚀 Install
One-liner — macOS / Linux / WSL
curl -fsSL https://raw.githubusercontent.com/Yash-Koladiya30/brevix/main/install.sh | bash -s -- --all
Windows — PowerShell
irm https://raw.githubusercontent.com/Yash-Koladiya30/brevix/main/install.ps1 | iex
skills CLI — one command, 9 tools at once
Auto-installs Brevix skills into Antigravity, Claude Code, Cline, Codex, Cursor, Gemini CLI, GitHub Copilot, Kiro CLI, and Qoder simultaneously.
npx skills add https://github.com/Yash-Koladiya30/brevix
Pick a specific skill:
npx skills add https://github.com/Yash-Koladiya30/brevix --skill brevix
npx skills add https://github.com/Yash-Koladiya30/brevix --skill brevix-commit
npx skills add https://github.com/Yash-Koladiya30/brevix --skill brevix-stats
Listing → skills.sh/Yash-Koladiya30/brevix
Manual install
pip install brevix # core
pip install 'brevix[guard]' # + semantic Accuracy Guard
pip install 'brevix[tokens]' # + accurate tiktoken counts
pip install 'brevix[all]' # everything
Plug into your LLM coding tool
brevix install --list # show all 20 targets
brevix install claude-code # Claude Code plugin layout
brevix install cursor # .cursor/rules/brevix.mdc
brevix install codex # AGENTS.md + .codex/hooks.json
brevix install gemini # gemini-extension.json + GEMINI.md
brevix install all # write rule files for every tool
Idempotent — re-running updates the Brevix block, leaves your other content alone.
Claude Code marketplace
/plugin marketplace add Yash-Koladiya30/brevix
/plugin install brevix@brevix
MCP middleware
Compress upstream MCP server descriptions:
npm install -g brevix-shrink
Wrap any MCP server in your Claude config:
{
"mcpServers": {
"fs-shrunk": {
"command": "npx",
"args": ["brevix-shrink", "npx", "-y",
"@modelcontextprotocol/server-filesystem", "/tmp"]
}
}
}
⚡ Usage
Slash commands — Claude Code, Cursor, etc.
# Output compression
/brevix # toggle on (full mode)
/brevix lite # gentle compression
/brevix ultra # max compression
/brevix auto # pick best mode per response
/brevix off # disable
/brevix-commit # terse Conventional Commit message
/brevix-check # run Accuracy Guard on a snippet
/brevix-stats # show compression savings
# Smart model routing (saves API cost)
/brevix-route <task> # route to cheapest capable tier (Haiku/Sonnet/Opus)
/brevix-route-stats # cost saved vs Opus-only baseline
/brevix-learn # suggested rule changes from observed escalations
/brevix-learn apply # apply suggestions to ~/.brevix/route.json
For Codex CLI (no slash commands), use $brevix lite|full|ultra|auto|off.
CLI
# Output compression
brevix compress "Your verbose text here" --mode full
brevix compress - # stdin
brevix compress . --mode auto -v # adaptive picks best
brevix compress . --guard --strict --threshold 0.85
# File compression (CLAUDE.md, AGENTS.md, project notes)
brevix compress-file CLAUDE.md # writes .original.md backup
brevix compress-file CLAUDE.md --dry-run
# Stats
brevix stats # estimated, in-process
brevix stats --real --since 7d # parsed from Claude Code session logs
brevix stats --share # tweet-ready one-liner
brevix stats --reset
# Verification
brevix check "original" "compressed"
brevix count "how many tokens?"
# Install rules into a project
brevix install cursor
brevix install --list
# Smart model routing
brevix route --init # write default config to ~/.brevix/route.json
brevix route "classify ticket: app crashes" # print suggested model
brevix route "..." --explain # task, model, est cost, reason
brevix route "..." --call # actually call the chosen model
brevix route "..." --call --confidence # also escalate on low-confidence answers
brevix route --budget-show # current spend vs cap
brevix route --budget-tokens 1000000 --budget-cost 50.00 "..." --call
brevix stats --routing # cost saved, escalation rate, by model/task
brevix stats --routing --since 7d
brevix route --learn-suggest # recommend rule changes
brevix route --learn-apply # apply them
Subagents — Claude Code
agents/ ships six focused subagents that emit ~60% smaller tool results than vanilla agents:
| Agent | Purpose | Output format |
|---|---|---|
| brevix-investigator | Read-only code locator | path:line — symbol — note |
| brevix-builder | Surgical 1–2 file edits with verification | Diff + verify status |
| brevix-reviewer | Bug-focused diff review | path:line: 🔴 bug: …. fix. |
| brevix-haiku | Cheap tier — classify, parse, format, rename | Brief, exact answer |
| brevix-sonnet | Mid tier — review, refactor, debug, explain | Direct, file:line cited |
| brevix-opus | Heavy tier — architecture, multi-agent, hard bugs | Decision + reasoning chain |
Smart Model Routing — /brevix-route
Save ~80% on Claude API cost without manually picking a model per task.
Quick start (5 steps, ~1 minute)
1. Install Brevix and the Anthropic SDK
pip install brevix anthropic
export ANTHROPIC_API_KEY=sk-ant-...
2. Install the Claude Code plugin
/plugin marketplace add Yash-Koladiya30/brevix
/plugin install brevix@brevix
This adds the routing-tier subagents (brevix-haiku / brevix-sonnet / brevix-opus) and the /brevix-route, /brevix-route-stats, /brevix-learn slash commands.
3. Initialize the routing config
brevix route --init
Writes ~/.brevix/route.json with sensible defaults: classify→Haiku, code review→Sonnet, architecture→Opus, escalation chain Haiku→Sonnet→Opus, no budget cap.
4. Use it from Claude Code
/brevix-route classify this support ticket: "app crashes on login"
↓
[brevix] task=classify tier=haiku escalated=0
bug
/brevix-route architect a multi-region payment system with sub-100ms p99 latency
↓
[brevix] task=architecture tier=opus escalated=0
<full Opus response>
5. Check what you saved
/brevix-route-stats --since 7d
How it works under the hood
- /brevix-route <task> runs brevix route ... --explain under the hood.
- Reads the suggested model from the CLI output.
- Spawns the matching subagent (brevix-haiku / brevix-sonnet / brevix-opus) via the Task tool.
- If the subagent returns escalate: <reason>, retries on the next tier (capped at 1 retry per tier).
- Records the call to ~/.brevix/routing_log.jsonl for stats.
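The escalate-on-low-confidence loop described above can be sketched as follows. The hedge phrases, threshold, and scoring weights here are illustrative, and `ask` stands in for the real API call:

```python
TIERS = ["claude-haiku-4-5", "claude-sonnet-4-6", "claude-opus-4-7"]
HEDGES = ("i'm not sure", "it depends", "might be", "cannot determine")

def confidence(answer: str) -> float:
    """Crude confidence score in [0, 1]: penalize hedge phrases and empty answers."""
    if not answer.strip():
        return 0.0
    score = 1.0
    for hedge in HEDGES:
        if hedge in answer.lower():
            score -= 0.3
    return max(score, 0.0)

def call_with_escalation(prompt, ask, threshold=0.7):
    """Try the cheapest tier first; climb one tier per low-confidence answer."""
    escalations = 0
    for i, model in enumerate(TIERS):
        answer = ask(model, prompt)  # stand-in for the real API call
        # Accept a confident answer, or whatever the top tier returns.
        if confidence(answer) >= threshold or i == len(TIERS) - 1:
            return answer, model, escalations
        escalations += 1

# A fake backend: the cheap tier hedges, the mid tier answers cleanly.
fake_ask = lambda model, p: "it depends, might be X" if "haiku" in model else "bug"
answer, model, escalations = call_with_escalation("classify this", fake_ask)
print(answer, model, escalations)  # bug claude-sonnet-4-6 1
```

This shape is why typical workloads stay cheap: most answers clear the threshold on the first tier, and only the hedging minority pays for a second call.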
Force a tier when you already know the right one:
/brevix-route --force-tier=sonnet refactor this 200-line module
Track your savings — /brevix-route-stats
/brevix-route-stats # all-time
/brevix-route-stats --since 7d # last week
Sample output:
Brevix Routing Stats
--------------------
Window: 7d
Calls: 412
Total cost: $1.84
Opus-only baseline: $9.67
Saved: $7.83 (80.9%)
Escalations: 18 (4.4% of calls)
By model:
claude-haiku-4-5 289 (70.1%) $0.08
claude-sonnet-4-6 108 (26.2%) $0.65
claude-opus-4-7 15 ( 3.7%) $1.11
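The saved-vs-baseline figure is simple arithmetic over the routing log. A sketch under the assumption that each JSONL line carries `cost_usd` and an estimated `baseline_cost_usd` field; the real schema of ~/.brevix/routing_log.jsonl may differ:

```python
import io
import json

# Three hypothetical log lines standing in for the real log file.
log = io.StringIO("\n".join(json.dumps(rec) for rec in [
    {"model": "claude-haiku-4-5",  "cost_usd": 0.0002, "baseline_cost_usd": 0.05},
    {"model": "claude-sonnet-4-6", "cost_usd": 0.006,  "baseline_cost_usd": 0.05},
    {"model": "claude-opus-4-7",   "cost_usd": 0.05,   "baseline_cost_usd": 0.05},
]))

records = [json.loads(line) for line in log]
total = sum(r["cost_usd"] for r in records)            # what was actually spent
baseline = sum(r["baseline_cost_usd"] for r in records)  # Opus-only counterfactual
saved = baseline - total
print(f"Calls: {len(records)}  Total: ${total:.4f}  "
      f"Baseline: ${baseline:.2f}  Saved: ${saved:.4f} ({saved / baseline:.1%})")
```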
Auto-tune from your usage — /brevix-learn
After a few days, Brevix recommends rule changes from observed escalation patterns:
/brevix-learn
refactor: claude-sonnet-4-6 -> claude-opus-4-7
samples=54 escalation_rate=68.5%
reason: 37/54 escalated (69%); 32 of those landed on claude-opus-4-7
Apply with /brevix-learn apply to update ~/.brevix/route.json. User customizations and budget caps are preserved.
Hard budget cap
Edit ~/.brevix/route.json:
{
"budget": { "tokens": 0, "cost_usd": 50.00 }
}
Or per-run from the CLI:
brevix route --budget-cost 5.00 --call "your prompt"
When the cap is hit, the next call raises BudgetExceededError instead of silently overspending.
🛡 How Accuracy Guard works
- Compress output via the rule engine.
- Score the original vs compressed text with a local sentence-transformer (no API cost).
- If similarity ≥ threshold (default 0.85) → emit compressed. Otherwise warn, or in --strict mode fall back to the original.
- Without sentence-transformers installed → falls back to content-word containment (drops stopwords without penalty, fair to compression).
Result: compression you can trust on production code, specs, and contracts.
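The content-word containment fallback can be sketched as follows. The stopword list here is illustrative and far shorter than a real one; the point is the shape of the check, not Brevix's actual implementation:

```python
STOPWORDS = {"the", "a", "an", "is", "are", "in", "on", "to", "of", "so",
             "this", "that", "you", "your", "it", "and", "or", "if"}

def content_words(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation and stopwords removed."""
    words = {w.strip(".,;:!?()").lower() for w in text.split()}
    return {w for w in words if w and w not in STOPWORDS}

def containment(original: str, compressed: str) -> float:
    """Fraction of the original's content words preserved in the compressed text."""
    orig = content_words(original)
    return len(orig & content_words(compressed)) / len(orig) if orig else 1.0

score = containment(
    "Wrap the object in useMemo so the reference stays stable",
    "Wrap object in useMemo: stable reference",
)
print(f"{score:.2f}")
```

Because stopwords never count against the compressed text, aggressive shortening is not penalized as long as the load-bearing words survive.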
💡 Compression example
Before
The reason your React component is re-rendering on every parent update is that you are passing an inline object as a prop. In JavaScript, every render creates a new object reference, even if the contents are identical. To fix this, wrap the object in useMemo so the reference stays stable across renders.
After (full mode)
Inline object prop = new ref each render = re-render. Wrap in useMemo.
Tokens saved: ~75% · Meaning preserved: ✅ similarity 0.91
📊 Benchmarks
Reproducible three-arm A/B harness in evals/. Compares no-system-prompt vs "be terse" control vs Brevix on 10 developer prompts.
| arm | n | median | mean | total | vs baseline | vs control |
|---|---|---|---|---|---|---|
| baseline | 10 | 221 | 247.3 | 2473 | — | — |
| control | 10 | 178 | 191.6 | 1916 | 22.5% | — |
| brevix | 10 | 119 | 128.4 | 1284 | 48.1% | 33.0% |
Run yourself:
pip install 'brevix[all]' anthropic
export ANTHROPIC_API_KEY=...
python evals/llm_run.py --model claude-sonnet-4-6
python evals/measure.py
The vs control column is the honest savings — what Brevix adds beyond "just be brief."
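The percentage columns follow directly from the token totals in the table:

```python
baseline_total, control_total, brevix_total = 2473, 1916, 1284

vs_baseline = 1 - brevix_total / baseline_total  # saved vs no system prompt
vs_control = 1 - brevix_total / control_total    # saved beyond "just be brief"

print(f"vs baseline: {vs_baseline:.1%}")  # vs baseline: 48.1%
print(f"vs control:  {vs_control:.1%}")   # vs control:  33.0%
```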
🗺 Roadmap
- Core compression engine (lite / full / ultra)
- Adaptive (auto) mode
- Accuracy Guard (semantic + content-word fallback)
- Local stats counter
- Multi-platform installer (20 targets)
- File-level compression (brevix compress-file)
- MCP middleware (brevix-shrink)
- Statusline badge + Claude Code hooks
- Subagents (investigator / builder / reviewer)
- Three-arm eval harness
- PowerShell installer + uninstaller
- Model routing engine (Haiku / Sonnet / Opus tiering)
- Confidence-driven escalation (hedge / validity / length scorers)
- Token + cost budget enforcement (BudgetExceededError)
- Routing stats dashboard (brevix stats --routing)
- Learn loop (brevix route --learn-suggest | --learn-apply)
- Claude Code routing tier subagents + slash commands
- VSCode extension UI
- Browser extension (claude.ai, chatgpt.com web)
- Two-way compression (compress prompts before send)
- Custom user-defined rule packs
- Web dashboard (team tier)
📜 License & Contributing
MIT — free for personal and commercial use. Issues and PRs welcome — see docs/CONTRIBUTING.md.