

Brevix

Compress LLM output safely. Save tokens without breaking your code.


Install  ·  Usage  ·  Cost Routing  ·  Accuracy Guard  ·  Benchmarks  ·  Roadmap  ·  Contributing

Cuts response tokens 40–75% with a deterministic rule engine. Routes each task to the cheapest capable Claude tier — saving up to ~80% of API spend. Verifies meaning is preserved before emit. Works across 20+ AI coding tools.

40–75% savings · ~80% API cost cut · Accuracy Guard · 20+ Platforms · 100% Local



Why Brevix

Brevix is two cost-saving layers in one tool:

  1. Output compression — cuts response tokens 40–75% with a rule engine that's verified by a local Accuracy Guard. No silent meaning loss on dense technical prose.
  2. Cost management & smart model routing — picks the cheapest Claude tier (Haiku / Sonnet / Opus) per task, escalates only on low-confidence answers, and enforces token + USD budget caps. Typical savings: ~80% of API spend vs always running Opus.

Works with Claude Code · Cursor · Windsurf · OpenAI Codex CLI · Google Antigravity · Gemini CLI · GitHub Copilot Chat · Aider · Continue.dev · Cline · Roo Code · Zed AI · Augment · Kilo · OpenHands · Tabnine · Warp · Replit · Sourcegraph Amp — plus any tool reading AGENTS.md.



💰 Cost Management & Model Routing

What it does: Stops you from paying Opus prices for tasks Haiku can solve in a single shot.

The problem

Most LLM coding tools call the most expensive model for every prompt. A task like "classify this support ticket" hits Opus at ~$0.05 per call, while Haiku could answer it for ~$0.0002 — 250× cheaper, with an identical answer.

The fix

Brevix Route is a thin layer in front of the Anthropic SDK that:

  1. Classifies the task (regex/keyword-based, ~6μs, no API call).
  2. Picks the cheapest capable model from a task → model rule table you control.
  3. Optionally scores the response for confidence (hedge phrases, validity, length, optional semantic check).
  4. Escalates to the next tier (Haiku → Sonnet → Opus) only when confidence is below threshold.
  5. Records every call to a local JSONL log so you can audit exactly what was saved.
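The classify-then-escalate core of those steps can be sketched in a few lines. The rule table, keywords, and function names below are illustrative only — Brevix's real classifier and defaults live in the package and in `~/.brevix/route.json`:

```python
import re

# Illustrative task -> model rule table (Brevix's real rules are configurable).
RULES = [
    (re.compile(r"\b(classify|parse|format|rename)\b", re.I), "claude-haiku-4-5"),
    (re.compile(r"\b(review|refactor|debug|explain)\b", re.I), "claude-sonnet-4-6"),
    (re.compile(r"\b(architect|design|migrate)\b", re.I), "claude-opus-4-7"),
]
TIERS = ["claude-haiku-4-5", "claude-sonnet-4-6", "claude-opus-4-7"]

def pick_model(prompt):
    """Keyword classification: first matching rule wins; default to mid tier."""
    for pattern, model in RULES:
        if pattern.search(prompt):
            return model
    return "claude-sonnet-4-6"

def escalate(model):
    """Next tier up the Haiku -> Sonnet -> Opus chain, or None at the top."""
    i = TIERS.index(model)
    return TIERS[i + 1] if i + 1 < len(TIERS) else None

print(pick_model("classify this ticket"))   # cheap tier
print(escalate("claude-haiku-4-5"))         # where a low-confidence retry goes
```

Because classification is regex over the prompt, no API call happens before routing — which is what keeps the decision in the microsecond range.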

How to use it

Option A — From your code (Python)

from brevix import RoutedClient

client = RoutedClient(log_enabled=True)

# Cheap task -> auto-routed to Haiku.
r = client.call("Classify this ticket: app crashes on login")
print(r.text, r.model, r.cost_usd)
# > "bug"  claude-haiku-4-5  0.000028

# Hard task -> auto-routed to Opus.
r = client.call("Architect a multi-region payment system with sub-100ms p99 latency")
print(r.model)
# > claude-opus-4-7

# Confidence-guarded mode: retries on next tier only if the first answer hedges.
r = client.call("Review this auth middleware for token expiry bugs", confidence_check=True)
print(r.model, r.escalations, r.confidence)
# > claude-opus-4-7  1  0.94
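One plausible shape for the hedge-phrase part of that confidence score — the phrase list and 0.2-per-hit weighting here are illustrative, not Brevix's actual scorer:

```python
# Illustrative hedge-phrase scorer: more hedging -> lower confidence.
HEDGES = ("i'm not sure", "it might", "possibly", "i think", "it depends",
          "hard to say", "could be")

def hedge_confidence(answer):
    """Score 1.0 for a direct answer, docked 0.2 per distinct hedge phrase."""
    text = answer.lower()
    hits = sum(1 for phrase in HEDGES if phrase in text)
    return max(0.0, 1.0 - 0.2 * hits)

hedge_confidence("bug")                         # direct answer, full confidence
hedge_confidence("I think it might be a bug")   # hedged, scores lower
```

A score below the configured threshold is what triggers the retry on the next tier.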

Option B — From the CLI

brevix route --init                                    # one-time: write default config
brevix route "classify ticket: app crashes" --explain  # dry-run, show decision + cost
brevix route "..." --call                              # actually call the model
brevix route "..." --call --confidence                 # call + escalate on low conf
brevix route --budget-show                             # current spend
brevix stats --routing --since 7d                      # weekly cost-saved report
brevix route --learn-suggest                           # recommend rule changes
brevix route --learn-apply                             # apply them

Option C — From Claude Code (slash commands)

After /plugin install brevix@brevix:

/brevix-route classify this support ticket: "app crashes on login"
        ↓
[brevix] task=classify tier=haiku escalated=0
bug
/brevix-route architect a multi-region payment system with sub-100ms p99 latency
        ↓
[brevix] task=architecture tier=opus escalated=0
<full Opus response>
/brevix-route-stats --since 7d         # cost saved vs Opus-only baseline
/brevix-learn                          # suggested rule changes
/brevix-learn apply                    # apply them

Hard budget cap

Edit ~/.brevix/route.json:

{
  "budget": { "tokens": 0, "cost_usd": 50.00 }
}

When the cap is hit, the next call raises BudgetExceededError instead of silently overspending.

Or per-run from the CLI:

brevix route --budget-cost 5.00 --call "your prompt"
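The enforcement semantics might look roughly like this sketch — `BudgetExceededError` is the real exception name, but the tracker class and its internals are assumptions for illustration:

```python
class BudgetExceededError(RuntimeError):
    """Raised before a call that would push spend past the configured cap."""

class BudgetTracker:
    # Hypothetical sketch; Brevix reads its caps from ~/.brevix/route.json
    # ("budget": {"tokens": ..., "cost_usd": ...}), where 0 means no cap.
    def __init__(self, cost_usd_cap=0.0, tokens_cap=0):
        self.cost_usd_cap = cost_usd_cap
        self.tokens_cap = tokens_cap
        self.spent_usd = 0.0
        self.spent_tokens = 0

    def charge(self, cost_usd, tokens):
        """Record a call; raise instead of silently exceeding either cap."""
        if self.cost_usd_cap and self.spent_usd + cost_usd > self.cost_usd_cap:
            raise BudgetExceededError(f"cost cap ${self.cost_usd_cap} reached")
        if self.tokens_cap and self.spent_tokens + tokens > self.tokens_cap:
            raise BudgetExceededError(f"token cap {self.tokens_cap} reached")
        self.spent_usd += cost_usd
        self.spent_tokens += tokens

tracker = BudgetTracker(cost_usd_cap=5.00)
tracker.charge(cost_usd=0.03, tokens=1200)    # within budget, recorded
# tracker.charge(cost_usd=6.00, tokens=900)   # would raise BudgetExceededError
```

The key design point: the check runs before the API call, so the cap bounds what you spend, not what you notice afterwards.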

Real-world savings

| Workload | Naive (Opus only) | Brevix routed | Saved |
|---|---|---|---|
| Solo dev, 50 calls/day, mostly easy tasks | $0.40/day | $0.06/day | 85% |
| Team of 5, 500 calls/day, mixed | $4.00/day | $0.80/day | 80% |
| Customer-support bot, 10k tickets/day, 90% classify | $80/day | $1.50/day | 98% |

Same answer quality — confidence guard ensures hard cases still hit Opus.

See the full Routing tutorial in the ⚡ Usage section for advanced flags, force-tier overrides, and the auto-tuning learn loop.



Features

Compression Engine

  • Three modes — Lite · Full · Ultra
  • Adaptive Auto mode — picks safest aggressive level per response
  • Protected regions — code blocks, URLs, error quotes never touched
  • File compression — CLAUDE.md, AGENTS.md, project notes with .original backup

Safety & Verification

  • Accuracy Guard — semantic similarity check before emit
  • Strict mode — auto-fallback to original when meaning would be lost
  • Local-first — no telemetry, no API calls in the engine
  • Reproducible benchmarks — three-arm A/B eval harness with tiktoken o200k_base

Integrations

  • MCP middleware (brevix-shrink) — compresses tool/prompt/resource descriptions
  • 20 platform install targets — idempotent BREVIX:BEGIN/END markers
  • Statusline badge — [BREVIX] ⛏ X.Xk saved
  • Subagents — investigator · builder · reviewer with terse output

Insights

  • Real session-log token counts — stats --real --since 7d
  • Shareable savings — stats --share produces tweet-ready output
  • Per-mode breakdown — see exactly what each level saves
  • Free + MIT licensed

💰 Cost Management & Model Routing — new

  • Task-aware tiering — classify/parse → Haiku, code review/debug → Sonnet, architecture → Opus
  • Confidence-driven escalation — hedge / validity / length scorers retry on the next tier only when needed
  • Hard budget caps — BudgetExceededError blocks runaway spend (per-token or per-USD limits)
  • Routing dashboard — brevix stats --routing shows cost saved vs Opus-only baseline, escalation rate, per-model + per-task breakdown
  • Self-tuning — brevix route --learn-suggest|--learn-apply patches your config from observed escalation patterns
  • Claude Code subagents — brevix-haiku · brevix-sonnet · brevix-opus plus /brevix-route slash dispatcher


🚀 Install

One-liner — macOS / Linux / WSL

curl -fsSL https://raw.githubusercontent.com/Yash-Koladiya30/brevix/main/install.sh | bash -s -- --all

Windows — PowerShell

irm https://raw.githubusercontent.com/Yash-Koladiya30/brevix/main/install.ps1 | iex

skills CLI — one command, 9 tools at once

Auto-installs Brevix skills into Antigravity, Claude Code, Cline, Codex, Cursor, Gemini CLI, GitHub Copilot, Kiro CLI, and Qoder simultaneously.

npx skills add https://github.com/Yash-Koladiya30/brevix

Pick a specific skill:

npx skills add https://github.com/Yash-Koladiya30/brevix --skill brevix
npx skills add https://github.com/Yash-Koladiya30/brevix --skill brevix-commit
npx skills add https://github.com/Yash-Koladiya30/brevix --skill brevix-stats

Listing → skills.sh/Yash-Koladiya30/brevix

Manual install

pip install brevix                  # core
pip install 'brevix[guard]'         # + semantic Accuracy Guard
pip install 'brevix[tokens]'        # + accurate tiktoken counts
pip install 'brevix[all]'           # everything

Plug into your LLM coding tool

brevix install --list                # show all 20 targets
brevix install claude-code           # Claude Code plugin layout
brevix install cursor                # .cursor/rules/brevix.mdc
brevix install codex                 # AGENTS.md + .codex/hooks.json
brevix install gemini                # gemini-extension.json + GEMINI.md
brevix install all                   # write rule files for every tool

Idempotent — re-running updates the Brevix block, leaves your other content alone.
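The idempotent behavior amounts to an upsert between the markers. A minimal sketch, assuming HTML-comment marker syntax and a hypothetical `upsert_block` helper (the installer's actual marker format adapts per target file):

```python
# Sketch of idempotent marker handling: replace an existing
# BREVIX:BEGIN/END block in place, or append one on first install.
BEGIN, END = "<!-- BREVIX:BEGIN -->", "<!-- BREVIX:END -->"

def upsert_block(doc, rules):
    block = f"{BEGIN}\n{rules}\n{END}"
    if BEGIN in doc and END in doc:
        head = doc.split(BEGIN, 1)[0]
        tail = doc.split(END, 1)[1]
        return head + block + tail                        # update in place
    return doc.rstrip("\n") + "\n\n" + block + "\n"       # first install

doc = upsert_block("# My AGENTS.md\n", "brevix rules v1")
doc = upsert_block(doc, "brevix rules v2")   # re-run replaces, never duplicates
```

Everything outside the marker pair is left byte-for-byte untouched, which is why re-running the installer is safe on files you also edit by hand.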

Claude Code marketplace

/plugin marketplace add Yash-Koladiya30/brevix
/plugin install brevix@brevix

MCP middleware

Compress upstream MCP server descriptions:

npm install -g brevix-shrink

Wrap any MCP server in your Claude config:

{
  "mcpServers": {
    "fs-shrunk": {
      "command": "npx",
      "args": ["brevix-shrink", "npx", "-y",
               "@modelcontextprotocol/server-filesystem", "/tmp"]
    }
  }
}


⚡ Usage

Slash commands — Claude Code, Cursor, etc.

# Output compression
/brevix                  # toggle on (full mode)
/brevix lite             # gentle compression
/brevix ultra            # max compression
/brevix auto             # pick best mode per response
/brevix off              # disable
/brevix-commit           # terse Conventional Commit message
/brevix-check            # run Accuracy Guard on a snippet
/brevix-stats            # show compression savings

# Smart model routing (saves API cost)
/brevix-route <task>     # route to cheapest capable tier (Haiku/Sonnet/Opus)
/brevix-route-stats      # cost saved vs Opus-only baseline
/brevix-learn            # suggested rule changes from observed escalations
/brevix-learn apply      # apply suggestions to ~/.brevix/route.json

For Codex CLI (no slash commands), use $brevix lite|full|ultra|auto|off.

CLI

# Output compression
brevix compress "Your verbose text here" --mode full
brevix compress -                      # stdin
brevix compress . --mode auto -v       # adaptive picks best
brevix compress . --guard --strict --threshold 0.85

# File compression (CLAUDE.md, AGENTS.md, project notes)
brevix compress-file CLAUDE.md         # writes .original.md backup
brevix compress-file CLAUDE.md --dry-run

# Stats
brevix stats                           # estimated, in-process
brevix stats --real --since 7d         # parsed from Claude Code session logs
brevix stats --share                   # tweet-ready one-liner
brevix stats --reset

# Verification
brevix check "original" "compressed"
brevix count "how many tokens?"

# Install rules into a project
brevix install cursor
brevix install --list

# Smart model routing
brevix route --init                                # write default config to ~/.brevix/route.json
brevix route "classify ticket: app crashes"        # print suggested model
brevix route "..." --explain                       # task, model, est cost, reason
brevix route "..." --call                          # actually call the chosen model
brevix route "..." --call --confidence             # also escalate on low-confidence answers
brevix route --budget-show                         # current spend vs cap
brevix route --budget-tokens 1000000 --budget-cost 50.00 "..." --call
brevix stats --routing                             # cost saved, escalation rate, by model/task
brevix stats --routing --since 7d
brevix route --learn-suggest                       # recommend rule changes
brevix route --learn-apply                         # apply them

Subagents — Claude Code

agents/ ships six focused subagents that emit ~60% smaller tool results than vanilla agents:

| Agent | Purpose | Output format |
|---|---|---|
| brevix-investigator | Read-only code locator | path:line — symbol — note |
| brevix-builder | Surgical 1–2 file edits with verification | Diff + verify status |
| brevix-reviewer | Bug-focused diff review | path:line: 🔴 bug: …. fix. |
| brevix-haiku | Cheap tier — classify, parse, format, rename | Brief, exact answer |
| brevix-sonnet | Mid tier — review, refactor, debug, explain | Direct, file:line cited |
| brevix-opus | Heavy tier — architecture, multi-agent, hard bugs | Decision + reasoning chain |

Smart Model Routing — /brevix-route

Save ~80% on Claude API cost without manually picking a model per task.

Quick start (5 steps, ~1 minute)

1. Install Brevix and the Anthropic SDK

pip install brevix anthropic
export ANTHROPIC_API_KEY=sk-ant-...

2. Install the Claude Code plugin

/plugin marketplace add Yash-Koladiya30/brevix
/plugin install brevix@brevix

This adds the routing-tier subagents (brevix-haiku / brevix-sonnet / brevix-opus) and the /brevix-route, /brevix-route-stats, /brevix-learn slash commands.

3. Initialize the routing config

brevix route --init

Writes ~/.brevix/route.json with sensible defaults: classify→Haiku, code review→Sonnet, architecture→Opus, escalation chain Haiku→Sonnet→Opus, no budget cap.

4. Use it from Claude Code

/brevix-route classify this support ticket: "app crashes on login"
        ↓
[brevix] task=classify tier=haiku escalated=0
bug
/brevix-route architect a multi-region payment system with sub-100ms p99 latency
        ↓
[brevix] task=architecture tier=opus escalated=0
<full Opus response>

5. Check what you saved

/brevix-route-stats --since 7d
How it works under the hood
  1. /brevix-route <task> runs brevix route ... --explain under the hood.
  2. Reads the suggested model from CLI output.
  3. Spawns the matching subagent (brevix-haiku / brevix-sonnet / brevix-opus) via the Task tool.
  4. If the subagent returns escalate: <reason>, retries on the next tier (capped at 1 retry per tier).
  5. Records the call to ~/.brevix/routing_log.jsonl for stats.
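Since the log is plain JSONL, the stats are just an aggregation over it. A sketch of the cost-saved computation — the field names below are assumptions about the log schema, not its documented format:

```python
import io
import json

# Hypothetical routing_log.jsonl schema: one JSON object per call, with the
# actual cost and what the same call would have cost on Opus.
log = io.StringIO(
    '{"task": "classify", "model": "claude-haiku-4-5", '
    '"cost_usd": 0.0002, "opus_cost_usd": 0.05}\n'
    '{"task": "review", "model": "claude-sonnet-4-6", '
    '"cost_usd": 0.004, "opus_cost_usd": 0.05}\n'
)

spent = baseline = 0.0
for line in log:
    rec = json.loads(line)
    spent += rec["cost_usd"]          # what routing actually cost
    baseline += rec["opus_cost_usd"]  # what Opus-only would have cost

saved = baseline - spent
print(f"saved ${saved:.4f} ({saved / baseline:.0%} vs Opus-only)")
```

In real use you would read `~/.brevix/routing_log.jsonl` instead of the inline `StringIO` sample.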

Force a tier when you already know the right one:

/brevix-route --force-tier=sonnet refactor this 200-line module

Track your savings — /brevix-route-stats

/brevix-route-stats             # all-time
/brevix-route-stats --since 7d  # last week

Sample output:

Brevix Routing Stats
--------------------
Window:             7d
Calls:              412
Total cost:         $1.84
Opus-only baseline: $9.67
Saved:              $7.83 (80.9%)
Escalations:        18 (4.4% of calls)

By model:
  claude-haiku-4-5    289 (70.1%)  $0.08
  claude-sonnet-4-6   108 (26.2%)  $0.65
  claude-opus-4-7      15 ( 3.7%)  $1.11

Auto-tune from your usage — /brevix-learn

After a few days, Brevix recommends rule changes from observed escalation patterns:

/brevix-learn

  refactor: claude-sonnet-4-6 -> claude-opus-4-7
    samples=54  escalation_rate=68.5%
    reason: 37/54 escalated (69%); 32 of those landed on claude-opus-4-7

Apply with /brevix-learn apply to update ~/.brevix/route.json. User customizations and budget caps are preserved.
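The learn heuristic behind a suggestion like the one above could be as simple as this sketch — the thresholds, record fields, and promotion rule are all assumptions, not Brevix's actual algorithm:

```python
from collections import defaultdict

# Hypothetical heuristic: if a task type escalates more than half the time,
# suggest promoting its rule to the tier those escalated calls landed on.
calls = [
    {"task": "refactor", "escalated": True,  "final_model": "claude-opus-4-7"},
    {"task": "refactor", "escalated": True,  "final_model": "claude-opus-4-7"},
    {"task": "refactor", "escalated": False, "final_model": "claude-sonnet-4-6"},
    {"task": "classify", "escalated": False, "final_model": "claude-haiku-4-5"},
]

stats = defaultdict(lambda: {"n": 0, "esc": 0, "landed": defaultdict(int)})
for c in calls:
    s = stats[c["task"]]
    s["n"] += 1
    if c["escalated"]:
        s["esc"] += 1
        s["landed"][c["final_model"]] += 1

for task, s in stats.items():
    if s["n"] >= 3 and s["esc"] / s["n"] > 0.5:   # assumed sample/rate cutoffs
        target = max(s["landed"], key=s["landed"].get)
        print(f"{task}: suggest routing to {target} "
              f"(escalation_rate={s['esc'] / s['n']:.0%})")
```

The minimum-sample cutoff matters: without it, a single unlucky escalation would flip a rule.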

Hard budget cap

Edit ~/.brevix/route.json:

{
  "budget": { "tokens": 0, "cost_usd": 50.00 }
}

Or per-run from the CLI:

brevix route --budget-cost 5.00 --call "your prompt"

When the cap is hit, the next call raises BudgetExceededError instead of silently overspending.



🛡 How Accuracy Guard works

  1. Compress output via the rule engine.
  2. Score the original vs compressed text with a local sentence-transformer (no API cost).
  3. If similarity ≥ threshold (default 0.85) → emit compressed. Otherwise warn, or in --strict mode fall back to original.
  4. Without sentence-transformers installed → falls back to content-word containment (drops stopwords without penalty, fair to compression).

Result: compression you can trust on production code, specs, and contracts.
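Without the ML dependency, step 4's content-word containment check is roughly this shape — stopword list abbreviated and tokenization simplified for the sketch; the real guard's word list may differ:

```python
# Content-word containment fallback (no-ML path): what fraction of the
# original's non-stopword vocabulary survives in the compressed text.
# Stopword list abbreviated for illustration.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "in", "on", "that",
             "this", "your", "you", "it", "and", "so", "every", "even", "if"}

def containment(original, compressed):
    orig = {w for w in original.lower().split() if w not in STOPWORDS}
    comp = set(compressed.lower().split())
    return len(orig & comp) / len(orig) if orig else 1.0

score = containment(
    "wrap the object in useMemo so the reference stays stable",
    "wrap object in useMemo reference stays stable",
)
# Emit the compressed text only if score >= threshold (default 0.85);
# in --strict mode, fall back to the original otherwise.
```

Dropping stopwords from the original's vocabulary is what makes the metric fair to compression: removing "the" and "so" costs nothing, while losing "useMemo" or "stable" lowers the score.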



💡 Compression example

Before

The reason your React component is re-rendering on every parent update is that you are passing an inline object as a prop. In JavaScript, every render creates a new object reference, even if the contents are identical. To fix this, wrap the object in useMemo so the reference stays stable across renders.

After (full mode)

Inline object prop = new ref each render = re-render. Wrap in useMemo.

Tokens saved: ~75%  ·  Meaning preserved: ✅ similarity 0.91



📊 Benchmarks

Reproducible three-arm A/B harness in evals/. Compares no-system-prompt vs "be terse" control vs Brevix on 10 developer prompts.

| arm | n | median | mean | total | vs baseline | vs control |
|---|---|---|---|---|---|---|
| baseline | 10 | 221 | 247.3 | 2473 | | |
| control | 10 | 178 | 191.6 | 1916 | 22.5% | |
| brevix | 10 | 119 | 128.4 | 1284 | 48.1% | 33.0% |

Run yourself:

pip install 'brevix[all]' anthropic
export ANTHROPIC_API_KEY=...
python evals/llm_run.py --model claude-sonnet-4-6
python evals/measure.py

The vs control column is the honest savings — what Brevix adds beyond "just be brief."



🗺 Roadmap

  • Core compression engine (lite / full / ultra)
  • Adaptive (auto) mode
  • Accuracy Guard (semantic + content-word fallback)
  • Local stats counter
  • Multi-platform installer (20 targets)
  • File-level compression (brevix compress-file)
  • MCP middleware (brevix-shrink)
  • Statusline badge + Claude Code hooks
  • Subagents (investigator / builder / reviewer)
  • Three-arm eval harness
  • PowerShell installer + uninstaller
  • Model routing engine (Haiku / Sonnet / Opus tiering)
  • Confidence-driven escalation (hedge / validity / length scorers)
  • Token + cost budget enforcement (BudgetExceededError)
  • Routing stats dashboard (brevix stats --routing)
  • Learn loop (brevix route --learn-suggest|--learn-apply)
  • Claude Code routing tier subagents + slash commands
  • VSCode extension UI
  • Browser extension (claude.ai, chatgpt.com web)
  • Two-way compression (compress prompts before send)
  • Custom user-defined rule packs
  • Web dashboard (team tier)


📜 License & Contributing

MIT — free for personal and commercial use. Issues and PRs welcome — see docs/CONTRIBUTING.md.


Built with care by Yash Koladiya · If Brevix saves you tokens, ⭐ the repo
