Budget-Aware Agentic Routing — route LLM calls intelligently between cheap and powerful models with a hard budget cap.
baar-core (BAAR-Algo)
Route LLM calls intelligently between cheap and powerful models, with a hard financial kill-switch that never lets spending exceed your budget.
🚀 Why BAAR?
If you build agents on GPT-4o, you have probably seen this:
- Simple task → sent to GPT-4o anyway → over 15× the cost of gpt-4o-mini.
- Budget set to $0.10 → agent burns $0.40 → surprise invoice.
- No visibility into which agent step cost what, or why.
BAAR (Budget-Aware Agentic Routing) addresses all three in the routing layer, before a request reaches the provider.
🧠 How it Works
BAAR acts as a semantic gateway between your application and the LLM providers.
```mermaid
graph TD
    A[Your Task] --> B{Semantic Router}
    B -- Complexity < 0.65 --> C[gpt-4o-mini]
    B -- Complexity >= 0.65 --> D[Budget Kill-Switch]
    D -- "Affordable?" --> E[gpt-4o]
    D -- "Too Expensive" --> F[Force Downgrade to Mini]
    E --> G[Audit & Spend Tracking]
    F --> G
    C --> G
    G --> H[Final Response]
```
- Semantic Scoring: Uses a cheap model to score task complexity (0.0–1.0).
- BCD (Budget-Constrained Decoding): If the powerful model is too expensive for your remaining budget, BAAR automatically downgrades to a cheaper one to ensure the task completes without an overage.
- Local Rejection: If even the cheapest model exceeds the budget, the request is rejected locally with zero network cost.
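The decision flow above (score, threshold, downgrade, reject) can be sketched in a few lines. This is not baar-core's implementation: the price table, the token counts passed in, and the `estimate_cost`/`route` helpers are illustrative assumptions, and the complexity score is taken as an input rather than produced by a scoring model.

```python
from typing import Optional

# Illustrative USD prices per 1M tokens (assumed; check your provider's current pricing).
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


def estimate_cost(model: str, input_tokens: int, max_output_tokens: int) -> float:
    """Worst-case USD cost of one call: the full prompt plus the maximum allowed output."""
    price = PRICES[model]
    return (input_tokens * price["input"] + max_output_tokens * price["output"]) / 1_000_000


def route(
    score: float,
    remaining_budget: float,
    input_tokens: int,
    max_output_tokens: int,
    small: str = "gpt-4o-mini",
    big: str = "gpt-4o",
    threshold: float = 0.65,
) -> Optional[str]:
    """Return the model to call, or None if the request must be rejected locally."""
    # Semantic routing: complex tasks try the big model first, then fall back to small.
    candidates = [big, small] if score >= threshold else [small]
    for model in candidates:
        # Budget check: take the first candidate whose worst-case cost still fits.
        if estimate_cost(model, input_tokens, max_output_tokens) <= remaining_budget:
            return model
    # Local rejection: not even the small model fits, so no provider call is made.
    return None


print(route(0.1, 0.10, 1000, 500))    # → gpt-4o-mini (simple task)
print(route(0.9, 0.10, 1000, 500))    # → gpt-4o (complex task, affordable)
print(route(0.9, 0.005, 1000, 500))   # → gpt-4o-mini (complex task, downgraded)
print(route(0.9, 0.0001, 1000, 500))  # → None (rejected locally)
```

Pricing the maximum output length rather than the expected one is what lets a check like this guarantee the cap instead of approximating it, provided the provider enforces the output-token limit.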
🔬 Benchmarking Results
To measure how much accuracy BAAR-Algo trades for its savings, it is compared against an ALWAYS-BIG baseline (every request sent to the big model) on three standard datasets.
| Dataset | Strategy | Accuracy (%) | Cost (USD) | Savings vs ALWAYS-BIG |
|---|---|---|---|---|
| MMLU (Knowledge) | ALWAYS-BIG | 100.0% | $0.0905 | - |
| MMLU (Knowledge) | BAAR-Algo | 70.0% | $0.0050 | 93.3% |
| GSM8K (Math) | ALWAYS-BIG | 100.0% | $0.0905 | - |
| GSM8K (Math) | BAAR-Algo | 80.0% | $0.0050 | 93.3% |
| HumanEval (Coding) | ALWAYS-BIG | 100.0% | $0.0105 | - |
| HumanEval (Coding) | BAAR-Algo | 100.0% | $0.0105 | 0.0%* |
*On HumanEval, BAAR scored every task as complex and routed all of them to the big model, so it matched ALWAYS-BIG on both accuracy and cost, with no quality loss on code.
Run the Benchmark Yourself (Free)
```shell
baar-bench --dataset all --mock
```
📦 Installation
```shell
pip install baar-core
```
⚡ Quick Start
```python
from baar import BAARRouter

# Set a hard $0.10 budget cap
router = BAARRouter(budget=0.10)

# Expected to be routed to gpt-4o-mini (complexity ~0.1)
response = router.chat("What is the capital of France?")

# Expected to be routed to gpt-4o (complexity ~0.9)
code = router.chat("Write a complex matrix multiplication in CUDA.")
```
🛡️ Resilience & Security
BAAR is designed for financial safety, i.e. protection against Denial-of-Wallet attacks.
| Attack Vector | BAAR Response | How It Works |
|---|---|---|
| Unbounded Consumption | Zero-call rejection | Blocks over-budget requests locally, before any network call is made. |
| Complexity Inflation | Semantic scoring | Ignores gibberish or padding intended to drain the budget. |
| Sensitivity Toggling | Tunable threshold | Adjust `complexity_threshold` to match your quality needs. |
Verify resilience locally:
```shell
baar-stress
```
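Zero-call rejection and per-step spend tracking both come down to checking the budget before a request is sent and recording what it actually cost afterwards. A minimal sketch, assuming every call can be bounded by a worst-case cost up front; `SpendLedger`, `BudgetExceeded`, and `fake_call` are hypothetical names, not part of baar-core, and the stub stands in for a real provider request.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class BudgetExceeded(Exception):
    """Raised before any network call when a request cannot fit in the remaining budget."""


@dataclass
class SpendLedger:
    budget: float  # hard cap in USD
    spent: float = 0.0
    log: List[Tuple[str, float]] = field(default_factory=list)

    @property
    def remaining(self) -> float:
        return self.budget - self.spent

    def run(self, label: str, worst_case_cost: float,
            call: Callable[[], Tuple[str, float]]) -> str:
        # Zero-call rejection: `call` is never invoked, so nothing is sent or billed.
        if worst_case_cost > self.remaining:
            raise BudgetExceeded(
                f"{label}: needs up to ${worst_case_cost:.4f}, only ${self.remaining:.4f} left"
            )
        text, actual_cost = call()
        # The cap holds as long as actual_cost never exceeds worst_case_cost.
        self.spent += actual_cost
        self.log.append((label, actual_cost))  # per-step audit trail
        return text


# Hypothetical stub for a provider request; returns (text, actual cost in USD).
calls = []


def fake_call() -> Tuple[str, float]:
    calls.append(1)
    return "ok", 0.03


ledger = SpendLedger(budget=0.10)
for step in range(5):
    try:
        ledger.run(f"step-{step}", worst_case_cost=0.05, call=fake_call)
    except BudgetExceeded as exc:
        print("rejected:", exc)

print(ledger.log)  # only the two steps that fit in the budget were executed
```

Reserving against the worst case rather than the average is deliberately conservative: a step may be rejected even though its real cost would have fit, but spending can never overshoot the cap.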
🛠️ Configuration
```python
from baar import BAARRouter

router = BAARRouter(
    budget=0.10,                 # Hard cap in USD
    small_model="gpt-4o-mini",   # Cheap model
    big_model="gpt-4o",          # Powerful model
    complexity_threshold=0.65,   # 0–1: scores at or above this go to the big model
)
```
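If these settings come from a config file or environment variables, validating them before building the router catches mistakes early. `RouterSettings` below is a hypothetical helper, not part of baar-core; its defaults copy the values in the example above and may not match the library's own defaults.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RouterSettings:
    """Hypothetical helper mirroring the BAARRouter keyword arguments shown above."""
    budget: float = 0.10                # hard cap in USD
    small_model: str = "gpt-4o-mini"
    big_model: str = "gpt-4o"
    complexity_threshold: float = 0.65  # 0–1

    def __post_init__(self) -> None:
        # Fail fast on values that would make the budget cap or routing meaningless.
        if self.budget <= 0:
            raise ValueError("budget must be a positive amount in USD")
        if not 0.0 <= self.complexity_threshold <= 1.0:
            raise ValueError("complexity_threshold must be between 0 and 1")


settings = RouterSettings(budget=0.25, complexity_threshold=0.8)
kwargs = asdict(settings)
print(kwargs)
# → {'budget': 0.25, 'small_model': 'gpt-4o-mini', 'big_model': 'gpt-4o', 'complexity_threshold': 0.8}
# The dict can then be unpacked into the router: BAARRouter(**kwargs)
```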
📄 License & Research
Distributed under the MIT License. See LICENSE for more information.
For architectural details and the mapping to the OWASP Top 10 for LLM Applications (LLM10: Unbounded Consumption), see RESEARCH.md.
File details
| File | Type | Size | Uploaded via | Trusted Publishing |
|---|---|---|---|---|
| baar_core-0.1.3.tar.gz | Source | 21.6 kB | twine/6.2.0 CPython/3.10.12 | No |
| baar_core-0.1.3-py3-none-any.whl | Python 3 | 14.0 kB | twine/6.2.0 CPython/3.10.12 | No |

File hashes
| File | Algorithm | Hash digest |
|---|---|---|
| baar_core-0.1.3.tar.gz | SHA256 | 60c5afc73196548120aae3d541cfc2d94668b9643e517b5f11a78a5bf85a8b20 |
| baar_core-0.1.3.tar.gz | MD5 | bd15046b1419b602e275329777c31921 |
| baar_core-0.1.3.tar.gz | BLAKE2b-256 | e494137cc9b22ab9400257356d6aaa105b6dfc3bf03dce5cc4eb2e9175f6db9b |
| baar_core-0.1.3-py3-none-any.whl | SHA256 | 0224de05b20e8cc5da44af1927239637be9aa4c9af97d33c02af10334097b7fa |
| baar_core-0.1.3-py3-none-any.whl | MD5 | 512d64fe07fffe1754aee50a785418dd |
| baar_core-0.1.3-py3-none-any.whl | BLAKE2b-256 | aa5ca59285d3b63393694a3b72a4388d29f306588d3611fd5cc026eccc544c92 |