Budget-Aware Agentic Routing — route LLM calls intelligently between cheap and powerful models with a hard budget cap.

baar-core (BAAR-Algo)

PyPI version · License: MIT · Python 3.10+ · Scientific Validation

Route LLM calls intelligently between cheap and powerful models — with a hard financial kill-switch that never breaks.


🚀 Why BAAR?

Every agent developer using GPT-4o has seen this:

  • Simple task → sent to GPT-4o anyway → 15× more expensive than necessary.
  • Budget set to $0.10 → agent burns $0.40 → surprise invoice.
  • No visibility into which agent step cost what, or why.

BAAR (Budget-Aware Agentic Routing) solves this at the protocol level.


🧠 How it Works

BAAR acts as a semantic gateway between your application and the LLM providers.

graph TD
    A[Your Task] --> B{Semantic Router}
    B -- Complexity < 0.65 --> C[gpt-4o-mini]
    B -- Complexity >= 0.65 --> D[Budget Kill-Switch]
    
    D -- "Affordable?" --> E[gpt-4o]
    D -- "Too Expensive" --> F[Force Downgrade to Mini]
    
    E --> G[Audit & Spend Tracking]
    F --> G
    C --> G
    
    G --> H[Final Response]

  1. Semantic Scoring: Uses a cheap model to score task complexity (0.0–1.0).
  2. BCD (Budget-Constrained Decoding): If the powerful model is too expensive for your remaining budget, BAAR automatically downgrades to a cheaper one to ensure the task completes without an overage.
  3. Local Rejection: If even the cheapest model exceeds the budget, the request is rejected locally with zero network cost.
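The three steps above can be sketched as a single decision function. This is a minimal illustration, not baar-core's implementation: the names `route` and `Decision` and the per-call cost constants are hypothetical, and the 0.65 threshold is taken from the diagram.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical flat per-call cost estimates in USD (assumption for this sketch;
# real costs depend on the provider's prices and on token counts).
SMALL_COST = 0.0005
BIG_COST = 0.0075


@dataclass
class Decision:
    model: Optional[str]  # None means the request was rejected locally
    reason: str


def route(complexity: float, remaining_budget: float,
          threshold: float = 0.65,
          small_cost: float = SMALL_COST,
          big_cost: float = BIG_COST) -> Decision:
    """Pick a model from a complexity score and the remaining budget."""
    # Local rejection: even the cheapest model does not fit the budget,
    # so no network call is made at all.
    if small_cost > remaining_budget:
        return Decision(None, "rejected locally: budget exhausted")
    # Low-complexity tasks always go to the small model.
    if complexity < threshold:
        return Decision("gpt-4o-mini", "low complexity")
    # High-complexity tasks get the big model only if it is affordable.
    if big_cost <= remaining_budget:
        return Decision("gpt-4o", "high complexity, affordable")
    # Otherwise downgrade so the task still completes within budget.
    return Decision("gpt-4o-mini", "high complexity, forced downgrade")


print(route(0.1, 0.10))
print(route(0.9, 0.10))
print(route(0.9, 0.001))
print(route(0.9, 0.0001))
```

Checking the rejection case first keeps the guarantee simple: no branch can return a model whose estimated cost exceeds the remaining budget.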

🔬 Benchmarking Results

BAAR-Algo is benchmarked on three standard datasets against an ALWAYS-BIG baseline, which sends every request to the big model.

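| Dataset | Strategy | Accuracy % | Cost (USD) | Savings vs BIG |
|---|---|---|---|---|
| MMLU (Knowledge) | ALWAYS-BIG | 100.0% | $0.0905 | - |
| MMLU (Knowledge) | BAAR-Algo | 70.0% | $0.0050 | 93.3% |
| GSM8K (Math) | ALWAYS-BIG | 100.0% | $0.0905 | - |
| GSM8K (Math) | BAAR-Algo | 80.0% | $0.0050 | 93.3% |
| HumanEval (Coding) | ALWAYS-BIG | 100.0% | $0.0105 | - |
| HumanEval (Coding) | BAAR-Algo | 100.0% | $0.0105 | 0.0%* |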

*On HumanEval, BAAR rates every task as complex enough for the big model, so accuracy matches ALWAYS-BIG and no cost is saved.

Run the Benchmark Yourself (Free)

baar-bench --dataset all --mock

📦 Installation

pip install baar-core

⚡ Quick Start

from baar import BAARRouter

# Set a hard $0.10 budget cap
router = BAARRouter(budget=0.10)

# This will be routed to gpt-4o-mini (Complexity ~0.1)
response = router.chat("What is the capital of France?")

# This will be routed to gpt-4o (Complexity ~0.9)
code = router.chat("Write a complex matrix multiplication in CUDA.")

🛡️ Resilience & Security

BAAR is designed for Financial Safety (Anti-Denial of Wallet).

| Attack Vector | BAAR Response | Proof |
|---|---|---|
| Unbounded Consumption | Zero-Call Rejection | Blocks the request locally with zero network calls. |
| Complexity Inflation | Semantic Scoring | Ignores gibberish/padding intended to drain budget. |
| Sensitivity Toggling | Tunable Threshold | Adjust complexity_threshold to match your quality needs. |

Verify resilience locally:

baar-stress
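Zero-call rejection depends on estimating a request's cost before anything is sent. The sketch below shows one way such a pre-flight check could look. It is not baar-core's code: the function names, the 4-characters-per-token heuristic, and the price table are all assumptions made for illustration, so substitute your provider's current rates.

```python
import math

# Illustrative prices in USD per 1,000 tokens as (input, output).
# These are assumptions for the sketch, not values read from baar-core.
PRICES = {
    "gpt-4o-mini": (0.00015, 0.0006),
    "gpt-4o": (0.0025, 0.01),
}


class BudgetExceeded(Exception):
    """Raised locally, before any network call is made."""


def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token for English text.
    return max(1, math.ceil(len(text) / 4))


def worst_case_cost(model: str, prompt: str, max_output_tokens: int) -> float:
    # Assume the model emits the full output allowance, which bounds the cost.
    in_price, out_price = PRICES[model]
    return (estimate_tokens(prompt) * in_price
            + max_output_tokens * out_price) / 1000


def preflight(model: str, prompt: str, max_output_tokens: int,
              remaining_budget: float) -> float:
    """Return the worst-case cost, or raise if it would exceed the budget."""
    cost = worst_case_cost(model, prompt, max_output_tokens)
    if cost > remaining_budget:
        raise BudgetExceeded(
            f"{model} needs up to ${cost:.5f}, only ${remaining_budget:.5f} left"
        )
    return cost


prompt = "x" * 4000  # roughly 1,000 tokens under the heuristic
print(preflight("gpt-4o-mini", prompt, 1000, remaining_budget=0.01))
try:
    preflight("gpt-4o", prompt, 1000, remaining_budget=0.01)
except BudgetExceeded as exc:
    print("rejected:", exc)
```

Bounding the cost with the maximum output length is deliberately pessimistic: it can reject a request that would have fit, but it cannot approve one that overruns the cap.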

🛠️ Configuration

from baar import BAARRouter

router = BAARRouter(
    budget=0.10,                    # Hard cap in USD
    small_model="gpt-4o-mini",      # Cheap model
    big_model="gpt-4o",             # Powerful model
    complexity_threshold=0.65,      # 0–1: scores at or above this → use big model
)
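To get a feel for what `complexity_threshold` does, the sketch below splits a handful of made-up complexity scores at three different thresholds. The helper `split_by_threshold` and the scores are hypothetical and do not call baar-core. The comparison follows the routing diagram, where scores at or above the threshold go to the big model.

```python
def split_by_threshold(scores, threshold):
    """Partition complexity scores into (small-model, big-model) groups."""
    small = [s for s in scores if s < threshold]
    big = [s for s in scores if s >= threshold]
    return small, big


# Made-up complexity scores for five tasks (assumption for illustration).
scores = [0.10, 0.40, 0.60, 0.70, 0.90]

for threshold in (0.50, 0.65, 0.80):
    small, big = split_by_threshold(scores, threshold)
    print(f"threshold={threshold:.2f}: {len(small)} small, {len(big)} big")
```

Raising the threshold moves more tasks to the small model, which lowers cost and may lower answer quality. Lowering it does the opposite.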

📄 License & Research

Distributed under the MIT License. See LICENSE for more information.

For architectural details and mapping to the OWASP LLM10 security framework, see RESEARCH.md.

