
Budget-Aware Agentic Routing — route LLM calls intelligently between cheap and powerful models with a hard budget cap.


baar-core (BAAR-Algo)

PyPI version | License: MIT | Python 3.10+ | Scientific Validation

Route LLM calls intelligently between cheap and powerful models — with a hard financial kill-switch that never breaks.


🚀 Why BAAR?

Every agent developer using GPT-4o has seen this:

  • Simple task → sent to GPT-4o anyway → 15× more expensive than necessary.
  • Budget set to $0.10 → agent burns $0.40 → surprise invoice.
  • No visibility into which agent step cost what, or why.

BAAR (Budget-Aware Agentic Routing) solves this at the protocol level.


🧠 How it Works

BAAR acts as a semantic gateway between your application and the LLM providers.

graph TD
    A[Your Task] --> B{Semantic Router}
    B -- Complexity < 0.65 --> C[gpt-4o-mini]
    B -- Complexity >= 0.65 --> D[Budget Kill-Switch]
    
    D -- "Affordable?" --> E[gpt-4o]
    D -- "Too Expensive" --> F[Force Downgrade to Mini]
    
    E --> G[Audit & Spend Tracking]
    F --> G
    C --> G
    
    G --> H[Final Response]

  1. Semantic Scoring: Uses a cheap model to score task complexity (0.0–1.0).
  2. BCD (Budget-Constrained Decoding): If the powerful model is too expensive for your remaining budget, BAAR automatically downgrades to a cheaper one to ensure the task completes without an overage.
  3. Local Rejection: If even the cheapest model exceeds the budget, the request is rejected locally with zero network cost.
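
The three steps above can be sketched as a single routing decision. This is a minimal illustration, not BAAR's actual implementation: the `route` function, the `Decision` type, and the per-call cost estimates are hypothetical names and numbers chosen for the example. Only the 0.65 threshold and the model names come from this README.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    model: Optional[str]  # None means the request is rejected locally
    reason: str


def route(
    complexity: float,
    remaining_budget: float,
    small_cost: float,
    big_cost: float,
    threshold: float = 0.65,
) -> Decision:
    """Pick a model for one call (hypothetical sketch, not the BAAR API).

    complexity       -- score in [0.0, 1.0] from the semantic scorer
    remaining_budget -- USD left under the hard cap
    small_cost       -- estimated USD cost of the call on the small model
    big_cost         -- estimated USD cost of the call on the big model
    """
    # Step 3: if even the cheapest model is unaffordable, reject locally.
    if small_cost > remaining_budget:
        return Decision(None, "rejected: budget exhausted")
    # Step 1: simple tasks go straight to the small model.
    if complexity < threshold:
        return Decision("gpt-4o-mini", "low complexity")
    # Step 2: complex tasks use the big model only if it is affordable.
    if big_cost <= remaining_budget:
        return Decision("gpt-4o", "high complexity, affordable")
    return Decision("gpt-4o-mini", "high complexity, forced downgrade")
```

Checking the cheapest model first means an exhausted budget is caught before any other branch, which is the property the local-rejection step relies on.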

🔬 Benchmarking Results

BAAR-Algo is benchmarked against an always-use-the-big-model baseline (ALWAYS-BIG) on three standard datasets: MMLU, GSM8K, and HumanEval.

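| Dataset | Strategy | Accuracy | Cost (USD) | Savings vs BIG |
|---|---|---|---|---|
| MMLU (Knowledge) | ALWAYS-BIG | 100.0% | $0.0905 | - |
| MMLU (Knowledge) | BAAR-Algo | 70.0% | $0.0050 | 93.3% |
| GSM8K (Math) | ALWAYS-BIG | 100.0% | $0.0905 | - |
| GSM8K (Math) | BAAR-Algo | 80.0% | $0.0050 | 93.3% |
| HumanEval (Coding) | ALWAYS-BIG | 100.0% | $0.0105 | - |
| HumanEval (Coding) | BAAR-Algo | 100.0% | $0.0105 | 0.0%* |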

*On HumanEval, BAAR rates every task as complex and routes all of them to the big model, so accuracy matches ALWAYS-BIG and there are no savings.
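
The "Savings vs BIG" column is a relative cost reduction. The sketch below shows one way to compute such a figure, assuming savings are defined as 1 − (routed cost / baseline cost). The helper name `savings_pct` is hypothetical and is not part of the `baar-bench` tooling. For instance, the 15× price gap mentioned under "Why BAAR?" corresponds to a 93.3% saving when a call moves from the big model to the small one.

```python
def savings_pct(baseline_cost: float, routed_cost: float) -> float:
    """Percentage saved relative to the baseline cost (assumed definition)."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (1.0 - routed_cost / baseline_cost) * 100.0


# A 15x price gap: the routed call costs 1/15 of the baseline.
print(round(savings_pct(15.0, 1.0), 1))  # 93.3

# Identical costs, as in the HumanEval rows: no savings.
print(round(savings_pct(0.0105, 0.0105), 1))  # 0.0
```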

Run the Benchmark Yourself (Free)

baar-bench --dataset all --mock

📦 Installation

pip install baar-core

⚡ Quick Start

from baar import BAARRouter

# Set a hard $0.10 budget cap
router = BAARRouter(budget=0.10)

# This will be routed to gpt-4o-mini (Complexity ~0.1)
response = router.chat("What is the capital of France?")

# This will be routed to gpt-4o (Complexity ~0.9)
code = router.chat("Write a complex matrix multiplication in CUDA.")

🛡️ Resilience & Security

BAAR is designed for Financial Safety (Anti-Denial of Wallet).

| Attack Vector | BAAR Response | Proof |
|---|---|---|
| Unbounded Consumption | Zero-Call Rejection | Blocks request locally with zero network calls. |
| Complexity Inflation | Semantic Scoring | Ignores gibberish/padding intended to drain budget. |
| Sensitivity Toggling | Tunable Threshold | Adjust complexity_threshold to match your quality needs. |
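
To make the Zero-Call Rejection idea concrete, here is a minimal sketch of a pre-flight budget check that refuses a request before the provider is ever contacted. `BudgetGuard`, `BudgetExceeded`, and `fake_provider` are hypothetical names for illustration and are not part of the baar-core API. The sketch also charges the estimated cost rather than the actual one, which is a simplification.

```python
class BudgetExceeded(Exception):
    """Raised before any network call when the estimate does not fit."""


class BudgetGuard:
    """Tracks spend against a hard cap (hypothetical, not the BAAR API)."""

    def __init__(self, budget: float) -> None:
        self.budget = budget
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return self.budget - self.spent

    def call(self, estimated_cost: float, provider_call):
        # Pre-flight check: refuse locally if the estimate does not fit.
        if estimated_cost > self.remaining:
            raise BudgetExceeded(
                f"need ${estimated_cost:.4f}, have ${self.remaining:.4f}"
            )
        result = provider_call()
        # Simplification: charge the estimate, not the actual billed cost.
        self.spent += estimated_cost
        return result


calls = []  # one entry per simulated network call


def fake_provider() -> str:
    calls.append(1)
    return "ok"


guard = BudgetGuard(budget=0.01)
guard.call(0.008, fake_provider)  # fits: one provider call is made
try:
    guard.call(0.008, fake_provider)  # does not fit: rejected locally
except BudgetExceeded:
    pass
print(len(calls))  # 1
```

The second request never reaches `fake_provider`, so the rejection itself costs nothing beyond the local check.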

Verify resilience locally:

baar-stress

🛠️ Configuration

router = BAARRouter(
    budget=0.10,                    # Hard cap in USD
    small_model="gpt-4o-mini",      # Cheap model
    big_model="gpt-4o",             # Powerful model
    complexity_threshold=0.65,      # 0–1: scores at or above this → use big model
)
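
To get a feel for how `complexity_threshold` trades cost against quality, the sketch below counts what fraction of tasks would reach the big model at a few thresholds. It assumes the "at or above threshold" rule shown in the routing diagram. The helper `big_model_share` and the sample scores are made up for illustration and do not come from baar-core.

```python
def big_model_share(scores: list[float], threshold: float) -> float:
    """Fraction of tasks whose score is at or above the threshold."""
    if not scores:
        return 0.0
    return sum(score >= threshold for score in scores) / len(scores)


# Made-up complexity scores, for illustration only.
sample_scores = [0.10, 0.30, 0.55, 0.70, 0.90]

for threshold in (0.50, 0.65, 0.80):
    share = big_model_share(sample_scores, threshold)
    print(f"threshold={threshold:.2f} -> {share:.0%} routed to the big model")
```

Raising the threshold sends fewer tasks to the big model and lowers spend, at the risk of under-serving borderline tasks.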

📄 License & Research

Distributed under the MIT License. See LICENSE for more information.

For architectural details and mapping to the OWASP LLM10 security framework, see RESEARCH.md.
