Budget-Aware Agentic Routing — route LLM calls intelligently between cheap and powerful models with a hard budget cap.

Baar-Core

Semantic routing + a hard financial kill-switch for LLM agents.

Never get surprised by another OpenAI or Anthropic bill.

pip install baar-core

baar-core is the PyPI package name, Baar-Core is the project, and baar is the Python import name used in the Quick start.

Why Baar-Core?

Production LLM agents have some dangerous habits:

Simple queries still get sent to expensive models.
One runaway loop turns your $0.10 budget into $8+ overnight.
The invoice lands before you know which step burned the budget.

Most routers optimize for average cost. Baar-Core adds a hard Zero-Call Financial Kill-Switch: it enforces a strict USD cap, scores each task's complexity, and routes between a cheap and a capable model. If the estimated cost of the next call would exceed what is left, the request is rejected locally before any provider request is made. $0 spent. Zero network calls.

What you get

Smart semantic routing — Easy work → cheap model; hard work → capable model.
Budget-constrained downgrade — If the big model would break the budget, fall back to the small one so the turn can still finish.
True zero-call kill-switch — Even the cheap model unaffordable? Fail fast — no completion call, no surprise line item.
Offline Safety — If your budget is $0, baar-core won't even attempt a DNS lookup for the LLM provider. It fails instantly in your local environment.

The result: no surprise invoices, and a stronger stance against runaway loops and adversarial “denial of wallet” patterns. Quality is kept where it matters (reasoning, coding, agents), because hard tasks still reach the capable tier when the budget allows.
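The kill-switch idea can be pictured as a pre-flight check that runs entirely in local memory. The following is a minimal sketch, not Baar-Core's actual implementation: `BudgetGuard`, `BudgetExceeded`, `worst_case_cost`, and the per-token prices are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised locally, before any provider request is attempted."""


@dataclass
class BudgetGuard:
    """Hypothetical in-memory ledger enforcing a hard USD cap."""
    budget_usd: float
    spent_usd: float = 0.0

    @property
    def remaining_usd(self) -> float:
        return self.budget_usd - self.spent_usd

    def check(self, estimated_cost_usd: float) -> None:
        """Refuse the call if its worst-case cost does not fit in what is left."""
        if estimated_cost_usd > self.remaining_usd:
            raise BudgetExceeded(
                f"need ${estimated_cost_usd:.6f}, only ${self.remaining_usd:.6f} left"
            )

    def record(self, actual_cost_usd: float) -> None:
        """Book the real cost after a completed call."""
        self.spent_usd += actual_cost_usd


def worst_case_cost(prompt_tokens: int, max_output_tokens: int,
                    usd_per_1k_in: float, usd_per_1k_out: float) -> float:
    """Upper bound on a call's cost, assuming the full output allowance is used."""
    return (prompt_tokens * usd_per_1k_in + max_output_tokens * usd_per_1k_out) / 1000


guard = BudgetGuard(budget_usd=0.00001)
# Illustrative prices only; real prices depend on the provider and model.
estimate = worst_case_cost(200, 500, usd_per_1k_in=0.00015, usd_per_1k_out=0.0006)
try:
    guard.check(estimate)  # pure arithmetic: no socket is opened, no DNS lookup
    blocked = False
except BudgetExceeded:
    blocked = True
```

Because the check is plain arithmetic on numbers already held in memory, a rejection costs nothing and needs no network access. Subclassing `RuntimeError` here mirrors the exception type caught in the Quick start; that is a design choice for the sketch, not a statement about the library's exception hierarchy.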

How it works

graph TD
    A[User task] --> B{Semantic complexity router}
    B -- Low complexity --> C[Cheap model]
    B -- High complexity --> D{Budget check}
    D -- Affordable --> E[Capable model]
    D -- Too expensive --> F[Downgrade to cheap]
    C --> G[Spend tracking]
    E --> G
    F --> G
    G --> H[Response]

Complexity scoring — Fast signal for cheap vs expensive route.
Budget-aware choice — Remaining budget checked before committing to the expensive path.
Local rejection — Exhausted or unsafe to call? Stop before the wire.
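The three steps above can be sketched as a single decision function. This is an illustrative reading of the flow, under the assumption that a complexity score in [0, 1] and per-call cost estimates are already available; `Route` and `choose_route` are hypothetical names, not part of the Baar-Core API.

```python
from enum import Enum


class Route(Enum):
    CHEAP = "cheap"
    CAPABLE = "capable"
    REJECT = "reject"


def choose_route(complexity: float, remaining_usd: float,
                 cheap_cost_usd: float, capable_cost_usd: float,
                 threshold: float = 0.80) -> Route:
    """Pick a route from a complexity score and estimated call costs."""
    wants_capable = complexity >= threshold
    if wants_capable and capable_cost_usd <= remaining_usd:
        return Route.CAPABLE   # hard task, and the budget allows it
    if cheap_cost_usd <= remaining_usd:
        return Route.CHEAP     # easy task, or a budget-driven downgrade
    return Route.REJECT        # nothing affordable: stop before the wire


# A hard task with only $0.01 left is downgraded instead of rejected.
decision = choose_route(complexity=0.9, remaining_usd=0.01,
                        cheap_cost_usd=0.001, capable_cost_usd=0.02)
```

Checking the capable path first and falling through to the cheap one keeps the downgrade and the rejection in one place, so no branch can reach a provider without passing a budget comparison.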

Benchmarks

Mock benchmark (deterministic, calibrated policy)

Command:

baar-bench \
  --dataset all \
  --limit 200 \
  --budget 10 \
  --mock \
  --value-policy simple \
  --auto-calibrate-alpha \
  --target-reject-rate 0.05 \
  --alpha-source percentile \
  --max-reject-rate 0.5 \
  --small-exploration-rate 0.1 \
  --seed 42

Dataset	Strategy	Accuracy	Total cost	vs always-big
MMLU	Always big	100.0%	$1.990500	—
MMLU	Baar-Core	91.5%	$1.625000	60.9% cheaper
GSM8K	Always big	100.0%	$1.990500	—
GSM8K	Baar-Core	90.0%	$1.478000	60.4% cheaper
HumanEval	Always big	100.0%	$1.630500	—
HumanEval	Baar-Core	92.7%	$1.369500	48.1% cheaper
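When comparing your own runs, one straightforward measure is the ratio of total costs. The helper below is an illustrative sketch; `percent_cheaper` is a hypothetical name, and baar-bench may compute its "vs always-big" column on a different basis, so this helper is not expected to reproduce the published percentages.

```python
def percent_cheaper(baseline_cost: float, routed_cost: float) -> float:
    """Relative saving of the routed run against the baseline, in percent."""
    if baseline_cost <= 0:
        raise ValueError("baseline cost must be positive")
    return (1.0 - routed_cost / baseline_cost) * 100.0


# Total-cost figures taken from the MMLU rows of the mock table above.
mmlu_saving = percent_cheaper(baseline_cost=1.990500, routed_cost=1.625000)
print(f"MMLU saving by total cost: {mmlu_saving:.1f}%")
```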

Live benchmark (small subset sanity check)

Command:

baar-bench \
  --dataset all \
  --limit 10 \
  --budget 2 \
  --value-policy none \
  --small-exploration-rate 0.0 \
  --seed 42

Dataset	Strategy	Accuracy	Total cost	vs always-big
MMLU	Always big	50.0%	$0.002337	—
MMLU	Baar-Core	60.0%	$0.000137	93.3% cheaper
GSM8K	Always big	60.0%	$0.027615	—
GSM8K	Baar-Core	20.0%	$0.002097	93.3% cheaper
HumanEval	Always big	0.0%	$0.032125	—
HumanEval	Baar-Core	0.0%	$0.002743	93.3% cheaper

Live results can vary significantly by provider/model quality, API reliability, and prompt behavior. Use live runs as environment-specific checks, and use mock runs for reproducible routing/cost trade-off iteration.

Quick start

from baar import BAARRouter

router = BAARRouter(budget=0.10)
print(router.chat("What is the capital of France?"))          # → usually cheap model
print(router.chat("Write an optimized CUDA matmul kernel."))  # → capable model if affordable

# Kill-switch: budget too low for any safe call → blocked before the API
tight = BAARRouter(budget=0.00001)
try:
    tight.chat("Any prompt")
except RuntimeError as e:
    print("Blocked safely:", e)  # zero completion calls, $0 spent

Works with any LiteLLM-supported provider (OpenAI, Anthropic, Groq, Together, Ollama, OpenRouter, …).
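In a multi-step agent, the same `RuntimeError` can serve as a clean stop signal for the whole loop. The sketch below assumes only what the Quick start shows (a `chat` method that returns the reply and raises `RuntimeError` on rejection); `run_until_blocked` and `FakeRouter` are hypothetical helpers, and the fake router stands in for `BAARRouter` so the example runs without network access.

```python
from typing import Iterable, List, Protocol


class ChatRouter(Protocol):
    def chat(self, prompt: str) -> str: ...


def run_until_blocked(router: ChatRouter, prompts: Iterable[str]) -> List[str]:
    """Send prompts in order; stop cleanly at the first budget rejection."""
    replies: List[str] = []
    for prompt in prompts:
        try:
            replies.append(router.chat(prompt))
        except RuntimeError as exc:  # rejection type shown in the Quick start
            print(f"Stopped after {len(replies)} step(s): {exc}")
            break
    return replies


class FakeRouter:
    """Stand-in: allows a fixed number of calls, then rejects."""

    def __init__(self, allowed_calls: int = 2) -> None:
        self.allowed_calls = allowed_calls

    def chat(self, prompt: str) -> str:
        if self.allowed_calls == 0:
            raise RuntimeError("budget exhausted")
        self.allowed_calls -= 1
        return f"echo: {prompt}"


replies = run_until_blocked(FakeRouter(), ["plan", "search", "summarize"])
```

Passing a real `BAARRouter(budget=0.10)` in place of `FakeRouter()` should behave the same way, provided `chat` returns the reply text as the Quick start suggests.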

Resilience

baar-stress

The baar-stress command runs adversarial-style checks (complexity games, tight budgets). Baar-Core is designed with OWASP LLM Top 10 style risks in mind, including unbounded consumption. Details: RESEARCH.md.

Telemetry Summary CLI

If you enable telemetry_jsonl_path on BAARRouter, summarize logs with:

baar-telemetry path/to/telemetry.jsonl

This prints reject rate, failover rate, total spend, and per-model spend distribution.
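For custom reporting, the same kind of summary can be computed directly from a JSONL file. The sketch below is an assumption-heavy illustration: the telemetry schema is not documented here, so the field names `model`, `cost_usd`, and `rejected` are hypothetical and would need to be adapted to the actual log format.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def summarize(lines: Iterable[str]) -> Dict[str, object]:
    """Aggregate JSONL events; the field names used here are assumptions."""
    total = 0
    rejected = 0
    spend_by_model: Dict[str, float] = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        total += 1
        if event.get("rejected", False):
            rejected += 1
            continue  # rejected calls cost nothing
        spend_by_model[event["model"]] += float(event.get("cost_usd", 0.0))
    return {
        "reject_rate": rejected / total if total else 0.0,
        "total_spend": sum(spend_by_model.values()),
        "spend_by_model": dict(spend_by_model),
    }


# Made-up sample events in the assumed schema.
sample = [
    '{"model": "small", "cost_usd": 0.001, "rejected": false}',
    '{"model": "big", "cost_usd": 0.02, "rejected": false}',
    '{"model": "small", "cost_usd": 0.001, "rejected": false}',
    '{"rejected": true}',
]
summary = summarize(sample)
```

To run it on a real file, pass an open file handle, for example `summarize(open("path/to/telemetry.jsonl"))`, after mapping the field names to the real ones.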

Configuration

The default complexity_threshold=0.80 routes more traffic to the cheap model than the earlier 0.65 did. The effective threshold also rises with budget utilization, so the big model becomes harder to justify as spend accumulates. Raise or lower complexity_threshold if your workload skews very easy or very hard.

router = BAARRouter(
    budget=0.10,
    small_model="gpt-4o-mini",
    big_model="gpt-4o",
    complexity_threshold=0.80,
)
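To build intuition for a threshold that rises with budget utilization, here is one possible shape. The exact rule Baar-Core applies is not stated here, so the linear form, the `max_bump` parameter, and the function name `effective_threshold` are all assumptions made for illustration.

```python
def effective_threshold(base: float, spent_usd: float, budget_usd: float,
                        max_bump: float = 0.15) -> float:
    """Raise the complexity bar linearly as the budget is used up (assumed form)."""
    if budget_usd <= 0:
        return 1.0  # no budget at all: the big model is never justified
    utilization = min(max(spent_usd / budget_usd, 0.0), 1.0)
    return min(1.0, base + max_bump * utilization)


# With a $0.10 budget, the bar moves from 0.80 toward 0.95 as spend grows.
fresh = effective_threshold(0.80, spent_usd=0.00, budget_usd=0.10)
halfway = effective_threshold(0.80, spent_usd=0.05, budget_usd=0.10)
exhausted = effective_threshold(0.80, spent_usd=0.10, budget_usd=0.10)
```

Under this assumed rule, a task scoring 0.85 would reach the big model on a fresh budget but be routed to the small model once half the budget is spent.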

License & research

MIT — LICENSE.

Architecture, validation notes, and security mapping: RESEARCH.md.

Release history

0.2.5: Mar 30, 2026
0.2.4: Mar 29, 2026
0.2.2 (this version): Mar 29, 2026
0.2.1: Mar 29, 2026
0.1.5: Mar 28, 2026
0.1.4: Mar 28, 2026
0.1.3: Mar 28, 2026
0.1.2: Mar 28, 2026
0.1.1: Mar 28, 2026
0.1.0: Mar 28, 2026

Download files

Source Distribution

baar_core-0.2.2.tar.gz (32.3 kB)

Uploaded Mar 29, 2026 Source

Built Distribution

baar_core-0.2.2-py3-none-any.whl (20.3 kB)

Uploaded Mar 29, 2026 Python 3

File details

Details for the file baar_core-0.2.2.tar.gz.

File metadata

Download URL: baar_core-0.2.2.tar.gz
Upload date: Mar 29, 2026
Size: 32.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for baar_core-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`414e4ee86a554172af54b20c6fdc40376d2dcfa505352381eec94c9462e61b5e`
MD5	`c4ccfce0f9493302ea0960c650013482`
BLAKE2b-256	`373f5c8751e9ff6e9aaf71735c82117e1974ea397ea443506c5280561b8c72e9`

File details

Details for the file baar_core-0.2.2-py3-none-any.whl.

File metadata

Download URL: baar_core-0.2.2-py3-none-any.whl
Upload date: Mar 29, 2026
Size: 20.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for baar_core-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7efbaea51db7c5cac43af59d24a20f3fe9b4516080fddc6f384455c0b90c6c58`
MD5	`23d1414a1c6c8dadea24ba692f9920c5`
BLAKE2b-256	`5d33e7fb9e2b3a48c286e8835a9a0ee4efed5095cf7ce23520fec1b1266b02cb`
