Ship agents that fix themselves. Outcome-aware routing for production AI agents.
Kalibr — Ship agents that fix themselves
Your agent captures its own execution telemetry, reports whether it achieved the goal, and Kalibr routes future calls to what actually works. When models degrade, Kalibr reroutes before users notice.
Developers — hand this to your coding agent
pip install kalibr
kalibr auth
kalibr init
kalibr verify
Your agent runs kalibr auth, your terminal shows a code, you enter it at dashboard.kalibr.systems/link, and the agent handles the rest — credentials, codebase scanning, framework detection, and instrumentation. One approval, zero config.
Already have credentials? Skip signup:
pip install kalibr
export KALIBR_API_KEY=sk_...
export KALIBR_TENANT_ID=tenant_...
kalibr init
kalibr verify
If you are an agent reading this
Full autonomous onboarding protocol: kalibr.systems/llms.txt · kalibr.systems/setup.txt
Quick Start
from kalibr import Router

router = Router(
    goal="extract_company",
    paths=["gpt-4o", "claude-sonnet-4-20250514"],
    success_when=lambda output: "company" in output.lower()
)
response = router.completion(
    messages=[{"role": "user", "content": "Extract the company: Hi from Stripe."}]
)
Installation
pip install kalibr
# For accurate token counting
pip install kalibr[tokens]
Setup
Get your credentials from dashboard.kalibr.systems/settings, then:
export KALIBR_API_KEY=your-api-key
export KALIBR_TENANT_ID=your-tenant-id
export OPENAI_API_KEY=sk-... # or ANTHROPIC_API_KEY for Claude models
Or use autonomous provisioning:
export KALIBR_PROVISIONING_TOKEN=your-token # create at dashboard.kalibr.systems/settings
kalibr init # scans your project and provisions credentials automatically
Or link via device code (recommended):
kalibr auth
# Terminal shows a code. Enter it at dashboard.kalibr.systems/link.
# Agent receives credentials automatically. No email required.
kalibr init
CLI
kalibr auth # link agent to your Kalibr account (device code — recommended)
kalibr signup EMAIL # DEPRECATED: use kalibr auth instead
kalibr init # scan codebase, wrap bare LLM calls with Router, provision credentials
kalibr verify # check credentials and Router connectivity
kalibr prompt # copy Claude Code / Cursor integration prompt to clipboard
How It Works
Every call your agent makes generates outcome data. Kalibr uses that data to route future calls to the paths that have been succeeding.
- You define paths: models, tools, and parameters that can handle your task
- Kalibr picks: uses Thompson Sampling to route to what's been working while exploring alternatives
- You report outcomes: tell Kalibr if it worked (or use success_when to automate it)
- Kalibr adapts: routes more traffic to what works, routes around what doesn't
No dashboards to watch. No alerts to triage. Your agent improves itself.
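To make the pick, report, adapt loop concrete, here is a minimal sketch of Beta-Bernoulli Thompson Sampling over a fixed set of paths. It is an illustration of the general technique only, not Kalibr's implementation; `PathStats`, `ThompsonRouter`, and `simulate` are hypothetical names, and the success rates are made up for the simulation.

```python
import random
from dataclasses import dataclass


@dataclass
class PathStats:
    """Success/failure counts for one path (hypothetical structure)."""
    successes: int = 0
    failures: int = 0


class ThompsonRouter:
    """Toy Beta-Bernoulli Thompson Sampling over a fixed set of paths."""

    def __init__(self, paths, seed=None):
        self.stats = {p: PathStats() for p in paths}
        self.rng = random.Random(seed)

    def pick(self):
        # Draw one sample per path from Beta(successes + 1, failures + 1)
        # and route to the path with the highest draw.
        draws = {
            p: self.rng.betavariate(s.successes + 1, s.failures + 1)
            for p, s in self.stats.items()
        }
        return max(draws, key=draws.get)

    def report(self, path, success):
        # Update the posterior counts for the path that handled the call.
        if success:
            self.stats[path].successes += 1
        else:
            self.stats[path].failures += 1


def simulate(true_rates, rounds=2000, seed=0):
    """Route `rounds` calls against hidden per-path success rates."""
    router = ThompsonRouter(list(true_rates), seed=seed)
    world = random.Random(seed + 1)
    picks = {p: 0 for p in true_rates}
    for _ in range(rounds):
        path = router.pick()
        picks[path] += 1
        router.report(path, world.random() < true_rates[path])
    return picks


if __name__ == "__main__":
    # Made-up success rates; the stronger path should receive most traffic.
    print(simulate({"model-a": 0.9, "model-b": 0.5}))
```

Paths with few observations have wide posteriors, so they still win an occasional draw. That is the "exploring alternatives" part: a path that recovers after a bad stretch can earn traffic back.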
Paths
A path is any combination of model + tools + params. Kalibr tracks each combination separately and learns which one works best for each goal.
# Just models
paths = ["gpt-4o", "claude-sonnet-4-20250514", "gpt-4o-mini"]
# With tools
paths = [
    {"model": "gpt-4o", "tools": ["web_search"]},
    {"model": "claude-sonnet-4-20250514", "tools": ["web_search", "browser"]},
]
# With params
paths = [
    {"model": "gpt-4o", "params": {"temperature": 0.7}},
    {"model": "gpt-4o", "params": {"temperature": 0.2}},
]
# Mix and match
paths = [
    {"model": "gpt-4o", "tools": ["web_search"], "params": {"temperature": 0.3}},
    {"model": "claude-sonnet-4-20250514", "params": {"temperature": 0.7}},
    "gpt-4o-mini",
]
This is what makes Kalibr different from model routers. OpenRouter picks a model. Kalibr picks the full execution path — and knows whether it actually worked.
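Tracking each combination separately implies some stable identity per path. The sketch below shows one way such an identity could be derived from the path shapes used above. `path_key` is a hypothetical helper, not part of the Kalibr API, and treating tool order as irrelevant is an assumption.

```python
import json


def path_key(path):
    """Return a canonical string key for a path spec (hypothetical helper).

    Accepts a bare model name or a dict with "model", optional "tools",
    and optional "params", matching the shapes shown in this section.
    """
    if isinstance(path, str):
        path = {"model": path}
    normalized = {
        "model": path["model"],
        # Assumption: the order in which tools are listed does not matter.
        "tools": sorted(path.get("tools", [])),
        "params": path.get("params", {}),
    }
    # sort_keys makes the key independent of dict insertion order.
    return json.dumps(normalized, sort_keys=True)


if __name__ == "__main__":
    hot = path_key({"model": "gpt-4o", "params": {"temperature": 0.7}})
    cold = path_key({"model": "gpt-4o", "params": {"temperature": 0.2}})
    print(hot != cold)
```

Under this scheme the same model at two temperatures yields two distinct keys, so each would accumulate its own outcome history.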
Outcome Reporting
Automatic (recommended)
router = Router(
    goal="summarize",
    paths=["gpt-4o", "claude-sonnet-4-20250514"],
    success_when=lambda output: len(output) > 100
)
response = router.completion(messages=[...])
# Outcome reported automatically based on success_when
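A length check is a weak success signal. As a sketch of a stricter predicate, the function below checks that the output parses as a JSON object with a non-empty "company" string. It assumes success_when receives the output text as a string, as the lambdas in this README suggest; `has_company_field` is a hypothetical name.

```python
import json


def has_company_field(output: str) -> bool:
    """Return True if output is a JSON object with a non-empty "company" string."""
    try:
        data = json.loads(output)
    except (json.JSONDecodeError, TypeError):
        # Not valid JSON, or not a string at all.
        return False
    if not isinstance(data, dict):
        return False
    company = data.get("company")
    return isinstance(company, str) and bool(company.strip())


if __name__ == "__main__":
    print(has_company_field('{"company": "Stripe"}'))
    print(has_company_field("Stripe"))
```

A named function like this would be passed in place of the lambda, for example `success_when=has_company_field`, and can be unit-tested separately from the router.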
Manual
router = Router(goal="book_meeting", paths=["gpt-4o", "claude-sonnet-4-20250514"])
response = router.completion(messages=[...])
meeting_created = check_calendar_api()
router.report(success=meeting_created)
With failure categories
Tell Kalibr why something failed so routing decisions are made against root cause, not just success rate:
from kalibr import FAILURE_CATEGORIES
# ["timeout", "context_exceeded", "tool_error", "rate_limited",
# "validation_failed", "hallucination_detected", "user_unsatisfied",
# "empty_response", "malformed_output", "auth_error", "provider_error", "unknown"]
router.report(
    success=False,
    failure_category="rate_limited",
    reason="hit provider limit"
)
# Invalid categories raise ValueError immediately
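Because invalid categories are rejected, it can help to centralize the mapping from errors to category strings. The sketch below is one possible mapping from standard Python exceptions. `classify_failure` is a hypothetical helper, the exception-to-category choices are assumptions, and the category set is copied from the list above rather than imported so the example runs on its own.

```python
import json

# Copied from the category list shown above.
CATEGORIES = {
    "timeout", "context_exceeded", "tool_error", "rate_limited",
    "validation_failed", "hallucination_detected", "user_unsatisfied",
    "empty_response", "malformed_output", "auth_error", "provider_error",
    "unknown",
}


def classify_failure(exc=None, output=None):
    """Map an exception or an output string to a category (hypothetical helper)."""
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, PermissionError):
        return "auth_error"
    # JSONDecodeError subclasses ValueError, so it must be checked first.
    if isinstance(exc, json.JSONDecodeError):
        return "malformed_output"
    if isinstance(exc, ValueError):
        return "validation_failed"
    if exc is None and output is not None and not output.strip():
        return "empty_response"
    return "unknown"


if __name__ == "__main__":
    print(classify_failure(TimeoutError("provider took too long")))
    print(classify_failure(output="   "))
```

The returned string would then be passed as `failure_category` in a report call like the one above.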
Update outcomes after the fact
For async validation, user feedback, or downstream system confirmation:
from kalibr import update_outcome
update_outcome(
    trace_id="abc123",
    goal="resolve_ticket",
    success=False,
    failure_reason="customer_reopened",
    failure_category="user_unsatisfied",
    score=0.3,
    metadata={"ticket_id": "T-9182"}
)
Insights API
Query what Kalibr has learned about your goals — health status, failure mode breakdowns, path comparisons, and actionable signals:
from kalibr import get_insights
# All goals, last 7 days
insights = get_insights()
# Specific goal, custom window
insights = get_insights(goal="research_agent", window_hours=24)
for goal_data in insights["goals"]:
    print(goal_data["status"])  # healthy / degraded / insufficient_data
    print(goal_data["top_failure_modes"])
    print(goal_data["actionable_signals"])  # path_underperforming, drift_detected, etc.
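As a sketch of how this response might be consumed, the function below pulls out the goals whose status is "degraded". It runs against a hand-written sample dict, not real Kalibr data. The "goal" key used for the goal name and the list shape of top_failure_modes are assumptions; the README only shows the status, top_failure_modes, and actionable_signals fields.

```python
def degraded_goals(insights):
    """Return (goal name, failure modes) for every goal whose status is "degraded"."""
    report = []
    for goal_data in insights.get("goals", []):
        if goal_data.get("status") == "degraded":
            # Assumption: the goal name lives under a "goal" key.
            name = goal_data.get("goal", "<unnamed>")
            report.append((name, goal_data.get("top_failure_modes", [])))
    return report


# Hand-written sample shaped like the response above; not real data.
sample = {
    "goals": [
        {"goal": "summarize", "status": "healthy", "top_failure_modes": []},
        {"goal": "book_meeting", "status": "degraded",
         "top_failure_modes": ["rate_limited"]},
    ]
}

if __name__ == "__main__":
    print(degraded_goals(sample))
```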
Framework Integrations
LangChain
pip install kalibr[langchain]
from kalibr import Router
router = Router(goal="summarize", paths=["gpt-4o", "claude-sonnet-4-20250514"])
llm = router.as_langchain()
chain = prompt | llm | parser
All integrations
pip install kalibr[crewai] # CrewAI
pip install kalibr[openai-agents] # OpenAI Agents SDK
pip install kalibr[langchain-all] # LangChain with all providers
Auto-Instrumentation
Kalibr auto-instruments OpenAI, Anthropic, and Google SDKs on import:
import kalibr # Must be first import
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(model="gpt-4o", messages=[...])
# Traced automatically — cost, latency, tokens, success all captured
Disable with KALIBR_AUTO_INSTRUMENT=false.
Low-Level API
Use get_policy() when you need fine-grained control — custom retry logic, framework integrations, or provider-specific features:
from openai import OpenAI
from kalibr import get_policy, report_outcome

policy = get_policy(goal="summarize")
model = policy["recommended_model"]
# You call the provider yourself
if model.startswith("gpt"):
    client = OpenAI()
    response = client.chat.completions.create(model=model, messages=[...])
# trace_id must identify the call being reported; obtaining it is not shown here
report_outcome(trace_id=trace_id, goal="summarize", success=True)
Or go even lower:
from kalibr import register_path, decide, report_outcome
register_path(goal="book_meeting", model_id="gpt-4o")
register_path(goal="book_meeting", model_id="claude-sonnet-4-20250514")
decision = decide(goal="book_meeting")
model = decision["model_id"]
# Make your own LLM call, then report
report_outcome(trace_id="...", goal="book_meeting", success=True)
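Custom retry logic is one reason to drop to this level. The sketch below shows the shape of a decide, call, report loop with retries. It takes the three steps as plain callables so it runs without Kalibr or a provider SDK. `run_with_fallback` and its parameters are hypothetical; in practice you would back them with decide(), your own provider call, and report_outcome().

```python
def run_with_fallback(decide, call_model, report, max_attempts=3):
    """Ask for a decision, call the model, report the outcome, retry on failure.

    decide() returns a model id, call_model(model) returns the output or
    raises, and report(model, success) records the outcome. All three are
    placeholders supplied by the caller.
    """
    last_error = None
    for _ in range(max_attempts):
        model = decide()
        try:
            output = call_model(model)
        except Exception as exc:
            # Report the failure so later decisions can route around it.
            report(model, False)
            last_error = exc
            continue
        report(model, True)
        return output
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


if __name__ == "__main__":
    # Stub callables standing in for the real decision, provider, and report calls.
    choices = iter(["model-a", "model-b"])
    outcomes = []

    def flaky_call(model):
        if model == "model-a":
            raise TimeoutError("simulated timeout")
        return "ok"

    result = run_with_fallback(
        lambda: next(choices), flaky_call, lambda m, s: outcomes.append((m, s))
    )
    print(result, outcomes)
```

Reporting every failed attempt, not just the final result, is the design choice here: each attempt is an observation about the path that handled it.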
Configuration
| Variable | Description | Default |
|---|---|---|
| KALIBR_API_KEY | API key from dashboard | Required |
| KALIBR_TENANT_ID | Tenant ID from dashboard | Required |
| KALIBR_PROVISIONING_TOKEN | Enables kalibr init credential auto-provisioning | - |
| KALIBR_AUTO_INSTRUMENT | Auto-instrument LLM SDKs on import | true |
| KALIBR_INTELLIGENCE_URL | Intelligence service URL | https://kalibr-intelligence.fly.dev |
| KALIBR_COLLECTOR_URL | Ingest endpoint | https://api.kalibr.systems/api/ingest |
| KALIBR_CONSOLE_EXPORT | Print spans to console | false |
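For a quick pre-flight check in your own code, the variables and defaults above can be resolved with a few lines of standard-library Python. This is a sketch only, not how the SDK reads its configuration; `load_config` is a hypothetical helper, and treating the two "Required" variables as hard failures is an assumption.

```python
import os

# Defaults copied from the table above; None means "no default".
DEFAULTS = {
    "KALIBR_API_KEY": None,
    "KALIBR_TENANT_ID": None,
    "KALIBR_PROVISIONING_TOKEN": None,
    "KALIBR_AUTO_INSTRUMENT": "true",
    "KALIBR_INTELLIGENCE_URL": "https://kalibr-intelligence.fly.dev",
    "KALIBR_COLLECTOR_URL": "https://api.kalibr.systems/api/ingest",
    "KALIBR_CONSOLE_EXPORT": "false",
}
REQUIRED = ("KALIBR_API_KEY", "KALIBR_TENANT_ID")


def load_config(env=None):
    """Resolve settings from the environment, falling back to the defaults."""
    env = os.environ if env is None else env
    config = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    missing = [name for name in REQUIRED if not config[name]]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return config


if __name__ == "__main__":
    # Placeholder values, passed explicitly instead of reading os.environ.
    cfg = load_config({"KALIBR_API_KEY": "your-api-key",
                       "KALIBR_TENANT_ID": "your-tenant-id"})
    print(cfg["KALIBR_COLLECTOR_URL"])
```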
Development
git clone https://github.com/kalibr-ai/kalibr-sdk-python.git
cd kalibr-sdk-python
pip install -e ".[dev]"
pytest
Contributing
See CONTRIBUTING.md.
License
Apache-2.0
File details
Details for the file kalibr-1.6.0.tar.gz.
File metadata
- Download URL: kalibr-1.6.0.tar.gz
- Upload date:
- Size: 116.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a67ebcbcae1527037cea82884c49f2d4eb95a8162e8eecbda45b5d51d68646b5 |
| MD5 | de83fc8695f590f77ce8f7362d84220f |
| BLAKE2b-256 | 097e9128858bccdc3200f046b4fb6f23bae9ae51a153cdcabac00d84c9b16bdc |
File details
Details for the file kalibr-1.6.0-py3-none-any.whl.
File metadata
- Download URL: kalibr-1.6.0-py3-none-any.whl
- Upload date:
- Size: 121.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2250f57e48af80eea813f83cf5164717acb8ecb1b0288963256265a4ef2a4b49 |
| MD5 | 95ad6e6aa47a9accd5c15e83316a039b |
| BLAKE2b-256 | 00c2c7f2f6d7f0fe067f3d9925e5f41cfbf78c9df741e843983b44eaafe3a903 |