Timbal

Timbal 2.0 is in beta. The API is stable — we're finalizing docs and tooling before the full release.

Simple, performant, battle-tested framework for building reliable AI applications.

Full documentation: docs.timbal.ai


Installation

pip install timbal

Timbal is modular. The bare install includes the agent/workflow engine, both Anthropic and OpenAI providers, MCP support, and tracing. Install extras only when you need them:

Extra              What it adds                       When to use
timbal[server]     FastAPI + uvicorn                  Serving agents over HTTP
timbal[documents]  PyMuPDF + openpyxl + python-docx   Reading PDFs, Excel, Word files
timbal[evals]      rich                               Running the evals CLI
timbal[codegen]    libcst + ruff                      Using the code generation tools
timbal[all]        Everything above

pip install 'timbal[server]'
pip install 'timbal[documents,evals]'
pip install 'timbal[all]'

From source

git clone https://github.com/timbal-ai/timbal.git
cd timbal
uv sync --dev

Two patterns, one interface

Agent — autonomous reasoning

The LLM decides what to do. You provide tools and a goal.

from timbal import Agent
from timbal.tools import WebSearch
from datetime import datetime

def get_datetime() -> str:
    return datetime.now().isoformat()

agent = Agent(
    name="assistant",
    model="anthropic/claude-sonnet-4-6",
    tools=[get_datetime, WebSearch()],
    max_tokens=1024,
)

result = await agent.collect(prompt="What time is it in Tokyo right now?")
print(result.output)

Workflow — explicit pipelines

You define the steps. The framework handles concurrency and dependency resolution.

import httpx
from timbal import Workflow
from timbal.state import get_run_context
from timbal.tools import Write

async def fetch(url: str) -> str:
    async with httpx.AsyncClient(follow_redirects=True) as client:
        return (await client.get(url)).text

workflow = (
    Workflow(name="scraper")
    .step(fetch)
    .step(
        Write(),
        path="./output.html",
        content=lambda: get_run_context().step_span("fetch").output,
    )
)

await workflow.collect(url="https://timbal.ai")

Independent steps run concurrently. Dependencies are inferred automatically from step_span() references — no manual wiring needed.
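
For example, two fetch steps that don't reference each other run in parallel, while a step that reads both of their spans waits for them. A minimal sketch, assuming step names default to the handler's function name as in the scraper above (merge is a hypothetical handler):

async def fetch_home() -> str:
    return await fetch("https://timbal.ai")

async def fetch_docs() -> str:
    return await fetch("https://docs.timbal.ai")

def merge(home: str, docs: str) -> str:
    # Scheduled only after both fetches complete; the fetches themselves run concurrently.
    return f"{home}\n{docs}"

workflow = (
    Workflow(name="parallel_scrape")
    .step(fetch_home)
    .step(fetch_docs)
    .step(
        merge,
        home=lambda: get_run_context().step_span("fetch_home").output,
        docs=lambda: get_run_context().step_span("fetch_docs").output,
    )
)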


Calling runnables

All Agent, Workflow, and Tool instances share the same interface.

# Collect all events, return final OutputEvent
result = await agent.collect(prompt="Hello")
print(result.output)           # final result
print(result.status.code)      # "success" | "error" | "cancelled"
print(result.usage)            # {"anthropic/claude-sonnet-4-6:input_tokens": 42, ...}

# Or stream events
from timbal.types.events import DeltaEvent
from timbal.types.events.delta import TextDelta

async for event in agent(prompt="Hello"):
    if isinstance(event, DeltaEvent) and isinstance(event.item, TextDelta):
        print(event.item.text_delta, end="", flush=True)

Models

Any provider, one interface. Model strings follow provider/model-name:

anthropic/claude-sonnet-4-6       openai/gpt-4o
anthropic/claude-opus-4-7         openai/gpt-5.5
anthropic/claude-opus-4-6         openai/gpt-4o-mini
anthropic/claude-haiku-4-5        openai/o3
google/gemini-2.5-flash           groq/llama-3.3-70b-versatile
google/gemini-2.5-pro-preview     xai/grok-3
cerebras/llama-3.3-70b            sambanova/Meta-Llama-3.3-70B-Instruct

Full list and context window sizes in python/timbal/core/models.py.
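
Switching providers is a one-string change; for example, the assistant from earlier moves from Anthropic to OpenAI like this:

agent = Agent(
    name="assistant",
    model="openai/gpt-4o",  # was "anthropic/claude-sonnet-4-6"
    tools=[get_datetime, WebSearch()],
    max_tokens=1024,
)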


Structured output

from pydantic import BaseModel

class Analysis(BaseModel):
    sentiment: str
    confidence: float
    summary: str

agent = Agent(model="openai/gpt-4o-mini", output_model=Analysis)
result = await agent.collect(prompt="Analyse: 'Timbal makes AI easy'")
print(result.output.sentiment)     # "positive"
print(result.output.confidence)    # 0.97

Streaming

from timbal.types.events import DeltaEvent
from timbal.types.events.delta import TextDelta

async for event in agent(prompt="Write a short poem"):
    if isinstance(event, DeltaEvent) and isinstance(event.item, TextDelta):
        print(event.item.text_delta, end="", flush=True)

Memory compaction

Long conversations grow large. Timbal has built-in strategies to keep context under control.

from timbal.core.memory_compaction import (
    compact_tool_results,
    keep_last_n_messages,
    keep_last_n_turns,
    summarize,
)

agent = Agent(
    model="anthropic/claude-sonnet-4-6",
    max_tokens=2048,
    memory_compaction=[
        compact_tool_results(keep_last_n=2),   # compress old tool outputs
        keep_last_n_turns(10),                 # keep last 10 user↔assistant turns
    ],
    memory_compaction_ratio=0.75,              # trigger at 75% context window usage
)

Strategies:

  • compact_tool_results(keep_last_n, threshold, replacement) — strips old tool results, optionally replacing with a summary string
  • keep_last_n_messages(n) — hard truncation, structure-aware (no orphaned tool pairs)
  • keep_last_n_turns(n) — keep last N user+assistant pairs
  • summarize(threshold, model, keep_last_n, max_summary_tokens) — async LLM-based summarization of old messages

Strategies are applied in order. Multiple strategies can be combined.


Skills

Skills are reusable, self-documenting tool packages. They sit on disk and are loaded into the agent's context only when the LLM explicitly requests them via the auto-injected read_skill tool.

skills/
└── web_research/
    ├── SKILL.md          # frontmatter + docs shown to the LLM
    └── tools/
        ├── search.py
        └── scrape.py

# SKILL.md
---
name: "web_research"
description: "Search the web and scrape pages"
---
Use `search(query)` to find pages, then `scrape(url)` to get the content.

agent = Agent(
    model="anthropic/claude-sonnet-4-6",
    skills_path="./skills",
    max_tokens=2048,
)

The agent sees skill names and descriptions at startup. It calls read_skill("web_research") to load the tools and documentation when needed — keeping context clean until the skill is actually required.


MCP servers

Connect agents to any Model Context Protocol server.

from timbal.core import MCPServer

# Local server via stdio
mcp = MCPServer(
    transport="stdio",
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "."],
)

# Remote server via HTTP
mcp = MCPServer(
    transport="http",
    url="https://my-mcp-server.com",
    headers={"Authorization": "Bearer token"},
)

agent = Agent(
    model="anthropic/claude-sonnet-4-6",
    tools=[mcp],
    max_tokens=2048,
)

Conditional workflows

workflow = (
    Workflow(name="pipeline")
    .step(validate_input)
    .step(
        process,
        when=lambda: get_run_context().step_span("validate_input").output["valid"],
        data=lambda: get_run_context().step_span("validate_input").output["data"],
    )
    .step(
        notify_failure,
        when=lambda: not get_run_context().step_span("validate_input").output["valid"],
    )
)

Steps with when= are skipped (not failed) when the condition is False. Downstream steps that depend on a skipped step are also skipped automatically.
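
Skips cascade through span references. A sketch, assuming .step() can be chained onto an existing workflow as in the builder examples above (persist is a hypothetical handler):

workflow = workflow.step(
    persist,
    record=lambda: get_run_context().step_span("process").output,
)
# If validate_input fails, "process" is skipped, so "persist" never runs either.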


Observability

Timbal has a layered tracing system. Every run produces a full span trace.

from timbal.state.tracing.providers import JsonlTracingProvider
from timbal.state.tracing.exporters import OTelExporter
from pathlib import Path

provider = JsonlTracingProvider.configured(
    _path=Path("traces.jsonl"),
    _exporters=[
        OTelExporter(
            endpoint="http://localhost:4318",
            service_name="my-agent",
            headers={"x-honeycomb-team": "YOUR_KEY"},
        ),
    ],
)

agent = Agent(model="...", tracing_provider=provider)

OTelExporter is fire-and-forget — it never adds latency to your runs. Compatible with Jaeger, Honeycomb, Datadog, Grafana Tempo, and any OTLP backend. Custom exporters:

from timbal.state.tracing.providers.base import Exporter

class MyExporter(Exporter):
    async def export(self, run_context) -> None:
        spans = list(run_context._trace.values())
        await my_backend.send(spans)
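
A custom exporter plugs into the provider the same way as the built-in one, reusing the configured() call shown above:

provider = JsonlTracingProvider.configured(
    _path=Path("traces.jsonl"),
    _exporters=[MyExporter()],
)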

Session chaining

Link runs so an agent can recall what happened in a previous session — even across process restarts.

from timbal.state.tracing.providers import JsonlTracingProvider
from pathlib import Path

provider = JsonlTracingProvider.configured(_path=Path("sessions.jsonl"))
agent = Agent(model="...", tracing_provider=provider)

run1 = await agent.collect(prompt="My name is Alice.")
print(run1.run_id)   # "abc123"

# Next session — agent remembers the previous one
from timbal.state.context import RunContext
ctx = RunContext(parent_id="abc123", tracing_provider=provider)
run2 = await agent.collect(prompt="What's my name?", run_context=ctx)

HTTP serving

Requires pip install 'timbal[server]'. Serve any agent or workflow over HTTP with one command.

python -m timbal.server.http \
  --import_spec path/to/agent.py::my_agent \
  --host 0.0.0.0 \
  --port 4444 \
  --workers 4

Endpoint              Method  Description
/healthcheck          GET     Returns 204
/params_model_schema  GET     JSON schema of inputs
/return_model_schema  GET     JSON schema of output
/run                  POST    Execute and wait
/stream               POST    Stream events as SSE
/cancel/{run_id}      POST    Cancel a running execution

curl -X POST http://localhost:4444/run \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello"}'
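
The /stream endpoint takes the same JSON body and emits server-sent events; curl's -N flag disables buffering so events print as they arrive:

curl -N -X POST http://localhost:4444/stream \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello"}'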

Evals

Declarative evaluation suite with built-in validators.

# evals.yaml
evals:
  - name: "greets_user"
    params:
      prompt: "Say hello to Alice"
    agent!:
      output!:
        contains_all!: ["Hello", "Alice"]
      duration!:
        lt!: 5

from timbal.evals.runner import run_eval

result = await run_eval(eval_config, agent=my_agent)
print(result.passed)
print(result.validator_results)

Validators: contains, contains_all, contains_any, starts_with, ends_with, pattern, length, min_length, max_length, eq, lt, gt, type, email, json, semantic (LLM-based), language. All support not_ negation.
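
Negation presumably prefixes the validator key with not_; the exact spelling here is an assumption, so treat this as a sketch and check the evals docs:

# evals.yaml (hypothetical not_ usage; key spelling assumed)
evals:
  - name: "stays_on_topic"
    params:
      prompt: "Say hello to Alice"
    agent!:
      output!:
        not_contains_any!: ["error", "sorry"]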


Testing without API calls

from timbal.core.test_model import TestModel

model = TestModel(responses=["The answer is 42."])
agent = Agent(name="test", model=model, tools=[])

result = await agent.collect(prompt="What is the answer?")
assert result.output.collect_text() == "The answer is 42."
assert result.status.code == "success"

Responses cycle to the last item when exhausted. Pass Message objects to test tool-calling flows. No network calls.
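
"Cycle to the last item" means the final response repeats once the list is exhausted. A small sketch with the same API:

model = TestModel(responses=["first", "second"])
agent = Agent(name="cycle", model=model, tools=[])

r1 = await agent.collect(prompt="a")
assert r1.output.collect_text() == "first"
r2 = await agent.collect(prompt="b")
assert r2.output.collect_text() == "second"
r3 = await agent.collect(prompt="c")
assert r3.output.collect_text() == "second"  # last item repeats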


Hooks

Pre/post hooks run around every execution and have access to the full run context.

from timbal import Tool
from timbal.state import get_run_context

def log_pre():
    ctx = get_run_context()
    print(f"Starting run {ctx.id}")

def log_post():
    span = get_run_context().current_span()
    print(f"Output: {span.output}")

tool = Tool(handler=my_fn, pre_hook=log_pre, post_hook=log_post)

Hooks are parameterless callables. Both sync and async are supported.
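
An async post-hook has the same shape; only the def changes. A sketch (audit_log is a hypothetical async sink):

async def audit_post():
    span = get_run_context().current_span()
    await audit_log.write({"output": span.output})  # hypothetical sink

tool = Tool(handler=my_fn, post_hook=audit_post)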


Running tests

uv run pytest
uv run pytest python/tests/core/test_jsonl_tracing_provider.py
uv run pytest python/tests/core/test_otel_exporter.py::TestRetry

Benchmarks

cd benchmarks/langchain
uv pip install langchain-core langsmith langgraph

# Quick mode (default)
uv run pytest bench_*.py -v

# Full mode
TIMBAL_BENCH_MODE=full uv run pytest bench_*.py -v

See benchmarks/README.md for methodology and how to read results.


Repository structure

timbal/
├── python/
│   ├── timbal/
│   │   ├── core/             # Agent, Workflow, Tool, LLM router, Skills, MCP
│   │   ├── state/            # RunContext, tracing providers + exporters
│   │   ├── types/            # Message, File, Events
│   │   ├── collectors/       # Output processing
│   │   ├── evals/            # Evaluation framework
│   │   ├── server/           # HTTP serving
│   │   ├── platform/         # Timbal platform integration
│   │   └── tools/            # Built-in tool library
│   └── tests/core/
├── benchmarks/
│   ├── README.md
│   └── langchain/
├── CLAUDE.md                 # Codebase guide for AI agents
└── pyproject.toml

Why Timbal

Transparent by default. No hidden magic. Under the hood it's async functions, Pydantic validation, and event-driven streaming — nothing you couldn't build yourself, just already built well.

Production-shaped. The core abstractions were refined through real production deployments before the framework was open-sourced. Fast failure, clear error messages, stable interfaces.

One interface for everything. Agents, workflows, and tools all share the same __call__ / .collect() convention and the same event stream. Compose them freely.

Provider-agnostic. Anthropic, OpenAI, Google, Groq, xAI, Cerebras, SambaNova — same code, swap the model string.


Documentation

docs.timbal.ai

Contributing

Pull requests and issues welcome.

License

Apache 2.0 — see LICENSE.
