
Timbal

Timbal 2.0 is in beta. The API is stable — we're finalizing docs and tooling before the full release.

Simple, performant, battle-tested framework for building reliable AI applications.

Full documentation: docs.timbal.ai


Installation

uv add timbal
pip install timbal

From source

git clone https://github.com/timbal-ai/timbal.git
cd timbal
uv sync --dev

Two patterns, one interface

Agent — autonomous reasoning

The LLM decides what to do. You provide tools and a goal.

from timbal import Agent
from timbal.tools import WebSearch
from datetime import datetime

def get_datetime() -> str:
    return datetime.now().isoformat()

agent = Agent(
    name="assistant",
    model="anthropic/claude-sonnet-4-6",
    tools=[get_datetime, WebSearch()],
    max_tokens=1024,
)

result = await agent.collect(prompt="What time is it in Tokyo right now?")
print(result.output)

Workflow — explicit pipelines

You define the steps. The framework handles concurrency and dependency resolution.

import httpx
from timbal import Workflow
from timbal.state import get_run_context
from timbal.tools import Write

async def fetch(url: str) -> str:
    async with httpx.AsyncClient(follow_redirects=True) as client:
        return (await client.get(url)).text

workflow = (
    Workflow(name="scraper")
    .step(fetch)
    .step(
        Write(),
        path="./output.html",
        content=lambda: get_run_context().step_span("fetch").output,
    )
)

await workflow.collect(url="https://timbal.ai")

Independent steps run concurrently. Dependencies are inferred automatically from step_span() references — no manual wiring needed.
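Conceptually, steps with no step_span() reference between them behave like tasks handed to asyncio.gather. A plain-Python sketch of that scheduling idea (illustrative only, not Timbal's scheduler):

```python
import asyncio

async def fetch_a() -> str:
    await asyncio.sleep(0.01)  # simulate independent I/O
    return "a"

async def fetch_b() -> str:
    await asyncio.sleep(0.01)
    return "b"

async def run_independent_steps() -> list[str]:
    # Neither step references the other's output, so there is no edge
    # between them in the dependency graph and they can run together.
    return await asyncio.gather(fetch_a(), fetch_b())

results = asyncio.run(run_independent_steps())
```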


Calling runnables

All Agent, Workflow, and Tool instances share the same interface.

# Collect all events, return final OutputEvent
result = await agent.collect(prompt="Hello")
print(result.output)           # final result
print(result.status.code)      # "success" | "error" | "cancelled"
print(result.usage)            # {"anthropic/claude-sonnet-4-6:input_tokens": 42, ...}

# Or stream events
from timbal.types.events import DeltaEvent
from timbal.types.events.delta import TextDelta

async for event in agent(prompt="Hello"):
    if isinstance(event, DeltaEvent) and isinstance(event.item, TextDelta):
        print(event.item.text_delta, end="", flush=True)

Models

Any provider, one interface. Model strings follow provider/model-name:

anthropic/claude-sonnet-4-6       openai/gpt-4o
anthropic/claude-opus-4-6         openai/gpt-4o-mini
anthropic/claude-haiku-4-5        openai/o3
google/gemini-2.5-flash           groq/llama-3.3-70b-versatile
google/gemini-2.5-pro-preview     xai/grok-3
cerebras/llama-3.3-70b            sambanova/Meta-Llama-3.3-70B-Instruct

Full list and context window sizes in python/timbal/core/models.py.
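Because model strings always take the provider/model-name form, swapping providers is just a string change. As a hypothetical illustration of how such a string splits (not Timbal's actual router code):

```python
def parse_model_string(model: str) -> tuple[str, str]:
    # Split on the first slash only, so model names that themselves
    # contain slashes remain intact.
    provider, _, name = model.partition("/")
    return provider, name

provider, name = parse_model_string("anthropic/claude-sonnet-4-6")
```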


Structured output

from pydantic import BaseModel

class Analysis(BaseModel):
    sentiment: str
    confidence: float
    summary: str

agent = Agent(model="openai/gpt-4o-mini", output_model=Analysis)
result = await agent.collect(prompt="Analyse: 'Timbal makes AI easy'")
print(result.output.sentiment)     # "positive"
print(result.output.confidence)    # 0.97

Streaming

from timbal.types.events import DeltaEvent
from timbal.types.events.delta import TextDelta

async for event in agent(prompt="Write a short poem"):
    if isinstance(event, DeltaEvent) and isinstance(event.item, TextDelta):
        print(event.item.text_delta, end="", flush=True)

Memory compaction

Long conversations grow large. Timbal has built-in strategies to keep context under control.

from timbal.core.memory_compaction import (
    compact_tool_results,
    keep_last_n_messages,
    keep_last_n_turns,
    summarize,
)

agent = Agent(
    model="anthropic/claude-sonnet-4-6",
    max_tokens=2048,
    memory_compaction=[
        compact_tool_results(keep_last_n=2),   # compress old tool outputs
        keep_last_n_turns(10),                 # keep last 10 user↔assistant turns
    ],
    memory_compaction_ratio=0.75,              # trigger at 75% context window usage
)

Strategies:

  • compact_tool_results(keep_last_n, threshold, replacement) — strips old tool results, optionally replacing with a summary string
  • keep_last_n_messages(n) — hard truncation, structure-aware (no orphaned tool pairs)
  • keep_last_n_turns(n) — keep last N user+assistant pairs
  • summarize(threshold, model, keep_last_n, max_summary_tokens) — async LLM-based summarization of old messages

Strategies are applied in order. Multiple strategies can be combined.


Skills

Skills are reusable, self-documenting tool packages. They sit on disk and are loaded into the agent's context only when the LLM explicitly requests them via the auto-injected read_skill tool.

skills/
└── web_research/
    ├── SKILL.md          # frontmatter + docs shown to the LLM
    └── tools/
        ├── search.py
        └── scrape.py

# SKILL.md
---
name: "web_research"
description: "Search the web and scrape pages"
---
Use `search(query)` to find pages, then `scrape(url)` to get the content.

agent = Agent(
    model="anthropic/claude-sonnet-4-6",
    skills_path="./skills",
    max_tokens=2048,
)

The agent sees skill names and descriptions at startup. It calls read_skill("web_research") to load the tools and documentation when needed — keeping context clean until the skill is actually required.
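The SKILL.md frontmatter format is simple to parse. A rough sketch of how frontmatter and body could be separated (illustrative only, not Timbal's loader):

```python
def parse_skill_md(text: str) -> tuple[dict, str]:
    # Frontmatter sits between the first two '---' fences; everything
    # after the second fence is the documentation body.
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip().strip('"')
    return meta, body.strip()

meta, body = parse_skill_md(
    '---\nname: "web_research"\ndescription: "Search the web"\n---\nUse search(query).'
)
```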


MCP servers

Connect agents to any Model Context Protocol server.

from timbal.core import MCPServer

# Local server via stdio
mcp = MCPServer(
    transport="stdio",
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "."],
)

# Remote server via HTTP
mcp = MCPServer(
    transport="http",
    url="https://my-mcp-server.com",
    headers={"Authorization": "Bearer token"},
)

agent = Agent(
    model="anthropic/claude-sonnet-4-6",
    tools=[mcp],
    max_tokens=2048,
)

Conditional workflows

workflow = (
    Workflow(name="pipeline")
    .step(validate_input)
    .step(
        process,
        when=lambda: get_run_context().step_span("validate_input").output["valid"],
        data=lambda: get_run_context().step_span("validate_input").output["data"],
    )
    .step(
        notify_failure,
        when=lambda: not get_run_context().step_span("validate_input").output["valid"],
    )
)

Steps with when= are skipped (not failed) when the condition is False. Downstream steps that depend on a skipped step are also skipped automatically.
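The skip-propagation rule can be modeled in a few lines. A simplified sketch (hypothetical step names; assumes steps are listed in execution order with an acyclic dependency map):

```python
def resolve_skips(conditions: dict[str, bool], deps: dict[str, list[str]]) -> set[str]:
    # A step is skipped if its own `when` condition is False, or if any
    # upstream step it depends on was itself skipped.
    skipped: set[str] = set()
    for step, upstream in deps.items():
        if not conditions.get(step, True) or any(u in skipped for u in upstream):
            skipped.add(step)
    return skipped

skipped = resolve_skips(
    conditions={"process": False},  # validation failed, so `process` is gated off
    deps={"validate_input": [], "process": ["validate_input"], "report": ["process"]},
)
```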


Observability

Timbal has a layered tracing system. Every run produces a full span trace.

from timbal.state.tracing.providers import JsonlTracingProvider
from timbal.state.tracing.exporters import OTelExporter
from pathlib import Path

provider = JsonlTracingProvider.configured(
    _path=Path("traces.jsonl"),
    _exporters=[
        OTelExporter(
            endpoint="http://localhost:4318",
            service_name="my-agent",
            headers={"x-honeycomb-team": "YOUR_KEY"},
        ),
    ],
)

agent = Agent(model="...", tracing_provider=provider)

OTelExporter is fire-and-forget — it never adds latency to your runs. Compatible with Jaeger, Honeycomb, Datadog, Grafana Tempo, and any OTLP backend. Custom exporters:

from timbal.state.tracing.providers.base import Exporter

class MyExporter(Exporter):
    async def export(self, run_context) -> None:
        spans = list(run_context._trace.values())
        await my_backend.send(spans)
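The fire-and-forget property means export work is scheduled off the critical path of the run. A toy asyncio illustration of the pattern (not Timbal's exporter internals):

```python
import asyncio

exported: list[dict] = []

async def slow_export(spans: list[dict]) -> None:
    await asyncio.sleep(0.01)  # simulate a slow OTLP network call
    exported.extend(spans)

async def finish_run(spans: list[dict]) -> str:
    # Schedule the export without awaiting it, so the run itself
    # returns immediately and pays no export latency.
    task = asyncio.create_task(slow_export(spans))
    result = "run complete"
    await task  # drained here only so the demo exits cleanly
    return result

result = asyncio.run(finish_run([{"span": "root"}]))
```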

Session chaining

Link runs so an agent can recall what happened in a previous session — even across process restarts.

from timbal.state.tracing.providers import JsonlTracingProvider
from pathlib import Path

provider = JsonlTracingProvider.configured(_path=Path("sessions.jsonl"))
agent = Agent(model="...", tracing_provider=provider)

run1 = await agent.collect(prompt="My name is Alice.")
print(run1.run_id)   # "abc123"

# Next session — agent remembers the previous one
from timbal.state.context import RunContext
ctx = RunContext(parent_id="abc123", tracing_provider=provider)
run2 = await agent.collect(prompt="What's my name?", run_context=ctx)

HTTP serving

Serve any agent or workflow over HTTP with one command.

python -m timbal.server.http \
  --import_spec path/to/agent.py::my_agent \
  --host 0.0.0.0 \
  --port 4444 \
  --workers 4

Endpoint               Method  Description
/healthcheck           GET     Returns 204
/params_model_schema   GET     JSON schema of inputs
/return_model_schema   GET     JSON schema of output
/run                   POST    Execute and wait
/stream                POST    Stream events as SSE
/cancel/{run_id}       POST    Cancel a running execution

curl -X POST http://localhost:4444/run \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello"}'

Evals

Declarative evaluation suite with built-in validators.

# evals.yaml
evals:
  - name: "greets_user"
    params:
      prompt: "Say hello to Alice"
    agent!:
      output!:
        contains_all!: ["Hello", "Alice"]
      duration!:
        lt!: 5

from timbal.evals.runner import run_eval

result = await run_eval(eval_config, agent=my_agent)
print(result.passed)
print(result.validator_results)

Validators: contains, contains_all, contains_any, starts_with, ends_with, pattern, length, min_length, max_length, eq, lt, gt, type, email, json, semantic (LLM-based), language. All support not_ negation.
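The semantics of the simple validators are easy to pin down. Plain-Python sketches of contains_all and lt (illustrative, not Timbal's implementations):

```python
def contains_all(output: str, needles: list[str]) -> bool:
    # Pass only when every expected substring appears in the output.
    return all(needle in output for needle in needles)

def lt(value: float, limit: float) -> bool:
    # Pass when the measured value (e.g. duration in seconds) is
    # strictly below the limit.
    return value < limit

passed = contains_all("Hello, Alice!", ["Hello", "Alice"]) and lt(3.2, 5)
```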


Testing without API calls

from timbal.core.test_model import TestModel

model = TestModel(responses=["The answer is 42."])
agent = Agent(name="test", model=model, tools=[])

result = await agent.collect(prompt="What is the answer?")
assert result.output.collect_text() == "The answer is 42."
assert result.status.code == "success"

Responses cycle to the last item when exhausted. Pass Message objects to test tool-calling flows. No network calls.
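"Cycle to the last item" means that once the response list is exhausted, every further call keeps returning the final response. A short sketch of that indexing rule (not TestModel's actual code):

```python
def nth_response(responses: list[str], call_index: int) -> str:
    # Clamp the index so calls past the end repeat the final response.
    return responses[min(call_index, len(responses) - 1)]

first = nth_response(["a", "b"], 0)
sixth = nth_response(["a", "b"], 5)  # list exhausted, repeats "b"
```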


Hooks

Pre/post hooks run around every execution and have access to the full run context.

from timbal import Tool
from timbal.state import get_run_context

def log_pre():
    ctx = get_run_context()
    print(f"Starting run {ctx.id}")

def log_post():
    span = get_run_context().current_span()
    print(f"Output: {span.output}")

tool = Tool(handler=my_fn, pre_hook=log_pre, post_hook=log_post)

Hooks are parameterless callables. Both sync and async are supported.


Running tests

uv run pytest
uv run pytest python/tests/core/test_jsonl_tracing_provider.py
uv run pytest python/tests/core/test_otel_exporter.py::TestRetry

Benchmarks

cd benchmarks/langchain
uv pip install langchain-core langsmith langgraph

# Quick mode (default)
uv run pytest bench_*.py -v

# Full mode
TIMBAL_BENCH_MODE=full uv run pytest bench_*.py -v

See benchmarks/README.md for methodology and how to read results.


Repository structure

timbal/
├── python/
│   ├── timbal/
│   │   ├── core/             # Agent, Workflow, Tool, LLM router, Skills, MCP
│   │   ├── state/            # RunContext, tracing providers + exporters
│   │   ├── types/            # Message, File, Events
│   │   ├── collectors/       # Output processing
│   │   ├── evals/            # Evaluation framework
│   │   ├── server/           # HTTP serving
│   │   ├── platform/         # Timbal platform integration
│   │   └── tools/            # Built-in tool library
│   └── tests/core/
├── benchmarks/
│   ├── README.md
│   └── langchain/
├── CLAUDE.md                 # Codebase guide for AI agents
└── pyproject.toml

Why Timbal

Transparent by default. No hidden magic. Under the hood it's async functions, Pydantic validation, and event-driven streaming — nothing you couldn't build yourself, just already built well.

Production-shaped. The core abstractions were refined through real production deployments before the framework was open-sourced. Fast failure, clear error messages, stable interfaces.

One interface for everything. Agents, workflows, and tools all share the same __call__ / .collect() convention and the same event stream. Compose them freely.

Provider-agnostic. Anthropic, OpenAI, Google, Groq, xAI, Cerebras, SambaNova — same code, swap the model string.


Documentation

docs.timbal.ai

Contributing

Pull requests and issues welcome.

License

Apache 2.0 — see LICENSE.
