# Binex

Debuggable runtime for AI agent pipelines.
Orchestrate multi-agent workflows. Trace every step. Replay and diff runs.
## Why Binex?
Building multi-agent systems is hard. Debugging them is harder. Binex gives you:
- **YAML-first workflows** — define agent pipelines as readable DAGs, not tangled code
- **Full execution tracing** — every node call, every artifact, every millisecond recorded
- **Post-mortem debugging** — inspect any run after the fact with rich, filterable reports
- **Replay with agent swap** — re-run a workflow substituting different LLMs or agents
- **Run diffing** — compare two executions side-by-side to spot regressions
- **Human-in-the-loop** — approval gates and free-text input with conditional branching
## Demo
A multi-provider research pipeline: Ollama runs locally for planning and summarization, OpenRouter calls cloud models for parallel research — all in one YAML file.
**Requirements to run this demo:**

- Ollama installed and running locally
- Model pulled: `ollama pull gemma3:4b`
- Free OpenRouter API key (set `OPENROUTER_API_KEY` in `.env`)
- Binex installed: `pip install -e .`
```yaml
# examples/multi-provider-demo.yaml
name: multi-provider-research
nodes:
  user_input:
    agent: "human://input"  # ask the user for a topic
  planner:
    agent: "llm://ollama/gemma3:4b"  # local LLM plans the research
    system_prompt: "Create a structured research plan with 3 subtopics..."
    inputs: { topic: "${user_input.result}" }
    depends_on: [user_input]
  researcher1:
    agent: "llm://openrouter/z-ai/glm-4.5-air:free"  # cloud model researches subtopic 1
    inputs: { plan: "${planner.result}" }
    depends_on: [planner]
  researcher2:
    agent: "llm://openrouter/stepfun/step-3.5-flash:free"  # cloud model researches subtopic 2
    inputs: { plan: "${planner.result}" }
    depends_on: [planner]
  summarizer:
    agent: "llm://ollama/gemma3:4b"  # local LLM combines findings
    inputs: { research1: "${researcher1.result}", research2: "${researcher2.result}" }
    depends_on: [researcher1, researcher2]
```
```mermaid
graph LR
    A["user_input<br/><sub>human://input</sub>"] --> B["planner<br/><sub>ollama/gemma3:4b</sub>"]
    B --> C["researcher1<br/><sub>openrouter/glm-4.5-air</sub>"]
    B --> D["researcher2<br/><sub>openrouter/step-3.5-flash</sub>"]
    C --> E["summarizer<br/><sub>ollama/gemma3:4b</sub>"]
    D --> E
```
Run it, explore the results, and debug the execution. For example:
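```bash
binex run examples/multi-provider-demo.yaml   # execute the pipeline
binex debug latest --rich                     # inspect the run (uses the optional [rich] extra)
```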
## Quickstart
```bash
# Clone
git clone https://github.com/Alexli18/binex.git
cd binex

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate   # Linux/macOS
# .venv\Scripts\activate    # Windows

# Install
pip install -e .

# Run the zero-config demo
binex hello

# Run a workflow
binex run examples/simple.yaml --var input="hello world"

# Debug a completed run
binex debug <run-id>
binex debug latest   # shortcut for the most recent run

# Optional: rich colored output
pip install -e ".[rich]"
binex debug latest --rich
```
## See it in action
```console
$ binex hello
Running built-in hello-world workflow...
[1/2] greeter ...
[greeter] -> result:
Hello from Binex!
[2/2] responder ...
[responder] -> result:
{"greeter": "Hello from Binex!"}
Run completed (2/2 nodes)
Run ID: run_d71c9a50

Next steps:
  binex debug run_d71c9a50 — inspect the run
  binex init — create your own project
  binex run examples/simple.yaml — try a workflow file
```
```console
$ binex run examples/simple.yaml --var input="hello world"
Run ID: run_69651bec
Workflow: simple-pipeline
Status: completed
Nodes: 2/2 completed
╭──────────────────────── consumer ────────────────────────╮
│ { "art_producer": { "msg": "hello world" } }              │
╰──────────────────────── result ───────────────────────────╯
```
## Trace & Debug

Every run is fully recorded. Inspect the execution timeline and DAG:

```bash
binex trace <run-id>
```

Compare two runs side-by-side — spot status changes, latency deltas, and output differences:

```bash
binex diff <run-a> <run-b>
```

Post-mortem debugging of a failed run — see errors, prompts, and artifacts per node:

```bash
binex debug <run-id> --errors --rich
```
## How It Works
Define a workflow in YAML. Binex builds a DAG, schedules nodes respecting dependencies, dispatches each to the right agent adapter, and records everything.
```yaml
name: research-pipeline
description: "Fan-out research with human approval"
nodes:
  planner:
    agent: "llm://openai/gpt-4"
    system_prompt: "Break this topic into 3 research questions"
    inputs:
      topic: "${user.topic}"
    outputs: [questions]
  researcher_1:
    agent: "llm://anthropic/claude-sonnet-4-20250514"
    inputs: { question: "${planner.questions}" }
    outputs: [findings]
    depends_on: [planner]
  researcher_2:
    agent: "a2a://localhost:8001"
    inputs: { question: "${planner.questions}" }
    outputs: [findings]
    depends_on: [planner]
  reviewer:
    agent: "human://approve"
    inputs:
      draft: "${researcher_1.findings}"
    outputs: [decision]
    depends_on: [researcher_1, researcher_2]
  summarizer:
    agent: "llm://openai/gpt-4"
    inputs:
      research: "${researcher_1.findings}"
    outputs: [summary]
    depends_on: [reviewer]
    when: "${reviewer.decision} == approved"
```
```mermaid
graph TD
    A[planner] --> B[researcher_1]
    A --> C[researcher_2]
    B --> D["reviewer (human approval)"]
    C --> D
    D -->|approved| E[summarizer]
```
## Architecture

```mermaid
block-beta
    columns 3
    CLI["CLI\nrun · debug · trace · replay · diff · dev"]:3
    Runtime["Runtime\nOrchestrator + Dispatcher"]:3
    Adapters["Adapters\nlocal:// · llm:// · a2a:// · human://"] Graph["Graph\nDAG · topo-sort · cycle detect"] Spec["Workflow Spec\nYAML loader · validation"]
    Stores["Stores\nSQLite executions + FS artifacts"]:3
    Models["Models\nWorkflow · Node · Artifact · Execution"]:3
```
## Features

### Agent Adapters
| Prefix | Adapter | Description |
|---|---|---|
| `local://` | `LocalPythonAdapter` | In-process Python callable |
| `llm://` | `LLMAdapter` | LLM completion via LiteLLM (40+ providers) |
| `a2a://` | `A2AAgentAdapter` | Remote agent via A2A protocol |
| `human://input` | `HumanInputAdapter` | Terminal prompt for free-text input |
| `human://approve` | `HumanApprovalAdapter` | Approval gate with conditional branching |
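The `local://` adapter is the only one not shown in the examples above. A hypothetical node using it might look like the sketch below; the URI format for addressing a Python callable is an assumption, not documented syntax:

```yaml
nodes:
  word_counter:
    # Assumed format: local://<module>.<callable> — verify against the docs
    agent: "local://my_agents.count_words"
    inputs: { text: "${fetcher.result}" }
    depends_on: [fetcher]
```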
### CLI Commands
| Command | Description |
|---|---|
| `binex run <workflow.yaml>` | Execute a workflow |
| `binex debug <run-id\|latest>` | Post-mortem inspection (`--json`, `--errors`, `--node`, `--rich`) |
| `binex trace <run-id>` | Execution timeline, node details, or DAG graph |
| `binex replay <run-id>` | Re-run with optional agent swaps |
| `binex diff <run1> <run2>` | Compare two runs side-by-side |
| `binex artifacts list <run-id>` | List artifacts with lineage tracking |
| `binex validate <workflow.yaml>` | Validate YAML before execution |
| `binex scaffold workflow "A -> B"` | Generate workflow from DSL shorthand |
| `binex start` | Interactive wizard to create a workflow step-by-step |
| `binex init` | Interactive project setup (workflow / agent / full) |
| `binex dev up` | Start Docker dev stack (Ollama + LiteLLM + Registry) |
| `binex doctor` | Check system health |
| `binex explore` | Interactive browser for runs and artifacts |
| `binex hello` | Zero-config demo |
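These commands compose. For instance, replay a recorded run and then diff the replay against the original to spot regressions (run IDs here are placeholders):

```bash
binex replay run_d71c9a50             # re-run from a recorded execution
binex diff run_d71c9a50 <new-run-id>  # compare the replay with the original
```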
### DSL Shorthand

Generate workflows from simple expressions:

```bash
binex scaffold workflow "planner -> researcher, analyst -> summarizer"
```
Nine built-in patterns available: simple, diamond, fan-out, fan-in, map-reduce, and more.
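The same arrow-and-comma syntax expresses these shapes directly. For instance, a diamond (node names here are illustrative):

```bash
binex scaffold workflow "fetch -> parse, classify -> merge"
```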
### LLM Providers
Out-of-the-box support for 9 providers via LiteLLM:
OpenAI · Anthropic · Google Gemini · Ollama · OpenRouter · Groq · Mistral · DeepSeek · Together AI
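Each plugs into the same `llm://<provider>/<model>` URI scheme seen throughout this README, for example:

```yaml
# Alternative values for a node's agent field (all taken from the examples above):
agent: "llm://openai/gpt-4"                      # OpenAI
agent: "llm://ollama/gemma3:4b"                  # local Ollama
agent: "llm://openrouter/z-ai/glm-4.5-air:free"  # OpenRouter
```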
## Project Structure

```text
src/binex/
├── adapters/       # Agent execution backends (local, LLM, A2A, human)
├── agents/         # Built-in agent implementations
├── cli/            # Click CLI commands
├── graph/          # DAG construction + topological scheduling
├── models/         # Pydantic v2 domain models
├── registry/       # FastAPI agent registry service
├── runtime/        # Orchestrator, dispatcher, lifecycle
├── stores/         # SQLite execution + filesystem artifact persistence
├── trace/          # Debug reports, lineage, timeline, diffing
├── workflow_spec/  # YAML loader + validator + variable resolution
└── tools.py        # Tool calling support (@tool decorator)
```
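The `@tool` decorator noted next to `tools.py` suggests function-based tool registration. The sketch below is a guess at what that might look like; the import path follows from the file layout above, but the exact API is an assumption:

```python
# Hypothetical sketch: import path inferred from src/binex/tools.py;
# decorator name taken from the comment in the tree above. Not a documented API.
from binex.tools import tool

@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())
```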
## Examples

The `examples/` directory contains 22 ready-to-run workflows:
| Example | What it demonstrates |
|---|---|
| `hello-world.yaml` | Minimal two-node pipeline |
| `diamond.yaml` | Diamond dependency pattern |
| `fan-out-fan-in.yaml` | Parallel research with aggregation |
| `human-in-the-loop.yaml` | Approval gates and conditional branching |
| `multi-provider-research.yaml` | Multiple LLM providers in one workflow |
| `a2a-multi-agent.yaml` | Remote agents via A2A protocol |
| `conditional-routing.yaml` | Branch based on node output |
| `map-reduce.yaml` | MapReduce-style aggregation |
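Any of these can be checked and executed with the CLI commands above, for example:

```bash
binex validate examples/human-in-the-loop.yaml
binex run examples/human-in-the-loop.yaml
```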
## Documentation

Full docs are available at [alexli18.github.io/binex](https://alexli18.github.io/binex):

- **Quickstart** — install and run your first workflow
- **Concepts** — agents, workflows, artifacts, execution model
- **CLI Reference** — every command with options and examples
- **Architecture** — runtime internals and design decisions
- **Workflow Format** — complete YAML schema reference
## Development

```bash
# Clone
git clone https://github.com/Alexli18/binex.git
cd binex

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests (870 tests, 96% coverage)
python -m pytest tests/

# Lint
ruff check src/

# Start dev environment (Ollama + LiteLLM + Registry)
binex dev up
```
## Roadmap

See [ROADMAP.md](ROADMAP.md) for the full roadmap. Highlights:
- Web UI for execution visualization
- Plugin system for custom adapters
- Framework adapters (LangChain, CrewAI, AutoGen)
- Workflow versioning and migration
- Distributed execution across multiple runtimes
- OpenTelemetry integration for observability
See the open issues for a full list of proposed features and known issues.
## Contributing
Contributions are welcome! Here's how:
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/amazing-feature`)
3. Commit your Changes (`git commit -m 'Add amazing feature'`)
4. Push to the Branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
Distributed under the MIT License. See LICENSE for more information.