AgentLoom
Deterministic LLM workflow orchestration with native observability, resilience, and cost control.
Table of Contents
- Why AgentLoom?
- Quick Start
- Architecture
- Workflow Definition (YAML)
- Python DSL
- Observability Stack
- Why not autonomous agents?
- Development
- Contributing
- License
Why AgentLoom?
Existing frameworks (LangGraph, CrewAI, AutoGen) treat observability and resilience as afterthoughts. AgentLoom is built from the ground up for production: circuit breakers, rate limiting, cost tracking, and OpenTelemetry traces are part of the core design — not plugins.
| Feature | LangGraph | CrewAI | AutoGen | AgentLoom |
|---|---|---|---|---|
| Workflow definition | Python API | Decorators | Agent chat | YAML + Python DSL |
| Observability | LangSmith ($) | Minimal | Minimal | OTel + Prometheus + Grafana |
| Circuit breaker | No | No | No | Built-in |
| Cost tracking | No | No | No | Native with budgets |
| Multi-provider fallback | Manual | No | No | Automatic |
| Dependencies | Heavy | Medium | Medium | Minimal |
Quick Start
```bash
# Install
pip install agentloom

# Install with observability (OTel + Prometheus)
pip install "agentloom[all]"

# Run a workflow
export OPENAI_API_KEY=sk-...
agentloom run examples/01_simple_qa.yaml

# Or with Ollama (free, local)
agentloom run examples/01_simple_qa.yaml --provider ollama --model phi4

# Validate a workflow
agentloom validate examples/03_router_workflow.yaml

# Visualize the DAG
agentloom visualize examples/03_router_workflow.yaml
```
Architecture
```
+-----------------------------------------------------+
|                  CLI / Python API                   |
+-----------------------------------------------------+
|                   Workflow Engine                   |
|  +-----------+  +-----------+  +---------------+    |
|  |DAG Parser |  | Scheduler |  | State Manager |    |
|  |& Validator|  |  (anyio)  |  |  (Pydantic)   |    |
|  +-----------+  +-----------+  +---------------+    |
+-----------------------------------------------------+
|                   Step Executors                    |
|  +--------+  +---------+  +------+  +------------+  |
|  |LLM Call|  |Tool Exec|  |Router|  | Subworkflow|  |
|  +--------+  +---------+  +------+  +------------+  |
+-----------------------------------------------------+
|                  Provider Gateway                   |
|  +-----------------------------------------------+  |
|  |    OpenAI | Anthropic | Google | Ollama       |  |
|  |  + Fallback | Circuit Breaker | Rate Limiter  |  |
|  +-----------------------------------------------+  |
+-----------------------------------------------------+
|              Observability (optional)               |
|  +------------+  +----------+  +----------+         |
|  | OTel Traces|  |Prometheus|  | JSON Logs|         |
|  +------------+  +----------+  +----------+         |
+-----------------------------------------------------+
```
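The gateway layer is where fallback, circuit breaking, and rate limiting live. As a rough sketch of the first two ideas in plain Python (names, thresholds, and signatures below are invented for illustration, not AgentLoom's actual internals):

```python
# Sketch of the fallback + circuit-breaker idea behind the gateway.
# All names and thresholds are invented; this is not AgentLoom's code.
import time
from typing import Callable

class CircuitBreaker:
    """Opens after max_failures consecutive failures; probes again after reset_after seconds."""

    def __init__(self, max_failures: int = 3, reset_after: float = 60.0) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        # Closed circuit, or open long enough to allow a half-open probe.
        if self.opened_at is None:
            return True
        return time.monotonic() - self.opened_at >= self.reset_after

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()

def complete(
    providers: dict[str, Callable[[str], str]],
    order: list[str],                 # e.g. ["openai", "anthropic", "ollama"]
    breakers: dict[str, CircuitBreaker],
    prompt: str,
) -> str:
    for name in order:
        breaker = breakers[name]
        if not breaker.allow():
            continue                  # circuit open: skip to the next provider
        try:
            result = providers[name](prompt)
            breaker.record(ok=True)
            return result
        except Exception:
            breaker.record(ok=False)  # fall through to the next provider
    raise RuntimeError("all providers failed or all circuits are open")
```

The point of this layer is that failover is ordinary control flow: deterministic, visible in traces, and never delegated to the model.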
Workflow Definition (YAML)
```yaml
name: classify-and-respond

config:
  provider: openai
  model: gpt-4o-mini
  budget_usd: 0.50

state:
  user_input: ""

steps:
  - id: classify
    type: llm_call
    system_prompt: "Classify as: question, complaint, or request."
    prompt: "Classify: {state.user_input}"
    output: classification

  - id: route
    type: router
    depends_on: [classify]
    conditions:
      - expression: "state.classification == 'question'"
        target: answer
    default: general_response

  - id: answer
    type: llm_call
    depends_on: [route]
    prompt: "Answer: {state.user_input}"
    output: response

  - id: general_response
    type: llm_call
    depends_on: [route]
    prompt: "Help with: {state.user_input}"
    output: response
```
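To make the router semantics concrete: conditions are evaluated in order against the current state, and the first truthy expression wins; otherwise the default target is used. A minimal sketch of that contract in plain Python (illustrative only, not AgentLoom's evaluator; a real implementation would use a sandboxed expression parser rather than eval):

```python
# Illustrative router evaluation: first matching condition wins, else default.
# Not AgentLoom's actual implementation.
from types import SimpleNamespace

def pick_target(state: SimpleNamespace, conditions: list[dict], default: str) -> str:
    for cond in conditions:
        # Plain boolean expressions over `state` -- no LLM is consulted here.
        if eval(cond["expression"], {"__builtins__": {}}, {"state": state}):
            return cond["target"]
    return default

state = SimpleNamespace(user_input="How do I sort a list?", classification="question")
conditions = [{"expression": "state.classification == 'question'", "target": "answer"}]
print(pick_target(state, conditions, default="general_response"))  # -> answer
```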
Python DSL
```python
from agentloom.core.dsl import workflow

wf = (
    workflow("my-workflow", provider="ollama", model="phi4")
    .set_state(question="What is Python?")
    .add_llm_step("answer", prompt="Answer: {question}", output="answer")
    .build()
)
```
Observability Stack
```bash
# Start Prometheus + Grafana + Jaeger
cd deploy && docker compose up -d

# Access:
#   Grafana:    http://localhost:3000
#   Prometheus: http://localhost:9090
#   Jaeger:     http://localhost:16686
```
See Dashboard Documentation for panel descriptions, metrics reference, and troubleshooting.
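Conceptually, every workflow run and every step executes inside an OpenTelemetry span, so traces in Jaeger mirror the declared DAG. Here is a sketch of that shape using the standard opentelemetry-sdk; the span and attribute names below are assumptions for illustration, not AgentLoom's actual span schema:

```python
# Each step as a child span of the workflow span. Span/attribute names here
# are illustrative assumptions, not AgentLoom's actual schema.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("agentloom-demo")
with tracer.start_as_current_span("workflow:classify-and-respond"):
    with tracer.start_as_current_span("step:classify") as span:
        span.set_attribute("llm.provider", "openai")
        span.set_attribute("llm.model", "gpt-4o-mini")
        span.set_attribute("cost.usd", 0.0004)
```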
Why not autonomous agents?
Most LLM frameworks focus on autonomous agents: self-directed reasoning, multi-agent delegation, unbounded tool loops. This works for demos and open-ended research, but breaks down in production where you need predictable costs, debuggable failures, and SLA compliance.
AgentLoom is not an autonomous agent framework. There are no self-directed agents, no unbounded loops, no emergent behavior. It is a deterministic workflow orchestrator that uses LLMs as execution steps within a declared DAG.
The difference matters:
- You define the DAG, not the LLM. Steps, dependencies, and routing logic are declared upfront in YAML. The model generates text within a step; it does not decide what runs next. Routers use explicit boolean conditions, not LLM judgment.
- Observability is not optional. Every step emits OpenTelemetry traces and Prometheus metrics. You can see exactly what ran, how long it took, and how much it cost. Autonomous agents are notoriously hard to debug; a static DAG with full tracing is not.
- Cost is bounded. Budget limits, circuit breakers, and rate limiters are first-class. A runaway autonomous agent can burn through an API budget in minutes. A workflow with `budget_usd: 0.50` cannot (see the sketch after this list).
- Fallback is structural. If OpenAI is down, the gateway falls back to Anthropic or Ollama automatically. This is a routing decision at the infrastructure level, not an agent "choosing" a provider.
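The budget bound above is just bookkeeping, which is why it can be a hard guarantee. A minimal sketch of the idea, with invented names rather than AgentLoom's internals:

```python
# Illustrative cost ledger: refuse any step that would cross the budget.
# Names are invented for the sketch; not AgentLoom's internals.
class BudgetExceeded(RuntimeError):
    pass

class CostLedger:
    def __init__(self, budget_usd: float) -> None:
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def charge(self, step_id: str, cost_usd: float) -> None:
        if self.spent_usd + cost_usd > self.budget_usd:
            raise BudgetExceeded(
                f"step {step_id!r} would raise spend to "
                f"${self.spent_usd + cost_usd:.4f}, over the ${self.budget_usd:.2f} budget"
            )
        self.spent_usd += cost_usd

ledger = CostLedger(budget_usd=0.50)
ledger.charge("classify", 0.0004)  # ok
ledger.charge("answer", 0.0012)    # ok; a $0.90 charge here would raise BudgetExceeded
```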
Autonomous agent frameworks solve a real problem — open-ended tasks where the execution path cannot be known in advance. But most LLM workloads in production are not open-ended. They are pipelines: classify, enrich, route, generate, validate. For those, you want predictability and control, not autonomy. That is what AgentLoom is for.
Development
```bash
uv sync --group dev --all-extras   # install with all extras
uv run pytest                      # 392 tests, ~5s
uv run ruff check src/ tests/      # lint (ruff replaces flake8+isort)
uv run ruff format src/ tests/     # autoformat
uv run mypy src/                   # strict type checking
```
Pre-commit hooks run ruff automatically on staged files — see CONTRIBUTING.md for the full workflow.
Contributing
See CONTRIBUTING.md for setup instructions, code style, and PR guidelines.
License
MIT