YAML-first framework for building LLM pipelines with LangGraph

YAMLGraph

Python 3.11+ · License: MIT

A YAML-first framework for building LLM pipelines using:

  • YAML Graph Configuration - Declarative pipeline definition with schema validation
  • YAML Prompts - Declarative prompt templates with Jinja2 support
  • Pydantic Models - Structured LLM outputs
  • Multi-Provider LLMs - Support for Anthropic, Mistral, and OpenAI
  • LangGraph - Pipeline orchestration with resume support
  • Human-in-the-Loop - Interrupt nodes for user input
  • Streaming - Token-by-token LLM output
  • Async Support - FastAPI-ready async execution
  • Checkpointers - Memory, SQLite, and Redis state persistence
  • Graph-Relative Prompts - Colocate prompts with graphs
  • JSON Extraction - Auto-extract JSON from LLM responses
  • LangSmith - Observability and tracing
  • JSON Export - Result serialization

What is YAMLGraph?

YAMLGraph is a declarative LLM pipeline orchestration framework that lets you define complex AI workflows entirely in YAML, with no Python required for 60-80% of use cases. Built on LangGraph, it provides multi-provider LLM support (Anthropic, OpenAI, Mistral, Replicate), parallel batch processing via map nodes (using LangGraph Send), LLM-driven conditional routing, and human-in-the-loop interrupts with checkpointing. Pipelines are version-controlled, linted, and observable via LangSmith. The key insight: by constraining the API surface to YAML + Jinja2 templates + Pydantic schemas, YAMLGraph trades some flexibility for dramatically faster prototyping, easier maintenance, and built-in best practices, making it ideal for teams who want production-ready AI pipelines without the complexity of full-code frameworks.

Installation

From PyPI

pip install yamlgraph

# With Redis support for distributed checkpointing
pip install yamlgraph[redis]

From Source

git clone https://github.com/sheikkinen/yamlgraph.git
cd yamlgraph
pip install -e ".[dev]"

Quick Start

1. Create a Prompt

Create prompts/greet.yaml:

system: |
  You are a friendly assistant.

user: |
  Say hello to {name} in a {style} way.

2. Create a Graph

Create graphs/hello.yaml:

version: "1.0"
name: hello-world

nodes:
  greet:
    type: llm
    prompt: greet
    variables:
      name: "{state.name}"
      style: "{state.style}"
    state_key: greeting

edges:
  - from: START
    to: greet
  - from: greet
    to: END
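
The `variables` mapping above binds prompt placeholders to fields in the graph state. As a rough illustration of that idea, the lookup might be sketched as follows. This is not YAMLGraph's actual resolver: `resolve_variables` is a hypothetical helper, and the dotted-path semantics are an assumption based on the `{state.generated.content}` form used later in this README.

```python
import re
from typing import Any

# Matches a whole-string expression such as "{state.name}" or "{state.generated.content}".
STATE_EXPR = re.compile(r"^\{state\.([A-Za-z_][\w.]*)\}$")


def resolve_variables(variables: dict[str, Any], state: dict[str, Any]) -> dict[str, Any]:
    """Replace "{state.<path>}" expressions with values looked up in `state` (hypothetical helper)."""
    resolved = {}
    for key, expr in variables.items():
        match = STATE_EXPR.match(expr) if isinstance(expr, str) else None
        if match is None:
            resolved[key] = expr  # Literal value: pass through unchanged.
            continue
        value: Any = state
        for part in match.group(1).split("."):
            value = value[part]  # Walk nested dicts one path segment at a time.
        resolved[key] = value
    return resolved


state = {"name": "World", "style": "enthusiastic"}
variables = {"name": "{state.name}", "style": "{state.style}"}
prompt_vars = resolve_variables(variables, state)
print("Say hello to {name} in a {style} way.".format(**prompt_vars))
```

In this sketch a missing state field raises `KeyError`, which surfaces misconfigured graphs early; a real implementation may choose a different policy.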

3. Set API Key

export ANTHROPIC_API_KEY=your-key-here
# Or: export MISTRAL_API_KEY=... or OPENAI_API_KEY=...

4. Run It

yamlgraph graph run graphs/hello.yaml --var name="World" --var style="enthusiastic"

Or use the Python API:

from yamlgraph.graph_loader import load_and_compile

graph = load_and_compile("graphs/hello.yaml")
app = graph.compile()
result = app.invoke({"name": "World", "style": "enthusiastic"})
print(result["greeting"])

More Examples

# Content generation pipeline
yamlgraph graph run graphs/yamlgraph.yaml --var topic="AI" --var style=casual

# Sentiment-based routing
yamlgraph graph run graphs/router-demo.yaml --var message="I love this!"

# Self-correction loop (Reflexion pattern)
yamlgraph graph run graphs/reflexion-demo.yaml --var topic="climate change"

# AI agent with shell tools
yamlgraph graph run graphs/git-report.yaml --var input="What changed recently?"

# Web research agent (requires: pip install yamlgraph[websearch])
yamlgraph graph run graphs/web-research.yaml --var topic="LangGraph tutorials"

# Code quality analysis with shell tools
yamlgraph graph run graphs/code-analysis.yaml --var path="yamlgraph" --var package="yamlgraph"

# Implementation agent - analyze code and generate plans
yamlgraph graph run examples/codegen/impl-agent.yaml \
  --var 'story=Add timeout to websearch' --var scope=yamlgraph/tools

# Meta: YAMLGraph brainstorms its own features
yamlgraph graph run graphs/feature-brainstorm.yaml --var focus="tools"

# Parallel fan-out with map nodes
yamlgraph graph run examples/storyboard/animated-character-graph.yaml \
  --var concept="A brave mouse knight" --var model=hidream

Human-in-the-Loop (Interrupt Nodes)

Create interactive workflows that pause for user input:

# graphs/interview.yaml
checkpointer:
  type: memory

nodes:
  ask_name:
    type: interrupt
    message: "What is your name?"
    resume_key: user_name

  greet:
    type: llm
    prompt: greet
    variables:
      name: "{state.user_name}"
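
Run the graph until it pauses, then resume the same thread with the user's answer:

import asyncio

from langgraph.types import Command
from yamlgraph.executor_async import load_and_compile_async, run_graph_async


async def main() -> None:
    app = await load_and_compile_async("graphs/interview.yaml")
    config = {"configurable": {"thread_id": "session-1"}}

    # First run stops at the interrupt node
    result = await run_graph_async(app, {}, config)
    # result["__interrupt__"] contains the question

    # Resume the same thread with the user's answer
    result = await run_graph_async(app, Command(resume="Alice"), config)
    # result["greeting"] contains personalized response


asyncio.run(main())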

Streaming

Token-by-token LLM output for real-time UX:

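import asyncio

from yamlgraph.executor_async import execute_prompt_streaming


async def main() -> None:
    # The greet prompt expects both {name} and {style}
    async for token in execute_prompt_streaming("greet", {"name": "World", "style": "friendly"}):
        print(token, end="", flush=True)


asyncio.run(main())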

Or in YAML nodes:

nodes:
  generate:
    type: llm
    prompt: story
    stream: true
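
To show the consumer side of token streaming without calling a real model, here is a minimal sketch that uses a stand-in async generator. `fake_token_stream` and `collect` are hypothetical names for illustration only; the real `execute_prompt_streaming` may chunk tokens differently.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_token_stream(text: str) -> AsyncIterator[str]:
    """Stand-in for a streaming LLM call: yields one word-sized token at a time."""
    for token in text.split(" "):
        await asyncio.sleep(0)  # Yield control, as a real network read would.
        yield token + " "


async def collect(stream: AsyncIterator[str]) -> str:
    """Print tokens as they arrive and return the assembled text."""
    parts = []
    async for token in stream:
        print(token, end="", flush=True)
        parts.append(token)
    return "".join(parts).strip()


full_text = asyncio.run(collect(fake_token_stream("Hello, World!")))
```

Accumulating the tokens while printing them keeps the full response available for later steps, such as storing it under a `state_key`.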

Graph-Relative Prompts

Colocate prompts with graphs for cleaner project structures:

# questionnaires/audit/graph.yaml
defaults:
  prompts_relative: true  # Resolve prompts from graph directory

nodes:
  opening:
    type: llm
    prompt: prompts/opening  # → questionnaires/audit/prompts/opening.yaml

JSON Extraction

Auto-extract JSON from LLM responses wrapped in markdown:

nodes:
  extract:
    type: llm
    prompt: extract_fields
    state_key: data
    parse_json: true  # {"key": "value"} instead of "```json..."
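
As a rough idea of what extracting JSON from a markdown-wrapped response involves, the sketch below pulls the first fenced block out of a response and parses it. This is an assumption about the general technique, not YAMLGraph's implementation; `extract_json` is a hypothetical helper.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # Built programmatically to avoid a literal fence inside this example.
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def extract_json(text: str) -> Any:
    """Return parsed JSON from a raw or markdown-fenced LLM response (hypothetical helper)."""
    match = FENCED_BLOCK.search(text)
    candidate = match.group(1) if match else text
    return json.loads(candidate.strip())


response = "Here you go:\n" + FENCE + 'json\n{"key": "value"}\n' + FENCE
print(extract_json(response))
```

If the response contains no valid JSON, `json.loads` raises `json.JSONDecodeError`, so callers can decide whether to retry or fall back to the raw text.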

CLI Utilities

yamlgraph graph list                         # List available graphs
yamlgraph graph info graphs/router-demo.yaml # Show graph structure
yamlgraph graph validate graphs/*.yaml       # Validate graph schemas
yamlgraph graph lint graphs/*.yaml           # Lint graphs for common issues
yamlgraph graph codegen graphs/my-graph.yaml # Generate TypedDict for IDE support
yamlgraph graph mermaid graphs/my-graph.yaml # Generate Mermaid diagram

IDE Type Support

Generate TypedDict code from your graph for IDE autocomplete and type checking:

# Generate to stdout
yamlgraph graph codegen graphs/interview-demo.yaml

# Write to file
yamlgraph graph codegen graphs/interview-demo.yaml -o interview_state.py

# Include base fields (thread_id, errors, etc.)
yamlgraph graph codegen graphs/interview-demo.yaml --include-base

Use the generated types in your Python code:

from interview_state import InterviewDemoState

def process_result(state: InterviewDemoState) -> None:
    print(state["questions"])  # IDE autocomplete works!

JSON Schema for YAML Validation

Export JSON Schema for VS Code YAML extension support:

# Export schema to file
yamlgraph schema export --output schemas/graph-schema.json

# Print to stdout
yamlgraph schema export

# Get path to bundled schema
yamlgraph schema path

Configure VS Code (.vscode/settings.json):

{
  "yaml.schemas": {
    "./schemas/graph-schema.json": ["**/graphs/*.yaml", "**/graph.yaml"]
  }
}

Documentation

📚 Start here: reference/README.md - Complete index of all 18 reference docs

Reading Order

| Level | Document | Description |
|-------|----------|-------------|
| 🟢 Beginner | Quick Start | Create your first pipeline in 5 minutes |
| 🟢 Beginner | Graph YAML | Node types, edges, tools, state |
| 🟢 Beginner | Prompt YAML | Schema and template syntax |
| 🟡 Intermediate | Common Patterns | Router, loops, agents |
| 🟡 Intermediate | Map Nodes | Parallel fan-out processing |
| 🟡 Intermediate | Interrupt Nodes | Human-in-the-loop |
| 🔴 Advanced | Subgraph Nodes | Modular graph composition |
| 🔴 Advanced | Async Usage | FastAPI integration |
| 🔴 Advanced | Checkpointers | State persistence |

Architecture

🏗️ For core developers: See ARCHITECTURE.md for internal design, extension points, and contribution guidelines.

Data Flow

flowchart TB
    subgraph Input["📥 Input Layer"]
        CLI["CLI Command"]
        YAML_G["graphs/*.yaml"]
        YAML_P["prompts/*.yaml"]
    end

    subgraph Core["⚙️ Core Processing"]
        GL["graph_loader.py<br/>YAML → StateGraph"]
        NF["node_factory.py<br/>Create Node Functions"]
        EH["error_handlers.py<br/>Skip/Retry/Fail/Fallback"]
        EX["executor.py<br/>Prompt Execution"]
    end

    subgraph LLM["🤖 LLM Layer"]
        LF["llm_factory.py"]
        ANT["Anthropic"]
        MIS["Mistral"]
        OAI["OpenAI"]
    end

    subgraph State["💾 State Layer"]
        SB["state_builder.py<br/>Dynamic TypedDict"]
        CP["checkpointer.py<br/>SQLite Persistence"]
        DB[(SQLite DB)]
    end

    subgraph Output["📤 Output Layer"]
        EXP["export.py"]
        JSON["JSON Export"]
        LS["LangSmith Traces"]
    end

    CLI --> GL
    YAML_G --> GL
    YAML_P --> EX
    GL --> NF
    NF --> EH
    EH --> EX
    EX --> LF
    LF --> ANT & MIS & OAI
    GL --> SB
    SB --> CP
    CP --> DB
    EX --> EXP
    EXP --> JSON
    EX --> LS

Directory Structure

yamlgraph/
├── README.md
├── pyproject.toml        # Package definition with CLI entry point and dependencies
├── .env.sample           # Environment template
│
├── graphs/               # YAML graph definitions
│   ├── yamlgraph.yaml    # Main pipeline definition
│   ├── router-demo.yaml  # Tone-based routing demo
│   ├── reflexion-demo.yaml # Self-refinement loop demo
│   └── git-report.yaml   # AI agent demo with shell tools
│
├── yamlgraph/            # Main package
│   ├── __init__.py       # Package exports
│   ├── builder.py        # Graph builders (loads from YAML)
│   ├── graph_loader.py   # YAML → LangGraph compiler
│   ├── config.py         # Centralized configuration
│   ├── executor.py       # YAML prompt executor
│   ├── cli.py            # CLI commands
│   │
│   ├── models/           # Pydantic models
│   │   ├── __init__.py
│   │   ├── schemas.py    # Framework schemas (ErrorType, PipelineError, GenericReport)
│   │   ├── state_builder.py  # Dynamic state generation from YAML
│   │   └── graph_schema.py   # Pydantic schema validation
│   │
│   ├── tools/            # Tool execution
│   │   ├── __init__.py
│   │   ├── shell.py      # Shell command executor
│   │   ├── nodes.py      # Tool node factory
│   │   └── agent.py      # Agent node factory
│   │
│   ├── storage/          # Persistence layer
│   │   ├── __init__.py
│   │   ├── database.py   # SQLite wrapper
│   │   └── export.py     # JSON export
│   │
│   └── utils/            # Utilities
│       ├── __init__.py
│       ├── llm_factory.py # Multi-provider LLM creation
│       └── langsmith.py  # Tracing helpers
│
├── prompts/              # YAML prompt templates
│   ├── greet.yaml
│   ├── analyze.yaml
│   ├── analyze_list.yaml # Jinja2 example with loops/filters
│   ├── generate.yaml
│   ├── summarize.yaml
│   └── router-demo/      # Tone routing prompts
│       ├── classify_tone.yaml
│       ├── respond_positive.yaml
│       ├── respond_negative.yaml
│       └── respond_neutral.yaml
│
├── reference/            # YAML configuration reference docs
│   ├── README.md         # Overview and key concepts
│   ├── quickstart.md     # 5-minute getting started guide
│   ├── graph-yaml.md     # Graph YAML reference
│   ├── prompt-yaml.md    # Prompt YAML reference
│   └── patterns.md       # Common patterns and examples
│
├── tests/                # Test suite
│   ├── conftest.py       # Shared fixtures
│   ├── unit/             # Unit tests
│   └── integration/      # Integration tests
│
└── outputs/              # Generated files (gitignored)

Pipeline Flow

graph TD
    A["📝 generate"] -->|content| B{should_continue}
    B -->|"✓ content exists"| C["🔍 analyze"]
    B -->|"✗ error/empty"| F["🛑 END"]
    C -->|analysis| D["📊 summarize"]
    D -->|final_summary| F

    style A fill:#e1f5fe
    style C fill:#fff3e0
    style D fill:#e8f5e9
    style F fill:#fce4ec

Node Outputs

| Node | Output Type | Description |
|------|-------------|-------------|
| generate | Inline schema | Title, content, word_count, tags |
| analyze | Inline schema | Summary, key_points, sentiment, confidence |
| summarize | str | Final combined summary |

Output schemas are defined inline in YAML prompt files using the schema: block.
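
To make the inline schema idea concrete, the sketch below checks a result dict against a schema written with the same type names as the examples in this README. It uses plain `isinstance` checks as a stand-in; how YAMLGraph actually turns a `schema:` block into a validated model is not shown here, and `validate_output` is a hypothetical helper.

```python
from typing import Any

# Type names taken from the inline schemas shown in this README.
TYPE_CHECKS = {
    "str": lambda v: isinstance(v, str),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "bool": lambda v: isinstance(v, bool),
    "list[str]": lambda v: isinstance(v, list) and all(isinstance(i, str) for i in v),
}


def validate_output(schema: dict[str, str], data: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means `data` fits `schema`."""
    problems = []
    for field, type_name in schema.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not TYPE_CHECKS[type_name](data[field]):
            problems.append(f"{field}: expected {type_name}")
    return problems


schema = {"title": "str", "content": "str", "word_count": "int", "tags": "list[str]"}
good = {"title": "AI", "content": "Some text", "word_count": 300, "tags": ["ml"]}
bad = {"title": "AI", "content": "Some text", "word_count": "300"}

print(validate_output(schema, good))
print(validate_output(schema, bad))
```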

Resume Flow

Pipelines can be resumed from any checkpoint. The resume behavior uses skip_if_exists: nodes check if their output already exists in state and skip LLM calls if so.

graph LR
    subgraph "Resume after 'analyze' completed"
        A1["Load State"] --> B1["analyze (skipped)"] --> C1["summarize"] --> D1["END"]
    end

# Resume an interrupted run (using checkpointer)
yamlgraph graph run graphs/my-graph.yaml --thread abc123

When resumed:

  • Nodes with existing outputs are skipped (no duplicate LLM calls)
  • Only nodes without outputs in state actually run
  • State is preserved via SQLite checkpointing
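
The skip-on-resume behavior described above can be sketched as a small wrapper around a node function. This is an illustration under stated assumptions rather than the library's code: `skip_if_exists` here is a hypothetical Python helper, and `fake_analyze` stands in for an LLM call.

```python
from typing import Any, Callable

State = dict[str, Any]


def skip_if_exists(state_key: str, node: Callable[[State], Any]) -> Callable[[State], State]:
    """Wrap `node` so it only runs when `state_key` is missing from the state."""

    def wrapped(state: State) -> State:
        if state.get(state_key) is not None:
            return {}  # Nothing to update: the earlier result is reused.
        return {state_key: node(state)}

    return wrapped


calls = []


def fake_analyze(state: State) -> str:
    calls.append("analyze")  # Stands in for an LLM call.
    return f"analysis of {state['generated']}"


analyze = skip_if_exists("analysis", fake_analyze)

fresh = {"generated": "draft"}
fresh.update(analyze(fresh))      # Runs: no "analysis" in state yet.

resumed = dict(fresh)             # Pretend this state was restored from a checkpoint.
resumed.update(analyze(resumed))  # Skipped: "analysis" already exists.
print(calls)
```

Returning an empty update for skipped nodes keeps the wrapper compatible with state-merging executors, where each node returns only the keys it changed.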

Key Patterns

1. YAML Prompt Templates

Simple Templating (Basic Substitution):

# prompts/generate.yaml
system: |
  You are a creative content writer...

user: |
  Write about: {topic}
  Target length: approximately {word_count} words

Advanced Templating (Jinja2):

# prompts/analyze_list.yaml
template: |
  Analyze the following {{ items|length }} items:

  {% for item in items %}
  ### {{ loop.index }}. {{ item.title }}
  Topic: {{ item.topic }}
  {% if item.tags %}
  Tags: {{ item.tags | join(", ") }}
  {% endif %}
  {% endfor %}

Template Features:

  • Auto-detection: Uses Jinja2 if {{ or {% present, otherwise simple formatting
  • Loops: {% for item in items %}...{% endfor %}
  • Conditionals: {% if condition %}...{% endif %}
  • Filters: {{ text[:50] }}, {{ items | join(", ") }}, {{ name | upper }}
  • Backward compatible: Existing {variable} prompts work unchanged
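
The auto-detection rule in the list above is simple enough to sketch directly. The function names below are hypothetical, and only the simple-format path is rendered so the example stays dependency-free.

```python
def uses_jinja2(template: str) -> bool:
    """Detect Jinja2 syntax by looking for its expression or statement markers."""
    return "{{" in template or "{%" in template


def render_simple(template: str, variables: dict[str, object]) -> str:
    """Render a simple {variable} template with str.format."""
    return template.format(**variables)


simple = "Write about: {topic}"
advanced = "{% for item in items %}{{ item.title }}{% endfor %}"

for template in (simple, advanced):
    engine = "jinja2" if uses_jinja2(template) else "simple"
    print(f"{engine}: {template}")

print(render_simple(simple, {"topic": "AI"}))
```

One consequence of a marker-based rule like this: a simple template that escapes literal braces as `{{` would be classified as Jinja2, so it is worth checking prompts that embed JSON examples.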

2. Structured Executor

from yamlgraph.executor import execute_prompt
from yamlgraph.models import GenericReport

result = execute_prompt(
    "generate",
    variables={"topic": "AI", "word_count": 300},
    output_model=GenericReport,
)
print(result.title)  # Typed access!

3. Multi-Provider LLM Support

from yamlgraph.executor import execute_prompt

# Use default provider (Anthropic)
result = execute_prompt(
    "greet",
    variables={"name": "Alice", "style": "formal"},
)

# Switch to Mistral
result = execute_prompt(
    "greet",
    variables={"name": "Bob", "style": "casual"},
    provider="mistral",
)

# Or set via environment variable
# PROVIDER=openai yamlgraph graph run ...

Supported providers:

  • Anthropic (default): Claude models
  • Mistral: Mistral Large and other models
  • OpenAI: GPT-4 and other models

Provider selection priority:

  1. Function parameter: execute_prompt(..., provider="mistral")
  2. YAML metadata: provider: mistral in prompt file
  3. Environment variable: PROVIDER=mistral
  4. Default: anthropic
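
The priority order above amounts to a short fallback chain. A minimal sketch, assuming the prompt file's metadata is available as a mapping (`resolve_provider` is a hypothetical name, not a YAMLGraph function):

```python
import os
from typing import Mapping, Optional


def resolve_provider(
    parameter: Optional[str] = None,
    prompt_metadata: Optional[Mapping[str, str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a provider: parameter, then prompt metadata, then PROVIDER, then the default."""
    env = os.environ if environ is None else environ
    if parameter:
        return parameter
    if prompt_metadata and prompt_metadata.get("provider"):
        return prompt_metadata["provider"]
    return env.get("PROVIDER") or "anthropic"


print(resolve_provider("mistral", {"provider": "openai"}, {"PROVIDER": "openai"}))
print(resolve_provider(None, {"provider": "openai"}, {"PROVIDER": "mistral"}))
print(resolve_provider(None, {}, {"PROVIDER": "mistral"}))
print(resolve_provider(None, None, {}))
```

Passing the environment in as an argument keeps the function easy to test without mutating `os.environ`.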

4. YAML Graph Configuration

Pipelines are defined declaratively in YAML and compiled to LangGraph:

# graphs/yamlgraph.yaml
version: "1.0"
name: yamlgraph-demo
description: Content generation pipeline

defaults:
  provider: mistral
  temperature: 0.7

nodes:
  generate:
    type: llm
    prompt: generate
    output_schema:  # Inline schema - no Python model needed!
      title: str
      content: str
      word_count: int
      tags: list[str]
    temperature: 0.8
    variables:
      topic: "{state.topic}"
      word_count: "{state.word_count}"
      style: "{state.style}"
    state_key: generated

  analyze:
    type: llm
    prompt: analyze
    output_schema:  # Inline schema
      summary: str
      key_points: list[str]
      sentiment: str
      confidence: float
    temperature: 0.3
    variables:
      content: "{state.generated.content}"
    state_key: analysis
    requires: [generated]

  summarize:
    type: llm
    prompt: summarize
    temperature: 0.5
    state_key: final_summary
    requires: [generated, analysis]

edges:
  - from: START
    to: generate
  - from: generate
    to: analyze
    condition: continue
  - from: generate
    to: END
    condition: end
  - from: analyze
    to: summarize
  - from: summarize
    to: END

Load and run:

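from yamlgraph.graph_loader import load_and_compile

# Initial state supplies the variables the nodes read from state
initial_state = {"topic": "AI", "word_count": 300, "style": "casual"}

graph = load_and_compile("graphs/yamlgraph.yaml").compile()
result = graph.invoke(initial_state)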

5. State Persistence

Use LangGraph checkpointers for state persistence:

# In graph.yaml
checkpointer:
  type: sqlite
  path: ~/.yamlgraph/checkpoints.db

# Resume from checkpoint
yamlgraph graph run graphs/my-graph.yaml --thread my-session

6. LangSmith Tracing

from yamlgraph.utils.langsmith import print_run_tree

print_run_tree(verbose=True)
# 📊 Execution Tree:
# └─ yamlgraph_pipeline (12.3s) ✅
#    ├─ generate (5.2s) ✅
#    ├─ analyze (3.1s) ✅
#    └─ summarize (4.0s) ✅

Self-Correcting Pipelines

Use LangSmith tools to let agents inspect previous runs and fix errors:

from yamlgraph.utils.langsmith import get_run_details, get_run_errors, get_failed_runs

# Get details of the last run
details = get_run_details()  # or get_run_details("specific-run-id")
print(details["status"])  # "success" or "error"

# Get all errors from a run and its child nodes
errors = get_run_errors()
for e in errors:
    print(f"{e['node']}: {e['error']}")

# List recent failed runs
failures = get_failed_runs(limit=5)

As agent tools (see reference/langsmith-tools.md):

tools:
  check_last_run:
    type: python
    module: yamlgraph.tools.langsmith_tools
    function: get_run_details_tool
    description: "Get status and errors from the last pipeline run"

  get_errors:
    type: python
    module: yamlgraph.tools.langsmith_tools
    function: get_run_errors_tool
    description: "Get detailed error info from a run"

nodes:
  self_correct:
    type: agent
    prompt: error_analyzer
    tools: [check_last_run, get_errors]
    max_iterations: 3

7. Shell Tools & Agent Nodes

Define shell tools and let the LLM decide when to use them:

# graphs/git-report.yaml
tools:
  recent_commits:
    type: shell
    command: git log --oneline -n {count}
    description: "List recent commits"

  changed_files:
    type: shell
    command: git diff --name-only HEAD~{n}
    description: "List files changed in last n commits"

nodes:
  analyze:
    type: agent              # LLM decides which tools to call
    prompt: git_analyst
    tools: [recent_commits, changed_files]
    max_iterations: 8
    state_key: analysis

Run the git analysis agent:

yamlgraph graph run graphs/git-report.yaml --var input="What changed recently?"
yamlgraph graph run graphs/git-report.yaml --var input="Summarize the test directory"

8. Web Search Tools

Enable agents to search the web using DuckDuckGo (no API key required):

pip install yamlgraph[websearch]

# graphs/web-research.yaml
tools:
  search_web:
    type: websearch
    provider: duckduckgo
    max_results: 5
    description: "Search the web for current information"

nodes:
  research:
    type: agent
    prompt: web-research/researcher
    tools: [search_web]
    max_iterations: 5
    state_key: research

Run web research:

yamlgraph graph run graphs/web-research.yaml --var topic="LangGraph tutorials"

9. Code Quality Analysis

Run automated code analysis with shell-based quality tools:

# graphs/code-analysis.yaml
state:
  path: str      # Directory to analyze
  package: str   # Package name for coverage

tools:
  run_ruff:
    type: shell
    command: ruff check {path} --output-format=text 2>&1
    description: "Run ruff linter for code style issues"

  run_tests:
    type: shell
    command: python -m pytest {path} -q --tb=no 2>&1 | tail -10
    description: "Run pytest"

  run_bandit:
    type: shell
    command: bandit -r {path} -ll -q 2>&1
    description: "Security vulnerability scanner"

nodes:
  run_analysis:
    type: agent
    prompt: code-analysis/analyzer
    tools: [run_ruff, run_tests, run_coverage, run_bandit, run_radon, run_vulture]
    max_iterations: 12
    state_key: analysis_results

  generate_recommendations:
    type: llm
    prompt: code-analysis/recommend
    requires: [analysis_results]
    state_key: recommendations

Run code analysis (yamlgraph analyzes itself!):

yamlgraph graph run graphs/code-analysis.yaml --var path="yamlgraph" --var package="yamlgraph"

Tool types:

  • type: shell - Execute shell commands with variable substitution
  • type: websearch - Web search via DuckDuckGo (provider: duckduckgo)
  • type: python - Execute custom Python functions

Node types:

  • type: llm - Standard LLM call with structured output
  • type: router - Classify and route to different paths
  • type: map - Parallel fan-out over lists with Send()
  • type: python - Execute custom Python functions
  • type: agent - LLM loop that autonomously calls tools

Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| ANTHROPIC_API_KEY | Yes* | Anthropic API key (* if using Anthropic) |
| MISTRAL_API_KEY | No | Mistral API key (required if using Mistral) |
| OPENAI_API_KEY | No | OpenAI API key (required if using OpenAI) |
| PROVIDER | No | Default LLM provider (anthropic/mistral/openai) |
| ANTHROPIC_MODEL | No | Anthropic model (default: claude-sonnet-4-20250514) |
| MISTRAL_MODEL | No | Mistral model (default: mistral-large-latest) |
| OPENAI_MODEL | No | OpenAI model (default: gpt-4o) |
| LANGCHAIN_TRACING | No | Enable LangSmith tracing |
| LANGCHAIN_API_KEY | No | LangSmith API key |
| LANGCHAIN_ENDPOINT | No | LangSmith endpoint URL |
| LANGCHAIN_PROJECT | No | LangSmith project name |

Testing

Run the test suite:

# Run all tests
pytest tests/ -v

# Run only unit tests
pytest tests/unit/ -v

# Run only integration tests
pytest tests/integration/ -v

# Run with coverage report
pytest tests/ --cov=yamlgraph --cov-report=term-missing

# Run with HTML coverage report
pytest tests/ --cov=yamlgraph --cov-report=html
# Then open htmlcov/index.html

Current coverage: 92% overall, 90% on graph_loader, 100% on builder/llm_factory.

Extending the Pipeline

Adding a New Node (YAML-First Approach)

Let's add a "fact_check" node that verifies generated content:

Step 1: Define the output schema (yamlgraph/models/schemas.py):

from pydantic import BaseModel, Field


class FactCheck(BaseModel):
    """Structured fact-checking output."""

    claims: list[str] = Field(description="Claims identified in content")
    verified: bool = Field(description="Whether claims are verifiable")
    confidence: float = Field(ge=0.0, le=1.0, description="Verification confidence")
    notes: str = Field(description="Additional context")

Step 2: Create the prompt (prompts/fact_check.yaml):

system: |
  You are a fact-checker. Analyze the given content and identify
  claims that can be verified. Assess the overall verifiability.

user: |
  Content to fact-check:
  {content}

  Identify key claims and assess their verifiability.

Step 3: State is auto-generated

State fields are now generated automatically from your YAML graph config. The state_key in your node config determines where output is stored:

# Node output stored in state.fact_check automatically
fact_check:
  type: llm
  prompt: fact_check
  state_key: fact_check  # This creates the state field

Step 4: Add the node to your graph (graphs/yamlgraph.yaml):

nodes:
  generate:
    type: llm
    prompt: generate
    output_schema:  # Inline schema - no Python model needed!
      title: str
      content: str
    variables:
      topic: "{state.topic}"
    state_key: generated

  fact_check:  # ✨ New node - just YAML!
    type: llm
    prompt: fact_check
    output_schema:  # Define schema inline
      is_accurate: bool
      issues: list[str]
    requires: [generated]
    variables:
      content: "{state.generated.content}"
    state_key: fact_check

  analyze:
    # ... existing config ...

edges:
  - from: START
    to: generate
  - from: generate
    to: fact_check
    condition:
      type: has_value
      field: generated
  - from: fact_check
    to: analyze
  # ... rest of edges ...

That's it! No Python node code needed. The graph loader dynamically generates the node function.

Resulting pipeline:

graph TD
    A[generate] --> B{has generated?}
    B -->|yes| C[fact_check]
    C --> D[analyze]
    D --> E[summarize]
    E --> F[END]
    B -->|no| F

Adding Conditional Branching

Route to different nodes based on analysis results (all in YAML):

edges:
  - from: analyze
    to: rewrite_node
    condition:
      type: field_equals
      field: analysis.sentiment
      value: negative

  - from: analyze
    to: enhance_node
    condition:
      type: field_equals
      field: analysis.sentiment
      value: positive

  - from: analyze
    to: summarize  # Default fallback

Add a New Prompt

  1. Create prompts/new_prompt.yaml:

     system: Your system prompt...
     user: Your user prompt with {variables}...

  2. Call it:

     from yamlgraph.executor import execute_prompt

     result = execute_prompt("new_prompt", variables={"var": "value"})

Add Structured Output

  1. Define model in yamlgraph/models/schemas.py:

     from pydantic import BaseModel, Field

     class MyOutput(BaseModel):
         field: str = Field(description="...")

  2. Use with executor:

     result = execute_prompt("prompt", output_model=MyOutput)

Known Issues & Future Improvements

This project demonstrates solid production patterns with declarative YAML-based configuration.

Completed Features

| Feature | Status | Notes |
|---------|--------|-------|
| YAML Graph Configuration | ✅ | Declarative pipeline definition in graphs/yamlgraph.yaml |
| Jinja2 Templating | ✅ | Hybrid auto-detection (simple {var} + advanced Jinja2) |
| Multi-Provider LLMs | ✅ | Factory pattern supporting Anthropic/Mistral/OpenAI |
| Dynamic Node Generation | ✅ | Nodes compiled from YAML at runtime |

Implemented Patterns

| Feature | Status | Notes |
|---------|--------|-------|
| Branching/Routing | ✅ | type: router for LLM-based conditional routing |
| Self-Correction Loops | ✅ | Reflexion pattern with critique → refine cycles |
| Tool/Agent Patterns | ✅ | Shell tools + agent nodes with LangChain tool binding |
| Per-Node Error Handling | ✅ | on_error: skip/retry/fail/fallback |
| Conversation Memory | ✅ | Message accumulation via AgentState.messages |
| Native Checkpointing | ✅ | SqliteSaver from langgraph-checkpoint-sqlite |
| State Export | ✅ | JSON/Markdown export with export_result() |
| LangSmith Share Links | ✅ | Auto-generate public trace URLs after runs |

LangGraph Features

| Feature | Status | Notes |
|---------|--------|-------|
| Fan-out/Fan-in | ✅ | type: map with Send() for item-level parallelism |
| Human-in-the-Loop | ✅ | type: interrupt nodes with resume_key |
| Streaming | ✅ | stream: true on nodes, execute_prompt_streaming() |
| Sub-graphs | ✅ | type: subgraph for nested graph composition |

Potential Enhancements

Short-term (Quick Wins)

  1. Add in operator to conditions - Support status in ["done", "complete"] expressions
  2. Document agent max_iterations - Expose in YAML schema for agent nodes
  3. Add --dry-run flag - Validate graph without execution

Medium-term (Feature Improvements)

  1. Async map node execution - Use asyncio.gather() for parallel branches
  2. State field collision warnings - Log when YAML fields override base fields
  3. Map node error aggregation - Summary with success/failure counts per branch
  4. Add streaming - --stream CLI flag for real-time output

Long-term (Architecture)

  1. Plugin system - Custom node types via entry points
  2. Hot-reload for development - File watcher for prompt/graph YAML changes
  3. OpenTelemetry integration - Complement LangSmith with standard observability
  4. Sub-graphs - Nested graph composition for complex workflows
  5. Human-in-the-loop - interrupt_before / interrupt_after demonstration

Security

Shell Command Injection Protection

Shell tools (defined in graphs/*.yaml with type: shell) execute commands with variable substitution. All user-provided variable values are sanitized using shlex.quote() to prevent shell injection attacks.

# In graph YAML - command template is trusted
tools:
  git_log:
    type: shell
    command: "git log --author={author} -n {count}"

Security model:

  • ✅ Command templates (from YAML) are trusted configuration
  • ✅ Variable values (from user input/LLM) are escaped with shlex.quote()
  • ✅ Complex types (lists, dicts) are JSON-serialized then quoted
  • ✅ No eval() - condition expressions parsed with regex, not evaluated

Example protection:

# Malicious input is safely escaped
variables = {"author": "$(rm -rf /)"}
# Executed as: git log --author='$(rm -rf /)'  (quoted, harmless)

See yamlgraph/tools/shell.py for implementation details.
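
For readers who want to see the quoting step in isolation, the sketch below applies the security model described above using only the standard library. It is an illustration, not the code in yamlgraph/tools/shell.py; `render_command` is a hypothetical helper.

```python
import json
import shlex
from typing import Any


def render_command(template: str, variables: dict[str, Any]) -> str:
    """Fill a trusted command template with shell-quoted variable values."""
    quoted = {}
    for name, value in variables.items():
        if isinstance(value, (list, dict)):
            value = json.dumps(value)  # Serialize complex values before quoting.
        quoted[name] = shlex.quote(str(value))
    return template.format(**quoted)


command = render_command(
    "git log --author={author} -n {count}",
    {"author": "$(rm -rf /)", "count": 5},
)
print(command)
```

Because every value passes through `shlex.quote()`, the command substitution arrives at the shell as a single literal argument. The template itself is never quoted, which is why it has to come from trusted configuration.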

⚠️ Security Considerations

Shell tools execute real commands on your system. While variables are sanitized:

  1. Command templates are trusted - Only use shell tools from trusted YAML configs
  2. No sandboxing - Commands run with your user permissions
  3. Agent autonomy - Agent nodes may call tools unpredictably
  4. Review tool definitions - Audit tools: section in graph YAML before running

For production deployments, consider:

  • Running in a container with limited permissions
  • Restricting available tools to read-only operations
  • Implementing approval workflows for sensitive operations

Documentation

  • Reference Documentation - Complete guides for YAMLGraph features, node types, and advanced usage
  • Analyzing YAML-Driven LangGraph Repositories (docs/Analyzing YAML-Driven LangGraph Repositories.md) - Comprehensive technical analysis of YAMLGraph's architecture, LangGraph fundamentals, and strategic positioning in the AI orchestration landscape

For examples and tutorials, see the examples/ directory.

License

MIT

Remember

Prompts in yaml templates, graphs in yaml, shared executor, pydantic, data stored in sqlite, langgraph, langsmith, venv, tdd red-green-refactor, modules < 400 lines, kiss
