TDD/OKR-driven agentic workflow framework with verifiable state deliverables

fsm-agent-flow

A TDD/OKR-driven workflow framework for LLM-powered applications. Each state declares an objective and key results that get validated before advancing — like running tests after writing code.

Why

Most LLM workflow frameworks either give you too little structure (raw prompt chains) or too much (rigid agent frameworks). fsm-agent-flow sits in the middle:

  • States have acceptance criteria — key results are checked before moving on
  • Failed states retry with feedback — the validator tells the LLM what went wrong
  • The framework doesn't care what happens inside a state — call an LLM, run a script, bridge to CrewAI, or nest another workflow
  • No global singletons — tools are scoped per state, contexts are explicit
  • No heavy dependencies — zero required runtime deps, bring your own LLM client

Install

pip install fsm-agent-flow

# With LLM adapters
pip install fsm-agent-flow[openai]
pip install fsm-agent-flow[litellm]
pip install fsm-agent-flow[all]

Quick Start

from fsm_agent_flow import Workflow, StateSpec, KeyResult, ExecutionContext
from fsm_agent_flow.llm.openai import OpenAIAdapter

# Tools are just functions
def search(query: str) -> str:
    """Search the web."""
    return f"Results for: {query}"

# States declare what they must accomplish
research = StateSpec(
    name="research",
    objective="Gather information on the topic",
    key_results=[
        KeyResult("has_content", "At least 200 chars", check=lambda o: len(str(o)) > 200),
        KeyResult("has_sources", "Cites sources"),  # LLM-validated (no check function)
    ],
    execute=lambda ctx: ctx.llm.run_with_tools(
        system_prompt="Research the topic using the search tool.",
        user_message=ctx.input,
    ),
    tools=[search],
    max_retries=2,
    is_initial=True,
)

writing = StateSpec(
    name="writing",
    objective="Write a structured report",
    key_results=[
        KeyResult("has_sections", "Has clear sections", check=lambda o: str(o).count("#") >= 2),
    ],
    execute=lambda ctx: ctx.llm.run_with_tools(
        system_prompt="Write a report from this research.",
        user_message=str(ctx.input),
    ),
    is_final=True,
)

# One call to run the whole workflow
llm = OpenAIAdapter(model="gpt-4o")
wf = Workflow(
    objective="Research and report",
    states=[research, writing],
    transitions={"research": "writing"},
    llm=llm,
    validator_llm=llm,
)
result = wf.run("quantum computing")

Core Concepts

States with Objectives and Key Results

Every state has an objective (what it does) and key results (how we verify it succeeded):

StateSpec(
    name="analyze",
    objective="Analyze the dataset and identify trends",
    key_results=[
        # Programmatic check — runs as code
        KeyResult("has_trends", "Identified at least 3 trends",
                  check=lambda o: len(o.get("trends", [])) >= 3),
        # LLM-validated — no check function, validator LLM evaluates
        KeyResult("actionable", "Insights are actionable with recommendations"),
    ],
    execute=my_analyze_function,
    max_retries=3,
)

The TDD Validation Loop

When a state executes, the framework:

  1. Calls state.execute(ctx) to produce output
  2. Runs all key result checks (programmatic first, then LLM)
  3. If any fail: retries with ctx.feedback explaining what went wrong
  4. If all pass: records the output and advances to the next state
  5. If retries exhausted: raises MaxRetriesExceeded
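The loop above can be sketched in a few lines of plain Python. This is an illustration only, not the framework's implementation: `run_state`, `SketchKeyResult`, and the exception class defined here are hypothetical stand-ins, LLM-based validation is left out, and it assumes `max_retries=2` means up to three attempts in total.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


class MaxRetriesExceeded(Exception):
    """Stand-in: raised when a state still fails after all retries."""


@dataclass
class SketchKeyResult:
    # Hypothetical stand-in for KeyResult, not the library class.
    name: str
    description: str
    check: Optional[Callable[[Any], bool]] = None  # None would mean "LLM-validated"


def run_state(execute, key_results, max_retries=2):
    """Execute one state, retrying with feedback until every check passes."""
    feedback = None
    for attempt in range(max_retries + 1):  # assumption: first try + max_retries
        output = execute(attempt, feedback)
        # Only programmatic checks run here; key results without a check are skipped.
        failed = [kr for kr in key_results if kr.check and not kr.check(output)]
        if not failed:
            return output
        feedback = "Failed: " + "; ".join(kr.description for kr in failed)
    raise MaxRetriesExceeded(feedback)


# The first draft is too short, so the state is retried once and then passes.
drafts = ["short", "x" * 250]
result = run_state(
    execute=lambda attempt, feedback: drafts[attempt],
    key_results=[
        SketchKeyResult("has_content", "At least 200 chars", lambda o: len(o) > 200)
    ],
)
```

In the real framework the attempt number and feedback arrive through `ctx.retry_count` and `ctx.feedback` rather than as positional arguments.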

Conditional & Bidirectional Transitions

Transitions aren't limited to simple linear flows. States can branch, loop back, and route conditionally — like a real finite state machine:

# Static (linear): always goes to the same next state
transitions = {"research": "writing", "writing": None}

# Conditional (branching / bidirectional): route based on output
transitions = {
    "check_city": {"need_weather": "get_weather", "ready": "print_result"},
    "get_weather": {"wrong_city": "get_weather", "default": "check_city"},
    "print_result": None,
}

# Dynamic (callable): function decides next state
transitions = {
    "decide": lambda output: "approve" if output.get("score", 0) > 0.8 else "reject",
    "approve": None,
    "reject": None,
}

How conditional routing works: When a transition is a dict, the framework resolves the next state by checking the execute function's output:

  1. If output is a dict with a "_transition" key, its value selects the branch
  2. If output is a str matching a key in the transition dict, use it
  3. Otherwise, fall back to the "default" key

def check_city(ctx: ExecutionContext):
    weather = ctx.shared.get("weather")
    if weather and weather["city"] == ctx.shared.get("target_city"):
        return {"_transition": "ready", "report": weather}
    return {"_transition": "need_weather"}

def get_weather(ctx: ExecutionContext):
    city = ctx.shared.get("target_city")
    data = fetch_weather_api(city)
    if data["city"] != city:
        return {"_transition": "wrong_city"}  # Loop back to retry
    ctx.shared.set("weather", data)
    return {"_transition": "default"}  # Return to check_city

This enables bidirectional flows (state A calls state B, B returns to A), retry loops, and decision branching — all without leaving the FSM model.
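To make the resolution order concrete, here is a minimal sketch of how the three transition forms (static, conditional, dynamic) could be resolved. `resolve_next` is a hypothetical helper written for this illustration; the framework's internal function, and how it handles an unknown branch name, may differ.

```python
from typing import Any, Callable, Dict, Optional, Union

Transition = Union[None, str, Dict[str, str], Callable[[Any], Optional[str]]]


def resolve_next(transition: Transition, output: Any) -> Optional[str]:
    """Pick the next state name from a transition entry and a state's output."""
    if transition is None or isinstance(transition, str):
        return transition  # static: fixed next state, or None to stop
    if callable(transition):
        return transition(output)  # dynamic: the function decides
    # Conditional: a dict mapping branch name -> next state.
    if isinstance(output, dict) and "_transition" in output:
        return transition[output["_transition"]]  # rule 1
    if isinstance(output, str) and output in transition:
        return transition[output]  # rule 2
    return transition["default"]  # rule 3


branches = {"wrong_city": "get_weather", "default": "check_city"}
next_state = resolve_next(branches, {"_transition": "wrong_city"})
```

In this sketch a `"_transition"` value that is missing from the dict raises `KeyError`; that behavior is an assumption, not documented framework behavior.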

Tools Are Scoped Per State

No global registry. Each state declares its own tools:

research_state = StateSpec(
    name="research",
    tools=[search_web, fetch_paper],  # Only available in this state
    ...
)
writing_state = StateSpec(
    name="writing",
    tools=[save_draft],  # Different tools here
    ...
)

Tools are plain Python functions. The framework auto-generates JSON Schema signatures (OpenAI/Anthropic compatible) from type hints:

def search_web(query: str, max_results: int = 10) -> str:
    """Search the web for information."""
    ...
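As a rough idea of what schema generation from type hints involves, the sketch below derives an OpenAI-style function schema using only the standard library. `tool_schema` and the `JSON_TYPES` mapping are assumptions made for this example; the framework's own generator may support more types and a different output layout.

```python
import inspect
from typing import get_type_hints

# Assumed mapping from Python types to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn):
    """Build a function schema from a plain function's signature and docstring."""
    hints = get_type_hints(fn)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unmapped types fall back to "string" in this sketch.
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the argument is required
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def search_web(query: str, max_results: int = 10) -> str:
    """Search the web for information."""
    return f"Results for: {query}"


schema = tool_schema(search_web)
```

Here `query` ends up in `required` because it has no default, while `max_results` is optional and typed as `integer`.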

Shared Context

States share data through SharedContext (explicit key-value store, not a flat blob):

def step_one(ctx: ExecutionContext):
    ctx.shared.set("findings", ["a", "b", "c"])
    return "done"

def step_two(ctx: ExecutionContext):
    findings = ctx.shared.get("findings", [])
    return f"Processing {len(findings)} findings"

Execute Functions

A state's execute function receives an ExecutionContext with everything it needs:

def my_state(ctx: ExecutionContext):
    ctx.input       # Output from previous state
    ctx.shared      # SharedContext (read/write)
    ctx.history     # Previous states' outputs (read-only)
    ctx.llm         # BoundLLM with this state's tools
    ctx.retry_count # Current retry attempt
    ctx.feedback    # Validator feedback from last failed attempt

Inside execute, you can do anything:

# Option A: Use the BoundLLM tool-calling loop
result = ctx.llm.run_with_tools(system_prompt="...", user_message="...")

# Option B: Call the LLM directly (no tool loop)
response = ctx.llm.chat([Message(role="user", content="...")])

# Option C: Bridge to an external agent framework
from crewai import Agent
result = Agent(...).run(ctx.input)

# Option D: Run arbitrary code
result = my_analysis_pipeline(ctx.input)

# Option E: Nest another workflow
inner_wf = Workflow(...)
result = inner_wf.run(ctx.input)

Built-in OODA Agent

For "LLM + tools" without wiring your own agent loop, use the built-in OODA agent:

from fsm_agent_flow import run_ooda

def investigate(ctx: ExecutionContext):
    return run_ooda(ctx, task=f"Investigate: {ctx.input}",
                    tools=[search, analyze], max_cycles=3)

The OODA agent is itself a nested Workflow with 4 states (Observe, Orient, Decide, Act), dogfooding the framework.

Validators

Four options for validation:

# 1. RuleValidator (default) — only runs programmatic checks
from fsm_agent_flow import RuleValidator
wf = Workflow(..., validator=RuleValidator())

# 2. LLMValidator — runs checks + asks LLM for KRs without check functions
from fsm_agent_flow import LLMValidator
wf = Workflow(..., validator=LLMValidator(llm))

# 3. Shorthand — pass validator_llm to auto-create LLMValidator
wf = Workflow(..., validator_llm=cheap_llm)

# 4. Custom — implement the Validator protocol
class MyValidator:
    def validate(self, state, output, context) -> ValidationResult:
        ...
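A custom validator only needs a `validate(state, output, context)` method. The sketch below shows the shape of such a class using stand-in types: `SketchValidationResult` with its `passed` and `feedback` fields and the `SketchValidator` protocol are hypothetical, so check the library's `ValidationResult` for the actual field names before adapting it.

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass
class SketchValidationResult:
    """Hypothetical result type; the library's ValidationResult may differ."""
    passed: bool
    feedback: str = ""


class SketchValidator(Protocol):
    """Structural type: any object with a matching validate method qualifies."""

    def validate(self, state: Any, output: Any, context: Any) -> SketchValidationResult:
        ...


class NonEmptyValidator:
    """Passes any output whose string form is not blank."""

    def validate(self, state, output, context):
        if str(output).strip():
            return SketchValidationResult(passed=True)
        return SketchValidationResult(passed=False, feedback="Output was empty.")


validator: SketchValidator = NonEmptyValidator()
ok = validator.validate(state=None, output="report text", context=None)
bad = validator.validate(state=None, output="   ", context=None)
```

Because the protocol is structural, `NonEmptyValidator` does not inherit from anything; it only has to provide the method.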

LLM Adapters

The framework ships with OpenAI and LiteLLM adapters:

from fsm_agent_flow.llm.openai import OpenAIAdapter
from fsm_agent_flow.llm.litellm import LiteLLMAdapter

# OpenAI (or any OpenAI-compatible API)
llm = OpenAIAdapter(model="gpt-4o", api_key="sk-...")
llm = OpenAIAdapter(model="deepseek/deepseek-r1", base_url="https://openrouter.ai/api/v1")

# LiteLLM (any provider)
llm = LiteLLMAdapter(model="anthropic/claude-sonnet-4-20250514")

Build your own by implementing the LLMAdapter protocol — see docs/claude/rules/adapters.md or ask Claude Code.

Persistence

WorkflowContext is serializable for save/resume:

import json

# Save (the with-block closes the file even if serialization fails)
with open("checkpoint.json", "w") as f:
    json.dump(wf.context.to_dict(), f)

# Resume
with open("checkpoint.json") as f:
    ctx = WorkflowContext.from_dict(json.load(f))
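The save/resume round trip can be tried without the library using a stand-in class. `SketchContext` and its three fields are hypothetical and exist only to show the `to_dict` / `from_dict` pattern with JSON; the real `WorkflowContext` likely stores more.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchContext:
    """Hypothetical stand-in for WorkflowContext; the real fields may differ."""
    current_state: str = ""
    shared: Dict[str, Any] = field(default_factory=dict)
    history: List[Dict[str, Any]] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchContext":
        return cls(**data)


original = SketchContext(
    current_state="writing",
    shared={"findings": ["a", "b"]},
    history=[{"state": "research", "output": "notes"}],
)

# Write the checkpoint to a temporary directory, then load it back.
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
with open(path, "w") as f:
    json.dump(original.to_dict(), f)
with open(path) as f:
    restored = SketchContext.from_dict(json.load(f))
```

The round trip only works when everything stored in the context is JSON-serializable, so values such as custom objects would need converting before a checkpoint is written.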

Examples

See examples/ for complete working examples:

  • research_workflow.py — Research + writing with tool calling and TDD validation
  • ooda_example.py — Using the built-in OODA agent inside workflow states

Claude Code Integration

This repo includes a CLAUDE.md and docs/claude/rules/ that teach Claude Code the framework's architecture. When you open this project in Claude Code, it automatically understands how to:

  • Define workflows with states, transitions, and key results
  • Build custom LLM adapters
  • Write validation logic
  • Use the OODA agent
  • Debug common issues

Using with Claude Code in your own project

If you're using fsm-agent-flow as a dependency in your own project, add the following to your project's CLAUDE.md so Claude Code understands the framework:

# fsm-agent-flow

TDD/OKR-driven agentic workflow framework. See the reference docs:

@https://raw.githubusercontent.com/NewJerseyStyle/FSM-agent-flow/main/CLAUDE.md
@https://NewJerseyStyle.github.io/FSM-agent-flow/claude/rules/adapters.md
@https://NewJerseyStyle.github.io/FSM-agent-flow/claude/rules/workflows.md
@https://NewJerseyStyle.github.io/FSM-agent-flow/claude/rules/validation.md
@https://NewJerseyStyle.github.io/FSM-agent-flow/claude/rules/tools.md

This gives Claude Code full knowledge of the framework's API, patterns, and conventions when working on your codebase.

License

MIT
