Skip to main content

Orchemist — a scenario-driven orchestration engine for multi-agent AI pipelines

Project description

Orchemist

Scenario-driven orchestration for multi-agent AI pipelines.

CI PyPI License: MIT Python 3.10+


Why Orchemist?

AI agents hallucinate, drift off-topic across long chains, and produce work that looks correct until you test it. Orchemist solves this with three architectural principles:

  • Ground-truth anchoring. Every phase prompt is anchored to the original issue body. Agents cannot invent features that aren't in the spec.
  • Adversarial quality gates. Spec adversary, acceptance-test adversary, and code review phases catch weaknesses BEFORE they become implementation bugs.
  • Acceptance-test-driven development. Tests are written before implementation by a separate agent. The implementing agent must pass them — it cannot modify or bypass them.

300+ pipeline runs across content, coding, research, and compliance workflows have validated this approach.


What Is It?

Orchemist is a platform with four components:

  1. Orchestration Engine (this repo) — a Python engine that sequences multi-phase AI pipelines from YAML templates. Handles phase transitions, retries, tool execution, cost tracking, and adversarial review loops.
  2. Pipeline Templates (templates/) — YAML definitions for coding, content, research, and compliance pipelines. The coding pipeline (coding-pipeline-standard) runs 11 phases: spec, behavioral contracts, adversary review, acceptance tests, implementation, test execution, code review, fixes, and final verification.
  3. Skills Pack (ToscanAI/orchemist-skills) — the coding pipeline repackaged as Claude Code .claude/skills/ and .claude/agents/. Pure markdown, no Python runtime. Drop it into Claude Code and /orchemist:run walks the same 11-phase loop. Best on-ramp if you already use Claude Code.
  4. Orchemist IDE (ToscanAI/orchemist-ide) — a VS Code fork with a purpose-built Pipeline Explorer, live log streaming, and template editor for managing pipelines visually.

You declare your pipeline in a single YAML file. The engine handles phase sequencing, dependency resolution, output forwarding, automatic retries, tool-calling, and scenario grading.

Execution modes

Mode Backend Status
openclaw Claude sub-agents via OpenClaw gateway Battle-tested (300+ runs)
openrouter Any model via OpenRouter (Anthropic, OpenAI, Google, etc.) New — first successful end-to-end coding pipeline run 2026-04-17
standalone Direct Anthropic API Available for simple pipelines
dry-run Mock execution for template validation Stable

Note: The YAML below is simplified for illustration. Orchemist's template format is evolving to support an expanding range of workloads — from content pipelines to coding, research, compliance, and beyond. See Template Authoring for the full schema and working examples.

name: content-pipeline
phases:
  research:
    prompt: "Research the topic: {{brief}}"
    model_tier: haiku

  draft:
    prompt: "Write a 500-word article based on: {{research.output}}"
    model_tier: sonnet
    depends_on: [research]

  edit:
    prompt: "Polish and improve this draft: {{draft.output}}"
    model_tier: sonnet
    depends_on: [draft]

Quickstart

pip install orchemist
orch new --yes --output templates/my-pipeline.yaml
orch run templates/my-pipeline.yaml --mode dry-run

No API key needed for a dry run:

orch run templates/my-pipeline.yaml --mode dry-run --input '{"brief": "AI safety"}'

Live run against Claude:

export ANTHROPIC_API_KEY="sk-ant-..."
orch run templates/my-pipeline.yaml --mode standalone --input '{"brief": "AI safety"}'

Using OpenRouter (multi-provider)

To route through OpenRouter instead of calling Anthropic directly, export an OpenRouter key and select --mode openrouter:

export OPENROUTER_API_KEY="sk-or-v1-..."
orch run templates/my-pipeline.yaml --mode openrouter --input '{"brief": "AI safety"}'

orch serve captures environment variables at startup — export the key before launching the server, and restart it if you rotate the key. See docs/openrouter-setup.md for the full configuration reference, CLI-flag precedence, and troubleshooting.


Use Cases

Use Case What it does
Content Pipeline Research → Draft → Edit → SEO → Publish-ready output
Code Review Static analysis → Security scan → Architecture review → Summary report
Coding Pipeline Feature development → Unit tests → Integration tests → Deployment
Research Assistant Query expansion → Source gathering → Synthesis → Citation check
Translation Pipeline Translate → Back-translate → Consistency check → Final polish
Customer Support Intent classification → KB lookup → Response draft → Quality gate
Financial Analysis Data extraction → Trend analysis → Risk assessment → Executive summary

Each use case is a template. Browse them with orch templates list or search with orch templates search <topic>.


Features

  • YAML-first pipeline definitions — version-controlled, diff-friendly, no code required
  • Phase sequencing with dependency graphs — topological sort, parallel wave execution
  • Model tier selection per phase — haiku / sonnet / opus, set per phase or pipeline-wide
  • skill_refs injection — pass tool contexts into prompts declaratively
  • Fallback executors — Gemini fallback when Anthropic is unavailable
  • Template index & search — community index, install by GitHub shorthand user/repo
  • Scenario-based grading — YAML acceptance criteria, LLM judges, assertion graders
  • Human-in-the-loop — pause phases for review, inject feedback, resume
  • OpenClaw integration — run phases as sub-agents with full tool access
  • Local web UI — browse templates, start runs, and watch live progress in your browser (orch serve)

How It Compares

Feature Orchemist LangGraph CrewAI Autogen Dify
YAML-first Partial
Visual builder 🔜 ⚠️
Template library
Testing / grading ⚠️
Raspberry Pi support ⚠️ ⚠️ ⚠️

✅ full support · ⚠️ partial/unofficial · ❌ not supported · 🔜 planned


Architecture

flowchart TD
    subgraph Templates["YAML Templates"]
        T1["content-pipeline.yaml"]
        T2["code-review.yaml"]
        T3["…"]
    end

    Templates -- "orch run" --> Runner

    subgraph Runner["Pipeline Runner"]
        TE["Template Engine\n(YAML parse, var interp)"] --> PS["Phase Sequencer\n(topo sort, output forward,\nretry logic)"]
        PS --> EX["Executors\n(Anthropic · OpenClaw\nGemini · Dry-Run)"]
    end

    Runner --> SR

    subgraph SR["Scenario Runner (optional acceptance testing)"]
        AG["Assertion Grader"] ~~~ LJ["LLM Judge"] ~~~ UC["URL Check"]
    end

Three execution modes:

Mode How it runs API key? Start here?
standalone Direct Anthropic API (zero framework deps) Yes — ANTHROPIC_API_KEY ✅ Yes — simplest setup
dry-run Mock executor for testing / CI No ✅ Yes — no credentials needed
openclaw Sub-agent spawning via OpenClaw gateway No (uses gateway token) Requires separate OpenClaw setup

New here? Start with --mode standalone (bring your own API key) or --mode dry-run (no credentials). OpenClaw mode is for production deployments with the OpenClaw gateway.


CLI Reference

# Create a new pipeline template (interactive wizard)
orch new

# Non-interactive scaffold with defaults
orch new --yes --output ./templates/my-pipeline.yaml

# Clone an existing template as a starting point
orch new --from content-pipeline

# Interactive wizard (config_schema-driven)
orch start

# Copy a starter pipeline to your project in one command
orch quickstart

# Execute a pipeline
orch run <template-or-file> --mode standalone --input '{"brief": "..."}'
orch run <template-or-file> --mode dry-run
orch run <template-or-file> --mode openclaw

# Validate a template (checks YAML syntax + structural rules)
orch validate <template-or-file>
orch validate <template-or-file> --fix    # auto-correct simple issues

# Show execution order and model tiers
orch list-phases <template-or-file>

# Browse templates
orch templates list
orch templates info <name>
orch templates search <query>

# Install / remove templates
orch templates install user/repo          # GitHub shorthand
orch templates install https://github.com/user/repo
orch templates install ./my-template.yaml --name my-pipeline
orch templates uninstall <name>

# Task queue (for async / long-running workflows)
orch submit --type <type> --payload '{"key": "value"}'
orch status [task-id]
orch list [--state running] [--type llm_call]
orch cancel <task-id>
orch retry <task-id>
orch watch <task-id> --follow
orch health

Installation

Three install paths, ordered by setup cost. Most readers want Path A if they already use Claude Code.

Path A — Claude Code Skills Pack (no Python, ~1 minute)

If you already have Claude Code, the simplest way to try the Orchemist coding pipeline is the Skills Pack — pure markdown, no engine, no server, BYO model:

git clone https://github.com/ToscanAI/orchemist-skills.git
cd orchemist-skills
./install.sh                 # drops skills into ~/.claude/skills/ and ~/.claude/agents/
claude                       # then: /orchemist:run examples/example-issue.md

Trades off: no web UI, no queue, no daemon, no multi-provider model selection. Use the engine (Path B / C) for those.

Path B — From PyPI (engine + CLI, ~5 minutes)

pip install orchemist
orch --help
orch serve                   # local web UI at http://localhost:8000

Path C — From Source (development, ~10 minutes)

git clone https://github.com/ToscanAI/orchemist.git
cd orchemist
python3 -m venv .venv && source .venv/bin/activate
pip install .
orch --help

Relationship to OpenClaw

Layer Provides Think of it as…
OpenClaw Sub-agent spawning, tool access, model switching Operating System
Orchemist Pipeline templates, phase sequencing, quality gates Application Framework
Scenario Runner Outcome-based testing, LLM judges, grading Test Framework

The engine works standalone (direct API) or with OpenClaw (sub-agent spawning). No vendor lock-in on the model provider side.


Git as Runtime Dependency

Orchemist uses git commits as the source of truth for multi-phase pipeline handoff. Each pipeline phase commits its output, and the next phase reads from a specific commit hash — making stale reads structurally impossible and providing a full audit trail of every pipeline run.

What this means in practice:

  • Pipeline execution (coding pipelines, spec loops) requires a git repository
  • Dry-run mode does NOT require git — works anywhere
  • Standalone mode with simple linear pipelines works without git
  • Git-based features: commit-based phase handoff, diff-based adversary review, immutable test references

This is a deliberate architectural choice: git provides immutability, diffing, and audit trails that would otherwise require custom infrastructure. The trade-off is that production pipeline execution is coupled to git — which is already a prerequisite for coding pipelines (branch creation, commits, PRs).

Future: Pluggable VCS backends are on the roadmap but not prioritized for initial release. If you need non-git support, open an issue.


Current Status

Orchemist is in active development (alpha). What works, what doesn't:

Area Status
Coding pipeline (openclaw mode) Battle-tested across 300+ runs on content, coding, research, compliance pipelines
Coding pipeline (openrouter mode) First successful end-to-end run 2026-04-17 — produced correct code with 32/32 acceptance tests passing
Pipeline templates coding-pipeline-standard (11 phases) and coding-pipeline-skip-spec stable; content and research templates available
Web UI Functional for template browsing, run monitoring, and pipeline launching
Orchemist IDE Phases 2-5 complete (shell scaffold, pipeline explorer, live log streaming, template editor); schema-driven launch form in progress
OpenRouter tool calling Shipped 2026-04-16 (PR #795) — 6 tools with path sandbox, retry, JSONL logging
Cost optimization In progress — command/acceptance_run phases being routed away from LLM (#798)
Documentation Partial — OpenRouter setup guide, getting started, template authoring available; end-to-end tutorial pending
Community templates Not yet — contributions welcome

Known limitations:

  • Spec adversary occasionally doesn't comply with first-line verdict format (pipeline compensates via retry loops at cost of extra rounds)
  • Cost reporting fallback overestimates by ~3x when OpenRouter doesn't return usage.total_cost (#801)
  • Bash tool sandbox is a UX guardrail, not a security boundary — production deployments should use OS-level isolation (firejail, containers)

Contributing

Pull requests are welcome! Here's how to get started:

git clone https://github.com/ToscanAI/orchemist.git
cd orchemist
pip install -e ".[dev]"
pytest

Areas where contributions are especially welcome:

  • 📦 Community templates — add a YAML template to templates/ and submit a PR
  • 🧪 Scenarios & rubrics — improve grading quality for existing templates
  • 🔌 Executors — add support for new model providers (Gemini, Mistral, local models)
  • 📖 Documentation — improve examples, add tutorials, translate docs

Filing issues: Use the right template for the right pipeline:

Issue type Template Pipeline
Bug Bug Report coding-pipeline-v1
Feature / code change Feature Request coding-pipeline-v1
Documentation Documentation Request docs-pipeline-v1
Article / blog post Content Request content-pipeline
Research / analysis Research Request research-competitive

Please read CONTRIBUTING.md for code style and PR guidelines.


Documentation


License

MIT © Conny Lazo & Toscan

See LICENSE for the full text.


Orchemist — Tests passing. 3 execution modes. Git-native pipeline handoff.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

orchemist-0.10.0.tar.gz (6.1 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

orchemist-0.10.0-py3-none-any.whl (553.5 kB view details)

Uploaded Python 3

File details

Details for the file orchemist-0.10.0.tar.gz.

File metadata

  • Download URL: orchemist-0.10.0.tar.gz
  • Upload date:
  • Size: 6.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for orchemist-0.10.0.tar.gz
Algorithm Hash digest
SHA256 4c4bcd4714444b02fa0aab9ed76bf231949661526c71c5aee02972ed39267846
MD5 d39e1c0bebb80a3c8dea43ad5992f15e
BLAKE2b-256 6b6da98974ed1799a005960d962f0f385a69946001085305471d0829b9d42369

See more details on using hashes here.

Provenance

The following attestation bundles were made for orchemist-0.10.0.tar.gz:

Publisher: publish.yaml on ToscanAI/orchemist

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file orchemist-0.10.0-py3-none-any.whl.

File metadata

  • Download URL: orchemist-0.10.0-py3-none-any.whl
  • Upload date:
  • Size: 553.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for orchemist-0.10.0-py3-none-any.whl
Algorithm Hash digest
SHA256 84f495d3c31562d318d5d8fd39fa95151f5eb25af4fca5430b3ff5fdc38fd190
MD5 bb6a201b736c66f99ae5c7e10a71bae2
BLAKE2b-256 1ca0c2c8d77c37cfeb09f57ad8a52f13cea0d282c95e12c6fbad759e253cdaa5

See more details on using hashes here.

Provenance

The following attestation bundles were made for orchemist-0.10.0-py3-none-any.whl:

Publisher: publish.yaml on ToscanAI/orchemist

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page