Orchemist — a scenario-driven orchestration engine for multi-agent AI pipelines

These details have not been verified by PyPI

Project links

Project description

Orchemist

Scenario-driven orchestration for multi-agent AI pipelines.

Why Orchemist?

AI agents hallucinate, drift off-topic across long chains, and produce work that looks correct until you test it. Orchemist solves this with three architectural principles:

Ground-truth anchoring. Every phase prompt is anchored to the original issue body. Agents cannot invent features that aren't in the spec.
Adversarial quality gates. Spec adversary, acceptance-test adversary, and code review phases catch weaknesses BEFORE they become implementation bugs.
Acceptance-test-driven development. Tests are written before implementation by a separate agent. The implementing agent must pass them — it cannot modify or bypass them.

300+ internal pipeline runs (dogfood + integration testing) across content, coding, research, and compliance workflows have stress-tested this approach.

What Is It?

Orchemist is a platform with four components:

Orchestration Engine (this repo) — a Python engine that sequences multi-phase AI pipelines from YAML templates. Handles phase transitions, retries, tool execution, cost tracking, and adversarial review loops.
Pipeline Templates (templates/) — YAML definitions for coding, content, research, and compliance pipelines. The coding pipeline (coding-pipeline-standard) runs 11 phases: spec, behavioral contracts, adversary review, acceptance tests, implementation, test execution, code review, fixes, and final verification.
Skills Pack (ToscanAI/orchemist-skills) — the coding pipeline repackaged as Claude Code .claude/skills/ and .claude/agents/. Pure markdown, no Python runtime. Drop it into Claude Code and /orchemist:run walks the same 11-phase loop. Best on-ramp if you already use Claude Code.
Harness (frontend/ in this repo) — the operator web surface. Six cross-linked screens covering fleet status, run cockpit, the cross-model adversary dialogue visualizer, trust & gates, admin / activation, and skills-pack mode. Ships with the engine; launches via orch serve. See The Harness below for screenshots.

The legacy ToscanAI/orchemist-ide VS Code fork is being sunset in favour of the harness. See issue #814 for the deprecation plan.

You declare your pipeline in a single YAML file. The engine handles phase sequencing, dependency resolution, output forwarding, automatic retries, tool-calling, and scenario grading.

Execution modes

Mode	Backend	Status
openrouter	Any model via OpenRouter (Anthropic, OpenAI, Google, etc.)	Primary — production path since 2026-04-17
openclaw	Claude sub-agents via OpenClaw gateway	Deprecated — gateway no longer active; historical use only
standalone	Direct Anthropic API	Available for simple pipelines
dry-run	Mock execution for template validation	Stable

Note: The YAML below is simplified for illustration. Orchemist's template format is evolving to support an expanding range of workloads — from content pipelines to coding, research, compliance, and beyond. See Template Authoring for the full schema and working examples.

name: content-pipeline
phases:
  research:
    prompt: "Research the topic: {{brief}}"
    model_tier: haiku

  draft:
    prompt: "Write a 500-word article based on: {{research.output}}"
    model_tier: sonnet
    depends_on: [research]

  edit:
    prompt: "Polish and improve this draft: {{draft.output}}"
    model_tier: sonnet
    depends_on: [draft]

Quickstart

pip install orchemist
orch new --yes --output templates/my-pipeline.yaml
orch run templates/my-pipeline.yaml --mode dry-run

No API key needed for a dry run:

orch run templates/my-pipeline.yaml --mode dry-run --input '{"brief": "AI safety"}'

Live run against Claude:

export ANTHROPIC_API_KEY="sk-ant-..."
orch run templates/my-pipeline.yaml --mode standalone --input '{"brief": "AI safety"}'

Using OpenRouter (multi-provider)

To route through OpenRouter instead of calling Anthropic directly, export an OpenRouter key and select --mode openrouter:

export OPENROUTER_API_KEY="sk-or-v1-..."
orch run templates/my-pipeline.yaml --mode openrouter --input '{"brief": "AI safety"}'

orch serve captures environment variables at startup — export the key before launching the server, and restart it if you rotate the key. See docs/openrouter-setup.md for the full configuration reference, CLI-flag precedence, and troubleshooting.

The Harness

orch serve opens the Orchemist Harness — a six-screen operator surface designed against the canonical mockups in docs/harness-redesign-2026-05-24/screens/. Live screenshots against a real orch serve are checked in under docs/harness-redesign-2026-05-24/screenshots/live/.

Engine required (v1, #888). The harness REQUIRES a reachable engine — orch serve must be running before pages will render content. The static build still ships, but without an engine the harness short-circuits to an "Engine unreachable at <url>" error UI (with a retry button and a docs link) on every screen. There is no demo-data fallback — that v0 ergonomic was misleading because a misconfigured port looked healthy. See issue #888 for the rationale and breaking-change context.

1. Fleet Dashboard

Multi-repo status, in-flight pipeline runs, autonomy-level ramp, regression queue, and stale-detection findings — all at a glance.

Fleet Dashboard

2. Run Cockpit

Live phase-by-phase progress for a single pipeline run, with the Phase 0 existing-symbols inventory, sub-check 7d verdict chips (CONSUME / EXTEND / DIVERGENT / NEW-OK), the live tool-call stream, sealed artifact list, and a Jaccard drift indicator.

Run Cockpit

3. Adversary Loop visualizer

The marquee screen — the cross-model drafter ↔ reviewer dialogue. Each round shows the proposed spec, the reviewer's verdict, and the Jaccard convergence metric. This is the trust-engine wedge in operation: a Claude drafter critiqued by a Gemini reviewer (or any other model family) at the spec→implementation boundary, with full per-turn cost ledger.

Adversary Loop

4. Trust & Gates

Operator queue for pipeline runs awaiting a human decision, with per-row Approve / Reject calls into /api/v1/gates/{run_id}/(approve|reject). Side panel shows per-(repo, template, task) trust profile and a 14-day calibration curve.

Trust & Gates

5. Admin / Activation

The control surface: autonomy ramp (Level 3 → 5), execution-mode toggles (openrouter / standalone / openclaw / dry-run — the harness is model-agnostic by design), branch-protection audit, kill switches, webhook trigger CRUD, and feature flags for the v4.2 pipeline (phase0_hard_gate, EXTEND verdict, dialogue phase, cross-repo orchestration).

Admin / Activation

6. Skills Pack Mode

Mirror of the local Claude Code Skills Pack installation. Shows phase skills, pipeline YAMLs, local run history, and the /orchemist:run invocation preview. Lets operators "promote to remote" — replay a local run via the engine for trust calibration.

Skills Pack Mode

Try it. pip install orchemist[web] && orch serve opens the harness at http://127.0.0.1:8374. With the engine running, every screen consumes real /api/v1/* data; without it, the same screens still render with demo data and a clear "engine offline" banner so the UI is reviewable end-to-end. A Playwright e2e suite (frontend/tests-e2e/) keeps both the offline-mock screenshots and the live-engine screenshots in sync with the SVG canon on every PR.

Use Cases

Use Case	What it does
Content Pipeline	Research → Draft → Edit → SEO → Publish-ready output
Code Review	Static analysis → Security scan → Architecture review → Summary report
Coding Pipeline	Feature development → Unit tests → Integration tests → Deployment
Research Assistant	Query expansion → Source gathering → Synthesis → Citation check
Translation Pipeline	Translate → Back-translate → Consistency check → Final polish
Customer Support	Intent classification → KB lookup → Response draft → Quality gate
Financial Analysis	Data extraction → Trend analysis → Risk assessment → Executive summary

Each use case is a template. Browse them with orch templates list or search with orch templates search <topic>.

Features

✅ YAML-first pipeline definitions — version-controlled, diff-friendly, no code required
✅ Phase sequencing with dependency graphs — topological sort, parallel wave execution
✅ Model tier selection per phase — haiku / sonnet / opus, set per phase or pipeline-wide
✅ skill_refs injection — pass tool contexts into prompts declaratively
✅ Fallback executors — Gemini fallback when Anthropic is unavailable
✅ Template index & search — community index, install by GitHub shorthand user/repo
✅ Scenario-based grading — YAML acceptance criteria, LLM judges, assertion graders
✅ Human-in-the-loop — pause phases for review, inject feedback, resume
✅ OpenClaw integration — run phases as sub-agents with full tool access
✅ Local web UI — browse templates, start runs, and watch live progress in your browser (orch serve)

How It Compares

Feature	Orchemist	LangGraph	CrewAI	Autogen	Dify
YAML-first	✅	❌	❌	❌	Partial
Visual builder	🔜	⚠️	❌	❌	✅
Template library	✅	❌	❌	❌	✅
Testing / grading	✅	❌	⚠️	❌	❌
Raspberry Pi support	✅	⚠️	⚠️	⚠️	❌

✅ full support · ⚠️ partial/unofficial · ❌ not supported · 🔜 planned

Architecture

flowchart TD
    subgraph Templates["YAML Templates"]
        T1["content-pipeline.yaml"]
        T2["code-review.yaml"]
        T3["…"]
    end

    Templates -- "orch run" --> Runner

    subgraph Runner["Pipeline Runner"]
        TE["Template Engine\n(YAML parse, var interp)"] --> PS["Phase Sequencer\n(topo sort, output forward,\nretry logic)"]
        PS --> EX["Executors\n(Anthropic · OpenClaw\nGemini · Dry-Run)"]
    end

    Runner --> SR

    subgraph SR["Scenario Runner (optional acceptance testing)"]
        AG["Assertion Grader"] ~~~ LJ["LLM Judge"] ~~~ UC["URL Check"]
    end

Three execution modes:

Mode	How it runs	API key?	Start here?
`standalone`	Direct Anthropic API (zero framework deps)	Yes — `ANTHROPIC_API_KEY`	✅ Yes — simplest setup
`dry-run`	Mock executor for testing / CI	No	✅ Yes — no credentials needed
`openclaw`	Sub-agent spawning via OpenClaw gateway	No (uses gateway token)	Requires separate OpenClaw setup

New here? Start with --mode standalone (bring your own API key) or --mode dry-run (no credentials). OpenClaw mode is for production deployments with the OpenClaw gateway.

CLI Reference

# Create a new pipeline template (interactive wizard)
orch new

# Non-interactive scaffold with defaults
orch new --yes --output ./templates/my-pipeline.yaml

# Clone an existing template as a starting point
orch new --from content-pipeline

# Interactive wizard (config_schema-driven)
orch start

# Copy a starter pipeline to your project in one command
orch quickstart

# Execute a pipeline
orch run <template-or-file> --mode standalone --input '{"brief": "..."}'
orch run <template-or-file> --mode dry-run
orch run <template-or-file> --mode openclaw

# Validate a template (checks YAML syntax + structural rules)
orch validate <template-or-file>
orch validate <template-or-file> --fix    # auto-correct simple issues

# Show execution order and model tiers
orch list-phases <template-or-file>

# Browse templates
orch templates list
orch templates info <name>
orch templates search <query>

# Install / remove templates
orch templates install user/repo          # GitHub shorthand
orch templates install https://github.com/user/repo
orch templates install ./my-template.yaml --name my-pipeline
orch templates uninstall <name>

# Task queue (for async / long-running workflows)
orch submit --type <type> --payload '{"key": "value"}'
orch status [task-id]
orch list [--state running] [--type llm_call]
orch cancel <task-id>
orch retry <task-id>
orch watch <task-id> --follow
orch health

Installation

Three install paths, ordered by setup cost. Most readers want Path A if they already use Claude Code.

Path A — Claude Code Skills Pack (no Python, ~1 minute)

If you already have Claude Code, the simplest way to try the Orchemist coding pipeline is the Skills Pack — pure markdown, no engine, no server, BYO model:

git clone https://github.com/ToscanAI/orchemist-skills.git
cd orchemist-skills
./install.sh                 # drops skills into ~/.claude/skills/ and ~/.claude/agents/
claude                       # then: /orchemist:run examples/example-issue.md

Trades off: no web UI, no queue, no daemon, no multi-provider model selection. Use the engine (Path B / C) for those.

Path B — From PyPI (engine + CLI, ~5 minutes)

pip install orchemist
orch --help
orch serve                   # local web UI at http://localhost:8000

Path C — From Source (development, ~10 minutes)

git clone https://github.com/ToscanAI/orchemist.git
cd orchemist
python3 -m venv .venv && source .venv/bin/activate
pip install .
orch --help

Relationship to OpenClaw

Layer	Provides	Think of it as…
OpenClaw	Sub-agent spawning, tool access, model switching	Operating System
Orchemist	Pipeline templates, phase sequencing, quality gates	Application Framework
Scenario Runner	Outcome-based testing, LLM judges, grading	Test Framework

The engine works standalone (direct API) or with OpenClaw (sub-agent spawning). No vendor lock-in on the model provider side.

Git as Runtime Dependency

Orchemist uses git commits as the source of truth for multi-phase pipeline handoff. Each pipeline phase commits its output, and the next phase reads from a specific commit hash — making stale reads structurally impossible and providing a full audit trail of every pipeline run.

What this means in practice:

Pipeline execution (coding pipelines, spec loops) requires a git repository
Dry-run mode does NOT require git — works anywhere
Standalone mode with simple linear pipelines works without git
Git-based features: commit-based phase handoff, diff-based adversary review, immutable test references

This is a deliberate architectural choice: git provides immutability, diffing, and audit trails that would otherwise require custom infrastructure. The trade-off is that production pipeline execution is coupled to git — which is already a prerequisite for coding pipelines (branch creation, commits, PRs).

Future: Pluggable VCS backends are on the roadmap but not prioritized for initial release. If you need non-git support, open an issue.

Current Status

Orchemist is in active development (alpha). What works, what doesn't:

Area	Status
Coding pipeline (openrouter mode)	Primary path — first successful end-to-end run 2026-04-17, 32/32 acceptance tests passing
Coding pipeline (openclaw mode)	Deprecated — 300+ internal dogfood runs before gateway sunset; kept for historical reference
Pipeline templates	`coding-pipeline-standard` (11 phases) and `coding-pipeline-skip-spec` stable; content and research templates available
Web UI	Functional for template browsing, run monitoring, and pipeline launching
Orchemist IDE	Phases 2-5 complete (shell scaffold, pipeline explorer, live log streaming, template editor); schema-driven launch form in progress
OpenRouter tool calling	Shipped 2026-04-16 (PR #795) — 6 tools with path sandbox, retry, JSONL logging
Cost optimization	In progress — command/acceptance_run phases being routed away from LLM (#798)
Documentation	Partial — OpenRouter setup guide, getting started, template authoring available; end-to-end tutorial pending
Community templates	Not yet — contributions welcome

Known limitations:

Spec adversary occasionally doesn't comply with first-line verdict format (pipeline compensates via retry loops at cost of extra rounds)
Cost reporting fallback overestimates by ~3x when OpenRouter doesn't return usage.total_cost (#801)
Bash tool sandbox is a UX guardrail, not a security boundary — production deployments should use OS-level isolation (firejail, containers)

Contributing

Pull requests are welcome! Here's how to get started:

git clone https://github.com/ToscanAI/orchemist.git
cd orchemist
pip install -e ".[dev]"
pytest

Maintainers cutting a release: see docs/RELEASE-SOP.md.

Areas where contributions are especially welcome:

📦 Community templates — add a YAML template to templates/ and submit a PR
🧪 Scenarios & rubrics — improve grading quality for existing templates
🔌 Executors — add support for new model providers (Gemini, Mistral, local models)
📖 Documentation — improve examples, add tutorials, translate docs

Filing issues: Use the right template for the right pipeline:

Issue type	Template	Pipeline
Bug	Bug Report	`coding-pipeline-v1`
Feature / code change	Feature Request	`coding-pipeline-v1`
Documentation	Documentation Request	`docs-pipeline-v1`
Article / blog post	Content Request	`content-pipeline`
Research / analysis	Research Request	`research-competitive`

Please read CONTRIBUTING.md for code style and PR guidelines.

Documentation

Getting Started — detailed setup guide
Architecture — system design
API Reference — CLI commands + Python classes
Web UI — browser interface (orch serve)
Harness redesign pack — vision, duplicate audit, frontend audit, autonomy posture, SVG canon, live screenshots
Tech Stack — dependencies and choices
Security Policy — vulnerability reporting & supported versions

License

MIT © Conny Lazo & Toscan

See LICENSE for the full text.

Orchemist — Tests passing. 3 execution modes. Git-native pipeline handoff.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.13.1

Jun 10, 2026

0.12.0

Jun 8, 2026

0.11.0

May 26, 2026

0.10.0

May 25, 2026

0.9.0

Apr 17, 2026

0.8.0

Mar 30, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

orchemist-0.13.1.tar.gz (6.2 MB view details)

Uploaded Jun 10, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

orchemist-0.13.1-py3-none-any.whl (611.4 kB view details)

Uploaded Jun 10, 2026 Python 3

File details

Details for the file orchemist-0.13.1.tar.gz.

File metadata

Download URL: orchemist-0.13.1.tar.gz
Upload date: Jun 10, 2026
Size: 6.2 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for orchemist-0.13.1.tar.gz
Algorithm	Hash digest
SHA256	`367ee7948a872e46c28da5c4dfb1010ec1bf81644693ee600f1b75ce8ba9ea52`
MD5	`bd77eaeadccfedee06b48bf10cac027a`
BLAKE2b-256	`70fb2b1499960b54fee74373f179134030bb53c96912a0b4ad893acb26fda6f1`

See more details on using hashes here.

Provenance

The following attestation bundles were made for orchemist-0.13.1.tar.gz:

Publisher: publish.yaml on ToscanAI/orchemist

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: orchemist-0.13.1.tar.gz
- Subject digest: 367ee7948a872e46c28da5c4dfb1010ec1bf81644693ee600f1b75ce8ba9ea52
- Sigstore transparency entry: 1777576840
- Sigstore integration time: Jun 10, 2026
Source repository:
- Permalink: ToscanAI/orchemist@694b1f4bc86496db3b3175f74c3dd61cb0b7de4d
- Branch / Tag: refs/tags/v0.13.1
- Owner: https://github.com/ToscanAI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@694b1f4bc86496db3b3175f74c3dd61cb0b7de4d
- Trigger Event: push

File details

Details for the file orchemist-0.13.1-py3-none-any.whl.

File metadata

Download URL: orchemist-0.13.1-py3-none-any.whl
Upload date: Jun 10, 2026
Size: 611.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for orchemist-0.13.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3928892eac8be83a1757e9ae6b60f7a56001c3d9f2add957959136c454b50a65`
MD5	`9b0c37d3d6f7a64dc87dbe992c80ff86`
BLAKE2b-256	`34cef39db248d1b33dc0c82c60ae9a30315924c5605cd2e27578cc7580d9525e`

See more details on using hashes here.

Provenance

The following attestation bundles were made for orchemist-0.13.1-py3-none-any.whl:

Publisher: publish.yaml on ToscanAI/orchemist

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: orchemist-0.13.1-py3-none-any.whl
- Subject digest: 3928892eac8be83a1757e9ae6b60f7a56001c3d9f2add957959136c454b50a65
- Sigstore transparency entry: 1777576938
- Sigstore integration time: Jun 10, 2026
Source repository:
- Permalink: ToscanAI/orchemist@694b1f4bc86496db3b3175f74c3dd61cb0b7de4d
- Branch / Tag: refs/tags/v0.13.1
- Owner: https://github.com/ToscanAI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@694b1f4bc86496db3b3175f74c3dd61cb0b7de4d
- Trigger Event: push

orchemist 0.13.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Orchemist

Scenario-driven orchestration for multi-agent AI pipelines.

Why Orchemist?

What Is It?

Execution modes

Quickstart

Using OpenRouter (multi-provider)

The Harness

1. Fleet Dashboard

2. Run Cockpit

3. Adversary Loop visualizer

4. Trust & Gates

5. Admin / Activation

6. Skills Pack Mode

Use Cases

Features

How It Compares

Architecture

CLI Reference

Installation

Path A — Claude Code Skills Pack (no Python, ~1 minute)

Path B — From PyPI (engine + CLI, ~5 minutes)

Path C — From Source (development, ~10 minutes)

Relationship to OpenClaw

Git as Runtime Dependency

Current Status

Contributing

Documentation

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance