
AI agent orchestration framework — build, share, and run YAML pipelines

Project description

aqm  |  Korean (한국어)

Build AI agent teams in YAML. No code. No API keys. Just pipelines.

An orchestration framework where multiple AI agents pass tasks through explicit queues — or discuss in real-time sessions until consensus. Define once, run anywhere, share with anyone.

  [user] ──input──► [planner] ──► [reviewer] ──approve──► [design_session] ──► [implementer]
                        ▲              │                    ┌──┬──┬──┐
                        └── reject ────┘                    ▼  ▼  ▼  ▼  round-robin
                        └── ask user ──►[user]             [arch][sec][fe]  until consensus

Why aqm?

A single AI agent writes code and reviews it with the same bias. It can't catch its own blind spots.

aqm gives you a team — each agent has a dedicated role, a separate prompt, and optionally a different LLM. A quality gate rejects bad output automatically. A session lets agents debate before deciding.

# One YAML file. That's the entire pipeline.
agents:
  - id: developer
    runtime: claude
    system_prompt: "Implement: {{ input }}"
    handoffs: [{ to: reviewer }]

  - id: reviewer
    runtime: gemini                    # Different LLM catches different bugs
    system_prompt: "Review for security: {{ input }}"
    gate:
      type: llm
      prompt: "Is this production-ready?"
      max_retries: 3                   # Auto-reject → retry up to 3 times
    handoffs:
      - { to: deployer, condition: on_approve }
      - { to: developer, condition: on_reject }

  - id: deployer
    runtime: claude
    context_strategy: none             # No context injected (~85% fewer tokens in the project benchmark)
    system_prompt: "Deploy: {{ input }}"

pip install aqm && aqm init && aqm run "Add JWT authentication"

What makes aqm different

Problem | Single Agent | aqm
Same LLM reviews its own code | One bias, one perspective | Cross-LLM verification (Claude writes, Gemini reviews)
No forced quality checks | Agent says "looks good" to itself | Quality gates auto-reject and retry
Context window explodes at scale | Everything in one conversation | 5 context strategies (55-85% token savings in the project benchmark)
Can't standardize team processes | Every run is ad hoc | YAML pipelines: version-controlled and shareable
Complex tasks lose track of progress | No built-in task tracking | Chunk decomposition: agents break work into trackable units
Expensive API costs | Per-token API billing adds up | CLI-based: uses your existing CLI subscriptions, no extra API fees
Setup overhead | API keys, SDKs, env configs | Zero config: uses CLI tools you already have

Install

pip install aqm

Requires Python 3.11+. At least one LLM CLI must be installed:

Runtime | Provider | Install
claude | Anthropic | npm i -g @anthropic-ai/claude-code && claude login
gemini | Google | npm i -g @google/gemini-cli
codex | OpenAI | npm i -g @openai/codex

No API keys or SDK setup needed — aqm runs CLI tools as subprocesses. You pay for the CLI subscriptions you already have, not per-token API fees.
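To make the subprocess model concrete, here is a minimal Python sketch of how a runtime could invoke an LLM CLI. It assumes the prompt is passed on stdin and the reply is read from stdout; the helper name run_cli is hypothetical, and the real runtimes in aqm/runtime/ may pass prompts, flags, and streaming options differently. The 600-second default mirrors the per-runtime timeouts shown in the config.yaml reference.

```python
import subprocess


def run_cli(argv: list[str], prompt: str, timeout: float = 600) -> str:
    """Run an LLM CLI as a subprocess and return its stdout.

    Hypothetical helper: assumes the CLI reads the prompt from stdin.
    """
    result = subprocess.run(
        argv,
        input=prompt,
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if the CLI hangs
        check=False,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with code {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

For a quick local check, point it at any program that echoes stdin, e.g. run_cli(["cat"], "hello") on a POSIX system.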

Quick Start

cd my-project
aqm init                              # Interactive setup wizard
aqm run "Add JWT authentication"       # Run pipeline
aqm serve                              # Web dashboard at localhost:8000

Real-World Examples

Example 1: Code Review Pipeline

Every PR goes through planning, implementation, review, and testing — automatically.

agents:
  - id: planner
    runtime: gemini
    system_prompt: "Break this into implementation steps: {{ input }}"
    handoffs: [{ to: developer }]

  - id: developer
    runtime: claude
    mcp: [{ server: github }]
    system_prompt: "Implement the plan: {{ input }}"
    handoffs: [{ to: reviewer }]

  - id: reviewer
    runtime: gemini                    # Different LLM = different perspective
    system_prompt: "Review for bugs and security issues: {{ input }}"
    gate:
      type: llm
      prompt: "Is this code production-ready? Check OWASP Top 10."
      max_retries: 3
    handoffs:
      - { to: qa, condition: on_approve }
      - { to: developer, condition: on_reject }

  - id: qa
    runtime: claude
    context_strategy: last_only        # Only needs the reviewer's output (~55% fewer tokens in the project benchmark)
    system_prompt: "Write tests for: {{ input }}"

aqm run "Add user preferences with database, API, and frontend"

Example 2: Architecture Decision Session

Multiple experts debate until they agree — like a real design meeting.

agents:
  - id: architect
    runtime: claude
    system_prompt: |
      You are a software architect. Discuss: {{ input }}
      Previous discussion: {{ transcript }}

  - id: security
    runtime: gemini
    system_prompt: |
      You are a security expert. Focus on threats: {{ input }}
      Previous discussion: {{ transcript }}

  - id: design_session
    type: session
    participants: [architect, security]
    max_rounds: 5
    consensus:
      method: vote
      keyword: "VOTE: AGREE"
      require: all
    summary_agent: architect
    handoffs: [{ to: developer }]

Sample session output:

── Round 1 ──
  [architect] JWT for stateless scaling. Token rotation every 15min...
  [security] Token revocation is the weak point. Consider hybrid...
── Round 2 ──
  [architect] Agreed — hybrid with Redis blacklist. VOTE: AGREE  ✓
  [security] Redis approach works. VOTE: AGREE  ✓
✓ Consensus reached (round 2)

Example 3: Human-in-the-Loop Deployment

AI does the work, but humans approve the critical steps.

agents:
  - id: developer
    runtime: claude
    human_input:
      mode: before
      prompt: "What features do you want? Any constraints?"
    system_prompt: "Build: {{ input }}"
    handoffs: [{ to: deployer }]

  - id: deployer
    runtime: claude
    gate: { type: human }              # Pipeline pauses for manual approval
    system_prompt: "Deploy: {{ input }}"

aqm run "Refactor auth module"
# → Developer asks for your input first
# → After coding, pipeline pauses at deployer
aqm approve T-ABC123 -r "LGTM, deploy to staging"

Features

Multi-LLM Runtimes

Mix providers per agent. Claude writes code, Gemini reviews it, Codex tests it.

agents:
  - id: planner
    runtime: gemini
    model: gemini-2.5-flash
    system_prompt: "Plan: {{ input }}"
    handoffs: [{ to: developer }]

  - id: developer
    runtime: claude
    mcp: [{ server: github }]         # Auto Code mode
    system_prompt: "Implement: {{ input }}"

Conversational Sessions

Session nodes let multiple agents discuss in rounds until consensus — like a meeting.

agents:
  - id: design_review
    type: session
    participants: [architect, frontend, security]
    turn_order: round_robin           # or: moderator
    max_rounds: 5
    consensus:
      method: vote                    # or: moderator_decides
      keyword: "VOTE: AGREE"
      require: all                    # or: majority
    summary_agent: architect
    handoffs: [{ to: implementer }]

Consensus methods:

Method | How It Works
vote | Each agent includes the keyword in its output. Consensus is reached when all agents (or a majority, per require) agree.
moderator_decides | Only the summary_agent can declare consensus.

Each session writes its discussion to transcript.md as meeting minutes. Sessions mix freely with regular agent stages: batch → session → batch.
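The round-robin vote flow described above can be sketched in plain Python. This is an illustration under stated assumptions, not aqm's _run_session(): participants are stand-in callables (hypothetical) that take the transcript so far and return a message, "majority" is taken to mean strictly more than half, and votes are counted per round, so every agreeing participant must include the keyword in the round where consensus is reached.

```python
from typing import Callable, Optional

# A participant takes the transcript so far and returns its next message.
Speaker = Callable[[str], str]


def run_session(
    participants: dict[str, Speaker],
    max_rounds: int = 10,
    keyword: str = "VOTE: AGREE",
    require: str = "all",
) -> tuple[Optional[int], list[str]]:
    """Round-robin discussion; returns (consensus_round or None, transcript lines)."""
    transcript: list[str] = []
    # Assumption: "majority" means strictly more than half of the participants.
    needed = len(participants) if require == "all" else len(participants) // 2 + 1
    for round_no in range(1, max_rounds + 1):
        votes = 0
        for agent_id, speak in participants.items():  # same order every round
            message = speak("\n".join(transcript))
            transcript.append(f"[{agent_id}] {message}")
            if keyword in message:
                votes += 1
        if votes >= needed:
            return round_no, transcript
    return None, transcript  # max_rounds reached without consensus


def scripted(agree_from_round: int) -> Speaker:
    """Stand-in agent that starts voting AGREE from a given round."""
    calls = 0

    def speak(transcript: str) -> str:
        nonlocal calls
        calls += 1
        return "VOTE: AGREE" if calls >= agree_from_round else "Still discussing"

    return speak
```

With two scripted agents that agree from round 2, run_session({"architect": scripted(2), "security": scripted(2)}, max_rounds=5) reports consensus in round 2, the same shape as the sample transcript in Example 2.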

Chunk Decomposition

Break tasks into trackable work units. Agents manage chunks via output directives.

- id: build_session
  type: session
  participants: [pm, dev]
  consensus:
    require_chunks_done: true         # All chunks must be done
  chunks:
    enabled: true
    initial:
      - "Set up project structure"
      - "Implement auth flow"
      - "Add unit tests"

Agent directives:

CHUNK_ADD: Implement drag-and-drop     → adds new chunk
CHUNK_DONE: C-001                      → marks chunk complete
CHUNK_REMOVE: C-003                    → removes chunk

The {{ chunks }} template variable injects a chunk status table into prompts. Chunk state is stored in chunks.json.
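A minimal Python sketch of applying the three directives to a chunk list. The ChunkBoard class, its methods, and the status-table layout are hypothetical; only the directive names and the C-001 ID style come from this README. The sketch assumes IDs are never reused after a removal, which aqm's ChunkManager may or may not do.

```python
import re
from dataclasses import dataclass

# One directive per line, e.g. "CHUNK_DONE: C-001".
DIRECTIVE_RE = re.compile(r"^(CHUNK_ADD|CHUNK_DONE|CHUNK_REMOVE):\s*(.+?)\s*$", re.MULTILINE)


@dataclass
class Chunk:
    id: str
    description: str
    status: str = "pending"


class ChunkBoard:
    def __init__(self, initial: list[str] = ()):
        self.chunks: dict[str, Chunk] = {}
        self._next = 1
        for description in initial:
            self.add(description)

    def add(self, description: str) -> str:
        chunk_id = f"C-{self._next:03d}"  # C-001, C-002, ...
        self._next += 1
        self.chunks[chunk_id] = Chunk(chunk_id, description)
        return chunk_id

    def apply(self, output: str) -> None:
        """Scan agent output and apply every directive in order."""
        for verb, arg in DIRECTIVE_RE.findall(output):
            if verb == "CHUNK_ADD":
                self.add(arg)
            elif verb == "CHUNK_DONE" and arg in self.chunks:
                self.chunks[arg].status = "done"
            elif verb == "CHUNK_REMOVE":
                self.chunks.pop(arg, None)

    def all_done(self) -> bool:
        return all(c.status == "done" for c in self.chunks.values())

    def table(self) -> str:
        """Plain-text status table, standing in for what {{ chunks }} renders."""
        return "\n".join(f"{c.id}  {c.status:<7}  {c.description}" for c in self.chunks.values())
```

A check like all_done() is the kind of condition consensus.require_chunks_done would gate on.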

CLI:

aqm chunks list T-ABC123
aqm chunks add T-ABC123 "New feature"
aqm chunks done T-ABC123 C-001
aqm chunks remove T-ABC123 C-002

Web API: CRUD endpoints at /api/tasks/{id}/chunks, with chunk_update events pushed over SSE.

Context Strategy (Token Optimization)

Each agent's context_strategy controls what {{ context }} contains. A narrower strategy saves tokens by not injecting context the agent doesn't need.

agents:
  - id: planner
    context_strategy: both            # Full visibility (default)

  - id: developer
    context_strategy: last_only       # Only previous stage → 55% savings
    context_window: 1

  - id: deployer
    context_strategy: none            # No context → 85% savings

Strategy | {{ context }} Contains | Token Savings | Use Case
both (default) | Shared context.md + agent's private notes | 0% (baseline) | Full visibility, backward-compatible
shared | Smart-windowed shared context.md | ~same as both | Agents that need pipeline history
last_only | Only the most recent stage output | ~55% | Agents that only need the previous step
own | Agent's private agent_{id}.md only | ~85% | Focused agents with their own notes
none | Empty (no context injected) | ~85% | Self-contained agents with no context needed
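Here is a minimal Python sketch of how the five strategies could map to the string injected as {{ context }}. StageOutput, build_context, and the placeholder summary line are hypothetical; the real logic lives in aqm/core/context_file.py and pipeline.py and will differ, especially in how older stages are summarized. The window argument plays the role of context_window from the agents.yaml reference, reading "0 = all" as "keep every stage in full".

```python
from dataclasses import dataclass


@dataclass
class StageOutput:
    agent_id: str
    text: str


def window_stages(stages: list[StageOutput], window: int) -> list[str]:
    """Keep the last `window` stages verbatim; collapse older ones (0 = keep all)."""
    rendered = [f"[{s.agent_id}] {s.text}" for s in stages]
    if window <= 0 or len(stages) <= window:
        return rendered
    older = len(stages) - window
    # Placeholder: aqm summarizes older stages; this sketch only counts them.
    return [f"(summary of {older} earlier stages)"] + rendered[-window:]


def build_context(strategy: str, stages: list[StageOutput], private_notes: str, window: int = 3) -> str:
    if strategy == "none":
        return ""
    if strategy == "last_only":
        return stages[-1].text if stages else ""
    if strategy == "own":
        return private_notes
    shared = "\n".join(window_stages(stages, window))
    if strategy == "shared":
        return shared
    if strategy == "both":
        return f"{shared}\n{private_notes}" if private_notes else shared
    raise ValueError(f"unknown context_strategy: {strategy!r}")
```

The savings follow directly from the branches: none and own skip the pipeline history entirely, and last_only keeps a single stage.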

Benchmarked on a 10-agent pipeline (see tests/bench_token_efficiency.py):

Strategy      Total Tokens   Savings
both              12,233        0%
last_only          5,504       55%
none               1,873       85%
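The Savings column is the relative reduction against the both baseline, (baseline - tokens) / baseline. Recomputing it from the table above:

```python
def savings_pct(baseline_tokens: int, tokens: int) -> float:
    """Percentage of tokens saved relative to the baseline strategy."""
    return 100 * (baseline_tokens - tokens) / baseline_tokens


BASELINE = 12_233  # "both" row of the benchmark table
for name, tokens in [("last_only", 5_504), ("none", 1_873)]:
    print(f"{name}: {savings_pct(BASELINE, tokens):.1f}%")
# last_only: 55.0%
# none: 84.7%
```

The none figure is 84.7% before rounding, which the table reports as 85%.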

Handoff Routing

Three strategies for task flow:

# Static — fixed target
handoffs:
  - to: reviewer
    condition: always

# Fan-out — multiple targets in parallel
handoffs:
  - to: qa, docs, deploy
    condition: on_approve

# Agent-decided — agent picks target at runtime
handoffs:
  - to: "*"
    condition: auto    # Agent includes HANDOFF: <id> in output

Conditions: always, on_approve, on_reject, on_pass, auto, or an expression such as severity == critical

Payload variables: {{ output }}, {{ input }}, {{ reject_reason }}, {{ gate_result }}

Human Input (Human-in-the-Loop)

agents:
  - id: planner
    human_input:
      mode: before           # Ask before agent runs
      prompt: "What specific features do you want?"

  - id: developer
    human_input: true        # Shorthand: agent can ask mid-execution via HUMAN_INPUT: <question>

Modes:

Mode | Behavior
before | Always pause and ask the user before the agent runs.
on_demand | The agent requests input via HUMAN_INPUT: <question> directives in its output.
both | Combines before and on_demand.

Gates (Quality Control)

gate:
  type: llm              # LLM auto-evaluates → approved/rejected
  prompt: "Is this production-ready?"
  max_retries: 3         # Reject → retry up to 3 times, then fail

gate:
  type: human            # Pauses pipeline → aqm approve/reject
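Reading "retry up to 3 times" as one initial attempt plus up to three retries, the LLM gate loop can be sketched as below. The produce and evaluate callables are hypothetical stand-ins for the agent and the gate's LLM judgment; in an aqm pipeline the retry actually travels through the on_reject handoff back to the producing agent, so treat this as a model of the control flow rather than the code in gate.py.

```python
from typing import Callable, Optional

# produce(reject_reason) -> output; evaluate(output) -> (approved, reason)
Producer = Callable[[Optional[str]], str]
Evaluator = Callable[[str], tuple[bool, str]]


def run_with_gate(produce: Producer, evaluate: Evaluator, max_retries: int = 3) -> tuple[str, int]:
    """Return (approved_output, retries_used); raise once retries are exhausted."""
    reject_reason: Optional[str] = None
    for attempt in range(max_retries + 1):  # first try + up to max_retries retries
        output = produce(reject_reason)
        approved, reason = evaluate(output)
        if approved:
            return output, attempt
        reject_reason = reason  # fed back, like {{ reject_reason }} in handoff payloads
    raise RuntimeError(f"gate rejected output after {max_retries} retries: {reject_reason}")
```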

Task Restart & Recovery

Resume failed or completed tasks from any stage — no need to start over.

How it works:

  • Before each stage, aqm snapshots all context files (context.md, agent notes, transcripts)
  • On failure, partial output from the runtime is preserved
  • aqm restart rolls back context to the chosen stage and re-executes from there

# Restart from the failed stage (auto-detected)
aqm restart T-A3F2B1

# Restart from a specific stage
aqm restart T-A3F2B1 --from-stage 3

# Re-run everything from scratch
aqm restart T-A3F2B1 --from-stage 1

Works for failed, completed, stalled, and cancelled tasks. The web dashboard also provides a restart button with stage selection.

Event | Action
Before each stage | Context files snapshotted to snapshots/stage_N/
Task completes successfully | All snapshots cleaned up
Task fails | Snapshots preserved for restart
aqm restart --from-stage N | Context restored from snapshot, stages truncated, pipeline resumes
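The snapshot lifecycle in the table can be sketched with the standard library. This assumes a hypothetical layout where a task's context files (context.md, agent_{id}.md, transcript.md) sit directly in one task directory; the function names are illustrative and not aqm's API. restore() deletes current context files first so that notes created after the snapshot don't leak into the restarted run.

```python
import shutil
from pathlib import Path

# File names taken from this README; the flat directory layout is an assumption.
CONTEXT_PATTERNS = ("context.md", "agent_*.md", "transcript.md")


def snapshot(task_dir: Path, stage: int) -> Path:
    """Copy context files into snapshots/stage_N/ before stage N runs."""
    dest = task_dir / "snapshots" / f"stage_{stage}"
    dest.mkdir(parents=True, exist_ok=True)
    for pattern in CONTEXT_PATTERNS:
        for f in task_dir.glob(pattern):
            shutil.copy2(f, dest / f.name)
    return dest


def restore(task_dir: Path, stage: int) -> None:
    """Roll context back to the state captured before stage N."""
    src = task_dir / "snapshots" / f"stage_{stage}"
    if not src.is_dir():
        raise FileNotFoundError(src)
    for pattern in CONTEXT_PATTERNS:
        for f in task_dir.glob(pattern):
            f.unlink()
    for f in src.iterdir():
        shutil.copy2(f, task_dir / f.name)


def cleanup(task_dir: Path) -> None:
    """Drop all snapshots once the task completes successfully."""
    shutil.rmtree(task_dir / "snapshots", ignore_errors=True)
```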

MCP Servers

Give agents real-world capabilities via Model Context Protocol.

mcp:
  - server: github
  - server: filesystem
    args: ["/path/to/dir"]
  - server: custom-db
    command: node
    args: ["./mcp-server.js"]
    env: { DATABASE_URL: "postgres://..." }

Params (Portable Pipelines)

params:
  model: claude-sonnet-4-20250514
  project_path:
    type: string
    required: true
    prompt: "Project root path?"

agents:
  - id: dev
    model: ${{ params.model }}

Override: aqm run "task" --param model=claude-opus-4-6
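A minimal Python sketch of resolving ${{ params.name }} placeholders with --param overrides. It assumes a scalar param value (like model: above) acts as its default and that command-line overrides win; the function names are hypothetical, and where aqm would presumably ask the user using the prompt: text, this sketch simply raises.

```python
import re

PARAM_RE = re.compile(r"\$\{\{\s*params\.(\w+)\s*\}\}")


def parse_overrides(pairs: list[str]) -> dict[str, str]:
    """Turn ["model=claude-opus-4-6"] into {"model": "claude-opus-4-6"}."""
    return dict(pair.split("=", 1) for pair in pairs)


def resolve_params(declared: dict, overrides: dict[str, str]) -> dict:
    values = {}
    for name, spec in declared.items():
        if not isinstance(spec, dict):
            spec = {"default": spec}  # assumption: a scalar value is the default
        if name in overrides:
            values[name] = overrides[name]
        elif "default" in spec:
            values[name] = spec["default"]
        elif spec.get("required"):
            # aqm can ask interactively (see `prompt:`); this sketch just fails.
            raise ValueError(f"missing required param {name!r}: {spec.get('prompt', '')}")
    return values


def substitute(text: str, values: dict) -> str:
    """Replace every ${{ params.x }} with its resolved value (KeyError if unknown)."""
    return PARAM_RE.sub(lambda m: str(values[m.group(1)]), text)
```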

Imports / Extends

imports:
  - from: ./shared/reviewers.yaml
    agents: [security_reviewer]

agents:
  - id: base_reviewer
    abstract: true
    runtime: claude
    gate: { type: llm }

  - id: code_reviewer
    extends: base_reviewer
    system_prompt: "Review code: {{ input }}"
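A minimal Python sketch of extends resolution with abstract templates. It assumes a shallow merge (a child key such as gate replaces the parent's value wholesale) and that abstract is not inherited; the function name is hypothetical and aqm may merge nested fields differently. Cross-file imports are not modeled here.

```python
def resolve_extends(agents: list[dict]) -> list[dict]:
    """Merge each agent over its `extends` parent; drop abstract templates."""
    by_id = {agent["id"]: agent for agent in agents}
    resolved: dict[str, dict] = {}

    def resolve(agent_id: str, chain: tuple[str, ...] = ()) -> dict:
        if agent_id in resolved:
            return resolved[agent_id]
        if agent_id in chain:
            raise ValueError(f"circular extends: {' -> '.join(chain + (agent_id,))}")
        agent = by_id[agent_id]
        parent_id = agent.get("extends")
        merged = dict(resolve(parent_id, chain + (agent_id,))) if parent_id else {}
        merged.pop("abstract", None)  # children are concrete unless they say otherwise
        # Shallow merge: child keys replace parent keys wholesale (assumption).
        merged.update({k: v for k, v in agent.items() if k != "extends"})
        resolved[agent_id] = merged
        return merged

    for agent in agents:
        resolve(agent["id"])
    return [resolved[a["id"]] for a in agents if not resolved[a["id"]].get("abstract")]
```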

Pipeline Registry (Share & Discover)

aqm search "code review"              # Find community pipelines
aqm pull security-audit               # Install in one command
aqm publish --name my-pipeline        # Share yours

CLI Reference

# Setup
aqm init                              # Interactive setup wizard
aqm validate                          # Validate agents.yaml
aqm agents                            # Show agent graph

# Run
aqm run "Add JWT auth"                # Run default pipeline
aqm run "Fix bug" --agent bug_fixer   # Start from specific agent
aqm run "Build API" --pipeline backend # Named pipeline
aqm run "Task" --param model=opus     # Override parameters

# Manage
aqm list                              # List all tasks
aqm status T-ABC123                   # Task details
aqm cancel T-ABC123                   # Cancel task
aqm fix T-ABC123 "Fix the color"      # Follow-up with context
aqm restart T-ABC123                  # Restart from failed stage
aqm restart T-ABC123 --from-stage 2   # Restart from specific stage

# Gates & Human Input
aqm approve T-ABC123                  # Approve gate
aqm reject T-ABC123 -r "Needs tests" # Reject gate
aqm human-input T-ABC123 "response"   # Answer agent's question

# Chunks
aqm chunks list T-ABC123              # Status table
aqm chunks done T-ABC123 C-001        # Mark done

# Pipelines
aqm pipeline list                     # List pipelines
aqm pipeline create review --ai       # AI-generate
aqm pipeline default review           # Set default

# Registry
aqm search "code review"              # Search
aqm pull code-review-pipeline         # Install
aqm publish --name my-pipeline        # Share

# Dashboard
aqm serve                             # Web UI at localhost:8000

agents.yaml Reference

Entry Point (Auto-Routing)

entry_point: auto    # LLM picks the best agent based on user input
# entry_point: first  # (default) Always start with the first agent

Agent Definition

Field | Type | Default | Description
id | string | - | Unique identifier (required)
name | string | "" | Display name (auto-generated from id if empty)
type | "agent" or "session" | "agent" | Node type
runtime | "claude", "gemini", or "codex" | - | Required for type: agent
model | string | CLI default | Model override
system_prompt | string | "" | Jinja2 template: {{ input }}, {{ context }}, {{ transcript }}, {{ chunks }}
context_strategy | "none", "last_only", "own", "shared", or "both" | "both" | What context to inject (token optimization)
context_window | int | 3 | Recent stages kept in full; older stages summarized (0 = all)
human_input | boolean or object | null | Human-in-the-loop input (before, on_demand, both)
handoffs | list[Handoff] | [] | Routing rules
gate | object | null | Quality gate
mcp | list[MCPServer] | [] | MCP server connections
claude_code_flags | list[string] | null | Extra CLI flags for Claude
abstract | boolean | false | Template-only agent (not executed)
extends | string | null | Parent agent ID for inheritance

Handoff Fields

Field | Type | Default | Description
to | string | - | Target agent ID, or comma-separated IDs for fan-out ("qa, docs")
task | string | "" | Task name label
condition | string | "always" | always, on_approve, on_reject, on_pass, auto, or an expression
payload | string | "{{ output }}" | Jinja2 template: {{ output }}, {{ input }}, {{ reject_reason }}, {{ gate_result }}

Session Fields (type: session)

Field | Type | Default | Description
participants | list[string] | - | Agent IDs (required)
turn_order | "round_robin" or "moderator" | "round_robin" | Turn ordering
max_rounds | int | 10 | Hard limit on discussion rounds
consensus.method | "vote" or "moderator_decides" | "vote" | How agreement is detected
consensus.keyword | string | "VOTE: AGREE" | Agreement signal
consensus.require | "all" or "majority" | "all" | Threshold
consensus.require_chunks_done | boolean | false | Gate on chunk completion
summary_agent | string | null | Final summary producer
chunks.enabled | boolean | true | Enable chunk tracking
chunks.initial | list[string] | [] | Seed chunks

config.yaml Reference

Project-level configuration at .aqm/config.yaml. All fields are optional.

pipeline:
  max_stages: 20
gate:
  model: claude-sonnet-4-20250514
  timeout: 120
timeouts:
  claude: 600
  gemini: 600
  codex: 600

Comparison

LangGraph CrewAI AutoGen aqm
Pipeline definition Python Python + YAML Python YAML only
Pipeline sharing Paid Open registry
Multi-agent discussion Group chat Session nodes + consensus voting
Task decomposition Chunk tracking
Context optimization Auto-summarize 5 strategies (55-85% savings)
Multi-LLM LangChain LiteLLM Multiple CLI subprocess (no API keys)
Cost model Per-token API Per-token API Per-token API CLI subscription (no extra fees)
Human-in-the-loop Middleware Webhooks HumanProxy First-class per-agent config
Quality gates Callbacks LLM + Human gates
Auto entry routing LLM-based entry_point: auto
Fan-out parallel Manual Manual Declarative
Real-time streaming Token-level SSE
Web dashboard Paid Paid Built-in (free)

Architecture

aqm/
├── core/
│   ├── agent.py          # AgentDefinition, ConsensusConfig, ChunksConfig, HumanInputConfig
│   ├── pipeline.py       # Pipeline loop + _run_session() + context strategy
│   ├── chunks.py         # Chunk model, ChunkManager, directive parser
│   ├── task.py           # Task, StageRecord, TaskStatus
│   ├── gate.py           # LLMGate / HumanGate
│   ├── context_file.py   # context.md + agent_{id}.md + transcript.md + smart windowing
│   ├── context.py        # Jinja2 prompt builder
│   ├── config.py         # ProjectConfig (.aqm/config.yaml)
│   └── project.py        # Project root detection
├── queue/
│   ├── base.py           # AbstractQueue interface
│   ├── sqlite.py         # SQLiteQueue (production)
│   └── file.py           # FileQueue (testing)
├── runtime/
│   ├── base.py           # AbstractRuntime interface
│   ├── claude_code.py    # Claude Code (with MCP, token streaming)
│   ├── gemini.py         # Gemini CLI
│   └── codex.py          # Codex CLI
├── web/
│   ├── app.py            # FastAPI app factory
│   ├── templates.py      # Shared CSS/layout/helpers
│   ├── pages/            # Page renderers (dashboard, agents, registry, validate, task_detail)
│   └── api/              # REST + chunk + SSE + human input endpoints
├── registry.py           # GitHub pipeline registry
└── cli.py                # Click CLI

Community

Discord | Registry | JSON Schema

Contributing

git clone https://github.com/aqm-framework/aqm
cd aqm
pip install -e ".[dev,serve]"
pytest tests/

Pipeline contributions are valued equally to code contributions. See CONTRIBUTING.md.

License

MIT


Download files


Source Distribution

aqm-1.3.0.tar.gz (131.8 kB)

Uploaded Source

Built Distribution


aqm-1.3.0-py3-none-any.whl (126.1 kB)

Uploaded Python 3

File details

Details for the file aqm-1.3.0.tar.gz.

File metadata

  • Download URL: aqm-1.3.0.tar.gz
  • Size: 131.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for aqm-1.3.0.tar.gz
Algorithm Hash digest
SHA256 dbdee505d930d792a34bdd95b683a921156edb95fe35c5be3bfd6ab87d438f13
MD5 788fb3b3bbcef71038606202040d95c4
BLAKE2b-256 333230e72d67dff9e423428b11ed2349adf9d366e78b5b32531f4745cadf15e9


File details

Details for the file aqm-1.3.0-py3-none-any.whl.

File metadata

  • Download URL: aqm-1.3.0-py3-none-any.whl
  • Size: 126.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for aqm-1.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3179be9d35f070f103df86ca22db472c39bc964f1078e2522d5fef5e13c7597e
MD5 6a205890c408959ebd1bba9882f39317
BLAKE2b-256 d0a9d295cf48d6713faed31c145cb0b41ee18a1db407ce8b93c17ad48cd039a8

