Herramienta CLI headless y agentica para orquestar agentes de IA

These details have not been verified by PyPI

Project links

Project description

architect

Headless, agentic CLI tool for orchestrating AI agents over local files and remote MCP services. Designed to run unattended in CI, cron jobs, and pipelines.

Installation

Requirements: Python 3.12+

# Install from PyPI
pip install architect-ai-cli

# From the repository
git clone https://github.com/Diego303/architect-cli
cd architect-cli
pip install -e .

# Verify installation
architect --version
architect run --help

Optional extras:

pip install architect-ai-cli[dev]        # pytest, black, ruff, mypy
pip install architect-ai-cli[telemetry]  # OpenTelemetry (OTLP traces)
pip install architect-ai-cli[health]     # radon (cyclomatic complexity)

Main dependencies: litellm, click, pydantic, httpx, structlog, tenacity

Quickstart

# Set API key
export LITELLM_API_KEY="sk-..."

# Analyze a project (read-only, safe)
architect run "summarize what this project does" -a resume

# Review code
architect run "review main.py and find issues" -a review

# Generate a detailed plan (without modifying files)
architect run "plan how to add tests to the project" -a plan

# Modify files — build plans and executes in a single step
architect run "add docstrings to all functions in utils.py"

# Run without confirmations (CI/automation)
architect run "generate a README.md file for this project" --mode yolo

# See what it would do without executing anything
architect run "reorganize the folder structure" --dry-run

# Limit total execution time
architect run "refactor the auth module" --timeout 300

Commands

`architect run` — execute task

architect run PROMPT [options]

Argument:

PROMPT — Task description in natural language

Main options:

Option	Description
`-c, --config PATH`	YAML configuration file
`-a, --agent NAME`	Agent to use: `plan`, `build`, `resume`, `review`, or custom
`-m, --mode MODE`	Confirmation mode: `confirm-all`, `confirm-sensitive`, `yolo`
`-w, --workspace PATH`	Working directory (workspace root)
`--dry-run`	Simulate execution without real changes

LLM options:

Option	Description
`--model MODEL`	Model to use (`gpt-4o`, `claude-sonnet-4-6`, etc.)
`--api-base URL`	API base URL
`--api-key KEY`	Direct API key
`--no-stream`	Disable streaming
`--timeout N`	Maximum total execution time in seconds (global watchdog)

Output and reports options:

Option	Description
`-v / -vv / -vvv`	Technical verbose level (without `-v` only agent steps are shown)
`--log-level LEVEL`	Log level: `human` (default), `debug`, `info`, `warn`, `error`
`--log-file PATH`	Save structured JSON logs to file
`--json`	JSON output format (compatible with `jq`)
`--quiet`	Silent mode (only final result to stdout)
`--max-steps N`	Maximum agent steps limit
`--budget N`	Cost limit in USD (stops the agent if exceeded)
`--report FORMAT`	Generate execution report: `json`, `markdown`, `github`
`--report-file PATH`	Save report to file (format inferred from extension: `.json`, `.md`, `.html`)

Session and CI/CD options:

Option	Description
`--session ID`	Resume a previously saved session
`--confirm-mode MODE`	CI-friendly alias: `yolo`, `confirm-sensitive`, `confirm-all`
`--context-git-diff REF`	Inject `git diff REF` as context (e.g., `origin/main`)
`--exit-code-on-partial N`	Custom exit code for `partial` status (default: 2)

Analysis and evaluation options:

Option	Description
`--self-eval off\|basic\|full`	Result self-evaluation: `off` (no extra cost), `basic` (one extra call, marks as `partial` if fails), `full` (retries with correction prompt up to `max_retries` times)
`--health`	Run code health analysis before/after — shows complexity delta, long functions, and duplicates

MCP options:

Option	Description
`--disable-mcp`	Disable connection to MCP servers

`architect sessions` — list saved sessions

architect sessions

Shows a table with all saved sessions: ID, status, steps, cost, and task.

`architect resume` — resume session

architect resume SESSION_ID [options]

Resumes an interrupted session. Loads the complete state (messages, modified files, accumulated cost) and continues where it left off. If the ID doesn't exist, exits with code 3.

`architect cleanup` — clean old sessions

architect cleanup                  # removes sessions > 7 days
architect cleanup --older-than 30  # removes sessions > 30 days

`architect loop` — automatic iteration (Ralph Loop)

architect loop PROMPT --check CMD [options]

Runs an agent in a loop until all checks (shell commands) pass. Each iteration receives a clean context: only the original spec, accumulated diff, errors from the previous iteration, and an auto-generated progress.md.

# Loop until tests and lint pass
architect loop "implement feature X" \
  --check "pytest tests/" \
  --check "ruff check src/" \
  --max-iterations 10 \
  --max-cost 5.0

# With spec file and isolated worktree
architect loop "refactor the auth module" \
  --spec spec.md \
  --check "pytest" \
  --worktree \
  --model gpt-4o

Option	Description
`--check CMD`	Verification command (repeatable, required)
`--spec PATH`	Specification file (used instead of prompt)
`--max-iterations N`	Maximum iterations (default: 25)
`--max-cost N`	Cost limit in USD
`--max-time N`	Time limit in seconds
`--completion-tag TAG`	Tag the agent emits when done (default: `COMPLETE`)
`--agent NAME`	Agent to use (default: `build`)
`--model MODEL`	LLM model
`--worktree`	Run in an isolated git worktree
`--quiet`	Only final result

`architect pipeline` — run YAML workflow

architect pipeline FILE [options]

Runs a multi-step workflow defined in YAML. Each step can have its own agent, model, checks, conditions, and variables. The YAML is validated before execution — unknown fields, missing prompt, and invalid step formats are rejected with clear error messages.

# Run pipeline
architect pipeline ci/pipeline.yaml --var project=myapp --var env=staging

# Preview plan without executing
architect pipeline ci/pipeline.yaml --dry-run

# Resume from a step
architect pipeline ci/pipeline.yaml --from-step deploy

Option	Description
`--var KEY=VALUE`	Pipeline variable (repeatable)
`--from-step NAME`	Resume from a specific step
`--dry-run`	Show plan without executing
`-c, --config PATH`	YAML configuration file
`--quiet`	Only final result

Pipeline YAML format:

name: my-pipeline
steps:
  - name: analyze
    agent: plan
    prompt: "Analyze project {{project}} in {{env}} environment"
    output_var: analysis

  - name: implement
    agent: build
    prompt: "Implement: {{analysis}}"
    model: gpt-4o
    checks:
      - "pytest tests/"
      - "ruff check src/"
    checkpoint: true

  - name: deploy
    agent: build
    prompt: "Deploy to {{env}}"
    condition: "env == 'production'"

`architect parallel` — parallel execution

architect parallel --task CMD [options]

Runs multiple tasks in parallel, each in an isolated git worktree.

# Three tasks in parallel
architect parallel \
  --task "add tests to auth.py" \
  --task "add tests to users.py" \
  --task "add tests to billing.py" \
  --workers 3

# With different models per worker
architect parallel \
  --task "optimize queries" \
  --task "improve logging" \
  --models gpt-4o,claude-sonnet-4-6

Option	Description
`--task CMD`	Task to execute (repeatable)
`--workers N`	Number of parallel workers (default: 3)
`--models LIST`	Comma-separated models (round-robin across workers)
`--agent NAME`	Agent to use (default: `build`)
`--budget-per-worker N`	Cost limit per worker
`--timeout-per-worker N`	Time limit per worker
`--config PATH`	YAML configuration file for workers
`--api-base URL`	LLM API base URL for workers
`--quiet`	Only final result

# With custom config and API
architect parallel \
  --task "optimize queries" \
  --config ci/architect.yaml \
  --api-base http://proxy.internal:8000

# Clean up worktrees after execution
architect parallel-cleanup

`architect eval` — competitive multi-model evaluation

architect eval PROMPT [options]

Runs the same task with multiple models in parallel and generates a comparative ranking. Each model runs in an isolated git worktree with the same validation checks.

# Compare three models
architect eval "implement JWT authentication" \
  --models gpt-4o,claude-sonnet-4-6,gemini-2.0-flash \
  --check "pytest tests/test_auth.py -q" \
  --check "ruff check src/" \
  --budget-per-model 1.0 \
  --report-file eval_report.md

# With timeout and custom agent
architect eval "refactor utils.py" \
  --models gpt-4o,claude-sonnet-4-6 \
  --check "pytest" \
  --timeout-per-model 300 \
  --agent build \
  --max-steps 30

Option	Description
`--models LIST`	Comma-separated models (required)
`--check CMD`	Verification command (repeatable, required)
`--agent NAME`	Agent to use (default: `build`)
`--max-steps N`	Maximum steps per model (default: 50)
`--budget-per-model N`	Cost limit per model in USD
`--timeout-per-model N`	Time limit per model in seconds
`--report-file PATH`	Save report to file
`--config PATH`	YAML configuration file
`--api-base URL`	LLM API base URL

Scoring system (100 points):

Checks passed: 40 pts (proportional)
Status: 30 pts (success=30, partial=15, timeout=5, failed=0)
Efficiency: 20 pts (fewer steps = higher score)
Cost: 10 pts (lower cost = higher score)

`architect init` — initialize project with presets

architect init [options]

Generates initial configuration (.architect.md + config.yaml) from predefined presets.

# View available presets
architect init --list-presets

# Initialize Python project
architect init --preset python

# Maximum security mode (overwrite if exists)
architect init --preset paranoid --overwrite

Option	Description
`--preset NAME`	Preset to apply: `python`, `node-react`, `ci`, `paranoid`, `yolo`
`--list-presets`	Show available presets
`--overwrite`	Overwrite existing files

Available presets:

Preset	Description
`python`	Standard Python project — pytest, ruff, mypy, black, PEP 8, type hints
`node-react`	Node.js/React project — TypeScript strict, ESLint, Prettier, Jest/Vitest
`ci`	Headless CI/CD mode — yolo, no streaming, autonomous
`paranoid`	Maximum security — confirm-all, strict guardrails, code rules, max 20 steps
`yolo`	No restrictions — yolo, 100 steps, no guardrails

`architect agents` — list agents

architect agents                   # default agents
architect agents -c config.yaml   # includes custom from YAML

Lists all available agents with their confirmation mode.

`architect validate-config` — validate configuration

architect validate-config -c config.yaml

Validates the syntax and values of the configuration file before execution.

Agents

An agent defines the role, available tools, and confirmation level.

The default agent is build (used automatically if -a is not specified): it analyzes the project, creates an internal plan, and executes it in a single step, without needing a prior plan agent.

Agent	Description	Tools	Confirmation	Steps
`build`	Plans and executes modifications	all (editing, search, read, `run_command`, `dispatch_subagent`)	`confirm-sensitive`	50
`plan`	Analyzes and generates a detailed plan	`read_file`, `list_files`, `search_code`, `grep`, `find_files`	`yolo`	20
`resume`	Reads and summarizes information	`read_file`, `list_files`, `search_code`, `grep`, `find_files`	`yolo`	15
`review`	Code review and improvements	`read_file`, `list_files`, `search_code`, `grep`, `find_files`	`yolo`	20

Custom agents in config.yaml:

agents:
  deploy:
    system_prompt: |
      You are a deployment agent...
    allowed_tools:
      - read_file
      - list_files
      - run_command
    confirm_mode: confirm-all
    max_steps: 10

Confirmation Modes

Mode	Behavior
`confirm-all`	Every action requires interactive confirmation
`confirm-sensitive`	Only actions that modify the system (write, delete)
`yolo`	No confirmations — neither tools nor commands (for CI/scripts). Safety is guaranteed by the destructive commands blocklist

In environments without TTY (--mode confirm-sensitive in CI), the system raises a clear error. Use --mode yolo or --dry-run in pipelines.

Configuration

Copy config.example.yaml as a starting point:

cp config.example.yaml config.yaml

Minimal structure:

language: en                   # "en" (default) | "es" — agent prompts, logs, reports

llm:
  model: gpt-4o-mini          # or claude-sonnet-4-6, ollama/llama3, etc.
  api_key_env: LITELLM_API_KEY
  timeout: 60
  retries: 2
  stream: true

workspace:
  root: .
  allow_delete: false

logging:
  level: human                 # human (default), debug, info, warn, error
  verbose: 0

Environment Variables

Variable	Config equivalent	Description
`LITELLM_API_KEY`	`llm.api_key_env`	LLM provider API key
`ARCHITECT_MODEL`	`llm.model`	LLM model
`ARCHITECT_API_BASE`	`llm.api_base`	API base URL
`ARCHITECT_LOG_LEVEL`	`logging.level`	Logging level
`ARCHITECT_WORKSPACE`	`workspace.root`	Working directory
`ARCHITECT_LANGUAGE`	`language`	UI language (`en`, `es`)

Output and Exit Codes

stdout/stderr separation:

LLM streaming → stderr (doesn't break pipes)
Logs and progress → stderr
Agent's final result → stdout
--json output → stdout

# Parse result with jq
architect run "summarize the project" --quiet --json | jq .status

# Capture result, view logs
architect run "analyze main.py" -v 2>logs.txt

# Result only (no logs)
architect run "generate README" --quiet --mode yolo

Exit codes:

Code	Meaning
`0`	Success (`success`)
`1`	Agent failure (`failed`)
`2`	Partial — did something but didn't complete (`partial`)
`3`	Configuration error
`4`	LLM authentication error
`5`	Timeout
`130`	Interrupted (Ctrl+C)

JSON Format (`--json`)

architect run "analyze the project" -a review --quiet --json

{
  "status": "success",
  "stop_reason": null,
  "output": "The project consists of...",
  "steps": 3,
  "tools_used": [
    {"name": "list_files", "success": true},
    {"name": "read_file", "path": "src/main.py", "success": true}
  ],
  "duration_seconds": 8.5,
  "model": "gpt-4o-mini",
  "costs": {"total_usd": 0.0023, "prompt_tokens": 4200, "completion_tokens": 380}
}

stop_reason: indicates why the agent stopped. null = terminated naturally. Other values: max_steps, timeout, budget_exceeded, context_full, user_interrupt, llm_error.

When a watchdog triggers (max_steps, timeout, etc.), the agent receives a shutdown instruction and makes one last LLM call to summarize what was completed and what remains pending before terminating.

Logging

By default, architect displays agent steps in a human-readable format with icons:

🔄 Step 1 → LLM call (6 messages)
   ✓ LLM responded with 2 tool calls

   🔧 read_file → src/main.py
      ✓ OK

   🔧 edit_file → src/main.py (3→5 lines)
      ✓ OK
      🔍 Hook ruff: ✓

🔄 Step 2 → LLM call (10 messages)
   ✓ LLM responded with final text

✅ Agent completed (2 steps)
   Reason: LLM decided it was done
   Cost: $0.0042

MCP tools are visually distinguished: 🌐 mcp_github_search → query (MCP: github)

# Human-readable steps only (default — HUMAN level)
architect run "..."

# HUMAN level + technical logs per step
architect run "..." -v

# Full detail (args, LLM responses)
architect run "..." -vv

# Everything (HTTP, payloads)
architect run "..." -vvv

# No logs (result only)
architect run "..." --quiet

# Logs to JSON file + console
architect run "..." -v --log-file logs/session.jsonl

# Analyze logs afterwards
cat logs/session.jsonl | jq 'select(.event == "tool.call")'

Independent logging pipelines:

HUMAN (stderr, default): steps, tool calls, hooks — readable format with icons, no technical noise
Technical (stderr, with -v): LLM debug, tokens, retries — excludes HUMAN messages
JSON file (file, with --log-file): all structured events

See docs/logging.md for logging architecture details.

Lifecycle Hooks

Complete hook system that runs at 10 points in the agent lifecycle. Allows intercepting, blocking, or modifying operations.

hooks:
  pre_tool_use:
    - command: "python scripts/validate_tool.py"
      matcher: "write_file|edit_file"
      timeout: 5

  post_tool_use:
    - command: "ruff check {file} --fix"
      file_patterns: ["*.py"]
      timeout: 15
    - command: "mypy {file} --ignore-missing-imports"
      file_patterns: ["*.py"]
      timeout: 30

  session_start:
    - command: "echo 'Session started'"
      async: true

  agent_complete:
    - command: "python scripts/post_run.py"

Available events: pre_tool_use, post_tool_use, pre_llm_call, post_llm_call, session_start, session_end, on_error, budget_warning, context_compress, agent_complete

Exit code protocol:

0 = ALLOW (continue; if stdout contains JSON with updatedInput, the input is modified)
2 = BLOCK (abort the operation)
Other = error (warning in logs, execution continues)

Injected environment variables: ARCHITECT_EVENT, ARCHITECT_TOOL, ARCHITECT_WORKSPACE, ARCHITECT_FILE (if applicable)

Backward compatible: the post_edit section still works and maps to post_tool_use with editing tools matcher.

Guardrails

Deterministic security layer evaluated before hooks. Cannot be disabled by the LLM.

guardrails:
  # Write-only protection: blocks write/edit/delete, allows read
  protected_files:
    - "config/production.yaml"
    - ".git/**"

  # Full protection: blocks ALL access including read (v1.1.0)
  sensitive_files:
    - ".env"
    - ".env.*"
    - "*.pem"
    - "*.key"
    - "secrets/**"

  blocked_commands:
    - "rm -rf /"
    - "DROP TABLE"
  max_files_per_session: 20
  max_lines_changed: 5000
  code_rules:
    - pattern: "TODO|FIXME"
      severity: warn
      message: "Code with pending TODOs"
    - pattern: "eval\\("
      severity: block
      message: "eval() not allowed"
  quality_gates:
    - name: tests
      command: "pytest --tb=short -q"
      required: true
    - name: lint
      command: "ruff check src/"
      required: false

protected_files vs sensitive_files: protected_files blocks write/edit/delete operations but allows the agent to read the file. sensitive_files blocks all access including reads — the agent cannot see the file contents. Use sensitive_files for secrets (.env, private keys) to prevent them from being sent to the LLM provider.

Shell command detection: sensitive_files also blocks shell reads (cat, head, tail, less) and shell redirects (>, >>, | tee) targeting sensitive files.

Quality gates: executed when the agent declares completion. If a required gate fails, the agent receives feedback and keeps working until it passes.

Skills and .architect.md

The agent automatically loads project context from .architect.md, AGENTS.md, or CLAUDE.md in the workspace root and injects its content into the system prompt.

Specialized skills are discovered in .architect/skills/ and .architect/installed-skills/:

.architect/
├── skills/
│   └── django/
│       └── SKILL.md        # YAML frontmatter + content
└── installed-skills/
    └── react-patterns/
        └── SKILL.md

Each SKILL.md can have a YAML frontmatter with globs to activate only when relevant files are in play:

---
name: django
description: Django patterns for the project
globs: ["*.py", "*/models.py", "*/views.py"]
---
# Django Instructions
Use class-based views whenever possible...

# Skill management
architect skill list
architect skill create my-skill
architect skill install github-user/repo/path/to/skill
architect skill remove my-skill

Procedural Memory

The agent detects user corrections and persists them across sessions in .architect/memory.md.

memory:
  enabled: true
  auto_detect_corrections: true

When the user corrects the agent (e.g., "don't use print, use logging"), the pattern is saved and injected in future sessions as additional context in the system prompt.

The .architect/memory.md file is manually editable and follows the format:

- [2026-02-22] correction: Don't use print(), use logging
- [2026-02-22] pattern: Always run tests after editing

Internationalization (i18n)

architect supports English and Spanish for all agent-facing output: human logs, agent prompts, reports, guardrail messages, and evaluation feedback.

# config.yaml
language: es   # "en" (default) | "es"

# Or via environment variable
ARCHITECT_LANGUAGE=es architect run "analyze the project"

What changes with language:

Agent system prompts (built-in agents only — custom prompts are unchanged)
Human-readable log output (step indicators, tool results, status messages)
Report headers and labels (health delta, competitive eval, ralph progress)
Guardrail blocking messages
Self-evaluator prompts and feedback
Context manager markers

What stays in English regardless: CLI --help text, error messages, command names, JSON output keys.

The default language is English. All 160 translation keys have full parity between EN and ES.

Cost Control

costs:
  budget_usd: 2.0         # Stops the agent if it exceeds $2
  warn_at_usd: 1.5        # Warns in logs when reaching $1.5

# Budget limit via CLI
architect run "..." --budget 1.0

Accumulated cost appears in the --json output under costs and with --show-costs at the end of execution (works with both streaming and non-streaming modes). When the budget is exceeded, the agent receives a shutdown instruction and produces one last summary before terminating (stop_reason: "budget_exceeded").

MCP (Model Context Protocol)

Connect architect to remote tools via HTTP:

mcp:
  servers:
    - name: github
      url: http://localhost:3001
      token_env: GITHUB_TOKEN

    - name: database
      url: https://mcp.example.com/db
      token_env: DB_TOKEN

MCP tools are automatically discovered at startup and injected into the active agent's allowed_tools (no need to list them in the agent config). They are indistinguishable from local tools for the LLM. If a server is unavailable, the agent continues without those tools.

# With MCP
architect run "open a PR with the changes" --mode yolo

# Without MCP
architect run "analyze the project" --disable-mcp

Sessions and Resume

The agent automatically saves its state after each step. If an execution is interrupted (Ctrl+C, timeout, error), you can resume it:

# Run a long task
architect run "refactor the entire auth module" --budget 5.0
# → Interrupted by timeout or Ctrl+C

# View saved sessions
architect sessions
# ID                     Status       Steps  Cost    Task
# 20260223-143022-a1b2   interrupted  12     $1.23   refactor the entire auth module

# Resume where it left off
architect resume 20260223-143022-a1b2

# Clean up old sessions
architect cleanup --older-than 7

Sessions are saved in .architect/sessions/ as JSON files. Long messages (>50) are automatically truncated to the last 30 to keep the size manageable.

Execution Reports

Generate detailed reports of what the agent did, in three formats:

# JSON report (ideal for CI/CD)
architect run "add tests" --mode yolo --report json

# Markdown report (for documentation)
architect run "refactor utils" --mode yolo --report markdown --report-file report.md

# GitHub PR comment (with collapsible sections)
architect run "review the changes" --mode yolo --report github --report-file pr-comment.md

The report includes: summary (task, agent, model, status, duration, steps, cost), modified files with added/removed lines, executed quality gates, errors found, timeline of each step, and git diff.

Ralph Loop (Automatic Iteration)

The Ralph Loop runs an agent iteratively until all checks pass. Each iteration uses a clean context — the agent receives only:

The original spec (file or prompt)
The accumulated diff from all previous iterations
Check errors from the previous iteration
An auto-generated progress.md with history

# Iterate until tests and lint pass
architect loop "implement JWT authentication" \
  --check "pytest tests/test_auth.py" \
  --check "ruff check src/auth/" \
  --max-iterations 5 \
  --max-cost 3.0

# With detailed spec file
architect loop "implement per spec" \
  --spec requirements/auth-spec.md \
  --check "pytest" \
  --worktree

Safety nets: The loop stops if iterations (max_iterations), cost (max_cost), or time (max_time) are exhausted. The result indicates the stop reason.

Worktree: With --worktree, the loop runs in an isolated git worktree. If all checks pass, the result includes the worktree path for inspection or merge.

Pipeline Mode (Multi-Step Workflows)

Pipelines define sequential workflows where each step can have its own agent, model, checks, and configuration.

Features:

Variables: {{name}} in prompts, substituted from --var or from output_var of previous steps
Conditions: condition evaluates an expression; the step is skipped if false
Output variables: output_var captures a step's output as a variable for subsequent steps
Checks: post-step shell commands that verify the result
Checkpoints: checkpoint: true creates an automatic git commit upon step completion
Resume: --from-step allows resuming a pipeline from a specific step
Dry-run: --dry-run shows the plan without executing agents

# pipeline.yaml
name: feature-pipeline
steps:
  - name: plan
    agent: plan
    prompt: "Plan how to implement {{feature}}"
    output_var: plan_output

  - name: implement
    agent: build
    prompt: "Execute this plan: {{plan_output}}"
    model: gpt-4o
    checks:
      - "pytest tests/ -q"
    checkpoint: true

  - name: review
    agent: review
    prompt: "Review the implementation of {{feature}}"
    condition: "run_review == 'true'"

architect pipeline pipeline.yaml \
  --var feature="user auth" \
  --var run_review=true

Parallel Execution

Run multiple tasks in parallel, each in an isolated git worktree with ProcessPoolExecutor.

architect parallel \
  --task "add unit tests to auth.py" \
  --task "add unit tests to users.py" \
  --task "add unit tests to billing.py" \
  --workers 3 \
  --budget-per-worker 2.0

Each worker:

Runs in an independent git worktree (total isolation)
Can use a different model (with --models they are assigned round-robin)
Has its own budget and timeout
The result includes modified files, cost, duration, and worktree path

# Clean up worktrees afterwards
architect parallel-cleanup

Checkpoints and Rollback

Checkpoints are git commits with a special prefix (architect:checkpoint) that allow restoring the workspace to a previous point. They are created automatically in pipelines (with checkpoint: true) and can be used in the Ralph Loop.

# Checkpoints are created automatically in pipelines with checkpoint: true
# To view created checkpoints:
git log --oneline --grep="architect:checkpoint"

The CheckpointManager allows:

Creating checkpoints (stage all + commit with prefix)
Listing existing checkpoints by parsing git log
Rolling back to a specific checkpoint (by step or commit hash)
Verifying if there are changes since a checkpoint

Auto-Review

After a build execution, a reviewer with clean context can inspect the changes. The reviewer receives only the diff and the original task — without the builder's history — and has exclusive access to read-only tools.

# Enable auto-review in config
auto_review:
  enabled: true
  model: gpt-4o

The reviewer looks for:

Bugs and logic errors
Security issues
Project convention violations
Performance or readability improvements
Missing tests

If issues are found, it generates a correction prompt that can feed the builder for a fix-pass.

Code Health Delta

Automatic code quality metrics analysis before and after an execution. Shows a delta of cyclomatic complexity, long functions, duplicates, and more.

# Enable with flag
architect run "refactor the auth module" --health

# Or enable permanently in config

health:
  enabled: true
  include_patterns: ["**/*.py"]
  exclude_dirs: [".git", "venv", "__pycache__"]

Analyzed metrics:

Cyclomatic complexity (requires radon installed, falls back to AST if not)
Lines per function
New/removed functions
Duplicate code blocks (6-line sliding window, MD5 hash)
Long functions (>50 lines)
Complex functions (>10 complexity)

The report is displayed on stderr at the end of execution as a markdown table with improvement/degradation indicators.

Competitive Evaluation

Competitive evaluation runs the same task with multiple models and generates a ranking based on quality, efficiency, and cost.

architect eval "implement JWT authentication" \
  --models gpt-4o,claude-sonnet-4-6 \
  --check "pytest tests/" \
  --check "ruff check src/" \
  --budget-per-model 1.0

Each model runs in an isolated git worktree (reuses ParallelRunner infrastructure). After execution, checks are run in each worktree and a comparative ranking is generated.

Generated report: table with status, steps, cost, time, passed checks, and composite score. Worktrees remain for manual inspection.

Sub-Agents (Dispatch)

The main agent can delegate specialized sub-tasks via the dispatch_subagent tool. Each sub-agent runs with a fresh AgentLoop with isolated context and limited tools.

Sub-agent types:

Type	Available tools	Use case
`explore`	`read_file`, `list_files`, `search_code`, `grep`, `find_files`	Investigate code, search patterns
`test`	Explore + `run_command`	Run tests, verify behavior
`review`	Explore (read-only)	Review code, quality analysis

Each sub-agent has a maximum of 15 steps and its summary is truncated to 1000 characters to avoid polluting the main agent's context.

OpenTelemetry Traces

Optional traceability with OpenTelemetry for monitoring sessions, LLM calls, and tool execution.

telemetry:
  enabled: true
  exporter: otlp          # otlp | console | json-file
  endpoint: http://localhost:4317
  trace_file: .architect/traces.json  # for json-file

Supported exporters:

otlp: Sends spans via gRPC (compatible with Jaeger, Grafana Tempo, etc.)
console: Prints spans to stderr (debugging)
json-file: Writes spans to a JSON file

Semantic attributes (GenAI Semantic Conventions):

gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.usage.cost
architect.task, architect.agent, architect.session_id, architect.tool_name

Optional dependencies: opentelemetry-api, opentelemetry-sdk, opentelemetry-exporter-otlp. If not installed, a transparent NoopTracer with no performance impact is used.

Configuration Presets

Presets generate .architect.md and config.yaml with predefined configurations based on project type.

# View available presets
architect init --list-presets

# Initialize for Python project
architect init --preset python
# → Creates .architect.md (conventions) + config.yaml (hooks: ruff, mypy)

# Paranoid mode (maximum security)
architect init --preset paranoid
# → confirm-all, max 20 steps, strict code rules, quality gates

Generated files are editable — they serve as a starting point. Use --overwrite to replace existing files.

CI/CD Usage

Basic Example — GitHub Actions

- name: Refactor code
  run: |
    architect run "update obsolete imports in src/" \
      --mode yolo \
      --quiet \
      --json \
      --budget 3.0 \
      -c ci/architect.yaml \
    | tee result.json

- name: Verify result
  run: |
    STATUS=$(cat result.json | jq -r .status)
    if [ "$STATUS" != "success" ]; then
      echo "architect failed with status: $STATUS ($(cat result.json | jq -r .stop_reason))"
      exit 1
    fi

Advanced Example — with reports, dry-run, and git diff

- name: Dry run first (see what it would do)
  run: |
    architect run "add docstrings to all functions" \
      --dry-run \
      --confirm-mode yolo \
      --json

- name: Execute with PR context
  run: |
    architect run "review and improve this PR's changes" \
      --confirm-mode yolo \
      --context-git-diff origin/main \
      --report github \
      --report-file pr-report.md \
      --budget 5.0 \
      --timeout 600 \
      --exit-code-on-partial 0

- name: Comment on PR
  if: always()
  run: gh pr comment $PR_NUMBER --body-file pr-report.md

CI Config

# ci/architect.yaml
llm:
  model: gpt-4o-mini
  api_key_env: OPENAI_API_KEY
  retries: 3
  timeout: 120

workspace:
  root: .

logging:
  level: human
  verbose: 0

hooks:
  post_edit:
    - name: lint
      command: "ruff check {file} --fix"
      file_patterns: ["*.py"]

Security

Path traversal: all file operations are confined to workspace.root. Attempts to access ../../etc/passwd are blocked.
delete_file requires explicit workspace.allow_delete: true in config.
run_command: destructive commands blocklist (rm -rf /, sudo, dd, mkfs, curl|bash, etc.) always active, regardless of confirmation mode. Dynamic classification (safe/dev/dangerous) for confirmation policies in confirm-sensitive and confirm-all modes. Working directory is always confined to the workspace.
MCP tools are marked as sensitive by default (require confirmation in confirm-sensitive).
API keys are never logged, only the environment variable name.

Supported LLM Providers

Any provider supported by LiteLLM:

# OpenAI
LITELLM_API_KEY=sk-... architect run "..." --model gpt-4o

# Anthropic
LITELLM_API_KEY=sk-ant-... architect run "..." --model claude-sonnet-4-6

# Google Gemini
LITELLM_API_KEY=... architect run "..." --model gemini/gemini-2.0-flash

# Ollama (local, no API key)
architect run "..." --model ollama/llama3 --api-base http://localhost:11434

# LiteLLM Proxy (for teams)
architect run "..." --api-base http://proxy.internal:8000

Architecture

architect run PROMPT
    │
    ├── load_config()          YAML + env vars + CLI flags
    ├── configure_logging()    3 pipelines: HUMAN + technical + JSON file
    ├── ToolRegistry           local tools (fs, editing, search, run_command) + remote MCP
    ├── RepoIndexer            workspace tree → injected into system prompt
    ├── LLMAdapter             LiteLLM with selective retries + prompt caching
    ├── ContextManager         pruning: compress + enforce_window + is_critically_full
    ├── HookExecutor           10 lifecycle events, exit code protocol
    ├── GuardrailsEngine       deterministic security (before hooks)
    ├── SkillsLoader           .architect.md + skills by glob
    ├── ProceduralMemory       user corrections across sessions
    ├── CostTracker            accumulated cost + budget watchdog
    ├── SessionManager         session persistence (save/load/resume)
    ├── DryRunTracker          action recording without execution (--dry-run)
    ├── CheckpointManager      git commits with rollback (architect:checkpoint)
    ├── ArchitectTracer        OpenTelemetry spans (session/llm/tool) or NoopTracer
    ├── CodeHealthAnalyzer     quality metrics before/after (--health)
    │
    ├── RalphLoop              automatic iteration until checks pass
    │       └── agent_factory() → fresh AgentLoop per iteration (clean context)
    ├── PipelineRunner         multi-step YAML workflows with variables/conditions
    │       └── agent_factory() → fresh AgentLoop per step
    ├── ParallelRunner         parallel execution in isolated git worktrees
    │       └── ProcessPoolExecutor → workers with `architect run` in worktrees
    ├── CompetitiveEval        comparative multi-model evaluation over ParallelRunner
    ├── AutoReviewer           post-build review with clean context (diff + task only)
    ├── PresetManager          .architect.md + config.yaml generation from presets
    ├── DispatchSubagentTool   sub-task delegation (explore/test/review)
    │
    └── AgentLoop (while True — the LLM decides when to stop)
            │
            ├── _check_safety_nets()   max_steps / budget / timeout / context_full
            │       └── if triggered → _graceful_close(): last LLM call without tools
            │                         agent summarizes what was done and what remains
            ├── context_manager.manage()     compress + enforce_window if needed
            ├── hooks: pre_llm_call          → intercept before LLM
            ├── llm.completion()             → streaming chunks to stderr
            ├── hooks: post_llm_call         → intercept after LLM
            ├── if no tool_calls             → LLM_DONE, natural end
            ├── guardrails.check()           → deterministic security (before hooks)
            ├── hooks: pre_tool_use          → ALLOW / BLOCK / MODIFY
            ├── engine.execute_tool_calls()  → parallel if possible → confirm → execute
            ├── hooks: post_tool_use         → lint/test → feedback to LLM if fails
            └── repeat

Stop reasons (stop_reason in JSON output):

Reason	Description
`null` / `llm_done`	The LLM decided it was done (natural termination)
`max_steps`	Watchdog: step limit reached
`budget_exceeded`	Watchdog: cost limit exceeded
`context_full`	Watchdog: context window full (>95%)
`timeout`	Watchdog: total time exceeded
`user_interrupt`	User pressed Ctrl+C / SIGTERM (immediate cut)
`llm_error`	Unrecoverable LLM error

Design decisions:

Sync-first (predictable, debuggable; the main loop is ~300 lines without magic)
No LangChain/LangGraph (the loop is direct and controlled)
Pydantic v2 as the source of truth for schemas and validation
Tool errors returned to the LLM as results (don't break the loop)
Clean stdout for pipes, everything else to stderr
Watchdogs request graceful shutdown — the agent never terminates mid-sentence

Version History

Version	Features
v0.9.0	Incremental editing: `edit_file` (exact str-replace) and `apply_patch` (unified diff)
v0.10.0	Indexer + search: repo tree in system prompt, `search_code`, `grep`, `find_files`
v0.11.0	Context management: tool result truncation, step compression with LLM, hard limit, parallel tool calls
v0.12.0	Self-evaluation: `--self-eval basic/full` evaluates and retries automatically
v0.13.0	`run_command`: command execution (tests, linters) with 4 security layers
v0.14.0	Cost tracking: `CostTracker`, `--budget`, prompt caching, `LocalLLMCache`
v0.15.0	v3-core — core redesign: `while True` loop, safety nets with graceful shutdown, `PostEditHooks`, HUMAN log level, `StopReason`, `ContextManager.manage()`
v0.15.2	Human logging with icons — visual format aligned with v3 plan: 🔄🔧🌐✅⚡❌📦🔍, MCP distinction, new events (`llm_response`), cost in completion
v0.15.3	Fix structlog pipeline — human logging works without `--log-file`; `wrap_for_formatter` always active
v0.16.0	v4 Phase A — lifecycle hooks (10 events, exit code protocol), deterministic guardrails, skills ecosystem (.architect.md), procedural memory
v0.16.1	QA Phase A — 228 verifications, 5 bugs fixed (ToolResult import, CostTracker.total, YAML off, schema shadowing), 24 aligned scripts
v0.16.2	QA2 — `--show-costs` works with streaming, `--mode yolo` never asks for confirmation (not even for `dangerous`), `--timeout` is session watchdog (doesn't override `llm.timeout`), MCP tools auto-injected into `allowed_tools`, defensive `get_schemas`
v0.17.0	v4 Phase B — persistent sessions with resume, multi-format reports (JSON/Markdown/GitHub PR), 10 native CI/CD flags (`--dry-run`, `--report`, `--session`, `--context-git-diff`, `--confirm-mode`, `--exit-code-on-partial`), dry-run/preview mode, 3 new commands (`sessions`, `resume`, `cleanup`)
v0.18.0	v4 Phase C — Ralph Loop (automatic iteration with checks), Pipeline Mode (multi-step YAML workflows with variables, conditions, checkpoints), parallel execution in git worktrees, checkpoints with rollback, post-build auto-review with clean context, 4 new commands (`loop`, `pipeline`, `parallel`, `parallel-cleanup`)
v0.19.0	v4 Phase D — Competitive multi-model evaluation (`architect eval`), preset initialization (`architect init` with 5 presets), code health analysis (`--health` with complexity/duplicates delta), delegated sub-agents (`dispatch_subagent` with explore/test/review types), OpenTelemetry traceability (session/llm/tool spans), 7 QA bugfixes (code_rules pre-execution, dispatch wiring, telemetry wiring, health wiring, parallel config propagation)
v1.0.0	Stable release — First public version. Culmination of Plan V4 (Phases A+B+C+D) on v3 core. 15 CLI commands, 11+ tools, 4 agents, hooks + guardrails + skills + memory, sessions + reports + CI/CD, Ralph Loop + pipelines + parallel + checkpoints + auto-review, sub-agents + health + eval + telemetry + presets. 687 tests, 31 E2E checks.
v1.0.1	Bugfixes — Test fixes and general stability corrections after initial release.
v1.1.0	`sensitive_files` guardrail — New `sensitive_files` field blocks both read and write access to secret files (`.env`, `.pem`, `.key`). Shell read detection (`cat`, `head`, `tail`). `protected_files` remains write-only (backward compatible). Report improvements — `--report-file` now works without `--report` (format inferred from extension), and automatically creates parent directories with fallback. Pipeline YAML validation — Strict validation before execution: unknown fields rejected (with hints), `prompt` required, `PipelineValidationError` with all errors collected. HUMAN logging — Visual traceability for all high-level features: pipeline steps, ralph iterations with check results, auto-review status, parallel worker progress, competitive eval ranking with medals. 14 new HUMAN events across 5 modules, 14 formatter cases, 11 HumanLog helpers. 795 tests.

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

1.1.0

Mar 2, 2026

1.0.1

Feb 27, 2026

1.0.0

Feb 25, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

architect_ai_cli-1.1.0.tar.gz (751.0 kB view details)

Uploaded Mar 2, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

architect_ai_cli-1.1.0-py3-none-any.whl (213.7 kB view details)

Uploaded Mar 2, 2026 Python 3

File details

Details for the file architect_ai_cli-1.1.0.tar.gz.

File metadata

Download URL: architect_ai_cli-1.1.0.tar.gz
Upload date: Mar 2, 2026
Size: 751.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for architect_ai_cli-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`3372f306578d66c2d7d68058d072d0aa9054cdce40a10590cde0c67fd717c1bb`
MD5	`8d3f1aa9e1408e7a55b4c5aa80bbb8d1`
BLAKE2b-256	`78a107fc5452afb6c5af127725a5d13e0d17bfac981112b29ae3abaa8759e4cb`

See more details on using hashes here.

File details

Details for the file architect_ai_cli-1.1.0-py3-none-any.whl.

File metadata

Download URL: architect_ai_cli-1.1.0-py3-none-any.whl
Upload date: Mar 2, 2026
Size: 213.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for architect_ai_cli-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b08bb6f3bc63ccd051b93afe38b9a162aa5732b921fe92e904bb7b83ef23a233`
MD5	`71debe1408eb6736cd0d17f7bdc56b66`
BLAKE2b-256	`276d59e8bc5e3a3a20c83dba9595050d9c88c0fafd1d99f50512eb436f10a196`

See more details on using hashes here.

architect-ai-cli 1.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

architect

Installation

Quickstart

Commands

architect run — execute task

architect sessions — list saved sessions

architect resume — resume session

architect cleanup — clean old sessions

architect loop — automatic iteration (Ralph Loop)

architect pipeline — run YAML workflow

architect parallel — parallel execution

architect eval — competitive multi-model evaluation

architect init — initialize project with presets

architect agents — list agents

architect validate-config — validate configuration

Agents

Confirmation Modes

Configuration

Environment Variables

Output and Exit Codes

JSON Format (--json)

Logging

Lifecycle Hooks

Guardrails

Skills and .architect.md

Procedural Memory

Internationalization (i18n)

Cost Control

MCP (Model Context Protocol)

Sessions and Resume

Execution Reports

Ralph Loop (Automatic Iteration)

Pipeline Mode (Multi-Step Workflows)

Parallel Execution

Checkpoints and Rollback

Auto-Review

Code Health Delta

Competitive Evaluation

Sub-Agents (Dispatch)

OpenTelemetry Traces

Configuration Presets

CI/CD Usage

Basic Example — GitHub Actions

Advanced Example — with reports, dry-run, and git diff

CI Config

Security

Supported LLM Providers

Architecture

Version History

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`architect run` — execute task

`architect sessions` — list saved sessions

`architect resume` — resume session

`architect cleanup` — clean old sessions

`architect loop` — automatic iteration (Ralph Loop)

`architect pipeline` — run YAML workflow

`architect parallel` — parallel execution

`architect eval` — competitive multi-model evaluation

`architect init` — initialize project with presets

`architect agents` — list agents

`architect validate-config` — validate configuration

JSON Format (`--json`)