Skip to main content

Production-ready autonomous coding harness using Claude Code SDK

Project description

claude-harness

Production-ready autonomous coding harness using Claude Code SDK. Build complete applications autonomously with a two-agent pattern (initializer + coding agents).

Key Features

๐ŸŽฏ Autonomous Development

  • Two-agent pattern (Initializer + Coding agents)
  • Auto-continues between sessions with fresh context windows
  • Progress persisted via feature_list.json and git commits

๐Ÿ”’ Production-Ready Quality

  • v3.2.2: Mandatory E2E debugging - no workarounds allowed
  • v3.2.1: E2E test execution enforced with proof required
  • Triple timeout protection (15/10/120 min)
  • Retry + skip logic (3 attempts per feature)
  • Loop detection prevents infinite hangs

๐Ÿง  Code Intelligence (v3.2.0)

  • Skills System with 5 built-in skills
  • LSP integration for code navigation
  • Auto-discovers patterns from existing code
  • Mode-specific domain knowledge (greenfield/enhancement/bugfix)

๐Ÿ›ก๏ธ Security First

  • Bash command allowlist
  • Filesystem restrictions (project dir only)
  • Secrets scanning
  • Browser cleanup hooks
  • MCP auto-configuration (Context7, Puppeteer)

Prerequisites

Required: Install the latest versions of both Claude Code and the Claude Agent SDK:

# Install Claude Code CLI (latest version required)
npm install -g @anthropic-ai/claude-code

# Install Python dependencies
pip install -r requirements.txt

Verify your installations:

claude --version  # Should be latest version
pip show claude-code-sdk  # Check SDK is installed

OAuth Token: Generate and set your Claude Code OAuth token:

# Generate the token using Claude Code CLI
claude setup-token

# Set the environment variable
export CLAUDE_CODE_OAUTH_TOKEN='your-oauth-token-here'

Installation

# Install from PyPI (recommended)
pip install claude-harness

# Or install from GitHub
pip install git+https://github.com/nirmalarya/claude-harness.git

# Or install from source (development)
git clone https://github.com/nirmalarya/claude-harness.git
cd claude-harness
pip install -e .

# Verify installation
claude-harness --version

Quick Start

# Set OAuth token (required)
export CLAUDE_CODE_OAUTH_TOKEN='your-token-here'

# Build a new app
claude-harness --project-dir ./my_project

# Test with limited iterations
claude-harness --project-dir ./my_project --max-iterations 3

# Enhancement mode (existing projects)
claude-harness --mode enhancement --project-dir ./existing-app --spec ./features.txt

๐Ÿ“– Read the full User Guide โ†’

What's New in v3.2.2

โœ… Critical Quality Fix - Mandatory E2E Debugging:

  • E2E Test Failures Now Require Debugging - Agents can't skip to code verification when E2E tests fail
  • Debugging Scripts Provided - Step-by-step scripts for common issues (backend timeout, DB connection, zombie processes)
  • Forbidden Workarounds - Explicitly blocked shortcuts that bypass real testing
  • Self-Healing - Agent fixes infrastructure issues (restart backend, start DB, create test users)
  • Quality Gate - "If E2E failed: Debugged, fixed, re-ran until passing" is now MANDATORY

โœ… Skills System (v3.2.0):

  • 5 Built-in Skills - puppeteer-testing, code-quality, project-patterns, harness-patterns, lsp-navigation
  • Auto-Discovery - Skills loaded from .claude/skills/ and ~/.claude/skills/
  • Mode-Specific - Different skills for greenfield, enhancement, and bugfix modes
  • Progressive Disclosure - SKILL.md + supporting files for rich domain knowledge

โœ… LSP Integration (v3.2.0):

  • Code Intelligence - goToDefinition, findReferences, hover, documentSymbol, etc.
  • Navigate Codebases - Find usages, jump to definitions, explore call hierarchies
  • Context-Aware - Understand existing patterns before making changes

โœ… E2E Enforcement (v3.2.1):

  • Mandatory E2E Execution - All user-facing features must pass E2E tests
  • Proof Required - Agent must show test output with exit code 0
  • No More "Trust Me" Commits - Code verification alone is insufficient

๐Ÿ“– Full changelogs: v3.2.2 | v3.2.1 | v3.2.0 | v3.1.0

Important Timing Expectations

Warning: This demo takes a long time to run!

  • First session (initialization): The agent generates a feature_list.json with 200 test cases. This takes several minutes and may appear to hang - this is normal. The agent is writing out all the features.

  • Subsequent sessions: Each coding iteration can take 5-15 minutes depending on complexity.

  • Full app: Building all 200 features typically requires many hours of total runtime across multiple sessions.

Tip: The 200 features parameter in the prompts is designed for comprehensive coverage. If you want faster demos, you can modify prompts/initializer_prompt.md to reduce the feature count (e.g., 20-50 features for a quicker demo).

How It Works

Two-Agent Pattern

  1. Initializer Agent (Session 1): Reads app_spec.txt, creates feature_list.json with 200 test cases, sets up project structure, and initializes git.

  2. Coding Agent (Sessions 2+): Picks up where the previous session left off, implements features one by one, and marks them as passing in feature_list.json.

Session Management

  • Each session runs with a fresh context window
  • Progress is persisted via feature_list.json and git commits
  • The agent auto-continues between sessions (3 second delay)
  • Press Ctrl+C to pause; run the same command to resume

Security Model

This demo uses a defense-in-depth security approach (see security.py and client.py):

  1. OS-level Sandbox: Bash commands run in an isolated environment
  2. Filesystem Restrictions: File operations restricted to the project directory only
  3. Bash Allowlist: Only specific commands are permitted:
    • File inspection: ls, cat, head, tail, wc, grep
    • Node.js: npm, node
    • Version control: git
    • Process management: ps, lsof, sleep, pkill (dev processes only)

Commands not in the allowlist are blocked by the security hook.

Project Structure

claude-harness/
โ”œโ”€โ”€ autonomous_agent.py       # Main entry point
โ”œโ”€โ”€ agent.py                  # Agent session logic
โ”œโ”€โ”€ client.py                 # Claude SDK client with skills integration
โ”œโ”€โ”€ security.py               # Bash command allowlist and validation
โ”œโ”€โ”€ skills_manager.py         # Skills discovery and loading (v3.2.0)
โ”œโ”€โ”€ lsp_plugins.py            # LSP code intelligence plugins (v3.2.0)
โ”œโ”€โ”€ progress.py               # Progress tracking utilities
โ”œโ”€โ”€ retry_manager.py          # Feature retry and skip logic
โ”œโ”€โ”€ loop_detector.py          # Infinite loop prevention
โ”œโ”€โ”€ error_handler.py          # Structured error logging
โ”œโ”€โ”€ setup_mcp.py              # MCP server auto-configuration
โ”œโ”€โ”€ prompts/
โ”‚   โ”œโ”€โ”€ app_spec.txt          # Application specification
โ”‚   โ”œโ”€โ”€ initializer_prompt.md # First session prompt
โ”‚   โ”œโ”€โ”€ coding_prompt.md      # Continuation session prompt (with v3.2.2 E2E debugging)
โ”‚   โ””โ”€โ”€ [other prompts]       # Enhancement, bugfix, validation modes
โ”œโ”€โ”€ harness_data/             # Bundled package data (v3.2.0)
โ”‚   โ””โ”€โ”€ .claude/skills/       # Built-in skills
โ”‚       โ”œโ”€โ”€ puppeteer-testing/
โ”‚       โ”œโ”€โ”€ code-quality/
โ”‚       โ”œโ”€โ”€ project-patterns/
โ”‚       โ”œโ”€โ”€ harness-patterns/
โ”‚       โ””โ”€โ”€ lsp-navigation/
โ”œโ”€โ”€ validators/               # Quality enforcement hooks
โ”‚   โ”œโ”€โ”€ e2e_hook.py           # E2E test enforcement (v3.2.1)
โ”‚   โ”œโ”€โ”€ e2e_verifier.py       # E2E debugging enforcement (v3.2.2)
โ”‚   โ”œโ”€โ”€ secrets_hook.py       # Secrets scanning
โ”‚   โ””โ”€โ”€ browser_cleanup_hook.py
โ”œโ”€โ”€ infra/
โ”‚   โ””โ”€โ”€ healer.py             # Infrastructure self-healing
โ””โ”€โ”€ requirements.txt          # Python dependencies

Generated Project Structure

After running, your project directory will contain:

my_project/
โ”œโ”€โ”€ feature_list.json         # Test cases (source of truth)
โ”œโ”€โ”€ app_spec.txt              # Copied specification
โ”œโ”€โ”€ init.sh                   # Environment setup script
โ”œโ”€โ”€ claude-progress.txt       # Session progress notes
โ”œโ”€โ”€ .claude_settings.json     # Security settings
โ””โ”€โ”€ [application files]       # Generated application code

Running the Generated Application

After the agent completes (or pauses), you can run the generated application:

cd generations/my_project

# Run the setup script created by the agent
./init.sh

# Or manually (typical for Node.js apps):
npm install
npm run dev

The application will typically be available at http://localhost:3000 or similar (check the agent's output or init.sh for the exact URL).

Command Line Options

Option Description Default
--project-dir Directory for the project ./autonomous_demo_project
--mode Mode: greenfield/enhancement/bugfix greenfield
--spec Specification file path None
--max-iterations Max agent iterations Unlimited
--model Claude model to use claude-sonnet-4-5-20250929
--session-timeout Session timeout (minutes) 120
--stall-timeout Stall timeout (minutes) 10
--max-retries Max retry attempts per feature 3
--version Show version and exit -
--help Show help and exit -

๐Ÿ“– Full command reference in User Guide โ†’

Customization

Changing the Application

Edit prompts/app_spec.txt to specify a different application to build.

Adjusting Feature Count

Edit prompts/initializer_prompt.md and change the "200 features" requirement to a smaller number for faster demos.

Modifying Allowed Commands

Edit security.py to add or remove commands from ALLOWED_COMMANDS.

Troubleshooting

"Appears to hang on first run" This is normal. The initializer agent is generating 200 detailed test cases, which takes significant time. Watch for [Tool: ...] output to confirm the agent is working.

"Command blocked by security hook" The agent tried to run a command not in the allowlist. This is the security system working as intended. If needed, add the command to ALLOWED_COMMANDS in security.py.

"OAuth token not set" Run claude setup-token to generate your token, then ensure CLAUDE_CODE_OAUTH_TOKEN is exported in your shell environment.

Attribution

This project is built upon concepts from Anthropic's autonomous coding research:

License

MIT License - see LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

claude_harness-3.2.5.tar.gz (93.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

claude_harness-3.2.5-py3-none-any.whl (108.7 kB view details)

Uploaded Python 3

File details

Details for the file claude_harness-3.2.5.tar.gz.

File metadata

  • Download URL: claude_harness-3.2.5.tar.gz
  • Upload date:
  • Size: 93.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for claude_harness-3.2.5.tar.gz
Algorithm Hash digest
SHA256 3b733459d941896975f6410b20712985dfbcc8aabb5fe19dd9219eadd5ef492c
MD5 bbea10d047f6b7613d558a2444d8adae
BLAKE2b-256 3f0601752da6144884acb735fcb3030270a75ffa46ade183d88aa4f732dce4b7

See more details on using hashes here.

Provenance

The following attestation bundles were made for claude_harness-3.2.5.tar.gz:

Publisher: publish-to-pypi.yml on nirmalarya/claude-harness

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file claude_harness-3.2.5-py3-none-any.whl.

File metadata

  • Download URL: claude_harness-3.2.5-py3-none-any.whl
  • Upload date:
  • Size: 108.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for claude_harness-3.2.5-py3-none-any.whl
Algorithm Hash digest
SHA256 fda6b88da31e2ab6b4413327c62ffd04e46e319f1d722d56d36f4f1ae3f29e18
MD5 21180277324ba0754e7c0271948fb90a
BLAKE2b-256 1859f30f91a70c33a978c0df76f4309a7306433171ebdf4c5a6e765fc90e0c95

See more details on using hashes here.

Provenance

The following attestation bundles were made for claude_harness-3.2.5-py3-none-any.whl:

Publisher: publish-to-pypi.yml on nirmalarya/claude-harness

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page