Production-ready autonomous coding harness using Claude Code SDK
Project description
claude-harness
Production-ready autonomous coding harness using Claude Code SDK. Build complete applications autonomously with a two-agent pattern (initializer + coding agents).
Key Features
๐ฏ Autonomous Development
- Two-agent pattern (Initializer + Coding agents)
- Auto-continues between sessions with fresh context windows
- Progress persisted via feature_list.json and git commits
๐ Production-Ready Quality
- v3.2.2: Mandatory E2E debugging - no workarounds allowed
- v3.2.1: E2E test execution enforced with proof required
- Triple timeout protection (15/10/120 min)
- Retry + skip logic (3 attempts per feature)
- Loop detection prevents infinite hangs
๐ง Code Intelligence (v3.2.0)
- Skills System with 5 built-in skills
- LSP integration for code navigation
- Auto-discovers patterns from existing code
- Mode-specific domain knowledge (greenfield/enhancement/bugfix)
๐ก๏ธ Security First
- Bash command allowlist
- Filesystem restrictions (project dir only)
- Secrets scanning
- Browser cleanup hooks
- MCP auto-configuration (Context7, Puppeteer)
Installation
1. Install claude-harness
# Install from PyPI (recommended)
pip install claude-harness
# Or install from GitHub
pip install git+https://github.com/nirmalarya/claude-harness.git
# Or install from source (development)
git clone https://github.com/nirmalarya/claude-harness.git
cd claude-harness
pip install -e .
This automatically installs all Python dependencies including claude-code-sdk.
2. Authentication Setup
Choose one of the following authentication methods:
Option A: OAuth Token (Recommended)
Install Claude Code CLI to generate your OAuth token:
# Install Claude Code CLI
npm install -g @anthropic-ai/claude-code
# Generate OAuth token (opens browser for authentication)
claude setup-token
# Set the token
export CLAUDE_CODE_OAUTH_TOKEN='your-oauth-token-here'
Option B: API Key
Get your API key from the Anthropic Console:
# Get API key from: https://console.anthropic.com/
# Then set it
export ANTHROPIC_API_KEY='your-api-key-here'
Note: Both authentication methods work. OAuth tokens are generated through the Claude Code CLI and provide full CLI integration. API keys are obtained from the Anthropic Console and work directly with the SDK.
3. Verify Installation
claude-harness --version
Contributing
We welcome contributions from the community! Whether it's bug fixes, new features, or documentation improvements - all contributions are appreciated.
Quick Links
- Contributing Guide - Full contribution guidelines
- Development Setup - Detailed setup instructions
- Code of Conduct - Community standards
- Issue Tracker - Report bugs or request features
Quick Start for Contributors
- Fork and clone the repository
- Create a feature branch:
git checkout -b feature/your-feature-name - Make your changes and add tests
- Run tests and linting:
pytest tests/test_*.py -v && ruff check . - Commit with conventional format:
feat: add feature description - Push and create PR with a clear description
See CONTRIBUTING.md for complete details.
Development Setup
# Install in editable mode
pip install -e .
# Install development dependencies
pip install pytest pytest-asyncio ruff mypy types-pyyaml
# Run tests
pytest tests/test_*.py -v
# Run linting
ruff check .
Quick Start
# 1. Install
pip install claude-harness
# 2. Set up authentication (choose one)
# Option A: OAuth Token
npm install -g @anthropic-ai/claude-code && claude setup-token
export CLAUDE_CODE_OAUTH_TOKEN='your-token-here'
# Option B: API Key
export ANTHROPIC_API_KEY='your-api-key-here'
# 3. Create a spec file
echo "Build a todo list web app with React" > app_spec.txt
# 4. Run the harness
claude-harness --project-dir ./my_project --spec app_spec.txt
# Or test with limited iterations
claude-harness --project-dir ./my_project --spec app_spec.txt --max-iterations 3
Other modes:
# Enhancement mode (add features to existing project)
claude-harness --mode enhancement --project-dir ./existing-app --spec features.txt
# Backlog mode with Linear (NEW in v3.3.0)
export LINEAR_API_KEY='lin_api_...'
claude-harness --mode backlog --project-dir ./my_project
๐ Read the full User Guide โ
Backlog Mode Integrations (NEW in v3.3.0)
claude-harness now supports automated issue tracking with Linear or Azure DevOps in backlog mode. The agent fetches issues, implements features, and updates status automatically.
Linear Integration
Setup:
# 1. Get API key from https://linear.app/settings/api
export LINEAR_API_KEY='lin_api_...'
# 2. Create project marker file in your project
cat > .linear_project.json <<EOF
{
"team_id": "your_team_id",
"project_id": "your_project_id",
"project_name": "My Project"
}
EOF
# 3. Run in backlog mode
claude-harness --mode backlog --project-dir ./my_project
Workflow:
- Agent fetches issues from Linear project
- Picks next "Todo" issue
- Updates to "In Progress"
- Implements feature with E2E tests
- Updates to "Done"
- Adds implementation comment to issue
- Tracks progress via META issue
Features:
- 20 Linear tools for issue management
- Auto-syncs status (Todo โ In Progress โ Done)
- Implementation comments with file changes and test results
- META issue tracking for session progress
- State persistence in
.cursor/linear-backlog-state.json
Finding Team/Project IDs:
# Method 1: From Linear URL
# https://linear.app/workspace/team/ENG/project/my-project
# ^^^ ^^^^^^^^^^
# team_key project_key
# Method 2: Use Linear tools
# Run: linear_list_teams() to get team_id
# Run: linear_list_projects(team_id) to get project_id
Azure DevOps Integration
Setup:
# Set environment variables
export ADO_ORG='your-organization'
export ADO_PROJECT='your-project'
# Run in backlog mode
claude-harness --mode backlog --project-dir ./my_project
Note: Both integrations can be configured simultaneously. The agent will use whichever is available based on environment variables.
What's New in v3.2.2
โ Critical Quality Fix - Mandatory E2E Debugging:
- E2E Test Failures Now Require Debugging - Agents can't skip to code verification when E2E tests fail
- Debugging Scripts Provided - Step-by-step scripts for common issues (backend timeout, DB connection, zombie processes)
- Forbidden Workarounds - Explicitly blocked shortcuts that bypass real testing
- Self-Healing - Agent fixes infrastructure issues (restart backend, start DB, create test users)
- Quality Gate - "If E2E failed: Debugged, fixed, re-ran until passing" is now MANDATORY
โ Skills System (v3.2.0):
- 5 Built-in Skills - puppeteer-testing, code-quality, project-patterns, harness-patterns, lsp-navigation
- Auto-Discovery - Skills loaded from
.claude/skills/and~/.claude/skills/ - Mode-Specific - Different skills for greenfield, enhancement, and bugfix modes
- Progressive Disclosure - SKILL.md + supporting files for rich domain knowledge
โ LSP Integration (v3.2.0):
- Code Intelligence - goToDefinition, findReferences, hover, documentSymbol, etc.
- Navigate Codebases - Find usages, jump to definitions, explore call hierarchies
- Context-Aware - Understand existing patterns before making changes
โ E2E Enforcement (v3.2.1):
- Mandatory E2E Execution - All user-facing features must pass E2E tests
- Proof Required - Agent must show test output with exit code 0
- No More "Trust Me" Commits - Code verification alone is insufficient
๐ Full changelogs: v3.2.2 | v3.2.1 | v3.2.0 | v3.1.0
Important Timing Expectations
Warning: This demo takes a long time to run!
-
First session (initialization): The agent generates a
feature_list.jsonwith 200 test cases. This takes several minutes and may appear to hang - this is normal. The agent is writing out all the features. -
Subsequent sessions: Each coding iteration can take 5-15 minutes depending on complexity.
-
Full app: Building all 200 features typically requires many hours of total runtime across multiple sessions.
Tip: The 200 features parameter in the prompts is designed for comprehensive coverage. If you want faster demos, you can modify prompts/initializer_prompt.md to reduce the feature count (e.g., 20-50 features for a quicker demo).
How It Works
Two-Agent Pattern
-
Initializer Agent (Session 1): Reads
app_spec.txt, createsfeature_list.jsonwith 200 test cases, sets up project structure, and initializes git. -
Coding Agent (Sessions 2+): Picks up where the previous session left off, implements features one by one, and marks them as passing in
feature_list.json.
Session Management
- Each session runs with a fresh context window
- Progress is persisted via
feature_list.jsonand git commits - The agent auto-continues between sessions (3 second delay)
- Press
Ctrl+Cto pause; run the same command to resume
Security Model
This demo uses a defense-in-depth security approach (see security.py and client.py):
- OS-level Sandbox: Bash commands run in an isolated environment
- Filesystem Restrictions: File operations restricted to the project directory only
- Bash Allowlist: Only specific commands are permitted:
- File inspection:
ls,cat,head,tail,wc,grep - Node.js:
npm,node - Version control:
git - Process management:
ps,lsof,sleep,pkill(dev processes only)
- File inspection:
Commands not in the allowlist are blocked by the security hook.
Project Structure
claude-harness/
โโโ autonomous_agent.py # Main entry point
โโโ agent.py # Agent session logic
โโโ client.py # Claude SDK client with skills integration
โโโ security.py # Bash command allowlist and validation
โโโ skills_manager.py # Skills discovery and loading (v3.2.0)
โโโ lsp_plugins.py # LSP code intelligence plugins (v3.2.0)
โโโ progress.py # Progress tracking utilities
โโโ retry_manager.py # Feature retry and skip logic
โโโ loop_detector.py # Infinite loop prevention
โโโ error_handler.py # Structured error logging
โโโ setup_mcp.py # MCP server auto-configuration
โโโ prompts/
โ โโโ app_spec.txt # Application specification
โ โโโ initializer_prompt.md # First session prompt
โ โโโ coding_prompt.md # Continuation session prompt (with v3.2.2 E2E debugging)
โ โโโ [other prompts] # Enhancement, bugfix, validation modes
โโโ harness_data/ # Bundled package data (v3.2.0)
โ โโโ .claude/skills/ # Built-in skills
โ โโโ puppeteer-testing/
โ โโโ code-quality/
โ โโโ project-patterns/
โ โโโ harness-patterns/
โ โโโ lsp-navigation/
โโโ validators/ # Quality enforcement hooks
โ โโโ e2e_hook.py # E2E test enforcement (v3.2.1)
โ โโโ e2e_verifier.py # E2E debugging enforcement (v3.2.2)
โ โโโ secrets_hook.py # Secrets scanning
โ โโโ browser_cleanup_hook.py
โโโ infra/
โ โโโ healer.py # Infrastructure self-healing
โโโ requirements.txt # Python dependencies
Generated Project Structure
After running, your project directory will contain:
my_project/
โโโ feature_list.json # Test cases (source of truth)
โโโ app_spec.txt # Copied specification
โโโ init.sh # Environment setup script
โโโ claude-progress.txt # Session progress notes
โโโ .claude_settings.json # Security settings
โโโ [application files] # Generated application code
Running the Generated Application
After the agent completes (or pauses), you can run the generated application:
cd generations/my_project
# Run the setup script created by the agent
./init.sh
# Or manually (typical for Node.js apps):
npm install
npm run dev
The application will typically be available at http://localhost:3000 or similar (check the agent's output or init.sh for the exact URL).
Command Line Options
| Option | Description | Default |
|---|---|---|
--project-dir |
Directory for the project | ./autonomous_demo_project |
--mode |
Mode: greenfield/enhancement/bugfix | greenfield |
--spec |
Specification file path | None |
--max-iterations |
Max agent iterations | Unlimited |
--model |
Claude model to use | claude-sonnet-4-5-20250929 |
--session-timeout |
Session timeout (minutes) | 120 |
--stall-timeout |
Stall timeout (minutes) | 10 |
--max-retries |
Max retry attempts per feature | 3 |
--version |
Show version and exit | - |
--help |
Show help and exit | - |
๐ Full command reference in User Guide โ
Customization
Changing the Application
Edit prompts/app_spec.txt to specify a different application to build.
Adjusting Feature Count
Edit prompts/initializer_prompt.md and change the "200 features" requirement to a smaller number for faster demos.
Modifying Allowed Commands
Edit security.py to add or remove commands from ALLOWED_COMMANDS.
Troubleshooting
"Appears to hang on first run"
This is normal. The initializer agent is generating 200 detailed test cases, which takes significant time. Watch for [Tool: ...] output to confirm the agent is working.
"Command blocked by security hook"
The agent tried to run a command not in the allowlist. This is the security system working as intended. If needed, add the command to ALLOWED_COMMANDS in security.py.
"OAuth token not set"
Run claude setup-token to generate your token, then ensure CLAUDE_CODE_OAUTH_TOKEN is exported in your shell environment.
Attribution
This project is built upon concepts from Anthropic's autonomous coding research:
- Quickstart Template: claude-quickstarts/autonomous-coding
- Engineering Blog: Effective Harnesses for Long-Running Agents
License
MIT License - see LICENSE file for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file claude_harness-3.6.3.tar.gz.
File metadata
- Download URL: claude_harness-3.6.3.tar.gz
- Upload date:
- Size: 117.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
593c549c8e39fcaf02458d305ce8159cb850ad39a44d2c598a702e895d845db2
|
|
| MD5 |
512e9e0e31b27fa7c087477edafb9929
|
|
| BLAKE2b-256 |
ff7fa00f7732e666b858302f32bbd2a6de75efb30f85b4b8d4f7d7f0e2dba50f
|
Provenance
The following attestation bundles were made for claude_harness-3.6.3.tar.gz:
Publisher:
publish-to-pypi.yml on nirmalarya/claude-harness
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
claude_harness-3.6.3.tar.gz -
Subject digest:
593c549c8e39fcaf02458d305ce8159cb850ad39a44d2c598a702e895d845db2 - Sigstore transparency entry: 825969405
- Sigstore integration time:
-
Permalink:
nirmalarya/claude-harness@09a71f5b0075f8e0afe0fedd26386755cadf0036 -
Branch / Tag:
refs/tags/v3.6.3 - Owner: https://github.com/nirmalarya
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-to-pypi.yml@09a71f5b0075f8e0afe0fedd26386755cadf0036 -
Trigger Event:
release
-
Statement type:
File details
Details for the file claude_harness-3.6.3-py3-none-any.whl.
File metadata
- Download URL: claude_harness-3.6.3-py3-none-any.whl
- Upload date:
- Size: 125.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
248056413aec8efb2637963ca9cd5b157b69590610943b1990ed87f0d5c67871
|
|
| MD5 |
4fba6c4030a16541ef8dc2163a708faa
|
|
| BLAKE2b-256 |
6b0a3e48410626fb3ce352b86b4ff2a87b7be6cdf4cbd8fdc030ad18ff93c130
|
Provenance
The following attestation bundles were made for claude_harness-3.6.3-py3-none-any.whl:
Publisher:
publish-to-pypi.yml on nirmalarya/claude-harness
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
claude_harness-3.6.3-py3-none-any.whl -
Subject digest:
248056413aec8efb2637963ca9cd5b157b69590610943b1990ed87f0d5c67871 - Sigstore transparency entry: 825969441
- Sigstore integration time:
-
Permalink:
nirmalarya/claude-harness@09a71f5b0075f8e0afe0fedd26386755cadf0036 -
Branch / Tag:
refs/tags/v3.6.3 - Owner: https://github.com/nirmalarya
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-to-pypi.yml@09a71f5b0075f8e0afe0fedd26386755cadf0036 -
Trigger Event:
release
-
Statement type: