Pipeline to convert GitHub PRs into Harbor tasks
SWE-gen
Convert merged GitHub PRs into Harbor tasks automatically.
Overview
Automates task creation from real bug fixes in open-source GitHub repos. Works with any programming language: Claude Code analyzes the repo to detect language, build system, and test framework.
Each task reverses a merged PR to recreate the buggy state, then verifies that the tests fail on this baseline and pass once the fix is applied. Tasks are fully containerized, with all dependencies installed at build time.
News
- [01/2026] 🔥 SWE-gen-JS released: a 1,000-task JS/TS dataset generated with SWE-gen
Quick Start
# Install
uv pip install swegen
# Generate a task from a merged PR
swegen create --repo axios/axios --pr 7150
# Or farm all PRs from a repo
swegen farm fastapi/fastapi
Installation
uv pip install swegen
Ensure these environment variables are set:
export GITHUB_TOKEN=<gh-token>
export OPENAI_API_KEY=<api-key>
export ANTHROPIC_API_KEY=<api-key> # or CLAUDE_CODE_OAUTH_TOKEN
Note: Cloud sandbox environments (Daytona, E2B, Modal, etc.) require additional API keys.
Usage
Commands:
- swegen create: Generate a task from a merged PR
- swegen farm: Continuously process PRs from a repository
- swegen validate: Validate an existing task (NOP + Oracle)
- swegen analyze: Deep analysis with agent trials to verify task quality
Generate a Task
swegen create --repo <owner/repo> --pr <num>
Options
- --output PATH: Output directory for generated tasks (default: tasks)
- --state-dir PATH: State directory for cache/logs (default: .swegen)
- --cc-timeout N: Claude Code session timeout in seconds (default: 3200)
- --env, -e TYPE: Environment type, one of docker, daytona, e2b, modal, runloop, gke (default: docker)
- --no-validate: Skip Harbor validations
- --force: Bypass local dedupe and regenerate
- --no-cache: Disable cached artifacts from previous tasks
- --no-require-minimum-difficulty: Skip the 3+ file and LLM substantiality checks
- --min-source-files N: Minimum number of source files required (default: 3, tests excluded)
- --max-source-files N: Maximum number of source files, to avoid large refactors (default: 10, tests excluded)
- --no-require-issue: Allow PRs without linked issues (uses the PR body/title for instructions)
- -v, --verbose / -q, --quiet: Verbose or quiet output
Continuous PR Farming
Streams through a repository's entire PR history and processes each PR sequentially, persisting state between runs.
swegen farm fastapi/fastapi
swegen farm fastapi/fastapi --resume-from 2024-01-15
Options
- --output PATH: Output directory for generated tasks (default: tasks)
- --state-dir PATH: State directory for cache/logs (default: .swegen)
- --timeout N: Timeout per PR in seconds (default: 300)
- --cc-timeout N: Claude Code session timeout in seconds (default: 3200)
- --task-delay N: Delay between tasks in seconds (default: 60)
- --api-delay N: Delay between GitHub API calls in seconds (default: 0.5)
- --env, -e TYPE: Environment type, one of docker, daytona, e2b, modal, runloop, gke (default: docker)
- --resume-from DATE: Resume from a date or timestamp
- --reset: Reset state and start from the beginning
- --dry-run: Preview without generating tasks
- --force: Regenerate even if the task already exists (default: true)
- --no-validate: Skip the Harbor validation step
- --require-issue / --no-require-issue: Require PRs to have linked issues (default: true)
- --no-require-minimum-difficulty: Skip the 3+ file and LLM checks
- --min-source-files N: Minimum number of source files required (default: 3, tests excluded)
- --max-source-files N: Maximum number of source files, to avoid large refactors (default: 10, tests excluded)
- --no-cache: Disable cached artifacts
- --docker-prune-batch N: Run Docker cleanup after every N PRs (default: 5, 0 to disable)
- --skip-list PATH: Path to a file of task IDs to skip (one per line)
- -v, --verbose: Verbose output
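The way --resume-from, --skip-list, and --docker-prune-batch interact can be pictured as a small planner over the PR stream. This is a hypothetical sketch, not swegen's code: PullRequest, load_skip_list, and plan_farm are illustrative names, it assumes PRs are processed oldest first, and it omits the --task-delay/--api-delay sleeps and the generation step itself.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional, Set, Tuple


@dataclass
class PullRequest:
    task_id: str  # hypothetical task identifier
    merged_at: datetime


def load_skip_list(text: str) -> Set[str]:
    """Parse skip-list contents: one task ID per line, blank lines ignored."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def plan_farm(
    prs: Iterable[PullRequest],
    skip_ids: Set[str],
    resume_from: Optional[datetime] = None,
    prune_batch: int = 5,
) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield ("process", task_id) and ("prune", None) actions in order."""
    processed = 0
    # Assumption: the stream is walked oldest first.
    for pr in sorted(prs, key=lambda p: p.merged_at):
        if resume_from is not None and pr.merged_at < resume_from:
            continue  # covered by an earlier run
        if pr.task_id in skip_ids:
            continue  # listed in the skip file
        yield ("process", pr.task_id)
        processed += 1
        # A prune_batch of 0 disables cleanup, mirroring the CLI help text.
        if prune_batch > 0 and processed % prune_batch == 0:
            yield ("prune", None)
```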
Validate Existing Tasks
Verify that a task behaves correctly under the NOP agent (tests fail on the baseline) and the Oracle agent (tests pass with the solution):
swegen validate <task_id>
Options
- --task, -t ID: Task ID when the path points to a dataset root
- --agent TYPE: both, nop, or oracle (default: both)
- --jobs-dir PATH: Directory to store Harbor job artifacts (default: .swegen/harbor-jobs)
- --env, -e TYPE: Environment type, one of docker, daytona, e2b, modal, runloop, gke (default: docker)
- --timeout-multiplier N: Multiply default timeouts
- --max-parallel N: Max parallel validations (default: 8)
- --show-passed: Show passed tasks in batch mode
- --output, -o PATH: Write results to a file as they complete (batch mode only)
- --docker-prune-batch N: Run Docker cleanup after every N tasks (default: 5, 0 to disable)
- -v, --verbose / -q, --quiet: Verbose or quiet output
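The pass condition for validation reduces to two expected rewards: NOP scores 0 and Oracle scores 1. A minimal sketch of that check, assuming the rewards are already available as numbers; ValidationResult, is_valid_task, and diagnose are hypothetical names, not part of swegen or Harbor.

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    nop_reward: float     # NOP agent: no changes applied to the buggy baseline
    oracle_reward: float  # Oracle agent: reference fix applied


def is_valid_task(result: ValidationResult) -> bool:
    # Tests must catch the bug (NOP reward 0) and accept the fix (Oracle reward 1).
    return result.nop_reward == 0 and result.oracle_reward == 1


def diagnose(result: ValidationResult) -> str:
    """Explain which half of the check failed, if any."""
    if result.nop_reward != 0:
        return "tests already pass on the buggy baseline"
    if result.oracle_reward != 1:
        return "reference solution does not pass the tests"
    return "ok"
```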
Analyze Task Quality
Run agent trials to verify a task is well-specified and solvable:
swegen analyze <task_id>
swegen analyze <task_id> -k 5 -a claude-code
What analyze does
- Static quality check (harbor tasks check)
- Baseline validation (NOP fails, Oracle passes)
- Run N agent trials
- Trial classification (identifies TASK vs. AGENT problems)
- Task verdict synthesis with actionable recommendations
Classification categories:
- GOOD_SUCCESS: Agent solved it correctly
- BAD_SUCCESS: Agent cheated or tests are too permissive
- GOOD_FAILURE: Agent failed due to its own limitations
- BAD_FAILURE: Agent failed due to task issues (underspecified, brittle tests, etc.)
- HARNESS_ERROR: Infrastructure problem
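The five categories form a two-by-two grid (pass/fail against trustworthy/suspect outcome) plus an infrastructure bucket. A hypothetical sketch of that mapping: in swegen the judgment comes from Claude Code reading each trial, which the boolean suspect merely stands in for, and the task_flagged rule (flag a task if any trial is BAD_*) is an illustrative assumption rather than the tool's verdict logic.

```python
from enum import Enum
from typing import Iterable


class TrialClass(Enum):
    GOOD_SUCCESS = "GOOD_SUCCESS"    # agent solved it correctly
    BAD_SUCCESS = "BAD_SUCCESS"      # agent cheated or tests too permissive
    GOOD_FAILURE = "GOOD_FAILURE"    # agent's own limitations
    BAD_FAILURE = "BAD_FAILURE"      # task issues (underspecified, brittle tests)
    HARNESS_ERROR = "HARNESS_ERROR"  # infrastructure problem


def classify(passed: bool, suspect: bool, harness_error: bool = False) -> TrialClass:
    """Map a trial outcome plus a (hypothetical) analyst judgment to a category."""
    if harness_error:
        return TrialClass.HARNESS_ERROR
    if passed:
        return TrialClass.BAD_SUCCESS if suspect else TrialClass.GOOD_SUCCESS
    return TrialClass.BAD_FAILURE if suspect else TrialClass.GOOD_FAILURE


def task_flagged(classes: Iterable[TrialClass]) -> bool:
    """Illustrative rule: any BAD_* trial suggests the task needs review."""
    bad = {TrialClass.BAD_SUCCESS, TrialClass.BAD_FAILURE}
    return any(c in bad for c in classes)
```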
Options
- -a, --agent TYPE: Agent to run trials (default: claude-code)
- -m, --model MODEL: Model for agent trials (default: anthropic/claude-sonnet-4-5)
- -k, --n-trials N: Number of trials (default: 3)
- -n, --n-concurrent N: Number of concurrent trials (default: 3; 1 = sequential)
- --jobs-dir PATH: Directory to store job artifacts (default: .swegen/analyze-jobs)
- --analysis-model MODEL: Model for Claude Code classification (default: claude-sonnet-4-5)
- --env, -e TYPE: Environment type, one of docker, daytona, e2b, modal, runloop, gke (default: docker)
- --skip-quality-check: Skip the static quality check
- --skip-baseline: Skip baseline validation (nop/oracle)
- --skip-classify: Skip AI-powered classification
- --save-to-dir: Write trajectory-analysis.{md,json} to each trial directory (for CI integration)
- --classification-timeout N: Timeout per trial classification in seconds (default: 300)
- --verdict-timeout N: Timeout for verdict synthesis in seconds (default: 180)
- --timeout-multiplier N: Multiply default timeouts
- -v, --verbose: Verbose output
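The -k/-n pair describes a bounded worker pool: k trials in total, at most n running at once, with n = 1 meaning sequential. A minimal sketch using Python's standard ThreadPoolExecutor, assuming a hypothetical run_trial callable that runs one trial and returns its result; swegen's actual scheduler may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_trials(
    run_trial: Callable[[int], T], n_trials: int = 3, n_concurrent: int = 3
) -> List[T]:
    """Call run_trial(0..n_trials-1) with at most n_concurrent in flight."""
    if n_trials < 1 or n_concurrent < 1:
        raise ValueError("n_trials and n_concurrent must be >= 1")
    # A pool with one worker runs the trials strictly one after another.
    with ThreadPoolExecutor(max_workers=n_concurrent) as pool:
        # Executor.map returns results in submission order, not completion order.
        return list(pool.map(run_trial, range(n_trials)))
```

Threads are a reasonable fit under the assumption that each trial mostly waits on a container or model API rather than burning local CPU.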
Task Requirements
Valid PR criteria
Languages: Any (Python, JavaScript, TypeScript, Go, Rust, Ruby, Java, etc.)
Valid PRs must:
- Be merged into the primary branch, with an accessible fork
- Include test changes and the corresponding fix
- Have a linked issue for high-quality instructions (bypass with --no-require-issue)
- Modify 3-10 source files (configurable with --min-source-files and --max-source-files; bypass with --no-require-minimum-difficulty)
- Pass the LLM substantiality evaluation (bypass with --no-require-minimum-difficulty)
- Fail tests on the reversed baseline and pass after the fix is applied
- Not be documentation-only, formatting-only, or version-bump-only changes
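The path-based criteria above can be checked before any container is built. A hypothetical pre-filter: the test-file and doc-file heuristics are assumptions (swegen detects tests per language with Claude Code), doc files are assumed not to count as source files, and the LLM substantiality check, formatting/version-bump detection, and fail-then-pass check are out of scope here.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Tuple

TEST_DIRS = {"test", "tests", "__tests__", "spec"}  # assumed heuristic
DOC_SUFFIXES = {".md", ".rst", ".txt"}              # assumed heuristic


def is_test_path(path: str) -> bool:
    p = PurePosixPath(path)
    name = p.name.lower()
    in_test_dir = any(part.lower() in TEST_DIRS for part in p.parts[:-1])
    return in_test_dir or name.startswith("test_") or any(
        marker in name for marker in ("_test.", ".test.", ".spec.")
    )


def is_doc_path(path: str) -> bool:
    p = PurePosixPath(path)
    return p.suffix.lower() in DOC_SUFFIXES or "docs" in p.parts[:-1]


@dataclass
class PRInfo:  # hypothetical summary of a PR's metadata
    merged: bool
    files: List[str]
    has_linked_issue: bool


def eligible(pr: PRInfo, min_files: int = 3, max_files: int = 10,
             require_issue: bool = True) -> Tuple[bool, str]:
    if not pr.merged:
        return False, "not merged"
    tests = [f for f in pr.files if is_test_path(f)]
    # Source files exclude tests (as documented) and, by assumption, docs.
    source = [f for f in pr.files if not is_test_path(f) and not is_doc_path(f)]
    if not tests:
        return False, "no test changes"
    if not source:
        return False, "no source changes (docs/tests only)"
    if require_issue and not pr.has_linked_issue:
        return False, "no linked issue"
    if not min_files <= len(source) <= max_files:
        return False, f"source file count {len(source)} outside [{min_files}, {max_files}]"
    return True, "ok"
```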
How It Works
Pipeline details
The pipeline uses a language-agnostic approach:
- Fetch & Analyze: Get PR metadata via the GitHub API, clone the repo, and identify test files
- Evaluate: An LLM evaluates PR substantiality and generates task instructions
- Generate Skeleton: Create a Dockerfile and test.sh with TODOs for Claude Code
- Claude Code Completion: Claude Code (CC) analyzes the repo, detects the language/runtime/build system, and fills in the skeleton
- Validation: Run the NOP (expected reward=0) and Oracle (expected reward=1) agents
- Iteration: CC iterates until both agents produce their expected rewards
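The Validation and Iteration steps amount to a feedback loop. A minimal sketch with hypothetical callables: complete stands in for a Claude Code session that edits the Dockerfile and test.sh given feedback, and validate stands in for running the NOP and Oracle agents and returning their rewards. The max_rounds cap is an assumption; the CLI documents a session timeout (--cc-timeout) rather than a round count.

```python
from typing import Callable, Optional, Tuple


def iterate_until_valid(
    complete: Callable[[str], None],
    validate: Callable[[], Tuple[float, float]],
    max_rounds: int = 5,
) -> Optional[int]:
    """Return the round on which validation succeeded, or None if it never did."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        complete(feedback)           # hypothetical: CC fills in or repairs the skeleton
        nop, oracle = validate()     # hypothetical: run NOP and Oracle agents
        if nop == 0 and oracle == 1:
            return round_no
        # Feed the mismatch back into the next session.
        feedback = f"NOP reward={nop} (want 0), Oracle reward={oracle} (want 1)"
    return None
```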
Key Details:
- The Dockerfile clones the repo at HEAD, then applies bug.patch to revert to the buggy BASE state
- Test files are stored in task/tests/ and copied in at runtime (prevents agent tampering)
- fix.patch (the solution) excludes tests/CI and contains all other PR changes
- Dependencies are installed at build time; the runtime doesn't require internet access
- Successful tasks are cached as references to speed up future tasks from the same repo
- PR evaluation uses an LLM to check substantiality and generate instructions
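Deriving fix.patch means splitting the PR's git diff per file and dropping test and CI paths. A hypothetical sketch: it splits on "diff --git a/... b/..." headers (paths without spaces assumed), and the test/CI path rules are illustrative assumptions, not swegen's actual rules. Building bug.patch is not covered.

```python
import re
from typing import List, Tuple

HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$")
TEST_DIRS = {"test", "tests", "__tests__", "spec"}           # assumed heuristic
CI_PREFIXES = (".github/", ".circleci/", ".gitlab-ci.yml")   # assumed heuristic


def split_file_diffs(diff: str) -> List[Tuple[str, str]]:
    """Split a git diff into (path, chunk) pairs, one per file."""
    chunks: List[Tuple[str, str]] = []
    path, buf = None, []
    for line in diff.splitlines(keepends=True):
        m = HEADER.match(line.rstrip("\n"))
        if m:
            if path is not None:
                chunks.append((path, "".join(buf)))
            path, buf = m.group(2), []
        if path is not None:
            buf.append(line)  # text before the first header is ignored
    if path is not None:
        chunks.append((path, "".join(buf)))
    return chunks


def is_test_or_ci(path: str) -> bool:
    if path.startswith(CI_PREFIXES):
        return True
    parts = path.split("/")
    name = parts[-1]
    return any(p in TEST_DIRS for p in parts[:-1]) or name.startswith("test_") or any(
        marker in name for marker in ("_test.", ".test.", ".spec.")
    )


def build_fix_patch(diff: str) -> Tuple[str, List[str]]:
    """Return (fix_patch, excluded_paths); fix_patch keeps non-test, non-CI files."""
    kept, excluded = [], []
    for path, chunk in split_file_diffs(diff):
        if is_test_or_ci(path):
            excluded.append(path)
        else:
            kept.append(chunk)
    return "".join(kept), excluded
```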
License
File details
Details for the file swegen-0.1.1.tar.gz.
File metadata
- Download URL: swegen-0.1.1.tar.gz
- Size: 1.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 65339c0ea851e229765a416aaf08b3fc139b930b40531955e02a957e4332f219 |
| MD5 | 828333e16beecf7591c9398d635d1dfe |
| BLAKE2b-256 | 00ab4ea06992d597639fb7aa27ebf0d85fb20d3c882eb3cff689296ee0d5aebf |
File details
Details for the file swegen-0.1.1-py3-none-any.whl.
File metadata
- Download URL: swegen-0.1.1-py3-none-any.whl
- Size: 107.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e15abc6acc2c4217e26d52a4b97301b5c494fd1bdbbfb9cfe5deaa1f52b53bc9 |
| MD5 | 42f7a2bb16eaae51359cf89cb618c25f |
| BLAKE2b-256 | c2252db2ac4245a8f9126db0e28ff1fd0d0663542ad32cd7677448b3b7c181c1 |