Pipeline to convert GitHub PRs into Harbor tasks

SWE-gen

Convert merged GitHub PRs into Harbor tasks automatically.

Overview

Automates task creation from real bug fixes in open-source GitHub repos. Works with any programming language: Claude Code analyzes the repo to detect language, build system, and test framework.

Each task reverses a merged PR to recreate the buggy state, then verifies that the tests fail on that baseline and pass once the fix is applied. Tasks are fully containerized, with all dependencies installed at build time.

News

[01/2026] 🔥 SWE-gen-JS released: a dataset of 1,000 JS/TS tasks generated with SWE-gen

Quick Start

# Install
uv pip install swegen

# Generate a task from a merged PR
swegen create --repo axios/axios --pr 7150

# Or farm all PRs from a repo
swegen farm fastapi/fastapi

Installation

uv pip install swegen

Ensure these environment variables are set:

export GITHUB_TOKEN=<gh-token>
export OPENAI_API_KEY=<api-key>
export ANTHROPIC_API_KEY=<api-key>  # or CLAUDE_CODE_OAUTH_TOKEN

Note: Cloud sandbox environments (Daytona, E2B, Modal, etc.) require additional API keys.
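
A small preflight check can catch a missing credential before a long run starts. The following is a minimal sketch, not part of swegen: the function name missing_credentials is hypothetical, and it assumes that either ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN is sufficient, as the comment in the list above suggests.

```python
import os
from typing import Mapping

# Variables named in the list above. The Anthropic credential can come from
# either of two variables, so it is modelled as a group of alternatives.
REQUIRED_GROUPS = [
    ("GITHUB_TOKEN",),
    ("OPENAI_API_KEY",),
    ("ANTHROPIC_API_KEY", "CLAUDE_CODE_OAUTH_TOKEN"),
]


def missing_credentials(env: Mapping[str, str] = os.environ) -> list[str]:
    """Describe each credential group that has no non-empty variable set."""
    missing = []
    for group in REQUIRED_GROUPS:
        if not any(env.get(name, "").strip() for name in group):
            missing.append(" or ".join(group))
    return missing


# Example with an explicit mapping instead of the real environment.
print(missing_credentials({"GITHUB_TOKEN": "x", "OPENAI_API_KEY": "y"}))
```

Passing the environment as a parameter keeps the check easy to exercise without touching real credentials.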

Usage

Commands:

swegen create — Generate a task from a merged PR
swegen farm — Continuously process PRs from a repository
swegen validate — Validate an existing task (NOP + Oracle)
swegen analyze — Deep analysis with agent trials to verify task quality

Generate a Task

swegen create --repo <owner/repo> --pr <num>

Options

--output PATH — Output directory for generated tasks (default: tasks)
--state-dir PATH — State directory for cache/logs (default: .swegen)
--cc-timeout N — Claude Code session timeout in seconds (default: 3200)
--env, -e TYPE — Environment type: docker, daytona, e2b, modal, runloop, gke (default: docker)
--no-validate — Skip Harbor validations
--force — Bypass local dedupe and regenerate
--no-cache — Disable cached artifacts from previous tasks
--no-require-minimum-difficulty — Skip 3+ file and LLM substantiality checks
--min-source-files N — Minimum number of source files required (default: 3, tests excluded)
--max-source-files N — Maximum number of source files to avoid large refactors (default: 10, tests excluded)
--no-require-issue — Allow PRs without linked issues (uses PR body/title for instructions)
-v, --verbose / -q, --quiet
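
When swegen create is driven from a script, it can help to assemble the command line in one place. The sketch below is illustrative only: build_create_command is a hypothetical helper, and it uses just the --repo, --pr, --output, --env, and --no-validate flags listed above. It builds the argument list without running anything.

```python
import shlex


def build_create_command(repo: str, pr: int, output: str = "tasks",
                         env: str = "docker", validate: bool = True) -> list[str]:
    """Assemble an argv list for `swegen create` from documented flags only."""
    if "/" not in repo:
        raise ValueError("repo must look like <owner/repo>")
    cmd = ["swegen", "create", "--repo", repo, "--pr", str(pr),
           "--output", output, "--env", env]
    if not validate:
        cmd.append("--no-validate")
    return cmd


cmd = build_create_command("axios/axios", 7150)
print(shlex.join(cmd))
# To execute it (requires swegen installed and credentials exported):
#   import subprocess; subprocess.run(cmd, check=True)
```

Keeping the arguments as a list rather than a single string avoids shell quoting problems if the command is later passed to subprocess.run.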

Continuous PR Farming

Streams through a repository's entire PR history, processing each PR sequentially and persisting state between runs.

swegen farm fastapi/fastapi
swegen farm fastapi/fastapi --resume-from 2024-01-15

Options

--output PATH — Output directory for generated tasks (default: tasks)
--state-dir PATH — State directory for cache/logs (default: .swegen)
--timeout N — Timeout per PR in seconds (default: 300)
--cc-timeout N — Claude Code session timeout in seconds (default: 3200)
--task-delay N — Delay between tasks in seconds (default: 60)
--api-delay N — Delay between GitHub API calls in seconds (default: 0.5)
--env, -e TYPE — Environment type: docker, daytona, e2b, modal, runloop, gke (default: docker)
--resume-from DATE — Resume from date or timestamp
--reset — Reset state and start from beginning
--dry-run — Preview without generation
--force — Regenerate even if task already exists (default: true)
--no-validate — Skip Harbor validation step
--require-issue / --no-require-issue — Require PRs to have linked issues (default: true)
--no-require-minimum-difficulty — Skip 3+ file and LLM checks
--min-source-files N — Minimum number of source files required (default: 3, tests excluded)
--max-source-files N — Maximum number of source files to avoid large refactors (default: 10, tests excluded)
--no-cache — Disable cached artifacts
--docker-prune-batch N — Run docker cleanup after every N PRs (default: 5, 0 to disable)
--skip-list PATH — Path to file with task IDs to skip (one per line)
-v, --verbose
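
The --skip-list option expects a file with one task ID per line. As a rough sketch, such a file could be produced as follows. The helper write_skip_list and the task IDs in the example are hypothetical, and whether swegen tolerates blank lines or duplicates in the file is not stated above, so this version simply drops them.

```python
from pathlib import Path
from typing import Iterable


def write_skip_list(path: Path, task_ids: Iterable[str]) -> int:
    """Write unique, non-empty task IDs one per line; return the number written."""
    seen: dict[str, None] = {}
    for task_id in task_ids:
        cleaned = task_id.strip()
        if cleaned:
            seen.setdefault(cleaned, None)  # dicts preserve first-seen order
    path.write_text("".join(f"{tid}\n" for tid in seen), encoding="utf-8")
    return len(seen)


# Hypothetical usage; real task IDs come from your own generated tasks:
#   write_skip_list(Path("skip.txt"), ["some-task-id", "another-task-id"])
```

The resulting file can then be passed as swegen farm fastapi/fastapi --skip-list skip.txt.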

Validate Existing Tasks

Verify that a task behaves correctly under both agents: NOP (tests fail on the baseline) and Oracle (tests pass with the solution applied):

swegen validate <task_id>

Options

--task, -t ID — Task ID when path points to dataset root
--agent TYPE — both, nop, or oracle (default: both)
--jobs-dir PATH — Directory to store Harbor job artifacts (default: .swegen/harbor-jobs)
--env, -e TYPE — Environment type: docker, daytona, e2b, modal, runloop, gke (default: docker)
--timeout-multiplier N — Multiply default timeouts
--max-parallel N — Max parallel validations (default: 8)
--show-passed — Show passed tasks in batch mode
--output, -o PATH — Write results to file as they complete (batch mode only)
--docker-prune-batch N — Run docker cleanup after every N tasks (default: 5, 0 to disable)
-v, --verbose / -q, --quiet
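
The pass condition for the two agents can be written down compactly. The sketch below assumes the reward convention given under How It Works (NOP reward=0, Oracle reward=1). The BaselineResult type and the diagnose function are hypothetical names for illustration and do not reflect how swegen or Harbor report results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineResult:
    nop_reward: float     # reward from the agent that changes nothing
    oracle_reward: float  # reward from the agent that applies the reference fix


def diagnose(result: BaselineResult) -> list[str]:
    """Return a list of problems; an empty list means the task validates."""
    problems = []
    if result.nop_reward != 0:
        problems.append("NOP passed: tests do not fail on the buggy baseline")
    if result.oracle_reward != 1:
        problems.append("Oracle failed: tests do not pass with the fix applied")
    return problems


print(diagnose(BaselineResult(nop_reward=0, oracle_reward=1)))  # prints []
```

A task that fails the first check has tests that do not detect the bug; one that fails the second has a fix or environment that does not reproduce the passing state.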

Analyze Task Quality

Run agent trials to verify a task is well-specified and solvable:

swegen analyze <task_id>
swegen analyze <task_id> -k 5 -a claude-code

What analyze does

Static quality check (harbor tasks check)
Baseline validation (nop fails, oracle passes)
Run N agent trials
Trial classification (identifies TASK vs AGENT problems)
Task verdict synthesis with actionable recommendations

Classification categories:

GOOD_SUCCESS — Agent solved it correctly
BAD_SUCCESS — Agent cheated or tests too permissive
GOOD_FAILURE — Agent failed due to its own limitations
BAD_FAILURE — Agent failed due to task issues (underspecified, brittle tests, etc.)
HARNESS_ERROR — Infrastructure problem
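
To show how these labels separate task problems from agent problems, here is a minimal sketch that aggregates labels across trials. It is not swegen's verdict logic: treating BAD_SUCCESS and BAD_FAILURE as task problems is a reading of the descriptions above, and excluding HARNESS_ERROR trials from the denominator is an assumption.

```python
from enum import Enum
from typing import Optional


class TrialClass(Enum):
    GOOD_SUCCESS = "GOOD_SUCCESS"
    BAD_SUCCESS = "BAD_SUCCESS"
    GOOD_FAILURE = "GOOD_FAILURE"
    BAD_FAILURE = "BAD_FAILURE"
    HARNESS_ERROR = "HARNESS_ERROR"


# Assumption: the two BAD_* labels point at the task rather than the agent.
TASK_PROBLEMS = {TrialClass.BAD_SUCCESS, TrialClass.BAD_FAILURE}


def task_problem_fraction(labels: list[str]) -> Optional[float]:
    """Fraction of non-harness trials whose label indicates a task problem."""
    classes = [TrialClass(label) for label in labels]  # ValueError on unknown label
    scored = [c for c in classes if c is not TrialClass.HARNESS_ERROR]
    if not scored:
        return None  # nothing to judge if every trial hit an infrastructure error
    return sum(c in TASK_PROBLEMS for c in scored) / len(scored)


print(task_problem_fraction(["GOOD_SUCCESS", "BAD_FAILURE", "HARNESS_ERROR"]))  # 0.5
```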

Options

-a, --agent TYPE — Agent to run trials (default: claude-code)
-m, --model MODEL — Model for agent trials (default: anthropic/claude-sonnet-4-5)
-k, --n-trials N — Number of trials (default: 3)
-n, --n-concurrent N — Number of concurrent trials (default: 3, 1=sequential)
--jobs-dir PATH — Directory to store job artifacts (default: .swegen/analyze-jobs)
--analysis-model MODEL — Model for Claude Code classification (default: claude-sonnet-4-5)
--env, -e TYPE — Environment type: docker, daytona, e2b, modal, runloop, gke (default: docker)
--skip-quality-check — Skip static quality check
--skip-baseline — Skip baseline validation (nop/oracle)
--skip-classify — Skip AI-powered classification
--save-to-dir — Write trajectory-analysis.{md,json} to each trial directory (for CI integration)
--classification-timeout N — Timeout per trial classification in seconds (default: 300)
--verdict-timeout N — Timeout for verdict synthesis in seconds (default: 180)
--timeout-multiplier N — Multiply default timeouts
-v, --verbose

Task Requirements

Valid PR criteria

Languages: Any (Python, JavaScript, TypeScript, Go, Rust, Ruby, Java, etc.)

Valid PRs must:

Be merged into the primary branch, with an accessible fork
Include test changes and a corresponding fix
Have a linked issue for high-quality instructions (bypass with --no-require-issue)
Modify 3-10 source files (configurable with --min-source-files and --max-source-files; bypass with --no-require-minimum-difficulty)
Pass LLM substantiality evaluation (bypass with --no-require-minimum-difficulty)
Fail tests on the reversed baseline and pass after the fix is applied
Not consist only of documentation, formatting, or version-bump changes
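
The cheap, metadata-level part of these criteria can be sketched as a simple filter. This is an illustration under stated assumptions, not swegen's implementation: PRCandidate and rejection_reasons are hypothetical names, and the LLM substantiality evaluation and the fail/pass test check are left out because they need a model call and a container build.

```python
from dataclasses import dataclass


@dataclass
class PRCandidate:
    merged: bool
    has_test_changes: bool
    has_linked_issue: bool
    source_files_changed: int  # test files excluded


def rejection_reasons(pr: PRCandidate, min_source_files: int = 3,
                      max_source_files: int = 10, require_issue: bool = True,
                      require_minimum_difficulty: bool = True) -> list[str]:
    """Return why a PR would be skipped; an empty list means it passes."""
    reasons = []
    if not pr.merged:
        reasons.append("not merged")
    if not pr.has_test_changes:
        reasons.append("no test changes")
    if require_issue and not pr.has_linked_issue:
        reasons.append("no linked issue")
    if require_minimum_difficulty and not (
            min_source_files <= pr.source_files_changed <= max_source_files):
        reasons.append("source file count outside allowed range")
    return reasons


print(rejection_reasons(PRCandidate(True, True, False, 2)))
```

The keyword arguments mirror the defaults of --min-source-files, --max-source-files, --no-require-issue, and --no-require-minimum-difficulty described above.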

How It Works

Pipeline details

The pipeline uses a language-agnostic approach:

Fetch & Analyze — Get PR metadata via GitHub API, clone repo, identify test files
Evaluate — LLM evaluates PR substantiality and generates task instructions
Generate Skeleton — Create Dockerfile and test.sh with TODOs for Claude Code
Claude Code Completion — CC analyzes repo, detects language/runtime/build system, fills in skeleton
Validation — Run NOP (reward=0) and Oracle (reward=1) agents
Iteration — CC iterates until both agents pass
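
The last two steps form a validate-and-revise loop. A minimal sketch of that control flow is shown below. The function iterate_until_valid, its callables, and the max_rounds limit are all hypothetical; the real pipeline delegates revision to Claude Code, and how it bounds the iteration is not described here.

```python
from typing import Callable, Tuple


def iterate_until_valid(validate: Callable[[], Tuple[float, float]],
                        revise: Callable[[float, float], None],
                        max_rounds: int = 3) -> bool:
    """Re-run validation until NOP reward is 0 and Oracle reward is 1."""
    for _ in range(max_rounds):
        nop_reward, oracle_reward = validate()
        if nop_reward == 0 and oracle_reward == 1:
            return True
        # Hand the failing rewards to whatever revises the Dockerfile/test.sh.
        revise(nop_reward, oracle_reward)
    return False


# Stubbed example: the first attempt fails Oracle, the second validates.
attempts = iter([(0, 0), (0, 1)])
print(iterate_until_valid(lambda: next(attempts), lambda nop, oracle: None))  # True
```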

Key Details:

Dockerfile clones at HEAD, then applies bug.patch to revert to buggy BASE state
Test files stored in task/tests/ and copied at runtime (prevents agent tampering)
fix.patch (solution) excludes tests/CI, contains all other PR changes
Dependencies installed at build time; runtime doesn't require internet access
Successful tasks are cached as references to speed up future tasks from the same repo
PR evaluation uses LLM to check substantiality and generate instructions
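
The split between fix.patch and the test files depends on deciding which changed paths are tests or CI configuration. One way to picture it is the path-based partition below. The heuristics here (directory names, filename patterns, CI prefixes) and the example paths are assumptions for illustration; the pipeline above says only that test files are identified, not how.

```python
from pathlib import PurePosixPath

# Hypothetical heuristics; real projects vary widely.
TEST_DIR_NAMES = {"test", "tests", "__tests__", "spec"}
CI_PREFIXES = (".github/", ".circleci/")


def classify_path(path: str) -> str:
    """Label a changed file as 'ci', 'test', or 'source'."""
    if path.startswith(CI_PREFIXES):
        return "ci"
    parts = PurePosixPath(path).parts
    name = parts[-1] if parts else ""
    in_test_dir = any(part in TEST_DIR_NAMES for part in parts[:-1])
    test_like_name = (name.startswith("test_") or ".test." in name
                      or ".spec." in name or name.endswith("_test.go"))
    return "test" if in_test_dir or test_like_name else "source"


def split_pr_files(paths: list[str]) -> dict[str, list[str]]:
    """Group changed paths; only the 'source' group would go into fix.patch."""
    groups: dict[str, list[str]] = {"source": [], "test": [], "ci": []}
    for path in paths:
        groups[classify_path(path)].append(path)
    return groups


print(split_pr_files(["lib/core.js", "test/unit/core.test.js", ".github/workflows/ci.yml"]))
```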

License

Apache License 2.0

Release history

This version

0.1.1

Feb 2, 2026

0.1.0

Jan 20, 2026

Download files

Source Distribution

swegen-0.1.1.tar.gz (1.2 MB)

Uploaded Feb 2, 2026 Source

Built Distribution

swegen-0.1.1-py3-none-any.whl (107.7 kB)

Uploaded Feb 2, 2026 Python 3

File details

Details for the file swegen-0.1.1.tar.gz.

File metadata

Download URL: swegen-0.1.1.tar.gz
Upload date: Feb 2, 2026
Size: 1.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for swegen-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`65339c0ea851e229765a416aaf08b3fc139b930b40531955e02a957e4332f219`
MD5	`828333e16beecf7591c9398d635d1dfe`
BLAKE2b-256	`00ab4ea06992d597639fb7aa27ebf0d85fb20d3c882eb3cff689296ee0d5aebf`

File details

Details for the file swegen-0.1.1-py3-none-any.whl.

File metadata

Download URL: swegen-0.1.1-py3-none-any.whl
Upload date: Feb 2, 2026
Size: 107.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for swegen-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e15abc6acc2c4217e26d52a4b97301b5c494fd1bdbbfb9cfe5deaa1f52b53bc9`
MD5	`42f7a2bb16eaae51359cf89cb618c25f`
BLAKE2b-256	`c2252db2ac4245a8f9126db0e28ff1fd0d0663542ad32cd7677448b3b7c181c1`
