Pipeline to convert GitHub PRs into Harbor tasks

SWE-gen

Convert merged GitHub PRs into Harbor tasks automatically.

Overview

Automates task creation from real bug fixes in open-source GitHub repos. Works with any programming language: Claude Code analyzes the repo to detect language, build system, and test framework.

Each task reverses a merged PR to recreate the buggy state, then verifies that tests fail on that baseline and pass after the fix is applied. Tasks are fully containerized, with all dependencies installed at build time.

News

  • [01/2026] 🔥 SWE-gen-JS released: 1,000 JS/TS task dataset generated with SWE-gen

Quick Start

# Install
uv pip install swegen

# Generate a task from a merged PR
swegen create --repo axios/axios --pr 7150

# Or farm all PRs from a repo
swegen farm fastapi/fastapi

Installation

uv pip install swegen

Ensure these environment variables are set:

export GITHUB_TOKEN=<gh-token>
export OPENAI_API_KEY=<api-key>
export ANTHROPIC_API_KEY=<api-key>  # or CLAUDE_CODE_OAUTH_TOKEN

Note: Cloud sandbox environments (Daytona, E2B, Modal, etc.) require additional API keys.
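
A quick pre-flight check can catch a missing credential before a long run starts. The sketch below is not part of swegen; the function name `missing_credentials` is hypothetical, and it assumes only what the export list above states: two required variables, plus either `ANTHROPIC_API_KEY` or `CLAUDE_CODE_OAUTH_TOKEN`.

```python
import os
from typing import Mapping

# Variables named in the export list above.
REQUIRED = ("GITHUB_TOKEN", "OPENAI_API_KEY")
# Either one of these is enough, per the comment in the export list.
CLAUDE_ALTERNATIVES = ("ANTHROPIC_API_KEY", "CLAUDE_CODE_OAUTH_TOKEN")


def missing_credentials(env: Mapping[str, str]) -> list[str]:
    """Return the names of credentials that are unset or empty (hypothetical helper)."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if not any(env.get(name) for name in CLAUDE_ALTERNATIVES):
        missing.append(" or ".join(CLAUDE_ALTERNATIVES))
    return missing


# Prints an empty list when everything needed is present in the current shell.
print(missing_credentials(os.environ))
```

Cloud sandbox keys are not checked here because the README does not name them.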

Usage

Commands:

  • swegen create - Generate a task from a merged PR
  • swegen farm - Continuously process PRs from a repository
  • swegen validate - Validate existing task (NOP + Oracle)
  • swegen analyze - Deep analysis with agent trials to verify task quality

Generate a Task

swegen create --repo <owner/repo> --pr <num>

Options

  • --output PATH - Output directory for generated tasks (default: tasks)
  • --state-dir PATH - State directory for cache/logs (default: .swegen)
  • --cc-timeout N - Claude Code session timeout in seconds (default: 3200)
  • --env, -e TYPE - Environment type: docker, daytona, e2b, modal, runloop, gke (default: docker)
  • --no-validate - Skip Harbor validations
  • --force - Bypass local dedupe and regenerate
  • --no-cache - Disable cached artifacts from previous tasks
  • --no-require-minimum-difficulty - Skip 3+ file and LLM substantiality checks
  • --min-source-files N - Minimum number of source files required (default: 3, tests excluded)
  • --max-source-files N - Maximum number of source files to avoid large refactors (default: 10, tests excluded)
  • --no-require-issue - Allow PRs without linked issues (uses PR body/title for instructions)
  • -v, --verbose / -q, --quiet

Continuous PR Farming

Stream through a repository's entire PR history, processing each PR sequentially with persistent state.

swegen farm fastapi/fastapi
swegen farm fastapi/fastapi --resume-from 2024-01-15

Options

  • --output PATH - Output directory for generated tasks (default: tasks)
  • --state-dir PATH - State directory for cache/logs (default: .swegen)
  • --timeout N - Timeout per PR in seconds (default: 300)
  • --cc-timeout N - Claude Code session timeout (default: 3200)
  • --task-delay N - Delay between tasks in seconds (default: 60)
  • --api-delay N - Delay between GitHub API calls in seconds (default: 0.5)
  • --env, -e TYPE - Environment type: docker, daytona, e2b, modal, runloop, gke (default: docker)
  • --resume-from DATE - Resume from date or timestamp
  • --reset - Reset state and start from beginning
  • --dry-run - Preview without generation
  • --force - Regenerate even if task already exists (default: true)
  • --no-validate - Skip Harbor validation step
  • --require-issue / --no-require-issue - Require PRs to have linked issues (default: True)
  • --no-require-minimum-difficulty - Skip 3+ file and LLM checks
  • --min-source-files N - Minimum number of source files required (default: 3, tests excluded)
  • --max-source-files N - Maximum number of source files to avoid large refactors (default: 10, tests excluded)
  • --no-cache - Disable cached artifacts
  • --docker-prune-batch N - Run docker cleanup after every N PRs (default: 5, 0 to disable)
  • --skip-list PATH - Path to file with task IDs to skip (one per line)
  • -v, --verbose
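
When farming is driven from a script, it can help to build the command line in one place. The following is a minimal sketch, not an API that swegen provides: `farm_command` and `run_farm` are hypothetical helpers that only assemble flags listed above, and the sketch assumes the CLI returns a non-zero exit code on failure.

```python
import subprocess


def farm_command(repo, *, output="tasks", resume_from=None, dry_run=False, skip_list=None):
    """Build the argv for `swegen farm` using only flags listed above (hypothetical helper)."""
    argv = ["swegen", "farm", repo, "--output", output]
    if resume_from is not None:
        argv += ["--resume-from", resume_from]
    if skip_list is not None:
        argv += ["--skip-list", skip_list]
    if dry_run:
        argv.append("--dry-run")
    return argv


def run_farm(repo, **options):
    """Run the command; raises CalledProcessError if the process exits non-zero."""
    return subprocess.run(farm_command(repo, **options), check=True)


# Preview the command without running anything.
print(" ".join(farm_command("fastapi/fastapi", resume_from="2024-01-15", dry_run=True)))
```

Passing the arguments as a list avoids shell quoting issues with paths and dates.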

Validate Existing Tasks

Verify that a task behaves correctly under both the NOP agent (baseline fails) and the Oracle agent (solution succeeds):

swegen validate <task_id>

Options

  • --task, -t ID - Task ID when path points to dataset root
  • --agent TYPE - both, nop, or oracle (default: both)
  • --jobs-dir PATH - Directory to store Harbor job artifacts (default: .swegen/harbor-jobs)
  • --env, -e TYPE - Environment type: docker, daytona, e2b, modal, runloop, gke (default: docker)
  • --timeout-multiplier N - Multiply default timeouts
  • --max-parallel N - Max parallel validations (default: 8)
  • --show-passed - Show passed tasks in batch mode
  • --output, -o PATH - Write results to file as they complete (batch mode only)
  • --docker-prune-batch N - Run docker cleanup after every N tasks (default: 5, 0 to disable)
  • -v, --verbose / -q, --quiet
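
The two baseline checks reduce to a simple rule: the NOP run should earn reward 0 and the Oracle run reward 1. The sketch below only illustrates that rule. `BaselineResult` and `baseline_problems` are hypothetical names, and how swegen or Harbor actually report rewards is not shown in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineResult:
    """Rewards from the two baseline agents (hypothetical container)."""
    nop_reward: float     # expected 0: tests must fail on the buggy baseline
    oracle_reward: float  # expected 1: tests must pass once the solution is applied


def baseline_problems(result: BaselineResult) -> list[str]:
    """Return a description of each violated baseline expectation."""
    problems = []
    if result.nop_reward != 0:
        problems.append("NOP passed: tests do not fail on the buggy baseline")
    if result.oracle_reward != 1:
        problems.append("Oracle failed: tests do not pass after the fix")
    return problems


print(baseline_problems(BaselineResult(nop_reward=0, oracle_reward=1)))
```

A task is only useful when the returned list is empty: a passing NOP run means the tests cannot detect the bug, and a failing Oracle run means the reference solution does not satisfy them.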

Analyze Task Quality

Run agent trials to verify a task is well-specified and solvable:

swegen analyze <task_id>
swegen analyze <task_id> -k 5 -a claude-code

What analyze does

  1. Static quality check (harbor tasks check)
  2. Baseline validation (nop fails, oracle passes)
  3. Run N agent trials
  4. Trial classification (identifies TASK vs AGENT problems)
  5. Task verdict synthesis with actionable recommendations

Classification categories:

  • GOOD_SUCCESS - Agent solved it correctly
  • BAD_SUCCESS - Agent cheated or tests too permissive
  • GOOD_FAILURE - Agent failed due to its own limitations
  • BAD_FAILURE - Agent failed due to task issues (underspecified, brittle tests, etc.)
  • HARNESS_ERROR - Infrastructure problem
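
The five categories separate the outcome (success or failure) from the party responsible for it. As an illustration only, the sketch below maps each category to the party at fault, following the descriptions above, and tallies a set of trial labels. The `BLAME` mapping and `blame_counts` are hypothetical and say nothing about how swegen synthesizes its own verdict.

```python
from collections import Counter
from enum import Enum


class TrialClass(Enum):
    GOOD_SUCCESS = "GOOD_SUCCESS"
    BAD_SUCCESS = "BAD_SUCCESS"
    GOOD_FAILURE = "GOOD_FAILURE"
    BAD_FAILURE = "BAD_FAILURE"
    HARNESS_ERROR = "HARNESS_ERROR"


# Party at fault for each outcome, read off the category descriptions above.
BLAME = {
    TrialClass.GOOD_SUCCESS: "none",
    TrialClass.BAD_SUCCESS: "task",
    TrialClass.GOOD_FAILURE: "agent",
    TrialClass.BAD_FAILURE: "task",
    TrialClass.HARNESS_ERROR: "harness",
}


def blame_counts(labels):
    """Count trials per responsible party from a list of category names."""
    return Counter(BLAME[TrialClass(label)] for label in labels)


counts = blame_counts(["GOOD_SUCCESS", "BAD_FAILURE", "BAD_SUCCESS"])
print(counts["task"], counts["agent"])
```

Any trial counted under "task" points at the task itself (instructions or tests) rather than at the agent.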

Options

  • -a, --agent TYPE - Agent to run trials (default: claude-code)
  • -m, --model MODEL - Model for agent trials (default: anthropic/claude-sonnet-4-5)
  • -k, --n-trials N - Number of trials (default: 3)
  • -n, --n-concurrent N - Number of concurrent trials (default: 3, 1=sequential)
  • --jobs-dir PATH - Directory to store job artifacts (default: .swegen/analyze-jobs)
  • --analysis-model MODEL - Model for Claude Code classification (default: claude-sonnet-4-5)
  • --env, -e TYPE - Environment type: docker, daytona, e2b, modal, runloop, gke (default: docker)
  • --skip-quality-check - Skip static quality check
  • --skip-baseline - Skip baseline validation (nop/oracle)
  • --skip-classify - Skip AI-powered classification
  • --save-to-dir - Write trajectory-analysis.{md,json} to each trial directory (for CI integration)
  • --classification-timeout N - Timeout per trial classification in seconds (default: 300)
  • --verdict-timeout N - Timeout for verdict synthesis in seconds (default: 180)
  • --timeout-multiplier N - Multiply default timeouts
  • -v, --verbose

Task Requirements

Valid PR criteria

Languages: Any (Python, JavaScript, TypeScript, Go, Rust, Ruby, Java, etc.)

Valid PRs must:

  • Be merged to the primary branch, with an accessible fork
  • Include test changes and a corresponding fix
  • Have a linked issue for high-quality instructions (bypass with --no-require-issue)
  • Modify 3-10 source files (configurable with --min-source-files and --max-source-files, bypass with --no-require-minimum-difficulty)
  • Pass LLM substantiality evaluation (bypass with --no-require-minimum-difficulty)
  • Fail tests on the reversed baseline and pass after the fix is applied
  • Not be documentation-only, formatting-only, or version-bump-only changes
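
The criteria that can be checked from PR metadata alone can be sketched as a small filter. This is an illustration under assumptions, not swegen's implementation: `PullRequest` and `rejection_reasons` are hypothetical, the defaults mirror the documented 3 and 10, and the sketch assumes the difficulty bypass lifts only the lower bound. The LLM substantiality check and the fail/pass test runs need execution and are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    """Metadata needed for the static checks (hypothetical container)."""
    merged: bool
    has_linked_issue: bool
    source_files_changed: int  # tests excluded, as in the option descriptions
    test_files_changed: int


def rejection_reasons(pr, *, min_source_files=3, max_source_files=10,
                      require_issue=True, require_minimum_difficulty=True):
    """Return the reasons a PR fails the static criteria, in a fixed order."""
    reasons = []
    if not pr.merged:
        reasons.append("not merged")
    if pr.test_files_changed == 0:
        reasons.append("no test changes")
    if require_issue and not pr.has_linked_issue:
        reasons.append("no linked issue")
    # Assumption: the difficulty bypass lifts only the lower bound.
    if require_minimum_difficulty and pr.source_files_changed < min_source_files:
        reasons.append(f"fewer than {min_source_files} source files")
    if pr.source_files_changed > max_source_files:
        reasons.append(f"more than {max_source_files} source files")
    return reasons


print(rejection_reasons(PullRequest(True, True, 4, 1)))
```

Returning every reason instead of stopping at the first makes skipped PRs easier to audit after a long farming run.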

How It Works

Pipeline details

The pipeline uses a language-agnostic approach:

  1. Fetch & Analyze - Get PR metadata via GitHub API, clone repo, identify test files
  2. Evaluate - LLM evaluates PR substantiality and generates task instructions
  3. Generate Skeleton - Create Dockerfile and test.sh with TODOs for Claude Code
  4. Claude Code Completion - CC analyzes repo, detects language/runtime/build system, fills in skeleton
  5. Validation - Run NOP (reward=0) and Oracle (reward=1) agents
  6. Iteration - CC iterates until both agents pass

Key Details:

  • Dockerfile clones at HEAD, then applies bug.patch to revert to buggy BASE state
  • Test files stored in task/tests/ and copied at runtime (prevents agent tampering)
  • fix.patch (solution) excludes tests/CI, contains all other PR changes
  • Dependencies installed at build time; runtime doesn't require internet access
  • Successful tasks are cached as references to speed up future tasks from the same repo
  • PR evaluation uses LLM to check substantiality and generate instructions
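
The split between fix.patch and the test files can be pictured as a partition of the PR's changed paths. The sketch below uses simple path heuristics that are assumptions for illustration only: the directory names, file-name patterns, and CI prefixes are hypothetical, and this README does not describe the rules swegen itself uses to identify test files.

```python
from pathlib import PurePosixPath

# Hypothetical heuristics; the real pipeline may identify tests differently.
TEST_DIR_NAMES = {"test", "tests", "__tests__", "spec"}
CI_PREFIXES = (".github/", ".circleci/")


def is_test_path(path: str) -> bool:
    """Guess whether a repo-relative path is a test file."""
    p = PurePosixPath(path)
    if any(part in TEST_DIR_NAMES for part in p.parts[:-1]):
        return True
    name = p.name
    return (name.startswith("test_") or ".test." in name
            or ".spec." in name or name.endswith("_test.go"))


def partition(paths):
    """Split changed paths into fix, tests, and ci buckets, preserving order."""
    buckets = {"fix": [], "tests": [], "ci": []}
    for path in paths:
        if path.startswith(CI_PREFIXES):
            buckets["ci"].append(path)
        elif is_test_path(path):
            buckets["tests"].append(path)
        else:
            buckets["fix"].append(path)
    return buckets


print(partition(["src/app.py", "tests/test_app.py", ".github/workflows/ci.yml"]))
```

Under this picture, only the "fix" bucket would end up in fix.patch, while the "tests" bucket corresponds to the files kept in task/tests/ and copied in at runtime.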

License

Apache License 2.0

