PaperFarm: Planting GPUs & APIs, Harvesting Papers & SOTAs
Point it at any repo: sow ideas, run experiments, and harvest better code autonomously.
Sow ideas. Run experiments. Harvest evidence.
Quick Start · How It Works · Agents · TUI Dashboard · CLI Reference · Examples
Key Features
- One `run` Command: `paperfarm run .` bootstraps a scout analysis, then enters the research loop: plan, review, experiment, repeat.
- Multi-Agent Support: Works with Claude Code, Codex CLI, Aider, and Gemini CLI. Pick your favorite.
- Skill-Based Loop: Scout → Manager → Critic → Experiment. Each phase is a markdown "skill" that an agent executes faithfully.
- Research TUI: Live dashboard with frontier status, metric charts, and structured log viewer. Keyboard controls for pause/resume/skip.
- Safety First: Every experiment is a git commit. Failed experiments auto-rollback via `rollback.sh`. Results are logged to `results.tsv` with FileLock concurrency safety.
- Headless Mode: `--headless` for CI, scripts, or remote servers; no TUI needed.
- Parallel Workers: Run experiments across multiple GPUs in isolated git worktrees, so workers can't interfere with each other.
Quick Start
pip install PaperFarm
cd your-project
paperfarm run .
This launches a research session:
- Scout: survey the field. Analyze your codebase, search related work, design evaluation metrics.
- Manager: plan the crop. Propose hypotheses, design experiments, maintain the frontier backlog.
- Critic: inspect the plan. Review experiment specs before execution, review evidence after.
- Experiment: plant, test, harvest. Implement one change, evaluate, record to `results.tsv`.
- Repeat: until all frontier items are done or `max_rounds` is reached.
Headless Mode
paperfarm run . --headless \
--goal "Reduce val_loss below 0.3" \
--agent-name codex
Parallel Workers
paperfarm run . --headless --workers 4 --agent-name codex
How It Works
PaperFarm creates a .research/ directory in your repo with everything needed for autonomous research.
.research/ Directory Structure
| File | Purpose |
|---|---|
| `config.yaml` | Research configuration (metrics, limits, agent settings) |
| `graph.json` | Hypothesis → experiment spec → frontier → evidence graph |
| `results.tsv` | Experiment results ledger (timestamp, frontier_id, status, metric, value) |
| `activity.json` | Live phase/worker status for TUI polling |
| `log.jsonl` | Append-only structured event log |
| `evaluation.md` | How to measure the primary metric (written by scout) |
| `project-understanding.md` | Project analysis (written by scout) |
| `research-strategy.md` | Research direction and focus areas (written by scout) |
| `literature.md` | Related work and prior art (written by scout) |
| `scripts/record.py` | Helper script agents call to append results (FileLock-safe) |
| `scripts/rollback.sh` | Helper script to revert failed experiments |
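As an illustration of how the results ledger could be consumed, here is a minimal sketch that picks the best kept experiment from `results.tsv`-style text. It assumes the file is tab-separated with a header row named after the columns listed above; the sample rows, their timestamps, and the `discard` status are made up for the example, and `best_kept_result` is a hypothetical helper, not part of PaperFarm.

```python
import csv
import io


def best_kept_result(tsv_text, direction="minimize"):
    """Return the kept row with the best value, or None if nothing was kept."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [row for row in reader if row["status"] == "keep"]
    if not kept:
        return None
    pick = min if direction == "minimize" else max
    return pick(kept, key=lambda row: float(row["value"]))


# Made-up sample rows; the real header and status vocabulary may differ.
SAMPLE = (
    "timestamp\tfrontier_id\tstatus\tmetric\tvalue\n"
    "2025-01-01T00:00:00\tfrontier-001\tkeep\tval_loss\t2.62\n"
    "2025-01-01T01:00:00\tfrontier-002\tdiscard\tval_loss\t2.90\n"
    "2025-01-01T02:00:00\tfrontier-006\tkeep\tval_loss\t1.92\n"
)

best = best_kept_result(SAMPLE)
print(best["frontier_id"], best["value"])
```

For a metric that should grow, such as test_accuracy, pass `direction="maximize"`.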
The Research Loop
Bootstrap
└─ Scout: analyze codebase, define strategy and evaluation
Research Loop (repeats until done)
├─ Manager: propose hypotheses, design experiments, maintain frontier
├─ Critic: preflight review, approve or reject experiment specs
├─ Experiment: claim frontier item, implement change, evaluate, record
└─ Critic: post-run review, assess evidence, update claims
Each phase is a markdown skill template (skills/*.md) loaded by SkillRunner, variable-substituted with [GOAL] and [TAG], then passed to the agent as a prompt. The agent reads/writes .research/ state files directly.
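The substitution step described above can be sketched as follows. This is an illustrative stand-in, not SkillRunner's actual code: `render_skill` and the sample template text are hypothetical, and only the `[GOAL]` and `[TAG]` placeholder names come from this README.

```python
import re


def render_skill(template, variables):
    """Replace [NAME] placeholders in one pass; unknown names are left as-is."""
    def lookup(match):
        name = match.group(1)
        return variables.get(name, match.group(0))

    return re.sub(r"\[([A-Z_]+)\]", lookup, template)


# Hypothetical template text standing in for the contents of a skills/*.md file.
template = "Goal: [GOAL] (session [TAG]) [OTHER]"
prompt = render_skill(template, {"GOAL": "Reduce val_loss below 0.3", "TAG": "demo"})
print(prompt)
```

Doing the replacement in a single pass means a goal string that happens to contain `[TAG]` is not substituted a second time.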
Skill Templates
| Skill | Role | What It Does |
|---|---|---|
| `scout.md` | Bootstrap | Analyze project, search related work, define strategy and evaluation |
| `manager.md` | Planning | Propose hypotheses, design experiment specs, populate frontier |
| `critic.md` | Review | Pre-approve experiments (preflight), post-review evidence (post-run) |
| `experiment.md` | Execution | Claim frontier item, implement, evaluate, record via record.py |
Skills reference these .research/ files directly. The experiment agent calls python .research/scripts/record.py --frontier-id F-1 --status keep --value 0.87 to record results, and bash .research/scripts/rollback.sh to revert failed changes.
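To show why a lock matters when several workers append to one ledger, here is a standard-library sketch of a lock-guarded append. It deliberately swaps in a simple exclusive lock file for FileLock and is not the code in `record.py`; `lock_file`, `append_result`, and the row contents are hypothetical.

```python
import contextlib
import os
import tempfile
import time


@contextlib.contextmanager
def lock_file(path, timeout=10.0, poll=0.05):
    """Hold an exclusive lock by creating `path`; wait while another holder has it."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(path)


def append_result(tsv_path, row):
    """Append one tab-separated row while holding the lock."""
    with lock_file(tsv_path + ".lock"):
        with open(tsv_path, "a", encoding="utf-8", newline="") as handle:
            handle.write("\t".join(str(cell) for cell in row) + "\n")


with tempfile.TemporaryDirectory() as workdir:
    ledger = os.path.join(workdir, "results.tsv")
    append_result(ledger, ["F-1", "keep", 0.87])
    append_result(ledger, ["F-2", "keep", 0.91])
    with open(ledger, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
    lock_left_behind = os.path.exists(ledger + ".lock")

print(lines)
```

A production lock also has to cope with stale lock files left by crashed processes, which is one reason to prefer a maintained library over a sketch like this.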
Field Safety
| Feature | Description |
|---|---|
| Isolated git commits | Every experiment is a separate commit, so nothing is lost |
| Auto-rollback | Failed experiments are reverted via rollback.sh |
| FileLock results | record.py uses FileLock for concurrent-safe writes to results.tsv |
| Max rounds | Stops after N rounds (config.yaml: limits.max_rounds) |
| Pause / Resume / Skip | TUI keyboard controls or activity.json control flags |
| Parallel isolation | Workers run in separate git worktrees, so they cannot interfere |
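The parallel-isolation row can be pictured with a small planning sketch: one git worktree and one GPU index per worker. Everything specific here is an assumption for illustration, including the `.worktrees/` layout, the branch names, the round-robin GPU assignment through `CUDA_VISIBLE_DEVICES`, and the `plan_workers` helper itself. The function only builds `git worktree add` command lines and does not run them.

```python
def plan_workers(repo_root, tag, n_workers, n_gpus):
    """Return one plan per worker: worktree path, branch, git argv, and env."""
    if n_workers < 1 or n_gpus < 1:
        raise ValueError("n_workers and n_gpus must be at least 1")
    plans = []
    for index in range(n_workers):
        path = f"{repo_root}/.worktrees/{tag}-w{index}"  # hypothetical layout
        branch = f"{tag}/worker-{index}"                 # hypothetical branch name
        plans.append({
            "path": path,
            "branch": branch,
            # `git worktree add -b <branch> <path>` creates a new branch
            # checked out in its own working directory.
            "argv": ["git", "-C", repo_root, "worktree", "add", "-b", branch, path],
            # Assumed round-robin GPU assignment.
            "env": {"CUDA_VISIBLE_DEVICES": str(index % n_gpus)},
        })
    return plans


plans = plan_workers("/repo", "demo", n_workers=4, n_gpus=2)
for plan in plans:
    print(" ".join(plan["argv"]), plan["env"])
```

Because each worker edits files only inside its own worktree, a failed experiment in one worker leaves the others' checkouts untouched.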
Supported Agents
| Agent | Flag | How It's Invoked |
|---|---|---|
| Claude Code | `--agent-name claude-code` | `claude -p <prompt> --verbose` |
| Codex CLI | `--agent-name codex` | `codex exec --full-auto <prompt>` |
| Aider | `--agent-name aider` | `aider --yes-always --no-git --message-file <file>` |
| Gemini CLI | `--agent-name gemini` | `gemini -p <prompt>` |
Default is claude-code. All agents receive the same skill prompt and work against the same .research/ state files.
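The invocation table can be read as a mapping from agent name to command line. The sketch below builds those argument lists without running anything; it is an illustration, not the adapter code in agent.py, and `build_agent_argv` and the `prompt.md` default are hypothetical.

```python
def build_agent_argv(agent_name, prompt, prompt_file="prompt.md"):
    """Return the argv list for one agent, following the table above."""
    builders = {
        "claude-code": lambda: ["claude", "-p", prompt, "--verbose"],
        "codex": lambda: ["codex", "exec", "--full-auto", prompt],
        # Aider takes the prompt from a file rather than from the command line.
        "aider": lambda: ["aider", "--yes-always", "--no-git",
                          "--message-file", prompt_file],
        "gemini": lambda: ["gemini", "-p", prompt],
    }
    if agent_name not in builders:
        raise ValueError(f"unknown agent: {agent_name!r}")
    return builders[agent_name]()


print(build_agent_argv("codex", "Reduce val_loss below 0.3"))
```

Passing the result to `subprocess.run` as a list avoids shell quoting problems with long prompts.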
Interactive TUI Dashboard
Launch with TUI (default, no --headless):
paperfarm run . --agent-name claude-code
┌─────────────────────────── PaperFarm ───────────────────────────┐
│ Phase: experiment | Round: 3 | Hyps: 5 | Exps: 4/7 | Best: 1.92 │
│ scout ‣ manager ‣ critic ‣ EXPERIMENT                           │
├──[Execution]──[Metrics]──[Logs]─────────────────────────────────┤
│                                                                 │
│ Frontier Panel                         │ Worker Panel           │
│ frontier-001  keep  2.62               │ (idle)                 │
│ frontier-002  keep  2.40               │                        │
│ frontier-003  keep  2.31               │                        │
│ frontier-006  keep  1.92               │                        │
│                                        │                        │
├─────────────────────────────────────────────────────────────────┤
│ p Pause  r Resume  s Skip  q Quit                    ^p palette │
└─────────────────────────────────────────────────────────────────┘
3 Tabs & Keyboard Shortcuts
3 tabs:
- Execution: Frontier items with status/priority, worker activity panel
- Metrics: Experiment results chart over time
- Logs: Structured event log from `log.jsonl`
Keyboard shortcuts: p pause, r resume, s skip current experiment, q quit.
The TUI polls .research/ state files every second, so you can attach to a running session anytime to monitor progress.
Installation
Python 3.10+ required. Supports Linux, macOS, and Windows.
pip install (recommended)
pip install PaperFarm
cd your-project
paperfarm run .
From source (for development)
git clone https://github.com/shatianming5/PaperFarm.git
cd PaperFarm
pip install -e ".[dev]"
pytest
CLI Reference
paperfarm run REPO [OPTIONS]    Launch or resume a research session
paperfarm status REPO           Show current research state
paperfarm results REPO          Display experiment results table
run Options
| Option | Default | Description |
|---|---|---|
| `--goal TEXT` | `""` | Research goal (injected into skill templates as [GOAL]) |
| `--tag TEXT` | auto | Session tag (injected as [TAG]) |
| `--workers N` | `0` | Parallel workers (0 = serial) |
| `--headless` | off | Run without TUI |
| `--agent-name TEXT` | `claude-code` | Which agent CLI to use |
Configuration
The scout agent fills .research/config.yaml during bootstrap. You can also edit it manually:
protocol: research-v1
metrics:
  primary:
    name: val_loss          # or test_accuracy, ops_per_sec, etc.
    direction: minimize     # minimize | maximize
limits:
  max_rounds: 20            # max research loop iterations
  timeout_minutes: 0        # 0 = no timeout
workers:
  max: 0                    # 0 = serial
  gpu_mem_per_worker_mb: 8192
agent:
  name: claude-code
  config: {}                # passed to agent adapter
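To make the direction and max_rounds settings concrete, here is a minimal sketch of how a loop might track the best metric value and stop at the round limit. It is an assumption about the semantics, not PaperFarm's implementation; `is_improvement` and `best_after_rounds` are hypothetical helpers, and the values are the val_loss figures from the dashboard mock-up.

```python
def is_improvement(candidate, best, direction):
    """True if `candidate` beats `best` under the configured direction."""
    if direction not in ("minimize", "maximize"):
        raise ValueError(f"direction must be minimize or maximize, got {direction!r}")
    if best is None:
        return True
    return candidate < best if direction == "minimize" else candidate > best


def best_after_rounds(values, direction, max_rounds):
    """Walk the per-round values, stopping once max_rounds have been consumed."""
    best = None
    for round_no, value in enumerate(values, start=1):
        if round_no > max_rounds:
            break
        if is_improvement(value, best, direction):
            best = value
    return best


val_losses = [2.62, 2.40, 2.31, 1.92]
print(best_after_rounds(val_losses, "minimize", max_rounds=20))
print(best_after_rounds(val_losses, "minimize", max_rounds=2))
```

With the limit at 20 every value is seen and the best is 1.92; with a limit of 2 the loop stops early and the best is 2.40.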
Project Structure
src/paperfarm/
├── cli.py              # Typer CLI (run / status / results)
├── agent.py            # Agent adapters (ClaudeCode, Codex, Aider, Gemini)
├── skill_runner.py     # Loads skills, substitutes [GOAL]/[TAG], drives the loop
├── state.py            # .research/ state file access layer
├── parallel.py         # WorkerPool for multi-GPU parallel experiments
├── skills/
│   ├── protocol.yaml   # Bootstrap + loop step order
│   ├── scout.md        # Scout skill template
│   ├── manager.md      # Manager skill template
│   ├── critic.md       # Critic skill template
│   ├── experiment.md   # Experiment skill template
│   └── scripts/
│       ├── record.py   # CLI tool for recording results (FileLock-safe)
│       └── rollback.sh # Revert failed experiments
└── tui/
    ├── app.py          # Textual TUI app (polling-based)
    ├── widgets.py      # StatsBar, PhaseStrip, FrontierPanel, etc.
    └── styles.css      # TUI styling
Examples
See examples/ for ready-to-run setups:
| Example | Task | Metric | Result |
|---|---|---|---|
| CartPole RL | Maximize DQN reward on CartPole-v1 | avg_reward | 266.7 |
| Code Perf | Optimize JSON parser throughput | ops/sec | 45K → 545K |
| nanoGPT | Reduce Shakespeare char-level val_loss | val_loss | 2.62 → 1.92 (-27%) |
| CIFAR-10 | Maximize CIFAR-10 test accuracy | test_accuracy | 67.7% (WIP) |
| YOLO Tiny | Maximize YOLOv8 mAP50 on COCO8 | mAP50 | 0.875 |
| HF GLUE | Optimize SST-2 fine-tuning | eval_accuracy | (needs GPU) |
| Whisper | Reduce Whisper word error rate | WER | (needs GPU) |
| Liger-Kernel | Optimize Triton GPU kernels | throughput | (needs GPU) |
Running an Example
cd examples/cartpole
paperfarm run . --agent-name codex --headless \
--goal "Maximize CartPole-v1 average reward to 500"
Contributing
Contributions are welcome! Please:
- Open an issue to discuss the proposed change
- Fork the repository and create your feature branch
- Submit a pull request with a clear description
License
This project is licensed under the MIT License.