🧑‍🌾 PaperFarm: Planting GPUs & APIs 🌱, Harvesting Papers & SOTAs 🌾

🔬 Point it at any repo: sow ideas, run experiments, and harvest better code autonomously

🌱 Sow ideas. 🚜 Run experiments. 🌾 Harvest evidence. 📄

Quick Start · How It Works · Agents · TUI Dashboard · CLI Reference · Examples


🌾 Key Features

  • 🚀 One run Command: running paperfarm run . bootstraps a scout analysis, then enters the research loop: plan, review, experiment, repeat.

  • 🤖 Multi-Agent Support: Works with Claude Code, Codex CLI, Aider, and Gemini CLI. Pick your favorite.

  • 🔬 Skill-Based Loop: Scout → Manager → Critic → Experiment. Each phase is a markdown "skill" that an agent executes faithfully.

  • 🖥️ Research TUI: Live dashboard with frontier status, metric charts, and structured log viewer. Keyboard controls for pause/resume/skip.

  • 🛡️ Safety First: Every experiment is a git commit. Failed experiments auto-rollback via rollback.sh. Results logged to results.tsv with FileLock concurrency safety.

  • 📡 Headless Mode: --headless for CI, scripts, or remote servers; no TUI needed.

  • ⚡ Parallel Workers: Run experiments across multiple GPUs in isolated git worktrees, so workers can't interfere with each other.


🌱 Quick Start

pip install PaperFarm

cd your-project
paperfarm run .

This launches a research session:

  1. 🌱 Scout (survey the field): analyze your codebase, search related work, design evaluation metrics
  2. 🚜 Manager (plan the crop): propose hypotheses, design experiments, maintain the frontier backlog
  3. 🔍 Critic (inspect the plan): review experiment specs before execution, review evidence after
  4. 🌾 Experiment (plant, test, harvest): implement one change, evaluate, record to results.tsv
  5. 🔄 Repeat until all frontier items are done or max_rounds is reached
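
The steps above can be sketched as a small driver loop. This is only an illustration of the control flow: research_session, run_phase, and frontier_done are hypothetical names, not PaperFarm's API. The only things taken from the text are the phase order (scout once, then manager, critic, experiment, critic) and the max_rounds stopping rule.

```python
from typing import Callable, List


def research_session(
    run_phase: Callable[[str], None],
    frontier_done: Callable[[], bool],
    max_rounds: int,
) -> int:
    """Run scout once, then repeat the loop phases; return completed rounds.

    All names are hypothetical; only the phase order comes from the README.
    """
    run_phase("scout")  # bootstrap happens once per session
    rounds = 0
    while rounds < max_rounds and not frontier_done():
        # The critic runs twice: preflight review, then post-run review.
        for phase in ("manager", "critic", "experiment", "critic"):
            run_phase(phase)
        rounds += 1
    return rounds


# Usage with stubs: record the phase names instead of calling a real agent.
calls: List[str] = []
completed = research_session(calls.append, lambda: False, max_rounds=2)
print(completed, calls)
```

In this sketch the frontier never empties, so the loop stops only because max_rounds is reached.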

Headless Mode

paperfarm run . --headless \
  --goal "Reduce val_loss below 0.3" \
  --agent-name codex

Parallel Workers

paperfarm run . --headless --workers 4 --agent-name codex

🚜 How It Works

PaperFarm creates a .research/ directory in your repo with everything needed for autonomous research.

📂 .research/ Directory Structure

| File | Purpose |
| --- | --- |
| config.yaml | Research configuration (metrics, limits, agent settings) |
| graph.json | Hypothesis → experiment spec → frontier → evidence graph |
| results.tsv | Experiment results ledger (timestamp, frontier_id, status, metric, value) |
| activity.json | Live phase/worker status for TUI polling |
| log.jsonl | Append-only structured event log |
| evaluation.md | How to measure the primary metric (written by scout) |
| project-understanding.md | Project analysis (written by scout) |
| research-strategy.md | Research direction and focus areas (written by scout) |
| literature.md | Related work and prior art (written by scout) |
| scripts/record.py | Helper script agents call to append results (FileLock-safe) |
| scripts/rollback.sh | Helper script to revert failed experiments |

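
As an illustration of how the results ledger could be consumed, here is a minimal sketch that picks the best kept result. It assumes results.tsv is tab-separated with a header row naming the five columns listed above, and that value parses as a float; neither detail is confirmed by the README. The function name best_kept_result, the sample rows, and the "discard" status are made up for the example.

```python
import csv
import io
from typing import Optional


def best_kept_result(tsv_text: str, direction: str = "minimize") -> Optional[dict]:
    """Return the best row whose status is 'keep', or None if there is none."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [row for row in rows if row["status"] == "keep"]
    if not kept:
        return None
    pick = min if direction == "minimize" else max
    return pick(kept, key=lambda row: float(row["value"]))


# Made-up sample data in the assumed layout (header row, tab-separated fields).
sample = (
    "timestamp\tfrontier_id\tstatus\tmetric\tvalue\n"
    "2025-01-01T10:00:00\tF-1\tkeep\tval_loss\t2.62\n"
    "2025-01-01T11:00:00\tF-2\tdiscard\tval_loss\t2.90\n"
    "2025-01-01T12:00:00\tF-3\tkeep\tval_loss\t1.92\n"
)
best = best_kept_result(sample)
print(best["frontier_id"], best["value"])
```
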
🔄 The Research Loop

Bootstrap
  └─ Scout:      analyze codebase, define strategy and evaluation

Research Loop (repeats until done)
  ├─ Manager:    propose hypotheses, design experiments, maintain frontier
  ├─ Critic:     preflight review: approve or reject experiment specs
  ├─ Experiment: claim frontier item, implement change, evaluate, record
  └─ Critic:     post-run review: assess evidence, update claims

Each phase is a markdown skill template (skills/*.md) loaded by SkillRunner, variable-substituted with [GOAL] and [TAG], then passed to the agent as a prompt. The agent reads/writes .research/ state files directly.
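
The substitution step can be pictured with a few lines of Python. This is a sketch under assumptions, not SkillRunner's actual code: render_skill and the template text are hypothetical, and the README only states that [GOAL] and [TAG] are replaced before the prompt is handed to the agent.

```python
def render_skill(template: str, goal: str, tag: str) -> str:
    """Replace the [GOAL] and [TAG] placeholders in a skill template."""
    return template.replace("[GOAL]", goal).replace("[TAG]", tag)


# Hypothetical template text; the real skills/*.md files will differ.
template = "Session [TAG]: work toward this goal: [GOAL]\n"
prompt = render_skill(template, goal="Reduce val_loss below 0.3", tag="demo")
print(prompt)
```

Plain str.replace is order-sensitive: in this sketch a goal that itself contains the literal text [TAG] would also be substituted.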

🧰 Skill Templates

| Skill | Role | What It Does |
| --- | --- | --- |
| scout.md | Bootstrap | Analyze project, search related work, define strategy and evaluation |
| manager.md | Planning | Propose hypotheses, design experiment specs, populate frontier |
| critic.md | Review | Pre-approve experiments (preflight), post-review evidence (post-run) |
| experiment.md | Execution | Claim frontier item, implement, evaluate, record via record.py |

Skills reference these .research/ files directly. The experiment agent calls python .research/scripts/record.py --frontier-id F-1 --status keep --value 0.87 to record results, and bash .research/scripts/rollback.sh to revert failed changes.
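
To show why a lock matters when several workers append to one ledger, here is a rough sketch of a locked append. It is not record.py: the project names FileLock, while this example swaps in POSIX fcntl.flock from the standard library (so it will not run on Windows). The function name append_result, the --metric style argument, and the row layout are assumptions based on the column list given for results.tsv.

```python
import fcntl
import os
import tempfile
from datetime import datetime, timezone


def append_result(path: str, frontier_id: str, status: str, metric: str, value: float) -> None:
    """Append one tab-separated row while holding an exclusive lock."""
    row = "\t".join([
        datetime.now(timezone.utc).isoformat(),
        frontier_id, status, metric, str(value),
    ])
    with open(path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other writers release
        try:
            fh.write(row + "\n")
            fh.flush()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


# Usage against a throwaway file, mirroring the record.py call shown above.
demo_path = os.path.join(tempfile.mkdtemp(), "results.tsv")
append_result(demo_path, "F-1", "keep", "val_loss", 0.87)
with open(demo_path, encoding="utf-8") as fh:
    fields = fh.read().rstrip("\n").split("\t")
print(fields[1:])
```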


🛡️ Field Safety

| Feature | Description |
| --- | --- |
| Isolated git commits | Every experiment is a separate commit, so nothing is lost |
| Auto-rollback | Failed experiments are reverted via rollback.sh |
| FileLock results | record.py uses FileLock for concurrent-safe writes to results.tsv |
| Max rounds | Stops after N rounds (config.yaml: limits.max_rounds) |
| Pause / Resume / Skip | TUI keyboard controls or activity.json control flags |
| Parallel isolation | Workers run in separate git worktrees, so they cannot interfere |
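
For readers unfamiliar with git worktrees, the sketch below builds one git worktree add command per worker without executing it. The .worktrees/ location, the branch naming scheme, and the function name worktree_commands are all hypothetical; the README does not say where PaperFarm places its worktrees or how it names branches.

```python
from pathlib import Path
from typing import List


def worktree_commands(repo: Path, workers: int, tag: str) -> List[List[str]]:
    """Build one `git worktree add` command per worker (not executed here)."""
    commands = []
    for i in range(workers):
        path = repo / ".worktrees" / f"worker-{i}"  # hypothetical location
        branch = f"{tag}/worker-{i}"                # hypothetical branch name
        commands.append(
            ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]
        )
    return commands


cmds = worktree_commands(Path("/tmp/repo"), workers=2, tag="demo")
for cmd in cmds:
    print(" ".join(cmd))
```

Each worktree is a separate checkout on its own branch, which is what keeps one worker's file edits out of another's working directory.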

🤖 Supported Agents

| Agent | Flag | How It's Invoked |
| --- | --- | --- |
| Claude Code | `--agent-name claude-code` | `claude -p <prompt> --verbose` |
| Codex CLI | `--agent-name codex` | `codex exec --full-auto <prompt>` |
| Aider | `--agent-name aider` | `aider --yes-always --no-git --message-file <file>` |
| Gemini CLI | `--agent-name gemini` | `gemini -p <prompt>` |

Default is claude-code. All agents receive the same skill prompt and work against the same .research/ state files.
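
One way to picture the adapter layer is a lookup from agent name to an argument list. The invocations below are copied from the table above; everything else (AGENT_COMMANDS, build_command, the "{prompt}" and "{file}" placeholders, the default prompt.md filename) is invented for this sketch and is not taken from agent.py.

```python
from typing import Dict, List

# Argument templates copied from the README table; "{prompt}" and "{file}"
# are placeholders invented for this sketch.
AGENT_COMMANDS: Dict[str, List[str]] = {
    "claude-code": ["claude", "-p", "{prompt}", "--verbose"],
    "codex": ["codex", "exec", "--full-auto", "{prompt}"],
    "aider": ["aider", "--yes-always", "--no-git", "--message-file", "{file}"],
    "gemini": ["gemini", "-p", "{prompt}"],
}


def build_command(agent_name: str, prompt: str, prompt_file: str = "prompt.md") -> List[str]:
    """Return the argv list for an agent, raising ValueError on unknown names."""
    if agent_name not in AGENT_COMMANDS:
        raise ValueError(f"unknown agent: {agent_name!r}")
    substitutions = {"{prompt}": prompt, "{file}": prompt_file}
    return [substitutions.get(arg, arg) for arg in AGENT_COMMANDS[agent_name]]


argv = build_command("codex", "Reduce val_loss below 0.3")
print(argv)
```

Building an argv list rather than a shell string means the prompt never needs shell quoting when passed to something like subprocess.run.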


📊 Interactive TUI Dashboard

Launch with TUI (default, no --headless):

paperfarm run . --agent-name claude-code

┌─────────────────────────── PaperFarm ───────────────────────────┐
│ Phase: experiment | Round: 3 | Hyps: 5 | Exps: 4/7 | Best: 1.92 │
│ scout  ‣  manager  ‣  critic  ‣  EXPERIMENT                     │
├──[Execution]──[Metrics]──[Logs]─────────────────────────────────┤
│                                                                 │
│  Frontier Panel                │  Worker Panel                  │
│  frontier-001  keep   2.62     │  (idle)                        │
│  frontier-002  keep   2.40     │                                │
│  frontier-003  keep   2.31     │                                │
│  frontier-006  keep   1.92     │                                │
│                                │                                │
├─────────────────────────────────────────────────────────────────┤
│ p Pause   r Resume   s Skip   q Quit                 ^p palette │
└─────────────────────────────────────────────────────────────────┘

📑 3 Tabs & Keyboard Shortcuts

3 tabs:

  • Execution: Frontier items with status/priority, worker activity panel
  • Metrics: Experiment results chart over time
  • Logs: Structured event log from log.jsonl

Keyboard shortcuts: p pause, r resume, s skip current experiment, q quit.

The TUI polls the .research/ state files every second, so you can attach to a running session at any time to monitor progress.
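
A polling reader of this kind could look roughly like the sketch below. It is not the TUI's code: poll_json is a hypothetical helper, the change detection via file modification time is an assumption, and the "phase" field in the demo file is made up because the README does not document the activity.json schema.

```python
import json
import os
import tempfile
import time
from typing import Iterator, Optional


def poll_json(path: str, interval: float = 1.0, max_polls: Optional[int] = None) -> Iterator[dict]:
    """Yield the parsed file each time its modification time changes."""
    last_mtime = None
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            mtime = os.stat(path).st_mtime_ns
            if mtime != last_mtime:
                with open(path, encoding="utf-8") as fh:
                    data = json.load(fh)
                last_mtime = mtime
                yield data
        except (FileNotFoundError, json.JSONDecodeError):
            pass  # file missing or caught mid-write; retry on the next tick
        time.sleep(interval)


# Usage against a throwaway file with a made-up field.
state_path = os.path.join(tempfile.mkdtemp(), "activity.json")
with open(state_path, "w", encoding="utf-8") as fh:
    json.dump({"phase": "experiment"}, fh)
snapshots = list(poll_json(state_path, interval=0.01, max_polls=3))
print(snapshots)
```

Because the reader only looks at files on disk, it needs no connection to the process that writes them, which is what makes attaching to an already running session possible.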


🚜 Installation

Python 3.10+ required. Supports Linux, macOS, and Windows.

pip install (recommended)

pip install PaperFarm

cd your-project
paperfarm run .

From source (for development)

git clone https://github.com/shatianming5/PaperFarm.git
cd PaperFarm
pip install -e ".[dev]"
pytest

🖥️ CLI Reference

paperfarm run REPO [OPTIONS]    Launch or resume a research session
paperfarm status REPO           Show current research state
paperfarm results REPO          Display experiment results table

run Options

| Option | Default | Description |
| --- | --- | --- |
| --goal TEXT | "" | Research goal (injected into skill templates as [GOAL]) |
| --tag TEXT | auto | Session tag (injected as [TAG]) |
| --workers N | 0 | Parallel workers (0 = serial) |
| --headless | off | Run without TUI |
| --agent-name TEXT | claude-code | Which agent CLI to use |

⚙️ Configuration

The scout agent fills .research/config.yaml during bootstrap. You can also edit it manually:

protocol: research-v1

metrics:
  primary:
    name: val_loss           # or test_accuracy, ops_per_sec, etc.
    direction: minimize      # minimize | maximize

limits:
  max_rounds: 20             # max research loop iterations
  timeout_minutes: 0         # 0 = no timeout

workers:
  max: 0                     # 0 = serial
  gpu_mem_per_worker_mb: 8192

agent:
  name: claude-code
  config: {}                 # passed to agent adapter
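
To make the semantics of direction and the zero-means-unlimited timeout concrete, here is a small sketch that interprets an already-parsed config. It works on a plain dict (what a YAML loader such as PyYAML would typically return); ResearchLimits, limits_from_config, and the fallback defaults are hypothetical and only mirror the sample values above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchLimits:
    primary_metric: str
    direction: str
    max_rounds: int
    timeout_minutes: int

    def improves(self, new: float, best: float) -> bool:
        """True if `new` beats `best` under the configured direction."""
        return new < best if self.direction == "minimize" else new > best

    def timed_out(self, elapsed_minutes: float) -> bool:
        """timeout_minutes == 0 means no timeout, as in the config comment."""
        return self.timeout_minutes > 0 and elapsed_minutes >= self.timeout_minutes


def limits_from_config(cfg: dict) -> ResearchLimits:
    primary = cfg["metrics"]["primary"]
    if primary["direction"] not in ("minimize", "maximize"):
        raise ValueError(f"bad direction: {primary['direction']!r}")
    limits = cfg.get("limits", {})
    return ResearchLimits(
        primary_metric=primary["name"],
        direction=primary["direction"],
        max_rounds=int(limits.get("max_rounds", 20)),        # assumed fallback
        timeout_minutes=int(limits.get("timeout_minutes", 0)),  # assumed fallback
    )


cfg = {
    "metrics": {"primary": {"name": "val_loss", "direction": "minimize"}},
    "limits": {"max_rounds": 20, "timeout_minutes": 0},
}
limits = limits_from_config(cfg)
print(limits.improves(1.92, 2.62), limits.timed_out(500))
```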

🏡 Project Structure

src/paperfarm/
├── cli.py              # Typer CLI (run / status / results)
├── agent.py            # Agent adapters (ClaudeCode, Codex, Aider, Gemini)
├── skill_runner.py     # Loads skills, substitutes [GOAL]/[TAG], drives the loop
├── state.py            # .research/ state file access layer
├── parallel.py         # WorkerPool for multi-GPU parallel experiments
├── skills/
│   ├── protocol.yaml   # Bootstrap + loop step order
│   ├── scout.md        # 🌱 Scout skill template
│   ├── manager.md      # 🚜 Manager skill template
│   ├── critic.md       # 🔍 Critic skill template
│   ├── experiment.md   # 🌾 Experiment skill template
│   └── scripts/
│       ├── record.py   # CLI tool for recording results (FileLock-safe)
│       └── rollback.sh # Revert failed experiments
└── tui/
    ├── app.py          # Textual TUI app (polling-based)
    ├── widgets.py      # StatsBar, PhaseStrip, FrontierPanel, etc.
    └── styles.css      # TUI styling

🌽 Examples

See examples/ for ready-to-run setups:

| Example | Task | Metric | Result |
| --- | --- | --- | --- |
| 🎮 CartPole RL | Maximize DQN reward on CartPole-v1 | avg_reward | 266.7 |
| ⚡ Code Perf | Optimize JSON parser throughput | ops/sec | 45K → 545K |
| 🧠 nanoGPT | Reduce Shakespeare char-level val_loss | val_loss | 2.62 → 1.92 (-27%) |
| 🖼️ CIFAR-10 | Maximize CIFAR-10 test accuracy | test_accuracy | 67.7% (WIP) |
| 📦 YOLO Tiny | Maximize YOLOv8 mAP50 on COCO8 | mAP50 | 0.875 |
| 📝 HF GLUE | Optimize SST-2 fine-tuning | eval_accuracy | (needs GPU) |
| 🎙️ Whisper | Reduce Whisper word error rate | WER | (needs GPU) |
| 🔥 Liger-Kernel | Optimize Triton GPU kernels | throughput | (needs GPU) |

Running an Example

cd examples/cartpole
paperfarm run . --agent-name codex --headless \
  --goal "Maximize CartPole-v1 average reward to 500"

🧑‍🌾 Contributing

Contributions are welcome! Please:

  1. Open an issue to discuss the proposed change
  2. Fork the repository and create your feature branch
  3. Submit a pull request with a clear description

📄 License

This project is licensed under the MIT License.

