Automated research sprint platform for HPC clusters



ResearchLoop

Automated AI research sprints on HPC clusters.

ResearchLoop automates multi-step AI research pipelines on SLURM and SGE clusters. You describe a research idea, and ResearchLoop submits it to your HPC cluster, where Claude Code executes a full research pipeline -- coding, red-teaming, fixing, reporting -- inside a single job. The job reports results back to the orchestrator via webhooks, you are notified via Slack or ntfy.sh push notifications, and you can monitor everything from a web dashboard or the CLI.

The platform is built for researchers who run experiments on shared HPC infrastructure and want to iterate faster without babysitting jobs. Define your studies, point ResearchLoop at your cluster, and let it handle the rest: job submission, progress tracking, artifact collection, and even automatic generation of follow-up research ideas.

ResearchLoop's auto-loop feature chains sprints together automatically. After each sprint completes, Claude analyzes the results and proposes the next experiment. You set how many iterations to run, and each follow-up idea builds on the previous sprint's results -- turning a single research question into a sustained investigation.

How it works

ResearchLoop has two components:

Orchestrator (researchloop serve) -- a lightweight server that manages studies and sprints in SQLite, submits jobs to HPC clusters via SSH, receives completion webhooks, stores artifacts, and serves the web dashboard.
Sprint Runner -- runs inside each SLURM/SGE job on the HPC cluster. Chains claude -p calls through the research pipeline (research, red-team, fix, report, summarize), then sends artifacts and results back to the orchestrator.

You (CLI / Dashboard / Slack)
        |
        v
Orchestrator (Docker / Fly.io)           HPC Cluster
+--------------------------+             +----------------------------+
| FastAPI API + Dashboard  |---SSH------>| SLURM / SGE scheduler      |
| SQLite metadata          |             |                            |
| Artifact storage         |<--webhook---| Sprint Runner              |
| Slack bot                |<--upload----| 1. claude -p "research"    |
| ntfy.sh notifications    |             | 2. claude -p "red-team"    |
+--------------------------+             | 3. claude -p "fix"         |
                                         | 4. claude -p "report"      |
                                         | 5. claude -p "summarize"   |
                                         +----------------------------+

Core concepts

Concept	Description
Study	A sustained research effort (e.g., "synthetic SAE improvements"). Tied to a cluster and has its own context and configuration.
Sprint	A single research attempt within a study. Gets a short ID (`sp-a3f7b2`), its own directory, and runs the full pipeline.
Auto-loop	Automatic sequential sprint execution. After each sprint, Claude analyzes results and generates the next research idea.
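The auto-loop concept can be pictured as a simple loop over sprints. The sketch below is illustrative only and is not ResearchLoop's `AutoLoopController`: `run_sprint` and `propose_next_idea` are hypothetical callables standing in for "submit a sprint and wait for it to finish" and for Claude's follow-up-idea generation. A real controller would presumably react to completion webhooks rather than block on each sprint.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopState:
    """Hypothetical record of one auto-loop's progress."""
    count: int                                        # iterations requested by the user
    ideas: list[str] = field(default_factory=list)    # idea used for each sprint
    summaries: list[str] = field(default_factory=list)  # summary produced by each sprint


def run_auto_loop(
    first_idea: str,
    count: int,
    run_sprint: Callable[[str], str],                 # hypothetical: idea -> sprint summary
    propose_next_idea: Callable[[list[str]], str],    # hypothetical: summaries -> next idea
) -> LoopState:
    state = LoopState(count=count)
    idea = first_idea
    for i in range(count):
        state.ideas.append(idea)
        state.summaries.append(run_sprint(idea))
        # Only ask for a follow-up idea if another iteration remains.
        if i + 1 < count:
            idea = propose_next_idea(state.summaries)
    return state


state = run_auto_loop(
    "baseline SAE on dataset Y",
    count=3,
    run_sprint=lambda idea: f"done: {idea}",
    propose_next_idea=lambda summaries: f"follow-up #{len(summaries)}",
)
print(state.ideas)  # ['baseline SAE on dataset Y', 'follow-up #1', 'follow-up #2']
```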

Sprint pipeline

Each sprint runs these steps inside a single SLURM/SGE job:

Research -- execute the research idea (coding, experiments, analysis)
Red-team -- critique the work and find flaws (up to `red_team_max_rounds` rounds, alternating with fix steps)
Fix -- address issues found by the red-team
Report -- generate a comprehensive markdown report
Summarize -- write a short summary for notifications and the dashboard

All steps share a single Claude session (via --resume), so Claude maintains full context of the sprint's work across steps.
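The step chaining above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in `runner/pipeline.py`: it assumes `claude -p ... --output-format json` prints a JSON object containing `session_id` and `result`, that `--resume <session_id>` continues that session, and it uses a hypothetical `NO ISSUES` sentinel to end red-team rounds early. The prompts are placeholders for the real Jinja2 templates, and the `run` parameter is injectable so the logic can be exercised without the Claude CLI.

```python
import json
import subprocess
from typing import Callable, Optional

Runner = Callable[[list[str]], str]


def subprocess_runner(argv: list[str]) -> str:
    """Run a command and return its stdout (raises CalledProcessError on failure)."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def claude_step(prompt: str, session_id: Optional[str], run: Runner) -> tuple[str, str]:
    """Run one `claude -p` step, resuming the shared session once it exists."""
    argv = ["claude", "-p", prompt, "--output-format", "json"]
    if session_id is not None:
        argv += ["--resume", session_id]
    reply = json.loads(run(argv))  # assumed to contain "session_id" and "result"
    return reply["session_id"], reply["result"]


def run_pipeline(idea: str, red_team_max_rounds: int,
                 run: Runner = subprocess_runner) -> dict[str, str]:
    outputs: dict[str, str] = {}
    session, outputs["research"] = claude_step(f"Research: {idea}", None, run)
    for round_no in range(1, red_team_max_rounds + 1):
        session, critique = claude_step("Red-team the work so far.", session, run)
        outputs[f"red-team-{round_no}"] = critique
        if "NO ISSUES" in critique:  # hypothetical early-exit sentinel
            break
        session, outputs[f"fix-{round_no}"] = claude_step("Fix the issues found.", session, run)
    session, outputs["report"] = claude_step("Write the markdown report.", session, run)
    _, outputs["summary"] = claude_step("Summarize in a few sentences.", session, run)
    return outputs
```

Passing the same session ID to every step after the first is what keeps the full sprint context available to the report and summary steps.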

Features

HPC cluster integration -- submit, monitor, and cancel jobs on SLURM and SGE clusters via SSH
Multi-step research pipeline -- research, red-team, fix, report, summarize with configurable rounds
Auto-loop -- chain sprints automatically with AI-generated follow-up ideas
Web dashboard -- monitor studies, sprints, and loops from a browser with live status refresh
Slack bot -- start sprints, check status, and have research conversations via Slack DMs or channels
CLI -- full remote management from the command line with token-based auth
Progress tracking -- live progress.md and output.log streaming from cluster to dashboard
Notifications -- push notifications via ntfy.sh and Slack with PDF report attachments
Security -- per-sprint webhook tokens, CSRF protection, signed session cookies, bcrypt password hashing
Context hierarchy -- global, cluster, and study-level context files and inline configuration

Quick start

Prerequisites

Python 3.10+
uv (recommended) or pip
SSH access to an HPC cluster with SLURM or SGE
Claude Code CLI installed and authenticated on the HPC cluster

Install

pip install git+https://github.com/chanind/researchloop.git

Or for development:

git clone https://github.com/chanind/researchloop.git
cd researchloop
uv sync

Initialize a project

researchloop init
# Creates researchloop.toml and artifacts/ directory

Configure

Edit researchloop.toml:

shared_secret = "change-me"
orchestrator_url = "https://your-server.fly.dev"

[[cluster]]
name = "hpc"
host = "login.cluster.example.com"
user = "researcher"
key_path = "~/.ssh/id_ed25519"
scheduler_type = "slurm"                       # "slurm", "sge", or "local"
working_dir = "/scratch/researcher/researchloop"

[cluster.job_options]
gres = "gpu:1"
mem = "64G"
cpus-per-task = "8"

[[study]]
name = "my-research"
cluster = "hpc"
description = "Investigating feature X"
max_sprint_duration_hours = 8
red_team_max_rounds = 3

Start the server and run a sprint

# Start the orchestrator
researchloop serve

# In another terminal, connect the CLI to the server
researchloop connect http://localhost:8080

# Submit a sprint
researchloop sprint run "try approach X on dataset Y" --study my-research

# Check status
researchloop sprint list
researchloop sprint show sp-a3f7b2

Configuration reference

Complete `researchloop.toml` example

# -- Top-level settings --
db_path = "researchloop.db"              # SQLite database location
artifact_dir = "artifacts"               # Local directory for uploaded artifacts
shared_secret = "your-secret"            # Auth between runner and orchestrator
orchestrator_url = "https://example.com" # Public URL for webhooks
claude_command = ""                      # Override claude command globally

# Global context (included in all sprints)
context = "Always use Python 3.10+ features."
context_paths = ["./global-context.md"]  # Files to include as context

# -- Cluster configuration --
[[cluster]]
name = "hpc"
host = "login.cluster.example.com"
port = 22
user = "researcher"
key_path = "~/.ssh/id_ed25519"
scheduler_type = "slurm"                 # "slurm", "sge", or "local"
working_dir = "/scratch/user/researchloop"
max_concurrent_jobs = 4
claude_command = "claude --dangerously-skip-permissions"

# Context specific to this cluster
context = "GPUs are NVIDIA L40. Check CUDA_VISIBLE_DEVICES."
context_paths = ["./cluster-notes.md"]

# Environment variables set in SLURM jobs
[cluster.environment]
# ANTHROPIC_API_KEY = "sk-ant-..."       # Only if not using claude login

# SLURM job options (passed as #SBATCH directives)
[cluster.job_options]
gres = "gpu:l40:1"
cpus-per-task = "8"
mem = "64G"

# -- Study configuration --
[[study]]
name = "my-study"
cluster = "hpc"                          # Must match a cluster name
description = "Research into X"
claude_md_path = "./studies/my-study/CLAUDE.md"  # Study-specific context file
sprints_dir = "/scratch/user/my-study"   # Where sprints go (default: working_dir/<study>)
max_sprint_duration_hours = 8            # Scheduler time limit per sprint job
red_team_max_rounds = 3                  # Red-team/fix cycles
allow_loop = true                        # Allow auto-loops for this study
claude_command = ""                      # Override claude command for this study

# Inline study context (included in research prompts)
context = """
Focus on improving F1 score. Use batch size 1024.
"""

# Per-study SLURM overrides
[study.job_options]
gres = "gpu:a100:2"

# -- Notifications --
[ntfy]
url = "https://ntfy.sh"                  # ntfy server URL (public or self-hosted)
topic = "researchloop"                   # ntfy topic name

# -- Slack integration --
[slack]
bot_token = ""                           # xoxb-... (prefer env var)
signing_secret = ""                      # Slack signing secret (prefer env var)
channel_id = "C0123456789"               # Channel or user ID for notifications
allowed_user_ids = ["U0123456789"]       # Users allowed to interact with bot
restrict_to_channel = false              # If true, only respond in channel_id

# -- Dashboard --
[dashboard]
enabled = true
host = "0.0.0.0"
port = 8080
password_hash = ""                       # bcrypt hash (prefer env var or first-run setup)

Environment variable overrides

All secrets and sensitive settings can be set via environment variables with the RESEARCHLOOP_ prefix. Environment variables take precedence over TOML values.

Environment variable	Overrides
`RESEARCHLOOP_SHARED_SECRET`	`shared_secret`
`RESEARCHLOOP_ORCHESTRATOR_URL`	`orchestrator_url`
`RESEARCHLOOP_DB_PATH`	`db_path`
`RESEARCHLOOP_ARTIFACT_DIR`	`artifact_dir`
`RESEARCHLOOP_SLACK_BOT_TOKEN`	`slack.bot_token`
`RESEARCHLOOP_SLACK_SIGNING_SECRET`	`slack.signing_secret`
`RESEARCHLOOP_SLACK_CHANNEL_ID`	`slack.channel_id`
`RESEARCHLOOP_SLACK_ALLOWED_USER_IDS`	`slack.allowed_user_ids` (comma-separated)
`RESEARCHLOOP_NTFY_TOPIC`	`ntfy.topic`
`RESEARCHLOOP_NTFY_URL`	`ntfy.url`
`RESEARCHLOOP_DASHBOARD_PASSWORD`	`dashboard.password_hash` (plain-text password, bcrypt-hashed on startup)
`RESEARCHLOOP_DASHBOARD_PASSWORD_HASH`	`dashboard.password_hash`
`RESEARCHLOOP_DASHBOARD_PORT`	`dashboard.port`
`RESEARCHLOOP_DASHBOARD_HOST`	`dashboard.host`
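The precedence rule above can be sketched as a function that copies matching `RESEARCHLOOP_*` variables over the parsed TOML values. This is a hypothetical illustration of the documented behavior, not ResearchLoop's code: it covers only a subset of the table, and converting the port to an integer is an assumption. The environment is passed in as a parameter so the function can be tested without touching `os.environ`.

```python
import copy
import os
from typing import Any, Mapping

# (environment variable, path into the parsed TOML) -- a subset of the table above.
OVERRIDES: list[tuple[str, tuple[str, ...]]] = [
    ("RESEARCHLOOP_SHARED_SECRET", ("shared_secret",)),
    ("RESEARCHLOOP_ORCHESTRATOR_URL", ("orchestrator_url",)),
    ("RESEARCHLOOP_DB_PATH", ("db_path",)),
    ("RESEARCHLOOP_SLACK_BOT_TOKEN", ("slack", "bot_token")),
    ("RESEARCHLOOP_SLACK_ALLOWED_USER_IDS", ("slack", "allowed_user_ids")),
    ("RESEARCHLOOP_NTFY_TOPIC", ("ntfy", "topic")),
    ("RESEARCHLOOP_DASHBOARD_PORT", ("dashboard", "port")),
]


def apply_env_overrides(config: dict[str, Any],
                        env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Return a copy of `config` in which set environment variables take precedence."""
    result = copy.deepcopy(config)
    for var, path in OVERRIDES:
        if var not in env:
            continue
        value: Any = env[var]
        if var.endswith("ALLOWED_USER_IDS"):
            # Comma-separated list; blank entries (e.g. a trailing comma) are dropped.
            value = [item.strip() for item in value.split(",") if item.strip()]
        elif var.endswith("_PORT"):
            value = int(value)
        section = result
        for key in path[:-1]:
            section = section.setdefault(key, {})  # create [slack], [dashboard], ... if absent
        section[path[-1]] = value
    return result
```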

Deployment

Docker

FROM python:3.12-slim

RUN apt-get update && \
    apt-get install -y --no-install-recommends openssh-client curl git && \
    rm -rf /var/lib/apt/lists/*

# Install Claude CLI
RUN curl -fsSL https://claude.ai/install.sh | bash

# Install researchloop
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
RUN uv venv /app/.venv && \
    uv pip install --python /app/.venv/bin/python --no-cache \
    "researchloop @ git+https://github.com/chanind/researchloop.git"

WORKDIR /app
COPY researchloop.toml .
ENV PATH="/root/.local/bin:/root/.claude/bin:/app/.venv/bin:$PATH"
ENV RESEARCHLOOP_DB_PATH="/data/researchloop.db"
ENV RESEARCHLOOP_ARTIFACT_DIR="/data/artifacts"

EXPOSE 8080
CMD ["researchloop", "serve"]

Fly.io

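ResearchLoop can be deployed on Fly.io with a persistent volume mounted at /data, which holds the database and artifacts at the paths set by RESEARCHLOOP_DB_PATH and RESEARCHLOOP_ARTIFACT_DIR in the Dockerfile. If the volume does not exist yet, create it with `fly volumes create researchloop_data`: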

# fly.toml
app = "my-researchloop"
primary_region = "iad"

[build]

[[mounts]]
  source = "researchloop_data"
  destination = "/data"

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = "stop"
  auto_start_machines = true
  min_machines_running = 0

[[vm]]
  size = "shared-cpu-1x"
  memory = "2gb"

Set secrets:

fly secrets set \
  RESEARCHLOOP_SHARED_SECRET="your-secret" \
  RESEARCHLOOP_ORCHESTRATOR_URL="https://my-researchloop.fly.dev" \
  SSH_PRIVATE_KEY="$(cat ~/.ssh/id_ed25519)" \
  RESEARCHLOOP_DASHBOARD_PASSWORD="your-password" \
  -a my-researchloop

Deploy:

fly deploy

SSH key setup for Docker/Fly.io

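The orchestrator needs SSH access to your HPC cluster. Add an entrypoint script that writes the key from the `SSH_PRIVATE_KEY` secret, and wire it into the Dockerfile above (for example `COPY entrypoint.sh /entrypoint.sh`, `RUN chmod +x /entrypoint.sh`, and `ENTRYPOINT ["/entrypoint.sh"]`) so that it runs before `CMD`: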

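#!/bin/bash
set -euo pipefail
if [ -n "${SSH_PRIVATE_KEY:-}" ]; then
    mkdir -p ~/.ssh
    chmod 700 ~/.ssh
    # Create the key file with owner-only permissions from the start.
    (umask 077 && echo "$SSH_PRIVATE_KEY" > ~/.ssh/id_ed25519)
    chmod 600 ~/.ssh/id_ed25519
    # This disables host key verification. Consider adding the cluster's host
    # key to ~/.ssh/known_hosts instead and dropping these two options.
    cat > ~/.ssh/config <<EOF
Host *
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
    LogLevel ERROR
EOF
    chmod 600 ~/.ssh/config
fi
mkdir -p /data/artifacts
# Hand off to the Dockerfile's CMD (researchloop serve).
exec "$@"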

Dashboard

The web dashboard provides a browser-based interface for managing ResearchLoop. It is served by the orchestrator at /dashboard/.

Features

Studies list -- overview of all configured studies with sprint counts
Study detail -- view study configuration, submit new sprints with GPU/memory overrides
Sprint list -- filterable list of all sprints across studies
Sprint detail -- live status with progress.md display, tool log, script output, report rendering (markdown to HTML), PDF download, and artifact listing
Auto-loop management -- start, stop, and resume loops with context guidance and job option overrides
Loop detail -- progress tracking with links to individual loop sprints
Refresh -- pull live status from the cluster via SSH (detects current pipeline step, reads logs)

Authentication

On first visit, the dashboard prompts you to set a password. Alternatively, set RESEARCHLOOP_DASHBOARD_PASSWORD as an environment variable and the password is auto-hashed on startup.

Sessions use signed cookies (7-day expiry) with a signing key persisted in the database. All mutating dashboard actions are protected by CSRF tokens.
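To illustrate how a signed, expiring session cookie and a CSRF check fit together, here is a standard-library sketch using HMAC-SHA256. It is not ResearchLoop's `dashboard/auth.py`, which may use a different signing library and cookie format; the `<user>.<issued_at>.<signature>` layout and all helper names are hypothetical. The signing key stands in for the one the dashboard persists in its database (for example 32 bytes from `secrets.token_bytes`), and the 7-day lifetime mirrors the expiry described above.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

SESSION_MAX_AGE = 7 * 24 * 3600  # 7-day expiry, in seconds


def _sign(key: bytes, payload: str) -> str:
    return hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()


def make_session_cookie(key: bytes, user: str, now: Optional[float] = None) -> str:
    """Hypothetical layout: <user>.<issued_at>.<hex HMAC of the first two parts>."""
    issued = int(time.time() if now is None else now)
    payload = f"{user}.{issued}"
    return f"{payload}.{_sign(key, payload)}"


def read_session_cookie(key: bytes, cookie: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user if the cookie is authentic and unexpired, else None."""
    try:
        user, issued, signature = cookie.rsplit(".", 2)
        issued_at = int(issued)
    except ValueError:
        return None  # malformed cookie
    expected = _sign(key, f"{user}.{issued}")
    # Constant-time comparison avoids leaking how many signature characters matched.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return None
    current = time.time() if now is None else now
    if current - issued_at > SESSION_MAX_AGE:
        return None
    return user


def new_csrf_token() -> str:
    """Random per-session token to embed in each dashboard form."""
    return secrets.token_urlsafe(32)


def csrf_ok(expected: str, submitted: str) -> bool:
    """Check a submitted form token against the one stored for the session."""
    return hmac.compare_digest(expected.encode(), submitted.encode())
```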

CLI authentication

The CLI authenticates to the orchestrator using password-based token auth:

researchloop connect https://your-server.fly.dev
# Prompts for password, saves token to ~/.config/researchloop/credentials.json

researchloop status        # Check connection
researchloop disconnect    # Remove saved credentials
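Because the token grants full API access, the credentials file should be readable only by you. A minimal sketch of saving, loading, and deleting such a file follows; the JSON keys (`url`, `token`) and the helper names are assumptions, not the actual format used by `core/credentials.py`.

```python
import json
import os
from pathlib import Path
from typing import Optional

# Location documented above; the JSON layout below is an assumption.
DEFAULT_PATH = Path.home() / ".config" / "researchloop" / "credentials.json"


def save_credentials(url: str, token: str, path: Path = DEFAULT_PATH) -> None:
    """Write the orchestrator URL and API token, readable only by the current user."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # O_CREAT with mode 0o600 means a newly created file is never group/world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump({"url": url, "token": token}, f)  # hypothetical keys
    os.chmod(path, 0o600)  # also tighten permissions on a pre-existing file


def load_credentials(path: Path = DEFAULT_PATH) -> Optional[dict]:
    """Return the saved credentials, or None if `connect` has not been run."""
    if not path.exists():
        return None
    return json.loads(path.read_text())


def delete_credentials(path: Path = DEFAULT_PATH) -> None:
    """Remove saved credentials, as `researchloop disconnect` is described as doing."""
    path.unlink(missing_ok=True)
```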

Slack integration

Setup

Go to api.slack.com/apps and create a new app
Enable Event Subscriptions with request URL: https://your-server.fly.dev/api/slack/events
Subscribe to bot events: app_mention, message.im
Add OAuth Scopes: chat:write, files:write
Install the app to your workspace
Set environment variables:

RESEARCHLOOP_SLACK_BOT_TOKEN="xoxb-..."
RESEARCHLOOP_SLACK_SIGNING_SECRET="..."
RESEARCHLOOP_SLACK_CHANNEL_ID="C0123456789"          # For notifications
RESEARCHLOOP_SLACK_ALLOWED_USER_IDS="U01,U02"        # Comma-separated
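The signing secret is what lets the orchestrator reject forged requests to /api/slack/events; the project structure lists a `verify_slack_signature()` helper in `comms/slack.py` for this. The sketch below follows Slack's documented v0 signing scheme (HMAC-SHA256 over `v0:<timestamp>:<raw body>` keyed with the signing secret, compared with the `X-Slack-Signature` header, rejecting requests older than five minutes). The function name `check_slack_request` and its parameters are illustrative and may differ from ResearchLoop's helper.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_SKEW_SECONDS = 60 * 5  # Slack recommends rejecting requests older than five minutes


def check_slack_request(
    signing_secret: str,
    timestamp: str,        # X-Slack-Request-Timestamp header
    body: bytes,           # raw request body, before any JSON/form parsing
    signature: str,        # X-Slack-Signature header, e.g. "v0=ab12..."
    now: Optional[float] = None,
) -> bool:
    try:
        ts = int(timestamp)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if abs(current - ts) > MAX_SKEW_SECONDS:
        return False  # stale timestamp: possibly a replayed request
    base = b"v0:" + timestamp.encode() + b":" + body
    expected = "v0=" + hmac.new(signing_secret.encode(), base, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature.encode())
```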

Commands

Command	Description
`sprint run <study> <idea>`	Submit a new sprint
`sprint list`	List recent sprints
`auth status`	Check if Claude CLI is authenticated
`help`	Show available commands
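As an illustration of how the commands above might be told apart from conversational messages, here is a small parser. It is hypothetical (ResearchLoop's own dispatch may differ): it strips a leading `<@UXXXXXXX>` mention, which is how Slack formats the bot's name in `app_mention` text, and returns `None` for anything that should go to conversational mode.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Leading bot mention such as "<@U0123ABC>" or "<@U0123ABC|researchloop>".
MENTION = re.compile(r"^\s*<@[A-Z0-9]+(?:\|[^>]*)?>\s*")


@dataclass
class Command:
    name: str                    # "sprint run", "sprint list", "auth status", or "help"
    study: Optional[str] = None
    idea: Optional[str] = None


def parse_command(text: str) -> Optional[Command]:
    """Return a Command, or None if the message should be handled conversationally."""
    text = MENTION.sub("", text).strip()
    parts = text.split(maxsplit=3)  # the 4th part keeps the idea's internal spaces
    head = " ".join(parts[:2]).lower()
    if head == "sprint run":
        if len(parts) < 4:
            return None  # missing <study> or <idea>; treat as conversation
        return Command("sprint run", study=parts[2], idea=parts[3])
    if head in ("sprint list", "auth status"):
        return Command(head)
    if text.lower() == "help":
        return Command("help")
    return None
```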

Conversational mode

Beyond commands, the Slack bot supports free-form conversations. Messages in a thread are tracked as a Claude session (via --resume), so the bot remembers context within a thread. The bot can:

Discuss research ideas and help plan sprints
Review results from completed sprints
Look up papers and references (web search)
Execute actions (start sprints, loops) when you ask

Notifications

When sprints complete or fail, the bot sends notifications to the configured channel. Completed sprint notifications include the summary and a link to the dashboard. If a PDF report was generated, it is uploaded as an attachment.

CLI reference

researchloop [OPTIONS] COMMAND

Options:
  -c, --config PATH    Path to researchloop.toml
  --version            Show version
  --help               Show help

Commands:
  init                 Initialize a new project with example config
  serve                Start the orchestrator server
  connect [URL]        Authenticate CLI to a remote orchestrator
  disconnect           Remove saved credentials
  status               Show connection status

  study list           List all configured studies
  study show NAME      Show study details and recent sprints
  study init NAME      Scaffold a new study directory with starter CLAUDE.md

  sprint run IDEA      Submit a new sprint (-s/--study required)
  sprint list          List sprints (--study, --limit options)
  sprint show ID       Show sprint details, artifacts, and summary
  sprint cancel ID     Cancel a running sprint

  loop start           Start an auto-loop (-s/--study, -n/--count, -m/--context)
  loop status          Show all auto-loops
  loop stop LOOP_ID    Stop a running auto-loop

  cluster list         List configured clusters
  cluster check        Test SSH connectivity (--name for specific cluster)

API endpoints

The orchestrator exposes a REST API at /api/:

Method	Path	Auth	Description
`POST`	`/api/auth`	Password	Get API token
`GET`	`/api/studies`	Token/Secret	List all studies
`GET`	`/api/sprints`	Token/Secret	List sprints (`?study_name=`, `?limit=`)
`GET`	`/api/sprints/{id}`	Token/Secret	Get sprint details
`POST`	`/api/sprints`	Token/Secret	Create and submit a sprint
`POST`	`/api/sprints/{id}/cancel`	Token/Secret	Cancel a sprint
`POST`	`/api/loops`	Token/Secret	Start an auto-loop
`POST`	`/api/loops/{id}/stop`	Token/Secret	Stop an auto-loop
`POST`	`/api/webhook/sprint-complete`	Webhook token	Sprint completion callback
`POST`	`/api/webhook/heartbeat`	Webhook token	Runner heartbeat with logs
`POST`	`/api/artifacts/{sprint_id}`	Webhook token	Upload artifact file
`POST`	`/api/slack/events`	Slack signature	Slack Events API handler

Authentication uses either a bearer token (from /api/auth) or the X-Shared-Secret header. Webhook endpoints use per-sprint X-Webhook-Token headers.
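Putting the endpoint table and these authentication rules together, a client call to create a sprint could be built with the standard library as below. The JSON field names (`study_name`, `idea`) are assumptions, as is sending the token as an `Authorization: Bearer` header; check the orchestrator's request models before relying on them. The function only builds the request, so it can be inspected without a running server.

```python
import json
import urllib.request
from typing import Optional


def create_sprint_request(
    base_url: str,
    study_name: str,
    idea: str,
    token: Optional[str] = None,
    shared_secret: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST /api/sprints request authenticated by API token or shared secret."""
    headers = {"Content-Type": "application/json"}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"  # token obtained from POST /api/auth
    elif shared_secret is not None:
        headers["X-Shared-Secret"] = shared_secret
    else:
        raise ValueError("either token or shared_secret is required")
    body = json.dumps({"study_name": study_name, "idea": idea}).encode()  # hypothetical fields
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/sprints", data=body, headers=headers, method="POST"
    )


# Sending it requires a running orchestrator:
# req = create_sprint_request("http://localhost:8080", "my-research", "try approach X", token="...")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```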

Development

Setup

git clone https://github.com/chanind/researchloop.git
cd researchloop
uv sync

Run tests

# Unit tests (339 tests, ~3s)
uv run pytest tests/ -v -m "not integration"

# Integration tests (requires Docker for SLURM container)
docker build -t researchloop-slurm-test tests/docker/slurm/
uv run pytest tests/integration/ -v --timeout=120

Code quality

uv run ruff check .             # Lint
uv run ruff format --check .    # Format check
uv run pyright researchloop/    # Type check

Project structure

researchloop/
  core/
    config.py          TOML config loading into dataclasses
    models.py          SprintStatus enum, Sprint/Study/AutoLoop dataclasses
    orchestrator.py    Orchestrator class + create_app() FastAPI factory
    credentials.py     CLI credential storage (~/.config/researchloop/)
    auth.py            Claude CLI auth checking
  db/
    database.py        Async SQLite wrapper (WAL mode, auto-migrations)
    migrations.py      Schema definitions (7 tables + indexes)
    queries.py         Async CRUD functions (parameterized SQL, return dicts)
  clusters/
    ssh.py             SSHConnection + SSHManager (connection pooling)
    monitor.py         JobMonitor (polls active jobs, heartbeat tracking)
  schedulers/
    base.py            BaseScheduler ABC
    slurm.py           SlurmScheduler (sbatch/squeue/sacct/scancel)
    sge.py             SGEScheduler (qsub/qstat/qacct/qdel)
    local.py           LocalScheduler (subprocesses, for testing)
  sprints/
    manager.py         SprintManager (create/submit/cancel/handle_completion)
    auto_loop.py       AutoLoopController (start/stop/resume, idea generation)
  studies/
    manager.py         StudyManager (config-to-DB sync, cluster resolution)
  runner/
    pipeline.py        Pipeline class (research pipeline steps)
    claude.py          run_claude() wrapper + render_template()
    upload.py          upload_artifacts(), send_webhook(), send_heartbeat()
    main.py            Runner CLI entry point (researchloop-runner)
    templates/         Jinja2 prompt templates (6 templates)
    job_templates/     SLURM (slurm.sh.j2) and SGE (sge.sh.j2) job scripts
  comms/
    base.py            BaseNotifier ABC
    ntfy.py            NtfyNotifier (ntfy.sh push notifications)
    slack.py           SlackNotifier + verify_slack_signature()
    conversation.py    ConversationManager (Slack threads to Claude sessions)
    router.py          NotificationRouter (fan-out to all backends)
  dashboard/
    app.py             ASGI app entry point
    auth.py            Password auth (bcrypt + signed session cookies + CSRF)
    routes.py          Dashboard HTML routes
    templates/         Jinja2 HTML templates (9 templates)
  cli.py               Click CLI entry point
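The scheduler classes above wrap `sbatch` and `qsub` for submission, and one error-prone detail is extracting the job ID from the submit command's output. The sketch below handles `sbatch --parsable` output (`<jobid>` or `<jobid>;<cluster>`), the default `sbatch` message (`Submitted batch job <jobid>`), and SGE's `qsub` message (`Your job <jobid> ("<name>") has been submitted`, or `Your job-array ...` for array jobs). It is not the code in `schedulers/slurm.py` or `schedulers/sge.py`, which may use different flags.

```python
import re

_SLURM_PARSABLE = re.compile(r"^(\d+)(?:;\S+)?$")       # sbatch --parsable
_SLURM_DEFAULT = re.compile(r"Submitted batch job (\d+)")
_SGE = re.compile(r"Your job(?:-array)? (\d+)")


def parse_job_id(scheduler_type: str, stdout: str) -> str:
    """Extract the numeric job ID from sbatch/qsub output, or raise ValueError."""
    text = stdout.strip()
    if scheduler_type == "slurm":
        match = _SLURM_PARSABLE.match(text) or _SLURM_DEFAULT.search(text)
    elif scheduler_type == "sge":
        match = _SGE.search(text)
    else:
        raise ValueError(f"unsupported scheduler_type: {scheduler_type!r}")
    if match is None:
        raise ValueError(f"could not find a job ID in {stdout!r}")
    return match.group(1)
```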

CI

GitHub Actions runs on every push and PR to main:

Lint -- ruff check, ruff format --check, pyright
Test -- pytest on Python 3.10, 3.12, 3.13
Integration -- builds a Docker SLURM container and runs integration tests

License

MIT

Release history

0.3.1 -- May 3, 2026
0.3.0 -- May 1, 2026
0.2.0 -- Apr 13, 2026
0.1.0 -- Mar 20, 2026 (this version)

Download files

Source distribution: researchloop-0.1.0.tar.gz (240.4 kB), uploaded Mar 20, 2026
Built distribution: researchloop-0.1.0-py3-none-any.whl (108.9 kB, Python 3), uploaded Mar 20, 2026

File details

Details for the file researchloop-0.1.0.tar.gz.

File metadata

Download URL: researchloop-0.1.0.tar.gz
Upload date: Mar 20, 2026
Size: 240.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for researchloop-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`38540efc6c4f73244554cbf050616d1b706c1770992021f3c5d85ef9a0b47f6d`
MD5	`61855b047bdc033bc6f85f9c9211690d`
BLAKE2b-256	`1f7dbf8fa94ef22a7079913539701714496eb3555ffb8743bb7dd13b7a4bbd9f`


Provenance

The following attestation bundles were made for researchloop-0.1.0.tar.gz:

Publisher: release.yml on researchloop/researchloop

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: researchloop-0.1.0.tar.gz
- Subject digest: 38540efc6c4f73244554cbf050616d1b706c1770992021f3c5d85ef9a0b47f6d
- Sigstore transparency entry: 1141294370
- Sigstore integration time: Mar 20, 2026
Source repository:
- Permalink: researchloop/researchloop@4d67004be3e53eb3af11c36a45b6ea03ecfb73be
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/researchloop
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@4d67004be3e53eb3af11c36a45b6ea03ecfb73be
- Trigger Event: push

File details

Details for the file researchloop-0.1.0-py3-none-any.whl.

File metadata

Download URL: researchloop-0.1.0-py3-none-any.whl
Upload date: Mar 20, 2026
Size: 108.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for researchloop-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a08f0c258b15c40d827d8079e9109be700c42126268ba8a319f4b73bd3399bcf`
MD5	`563fbeaa5d33dc3e30748e801d0dc183`
BLAKE2b-256	`ceea9216a5610923ccf7ad0e8499e1564af306afa3376066af72089be6200e71`


Provenance

The following attestation bundles were made for researchloop-0.1.0-py3-none-any.whl:

Publisher: release.yml on researchloop/researchloop

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: researchloop-0.1.0-py3-none-any.whl
- Subject digest: a08f0c258b15c40d827d8079e9109be700c42126268ba8a319f4b73bd3399bcf
- Sigstore transparency entry: 1141294468
- Sigstore integration time: Mar 20, 2026
Source repository:
- Permalink: researchloop/researchloop@4d67004be3e53eb3af11c36a45b6ea03ecfb73be
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/researchloop
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@4d67004be3e53eb3af11c36a45b6ea03ecfb73be
- Trigger Event: push
