Automated research sprint platform for HPC clusters
ResearchLoop
Run AI-automated research experiments on your HPC cluster. Monitor from anywhere.
ResearchLoop submits AI-powered research experiments to your SLURM or SGE cluster, then reports back the results. You describe a research idea in natural language, and it handles the rest: submitting the job, running a multi-step pipeline with Claude Code, red-teaming the results, generating a report, and notifying you when it's done.
pip install researchloop
# Submit an experiment to your cluster
researchloop sprint run "Investigate whether batch normalization improves convergence" --study my-project
# Start an auto-loop: 5 experiments, each building on the last
researchloop loop start --study my-project --count 5 --context "Focus on improving F1 score"
Monitor everything from a web dashboard, Slack, or the CLI -- no need to SSH in and check on jobs.
Why ResearchLoop?
If you run experiments on shared HPC clusters, you know the pain: SSH in, write a script, submit with sbatch, wait, check logs, repeat. ResearchLoop automates this loop:
- You describe what to investigate (via CLI, dashboard, or Slack)
- ResearchLoop submits a job to your cluster via SSH
- Claude runs the full experiment -- writes code, runs it, analyzes results
- A red-team step critiques the work and Claude fixes any issues
- You get a report with a summary, PDF, and all artifacts
The auto-loop feature takes this further: after each experiment, Claude analyzes the results and proposes the next one. You set the number of iterations and walk away.
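The control flow described above (one sprint with a bounded red-team phase, then a loop in which each result seeds the next idea) can be sketched roughly as follows. This is an illustrative outline only, not ResearchLoop's actual implementation: `run_experiment`, `critique`, `fix`, and `propose_next` are hypothetical callables standing in for the Claude-driven steps, and `max_rounds` is an assumed name for the red-team bound.

```python
from typing import Any, Callable


def run_sprint(
    idea: str,
    run_experiment: Callable[[str], Any],
    critique: Callable[[Any], list],
    fix: Callable[[Any, list], Any],
    max_rounds: int = 2,
) -> Any:
    """Run one experiment, then red-team it for at most `max_rounds` rounds."""
    result = run_experiment(idea)
    for _ in range(max_rounds):
        issues = critique(result)
        if not issues:
            break  # nothing left to fix, stop early
        result = fix(result, issues)
    return result


def auto_loop(
    first_idea: str,
    count: int,
    sprint: Callable[[str], Any],
    propose_next: Callable[[Any], str],
) -> list:
    """Run `count` sprints; each new idea is proposed from the previous result."""
    results = []
    idea = first_idea
    for i in range(count):
        result = sprint(idea)
        results.append(result)
        if i < count - 1:
            idea = propose_next(result)  # hypothetical "what next?" step
    return results
```

Bounding both loops (red-team rounds and sprint count) is what makes it safe to start a run and leave it unattended.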
Get started in 5 minutes
Prerequisites: Python 3.10+, SSH access to an HPC cluster, Claude Code installed on the cluster.
1. Install and initialize
pip install researchloop
researchloop init
2. Edit researchloop.toml
shared_secret = "pick-a-secret"
orchestrator_url = "http://localhost:8080"
[[cluster]]
name = "my-cluster"
host = "login.cluster.example.com"
user = "researcher"
key_path = "~/.ssh/id_ed25519"
scheduler_type = "slurm"
working_dir = "/scratch/researcher/researchloop"
[cluster.job_options]
gres = "gpu:1"
mem = "64G"
cpus-per-task = "8"
[[study]]
name = "my-project"
cluster = "my-cluster"
description = "Investigating X"
3. Start the server and run your first sprint
researchloop serve &
researchloop connect http://localhost:8080
researchloop sprint run "Try approach X on dataset Y" --study my-project
That's it. ResearchLoop SSHes to your cluster, submits the job, and you can monitor progress from the dashboard at http://localhost:8080/dashboard/.
Three ways to interact
Web dashboard
Browse to /dashboard/ to see all your studies, sprints, and loops. Submit new sprints, start loops with custom GPU/memory settings, refresh live status from the cluster, and read reports -- all from the browser.
Slack bot
Get sprint notifications in your Slack channel and run commands from a thread:
sprint run my-project "investigate feature X under condition Y"
sprint list
loop start my-project 5
help
See the Slack setup guide for configuration.
CLI
researchloop sprint run "idea" --study my-project # Submit a sprint
researchloop sprint list # List recent sprints
researchloop sprint show sp-a3f7b2 # View details
researchloop loop start --study my-project --count 5 # Auto-loop
researchloop loop stop loop-b4e1c9 # Stop a loop
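If you want to queue several independent ideas from a script, one option is to build the `sprint run` command line shown above for each idea. The sketch below only constructs and prints the commands; `sprint_command` is a hypothetical helper, and it uses nothing beyond the `sprint run` arguments already listed here.

```python
import shlex


def sprint_command(idea: str, study: str) -> list[str]:
    """Build the argv for `researchloop sprint run` as used in the examples above."""
    return ["researchloop", "sprint", "run", idea, "--study", study]


ideas = [
    "Try approach X on dataset Y",
    "Investigate whether batch normalization improves convergence",
]

for idea in ideas:
    # shlex.join quotes the idea so the printed line is safe to paste into a shell.
    print(shlex.join(sprint_command(idea, "my-project")))
```

To actually submit, pass each list to `subprocess.run(argv, check=True)` instead of printing it. Passing a list rather than a shell string avoids quoting problems when an idea contains quotes or other special characters.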
Customizing your studies
Each study can have its own context, cluster settings, and configuration:
[[study]]
name = "sae-research"
cluster = "my-cluster"
max_sprint_duration_hours = 12
red_team_max_rounds = 2
allow_loop = true
# Tell Claude what this study is about and how to approach it
context = """
You are researching sparse autoencoder architectures.
Always train for 200M samples. Use batch size 1024.
Validate on the variation models listed in ~/reference/models.txt.
"""
# Or point to a file with detailed instructions
claude_md_path = "./studies/sae-research/CLAUDE.md"
# Override GPU/memory for this study
[study.job_options]
gres = "gpu:a100:2"
mem = "128G"
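To make the override concrete, here is a minimal sketch of how study-level `job_options` could be layered over the cluster defaults and rendered as SLURM directives. It assumes a key-by-key merge, so that keys the study does not set (such as `cpus-per-task`) carry over from the cluster, and it assumes each key maps directly to the `sbatch` option of the same name. Neither assumption is confirmed here, and both function names are hypothetical.

```python
def merge_job_options(cluster_opts: dict, study_opts: dict) -> dict:
    """Study values win; keys the study does not set fall back to the cluster."""
    return {**cluster_opts, **study_opts}


def sbatch_directives(opts: dict) -> list[str]:
    """Render options as `#SBATCH --key=value` lines for a job script."""
    return [f"#SBATCH --{key}={value}" for key, value in opts.items()]


cluster_opts = {"gres": "gpu:1", "mem": "64G", "cpus-per-task": "8"}
study_opts = {"gres": "gpu:a100:2", "mem": "128G"}

for line in sbatch_directives(merge_job_options(cluster_opts, study_opts)):
    print(line)
# #SBATCH --gres=gpu:a100:2
# #SBATCH --mem=128G
# #SBATCH --cpus-per-task=8
```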
The context hierarchy is: global > cluster > study. All levels are merged and included in every sprint's prompt.
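One way to picture the merge is as a simple concatenation from the most general level to the most specific. The sketch below is an assumption about the behavior, not a description of the actual code: the ordering, the blank-line separator, and the name `build_context` are all illustrative.

```python
from typing import Optional


def build_context(
    global_ctx: Optional[str],
    cluster_ctx: Optional[str],
    study_ctx: Optional[str],
) -> str:
    """Join the non-empty context levels, general first, separated by blank lines."""
    parts = [global_ctx, cluster_ctx, study_ctx]
    return "\n\n".join(p.strip() for p in parts if p and p.strip())


prompt_context = build_context(
    "Report all metrics with confidence intervals.",
    None,  # this cluster defines no context of its own
    "You are researching sparse autoencoder architectures.",
)
print(prompt_context)
```

Under this reading, levels add to one another rather than replace one another, so a study context should not repeat what the global context already says.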
Deployment
For production, deploy the orchestrator as a Docker container on Fly.io, Railway, or any platform that supports persistent volumes:
pip install researchloop
# See deployment guide for Docker/Fly.io setup
Full deployment guide: researchloop.github.io/researchloop/deployment
Documentation
Full docs at researchloop.github.io/researchloop, including:
- Configuration reference -- all TOML options and environment variables
- Deployment guide -- Docker, Fly.io, SSH key setup
- Dashboard guide -- web UI features and authentication
- Slack integration -- setup, commands, notifications
- CLI reference -- all commands with examples
- Security -- authentication, CSRF, webhook tokens
- Development -- contributing, testing, architecture
Contributing
git clone https://github.com/researchloop/researchloop.git
cd researchloop
uv sync
uv run pytest tests/ -m "not integration" # Unit tests
uv run ruff check . && uv run pyright researchloop/ # Lint + type check
Integration tests run against a real SLURM scheduler in Docker -- see the development guide.
License
MIT