Web interface to start and track jobs on remote systems (Slurm, ECS, AWS Batch)
ScriptHut
A Python web interface to start and track jobs on remote HPC systems (Slurm, PBS/Torque) over SSH, on AWS Batch via the AWS API, and on AWS EC2 directly (one instance per task, SSH tunnelled via SSM).
Features
- Multi-backend support - Monitor Slurm, PBS/Torque, AWS Batch, and AWS EC2 queues from a single dashboard
- Real-time job monitoring - View running and pending jobs with auto-refresh via SSE
- Task runs - Submit batches of jobs with configurable concurrency limits and dependencies
- Unified job view - See run-submitted and external jobs in one dashboard
- Git workflow integration - Clone repos on the backend before running task generators
- Persistent SSH connections - Maintains connections with keepalive and auto-reconnect
- HTMX frontend - Dynamic updates without full page reloads
- Cost estimation - Estimate run costs using EC2 spot/on-demand pricing from instances.vantage.sh
- Extensible - Abstract backend system ready for additional schedulers
Examples
See scripthut-examples for complete, self-contained workflow examples in R, Python, Julia, and Apptainer.
Installation
# Clone the repository
git clone https://github.com/yourusername/scripthut.git
cd scripthut
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install the package
pip install -e .
# For AWS Batch support (installs boto3)
pip install -e ".[batch]"
# For development (includes mypy, ruff, pytest)
pip install -e ".[dev]"
Docker
docker run -d -p 8000:8000 \
-v ./scripthut.yaml:/app/scripthut.yaml \
-v ~/.ssh:/root/.ssh:ro \
ghcr.io/tlamadon/scripthut:main
Then open http://localhost:8000.
Configuration
ScriptHut uses a YAML configuration file. Copy the example and customize:
cp scripthut.example.yaml scripthut.yaml
YAML Configuration
# scripthut.yaml
backends:
  # Slurm cluster
  - name: hpc-cluster
    type: slurm
    ssh:
      host: slurm-login.cluster.edu
      port: 22
      user: researcher
      key_path: ~/.ssh/id_rsa
    account: my-allocation # optional: --account flag
    login_shell: false # optional: use #!/bin/bash -l
    max_concurrent: 100 # optional: max jobs across all runs
  # PBS/Torque cluster
  - name: pbs-cluster
    type: pbs
    ssh:
      host: pbs-login.cluster.edu
      user: researcher
      key_path: ~/.ssh/id_rsa
    account: my-allocation # optional: -A flag
    queue: batch # optional: default queue (overrides task partition)
    login_shell: false
    max_concurrent: 100
# Sources: git repos or backend paths with workflow JSON files (matched via workflows_glob)
sources:
  - name: ml-jobs
    type: git
    url: git@github.com:org/ml-pipelines.git
    branch: main
    deploy_key: ~/.ssh/ml-jobs-deploy-key
    backend: hpc-cluster
settings:
  data_dir: ~/.cache/scripthut # base for all stored data
  poll_interval: 60
  server_host: 127.0.0.1
  server_port: 8000
Configuration Options
Backends
Common fields (Slurm and PBS):
| Field | Description |
|---|---|
| name | Unique identifier for the backend |
| type | Backend type: slurm, pbs, batch, or ec2 |
| ssh.host | SSH hostname |
| ssh.port | SSH port (default: 22) |
| ssh.user | SSH username |
| ssh.key_path | Path to SSH private key |
| ssh.cert_path | Path to SSH certificate (optional) |
| ssh.known_hosts | Path to known_hosts file (optional) |
| account | Account to charge jobs to (Slurm --account, PBS -A) |
| login_shell | Use #!/bin/bash -l in submission scripts (default: false) |
| max_concurrent | Max concurrent jobs across all runs (default: 100) |
| clone_dir | Path on the backend whose disk usage is shown in the UI (default: ~/scripthut-repos) |
PBS-specific:
| Field | Description |
|---|---|
| queue | Default PBS queue (overrides task partition field) |
AWS Batch backend (type: batch):
AWS Batch is an API-based backend — there is no SSH host. Install the [batch] extra for boto3 support. AWS credentials come from the standard AWS credential chain (see AWS Credentials below).
backends:
  - name: aws-batch
    type: batch
    aws:
      profile: my-profile # optional, omit to use the default chain
      region: us-east-1
      job_queue: my-batch-queue # default queue; task.partition overrides
    # --- Option 1: use an existing job definition (recommended) ---
    job_definition: simpleJobDef-b760b37
    # --- Option 2: let scripthut auto-register job definitions per image ---
    # default_image: ghcr.io/org/workflow:latest
    # job_role_arn: arn:aws:iam::123456789012:role/BatchJobRole
    # execution_role_arn: arn:aws:iam::123456789012:role/BatchExec
    retry_attempts: 1 # Batch retryStrategy.attempts (1–10)
    log_group: /aws/batch/job # CloudWatch log group (default)
    max_concurrent: 50
| Field | Required | Description |
|---|---|---|
| aws.profile | No | AWS CLI profile name (from ~/.aws/credentials). Omit to use the default credential chain. |
| aws.region | Yes | AWS region (e.g. us-east-1). |
| aws.job_queue | Yes | Default AWS Batch job queue. Tasks whose partition is unset use this queue. |
| job_definition | No* | Pre-registered Batch job definition to submit against. Accepts a bare name (simpleJobDef), name:revision, or full ARN. Works in two modes (see job_definition_mode). *Either job_definition or default_image must be set. |
| job_definition_mode | No | locked (default): always submits against the configured definition, ignoring per-task image with a warning on mismatch. revisions: treats the configured definition as a template; on first submit with a new image, scripthut copies its container properties (image swapped, roles / log config / retry / tags preserved), registers a new revision of the same jobDefinitionName, and caches it per image. Requires batch:RegisterJobDefinition. |
| default_image | No* | Container image URI used when scripthut auto-registers a job definition. Ignored when job_definition is set. |
| job_role_arn | No | IAM role ARN the container assumes at runtime (jobRoleArn). Only used for auto-registration. |
| execution_role_arn | No | IAM role used by ECS to pull the image and push logs (executionRoleArn). Only used for auto-registration. |
| retry_attempts | No | retryStrategy.attempts in Batch terms. Default 1 (no retry). Allowed range 1–10. |
| log_group | No | CloudWatch Logs group where Batch writes container logs. Default: /aws/batch/job. |
How ScriptHut maps task fields to AWS Batch:
| scripthut field | AWS Batch concept |
|---|---|
| task.partition | Job queue (falls back to aws.job_queue) |
| task.cpus / task.memory | containerOverrides.resourceRequirements (VCPU / MEMORY in MiB) |
| task.time_limit | timeout.attemptDurationSeconds (min 60s; omitted if below) |
| task.gres="gpu:N" | GPU resource requirement |
| task.image (or default_image) | Container image URI |
| Generated script | containerOverrides.command = ["bash", "-c", <script>] |
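The mapping above can be sketched as a function that builds SubmitJob-style keyword arguments from a task dictionary. This is only an illustration: the helper names (`memory_to_mib`, `time_limit_to_seconds`, `build_submit_kwargs`), the accepted memory suffixes, and the `H:MM:SS` time format are assumptions, the raw task command stands in for the generated script, and `jobDefinition` is left out because it depends on the job definition mode.

```python
import re


def memory_to_mib(memory: str) -> int:
    """Convert a size such as '4G' or '512M' to MiB (suffix handling is an assumption)."""
    match = re.fullmatch(r"(\d+)([MG])", memory.strip().upper())
    if match is None:
        raise ValueError(f"unsupported memory value: {memory!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * 1024 if unit == "G" else value


def time_limit_to_seconds(time_limit: str) -> int:
    """Convert an 'H:MM:SS' string to seconds (format is an assumption)."""
    hours, minutes, seconds = (int(part) for part in time_limit.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def build_submit_kwargs(task: dict, default_queue: str) -> dict:
    """Build a dict shaped like an AWS Batch SubmitJob request (hypothetical helper)."""
    requirements = [
        {"type": "VCPU", "value": str(task.get("cpus", 1))},
        {"type": "MEMORY", "value": str(memory_to_mib(task.get("memory", "4G")))},
    ]
    gres = task.get("gres", "")
    if gres.startswith("gpu:"):
        requirements.append({"type": "GPU", "value": gres.split(":", 1)[1]})
    kwargs = {
        "jobName": task["id"],
        # task.partition selects the queue, falling back to aws.job_queue
        "jobQueue": task.get("partition") or default_queue,
        "containerOverrides": {
            "command": ["bash", "-c", task["command"]],
            "resourceRequirements": requirements,
        },
    }
    seconds = time_limit_to_seconds(task.get("time_limit", "1:00:00"))
    if seconds >= 60:  # below the 60 s minimum the timeout is omitted
        kwargs["timeout"] = {"attemptDurationSeconds": seconds}
    return kwargs
```

A dictionary built this way could be passed to a boto3 Batch client as `client.submit_job(jobDefinition=..., **kwargs)`.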
Three modes for job definitions:
- Pre-registered, locked (job_definition: <name>, default job_definition_mode: locked): scripthut submits against the existing definition verbatim and skips RegisterJobDefinition. Sidesteps the cross-account iam:PassRole restriction that RegisterJobDefinition enforces. The image is locked to whatever the definition was registered with; task-level image values are ignored with a warning on mismatch.
- Pre-registered, revisions (job_definition: <name> + job_definition_mode: revisions): scripthut treats the configured definition as a template. On first submit with a new image, it describes the template, clones its container properties (image swapped, roles / log config / retry / tags preserved), and calls RegisterJobDefinition with the same jobDefinitionName; AWS auto-increments the revision. Subsequent submits with the same image hit an in-process cache. Good for workflows where different tasks use different images but share roles/networking. Requires batch:RegisterJobDefinition; PassRole is still subject to the template's role ARN accounts.
- Auto-register (default_image: <image> and optional role ARNs, no job_definition): on first submission for a given image signature, scripthut registers a job definition named scripthut-<hash> and reuses it. Per-task command, env, vcpus, memory, and timeout are all applied via containerOverrides at submit time.
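To make the choice between the three modes concrete, here is a minimal sketch of how a backend configuration could be classified. The function name `job_definition_strategy` and the plain-dict config are hypothetical; only the field names and the precedence rules come from the description above.

```python
def job_definition_strategy(backend: dict) -> str:
    """Classify a Batch backend config as 'locked', 'revisions', or 'auto-register'."""
    if backend.get("job_definition"):
        # default_image is ignored whenever job_definition is set
        mode = backend.get("job_definition_mode", "locked")
        if mode not in ("locked", "revisions"):
            raise ValueError(f"unknown job_definition_mode: {mode!r}")
        return mode
    if backend.get("default_image"):
        return "auto-register"
    raise ValueError("either job_definition or default_image must be set")
```

For example, `job_definition_strategy({"job_definition": "simpleJobDef-b760b37"})` returns `"locked"`, since `locked` is the default mode.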
Environment variables are passed via containerOverrides.environment so the container's entrypoint sees them at process start, not only after the bash script exports them. Variable names are listed as a comment in the generated script for visibility, but values travel through the AWS API.
AWS EC2 backend (type: ec2):
Launches one dedicated EC2 instance per task, docker runs the container inside via user-data, and reconciles completion over SSH tunnelled through AWS SSM Session Manager (no inbound port 22 on your instances, no EC2 key pair management).
backends:
  - name: aws-ec2
    type: ec2
    aws:
      profile: scripthut # optional
      region: us-east-2
    ami: ami-0abc123... # must have sshd + SSM Agent + docker
    subnet_id: subnet-0abc...
    security_group_ids: [sg-0abc...] # inbound port 22 NOT required
    instance_types:
      default: c5.xlarge # maps task.partition → instance type
      gpu: g4dn.xlarge
    default_image: ghcr.io/org/image:latest
    instance_profile_arn: arn:aws:iam::123:instance-profile/ScriptHutTask
    ssh_user: ec2-user # "ubuntu" for Ubuntu AMIs
    max_instances: 20 # hard cap: refuses submits above this
    idle_terminate_seconds: 1800 # safety-timer slack beyond task.time_limit
    startup_timeout_seconds: 600
    tag_prefix: scripthut # used to reconcile instances on restart
    extra_tags:
      Environment: research
| Field | Required | Description |
|---|---|---|
| aws.region | Yes | AWS region. |
| aws.profile | No | AWS CLI profile (omit to use default chain). |
| ami | Yes | AMI used for every task instance. Must include sshd, SSM Agent, and docker. |
| subnet_id | Yes | VPC subnet where task instances launch. |
| security_group_ids | No | Optional SG IDs. No inbound rules needed; SSH traffic is tunnelled via SSM. |
| instance_types | No | Mapping of task.partition → EC2 instance type. Defaults to {"default": "c5.xlarge"}. |
| instance_profile_arn | No | IAM instance profile for each task. At minimum include AmazonSSMManagedInstanceCore; add ECR/S3 permissions if the container needs them. |
| default_image | No | Container image used when a task has no image: set. Required unless every task specifies its own. |
| ssh_user | No | OS user scripthut logs in as (default ec2-user). |
| max_instances | No | Hard cap on concurrent task instances (default 20). |
| idle_terminate_seconds | No | Safety-shutdown slack past the task's time_limit (default 1800). |
| startup_timeout_seconds | No | Scripthut fails a task if its instance has not reached "running" within this many seconds (default 600). |
| tag_prefix | No | Tag namespace scripthut uses to recognize its own instances on startup (default scripthut). |
| extra_tags | No | Additional tags applied to every launched instance. |
How it works:
- submit_task → RunInstances with InstanceInitiatedShutdownBehavior=terminate and a user-data script that pulls your image, docker runs it with the task's bash script mounted, pipes output to /var/log/scripthut/task.log, and writes /var/run/scripthut/done containing the exit code.
- Each poll cycle, scripthut opens a one-shot SSM Session Manager port-forward to the instance, pushes an ephemeral SSH key via EC2 Instance Connect (valid 60 s), connects over the tunnel, and cats the sentinel.
- When the sentinel appears, scripthut SSH-copies the log to <data_dir>/ec2-logs/<run_id>/<task_id>.log, then calls TerminateInstances.
- On scripthut restart, a startup reconciler picks up any existing tag:scripthut:backend=<name> instances and resumes polling them.
Safety layers (in order):
- InstanceInitiatedShutdownBehavior=terminate + a shutdown -h now safety timer in user-data: the instance self-destructs after task.time_limit + idle_terminate_seconds regardless of what scripthut does.
- timeout <task.time_limit>s wraps the docker run itself: guaranteed hard kill of runaway containers.
- Startup reconciler adopts orphans after a scripthut restart.
Requirements on the scripthut host:
- AWS CLI (aws) on PATH
- session-manager-plugin
- boto3 (from pip install 'scripthut[batch]')
Minimum IAM for the principal scripthut runs as:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:RunInstances", "ec2:DescribeInstances",
"ec2:TerminateInstances", "ec2:CreateTags",
"ec2:DescribeSubnets", "ec2:DescribeSecurityGroups"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "<instance_profile_role_arn>"
},
{
"Effect": "Allow",
"Action": [
"ssm:StartSession", "ssm:TerminateSession", "ssm:DescribeSessions",
"ec2-instance-connect:SendSSHPublicKey"
],
"Resource": "*"
}
]
}
No CloudWatch Logs permissions required — logs stream over the SSH tunnel.
For the fastest setup, run the guided wizard:
scripthut setup-aws-ec2
It discovers your default VPC + subnet, looks up the latest Amazon Linux 2023 AMI, deploys the IAM CloudFormation stack, appends a complete backend stanza to scripthut.yaml, and prints the one-liner you still need to run by hand (attaching the controller policy to your IAM principal). All prompts have flags so you can run it non-interactively in CI.
If you'd rather just deploy the CloudFormation stack manually:
aws cloudformation deploy \
--template-file cloudformation/scripthut-ec2-iam.yaml \
--stack-name scripthut-ec2-iam \
--capabilities CAPABILITY_NAMED_IAM
See docs/configuration/index.md for the full options reference.
Logs: AWS Batch writes stdout+stderr to a single CloudWatch Log stream per job. ScriptHut reads them via logs:GetLogEvents. The "error" log tab shows a note pointing you to the "output" tab.
Git workflows on AWS Batch: There is no shared filesystem, so ScriptHut resolves the commit SHA locally (via git ls-remote) and passes it to each container via the SCRIPTHUT_GIT_REPO, SCRIPTHUT_GIT_BRANCH, and SCRIPTHUT_GIT_SHA environment variables. The generated bash script includes a runtime git clone + git checkout block so each container fetches the same ref before running the task command.
AWS Credentials
ScriptHut never stores AWS credentials in scripthut.yaml. It uses boto3's standard credential resolution chain, which picks up credentials from (in order):
- Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN (and optionally AWS_DEFAULT_REGION).
- Shared credentials file: ~/.aws/credentials with aws.profile selecting which profile to use. Run aws configure or aws configure sso --profile <name> to populate it.
- IAM role for the instance (recommended for production): when scripthut runs on EC2 / ECS / EKS / Fargate, the role attached to the instance is used automatically. No credentials are stored on the host.
The simplest setups:
# Option A — named CLI profile
aws configure --profile scripthut
# Then set aws.profile: scripthut in scripthut.yaml
# Option B — environment variables (useful for CI or one-off runs)
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-east-1
# Option C — AWS SSO
aws sso login --profile my-sso-profile
# Then set aws.profile: my-sso-profile in scripthut.yaml
For Docker deployments, mount your credentials read-only (or prefer an IAM task role if running on AWS):
docker run -d -p 8000:8000 \
-v ./scripthut.yaml:/app/scripthut.yaml \
-v ~/.aws:/root/.aws:ro \
-e AWS_PROFILE=scripthut \
ghcr.io/tlamadon/scripthut:latest
Minimum IAM permissions required for the principal scripthut runs as:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"batch:SubmitJob",
"batch:DescribeJobs",
"batch:ListJobs",
"batch:CancelJob",
"batch:TerminateJob",
"batch:RegisterJobDefinition",
"batch:DescribeJobQueues",
"batch:DescribeComputeEnvironments"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": ["logs:GetLogEvents"],
"Resource": "arn:aws:logs:*:*:log-group:/aws/batch/job:*"
}
]
}
If you configure job_role_arn / execution_role_arn, the principal also needs iam:PassRole on those roles.
Sources
Sources are git repositories or backend filesystem paths containing workflow definitions. ScriptHut discovers workflow JSON files using a configurable glob pattern (workflows_glob, default: .hut/workflows/*.json). Use patterns like **/*.hut.json to match files recursively across any subdirectory. Each matched JSON file appears as a triggerable workflow on the Sources page.
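The two glob patterns mentioned above behave differently, which a small local experiment with `pathlib` can show. This is only a sketch: ScriptHut evaluates `workflows_glob` against the source itself, its matcher may differ from `pathlib` in edge cases, and the helper name `discover_workflows` and the file names are made up for the example.

```python
import tempfile
from pathlib import Path


def discover_workflows(root: Path, workflows_glob: str = ".hut/workflows/*.json") -> list:
    """Return matching files relative to root, sorted (hypothetical helper)."""
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.glob(workflows_glob)
        if path.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Build a tiny fake source tree
    for rel in (".hut/workflows/train.json", "sims/a/sweep.hut.json", "README.md"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("{}")
    default_matches = discover_workflows(root)                     # default pattern
    recursive_matches = discover_workflows(root, "**/*.hut.json")  # recursive pattern
```

With this tree, the default pattern only finds the file under `.hut/workflows/`, while `**/*.hut.json` finds the nested `sweep.hut.json`.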
Common fields:
| Field | Description |
|---|---|
| name | Unique identifier for the source |
| type | Source type: git or path |
| backend | Backend to submit discovered workflow tasks to |
| workflows_glob | Glob pattern to find workflow JSON files (default: .hut/workflows/*.json, supports ** for recursive) |
Git source fields (type: git):
| Field | Description |
|---|---|
| url | Git repository URL (SSH format recommended) |
| branch | Branch to track (default: main) |
| deploy_key | Path to deploy key for authentication |
| clone_dir | Parent directory on backend for clones (default: ~/scripthut-repos) |
| postclone | Shell command to run after cloning |
Path source fields (type: path):
| Field | Description |
|---|---|
| path | Directory on the backend filesystem |
Settings
| Field | Description | Default |
|---|---|---|
| data_dir | Base directory for all stored data | ~/.cache/scripthut |
| poll_interval | Seconds between job polls | 60 |
| server_host | Web server bind host | 127.0.0.1 |
| server_port | Web server bind port | 8000 |
| sources_cache_dir | Directory for cloned repos (overrides <data_dir>/sources) | None |
Usage
# Use default config (./scripthut.yaml)
scripthut
# Specify config file
scripthut --config /path/to/config.yaml
# Override host/port
scripthut --host 0.0.0.0 --port 9000
Open http://127.0.0.1:8000 in your browser.
API Endpoints
Jobs
| Endpoint | Description |
|---|---|
| GET / | Main page with unified job list |
| GET /jobs | HTMX partial for job table |
| GET /jobs/stream | SSE endpoint for live updates |
| POST /filter/toggle | Toggle user filter on/off |
Runs
| Endpoint | Description |
|---|---|
| GET /runs | Run management page |
| GET /runs/{id} | Run detail page |
| GET /runs/{id}/items | HTMX partial for run items |
| POST /runs/{id}/cancel | Cancel all pending/running items |
| GET /runs/{id}/tasks/{task_id}/script | View submission script |
| GET /runs/{id}/tasks/{task_id}/logs/{type} | View task logs (output/error) |
Workflows
| Endpoint | Description |
|---|---|
| GET /workflows | List configured workflows (JSON) |
| POST /workflows/{name}/run | Create a new run from workflow |
| GET /workflows/{name}/dry-run | Preview tasks without submitting |
System
| Endpoint | Description |
|---|---|
| GET /health | Health check (JSON) |
| GET /api/sources | List source statuses (JSON) |
| GET /sources | Sources page (HTML) |
| POST /sources/{name}/sync | Trigger source sync |
| GET /sources/{name}/workflows | List discovered workflows |
| GET /sources/{name}/workflows/{file}/dry-run | Preview a source workflow |
| POST /sources/{name}/workflows/{file}/run | Trigger a source workflow |
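The endpoints above can be called from any HTTP client. The sketch below uses only the standard library to build requests for three of them. The helper names are hypothetical, the base URL assumes the default host and port, and the empty POST body is an assumption, since the expected request payloads are not documented here.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8000"  # default server_host/server_port


def health_request(base_url: str = BASE_URL) -> Request:
    """GET /health"""
    return Request(f"{base_url}/health", method="GET")


def run_workflow_request(name: str, base_url: str = BASE_URL) -> Request:
    """POST /workflows/{name}/run (empty body is an assumption)"""
    return Request(f"{base_url}/workflows/{quote(name, safe='')}/run", data=b"", method="POST")


def cancel_run_request(run_id: str, base_url: str = BASE_URL) -> Request:
    """POST /runs/{id}/cancel (empty body is an assumption)"""
    return Request(f"{base_url}/runs/{quote(run_id, safe='')}/cancel", data=b"", method="POST")
```

With a server running, a request is sent with `urllib.request.urlopen(health_request())`.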
Resources and Lifecycles
ScriptHut tracks several interconnected resources. Understanding their lifecycles helps you effectively monitor and manage your jobs.
Jobs
Jobs are the primary resource displayed on the dashboard. ScriptHut tracks jobs from two sources:
- Run jobs: Submitted through ScriptHut's run system
- External jobs: Detected via scheduler polling (jobs submitted outside ScriptHut)
Job States
┌─────────┐     ┌───────────┐     ┌─────────┐     ┌───────────┐
│ PENDING │────>│ SUBMITTED │────>│ RUNNING │────>│ COMPLETED │
└─────────┘     └───────────┘     └─────────┘     └───────────┘
     │               │                 │
     │               │                 │          ┌────────┐
     └───────────────┴─────────────────┴─────────>│ FAILED │
                                                  └────────┘
| State | Description |
|---|---|
| pending | Job is in a run, waiting to be submitted to the scheduler |
| submitted | Job has been submitted (sbatch/qsub), waiting in scheduler queue |
| running | Job is actively executing on compute nodes |
| completed | Job finished successfully |
| failed | Job failed, was cancelled, timed out, or encountered an error |
| dep_failed | Job was skipped because a dependency failed |
Runs
Runs are batches of tasks created from a Workflow. Each run manages multiple jobs with configurable concurrency.
Run Lifecycle
┌─────────────────┐
│    Workflow     │  (SSH command returns JSON task list)
└────────┬────────┘
         │ POST /workflows/{name}/run
         v
┌─────────────────┐
│   Run Created   │  (All tasks registered as PENDING)
└────────┬────────┘
         │ Submit up to max_concurrent tasks
         v
┌─────────────────┐
│   Run Running   │  (Mix of PENDING, SUBMITTED, RUNNING tasks)
└────────┬────────┘
         │ As tasks complete, new ones are submitted
         v
┌─────────────────┐
│  Run Completed  │  (All tasks COMPLETED or FAILED)
└─────────────────┘
Run States
| State | Description |
|---|---|
| pending | Run created but no tasks submitted yet |
| running | Has tasks that are submitted or running |
| completed | All tasks completed successfully |
| failed | Some tasks failed (others may have completed) |
| cancelled | Run was manually cancelled |
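One way to read the table above is as a function from task states to a run state. The sketch below is an interpretation, not ScriptHut's code: the function name `derive_run_state` is hypothetical, and the handling of mixed cases (for example, some tasks completed while others are still pending) is an assumption.

```python
from collections import Counter


def derive_run_state(task_states: list, cancelled: bool = False) -> str:
    """Derive a run state from its task states (illustrative interpretation)."""
    if cancelled:
        return "cancelled"
    if not task_states:
        return "pending"
    counts = Counter(task_states)
    if counts["submitted"] or counts["running"]:
        return "running"
    if counts["pending"]:
        # Nothing started yet -> pending; otherwise work is still outstanding
        return "pending" if counts["pending"] == len(task_states) else "running"
    if counts["failed"] or counts["dep_failed"]:
        return "failed"
    return "completed"
```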
Workflows
Workflows define how to fetch a list of tasks to run. They execute an SSH command that returns JSON.
workflows:
  - name: my-batch-jobs
    backend: hpc-cluster
    command: "python ~/scripts/generate_tasks.py"
    max_concurrent: 10
    description: "Run my batch processing jobs"
Git Workflows
Workflows can optionally clone a git repository on the backend before running the command. This is useful when your task-generating script lives in a repo rather than being pre-installed on the cluster.
workflows:
  - name: ml-training-git
    backend: hpc-cluster
    git:
      repo: git@github.com:your-org/ml-pipelines.git
      branch: main
      deploy_key: ~/.ssh/ml-deploy-key # local path, uploaded temporarily
      clone_dir: ~/scripthut-repos # parent dir on backend (default)
    command: "python get_tasks.py"
    max_concurrent: 5
    description: "ML training from git repo"
When a workflow has a git section, ScriptHut will:
- Upload the deploy key (if any) to the backend temporarily
- Resolve the branch HEAD commit hash via git ls-remote
- Clone into <clone_dir>/<commit_hash>/ (skipped if already present)
- Run the command inside the cloned directory
- Clean up the temporary deploy key
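The second and third steps can be sketched as follows: `git ls-remote` prints one `<sha><TAB><ref>` line per ref, and the clone directory is derived from the resolved hash. The helper names `parse_ls_remote` and `clone_path` are hypothetical, and the sample hash is a placeholder.

```python
import posixpath


def parse_ls_remote(output: str, branch: str) -> str:
    """Return the commit hash for refs/heads/<branch> from `git ls-remote` output."""
    wanted = f"refs/heads/{branch}"
    for line in output.splitlines():
        sha, _, ref = line.partition("\t")
        if ref.strip() == wanted:
            return sha.strip()
    raise LookupError(f"branch {branch!r} not found in ls-remote output")


def clone_path(clone_dir: str, commit_hash: str) -> str:
    """Build <clone_dir>/<commit_hash> on the backend (POSIX paths)."""
    return posixpath.join(clone_dir, commit_hash)


sample_output = (
    "0123456789abcdef0123456789abcdef01234567\tHEAD\n"
    "0123456789abcdef0123456789abcdef01234567\trefs/heads/main\n"
    "89abcdef0123456789abcdef0123456789abcdef\trefs/heads/dev\n"
)
sha = parse_ls_remote(sample_output, "main")
target = clone_path("~/scripthut-repos", sha)
```

Keying the directory on the commit hash is what makes the "skipped if already present" check safe: two runs of the same commit share one clone.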
| Field | Required | Description |
|---|---|---|
| git.repo | Yes | Git repository URL (SSH format recommended) |
| git.branch | No | Branch to clone (default: main) |
| git.deploy_key | No | Path to deploy key on local machine |
| git.clone_dir | No | Parent directory on backend (default: ~/scripthut-repos) |
Working directory resolution: When a git workflow is active, each task's working_dir is resolved relative to the clone directory:
- Default (~ or omitted) -- set to the clone directory
- Relative path (e.g., simulations, src/analysis) -- joined as <clone_dir>/<working_dir>
- Absolute path (e.g., /scratch/data) or home-relative (e.g., ~/other) -- used as-is
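The three rules above fit in a few lines. This is a minimal sketch with a hypothetical function name, where `clone_path` stands for the directory the repository was cloned into.

```python
import posixpath
from typing import Optional


def resolve_working_dir(clone_path: str, working_dir: Optional[str] = None) -> str:
    """Resolve a task's working_dir for a git workflow (illustrative sketch)."""
    if working_dir is None or working_dir in ("", "~"):
        return clone_path                      # default: the clone directory
    if working_dir.startswith("/") or working_dir.startswith("~/"):
        return working_dir                     # absolute or home-relative: as-is
    return posixpath.join(clone_path, working_dir)  # relative: joined
```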
Task JSON Format
The command must return JSON in one of these formats:
// Array format
[
{"id": "task1", "name": "Process A", "command": "python process.py --id=1"},
{"id": "task2", "name": "Process B", "command": "python process.py --id=2"}
]
// Object format
{
"tasks": [
{"id": "task1", "name": "Process A", "command": "python process.py --id=1"}
]
}
Task Definition Fields
| Field | Required | Description |
|---|---|---|
| id | Yes | Unique identifier for the task |
| name | Yes | Display name for the task |
| command | Yes | Shell command to execute |
| deps | No | List of task IDs this task depends on (supports wildcards) |
| working_dir | No | Working directory (default: ~); relative paths are resolved against the git clone directory for git workflows |
| partition | No | Scheduler partition/queue (default: normal). On PBS backends, the config-level queue field takes precedence |
| cpus | No | CPUs per task (default: 1) |
| memory | No | Memory allocation (default: 4G). Automatically converted to PBS format (e.g., 4G becomes 4gb) |
| time_limit | No | Time limit (default: 1:00:00) |
| output_file | No | Custom stdout log path |
| error_file | No | Custom stderr log path |
| env | No | Task-level env rules (list of EnvRule entries with set / append / init / if / include). See Environments |
| generates_source | No | Path to a JSON file this task creates on the backend; new tasks are appended to the run on completion |
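As an illustration of the two accepted JSON shapes and the documented defaults, the sketch below parses generator output into simple records. The `Task` dataclass and `parse_tasks` are hypothetical stand-ins (ScriptHut's own models may differ), and only a subset of the fields is modelled.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class Task:
    id: str
    name: str
    command: str
    deps: list = field(default_factory=list)
    working_dir: str = "~"
    partition: str = "normal"
    cpus: int = 1
    memory: str = "4G"
    time_limit: str = "1:00:00"


def parse_tasks(text: str) -> list:
    """Accept either a bare array or an object with a 'tasks' key."""
    data = json.loads(text)
    items = data["tasks"] if isinstance(data, dict) else data
    known = {f.name for f in fields(Task)}
    # Fields not modelled here (env, output_file, ...) are ignored in this sketch
    return [Task(**{k: v for k, v in item.items() if k in known}) for item in items]


array_form = '[{"id": "task1", "name": "Process A", "command": "python process.py --id=1"}]'
object_form = '{"tasks": [{"id": "task1", "name": "Process A", "command": "python process.py --id=1", "cpus": 4}]}'
```

A task missing `id`, `name`, or `command` raises `TypeError` here, which mirrors the "Required" column.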
Task Dependencies
Tasks can declare dependencies on other tasks via the deps field. A task will only be submitted once all its dependencies have completed successfully. If a dependency fails, the task is marked as dep_failed and skipped.
{
"tasks": [
{"id": "setup", "name": "Setup", "command": "bash setup.sh"},
{"id": "build", "name": "Build", "command": "make", "deps": ["setup"]},
{"id": "test", "name": "Test", "command": "make test", "deps": ["build"]}
]
}
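The submission rule described above (submit only when every dependency has completed, skip when one has failed, stay within `max_concurrent`) might be sketched like this. The function names and the plain dictionaries are assumptions for the example, not ScriptHut's internal API.

```python
ACTIVE = {"submitted", "running"}
FAILED = {"failed", "dep_failed"}


def propagate_dep_failures(tasks: list, states: dict) -> None:
    """Mark pending tasks as dep_failed when a dependency failed; repeat until stable."""
    changed = True
    while changed:
        changed = False
        for task in tasks:
            if states[task["id"]] == "pending" and any(
                states[dep] in FAILED for dep in task.get("deps", [])
            ):
                states[task["id"]] = "dep_failed"
                changed = True


def next_to_submit(tasks: list, states: dict, max_concurrent: int) -> list:
    """Return ids of pending tasks whose deps all completed, limited to free slots."""
    free_slots = max_concurrent - sum(1 for s in states.values() if s in ACTIVE)
    ready = [
        task["id"]
        for task in tasks
        if states[task["id"]] == "pending"
        and all(states[dep] == "completed" for dep in task.get("deps", []))
    ]
    return ready[: max(free_slots, 0)]


tasks = [
    {"id": "setup", "deps": []},
    {"id": "build", "deps": ["setup"]},
    {"id": "test", "deps": ["build"]},
]
```

With `setup` completed and the others pending, only `build` is eligible; if `setup` had failed, both `build` and `test` would end up as `dep_failed`.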
Wildcard Dependencies
Dependencies support glob-style wildcard patterns (*, ?, [...]), which makes it easy to express "depend on all tasks in a group" without listing them individually.
Use dot-notation in task IDs to create logical groups, then use wildcards to depend on entire groups:
{
"tasks": [
{"id": "setup.init", "name": "Setup", "command": "bash setup.sh"},
{"id": "build.x", "name": "Build X", "command": "make x", "deps": ["setup.*"]},
{"id": "build.y", "name": "Build Y", "command": "make y", "deps": ["setup.*"]},
{"id": "final.merge", "name": "Finalize", "command": "make dist", "deps": ["build.*"]}
]
}
In this example, final.merge depends on "build.*" which automatically expands to ["build.x", "build.y"]. This is equivalent to listing them explicitly but stays correct as you add or remove tasks in the build group.
Supported patterns:
| Pattern | Matches |
|---|---|
| build.* | All tasks starting with build. |
| step.? | step.1, step.2, but not step.10 |
| data.[ab] | data.a and data.b |
Tasks with dot-notation IDs are also displayed hierarchically in the run detail UI, grouped by their prefix.
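The pattern semantics in the table match Python's `fnmatch` module, so wildcard expansion can be approximated as below. Whether ScriptHut uses `fnmatch` internally is not stated here; `expand_deps` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def expand_deps(deps: list, task_ids: list) -> list:
    """Expand glob-style dependency patterns against the known task ids."""
    expanded = []
    for dep in deps:
        if any(ch in dep for ch in "*?["):
            matches = [tid for tid in task_ids if fnmatchcase(tid, dep)]
        else:
            matches = [dep]  # plain ids are kept as-is
        for match in matches:
            if match not in expanded:  # keep order, drop duplicates
                expanded.append(match)
    return expanded


ids = ["setup.init", "build.x", "build.y", "final.merge"]
build_deps = expand_deps(["build.*"], ids)
```

Here `build_deps` holds `build.x` and `build.y`, the same expansion described for `final.merge` above.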
Dynamic Task Generation (generates_source)
A task can dynamically produce new tasks that get appended to the run when it completes. This is useful for two-phase workflows where the first task determines what work needs to be done (e.g., scanning a directory, querying a database) and the second phase executes that work.
To use this feature, set the generates_source field on a task to the path of a JSON file that the task will create on the backend:
{
"tasks": [
{
"id": "plan",
"name": "Plan simulations",
"command": "python plan.py --output tasks.json",
"working_dir": "~/project",
"generates_source": "tasks.json"
}
]
}
When the plan task completes, ScriptHut reads tasks.json from the backend via SSH and appends the tasks it contains to the current run. The generated JSON file uses the same format as the workflow task JSON (either {"tasks": [...]} or a bare [...] array).
| Field | Description |
|---|---|
| generates_source | Path to a JSON file the task creates on the backend. Relative paths are resolved against the task's working_dir. Absolute paths and ~-prefixed paths are used as-is. |
Generated tasks can use dependencies to control execution order. They can depend on tasks already in the run (including the generator task itself) and on other generated tasks. Wildcard dependencies are also supported:
{
"tasks": [
{
"id": "plan",
"name": "Plan",
"command": "python plan.py --output tasks.json",
"generates_source": "tasks.json"
},
{
"id": "setup",
"name": "Setup data",
"command": "bash setup.sh"
}
]
}
The generated tasks.json might contain:
{
"tasks": [
{"id": "sim-1", "name": "Sim 1", "command": "python sim.py 1", "deps": ["setup"]},
{"id": "sim-2", "name": "Sim 2", "command": "python sim.py 2", "deps": ["setup"]},
{"id": "aggregate", "name": "Aggregate", "command": "python agg.py", "deps": ["sim-*"]}
]
}
If a generated task references a dependency that doesn't exist in the run, the entire batch of generated tasks is rejected and an error is logged.
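The all-or-nothing check can be pictured as a validation pass that runs before anything is appended. The sketch below is an assumption about how such a check could look: the function name is hypothetical, and treating a wildcard that matches nothing as a missing dependency is a choice made for this example.

```python
from fnmatch import fnmatchcase


def validate_generated(existing_ids: set, generated: list) -> list:
    """Return the generated tasks unchanged, or raise if any dependency is unresolved."""
    known = set(existing_ids) | {task["id"] for task in generated}
    for task in generated:
        for dep in task.get("deps", []):
            if not any(fnmatchcase(task_id, dep) for task_id in known):
                # One bad reference rejects the whole batch
                raise ValueError(f"task {task['id']!r} has unresolved dependency {dep!r}")
    return generated


generated = [
    {"id": "sim-1", "deps": ["setup"]},
    {"id": "sim-2", "deps": ["setup"]},
    {"id": "aggregate", "deps": ["sim-*"]},
]
accepted = validate_generated({"plan", "setup"}, generated)
```

Because the check happens before any task is added, a failure leaves the run exactly as it was.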
Environments
ScriptHut resolves a task's environment by walking an ordered chain of env rules from five layers — backend → server → workflow (config) → workflow (document) → task — against a seed of SCRIPTHUT_* runtime variables. The full reference lives in docs/configuration/environments.md. What follows is a quick tour.
One primitive: the EnvRule
Every env: entry, at any layer, is a rule:
env:
  - set: { LOG_LEVEL: info } # always-applied
  - if: { SCRIPTHUT_BACKEND: mercury } # AND across keys
    set: { SCRATCH: /scratch/${USER} } # ${name} expanded
    init: "module load gcc/12 cuda/11" # bash before the task
  - if: { SCRIPTHUT_BACKEND: [anvil, delta] } # list = OR
    init: "module load gcc cuda-toolkit"
  - append: { PATH: /opt/custom/bin } # joined with ":"
Conditionals see the env as resolved so far, so a rule's if: can branch on the seed (SCRIPTHUT_BACKEND, etc.) or on values written by earlier rules.
Where rules live
| Layer | Defined in | Typical use |
|---|---|---|
| Backend | env: on each backend in scripthut.yaml | Cluster facts: scratch path, modules bootstrap |
| Server | top-level env: in scripthut.yaml | Org-wide defaults |
| Workflow | env: on each workflow in scripthut.yaml | Workflow-specific overrides |
| Task | env: array in the generator's JSON output | Per-task adjustments |
Rules concatenate in that order. set: overwrites, append: extends — later layers naturally override earlier ones.
Automatic seed (SCRIPTHUT_*)
These are populated before any user rule runs and are protected — rules cannot overwrite them:
| Variable | Always present | Description |
|---|---|---|
| SCRIPTHUT_BACKEND | yes | Backend name |
| SCRIPTHUT_WORKFLOW | yes | Workflow name |
| SCRIPTHUT_RUN_ID | yes | 8-char run id |
| SCRIPTHUT_CREATED_AT | yes | ISO timestamp |
| SCRIPTHUT_GIT_REPO | git workflows | Repo URL |
| SCRIPTHUT_GIT_BRANCH | git workflows | Branch |
| SCRIPTHUT_GIT_SHA | git workflows | Commit |
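To show how an ordered rule chain resolves against the protected seed, here is a deliberately reduced sketch. It covers only `if`, `set`, `append`, and `init`; `${name}` expansion and `include` are left out, and the function names are hypothetical. The authoritative behaviour is in docs/configuration/environments.md.

```python
def rule_applies(condition: dict, env: dict) -> bool:
    """AND across keys; a list value means 'any of these' (OR)."""
    for key, expected in condition.items():
        allowed = expected if isinstance(expected, list) else [expected]
        if env.get(key) not in allowed:
            return False
    return True


def resolve_env(seed: dict, rules: list) -> tuple:
    """Apply rules in order; return (env, init_lines). Seed keys are protected."""
    env = dict(seed)
    init_lines = []
    for rule in rules:
        if not rule_applies(rule.get("if", {}), env):
            continue
        for key, value in rule.get("set", {}).items():
            if key not in seed:  # rules cannot overwrite the seed
                env[key] = value
        for key, value in rule.get("append", {}).items():
            if key not in seed:
                env[key] = f"{env[key]}:{value}" if env.get(key) else value
        if "init" in rule:
            init_lines.append(rule["init"])
    return env, init_lines


rules = [
    {"set": {"LOG_LEVEL": "info"}},
    {"if": {"SCRIPTHUT_BACKEND": "mercury"}, "set": {"SCRATCH": "/scratch/shared"},
     "init": "module load gcc/12 cuda/11"},
    {"if": {"SCRIPTHUT_BACKEND": ["anvil", "delta"]}, "init": "module load gcc cuda-toolkit"},
    {"append": {"PATH": "/opt/custom/bin"}},
    {"append": {"PATH": "/opt/cuda/bin"}},
    {"set": {"SCRIPTHUT_BACKEND": "other"}},  # ignored: seed is protected
]
env, init_lines = resolve_env({"SCRIPTHUT_BACKEND": "mercury"}, rules)
```

Because each rule sees the environment as resolved so far, moving a rule earlier or later in the list can change which conditionals fire.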
Reusable groups
Define a rule list once with env_groups: and inline it from any env: rule:
backends:
  - name: mercury
    env_groups:
      gpu-stack: # mercury's flavor of "gpu-stack"
        - init: "module load gcc/12 cuda/11"
        - append: { PATH: /opt/cuda/bin }
    env:
      - include: [gpu-stack]
env_groups: is accepted on backends, the server, and workflows. A group defined at layer X is visible to that layer and all later layers; later layers shadow earlier ones by name. The include: rule can carry its own if: to gate the inlined rules.
Inspecting the resolved env
The task detail page in the UI has an Env tab that shows every key with full provenance (which layer / group / op contributed each value). The same data is exposed at GET /runs/{run_id}/tasks/{task_id}/env. Use this when a value isn't what you expect.
Cost Estimation
ScriptHut can estimate the cost of a run by mapping Slurm partitions to EC2 instance types and looking up pricing from instances.vantage.sh. Pricing data is fetched once and cached locally for 24 hours.
Configuration
Add a pricing section to your scripthut.yaml:
pricing:
  region: us-east-1 # AWS region for price lookup
  price_type: spot_avg # ondemand, spot_avg, spot_min, spot_max
  partitions: # map Slurm partitions to EC2 instance types
    standard: c5.xlarge
    gpu: p3.2xlarge
| Field | Required | Description | Default |
|---|---|---|---|
| region | No | AWS region for pricing lookup | us-east-1 |
| price_type | No | Pricing type: ondemand, spot_avg, spot_min, spot_max | ondemand |
| partitions | Yes | Mapping of Slurm partition names to EC2 instance types | -- |
How It Works
For each completed task with timing data from the scheduler:
cost = elapsed_hours × (task_cpus / instance_vcpus) × price_per_hour
The total estimated cost is displayed in the run detail header. Tasks on unmapped partitions or without timing data are counted separately (e.g., "8/10 tasks costed").
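The formula and the "N/M tasks costed" summary can be worked through in a short sketch. The function names and data layout are hypothetical, and the price used in the example is a made-up number for illustration, not a real quote.

```python
def task_cost(elapsed_hours: float, task_cpus: int, instance_vcpus: int, price_per_hour: float) -> float:
    """cost = elapsed_hours x (task_cpus / instance_vcpus) x price_per_hour"""
    return elapsed_hours * (task_cpus / instance_vcpus) * price_per_hour


def run_cost(tasks: list, partitions: dict, instances: dict) -> tuple:
    """Sum task costs; tasks on unmapped partitions or without timing are skipped."""
    total, costed = 0.0, 0
    for task in tasks:
        instance = instances.get(partitions.get(task["partition"]))
        if instance is None or task.get("elapsed_hours") is None:
            continue
        total += task_cost(task["elapsed_hours"], task["cpus"], instance["vcpus"], instance["price"])
        costed += 1
    return total, f"{costed}/{len(tasks)} tasks costed"


partitions = {"standard": "c5.xlarge"}
instances = {"c5.xlarge": {"vcpus": 4, "price": 0.20}}  # price is illustrative only
tasks = [
    {"partition": "standard", "cpus": 2, "elapsed_hours": 2.0},
    {"partition": "standard", "cpus": 4, "elapsed_hours": 0.5},
    {"partition": "debug", "cpus": 1, "elapsed_hours": 1.0},    # unmapped partition
    {"partition": "standard", "cpus": 1, "elapsed_hours": None},  # no timing data
]
total, summary = run_cost(tasks, partitions, instances)
```

The first task uses half of a 4-vCPU instance for two hours, so it is charged one instance-hour; the second uses the whole instance for half an hour.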
If the pricing section is omitted or the pricing data cannot be fetched, the cost display is simply hidden -- no errors or disruption to the rest of the UI.
Data Flow
                                   ┌─────────────────────┐
                                   │     Run Storage     │
                                   │    (JSON files)     │
                                   └──────────┬──────────┘
                                              │
                                              v
┌──────────────┐    polling    ┌─────────────────────────────┐   display
│ HPC Cluster  │<─────────────>│      ScriptHut Server       │────────────> Web UI
│ (Slurm/PBS)  │ squeue/qstat  │                             │
└──────────────┘               │   ┌─────────────────────┐   │
       ^                       │   │     RunManager      │   │
       │ sbatch/qsub           │   │  - create runs      │   │
       │                       │   │  - update states    │   │
┌──────┴───────┐               │   │  - persist to JSON  │   │
│ Job Backend  │<──────────────│   └─────────────────────┘   │
│  (abstract)  │ submit tasks  │                             │
└──────────────┘               └─────────────────────────────┘
       ^
       │ create run
       │
┌──────┴───────┐
│   Workflow   │  (SSH command -> JSON tasks)
└──────────────┘
Architecture
src/scripthut/
├── main.py # FastAPI app, routes, background polling
├── config.py # Configuration loading (YAML + .env)
├── config_schema.py # Pydantic models for YAML schema
├── models.py # Data models (HPCJob, JobState, ConnectionStatus)
├── pricing.py # EC2-equivalent cost estimation (instances.vantage.sh)
├── ssh/
│ └── client.py # Async SSH client with connection management
├── backends/
│ ├── base.py # Abstract JobBackend interface + JobStats
│ ├── utils.py # Shared utilities (duration parsing, script body generation)
│ ├── slurm.py # Slurm implementation (squeue/sacct/sbatch/scancel/sinfo)
│ ├── pbs.py # PBS/Torque implementation (qstat/qsub/qdel/pbsnodes)
│ ├── batch.py # AWS Batch implementation (boto3 + CloudWatch Logs)
│ ├── ec2.py # AWS EC2-direct implementation (one instance per task)
│ └── ec2_ssm.py # SSM-tunnelled SSH + ephemeral EC2 Instance Connect keys
├── sources/
│ └── git.py # Git repository management with deploy keys
└── runs/
    ├── models.py # Run, RunItem, TaskDefinition models
    ├── manager.py # Run lifecycle and task submission
    └── storage.py # Folder-based JSON persistence
Supported Backends
| Backend | Scheduler | Transport | Status |
|---|---|---|---|
| Slurm | sbatch, squeue, sacct, scancel, sinfo | SSH | Stable |
| PBS/Torque | qsub, qstat, qdel, pbsnodes | SSH | Stable |
| AWS Batch | AWS Batch + CloudWatch Logs API | boto3 (AWS credential chain) | Stable |
| AWS EC2 | RunInstances + DescribeInstances (one instance per task) | boto3 + SSH via SSM Session Manager | New |
| ECS | AWS ECS API | boto3 | Planned |
Adding New Backends
To add support for a new job system:
- Create a new file in src/scripthut/backends/ (e.g., batch.py)
- Implement the JobBackend abstract class from backends/base.py
- Add a config model in config_schema.py and add it to the BackendConfig union
- Wire it into init_backend() in main.py
from scripthut.backends.base import JobBackend, JobStats
from scripthut.models import HPCJob

class MyBackend(JobBackend):
    @property
    def name(self) -> str:
        return "my-backend"

    # Monitoring, submission, and cancellation
    async def get_jobs(self, user=None) -> list[HPCJob]: ...
    async def submit_job(self, script: str) -> str: ...
    async def cancel_job(self, job_id: str) -> None: ...
    async def get_job_stats(self, job_ids, user=None) -> dict[str, JobStats]: ...
    async def get_cluster_info(self) -> tuple[int, int] | None: ...
    def generate_script(self, task, run_id, log_dir, **kwargs) -> str: ...
    async def is_available(self) -> bool: ...

    # Scheduler state names used to classify jobs
    @property
    def failure_states(self) -> dict[str, str]: ...
    @property
    def terminal_states(self) -> frozenset[str]: ...
Shared utilities in backends/utils.py (duration parsing, memory formatting, script body generation) can be reused across backends.
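To illustrate the kind of helper meant by "duration parsing", here is a minimal sketch that converts Slurm-style durations to seconds. It is not the code in backends/utils.py: the function name parse_slurm_duration is hypothetical, and the sketch assumes only the MM:SS, HH:MM:SS, and D-HH[:MM[:SS]] forms.

```python
def parse_slurm_duration(text: str) -> int:
    """Convert a Slurm-style duration string to whole seconds.

    Accepted forms (an assumption of this sketch):
      MM:SS, HH:MM:SS, D-HH, D-HH:MM, D-HH:MM:SS
    """
    text = text.strip()
    days = 0
    if "-" in text:
        # With a day prefix, the remaining fields read as HH[:MM[:SS]].
        day_part, text = text.split("-", 1)
        days = int(day_part)
        fields = [int(p) for p in text.split(":")]
        if not 1 <= len(fields) <= 3:
            raise ValueError(f"unsupported duration: {text!r}")
        fields += [0] * (3 - len(fields))
    else:
        # Without a day prefix, fields read as [HH:]MM:SS.
        fields = [int(p) for p in text.split(":")]
        if len(fields) not in (2, 3):
            raise ValueError(f"unsupported duration: {text!r}")
        fields = [0] * (3 - len(fields)) + fields
    hours, minutes, seconds = fields
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


print(parse_slurm_duration("1-02:03:04"))
print(parse_slurm_duration("05:30"))
```

Keeping such parsing in one shared module lets each backend convert its scheduler's output to seconds before timing data reaches the cost estimate.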
Development
# Run type checking
mypy src/
# Run linter
ruff check src/
# Run tests
pytest
Roadmap
- Phase 1: Multi-backend Slurm monitoring
- Phase 1: Git source integration with deploy keys
- Phase 2: Submit jobs to Slurm from UI (task runs)
- Phase 2: Job persistence and history
- Phase 2: Job logs viewer
- Phase 3: PBS/Torque backend support
- Phase 3: AWS Batch backend support (boto3 + CloudWatch Logs)
- Phase 3: AWS EC2-direct backend support (SSM-tunnelled SSH, one instance per task)
- Phase 3: ECS backend support
- Phase 4: Job notifications and alerts
Requirements
- Python 3.11+
- For Slurm/PBS: SSH access to the cluster's login node with key-based authentication
- For AWS Batch: pip install 'scripthut[batch]' and AWS credentials reachable via the standard credential chain (IAM role, env vars, or ~/.aws/credentials profile)
- For AWS EC2: same boto3 install; additionally session-manager-plugin on the scripthut host, and an AMI with sshd + SSM Agent + docker
License
MIT