
Bootstrap AWS EC2 GPU instances for hybrid local-remote development


aws-bootstrap-g4dn



One command to go from zero to a fully configured GPU dev box on AWS, with CUDA-matched PyTorch, Jupyter, SSH aliases, and a GPU benchmark ready to run.

aws-bootstrap launch          # Spot g4dn.xlarge in ~3 minutes
ssh aws-gpu1                  # You're in, venv activated, PyTorch works

✨ Key Features

Feature | Details
--- | ---
🚀 One-command launch | Spot (default) or on-demand, with automatic fallback on capacity errors
🔑 Auto SSH config | Adds an aws-gpu1 alias to ~/.ssh/config, so there's no IP juggling; the alias is cleaned up on terminate
🐍 CUDA-aware PyTorch | Detects the installed CUDA toolkit (nvcc) and installs PyTorch from the matching wheel index, avoiding torch.version.cuda mismatches
✅ PyTorch smoke test | Runs a quick torch.cuda matmul after setup to verify the GPU stack end-to-end
📊 GPU benchmark included | CNN (MNIST) and Transformer benchmarks with FP16/FP32/BF16 precision and tqdm progress bars
📓 Jupyter ready | JupyterLab auto-starts as a systemd service on port 8888; open it through an SSH tunnel
🖥️ status --gpu | Shows the CUDA toolkit version, the maximum CUDA version the driver supports, GPU architecture, spot pricing, uptime, and estimated cost
🌐 Multi-region status | status without --region finds instances across every enabled region and labels each with its region
💾 EBS data volumes | Attach persistent storage at /data that survives spot interruptions and termination and can be reattached to new instances
🗑️ Clean terminate | Terminates instances, removes SSH aliases, and cleans up EBS volumes (or preserves them with --keep-ebs)
🤖 Agent Skill included | A Claude Code plugin lets LLM agents autonomously provision, manage, and tear down GPU instances

🎯 Target Workflows

  1. Jupyter server-client: Jupyter runs on the instance, and you connect from your local browser
  2. VSCode Remote SSH: opens ~/workspace with pre-configured CUDA debug/build tasks and an example .cu file
  3. NVIDIA Nsight remote debugging: GPU debugging over SSH

Requirements

  1. AWS profile configured with relevant permissions (profile name can be passed via --profile or read from AWS_PROFILE env var)
  2. AWS CLI v2 (see the AWS CLI installation guide)
  3. Python 3.12+ and uv
  4. An SSH key pair (see below)

Installation

From PyPI

pip install aws-bootstrap-g4dn

With uvx (no install needed)

uvx runs the CLI directly in a temporary environment, so no global install is required:

uvx --from aws-bootstrap-g4dn aws-bootstrap launch
uvx --from aws-bootstrap-g4dn aws-bootstrap status
uvx --from aws-bootstrap-g4dn aws-bootstrap terminate

From source (development)

git clone https://github.com/promptromp/aws-bootstrap-g4dn.git
cd aws-bootstrap-g4dn
uv venv
uv sync

Each method provides the aws-bootstrap command.

Optional: auto-activate the venv with direnv

A sample direnv config is provided at .envrc.example. It activates the project venv (and optionally sets AWS_PROFILE) automatically when you cd into the repo:

cp .envrc.example .envrc
# edit .envrc to uncomment/set AWS_PROFILE if desired
direnv allow

.envrc is git-ignored, so your local copy stays out of version control.

SSH Key Setup

The CLI expects an Ed25519 SSH public key at ~/.ssh/id_ed25519.pub by default. If you don't have one, generate it:

ssh-keygen -t ed25519

Accept the default path (~/.ssh/id_ed25519) and optionally set a passphrase. The key pair is imported into AWS automatically on first launch.

To use a different key, pass --key-path:

aws-bootstrap launch --key-path ~/.ssh/my_other_key.pub

Robust key handling (so you never end up with an instance you can't reach):

  • Missing local key: if --key-path doesn't exist, launch auto-generates an Ed25519 key pair there instead of aborting.
  • Name collision with a different key: if an AWS key pair already exists under the target --key-name but its public key differs from your local key (e.g. it was created from another machine), the existing AWS key pair is left untouched. Your local key is imported under a deterministic derived name, aws-bootstrap-key-<fp8>, and the instance is launched with that key pair, so you always hold the matching private key.
  • Unreachable instance: if SSH still fails with an authentication or host-key error, launch stops immediately and prints the real ssh error, instead of sitting in a silent 5-minute "SSH not ready" loop that masks a Permission denied (publickey).

Usage

🚀 Launching an Instance

# Show available commands
aws-bootstrap --help

# Dry run: validates AMI lookup, key import, and security group without launching
aws-bootstrap launch --dry-run

# Launch a spot g4dn.xlarge (default)
aws-bootstrap launch

# Launch on-demand in a specific region with a custom instance type
aws-bootstrap launch --on-demand --instance-type g5.xlarge --region us-east-1

# Try multiple regions in order until one has spot capacity
aws-bootstrap launch --region us-west-2 --region us-east-1 --region eu-west-1

# Keep retrying (bounded exponential backoff) until spot capacity frees up
aws-bootstrap launch --wait --wait-timeout 30m
aws-bootstrap launch --region us-west-2 --region us-east-1 --wait --wait-timeout 1h

# Launch without running the remote setup script
aws-bootstrap launch --no-setup

# Use a specific Python version in the remote venv
aws-bootstrap launch --python-version 3.13

# Use a non-default SSH port
aws-bootstrap launch --ssh-port 2222

# Attach a persistent EBS data volume (96 GB gp3, mounted at /data)
aws-bootstrap launch --ebs-storage 96

# Reattach an existing EBS volume from a previous instance
aws-bootstrap launch --ebs-volume-id vol-0abc123def456

# Use a specific AWS profile
aws-bootstrap launch --profile my-aws-profile

After launch, the CLI:

  1. Creates or attaches the EBS volume (if --ebs-storage or --ebs-volume-id was specified)
  2. Adds an SSH alias (e.g. aws-gpu1) to ~/.ssh/config
  3. Runs remote setup: installs utilities, creates a Python venv, installs CUDA-matched PyTorch, and sets up Jupyter
  4. Mounts the EBS volume at /data (if applicable; new volumes are formatted, existing ones are mounted as-is)
  5. Runs a CUDA smoke test: verifies torch.cuda.is_available() and runs a quick GPU matmul
  6. Prints connection commands for SSH, the Jupyter tunnel, the GPU benchmark, and terminate

Then connect:

ssh aws-gpu1                  # venv auto-activates on login

🌐 Finding Capacity (regions & --wait)

Spot InsufficientInstanceCapacity errors are scoped to a region and availability zone: a type that's unavailable in us-west-2 right now may be plentiful in us-east-1, and capacity in a given AZ frees up continuously as other instances terminate. Two options help you get a GPU without babysitting the prompt:

  • Multiple regions: pass --region more than once. Each launch attempt tries the regions in the order given, spot first, and uses the first one with capacity:

    aws-bootstrap launch --region us-west-2 --region us-east-1 --region eu-west-1
    
  • --wait: on insufficient spot capacity, keep retrying with capped, jittered exponential backoff until --wait-timeout elapses (default 30m; accepts 90s, 30m, 1h, or bare seconds). On timeout it fails hard; it does not silently fall back to on-demand:

    aws-bootstrap launch --region us-west-2 --region us-east-1 --wait --wait-timeout 1h
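A minimal sketch of how a --wait-timeout value in the formats listed above could be converted to seconds. The function name is hypothetical, and the accepted grammar (a non-negative integer with an optional lowercase s, m, or h suffix) is an assumption drawn from the examples; the real CLI may accept more.

```python
import re

# Seconds per unit suffix; an empty suffix (bare number) means seconds.
_UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600}
_DURATION = re.compile(r"(\d+)([smh]?)")


def parse_wait_timeout(text: str) -> int:
    """Convert '90s', '30m', '1h', or a bare '45' into whole seconds."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]
```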
    

How --wait and multiple --region flags combine: the region sweep is the inner loop and backoff is the outer loop. Each cycle tries spot in every --region in order, with no delay between regions; only when all regions miss does it sleep (back off) and sweep again. So --wait --region A --region B means "try A, then B, immediately; if both are dry, back off and retry A, then B", repeating until timeout. It does not mean "wait on A, then try B." Backoff escalates per sweep (not per region), region order wins every tie, and --wait-timeout is total wall-clock time. See docs/capacity-and-retry.md for the full model.

Quota errors (VcpuLimitExceeded, MaxSpotInstanceCountExceeded) and SpotMaxPriceTooLow are never retried by --wait, because waiting can't fix them. In multi-region mode they are not fatal on their own: the launcher prints a WARNING for that region (with a region-pinned aws-bootstrap quota … hint), skips it, and tries the next --region. It fails hard only once every region is blocked, with an aggregated message listing each region's reason and hint. Without --wait, a fully exhausted spot pass still offers the interactive on-demand fallback (across all regions).

Region default precedence (a behavior change; previously the default was hardcoded to us-west-2): explicit --region flags → AWS_DEFAULT_REGION / active profile region → us-west-2. This applies to every command, so a profile configured for us-east-1 now operates in us-east-1 by default. The active region is shown in command output.

See docs/capacity-and-retry.md for the backoff design and recommended region lists per instance family.

🔧 What Remote Setup Does

The setup script runs automatically on the instance after SSH becomes available:

Step | What
--- | ---
GPU verify | Confirms nvidia-smi and nvcc are working
Utilities | Installs htop, tmux, tree, jq, ffmpeg
Python venv | Creates ~/venv with uv and auto-activates it in ~/.bashrc. Use --python-version to pin a specific Python (e.g. 3.13)
CUDA-aware PyTorch | Detects the CUDA toolkit version → installs PyTorch from the matching cu{TAG} wheel index
CUDA smoke test | Runs torch.cuda.is_available() plus a GPU matmul to verify the stack
GPU benchmark | Copies gpu_benchmark.py to ~/gpu_benchmark.py
GPU smoke test notebook | Copies gpu_smoke_test.ipynb to ~/gpu_smoke_test.ipynb (open in JupyterLab)
Jupyter | Configures and starts JupyterLab as a systemd service on port 8888
SSH keepalive | Configures server-side keepalive to prevent idle disconnects
VSCode workspace | Creates ~/workspace/.vscode/ with launch.json and tasks.json (auto-detected cuda-gdb path and GPU arch), plus an example saxpy.cu
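The CUDA-aware PyTorch step can be sketched as "parse the release from `nvcc --version`, turn it into a cu{TAG}, and point the installer at that wheel index". The helper names are hypothetical, and this sketch does not check whether PyTorch actually publishes a wheel index for the detected tag; the real setup script may map to the nearest supported version instead.

```python
import re

_NVCC_RELEASE = re.compile(r"release (\d+)\.(\d+)")


def cuda_tag_from_nvcc(nvcc_version_output: str) -> str:
    """Map `nvcc --version` output ('... release 12.8, V12.8.93') to 'cu128'."""
    match = _NVCC_RELEASE.search(nvcc_version_output)
    if match is None:
        raise ValueError("could not find a CUDA release in nvcc output")
    major, minor = match.groups()
    return f"cu{major}{minor}"


def torch_index_url(cuda_tag: str) -> str:
    # PyTorch serves CUDA-specific wheels under download.pytorch.org/whl/<tag>.
    return f"https://download.pytorch.org/whl/{cuda_tag}"
```

The resulting URL would then be passed to the installer, e.g. uv pip install torch --index-url https://download.pytorch.org/whl/cu128.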

📊 GPU Benchmark

A GPU throughput benchmark is pre-installed at ~/gpu_benchmark.py on every instance:

# Run both CNN and Transformer benchmarks (default)
ssh aws-gpu1 '~/venv/bin/python ~/gpu_benchmark.py'

# CNN only, quick run
ssh aws-gpu1 '~/venv/bin/python ~/gpu_benchmark.py --mode cnn --benchmark-batches 20'

# Transformer only with custom batch size
ssh aws-gpu1 '~/venv/bin/python ~/gpu_benchmark.py --mode transformer --transformer-batch-size 16'

# Run CUDA diagnostics first (tests FP16/FP32 matmul, autocast, etc.)
ssh aws-gpu1 '~/venv/bin/python ~/gpu_benchmark.py --diagnose'

# Force FP32 precision (if FP16 has issues on your GPU)
ssh aws-gpu1 '~/venv/bin/python ~/gpu_benchmark.py --precision fp32'

The benchmark reports iterations/sec, samples/sec, peak GPU memory, and average batch time for each model.
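For reference, the throughput figures above relate to each other as in this sketch, which assumes they are derived from per-batch wall-clock times; gpu_benchmark.py may measure differently. On a GPU, kernels run asynchronously, so timings only mean something if torch.cuda.synchronize() is called before reading the clock. Peak memory would typically come from torch.cuda.max_memory_allocated() and is left out here. The dataclass and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkSummary:
    iterations_per_sec: float
    samples_per_sec: float
    avg_batch_time_ms: float


def summarize(batch_times_s: list[float], batch_size: int) -> BenchmarkSummary:
    """Derive throughput figures from per-batch wall-clock times in seconds."""
    if not batch_times_s:
        raise ValueError("need at least one timed batch")
    total = sum(batch_times_s)
    n = len(batch_times_s)
    return BenchmarkSummary(
        iterations_per_sec=n / total,
        samples_per_sec=n * batch_size / total,
        avg_batch_time_ms=1000 * total / n,
    )
```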

📓 Jupyter (via SSH Tunnel)

ssh -NL 8888:localhost:8888 aws-gpu1
# Then open: http://localhost:8888

Or with explicit key/IP:

ssh -i ~/.ssh/id_ed25519 -NL 8888:localhost:8888 ubuntu@<public-ip>

A GPU smoke test notebook (~/gpu_smoke_test.ipynb) is pre-installed on every instance. Open it in JupyterLab to interactively verify the CUDA stack, run FP32/FP16 matmuls, train a small CNN on MNIST, and visualise training loss and GPU memory usage.

🖥️ VSCode Remote SSH

The remote setup creates a ~/workspace folder with pre-configured CUDA debug and build tasks:

~/workspace/
├── .vscode/
│   ├── launch.json   # CUDA debug configs (cuda-gdb path auto-detected)
│   └── tasks.json    # nvcc build tasks (GPU arch auto-detected, e.g. sm_75)
└── saxpy.cu          # Example CUDA source: open it and press F5 to debug

Connect directly from your terminal:

code --folder-uri vscode-remote://ssh-remote+aws-gpu1/home/ubuntu/workspace

Then install the Nsight VSCE extension on the remote when prompted. Open saxpy.cu, set a breakpoint, and press F5.

See the Nsight remote profiling guide for more details on CUDA debugging and profiling workflows.

📤 Structured Output

All commands support --output / -o for machine-readable output, which is useful for scripting, piping to jq, or LLM tool use:

# JSON output (pipe to jq)
aws-bootstrap -o json status
aws-bootstrap -o json status | jq '.instances[0].instance_id'

# YAML output
aws-bootstrap -o yaml status

# Table output
aws-bootstrap -o table status

# Works with all commands
aws-bootstrap -o json list instance-types | jq '.[].instance_type'
aws-bootstrap -o json launch --dry-run
aws-bootstrap -o json terminate --yes
aws-bootstrap -o json cleanup --dry-run

Supported formats: text (default, human-readable with color), json, yaml, table. Commands that require confirmation (terminate, cleanup) require --yes in structured output modes.

📋 Listing Resources

# List all g4dn instance types (default)
aws-bootstrap list instance-types

# List a different instance family
aws-bootstrap list instance-types --prefix p3

# List Deep Learning AMIs (default filter); each AMI is labelled with its region
aws-bootstrap list amis

# List AMIs with a custom filter
aws-bootstrap list amis --filter "ubuntu/images/hvm-ssd-gp3/ubuntu-noble*"

# Use a specific region (the active region is shown in the output header)
aws-bootstrap list instance-types --region us-east-1
aws-bootstrap list amis --region us-east-1

# --region is repeatable (-r for short) to compare across regions
aws-bootstrap list amis -r us-east-1 -r us-west-2
aws-bootstrap list instance-types --prefix g5 -r us-east-1 -r eu-west-1

list instance-types shows a Quota Family column (gvt/p/dl): the AWS vCPU quota family each type draws from. A family can span several prefixes (e.g. all G/VT types, including g5, share gvt), so the suggested --family may not match your --prefix. The output then ends with copy-paste Next steps for that family (a quota show and a quota request command, both pinned to the queried region), so you can go straight from "is this type available?" to checking and raising your vCPU quota.

🖥️ Managing Instances

# Show all aws-bootstrap instances across every enabled region (including shutting-down).
# Each instance is labelled with its region.
aws-bootstrap status

# Include GPU info (CUDA toolkit + driver version, GPU name, architecture) via SSH
aws-bootstrap status --gpu

# Hide connection commands (shown by default for each running instance)
aws-bootstrap status --no-instructions

# Restrict the query to one region
aws-bootstrap status --region us-east-1

# Restrict to several regions (--region is repeatable, -r for short)
aws-bootstrap status --region us-east-1 --region us-west-2
aws-bootstrap status -r us-east-1 -r eu-west-1

# Terminate all aws-bootstrap instances (with confirmation prompt)
aws-bootstrap terminate

# Terminate but preserve EBS data volumes for reuse
aws-bootstrap terminate --keep-ebs

# Terminate by SSH alias (resolved via ~/.ssh/config)
aws-bootstrap terminate aws-gpu1

# Terminate by instance ID
aws-bootstrap terminate i-abc123

# Mix aliases and instance IDs
aws-bootstrap terminate aws-gpu1 i-def456

# Skip confirmation prompt
aws-bootstrap terminate --yes

# Remove stale SSH config entries for terminated instances
aws-bootstrap cleanup

# Preview what would be removed without modifying config
aws-bootstrap cleanup --dry-run

# Also find and delete orphan EBS data volumes
aws-bootstrap cleanup --include-ebs

# Preview orphan volumes without deleting
aws-bootstrap cleanup --include-ebs --dry-run

# Skip confirmation prompt
aws-bootstrap cleanup --yes

status --gpu reports both the installed CUDA toolkit version (from nvcc) and the maximum CUDA version supported by the driver (from nvidia-smi), so you can see at a glance whether they match:

CUDA: 12.8 (driver supports up to 13.0)
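The status line above combines two sources, and the comparison can be sketched as follows: the toolkit release comes from nvcc, the driver's maximum from the "CUDA Version:" field nvidia-smi prints in its header. The helper names are hypothetical, and treating "toolkit ≤ driver maximum" as compatible is a simplification (CUDA's minor-version compatibility rules are more nuanced).

```python
import re

_NVCC = re.compile(r"release (\d+)\.(\d+)")
_SMI = re.compile(r"CUDA Version:\s*(\d+)\.(\d+)")


def _version(pattern: re.Pattern[str], text: str) -> tuple[int, int]:
    match = pattern.search(text)
    if match is None:
        raise ValueError(f"no CUDA version matched {pattern.pattern!r}")
    return int(match.group(1)), int(match.group(2))


def cuda_status_line(nvcc_output: str, smi_output: str) -> tuple[str, bool]:
    """Return the status line and whether the driver can run this toolkit."""
    toolkit = _version(_NVCC, nvcc_output)
    driver_max = _version(_SMI, smi_output)
    line = (
        f"CUDA: {toolkit[0]}.{toolkit[1]} "
        f"(driver supports up to {driver_max[0]}.{driver_max[1]})"
    )
    return line, toolkit <= driver_max
```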

SSH aliases are managed automatically: they're created on launch, shown in status, and cleaned up on terminate. Aliases use sequential numbering (aws-gpu1, aws-gpu2, etc.) and never reuse numbers from previous instances. You can use an alias anywhere you'd use an instance ID, e.g. aws-bootstrap terminate aws-gpu1.
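One way to get "sequential, never reused" alias numbers is sketched below. Because terminated instances' Host entries are removed from ~/.ssh/config, the config alone cannot remember earlier numbers, so this sketch assumes a persisted high-water mark (`highest_ever`, hypothetical). The Host stanza uses standard OpenSSH client keywords; the exact fields aws-bootstrap writes are an assumption.

```python
import re

_ALIAS = re.compile(r"^[ \t]*Host[ \t]+aws-gpu(\d+)[ \t]*$", re.MULTILINE | re.IGNORECASE)


def next_alias(ssh_config: str, highest_ever: int = 0) -> str:
    """Pick the next aws-gpuN alias without reusing any earlier number."""
    in_config = [int(n) for n in _ALIAS.findall(ssh_config)]
    return f"aws-gpu{max([highest_ever, *in_config]) + 1}"


def host_block(alias: str, public_ip: str, key_path: str, user: str = "ubuntu") -> str:
    # A minimal OpenSSH client stanza; key_path is the private key file.
    return (
        f"Host {alias}\n"
        f"    HostName {public_ip}\n"
        f"    User {user}\n"
        f"    IdentityFile {key_path}\n"
    )
```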

EBS Data Volumes

Attach persistent EBS storage to keep datasets and model checkpoints across instance lifecycles. Volumes are mounted at /data and persist independently of the instance.

# Create a new 96 GB gp3 volume, formatted and mounted at /data
aws-bootstrap launch --ebs-storage 96

# After terminating with --keep-ebs, reattach the same volume to a new instance
aws-bootstrap terminate --keep-ebs
# Output: Preserving EBS volume: vol-0abc123...
#         Reattach with: aws-bootstrap launch --ebs-volume-id vol-0abc123...

aws-bootstrap launch --ebs-volume-id vol-0abc123def456

Key behaviors:

  • --ebs-storage and --ebs-volume-id are mutually exclusive
  • New volumes are formatted as ext4; existing volumes are mounted as-is
  • Volumes are tagged for automatic discovery by status and terminate
  • terminate deletes data volumes by default; use --keep-ebs to preserve them
  • Orphan cleanup: use aws-bootstrap cleanup --include-ebs to find and delete orphan volumes (e.g. from spot interruptions or forgotten --keep-ebs volumes). Use --dry-run to preview
  • Spot-safe: data volumes survive spot interruptions. If AWS reclaims your instance, the volume detaches automatically and can be reattached to a new instance with --ebs-volume-id
  • Automatic AZ matching: EBS volumes are tied to a single availability zone, and an instance can only attach a volume in its own AZ. When you reattach with --ebs-volume-id, the launch automatically pins the new instance to the volume's AZ, so you never hit a "wrong AZ" attach failure. (One consequence: spot capacity is then constrained to that single AZ, so a launch may need --wait to ride out a temporary shortage. A --ebs-volume-id launch targets the volume's region.)
  • Mount failures are non-fatal โ€” the instance remains usable
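The AZ-matching behavior can be sketched with the boto3 EC2 client: look up the volume's AvailabilityZone with describe_volumes and pass it as Placement to run_instances. The helper name is hypothetical and this is not the project's code; it takes the client as a parameter so any object with a compatible describe_volumes method works.

```python
from typing import Any


def placement_for_volume(ec2: Any, volume_id: str) -> dict[str, Any]:
    """Build run_instances kwargs that pin an instance to a volume's AZ.

    `ec2` is a boto3 EC2 client, e.g. boto3.client("ec2", region_name=...).
    """
    volume = ec2.describe_volumes(VolumeIds=[volume_id])["Volumes"][0]
    if volume["State"] != "available":
        # A volume still attached elsewhere ("in-use") must be detached first.
        raise RuntimeError(f"{volume_id} is {volume['State']}, not available")
    return {"Placement": {"AvailabilityZone": volume["AvailabilityZone"]}}
```

Usage: ec2.run_instances(..., **placement_for_volume(ec2, "vol-0abc123def456")) launches into the same AZ as the volume.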

EC2 vCPU Quotas

AWS accounts have service quotas that limit how many vCPUs you can run per instance family. New or lightly used accounts often have a default quota of 0 vCPUs for GPU instance families (G and VT), which will cause errors on launch:

  • Spot: MaxSpotInstanceCountExceeded
  • On-Demand: VcpuLimitExceeded

Check your current quotas (g4dn.xlarge requires at least 4 vCPUs):

# Built-in: show all GPU family quotas
aws-bootstrap quota show

# Show only G/VT family quotas
aws-bootstrap quota show --family gvt

# Show P family quotas (P2 through P6)
aws-bootstrap quota show --family p

# The active region is shown in the output header. --region is repeatable
# (-r) to compare quotas across regions:
aws-bootstrap quota show --family gvt -r us-east-1 -r us-west-2

# Or use the AWS CLI directly:
aws service-quotas get-service-quota \
  --service-code ec2 \
  --quota-code L-3819A6DF \
  --region us-west-2

Request increases:

# `aws-bootstrap quota show` prints a ready-to-run `quota request` command
# with a --desired-value above your current quota and pinned to --region.
# The desired value must EXCEED the current quota (AWS rejects <= current),
# so pick a value accordingly (8 shown as an example):
aws-bootstrap quota show --family gvt --region us-west-2
aws-bootstrap quota request --type spot --desired-value 8 --region us-west-2

# Request a P family spot quota increase
aws-bootstrap quota request --family p --type spot --desired-value 192 --region us-west-2

# --region is repeatable: submit the same increase in several regions at once.
# All target regions are validated up front: if any region's current quota is
# already >= the desired value, nothing is submitted.
aws-bootstrap quota request --type spot --desired-value 8 -r us-east-1 -r us-west-2

# Check request status (also repeatable across regions)
aws-bootstrap quota history --region us-west-2
aws-bootstrap quota history -r us-east-1 -r us-west-2

# Or use the AWS CLI directly:
aws service-quotas request-service-quota-increase \
  --service-code ec2 \
  --quota-code L-3819A6DF \
  --desired-value 8 \
  --region us-west-2
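The multi-region request rule ("validate every region first; submit nothing if any region's current quota is already at or above the desired value") can be sketched as below. The function names are hypothetical. The optional boto3 wiring uses the Service Quotas `get_service_quota` and `request_service_quota_increase` calls, with the G/VT spot code from the examples as an assumed default.

```python
from collections.abc import Callable, Sequence


def request_increase_everywhere(
    regions: Sequence[str],
    desired: float,
    get_current: Callable[[str], float],
    submit: Callable[[str, float], None],
) -> list[str]:
    """Check every region first; submit only if all of them pass."""
    current = {r: get_current(r) for r in regions}
    rejected = {r: v for r, v in current.items() if desired <= v}
    if rejected:
        details = ", ".join(f"{r} (current {v:g})" for r, v in rejected.items())
        raise ValueError(f"desired value {desired:g} must exceed the current quota in: {details}")
    for region in regions:
        submit(region, desired)
    return list(regions)


def boto3_callables(quota_code: str = "L-3819A6DF"):
    """Build get_current/submit backed by the Service Quotas API."""
    import boto3  # imported lazily so the logic above has no hard dependency

    def get_current(region: str) -> float:
        client = boto3.client("service-quotas", region_name=region)
        resp = client.get_service_quota(ServiceCode="ec2", QuotaCode=quota_code)
        return resp["Quota"]["Value"]

    def submit(region: str, desired: float) -> None:
        client = boto3.client("service-quotas", region_name=region)
        client.request_service_quota_increase(
            ServiceCode="ec2", QuotaCode=quota_code, DesiredValue=desired
        )

    return get_current, submit
```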

Quota codes may vary by region or account type. To list the actual codes in your region:

# List all G/VT-related quotas
aws service-quotas list-service-quotas \
  --service-code ec2 \
  --region us-west-2 \
  --query "Quotas[?contains(QuotaName, 'G and VT')].[QuotaCode,QuotaName,Value]" \
  --output table

Common quota codes:

Family | Type | Code | Description
--- | --- | --- | ---
G/VT | Spot | L-3819A6DF | All G and VT Spot Instance Requests
G/VT | On-Demand | L-DB2E81BA | Running On-Demand G and VT instances
P | Spot | L-7212CCBC | All P Spot Instance Requests
P | On-Demand | L-417A185B | Running On-Demand P instances
DL | Spot | L-85EED4F7 | All DL Spot Instance Requests
DL | On-Demand | L-6E869C2A | Running On-Demand DL instances
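The codes in the table above can be used directly from Python with a boto3 Service Quotas client, as in this sketch. The dictionary keys ("gvt", "spot", "on-demand") and helper names are hypothetical, and since quota codes may vary by region or account type, it's worth confirming them with list-service-quotas first. The headroom check reflects the note that a g4dn.xlarge needs 4 vCPUs.

```python
from typing import Any

# (family, market) -> Service Quotas code, copied from the table above.
QUOTA_CODES = {
    ("gvt", "spot"): "L-3819A6DF",
    ("gvt", "on-demand"): "L-DB2E81BA",
    ("p", "spot"): "L-7212CCBC",
    ("p", "on-demand"): "L-417A185B",
    ("dl", "spot"): "L-85EED4F7",
    ("dl", "on-demand"): "L-6E869C2A",
}


def current_vcpu_quota(client: Any, family: str, market: str) -> float:
    """Read a vCPU quota via a boto3 'service-quotas' client."""
    code = QUOTA_CODES[(family, market)]
    return client.get_service_quota(ServiceCode="ec2", QuotaCode=code)["Quota"]["Value"]


def fits(quota: float, vcpus_in_use: float, vcpus_needed: int) -> bool:
    # e.g. a g4dn.xlarge needs 4 vCPUs of headroom under the G/VT quota.
    return vcpus_in_use + vcpus_needed <= quota
```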

Small increases (4-8 vCPUs) are typically auto-approved within minutes. You can also request increases via the Service Quotas console. While waiting, you can test the full launch/poll/SSH flow with a non-GPU instance type:

aws-bootstrap launch --instance-type t3.medium --ami-filter "ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-*"

Claude Code Plugin

A Claude Code plugin is included in the aws-bootstrap-skill/ directory, enabling LLM coding agents to autonomously provision and manage GPU instances.

Install from GitHub

# Add the marketplace (registers this repo as a plugin source)
/plugin marketplace add promptromp/aws-bootstrap-g4dn

# Install the plugin
/plugin install aws-bootstrap-skill@promptromp-aws-bootstrap-g4dn

Install locally (from repo checkout)

claude --plugin-dir ./aws-bootstrap-skill

See aws-bootstrap-skill/README.md for details.

Additional Resources

Topic | Link
--- | ---
GPU instance pricing | instances.vantage.sh
Spot instance quotas | AWS docs
Deep Learning AMIs | AWS docs
Nsight remote GPU profiling | Guide: Nsight Compute, Nsight Systems, and Nsight VSCE on EC2

Tutorials on setting up a CUDA environment on EC2 GPU instances:



Download files


Source Distribution

aws_bootstrap_g4dn-0.16.0.tar.gz (162.1 kB)

Uploaded Source

Built Distribution


aws_bootstrap_g4dn-0.16.0-py3-none-any.whl (109.7 kB)

Uploaded Python 3

File details

Details for the file aws_bootstrap_g4dn-0.16.0.tar.gz.

File metadata

  • Download URL: aws_bootstrap_g4dn-0.16.0.tar.gz
  • Upload date:
  • Size: 162.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for aws_bootstrap_g4dn-0.16.0.tar.gz
Algorithm Hash digest
SHA256 1f83208cf3d93f6678b2d7ff051c1afb2156bbf051d2495f2b6936946d4b2052
MD5 f2b789cbab2001dfcc8a51630fd64eb0
BLAKE2b-256 b60c196e6ed3c5cd49d140d5b09d943ab28b234d2416053e83bf10ee168034d4


Provenance

The following attestation bundles were made for aws_bootstrap_g4dn-0.16.0.tar.gz:

Publisher: publish-to-pypi.yml on promptromp/aws-bootstrap-g4dn

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file aws_bootstrap_g4dn-0.16.0-py3-none-any.whl.

File metadata

File hashes

Hashes for aws_bootstrap_g4dn-0.16.0-py3-none-any.whl
Algorithm Hash digest
SHA256 596afa18dfd1f6e1c6bf5432fe8167fd737176b42a0cd1b20477c6de5733b189
MD5 fedfd9bad40f7c5d69ef226013d0ea34
BLAKE2b-256 c3e574ac695aff6af9a1c2756f52b597f9ab40f75f601f75a6f6558691fa29a2


Provenance

The following attestation bundles were made for aws_bootstrap_g4dn-0.16.0-py3-none-any.whl:

Publisher: publish-to-pypi.yml on promptromp/aws-bootstrap-g4dn

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
