❄️ brr ❄️

Opinionated research infrastructure tooling. Launch clusters, get SSH access, start building.

Features

  • Shared filesystem — All nodes share $HOME via EFS (AWS) or virtiofs (Nebius).
  • Coding tools — Install Claude Code, Codex, or Gemini CLI. Connect with e.g. brr attach dev claude.
  • Autoscaling — Ray-based cluster scaling with cached instances.
  • Project-based workflows — Per-repo cluster configs and project-specific dependencies.
  • Auto-shutdown — Monitors CPU, GPU, and SSH activity. Shuts down idle instances to save costs.
  • Dotfiles integration — Take your dev environment (vim, tmux, shell config) to every cluster node.

Prerequisites

  • uv (for installation)

Quick Start

# Install (AWS only) — quotes keep zsh from globbing the brackets
uv tool install 'brr-cli[aws]'

# Install (both providers)
# uv tool install 'brr-cli[aws,nebius]'

# Configure (interactive wizard)
brr configure      # or: brr configure nebius

# Launch a GPU instance
brr up aws:l4

# brr up nebius:h100

# Connect
brr attach aws:l4                # SSH
brr attach aws:l4 claude         # Claude Code on the cluster
brr vscode aws:l4                # VS Code remote

Built-in templates use provider:name syntax (e.g. aws:l4). Inside a project, short names like brr up dev work automatically.

Supported clouds: AWS · Nebius

Projects

For per-repo cluster configs, initialize a project:

cd my-research-repo/
brr init

This creates:

.brr/
  setup.sh          # Project-specific dependencies (shared across providers)
  aws/
    dev.yaml        # Single GPU for development
    cluster.yaml    # CPU head + GPU workers

Templates are Ray cluster YAML — edit them or add your own. Inside a project, use short names:

brr up dev              # launches .brr/aws/dev.yaml
brr up cluster          # launches .brr/aws/cluster.yaml
brr attach dev          # SSH into dev cluster
brr down dev            # tear down
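
A minimal dev.yaml, in Ray cluster-launcher YAML, might look roughly like this (the field values are illustrative, not the shipped template):

# Illustrative sketch of a .brr/aws/dev.yaml — not the shipped template.
cluster_name: dev
max_workers: 0
provider:
  type: aws
  region: us-east-1
available_node_types:
  head:
    node_config:
      InstanceType: gr6.4xlarge   # matches the aws:l4 built-in
head_node_type: head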

If your project uses uv, brr init automatically adds brr-cli and ray to a brr dependency group. The cluster uses your project-locked versions — no manual setup needed.
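
If you prefer to manage the group yourself, the manual equivalent would be roughly (assuming the aws extra):

uv add --group brr 'brr-cli[aws]' ray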

All global config lives in ~/.brr/config.env.
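
As a sketch, using only keys that appear elsewhere in this README, that file might contain:

DOTFILES_REPO="https://github.com/user/dotfiles"
IDLE_SHUTDOWN_ENABLED="true"
IDLE_SHUTDOWN_TIMEOUT_MIN="30"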

Templates

See docs/templates.md for the full template reference (placeholders, injection, overrides, Nebius fields).

Built-in templates

Template          Instance                            GPU       Workers
aws:cpu           t3.2xlarge                                    0-2
aws:l4            gr6.4xlarge                         1x L4
aws:h100          p5.4xlarge                          1x H100
aws:cpu-l4        t3.2xlarge + g6.4xlarge             1x L4     0-4
nebius:cpu        8vcpu-32gb                                    0-2
nebius:h100       1gpu-16vcpu-200gb                   1x H100
nebius:cpu-h100s  8vcpu-32gb + 8gpu-128vcpu-1600gb    8x H100   0-4

Overrides

Override template values inline:

brr up aws:cpu instance_type=t3.xlarge max_workers=4
brr up aws:l4 spot=true
brr up dev region=us-west-2

Preview the rendered config without launching:

brr up dev --dry-run

See available overrides for a template:

brr templates show dev

Multi-provider

Use the provider prefix for built-in templates:

brr up aws:l4
brr up nebius:h100
brr attach nebius:h100
brr down nebius:h100

Both providers can run simultaneously. For projects with multiple providers, use the prefix: brr up aws:dev.

Customization

Node setup

The built-in setup.sh runs on every node boot. It installs packages, mounts shared storage, sets up Python/Ray, GitHub SSH keys, AI coding tools, dotfiles, and the idle shutdown daemon. It updates automatically when you upgrade brr.

Project-specific dependencies go in .brr/setup.sh (created by brr init), which runs after the global setup.
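
For example, a hypothetical .brr/setup.sh that installs a system package and builds the project venv:

#!/usr/bin/env bash
# Hypothetical project setup — runs on every node after the global setup.
set -euo pipefail
sudo apt-get install -y ffmpeg   # example system dependency
uv sync --all-groups             # build the project venv from the lockfile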

uv integration

brr wraps the uv binary to route virtual environments away from the shared EFS home directory:

Environment variable     Value                  Purpose
UV_CACHE_DIR             /tmp/uv                Download cache (per-instance)
UV_PYTHON_INSTALL_DIR    /tmp/uv/python         Managed Python builds (per-instance)
UV_PROJECT_ENVIRONMENT   /tmp/venvs/{project}   Project venvs (per-instance)

The wrapper lives at ~/.local/bin/uv and delegates to the real binary at ~/.local/lib/uv. Both persist on EFS so new instances reuse them without reinstalling. Only caches, Python builds, and venvs are per-instance (rebuilt on boot from lockfiles).
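
Conceptually, the wrapper is a small exec shim; a minimal sketch (not the actual script):

#!/usr/bin/env bash
# Pin per-instance paths, then delegate to the real binary on EFS.
export UV_CACHE_DIR=/tmp/uv
export UV_PYTHON_INSTALL_DIR=/tmp/uv/python
# UV_PROJECT_ENVIRONMENT is derived per project, e.g. /tmp/venvs/myproj
exec "$HOME/.local/lib/uv" "$@"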

For uv-managed projects, Ray runs inside the project venv via uv run --group brr ray start. For non-uv clusters, Ray runs from a standalone venv at /tmp/brr/venv.

AI coding tools

Install AI coding assistants on every cluster node:

brr configure tools    # select Claude Code, Codex, and/or Gemini CLI

Then connect and start coding:

brr up dev
brr attach dev claude

Dotfiles

Set a dotfiles repo to sync your dev environment to every node:

brr config set DOTFILES_REPO "https://github.com/user/dotfiles"

The repo is cloned to ~/dotfiles and installed via install.sh (if present) or GNU Stow.
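
The install step presumably reduces to something like this sketch:

cd ~/dotfiles
if [ -x install.sh ]; then
  ./install.sh   # repo-provided installer takes precedence
else
  stow .         # otherwise symlink into $HOME with GNU Stow
fi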

Image baking

Bake the global setup into AMIs/images for fast boot:

brr bake aws          # bake both CPU + GPU AMIs
brr bake status       # check if baked images are up to date

After baking, clusters boot from the pre-built image and only project-specific dependencies need to be installed. brr up warns when setup.sh has changed since the last bake.

Idle shutdown

A systemd daemon monitors CPU, GPU, and SSH activity. When all signals are idle for the configured timeout, the instance shuts down.

Configure in ~/.brr/config.env:

IDLE_SHUTDOWN_ENABLED="true"
IDLE_SHUTDOWN_TIMEOUT_MIN="30"
IDLE_SHUTDOWN_CPU_THRESHOLD="10"
IDLE_SHUTDOWN_GRACE_MIN="15"

The grace period prevents shutdown during initial setup. Monitor on a node with journalctl -u idle-shutdown -f.
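
As a rough illustration of the decision logic (a sketch, not the shipped daemon), each polling interval might do:

# All variable names below are illustrative.
ssh_sessions=$(who | wc -l)
gpu_util=$(nvidia-smi --query-gpu=utilization.gpu \
  --format=csv,noheader,nounits 2>/dev/null | sort -nr | head -1)
# Since-boot average for brevity; a real daemon diffs successive samples.
cpu_util=$(awk '/^cpu /{print int(100*($2+$4)/($2+$4+$5))}' /proc/stat)
if [ "$ssh_sessions" -eq 0 ] && [ "${gpu_util:-0}" -lt 10 ] \
   && [ "$cpu_util" -lt "$IDLE_SHUTDOWN_CPU_THRESHOLD" ]; then
  idle_min=$((idle_min + 1))
else
  idle_min=0
fi
[ "$idle_min" -ge "$IDLE_SHUTDOWN_TIMEOUT_MIN" ] && sudo shutdown -h now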

Node caching

By default, Nebius nodes are deleted on scale-down. Unlike AWS, stopped Nebius instances still incur disk charges, so deleting is cheaper.

To keep nodes stopped instead (faster restart, but you pay for disks while idle), enable caching in your template's provider config:

provider:
  cache_stopped_nodes: true

AWS nodes are cached (stopped) by default.
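
Conversely, to delete AWS nodes on scale-down instead of stopping them, you would presumably flip the same flag:

provider:
  cache_stopped_nodes: false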

Commands

Command                               Description
brr up TEMPLATE [OVERRIDES...]        Launch or update a cluster (aws:l4, dev, or path.yaml)
brr up TEMPLATE --dry-run             Preview rendered config without launching
brr down TEMPLATE                     Stop a cluster (instances preserved for fast restart)
brr down TEMPLATE --delete            Terminate all instances and remove staging files
brr attach TEMPLATE [COMMAND]         SSH into head node, optionally run a command (e.g. claude)
brr list [--all]                      List clusters (project-scoped by default, --all for everything)
brr clean [TEMPLATE]                  Terminate stopped (cached) instances
brr vscode TEMPLATE                   Open VS Code on a running cluster
brr templates list                    List built-in templates
brr templates show TEMPLATE           Show template config and overrides
brr init                              Initialize a project (interactive provider selection)
brr configure [cloud|tools|general]   Interactive setup (cloud provider, AI tools, settings)
brr config [list|get|set|path]        View and manage configuration
brr bake [aws|nebius]                 Bake setup into cloud images
brr bake status                       Check if baked images are up to date
brr completion [bash|zsh|fish]        Shell completion (--install to add to shell rc)
brr nuke [aws|nebius]                 Tear down all cloud resources

Cloud Setup

AWS Setup

  1. Attach the IAM policy to your IAM user
  2. Install the AWS CLI and run aws configure
  3. (Optional) For GitHub SSH access on clusters, authenticate the GitHub CLI:
    gh auth login
    gh auth refresh -h github.com -s admin:public_key
    
  4. Run the setup wizard:
    brr configure aws
    

Nebius Setup

  1. Install the Nebius CLI and run nebius init
  2. Create a service account with editor permissions:
    TENANT_ID="<your-tenant-id>"  # from console.nebius.com → Administration
    
    SA_ID=$(nebius iam service-account create \
      --name brr-cluster --format json | jq -r '.metadata.id')
    
    EDITORS_GROUP_ID=$(nebius iam group get-by-name \
      --name editors --parent-id $TENANT_ID --format json | jq -r '.metadata.id')
    
    nebius iam group-membership create \
      --parent-id $EDITORS_GROUP_ID --member-id $SA_ID
    
  3. Generate credentials:
    mkdir -p ~/.nebius
    nebius iam auth-public-key generate \
      --service-account-id $SA_ID --output ~/.nebius/credentials.json
    
  4. Run the setup wizard:
    brr configure nebius
    

Acknowledgments

This project started as a fork of aws_wiz by Bes and has been inspired by discussions with colleagues from the Encode: AI for Science Fellowship.
