CLI for running GPU workloads, managing remote workspaces, and evaluating/optimizing kernels
Wafer CLI
Run GPU workloads, optimize kernels, and query GPU documentation.
Getting Started
# Install
cd apps/wafer-cli && uv sync
# Use staging (workspaces and other features require staging)
wafer config set api.environment staging
# Login
wafer login
# Run a command on a remote GPU
wafer remote-run -- nvidia-smi
Commands
wafer login / wafer logout / wafer whoami
Authenticate with GitHub OAuth.
wafer login # Opens browser for GitHub OAuth
wafer whoami # Show current user
wafer logout # Remove credentials
wafer remote-run
Run any command on a remote GPU.
wafer remote-run -- nvidia-smi
wafer remote-run --upload-dir ./my_code -- python3 train.py
wafer workspaces
Create and manage persistent GPU environments.
Available GPUs:
- MI300X: AMD Instinct MI300X (192GB HBM3, ROCm)
- B200: NVIDIA Blackwell B200 (180GB HBM3e, CUDA), the default
wafer workspaces list
wafer workspaces create my-workspace --gpu B200 --wait # NVIDIA B200
wafer workspaces create amd-dev --gpu MI300X # AMD MI300X
wafer workspaces ssh <workspace-id>
wafer workspaces delete <workspace-id>
wafer agent
AI assistant for GPU kernel development. Helps with CUDA/Triton optimization, documentation queries, and performance analysis.
wafer agent "What is TMEM in CuTeDSL?"
wafer agent -s "optimize this kernel" < kernel.py
wafer evaluate
Evaluate kernel correctness and performance against a reference implementation.
Functional format (default):
# Generate template files
wafer evaluate make-template ./my-kernel
# Run evaluation
wafer evaluate --impl kernel.py --reference ref.py --test-cases tests.json --benchmark
The implementation must define custom_kernel(inputs); the reference must define ref_kernel(inputs) and generate_input(**params).
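The contract above can be sketched in a single file for illustration. This is a minimal sketch, not the template that wafer evaluate make-template generates: the size and seed parameters, the list-based data, and the local check helper are all assumptions made so the example runs without a GPU. Real implementations would presumably operate on GPU tensors and live in separate files (kernel.py and ref.py).

```python
import random

# ref.py side (hypothetical contents): input generator plus reference kernel.
def generate_input(**params):
    """Build inputs from one test case's parameters (parameter names are assumed)."""
    rng = random.Random(params.get("seed", 0))
    size = params.get("size", 8)
    x = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    y = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    return x, y

def ref_kernel(inputs):
    """Straightforward reference: elementwise add."""
    x, y = inputs
    return [a + b for a, b in zip(x, y)]

# kernel.py side (hypothetical contents): the implementation under test.
def custom_kernel(inputs):
    """Candidate implementation; must match ref_kernel on the same inputs."""
    x, y = inputs
    return list(map(sum, zip(x, y)))

def check(params, tol=1e-9):
    """Local sanity check only; this helper is not part of the Wafer CLI."""
    inputs = generate_input(**params)
    expected = ref_kernel(inputs)
    actual = custom_kernel(inputs)
    return len(actual) == len(expected) and all(
        abs(a - e) <= tol for a, e in zip(actual, expected)
    )

if __name__ == "__main__":
    print(check({"size": 16, "seed": 1}))
```

Running a check like this locally before calling wafer evaluate can catch signature mistakes early, since both kernels receive exactly what generate_input returns.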
KernelBench format (ModelNew class):
# Extract a KernelBench problem as template
wafer evaluate kernelbench make-template level1/1
# Run evaluation
wafer evaluate kernelbench --impl my_kernel.py --reference problem.py --benchmark
The implementation must define class ModelNew(nn.Module); the reference must define class Model, get_inputs(), and get_init_inputs().
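A minimal sketch of both sides of the KernelBench layout, shown in one file for brevity. It assumes the usual KernelBench convention that get_inputs() returns the positional arguments for forward() and get_init_inputs() returns the positional arguments for the constructor. The ReLU problem, the tensor shape, and the use of torch.clamp as a stand-in for a custom kernel are illustrative choices, not taken from the Wafer templates.

```python
import torch
import torch.nn as nn

# problem.py side: the reference model and its input factories.
class Model(nn.Module):
    """Reference: plain PyTorch ReLU."""
    def __init__(self):
        super().__init__()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x)

def get_inputs():
    """Positional arguments for forward() (shape is an arbitrary example)."""
    return [torch.randn(16, 64)]

def get_init_inputs():
    """Positional arguments for the constructor; none are needed here."""
    return []

# my_kernel.py side: the implementation under test.
class ModelNew(nn.Module):
    """A real submission would call a custom CUDA/Triton kernel;
    torch.clamp stands in so the sketch runs on CPU."""
    def __init__(self):
        super().__init__()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.clamp(x, min=0.0)

if __name__ == "__main__":
    torch.manual_seed(0)
    inputs = get_inputs()
    ref = Model(*get_init_inputs())(*inputs)
    out = ModelNew(*get_init_inputs())(*inputs)
    print(torch.allclose(ref, out))
```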
wafer wevin -t ask-docs
Query GPU documentation using the docs template.
wafer wevin -t ask-docs --json -s "What causes bank conflicts in shared memory?"
wafer corpus
Download documentation to the local filesystem so that agents can search it.
wafer corpus list
wafer corpus download cuda-programming-guide
Customization
wafer remote-run options
wafer remote-run --image pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel -- python3 script.py
wafer remote-run --require-hwc -- ncu --set full python3 bench.py # Hardware counters for NCU
wafer evaluate options
# --target selects a specific GPU target, --benchmark measures performance,
# --profile enables torch.profiler + NCU
wafer evaluate --impl k.py --reference r.py --test-cases t.json \
  --target vultr-b200 \
  --benchmark \
  --profile
wafer push for multi-command workflows
WORKSPACE=$(wafer push ./project)
wafer remote-run --workspace-id $WORKSPACE -- python3 test1.py
wafer remote-run --workspace-id $WORKSPACE -- python3 test2.py
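The same push-once, run-many pattern could be scripted from Python. The sketch below uses only the two commands shown above and assumes, as the shell snippet implies, that wafer push prints the workspace ID on stdout. The helper names and the runner parameter (which lets the functions be exercised without the CLI installed) are hypothetical.

```python
import subprocess

def push_project(path, runner=subprocess.run):
    """Upload a directory once and return the workspace ID read from stdout."""
    result = runner(
        ["wafer", "push", path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()

def remote_run_argv(workspace_id, command):
    """Build the argv for one command in an already-pushed workspace."""
    return ["wafer", "remote-run", "--workspace-id", workspace_id, "--", *command]

def run_all(path, commands, runner=subprocess.run):
    """Push once, then run each command against the same workspace."""
    workspace_id = push_project(path, runner=runner)
    for command in commands:
        # check=True stops the sequence at the first failing command.
        runner(remote_run_argv(workspace_id, command), check=True)
    return workspace_id
```

For example, run_all("./project", [["python3", "test1.py"], ["python3", "test2.py"]]) would mirror the three shell lines above.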
Profile analysis
wafer nvidia ncu analyze profile.ncu-rep
wafer nvidia nsys analyze profile.nsys-rep
Advanced
Local targets
Bypass the API and SSH directly to your own GPUs:
wafer targets list
wafer targets add ./my-gpu.toml
wafer targets default my-gpu
Defensive evaluation
Detect evaluation hacking (stream injection, lazy evaluation, etc.):
wafer evaluate --impl k.py --reference r.py --test-cases t.json --benchmark --defensive
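To make the "lazy evaluation" failure mode concrete, here is a toy sketch in plain Python. It does not show how --defensive works internally, which is not described here; the LazyResult class and both timing helpers are hypothetical. It only illustrates why a benchmark that times the call without forcing the result can be fooled.

```python
import time

class LazyResult:
    """Defers the real work until the value is actually read."""
    def __init__(self, thunk):
        self._thunk = thunk
        self._value = None
        self.computed = False

    def materialize(self):
        if not self.computed:
            self._value = self._thunk()
            self.computed = True
        return self._value

def lazy_kernel(inputs):
    """Returns immediately; the sum of squares is computed only on demand."""
    return LazyResult(lambda: sum(v * v for v in inputs))

def naive_time(kernel, inputs):
    """Times only the call, so deferred work is never measured."""
    start = time.perf_counter()
    result = kernel(inputs)
    return time.perf_counter() - start, result

def defensive_time(kernel, inputs):
    """Forces the result inside the timed region before stopping the clock."""
    start = time.perf_counter()
    result = kernel(inputs)
    value = result.materialize() if isinstance(result, LazyResult) else result
    return time.perf_counter() - start, value
```

With naive_time, the returned object has not done any work when the clock stops; defensive_time includes the deferred computation in the measurement.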
Other tools
wafer perfetto <trace.json> --query "SELECT * FROM slice" # Perfetto SQL queries
wafer capture ./script.py # Capture execution snapshot
wafer compiler-analyze kernel.ptx # Analyze PTX/SASS
ROCm profiling (AMD GPUs)
wafer rocprof-sdk ...
wafer rocprof-systems ...
wafer rocprof-compute ...
Shell Completion
Enable tab completion for commands, options, and target names:
# Install completion (zsh/bash/fish)
wafer --install-completion
# Then restart your terminal, or source your shell config:
source ~/.zshrc # or ~/.bashrc
Now you can tab-complete:
- Commands: wafer eva<TAB> → wafer evaluate
- Options: wafer evaluate --<TAB>
- Target names: wafer evaluate --target v<TAB> → wafer evaluate --target vultr-b200
- File paths: wafer evaluate --impl ./<TAB>
AI Assistant Skills
Install the Wafer CLI skill to make wafer commands discoverable by your AI coding assistant:
# Install for all supported tools (Claude Code, Codex CLI, Cursor)
wafer skill install
# Install for a specific tool
wafer skill install -t cursor # Cursor
wafer skill install -t claude # Claude Code
wafer skill install -t codex # Codex CLI
# Check installation status
wafer skill status
# Uninstall
wafer skill uninstall
Installing from GitHub (Cursor)
You can also install the skill directly from GitHub in Cursor:
- Open Cursor Settings (Cmd+Shift+J / Ctrl+Shift+J)
- Navigate to Rules → Add Rule → Remote Rule (Github)
- Enter: https://github.com/wafer-ai/skills
- Cursor will automatically discover skills in .cursor/skills/
The skill provides comprehensive guidance for GPU kernel development, including documentation lookup, trace analysis, kernel evaluation, and optimization workflows.
Requirements
- Python 3.10+
- GitHub account (for authentication)
Download files
File details
Details for the file wafer_cli-0.2.34.tar.gz.
File metadata
- Download URL: wafer_cli-0.2.34.tar.gz
- Upload date:
- Size: 252.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f46bed91078b428384f1f2a1b4a7f7acae1a2d6ef6172fc675b3aa1087227f7f |
| MD5 | 5fa0e7c397fd5ca77cb5b828942470d0 |
| BLAKE2b-256 | 0a6e80cb8ce53f92266f18b51930fb1c50340036c2c2c21c879876b83d7fcb59 |
File details
Details for the file wafer_cli-0.2.34-py3-none-any.whl.
File metadata
- Download URL: wafer_cli-0.2.34-py3-none-any.whl
- Upload date:
- Size: 233.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 32f9faf5f89c6cfd7ccf0205be0b2d0ffbf9dd819638b3d2ca91bd43e2df539a |
| MD5 | 28b25964acaf2bf7b4f2ca50dfe51447 |
| BLAKE2b-256 | 69438a80219eee08126334310cdbd28f4988e234edb33b848bc39fd58bbe34c4 |