Python library to manage RunPod pods
RunPodManager
A Python library for seamless RunPod GPU pod management and workflow automation
RunPodManager simplifies creating, managing, and running workflows on RunPod GPU pods. Whether you're training machine learning models, running experiments, or using remote GPU compute for something else, RunPodManager provides a Python interface that covers pod provisioning, SSH operations, and port forwarding.
Features
- Complete Pod Lifecycle Management: Create, connect, stop, resume, and terminate pods programmatically
- Bidirectional Data Transfer: Upload and download files and directories between local machine and pods via SCP
- Remote Command Execution: Run commands on pods with real-time output streaming
- SSH Port Forwarding: Forward ports for services like Jupyter, TensorBoard, or web applications
- Background Process Management: Launch long-running processes with automatic port forwarding
- Smart Pod State Checking: Verify pod existence and running status before operations
- Flexible Configuration: Support for custom Docker images, GPU types, volumes, and environment variables
Installation
pip install runpodmanager
Quick Start
from runpodmanager import RunPodManager
import os
import time
# Initialize with your RunPod API key
manager = RunPodManager(api_key=os.getenv("RUNPOD_API_KEY"))
# Create a pod
pod_config = {
    "name": "my-gpu-pod",
    "image_name": "runpod/pytorch:2.8.0-py3.11-cuda12.8.1-cudnn-devel-ubuntu22.04",
    "gpu_type_id": "NVIDIA RTX 2000 Ada Generation",
    "gpu_count": 1,
    "cloud_type": "ALL",
    "support_public_ip": True,
    "start_ssh": True,
}
manager.create_pod(pod_config)
# Wait for pod to be ready
while not manager.is_pod_running():
    print("Waiting for pod to start...")
    time.sleep(5)
# Execute a command
manager.execute_command("nvidia-smi")
# Terminate when done
manager.terminate_pod()
Usage Guide
Initialization
You can provide your RunPod API key in two ways:
# Option 1: Pass directly
manager = RunPodManager(api_key="your-api-key")
# Option 2: Set environment variable RUNPOD_API_KEY
manager = RunPodManager()
Creating a Pod
Create a fully configured pod with custom settings:
pod_config = {
    "name": "training-pod",
    "image_name": "runpod/pytorch:2.8.0-py3.11-cuda12.8.1-cudnn-devel-ubuntu22.04",
    "gpu_type_id": "NVIDIA RTX 2000 Ada Generation",
    "gpu_count": 1,
    "cloud_type": "ALL",  # Options: "ALL", "SECURE", "COMMUNITY"
    "support_public_ip": True,
    "start_ssh": True,
    "volume_in_gb": 50,
    "container_disk_in_gb": 50,
    "min_vcpu_count": 1,
    "docker_args": "",
    "ports": "8888/http,6006/http,22/tcp",
    "env": {
        "HUGGINGFACE_TOKEN": os.getenv("HUGGINGFACE_TOKEN"),
        "WANDB_API_KEY": os.getenv("WANDB_API_KEY"),
    },
}
manager.create_pod(pod_config)
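Note that os.getenv returns None for a variable that is not set locally, so the env dictionary above can end up with None values. How RunPodManager handles those is not documented here, so one cautious option is to drop unset variables before building the configuration. A minimal sketch; env_from_local is a hypothetical helper, not part of the library:

```python
import os


def env_from_local(names):
    """Collect the named variables from the local environment, skipping unset ones."""
    env = {}
    for name in names:
        value = os.getenv(name)
        if value is not None:  # os.getenv returns None for unset variables
            env[name] = value
    return env
```

It could then be used as "env": env_from_local(["HUGGINGFACE_TOKEN", "WANDB_API_KEY"]) in pod_config.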
Connecting to an Existing Pod
# Connect to a pod you created previously
manager.connect_to_pod(pod_id="your-pod-id")
# Check if pod is running
if manager.is_pod_running():
    print("Pod is ready!")
Pod Lifecycle Management
# Stop a running pod (saves costs when not in use)
manager.stop_pod()
# Resume a stopped pod
manager.resume_pod()
# Terminate a pod permanently
manager.terminate_pod()
# Check pod status
if manager.pod_exists():
    print("Pod exists")
if manager.is_pod_running():
    print("Pod is running")
Transferring Data to Pod
Upload local files or directories to your pod:
# Transfer a single file
manager.transfer_data_to_pod(
    local_path="./model.py",
    remote_path="/workspace/"
)
# Transfer a directory recursively
manager.transfer_data_to_pod(
    local_path="./dataset/",
    remote_path="/workspace/data/"
)
# Transfer to home directory
manager.transfer_data_to_pod(
    local_path="./training_script.py",
    remote_path=""  # Defaults to home directory
)
Downloading Data from Pod
Download files or directories from your pod to your local machine:
# Download a single file
manager.download_data_from_pod(
    remote_path="/workspace/model.pth",
    local_path="./models/"
)
# Download a directory recursively
manager.download_data_from_pod(
    remote_path="/workspace/results/",
    local_path="./local_results/"
)
# Download to current directory
manager.download_data_from_pod(
    remote_path="/workspace/logs/training.log",
    local_path="."  # Defaults to current directory
)
# Download training outputs
manager.download_data_from_pod(
    remote_path="runs",  # TensorBoard logs
    local_path="./tensorboard_logs/"
)
Executing Commands
Foreground Execution (with real-time output)
# Run a command and see output in real-time
manager.execute_command("pip install transformers accelerate")
# Execute a training script
manager.execute_command("python train.py --epochs 10 --batch-size 32")
Background Execution
# Run a command in the background
manager.execute_command(
    command="python long_running_task.py",
    background=True
)
Background Execution with Port Forwarding
Perfect for Jupyter, TensorBoard, or web applications:
# Start TensorBoard with port forwarding
tb_process = manager.execute_command(
    command="tensorboard --logdir=runs --port=6006 --bind_all",
    background=True,
    port_forward=(6006, 6006)  # (local_port, remote_port)
)
# Now access TensorBoard at http://localhost:6006
print("TensorBoard running at http://localhost:6006")
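The API reference states that execute_command returns a subprocess.Popen object when background=True is combined with port_forward. Assuming that object is the local process holding the SSH tunnel (an assumption, the source does not say what it wraps), it can be shut down once the forwarded service is no longer needed. A minimal sketch; stop_process is a hypothetical helper:

```python
import subprocess


def stop_process(process, timeout=5.0):
    """Terminate a subprocess.Popen, escalating to kill if it does not exit in time."""
    if process.poll() is not None:
        return process.returncode  # already exited
    process.terminate()
    try:
        return process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()  # did not exit after terminate(); force it
        return process.wait()
```

For example, stop_process(tb_process) after you are done with TensorBoard. Whether this also stops the remote command is not something this sketch can guarantee.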
Complete Workflow Example
Here's a complete example that demonstrates a typical machine learning workflow:
from runpodmanager import RunPodManager
import time
import os
# Initialize
manager = RunPodManager()
# Create a pod with TensorBoard port exposed
pod_config = {
    "name": "ml-training-pod",
    "image_name": "runpod/pytorch:2.8.0-py3.11-cuda12.8.1-cudnn-devel-ubuntu22.04",
    "gpu_type_id": "NVIDIA RTX 2000 Ada Generation",
    "gpu_count": 1,
    "cloud_type": "ALL",
    "support_public_ip": True,
    "start_ssh": True,
    "volume_in_gb": 50,
    "container_disk_in_gb": 50,
    "ports": "6006/http,22/tcp",
}
print("Creating pod...")
manager.create_pod(pod_config)
# Wait for pod to be ready
print("Waiting for pod to start...")
while not manager.is_pod_running():
    time.sleep(5)
print("Pod is running!")
# Transfer training script
print("Transferring training script...")
manager.transfer_data_to_pod(
    local_path="./train.py",
    remote_path="/workspace/"
)
# Install dependencies
print("Installing dependencies...")
manager.execute_command("pip install tensorboard torch torchvision")
# Start TensorBoard with port forwarding
print("Starting TensorBoard...")
tb_process = manager.execute_command(
    command="tensorboard --logdir=runs --port=6006 --bind_all",
    background=True,
    port_forward=(6006, 6006)
)
print("TensorBoard available at http://localhost:6006")
# Run training
print("Starting training...")
manager.execute_command("cd /workspace && python train.py")
# Download training results
print("Downloading training results...")
manager.download_data_from_pod(
    remote_path="runs",
    local_path="./training_results/"
)
# Training complete, terminate pod
print("Training complete! Terminating pod...")
manager.terminate_pod()
print("Done!")
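In the workflow above, an exception in any step (a failed transfer, a crashed training script) would skip terminate_pod() and leave the pod running and billing. One way to guard against that is a small context manager that always terminates the pod on exit. This is a sketch built only on the create_pod and terminate_pod methods documented here; temporary_pod is a hypothetical name, not part of the library:

```python
from contextlib import contextmanager


@contextmanager
def temporary_pod(manager, pod_config):
    """Create a pod and terminate it on exit, even if the body raises."""
    manager.create_pod(pod_config)
    try:
        yield manager
    finally:
        # Runs on normal exit and on exceptions alike.
        manager.terminate_pod()
```

Usage would look like `with temporary_pod(manager, pod_config) as m: ...`, with the transfer, training, and download steps inside the block. Because termination is permanent, results need to be downloaded before the block ends.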
API Reference
RunPodManager(api_key: Optional[str] = None)
Initialize the RunPodManager.
Parameters:
api_key (str, optional): RunPod API key. If not provided, reads from the RUNPOD_API_KEY environment variable.
Raises:
ValueError: If API key is not provided and not found in environment variables.
create_pod(pod_config: dict) -> None
Creates a new RunPod pod.
Parameters:
pod_config (dict): Pod configuration dictionary. See Configuration Options below.
Returns: None (sets self.pod_id)
connect_to_pod(pod_id: str) -> None
Connects to an existing pod.
Parameters:
pod_id (str): The ID of the pod to connect to.
Raises:
ValueError: If the pod does not exist.
stop_pod() -> None
Stops the current pod (can be resumed later).
resume_pod() -> None
Resumes a stopped pod with the same GPU configuration.
terminate_pod() -> None
Permanently terminates the current pod.
pod_exists() -> bool
Checks if the current pod exists.
Returns: True if pod exists, False otherwise.
is_pod_running() -> bool
Checks if the current pod is running.
Returns: True if pod is running, False otherwise.
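The polling loops shown in this README call is_pod_running() indefinitely, which would hang forever if the pod never starts. A bounded variant is sketched below; wait_until_running, its timeout, and its interval are illustrative choices, not part of the library:

```python
import time


def wait_until_running(manager, timeout=600.0, interval=5.0):
    """Poll manager.is_pod_running() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while not manager.is_pod_running():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pod not running after {timeout} seconds")
        time.sleep(interval)
```

Calling wait_until_running(manager) in place of the bare while loop surfaces a TimeoutError instead of blocking without limit.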
transfer_data_to_pod(local_path: str, remote_path: str = "") -> None
Transfers local files or directories to the pod via SCP.
Parameters:
local_path (str): Path to local file or directory.
remote_path (str, optional): Destination path on pod. Defaults to home directory.
Raises:
ValueError: If pod is not running.
Exception: If transfer fails.
download_data_from_pod(remote_path: str, local_path: str = ".") -> None
Downloads files or directories from the pod to local machine via SCP.
Parameters:
remote_path (str): Path to file or directory on the pod.
local_path (str, optional): Local destination path. Defaults to current directory.
Raises:
ValueError: If pod is not running.
Exception: If download fails.
execute_command(command: str, background: bool = False, port_forward: tuple[int, int] = None)
Executes a command on the pod via SSH.
Parameters:
command (str): Command to execute.
background (bool, optional): If True, runs command in background. Default is False.
port_forward (tuple[int, int], optional): Tuple of (local_port, remote_port) for SSH port forwarding.
Returns:
subprocess.Popen object if background=True with port_forward
Return code (int) otherwise
Raises:
ValueError: If pod is not running.
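Since a foreground execute_command call is documented as returning an integer return code, a script can stop early when a step fails instead of continuing with a broken environment. The wrapper below is a sketch; run_checked is a hypothetical helper, and treating any non-zero code as failure is the usual shell convention rather than something this README specifies:

```python
def run_checked(manager, command):
    """Run a foreground command and raise if it reports a non-zero return code."""
    return_code = manager.execute_command(command)
    if return_code != 0:
        raise RuntimeError(f"command failed with return code {return_code}: {command}")
    return return_code
```

For example, run_checked(manager, "pip install transformers accelerate") would raise before the training step if the install failed.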
Configuration Options
The pod_config dictionary supports the following options:
| Parameter | Type | Description |
|---|---|---|
| name | str | Name for your pod |
| image_name | str | Docker image (e.g., runpod/pytorch:2.8.0-py3.11-cuda12.8.1-cudnn-devel-ubuntu22.04) |
| gpu_type_id | str | GPU type (e.g., NVIDIA RTX 2000 Ada Generation, NVIDIA A100 80GB PCIe) |
| gpu_count | int | Number of GPUs |
| cloud_type | str | "ALL", "SECURE", or "COMMUNITY" |
| support_public_ip | bool | Enable public IP address |
| start_ssh | bool | Enable SSH access (required for RunPodManager operations) |
| volume_in_gb | int | Persistent volume size in GB |
| container_disk_in_gb | int | Container disk size in GB |
| min_vcpu_count | int | Minimum vCPU count |
| docker_args | str | Additional Docker arguments |
| ports | str | Port mappings (e.g., "8888/http,6006/http,22/tcp") |
| env | dict | Environment variables |
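The ports option is a comma-separated string of port/protocol pairs, which is easy to mistype by hand. A small helper can assemble it from a list. This is a sketch; build_ports is a hypothetical name, and it does not validate the protocol because the README only shows "http" and "tcp" as examples without listing the accepted set:

```python
def build_ports(mappings):
    """Join (port, protocol) pairs into a 'port/protocol,...' string."""
    parts = []
    for port, protocol in mappings:
        if not 1 <= int(port) <= 65535:
            raise ValueError(f"port out of range: {port}")
        parts.append(f"{int(port)}/{protocol}")
    return ",".join(parts)
```

For instance, build_ports([(8888, "http"), (6006, "http"), (22, "tcp")]) produces the string used in the examples above.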
Best Practices
- Always Set SSH: Ensure start_ssh: True is set in your pod configuration, as it's required for all RunPodManager operations.
- Wait for Pod Ready: Always check is_pod_running() before executing commands or transferring data:
    while not manager.is_pod_running():
        time.sleep(5)
- Use Environment Variables: Store sensitive data like API keys in environment variables:
    pod_config = {
        "env": {
            "API_KEY": os.getenv("MY_API_KEY")
        }
    }
- Stop vs Terminate: Use stop_pod() to pause a pod and save costs, then resume_pod() later. Use terminate_pod() only when completely done.
- Port Forwarding for Services: Use background execution with port forwarding for interactive services:
    manager.execute_command(
        "jupyter lab --ip=0.0.0.0 --port=8888 --no-browser",
        background=True,
        port_forward=(8888, 8888)
    )
- Transfer Before Execute: Always transfer your code/data before running commands:
    manager.transfer_data_to_pod("./code", "/workspace/")
    manager.execute_command("cd /workspace/code && python main.py")
- Download Results After Processing: Remember to download your results before terminating the pod:
    # Download model checkpoints, logs, and results
    manager.download_data_from_pod("/workspace/results", "./local_results")
    manager.download_data_from_pod("runs", "./tensorboard_logs")
    manager.terminate_pod()
Troubleshooting
SSH Connection Issues
If you encounter SSH connection errors:
- Ensure start_ssh: True is set in your pod configuration
- Wait for the pod to be fully running with is_pod_running()
- Check that support_public_ip: True is set
Port Forwarding Not Working
- Verify the port is exposed in pod configuration: "ports": "6006/http,22/tcp"
- Ensure you're using background=True with port_forward
- Check that no other service is using the local port
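The last check in the list above can be done from Python before starting the forward. The sketch below tries to bind the local port and reports whether that succeeds; is_local_port_free is a hypothetical helper, and it assumes the forward will listen on 127.0.0.1:

```python
import socket


def is_local_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # typically "address already in use"
    return True
```

The result is only a snapshot: another process could still claim the port between this check and the start of the forward.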
File Transfer Failures
- Confirm the pod is running before transferring
- Verify local file paths are correct
- Ensure sufficient disk space on the pod
Requirements
- Python 3.11+
- runpod>=1.7.13
- SSH client (scp, ssh) installed on your system
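Whether the SSH client requirement is met can be checked up front with the standard library. A minimal sketch; missing_tools is a hypothetical helper that only looks the executables up on PATH and does not test that they work:

```python
import shutil


def missing_tools(tools=("ssh", "scp")):
    """Return the names from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from missing_tools() means both ssh and scp were found.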
License
This project is licensed under the MIT License.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.