A Python library for distributed inference and serving of machine learning models
Flash
Flash is a Python SDK for developing cloud-native AI apps where you define everything -- hardware, remote functions, and dependencies -- using local code.
import asyncio
from runpod_flash import Endpoint, GpuType

@Endpoint(name="hello-gpu", gpu=GpuType.NVIDIA_GEFORCE_RTX_4090, dependencies=["torch"])
async def hello():
    import torch
    gpu_name = torch.cuda.get_device_name(0)
    print(f"Hello from your GPU! ({gpu_name})")
    return {"gpu": gpu_name}

asyncio.run(hello())
print("Done!")
Write @Endpoint-decorated Python functions on your local machine. Deploy them with flash deploy, then call them by running the same script. Flash handles GPU/CPU provisioning and worker scaling on RunPod Serverless.
Setup
Install Flash
pip install runpod-flash
# or
uv add runpod-flash
Flash requires Python 3.10-3.12 on macOS or Linux. Windows support is in development.
Authentication
flash login
This saves your API key and allows you to use the Flash CLI and call @Endpoint functions.
Coding agent integration (optional)
npx skills add runpod/skills
You can review the SKILL.md file in the runpod/skills repository.
Quickstart
Create gpu_demo.py:
import asyncio
from runpod_flash import Endpoint, GpuType

@Endpoint(
    name="flash-quickstart",
    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
    workers=3,
    dependencies=["numpy", "torch"]
)
def gpu_matrix_multiply(size):
    # Imports live inside the function so they resolve on the remote worker.
    import numpy as np
    import torch

    # torch is used only to report which GPU the worker has.
    device_name = torch.cuda.get_device_name(0)

    # The multiplication itself uses NumPy arrays.
    A = np.random.rand(size, size)
    B = np.random.rand(size, size)
    C = np.dot(A, B)

    return {
        "matrix_size": size,
        "result_mean": float(np.mean(C)),
        "gpu": device_name
    }

async def main():
    print("Running matrix multiplication on RunPod GPU...")
    result = await gpu_matrix_multiply(1000)
    print(f"Matrix size: {result['matrix_size']}x{result['matrix_size']}")
    print(f"Result mean: {result['result_mean']:.4f}")
    print(f"GPU used: {result['gpu']}")

if __name__ == "__main__":
    asyncio.run(main())
Deploy, then run:
flash deploy
python gpu_demo.py
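Because the quickstart awaits the endpoint call, several calls can in principle be issued together with asyncio.gather. The sketch below shows only that calling pattern. It swaps the real endpoint for a hypothetical local stand-in, fake_matrix_multiply, so it runs without a RunPod account; how Flash spreads such calls across workers is not shown here and is not something this example demonstrates.

```python
import asyncio

# Hypothetical local stand-in for an @Endpoint function. It mimics only the
# awaitable call shape and the result keys used in the quickstart.
async def fake_matrix_multiply(size):
    await asyncio.sleep(0.01)  # pretend remote latency
    return {"matrix_size": size, "result_mean": 0.0, "gpu": "stub"}

async def run_batch(sizes):
    # Start every call up front, then wait for all of them together.
    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(fake_matrix_multiply(s) for s in sizes))

if __name__ == "__main__":
    results = asyncio.run(run_batch([256, 512, 1024]))
    for r in results:
        print(f"{r['matrix_size']}x{r['matrix_size']} on {r['gpu']}")
```

To try this against a deployed endpoint, replace the stand-in with your own decorated function; the gather call itself stays the same.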
How it works
Flash has two modes: deploy and dev.
Deploy and run (flash deploy + python script.py)
Deploy packages your code and provisions endpoints on RunPod. After deploying, run your script directly and Flash routes calls to your deployed endpoints via implicit resolution:
flash deploy # build, upload, provision endpoints
python gpu_demo.py # calls deployed endpoints automatically
Flash resolves endpoints by matching the app name (defaults to the current directory name) and environment (defaults to production). Configure with env vars or .env:
FLASH_APP=my-project # defaults to current directory name
FLASH_ENV=staging # defaults to "production"
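The defaults described above can be sketched as a small lookup. This is an illustration of the documented fallback rules only, not Flash's actual resolution code: resolve_target is a hypothetical helper, and it reads a plain mapping of environment variables without parsing .env files.

```python
import os
from pathlib import Path

def resolve_target(environ=None, cwd=None):
    """Return (app, env) using the documented defaults.

    Hypothetical helper for illustration; not part of runpod_flash.
    """
    environ = os.environ if environ is None else environ
    cwd = Path.cwd() if cwd is None else Path(cwd)
    # FLASH_APP falls back to the current directory name.
    app = environ.get("FLASH_APP") or cwd.name
    # FLASH_ENV falls back to "production".
    env = environ.get("FLASH_ENV") or "production"
    return app, env

if __name__ == "__main__":
    print(resolve_target())
```

Passing the mapping and directory as arguments keeps the function easy to check in isolation, for example with an empty dict to see the defaults.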
Dev mode (flash dev)
For local development and testing, flash dev starts a hybrid dev server that runs your FastAPI app locally while provisioning live ephemeral workers on RunPod:
flash dev # starts local server + provisions workers
flash dev --port 3000 # custom port
flash dev --auto-provision # provision all endpoints at startup
What Flash does
- Remote execution: @Endpoint functions run on RunPod Serverless GPUs/CPUs
- Implicit endpoint resolution: python script.py routes to deployed endpoints automatically
- Auto-scaling: workers scale from 0 to N based on demand
- Dependency management: packages install automatically on remote workers
- Two patterns: queue-based (@Endpoint) for batch work, load-balanced (Endpoint() + routes) for REST APIs
- Concurrency control: max_concurrency lets each worker process multiple jobs simultaneously
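To make the concurrency-control item concrete, here is a conceptual sketch of one worker handling several jobs at once under a cap. It uses a plain asyncio.Semaphore and does not call Flash; the worker and handle functions are hypothetical, and Flash's own max_concurrency mechanism may work differently.

```python
import asyncio

async def worker(jobs, max_concurrency):
    """Process jobs with at most `max_concurrency` in flight at once."""
    limit = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0  # highest number of simultaneous jobs observed

    async def handle(job):
        nonlocal in_flight, peak
        async with limit:  # blocks while the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for real work
            in_flight -= 1
            return job * 2

    results = await asyncio.gather(*(handle(j) for j in jobs))
    return results, peak

if __name__ == "__main__":
    results, peak = asyncio.run(worker(range(6), max_concurrency=2))
    print(results, "peak in flight:", peak)
```

The peak counter is there only to show that the cap holds; a real worker would not need it.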
Documentation
Full documentation: docs.runpod.io/flash
- Quickstart - First GPU workload in 5 minutes
- Create endpoints - Queue-based, load-balancing, and custom Docker endpoints
- CLI reference - flash dev, flash deploy, flash build
- Configuration - All endpoint parameters
Flash apps
When you're ready to move beyond scripts and build a production-ready API, you can create a Flash app (a collection of interconnected endpoints with diverse hardware configurations) and deploy it to RunPod.
Follow this tutorial to build your first Flash app.
Flash CLI
flash --help
Learn more about the Flash CLI.
Examples
Browse working examples: github.com/runpod/flash-examples
Requirements
- Python 3.10-3.12
- macOS or Linux (Windows support in development)
- A RunPod account (email must be verified) with an API key
Contributing
We welcome contributions! See RELEASE_SYSTEM.md for development workflow.
git clone https://github.com/runpod/flash.git
cd flash
pip install -e ".[dev]"
# use conventional commits
git commit -m "feat: add new feature"
git commit -m "fix: resolve issue"
Support
- Discord - Community support
- GitHub Issues - Bug reports
License
MIT License - see LICENSE for details.