Tetra: Serverless GPU Computing for AI Workloads
Dynamic GPU provisioning for ML workloads with transparent execution
Overview • Installation • Quick Start • Key Features • Examples • Configuration
Overview
The Tetra-RunPod integration provides seamless access to on-demand GPU resources through RunPod's serverless platform. With a simple decorator-based API, you can execute functions on powerful GPUs without managing infrastructure, while Tetra handles all the complexity of provisioning, communication, and state management.
Installation
```bash
pip install tetra_rp
```
You'll need a RunPod API key to use this integration. Sign up at RunPod.io and generate an API key in your account settings. Set it as an environment variable or save it in a local `.env` file:
```bash
export RUNPOD_API_KEY=<YOUR_API_KEY>
```
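If you keep the key in a `.env` file, it can be loaded before Tetra runs. A minimal sketch, assuming the `python-dotenv` package (whether `tetra_rp` also reads `.env` on its own is not documented here):

```python
# Minimal sketch: load RUNPOD_API_KEY from a local .env file.
# Assumes python-dotenv (pip install python-dotenv).
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

if not os.environ.get("RUNPOD_API_KEY"):
    raise RuntimeError("RUNPOD_API_KEY is not set")
```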
Quick Start
```python
import asyncio

from tetra_rp import remote, LiveServerless

# Configure a RunPod serverless resource
runpod_config = LiveServerless(
    name="example-diffusion-server",
)

# Define a function to run on a RunPod GPU
@remote(
    resource_config=runpod_config,
    dependencies=["torch", "numpy"]
)
def gpu_compute(data):
    import torch

    # Convert to a tensor and perform the computation on the GPU
    tensor = torch.tensor(data, device="cuda")
    result = tensor.sum().item()

    # Collect GPU info
    gpu_info = torch.cuda.get_device_properties(0)
    return {
        "result": result,
        "gpu_name": gpu_info.name,
        "cuda_version": torch.version.cuda,
    }

async def main():
    # Run the function on a RunPod GPU
    result = await gpu_compute([1, 2, 3, 4, 5])
    print(f"Result: {result['result']}")
    print(f"Computed on: {result['gpu_name']} with CUDA {result['cuda_version']}")

if __name__ == "__main__":
    try:
        asyncio.run(main())
    except Exception as e:
        print(f"An error occurred: {e}")
```
Key Features
Dynamic GPU Provisioning
Automatically provision GPUs on demand without any manual setup:
```python
@remote(
    resource_config=runpod_config,
)
def my_gpu_function(data):
    # Runs on a remote GPU when called
    return process(data)
```
Automatic Dependency Management
Specify the dependencies you need; they are installed automatically on the remote worker:
```python
@remote(
    resource_config=runpod_config,
    dependencies=["torch==2.0.1", "transformers", "diffusers"]
)
def generate_image(prompt):
    # Dependencies are installed on the worker before this runs
    from diffusers import StableDiffusionPipeline
    # Generate image...
    return image
```
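As with any `@remote` function, the call returns a coroutine and is awaited from async code. The prompt below is illustrative, not part of the API:

```python
# Illustrative usage from inside an async function
image = await generate_image("a watercolor lighthouse at dusk")
```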
Examples
See more examples here: tetra-examples
You can also install the examples as a submodule:
```bash
make examples
cd tetra-examples
python -m examples.example
python -m examples.image_gen
python -m examples.matrix_operations
```
Multi-Stage ML Pipeline
```python
# Feature extraction on GPU
@remote(
    resource_config=runpod_config,
    dependencies=["torch", "transformers"]
)
def extract_features(texts):
    import torch
    from transformers import AutoTokenizer, AutoModel

    # Load the model and move it to the GPU
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.to("cuda")

    # Encode each text and keep the [CLS] token embedding
    features = []
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt").to("cuda")
        with torch.no_grad():
            outputs = model(**inputs)
        features.append(outputs.last_hidden_state[:, 0].cpu().numpy().tolist()[0])
    return features
```
```python
# Classification on GPU
@remote(
    resource_config=runpod_config,
    dependencies=["torch", "scikit-learn"]
)
def classify(features, labels=None):
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    if labels is not None:
        # Training mode
        features = np.array(features)
        labels = np.array(labels)
        classifier = LogisticRegression()
        classifier.fit(features, labels)
        # Return the model coefficients (a fitted sklearn model
        # doesn't pickle cleanly across the wire)
        return {
            "coef": classifier.coef_.tolist(),
            "intercept": classifier.intercept_.tolist(),
            "classes": classifier.classes_.tolist(),
        }
    else:
        # Inference mode: the first element holds the trained coefficients
        coefficients = features[0]
        actual_features = np.array(features[1:])
        # Recreate the classifier from the saved coefficients
        classifier = LogisticRegression()
        classifier.coef_ = np.array(coefficients["coef"])
        classifier.intercept_ = np.array(coefficients["intercept"])
        classifier.classes_ = np.array(coefficients["classes"])
        # Predict
        predictions = classifier.predict(actual_features)
        probabilities = classifier.predict_proba(actual_features)
        return {
            "predictions": predictions.tolist(),
            "probabilities": probabilities.tolist(),
        }
```
```python
# Complete pipeline
async def text_classification_pipeline(train_texts, train_labels, test_texts):
    # Extract features for both splits
    train_features = await extract_features(train_texts)
    test_features = await extract_features(test_texts)

    # Train the classifier and keep its coefficients
    model = await classify(train_features, train_labels)

    # Predict: pass the coefficients as the first element
    predictions = await classify([model] + test_features)
    return predictions
```
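A minimal driver for the pipeline; the texts and labels below are illustrative placeholders:

```python
# Minimal sketch: run the pipeline end to end.
import asyncio

train_texts = ["great product", "terrible service", "works as advertised"]
train_labels = [1, 0, 1]
test_texts = ["would buy again", "complete waste of money"]

result = asyncio.run(
    text_classification_pipeline(train_texts, train_labels, test_texts)
)
print(result["predictions"])
```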
Configuration
Configuration Parameters
| Parameter | Description | Default | Example Values |
|---|---|---|---|
| `name` | (Required) Name for your endpoint | `""` | `"stable-diffusion-server"` |
| `gpuIds` | Type of GPU to request | `"any"` | `"any"` or a comma-separated list of GPU IDs |
| `gpuCount` | Number of GPUs per worker | `1` | `1`, `2`, `4` |
| `workersMin` | Minimum number of workers | `0` | Set to `1` for persistence |
| `workersMax` | Maximum number of workers | `3` | Higher for more concurrency |
| `idleTimeout` | Minutes before scaling down | `5` | `10`, `30`, `60` |
| `env` | Environment variables | `None` | `{"HF_TOKEN": "xyz"}` |
| `networkVolumeId` | Persistent storage ID | `None` | `"vol_abc123"` |
| `executionTimeoutMs` | Max execution time (ms) | `0` (no limit) | `600000` (10 min) |
| `scalerType` | Scaling strategy | `QUEUE_DELAY` | `NONE`, `QUEUE_SIZE` |
| `scalerValue` | Scaling parameter value | `4` | `1`-`10` typical |
| `locations` | Preferred datacenter locations | `None` | `"us-east,eu-central"` |
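Putting several of these together; a sketch that assumes `LiveServerless` accepts the table's parameters as keyword arguments (values are illustrative; check your installed `tetra_rp` release for the exact accepted values):

```python
from tetra_rp import LiveServerless

# Sketch only: field names follow the table above.
runpod_config = LiveServerless(
    name="stable-diffusion-server",  # required
    gpuCount=1,                      # GPUs per worker
    workersMin=0,                    # scale to zero when idle
    workersMax=3,                    # upper bound on concurrency
    idleTimeout=10,                  # minutes before scaling down
    env={"HF_TOKEN": "xyz"},         # illustrative environment variable
)
```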
License
This project is licensed under the MIT License - see the LICENSE file for details.