
A Python library for distributed inference and serving of machine learning models

Project description

Tetra: Serverless GPU Computing for AI Workloads

Dynamic GPU provisioning for ML workloads with transparent execution

Overview · Installation · Quick Start · Key Features · Examples · Configuration · Troubleshooting


Overview

The Tetra-RunPod integration provides seamless access to on-demand GPU resources through RunPod's serverless platform. With a simple decorator-based API, you can execute functions on powerful GPUs without managing infrastructure, while Tetra handles all the complexity of provisioning, communication, and state management.

Installation

pip install tetra_rp

You'll need a RunPod API key to use this integration. Sign up at RunPod.io and generate an API key in your account settings, then set it as an environment variable or save it in a local .env file:

export RUNPOD_API_KEY=<YOUR_API_KEY>
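If you prefer the .env route, a minimal sketch for loading the key looks like this (it assumes python-dotenv is installed, which is not bundled with tetra_rp):

# Load RUNPOD_API_KEY from a local .env file (assumes python-dotenv is installed)
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
assert os.environ.get("RUNPOD_API_KEY"), "RUNPOD_API_KEY is not set"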

Quick Start

import os
import asyncio
from tetra_rp import remote, LiveServerless

# Configure RunPod resource
runpod_config = LiveServerless(
    name="example-diffusion-server",
)

# Define a function to run on RunPod GPU
@remote(
    resource_config=runpod_config,
    dependencies=["torch", "numpy"]
)
def gpu_compute(data):
    import torch
    import numpy as np
    
    # Convert to tensor and perform computation on GPU
    tensor = torch.tensor(data, device="cuda")
    result = tensor.sum().item()
    
    # Get GPU info
    gpu_info = torch.cuda.get_device_properties(0)
    
    return {
        "result": result,
        "gpu_name": gpu_info.name,
        "cuda_version": torch.version.cuda
    }

async def main():
    # Run the function on RunPod GPU
    result = await gpu_compute([1, 2, 3, 4, 5])
    print(f"Result: {result['result']}")
    print(f"Computed on: {result['gpu_name']} with CUDA {result['cuda_version']}")

if __name__ == "__main__":
    try:
        asyncio.run(main())
    except Exception as e:
        print(f"An error occurred: {e}")

Key Features

Dynamic GPU Provisioning

Automatically provision GPUs on demand without any manual setup:

@remote(
    resource_config=runpod_config,
)
def my_gpu_function(data):
    # Runs on GPU when called
    return process(data)
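As in the Quick Start, the decorated function is awaited when called; process and data above are just placeholders:

# Called from async code; the GPU is provisioned on demand at call time
result = await my_gpu_function(data)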

Automatic Dependency Management

Specify the dependencies your function needs; they are installed automatically in the remote environment:

@remote(
    resource_config=runpod_config,
    dependencies=["torch==2.0.1", "transformers", "diffusers"]
)
def generate_image(prompt):
    # Dependencies are automatically installed
    from diffusers import StableDiffusionPipeline
    # Generate image...
    return image

Examples

See more examples here: tetra-examples

You can also install the examples as a submodule:

make examples
cd tetra-examples
python -m examples.example
python -m examples.image_gen
python -m examples.matrix_operations

Multi-Stage ML Pipeline

# Feature extraction on GPU
@remote(
    resource_config=runpod_config,
    dependencies=["torch", "transformers"]
)
def extract_features(texts):
    import torch
    from transformers import AutoTokenizer, AutoModel
    
    # Load model
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.to("cuda")
    
    # Process texts
    features = []
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt").to("cuda")
        with torch.no_grad():
            outputs = model(**inputs)
        features.append(outputs.last_hidden_state[:, 0].cpu().numpy().tolist()[0])
    
    return features

# Classification on GPU
@remote(
    resource_config=runpod_config,
    dependencies=["torch", "sklearn"]
)
def classify(features, labels=None):
    import torch
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    
    if labels is not None:
        # Training mode: fit a logistic regression classifier on the features
        features = np.array(features)
        labels = np.array(labels)
        classifier = LogisticRegression()
        classifier.fit(features, labels)
        
        # Save model coefficients (can't pickle sklearn model easily)
        coefficients = {
            "coef": classifier.coef_.tolist(),
            "intercept": classifier.intercept_.tolist(),
            "classes": classifier.classes_.tolist()
        }
        
        return coefficients
    else:
        # Inference mode: the first element carries the trained coefficients,
        # the remaining elements are the feature vectors to classify
        coefficients = features[0]
        actual_features = np.array(features[1:])
        
        # Recreate classifier
        classifier = LogisticRegression()
        classifier.coef_ = np.array(coefficients["coef"])
        classifier.intercept_ = np.array(coefficients["intercept"])
        classifier.classes_ = np.array(coefficients["classes"])
        
        # Predict
        predictions = classifier.predict(actual_features)
        probabilities = classifier.predict_proba(actual_features)
        
        return {
            "predictions": predictions.tolist(),
            "probabilities": probabilities.tolist()
        }

# Complete pipeline
async def text_classification_pipeline(train_texts, train_labels, test_texts):
    # Extract features
    train_features = await extract_features(train_texts)
    test_features = await extract_features(test_texts)
    
    # Train classifier
    model = await classify(train_features, train_labels)
    
    # Predict
    predictions = await classify([model] + test_features)
    
    return predictions
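
A minimal driver for this pipeline might look like the following; the sample texts and labels are purely illustrative:

import asyncio

async def main():
    train_texts = ["great product", "terrible service", "would buy again", "never again"]
    train_labels = [1, 0, 1, 0]
    test_texts = ["really enjoyed it"]

    # Runs feature extraction, training, and inference on RunPod GPUs
    results = await text_classification_pipeline(train_texts, train_labels, test_texts)
    print(results["predictions"], results["probabilities"])

if __name__ == "__main__":
    asyncio.run(main())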

Configuration

Configuration Parameters

| Parameter | Description | Default | Example Values |
|---|---|---|---|
| name (Required) | Name for your endpoint | "" | "stable-diffusion-server" |
| gpuIds | Type of GPU to request | "any" | "any" or a comma-separated list of GPU IDs |
| gpuCount | Number of GPUs per worker | 1 | 1, 2, 4 |
| workersMin | Minimum number of workers | 0 | Set to 1 for persistence |
| workersMax | Maximum number of workers | 3 | Higher for more concurrency |
| idleTimeout | Minutes before scaling down | 5 | 10, 30, 60 |
| env | Environment variables | None | {"HF_TOKEN": "xyz"} |
| networkVolumeId | Persistent storage ID | None | "vol_abc123" |
| executionTimeoutMs | Max execution time (ms) | 0 (no limit) | 600000 (10 min) |
| scalerType | Scaling strategy | QUEUE_DELAY | NONE, QUEUE_SIZE |
| scalerValue | Scaling parameter value | 4 | 1-10 range typical |
| locations | Preferred datacenter locations | None | "us-east,eu-central" |
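
As a sketch, a more fully specified resource could combine several of these parameters. The values below are illustrative only, and this assumes the parameters are accepted as LiveServerless keyword arguments in your version of tetra_rp:

from tetra_rp import LiveServerless

# Illustrative configuration; values are examples, not recommendations
runpod_config = LiveServerless(
    name="stable-diffusion-server",   # required endpoint name
    gpuCount=1,                       # GPUs per worker
    workersMin=0,                     # scale to zero when idle
    workersMax=3,                     # cap on concurrent workers
    idleTimeout=10,                   # minutes before scaling down
    env={"HF_TOKEN": "xyz"},          # environment variables for the worker
    executionTimeoutMs=600000,        # 10-minute execution limit
)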

Examples

See more examples in the ./examples/* folder


License

This project is licensed under the MIT License - see the LICENSE file for details.




Download files


Source Distribution

tetra_rp-0.1.1.tar.gz (19.6 kB)

Uploaded Source

Built Distribution


tetra_rp-0.1.1-py3-none-any.whl (22.3 kB)

Uploaded Python 3

File details

Details for the file tetra_rp-0.1.1.tar.gz.

File metadata

  • Download URL: tetra_rp-0.1.1.tar.gz
  • Upload date:
  • Size: 19.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.14

File hashes

Hashes for tetra_rp-0.1.1.tar.gz

  • SHA256: db32ff8ed7d6596ed00bf76ebf2c2f216861862e04e91917ae41cc041a87ae74
  • MD5: 32145748613422068cfb3b868db86618
  • BLAKE2b-256: 47bb26f98fdfd541ea93af5a5b7ef145bd0573de608cdc0e75b56607f52f62fa


File details

Details for the file tetra_rp-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: tetra_rp-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 22.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.14

File hashes

Hashes for tetra_rp-0.1.1-py3-none-any.whl

  • SHA256: f483206c98ec465a0beac2b29e6e94588c0928a521b986c8022ede0a4112f1da
  • MD5: b6f53d0c3310e8596fed1ae68d8a09bd
  • BLAKE2b-256: 69d18c72ee610fe690e78b6974f126375bc4878d21d60ddd4787c7d81be0202b

