Execute Python and Shell code on remote GPU sessions

Clouditia SDK

Execute Python and Shell code on remote GPU sessions.

Clouditia SDK provides a simple Python interface to run code on remote GPU-powered containers. It is well suited to machine learning, deep learning, and any GPU-accelerated workloads.

PyPI version Python 3.7+ License: MIT

Installation

pip install clouditia

# With S3 support for saving outputs
pip install clouditia[s3]

Quick Start

from clouditia import GPUSession

# Connect to your GPU session
session_live_gpu = GPUSession("ck_your_api_key")

# Execute Python code on the remote session
result = session_live_gpu.run("""
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
""")

print(result.output)

Features

  • Python Execution: Run Python code on remote sessions
  • Shell Commands: Execute shell commands on the remote session pod
  • Persistent Sessions: Keep variables between executions with start()/stop()
  • Variable Transfer: Send and retrieve variables between local and remote
  • File Transfer: Upload/download files and folders between local and remote
  • S3 Output: Save outputs directly to S3 buckets
  • Async Jobs: Submit long-running tasks with real-time log monitoring
  • Jupyter Magic: Use %%clouditia magic in notebooks
  • Decorator Support: Use @session_live_gpu.remote to run functions on the remote session

Table of Contents

  1. Getting Your API Key
  2. Basic Usage
  3. Persistent Sessions
  4. Executing Python Code
  5. Shell Commands
  6. Variable Transfer
  7. File Transfer
  8. S3 Output
  9. Remote Functions (Decorator)
  10. Async Jobs (Long-Running Tasks)
  11. Jupyter Magic
  12. Error Handling
  13. API Reference

Getting Your API Key

  1. Log in to clouditia.com
  2. Start a GPU session
  3. Go to API Keys in your session dashboard
  4. Generate a new API key (starts with ck_ or sk_)

Basic Usage

Connect to a Session

from clouditia import GPUSession

# Create a session with your API key
session_live_gpu = GPUSession("ck_your_api_key_here")

# Verify the connection
info = session_live_gpu.verify()
print(f"Connected to: {info['session_name']}")
print(f"GPU: {info['gpu_type']}")
print(f"Credit remaining: {info['user_credit']}€")

Waiting for a Session to Be Ready

A GPU session can pass through several intermediate states before it is fully usable:

  • creating: the pod is being scheduled on a compute node.
  • running but workspace still downloading: when the session is resumed from a custom environment (venv), the workspace (models, datasets, caches…) is streamed back from S3 at pod startup. For a small workspace this takes a few seconds; for a vLLM cache with 70,000+ files and 16 GB of data it can take 10+ minutes.

The SDK exposes three fields to handle this:

  • ready (bool): True only when the session is fully usable (status running AND any workspace download complete).
  • estimated_ready_in_seconds (int | None): ETA until ready.
  • workspace_sync: live progress of the workspace download: {in_progress, bytes_done, bytes_total, files_done, pct, rate_bps, eta_seconds}.
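As a rough illustration of how these fields fit together, the helper below turns a verify()-style dictionary into a one-line status string. The function name format_sync_progress and the sample payload are hypothetical, and the sketch assumes bytes_done and bytes_total are plain byte counts; it does not call the SDK.

```python
def format_sync_progress(info: dict) -> str:
    """Build a one-line status string from a verify()-style payload."""
    if info.get("ready"):
        return "Session ready"
    ws = info.get("workspace_sync") or {}
    if not ws.get("in_progress"):
        return f"Waiting (status={info.get('status', 'unknown')})"
    gib = 1024 ** 3  # assumption: sizes are reported in bytes
    done = ws["bytes_done"] / gib
    total = ws["bytes_total"] / gib
    eta = info.get("estimated_ready_in_seconds")
    eta_text = "unknown" if eta is None else f"{eta // 60}min {eta % 60}s"
    return f"Workspace: {done:.2f}/{total:.2f} GiB ({ws['pct']}%), ETA {eta_text}"


# Hypothetical payload shaped like the fields listed above.
sample = {
    "ready": False,
    "status": "running",
    "estimated_ready_in_seconds": 632,
    "workspace_sync": {
        "in_progress": True,
        "bytes_done": 2 * 1024 ** 3,
        "bytes_total": 16 * 1024 ** 3,
        "pct": 12,
    },
}
print(format_sync_progress(sample))
```

Checking ready first means a finished session short-circuits before any workspace_sync key is read, so a missing or null workspace_sync does not raise.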

Quick check

info = session_live_gpu.verify()
if info['ready']:
    print("Session is ready!")
else:
    ws = info.get('workspace_sync') or {}
    if ws.get('in_progress'):
        print(f"Workspace: {ws['pct']}% ({ws['bytes_done']}/{ws['bytes_total']} bytes)")
        print(f"ETA: {info['estimated_ready_in_seconds']} seconds")
    else:
        print(f"Waiting — status={info['status']}")

Blocking helper: wait_until_ready()

The cleanest way to wait for a resumed session is to call wait_until_ready(). It polls verify() every few seconds and prints a live progress line until the session is ready or the timeout expires.

session_live_gpu = GPUSession("ck_your_api_key")

# Block until the workspace is fully restored and VS Code/Jupyter is up
if session_live_gpu.wait_until_ready(timeout=1200):  # wait max 20 min
    # Safe to run code now
    result = session_live_gpu.run("import torch; print(torch.cuda.is_available())")
else:
    print("Session failed to become ready in time")

Output during a typical vLLM resume:

⏳ Workspace: 2.34/16.23 GB (14%) @ 22.1 MB/s — ETA 10min 32s
⏳ Workspace: 3.12/16.23 GB (19%) @ 22.5 MB/s — ETA 9min 41s
⏳ Workspace: 4.01/16.23 GB (25%) @ 22.3 MB/s — ETA 9min 5s
...
✅ Session ready!

Parameters:

  • timeout: int = 1800 — max total wait time in seconds (default 30 min).
  • poll_interval: int = 5 — delay between polls in seconds.
  • verbose: bool = True — print progress updates to stdout.
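To make the roles of timeout and poll_interval concrete, here is a minimal sketch of a generic polling loop. It is not the SDK's implementation: wait_until, its sleep and clock parameters, and the fake readiness check are all hypothetical names used for illustration only.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 1800,
               poll_interval: float = 5,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Call check() until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)


# Demo with a fake readiness check that succeeds on the third poll.
calls = {"n": 0}


def fake_ready() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


became_ready = wait_until(fake_ready, timeout=60, poll_interval=0,
                          sleep=lambda seconds: None)
print(became_ready, calls["n"])
```

With a real session, the check could be something like lambda: session_live_gpu.verify()["ready"], which uses only the ready field described earlier. Injecting sleep and clock keeps the loop testable without real delays.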

Persistent Sessions

By default, each run() call executes in an isolated environment - variables don't persist between calls. Use start() and stop() to enable persistent sessions where variables are preserved.

Isolated Mode (Default)

# Without start(), variables are NOT persistent
session_live_gpu.run("x = 10")
session_live_gpu.run("print(x)")  # Error: x is not defined

Persistent Mode

# Start a persistent session
session_live_gpu.start()
print(f"Session active: {session_live_gpu.is_persistent}")  # True

# Variables now persist between run() calls
session_live_gpu.run("x = 10")
session_live_gpu.run("y = 20")
session_live_gpu.run("z = x + y")
result = session_live_gpu.run("print(f'Result: {z}')")
# Output: Result: 30

# Stop the session when done
session_live_gpu.stop()
print(f"Session active: {session_live_gpu.is_persistent}")  # False
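The difference between the two modes can be pictured locally with plain exec() namespaces. The class below is a hypothetical stand-in written for this explanation; it only mimics the behaviour described above and says nothing about how the remote side is actually implemented.

```python
class LocalNamespaceDemo:
    """Hypothetical local stand-in that mimics the two modes with exec()."""

    def __init__(self):
        self._shared = None  # None means isolated mode

    def start(self):
        self._shared = {}  # one namespace reused by every run() call

    def stop(self):
        self._shared = None

    def run(self, code: str) -> dict:
        # Isolated mode gets a fresh namespace on every call.
        namespace = self._shared if self._shared is not None else {}
        exec(code, namespace)
        return namespace


demo = LocalNamespaceDemo()

# Isolated: x from the first call is gone in the second call.
demo.run("x = 10")
try:
    demo.run("x + 1")
    isolated_error = None
except NameError as exc:
    isolated_error = str(exc)

# Persistent: state accumulates between calls.
demo.start()
demo.run("x = 10")
demo.run("y = 20")
z = demo.run("z = x + y")["z"]
demo.stop()

print(isolated_error)
print(z)
```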

Full Example

from clouditia import GPUSession

session_live_gpu = GPUSession("ck_your_api_key")

# Start persistent session
session_live_gpu.start()

# Build up state across multiple calls
session_live_gpu.run("import torch")
session_live_gpu.run("model = torch.nn.Linear(10, 5).cuda()")
session_live_gpu.run("data = torch.randn(32, 10).cuda()")

# Use the accumulated state
result = session_live_gpu.run("""
output = model(data)
print(f"Input shape: {data.shape}")
print(f"Output shape: {output.shape}")
""")

# Clean up
session_live_gpu.stop()

Checking Session State

# Check if a persistent session is active
if session_live_gpu.is_persistent:
    print("Persistent session is running")
else:
    print("Running in isolated mode")

Executing Python Code

Simple Execution

# Run Python code and get the output
result = session_live_gpu.run("print('Hello from the GPU!')")
print(result.output)  # "Hello from the GPU!"

# Check if execution was successful
if result.success:
    print("Code executed successfully!")
else:
    print(f"Error: {result.error}")

output vs result

  • result.output: everything the executed code produced (print output plus the value of the last expression), like a Jupyter cell
  • result.result: only the value of the last line when it is an expression (for programmatic use)

# Expression only
result = session_live_gpu.run("2 + 2")
print(result.output)  # "4"
print(result.result)  # "4"

# List comprehension
result = session_live_gpu.run("[i**2 for i in range(5)]")
print(result.output)  # "[0, 1, 4, 9, 16]"
print(result.result)  # "[0, 1, 4, 9, 16]"

# print() + expression: output contains everything, result contains the last value
result = session_live_gpu.run("x = 10\nprint(f'x = {x}')\nx * 2")
print(result.output)  # "x = 10\n20"
print(result.result)  # "20"

# Statements only (no expression on the last line)
result = session_live_gpu.run("print('hello')")
print(result.output)  # "hello"
print(result.result)  # None
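The Jupyter-like behaviour above can be reproduced locally with the standard ast module. The sketch below is one way to get the same output/result split; run_cell is a hypothetical helper, not part of the SDK, and returning the last value as its repr() string is an assumption chosen to mirror the quoted values in the examples.

```python
import ast
import contextlib
import io


def run_cell(code: str):
    """Return (output, result) for `code`, Jupyter-style."""
    tree = ast.parse(code)
    last_expr = None
    # If the last statement is a bare expression, evaluate it separately.
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        last_expr = ast.Expression(tree.body.pop().value)
    namespace = {}
    buffer = io.StringIO()
    result = None
    with contextlib.redirect_stdout(buffer):
        exec(compile(tree, "<cell>", "exec"), namespace)
        if last_expr is not None:
            value = eval(compile(last_expr, "<cell>", "eval"), namespace)
            if value is not None:
                result = repr(value)
                print(result)  # the last value also appears in the output
    return buffer.getvalue().rstrip("\n"), result


print(run_cell("2 + 2"))
print(run_cell("x = 10\nprint(f'x = {x}')\nx * 2"))
print(run_cell("print('hello')"))
```

A trailing print('hello') is itself an expression, but it evaluates to None, which is why result stays None in that case.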

Multi-line Code

result = session_live_gpu.run("""
import torch
import torch.nn as nn

# Create a simple model
model = nn.Linear(10, 5).cuda()
x = torch.randn(32, 10).cuda()
output = model(x)

print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")
print(f"Model parameters: {sum(p.numel() for p in model.parameters())}")
""")

print(result.output)

run() vs exec()

  • run(): returns an ExecutionResult with output, result, success. You handle errors yourself
  • exec(): returns True on success and raises an ExecutionError exception if the code fails. A shortcut for code that returns nothing

Both execute the code the same way. The only difference is error handling.

Important: each run() or exec() call is isolated; variables do not persist between calls. To keep variables, use persistent mode (see the Persistent Sessions section):

# WRONG: each exec() is isolated, so torch is unknown in the second call
session_live_gpu.exec("import torch")
session_live_gpu.exec("model = torch.nn.Linear(10, 5).cuda()")  # NameError!

# CORRECT: everything in a single call
session_live_gpu.exec("""
import torch
model = torch.nn.Linear(10, 5).cuda()
optimizer = torch.optim.Adam(model.parameters())
print(f"Model parameters: {sum(p.numel() for p in model.parameters())}")
""")

# CORRECT: or use persistent mode
session_live_gpu.start()  # Enables persistent mode
session_live_gpu.exec("import torch")
session_live_gpu.exec("model = torch.nn.Linear(10, 5).cuda()")  # torch is known
session_live_gpu.exec("optimizer = torch.optim.Adam(model.parameters())")
session_live_gpu.stop()

Shell Commands

Execute shell commands on the remote session pod:

# Check current directory
result = session_live_gpu.shell("pwd")
print(result.output)  # /home/coder/workspace

# List files (full path or ~/workspace)
result = session_live_gpu.shell("ls -la /home/coder/workspace")
print(result.output)

result = session_live_gpu.shell("ls -la ~/workspace")
print(result.output)

# Create directories and files
result = session_live_gpu.shell("mkdir -p ~/workspace/models && ls ~/workspace")
print(result.output)

# Chain multiple commands
result = session_live_gpu.shell("cd ~/workspace && mkdir -p data && ls -la")
print(result.output)

# Check disk space
result = session_live_gpu.shell("df -h")
print(result.output)

# Check memory
result = session_live_gpu.shell("free -h")
print(result.output)

# Install packages
result = session_live_gpu.shell("pip install transformers datasets")
print(result.output)

# Download files
result = session_live_gpu.shell(
    "wget https://archive.ics.uci.edu/static/public/53/iris.zip -O ~/workspace/data.zip"
)
print(result.output)

result = session_live_gpu.shell(
    "wget https://huggingface.co/datasets/scikit-learn/iris/resolve/main/Iris.csv -O ~/workspace/data.csv"
)
print(result.output)

Checking Exit Codes

result = session_live_gpu.shell("ls /nonexistent")
print(f"Exit code: {result.exit_code}")
print(f"Success: {result.success}")
print(f"result content : {result}")
print(f"result output : {result.output}")
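Because shell() takes a single command string, paths containing spaces or special characters need quoting. The sketch below uses the standard shlex.quote for that; build_command is a hypothetical helper and the path is only an example.

```python
import shlex


def build_command(*args: str) -> str:
    """Join arguments into one shell-safe command string."""
    return " ".join(shlex.quote(arg) for arg in args)


cmd = build_command("mkdir", "-p", "/home/coder/workspace/my data")
print(cmd)

# Quoting also disables tilde expansion, so prefer absolute paths here.
print(build_command("ls", "~/workspace"))
```

The resulting string can then be passed to session_live_gpu.shell(cmd). Note that a quoted ~ is no longer expanded by the shell, which is why the sketch uses /home/coder/workspace instead.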

Variable Transfer

Important: set() and get() require persistent mode (start()/stop()) so that variables persist between calls.

Sending Variables to the Remote Session

# Start persistent mode (variables persist between calls)
session_live_gpu.start()

# Send local data to the remote session
data = [1, 2, 3, 4, 5, 99]
session_live_gpu.set("my_data", data)

# Use it in remote code
session_live_gpu.run("print(f'Data: {my_data}')")
session_live_gpu.run("print(f'Sum: {sum(my_data)}')")

session_live_gpu.stop()

Retrieving Variables from the Remote Session

session_live_gpu.start()

# Compute something on the remote session
session_live_gpu.run("""
import torch
tensor = torch.randn(100, 100).cuda()
result_stats = {
    'mean': tensor.mean().item(),
    'std': tensor.std().item(),
    'shape': list(tensor.shape)
}
""")

# Get the result locally
stats = session_live_gpu.get("result_stats")
print(f"Mean: {stats['mean']:.4f}")
print(f"Std: {stats['std']:.4f}")
print(f"Shape: {stats['shape']}")

session_live_gpu.stop()

Sending Complex Objects

import numpy as np

session_live_gpu.start()

# Send numpy arrays
arr = np.random.randn(100, 100)
session_live_gpu.set("numpy_array", arr)

# Send dictionaries
config = {
    "learning_rate": 0.001,
    "batch_size": 32,
    "epochs": 100
}
session_live_gpu.set("config", config)

# Use in remote code
session_live_gpu.run("""
import torch
tensor = torch.from_numpy(numpy_array).cuda()
print(f"Learning rate: {config['learning_rate']}")
""")

session_live_gpu.stop()
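This page does not say how set() and get() serialize values. Assuming, purely for illustration, that a value must at least be picklable to cross the wire, a quick local pre-check might look like this; can_pickle is a hypothetical helper and pickle is a stand-in for whatever the SDK really uses.

```python
import pickle


def can_pickle(value) -> bool:
    """Return True if `value` can be serialized with pickle."""
    try:
        pickle.dumps(value)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


config = {"learning_rate": 0.001, "batch_size": 32, "epochs": 100}
print(can_pickle(config))             # plain dict of numbers
print(can_pickle(lambda x: x + 1))    # lambdas cannot be pickled by reference
```

Running such a check before a large transfer surfaces unsupported objects early, instead of after a long computation.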

File Transfer

Transfer files and folders between your local machine and the remote GPU session.

Uploading a Single File

# Upload a local file to the remote session
session_live_gpu.upload("./data.csv", "/home/coder/workspace/data.csv")

# Upload with custom path
session_live_gpu.upload("./model.pkl", "/home/coder/workspace/models/trained_model.pkl")

# Disable progress output
session_live_gpu.upload("./config.json", "/home/coder/workspace/config.json", show_progress=False)

Downloading a Single File

# Download a file from the remote session
session_live_gpu.download("/home/coder/workspace/results.csv", "./results.csv")

# Download trained model
session_live_gpu.download("/home/coder/workspace/checkpoints/model.pt", "./local_model.pt")

# Download silently
session_live_gpu.download("/home/coder/workspace/logs.txt", "./logs.txt", show_progress=False)

Uploading a Folder

Upload an entire directory with all its contents:

# Upload a project folder
session_live_gpu.upload_folder("./my_project", "/home/coder/workspace/project")

# Upload with exclusions (default excludes: __pycache__, .git, *.pyc, .DS_Store, node_modules)
session_live_gpu.upload_folder(
    "./my_project",
    "/home/coder/workspace/project",
    exclude=["*.log", ".env", "__pycache__", ".git"]
)

# Upload data folder
session_live_gpu.upload_folder("./datasets", "/home/coder/workspace/data")
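To preview which files an exclude list would skip, the patterns can be tried locally. The sketch below assumes shell-style glob matching against each path component; the SDK's exact matching rule is not documented here, and is_excluded is a hypothetical helper. The default list is copied from the comment above.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Defaults as listed in the upload_folder comment above.
DEFAULT_EXCLUDES = ["__pycache__", ".git", "*.pyc", ".DS_Store", "node_modules"]


def is_excluded(relative_path: str, patterns=DEFAULT_EXCLUDES) -> bool:
    """True if any component of the path matches any glob pattern."""
    parts = PurePosixPath(relative_path).parts
    return any(fnmatchcase(part, pattern)
               for part in parts for pattern in patterns)


candidates = [
    "train.py",
    "__pycache__/train.cpython-311.pyc",
    ".git/config",
    "logs/run.log",
]
kept = [path for path in candidates if not is_excluded(path)]
print(kept)
```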

Downloading a Folder

Download an entire directory with all its contents:

# Download results folder
session_live_gpu.download_folder("/home/coder/workspace/results", "./local_results")

# Download checkpoints
session_live_gpu.download_folder(
    "/home/coder/workspace/checkpoints",
    "./checkpoints",
    exclude=["*.tmp", "*.log"]
)

# Download trained models
session_live_gpu.download_folder("/home/coder/workspace/models", "./downloaded_models")

Listing Remote Files

# List files in a directory
files = session_live_gpu.list_files("/home/coder/workspace")
for f in files:
    icon = "📁" if f["is_dir"] else "📄"
    print(f"{icon} {f['name']} - {f['size']} bytes")

# Filter by pattern
python_files = session_live_gpu.list_files("/home/coder/workspace", pattern="*.py")
for f in python_files:
    print(f"📄 {f['name']}")

# List with full details
files = session_live_gpu.list_files("/home/coder/workspace")
for f in files:
    print(f"Name: {f['name']}")
    print(f"  Path: {f['path']}")
    print(f"  Size: {f['size']} bytes")
    print(f"  Is Directory: {f['is_dir']}")
    print(f"  Modified: {f['modified']}")

Checking if a File Exists

# Check before downloading
if session_live_gpu.file_exists("/home/coder/workspace/model.pt"):
    session_live_gpu.download("/home/coder/workspace/model.pt", "./model.pt")
    print("Model downloaded!")
else:
    print("Model not found, training required...")

# Check multiple files
files_to_check = ["config.json", "data.csv", "model.pt"]
for filename in files_to_check:
    path = f"/home/coder/workspace/{filename}"
    exists = session_live_gpu.file_exists(path)
    status = "✓" if exists else "✗"
    print(f"{status} {filename}")

Complete Workflow Example

from clouditia import GPUSession

session_live_gpu = GPUSession("ck_your_api_key")

# 1. Upload training data and code
session_live_gpu.upload_folder("./training_code", "/home/coder/workspace/code")
session_live_gpu.upload("./data/train.csv", "/home/coder/workspace/data/train.csv")
session_live_gpu.upload("./data/test.csv", "/home/coder/workspace/data/test.csv")

# 2. Run training
result = session_live_gpu.run("""
import sys
sys.path.insert(0, '/home/coder/workspace/code')
from train import train_model

model = train_model('/home/coder/workspace/data/train.csv')
model.save('/home/coder/workspace/output/model.pt')
print("Training complete!")
""")

# 3. Check and download results
if session_live_gpu.file_exists("/home/coder/workspace/output/model.pt"):
    session_live_gpu.download("/home/coder/workspace/output/model.pt", "./trained_model.pt")
    print("Model saved locally!")

# 4. Download all outputs
session_live_gpu.download_folder("/home/coder/workspace/output", "./results")
print("All results downloaded!")

# 5. List what was created
files = session_live_gpu.list_files("/home/coder/workspace/output")
print(f"Created {len(files)} files during training")

Working with Different File Types

# CSV files
session_live_gpu.upload("./data.csv", "/home/coder/workspace/data.csv")

# Pickle files (models, data)
session_live_gpu.upload("./model.pkl", "/home/coder/workspace/model.pkl")

# PyTorch models
session_live_gpu.download("/home/coder/workspace/checkpoint.pt", "./checkpoint.pt")

# JSON configuration
session_live_gpu.upload("./config.json", "/home/coder/workspace/config.json")

# Text files
session_live_gpu.upload("./requirements.txt", "/home/coder/workspace/requirements.txt")

# Binary files
session_live_gpu.upload("./image.png", "/home/coder/workspace/image.png")

# Any file type works!
session_live_gpu.upload("./data.parquet", "/home/coder/workspace/data.parquet")
session_live_gpu.upload("./weights.h5", "/home/coder/workspace/weights.h5")

S3 Output

Save your outputs directly to Amazon S3 or compatible storage (MinIO, etc.).

Installation

To use S3 features, install with the s3 extra:

pip install clouditia[s3]

Creating an S3 Connection

from clouditia import GPUSession

session_live_gpu = GPUSession("sk_live_your_api_key")

# Create S3 connection
s3 = session_live_gpu.s3_connect(
    bucket="my-ml-outputs",
    access_key="AKIAIOSFODNN7EXAMPLE",
    secret_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    region="eu-west-1",
    prefix="experiments/run-001/"  # Optional: prefix for all uploads
)

output() vs output_file()

There are two methods for saving to S3:

  • output(filename, data, s3): saves an in-memory Python object (variable, array, dict) to S3. The SDK serializes it automatically based on the file extension. The object does not need to exist on disk.

  • output_file(local_path, s3): uploads a file that already exists on your local disk to S3. Useful for files generated by a script or downloaded.

Saving Python Objects to S3 (output)

The serialization format is detected automatically from the file extension:

  • .pt, .pth: PyTorch state dict
  • .npy: NumPy array
  • .json: JSON data
  • .pkl, .pickle: Pickle format (default)

# Save NumPy arrays
import numpy as np
embeddings = np.random.randn(1000, 768)
url = session_live_gpu.output("embeddings_aina_23052026.npy", embeddings, s3)

# Save JSON metrics
metrics = {"accuracy": 0.95, "loss": 0.05, "epoch": 100}
url = session_live_gpu.output("metrics.json", metrics, s3)

# Save any picklable object
results = {"predictions": [1, 2, 3], "embeddings": embeddings, "metrics": metrics}
url = session_live_gpu.output("results.pkl", results, s3)
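The extension rule can be written down as a small lookup table. This is a sketch of the mapping listed above, not the SDK's code: detect_format and the format labels are hypothetical, and lower-casing the extension is an assumption.

```python
import os

# Extension table from the list above; labels on the right are for this sketch.
FORMATS = {
    ".pt": "torch",
    ".pth": "torch",
    ".npy": "numpy",
    ".json": "json",
    ".pkl": "pickle",
    ".pickle": "pickle",
}


def detect_format(filename: str) -> str:
    """Pick a serialization format from the file extension (pickle by default)."""
    extension = os.path.splitext(filename)[1].lower()
    return FORMATS.get(extension, "pickle")


for name in ["embeddings.npy", "metrics.json", "results.pkl", "metrics_minio_ok"]:
    print(name, "->", detect_format(name))
```

A name without a recognised extension falls back to pickle, so adding .json or .npy explicitly is the way to get a portable file.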

Uploading Local Files to S3 (output_file)

# Upload a file that already exists on your local disk
url = session_live_gpu.output_file("./checkpoints/best_model.pt", s3)
print(f"Uploaded to: {url}")

# remote_filename: choose the path and file name on S3
# By default, the file keeps its local name (best_model.pt)
# With remote_filename, you choose the directory layout on S3
url = session_live_gpu.output_file(
    "./model.pt",                                       # local file
    s3,
    remote_filename="models/production/v2.0/model.pt"   # path on S3
)
# Result on S3: s3://my-bucket/prefix/models/production/v2.0/model.pt
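The result comment above suggests that the final object key is the connection prefix followed by the remote file name. Under that assumption, the destination can be predicted locally; s3_uri is a hypothetical helper and does not talk to S3.

```python
import posixpath


def s3_uri(bucket: str, prefix: str, local_path: str, remote_filename=None) -> str:
    """Predict the destination URI, assuming key = prefix + file name."""
    name = remote_filename or posixpath.basename(local_path)
    name = name.lstrip("/")  # keep the key relative to the prefix
    key = posixpath.join(prefix, name) if prefix else name
    return f"s3://{bucket}/{key}"


print(s3_uri("my-ml-outputs", "experiments/run-001/", "./model.pt",
             remote_filename="models/production/v2.0/model.pt"))
print(s3_uri("my-ml-outputs", "", "./checkpoints/best_model.pt"))
```

The URL actually returned by output_file() remains the authoritative value; this only helps when planning a bucket layout.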

Saving Remote Session Data to S3 (remote_output / remote_output_file)

The output() and output_file() methods save local objects/files to S3. The remote_output() and remote_output_file() methods save objects/files from the remote session pod directly to S3, without passing through your local machine.

Method                 Source                                  Destination   Local transit
output()               local Python object (in memory)         S3            yes
output_file()          local file (on disk)                    S3            yes
remote_output()        Python variable on the remote session   S3            no
remote_output_file()   file on the remote session              S3            no

# remote_output() requires persistent mode (the variable must stay in memory)
session_live_gpu.start()

session_live_gpu.run("""
import torch
model = torch.nn.Linear(784, 10).cuda()
optimizer = torch.optim.Adam(model.parameters())
# ... entrainement ...
results = {"accuracy": 0.95, "loss": 0.05, "epochs": 100}
torch.save(model.state_dict(), "/home/coder/workspace/model.pt")
""")

# Save the remote session's 'results' variable to S3 (JSON format)
url = session_live_gpu.remote_output("results.json", "results", s3)

session_live_gpu.stop()

# remote_output_file() does NOT need start()/stop()
# because the file lives on the pod's disk (it persists between calls)
url = session_live_gpu.remote_output_file("/home/coder/workspace/model.pt", s3)

# With a custom name on S3
url = session_live_gpu.remote_output_file(
    "/home/coder/workspace/model.pt",
    s3,
    s3_filename="models/production/v3/model.pt"
)

Using with MinIO or Other S3-Compatible Storage

# MinIO connection
s3_minio = session_live_gpu.s3_connect(
    bucket="ml-outputs",
    access_key="minio_user",
    secret_key="minio_password",
    endpoint="https://minio.endpoint.url",  # Custom endpoint, e.g. "http://minio.local:9000"
    region="us-east-1"
)

metrics_minio = {"accuracy": 0.95, "loss": 0.05, "epoch": 100}
session_live_gpu.output("metrics_minio_ok", metrics_minio, s3_minio)

Complete Training Workflow with S3 Output

from clouditia import GPUSession

session_live_gpu = GPUSession("sk_live_your_api_key")

# Configure S3 output
s3 = session_live_gpu.s3_connect(
    bucket="my-training-outputs",
    access_key="AKIA...",
    secret_key="...",
    prefix="training/experiment-001/"
)

# Start persistent session for training
session_live_gpu.start()

# Setup
session_live_gpu.run("""
import torch
import torch.nn as nn

model = nn.Linear(100, 10).cuda()
optimizer = torch.optim.Adam(model.parameters())
""")

# Training loop
session_live_gpu.run("""
for epoch in range(100):
    x = torch.randn(32, 100).cuda()
    y = model(x)
    loss = y.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
""")

# Get model state and save to S3
session_live_gpu.run("final_state = model.state_dict()")
model_state = session_live_gpu.get("final_state")

url = session_live_gpu.output("trained_model.pt", model_state, s3)
print(f"Model saved to: {url}")

# Save training metrics
metrics = {"final_loss": 0.05, "epochs": 100}
session_live_gpu.output("metrics.json", metrics, s3)

session_live_gpu.stop()

Remote Functions (Decorator)

Use the @session_live_gpu.remote decorator to run functions on the remote session:

from clouditia import GPUSession

session_live_gpu = GPUSession("ck_your_api_key")

@session_live_gpu.remote
def compute_on_gpu(data, power=2):
    import torch
    tensor = torch.tensor(data, device='cuda', dtype=torch.float32)
    result = tensor ** power
    return result.cpu().tolist()

# Call the function - it runs on the remote session!
result = compute_on_gpu([1, 2, 3, 4, 5], power=2)
print(result)  # [1.0, 4.0, 9.0, 16.0, 25.0]

Remote Function with Model

@session_live_gpu.remote
def train_step(batch_data, learning_rate=0.01):
    import torch
    import torch.nn as nn

    # Create model (or load from checkpoint)
    model = nn.Sequential(
        nn.Linear(len(batch_data), 64),
        nn.ReLU(),
        nn.Linear(64, 1)
    ).cuda()

    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    # Training step
    x = torch.tensor(batch_data, dtype=torch.float32).cuda()
    output = model(x)
    loss = output.sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    return {"loss": loss.item()}

# Call it like a normal function
result = train_step([1.0, 2.0, 3.0, 4.0], learning_rate=0.001)
print(f"Loss: {result['loss']}")

Async Remote Functions

@session_live_gpu.remote(async_mode=True)
def long_training():
    import torch
    for epoch in range(100):
        print(f"Epoch {epoch}/100")
        # ... training code ...
    return {"status": "completed"}

# Returns an AsyncJob instead of waiting
job = long_training()
print(f"Job submitted: {job.job_id}")

# Wait for completion
result = job.wait(show_logs=True)

Async Jobs (Long-Running Tasks)

For tasks that take hours or days, use async jobs:

Submitting a Job

# Submit a long-running job
job = session_live_gpu.submit("""
import torch
import time

print("Starting training...")
for epoch in range(100):
    print(f"Epoch {epoch + 1}/100")
    time.sleep(1)  # Simulate training

print("Training complete!")
torch.save({'epoch': 100}, '/home/coder/workspace/checkpoint.pt')
""", name="my_training")

print(f"Job ID: {job.job_id}")

Monitoring Progress

import time

# Poll for status
while not job.is_done():
    status = job.status()
    print(f"Status: {status}")

    # View recent logs
    if status == "running":
        logs = job.logs(tail=10)
        print(logs)

    time.sleep(30)

print("Job finished!")

Real-Time Log Streaming

# View logs as they come in
while job.is_running():
    new_logs = job.logs(new_only=True)
    if new_logs.strip():
        print(new_logs, end='')
    time.sleep(5)
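The idea behind fetching only new log output can be sketched with a cursor that remembers how much text was already shown. This is an illustration of the concept, useful for instance if only full log text were available; LogCursor is a hypothetical class and is not how new_only is implemented.

```python
class LogCursor:
    """Remember how many characters of a growing log were already shown."""

    def __init__(self):
        self._seen = 0

    def new_text(self, full_log: str) -> str:
        fresh = full_log[self._seen:]
        self._seen = len(full_log)
        return fresh


cursor = LogCursor()
first = cursor.new_text("Epoch 1/100\n")
second = cursor.new_text("Epoch 1/100\nEpoch 2/100\n")
third = cursor.new_text("Epoch 1/100\nEpoch 2/100\n")  # nothing new
print(repr(first), repr(second), repr(third))
```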

Waiting for Completion

# Wait with live log output
result = job.wait(show_logs=True)

# Or wait with timeout
try:
    result = job.wait(timeout=3600)  # 1 hour max
except TimeoutError:
    print("Job taking too long, cancelling...")
    job.cancel()

Getting Results

# Wait for the job to finish before fetching the result
job.wait()  # blocks until completion

# Fetch the result
result = job.result()

if job.status() == "running":
    print("Job still running...")
elif result.success:
    print("Job completed successfully!")
    print(result.output)
else:
    print(f"Job failed: {result.error}")

Listing Jobs

# List all jobs
jobs = session_live_gpu.jobs()
for j in jobs:
    print(f"{j.name}: {j.status()}")

# List only running jobs
running_jobs = session_live_gpu.jobs(status="running")

# List completed jobs
completed_jobs = session_live_gpu.jobs(status="completed", limit=5)

Cancelling Jobs

if job.is_running():
    job.cancel()
    print("Job cancelled")

Shell Jobs

# Submit a shell command as an async job
job = session_live_gpu.submit(
    "pip install transformers && python /home/coder/workspace/train.py",
    name="install_and_train",
    job_type="shell"
)

Jupyter Magic

Use Clouditia directly in Jupyter notebooks with magic commands.

Loading the Extension

# In a Jupyter cell
%load_ext clouditia

# Set your API key
CLOUDITIA_API_KEY = "ck_your_api_key"

Running Code on Remote Session

%%clouditia
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")

x = torch.randn(1000, 1000, device='cuda')
y = torch.randn(1000, 1000, device='cuda')
z = torch.matmul(x, y)
print(f"Result shape: {z.shape}")

Specifying API Key Directly

%%clouditia ck_your_api_key
print("Hello from GPU!")

Async Mode in Jupyter

%%clouditia --async
for epoch in range(100):
    print(f"Epoch {epoch}")
    # ... training code ...

# The job is submitted and _clouditia_job variable is set
# Check job status
_clouditia_job.status()

# View logs
print(_clouditia_job.logs())

Utility Magic Commands

# Check session status
%clouditia_status

# List recent jobs
%clouditia_jobs

# List only running jobs
%clouditia_jobs running

Error Handling

The SDK provides specific exceptions for different error types:

from clouditia import (
    GPUSession,
    ClouditiaError,
    AuthenticationError,
    SessionError,
    ExecutionError,
    TimeoutError,
    CommandBlockedError
)

session_live_gpu = GPUSession("ck_your_api_key")

try:
    result = session_live_gpu.run("some_code()")
except AuthenticationError:
    print("Invalid API key")
except SessionError:
    print("Session not running or not accessible")
except ExecutionError as e:
    print(f"Code execution failed: {e}")
except TimeoutError:
    print("Execution timed out - consider using async jobs")
except CommandBlockedError:
    print("Command blocked by security filters")
except ClouditiaError as e:
    print(f"General error: {e}")

Using raise_for_status()

result = session_live_gpu.run("some_code()")
result.raise_for_status()  # Raises ExecutionError if failed
print(result.output)
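For failures that may be temporary, a small retry wrapper with exponential backoff is a common pattern. The sketch below is generic Python: retry and its parameters are hypothetical, and which SDK exceptions are worth retrying (for example SessionError while a session is still starting) is an assumption to validate for your case.

```python
import time


def retry(operation, retry_on=(ConnectionError,), attempts=3,
          base_delay=1.0, sleep=time.sleep):
    """Call operation(); on a listed exception, back off and try again."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demo with a function that fails twice and then succeeds.
state = {"calls": 0}
delays = []


def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


outcome = retry(flaky, sleep=delays.append)
print(outcome, state["calls"], delays)
```

Errors such as an invalid API key are not transient, so they should stay out of retry_on and fail immediately.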

API Reference

GPUSession

GPUSession(
    api_key: str,
    base_url: str = "https://clouditia.com/code-editor",
    timeout: int = 120,
    poll_interval: int = 5
)

Methods:

Method                                                  Description
verify()                                                Verify API key and get session info
run(code, timeout=None, stream=True)                    Execute Python code
exec(code, timeout=None)                                Execute without return value
shell(command, timeout=None)                            Execute shell command
start()                                                 Start a persistent session
stop()                                                  Stop the persistent session
set(name, value)                                        Send variable to remote
get(name)                                               Retrieve variable from remote
upload(local_path, remote_path, show_progress=True)     Upload a file to remote session
download(remote_path, local_path, show_progress=True)   Download a file from remote session
upload_folder(local_path, remote_path, exclude=None)    Upload a folder to remote session
download_folder(remote_path, local_path, exclude=None)  Download a folder from remote session
list_files(remote_path, pattern=None)                   List files in remote directory
file_exists(remote_path)                                Check if a file exists on remote
submit(code, name=None, job_type="python")              Submit async job
jobs(status=None, limit=10)                             List jobs
gpu_info()                                              Get GPU information
remote(func)                                            Decorator for remote functions
s3_connect(bucket, access_key, secret_key, ...)         Create S3 connection
output(filename, data, s3_connection)                   Save Python object to S3
output_file(local_path, s3_connection)                  Upload local file to S3
remote_output(filename, variable_name, s3)              Save remote session variable directly to S3
remote_output_file(remote_path, s3)                     Upload remote session file directly to S3

Properties:

Property        Description
is_persistent   True if a persistent session is active

ExecutionResult

ExecutionResult(
    output: str,      # full output (print output + last expression)
    result: Any,      # value of the last line if it is an expression (None otherwise)
    error: str,       # error message on failure
    exit_code: int,   # process exit code
    success: bool     # True if execution succeeded
)

Difference between output and result:

  • output = all of stdout plus the last expression (like a Jupyter cell)
  • result = only the last expression, for programmatic use (e.g. int(result.result))

Methods:

Method              Description
raise_for_status()  Raise exception if failed
to_dict()           Convert to dictionary

AsyncJob

AsyncJob(session, job_id, name=None)

Methods:

Method                               Description
status()                             Get current status
is_done()                            Check if finished
is_running()                         Check if running
is_pending()                         Check if pending
logs(tail=50, new_only=False)        Get logs
result()                             Get final result
cancel()                             Cancel the job
wait(timeout=None, show_logs=False)  Wait for completion
get_info()                           Get detailed job info

S3Connection

S3Connection(
    bucket: str,           # S3 bucket name
    access_key: str,       # AWS Access Key ID
    secret_key: str,       # AWS Secret Access Key
    endpoint: str = "https://s3.amazonaws.com",  # S3 endpoint (for MinIO, etc.)
    region: str = "us-east-1",                   # AWS region
    prefix: str = ""                             # Optional prefix for uploads
)

Usage:

from clouditia import GPUSession, S3Connection

session_live_gpu = GPUSession("sk_live_...")

# Via method (recommended)
s3 = session_live_gpu.s3_connect(bucket="my-bucket", access_key="...", secret_key="...")

# Or create directly
s3 = S3Connection(bucket="my-bucket", access_key="...", secret_key="...")

Configuration

Environment Variables

You can set the API key via environment variable:

export CLOUDITIA_API_KEY="ck_your_api_key"

import os
from clouditia import GPUSession

session_live_gpu = GPUSession(os.environ["CLOUDITIA_API_KEY"])
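Reading the key through a small helper gives a clearer error than a bare KeyError when the variable is missing. The sketch below is a hypothetical helper (load_api_key is not part of the SDK); the ck_/sk_ prefix check follows the key format mentioned in Getting Your API Key.

```python
import os


def load_api_key(env=None) -> str:
    """Read CLOUDITIA_API_KEY and run basic sanity checks on it."""
    env = os.environ if env is None else env
    key = env.get("CLOUDITIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("CLOUDITIA_API_KEY is not set")
    if not key.startswith(("ck_", "sk_")):
        raise ValueError("API key should start with ck_ or sk_")
    return key


# Demo with an explicit mapping instead of the real environment.
print(load_api_key({"CLOUDITIA_API_KEY": "ck_demo_key"}))
```

The returned string can then be passed to GPUSession(...) as in the example above.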

Custom Base URL

session_live_gpu = GPUSession(
    "ck_your_api_key",
    base_url="https://custom.clouditia.com/code-editor"
)

Timeouts

# Set default timeout (seconds)
session_live_gpu = GPUSession("ck_your_api_key", timeout=300)

# Or per-request
result = session_live_gpu.run("long_computation()", timeout=600)

Examples

Training a Neural Network

from clouditia import GPUSession

session_live_gpu = GPUSession("ck_your_api_key")

# Submit training job
job = session_live_gpu.submit("""
import torch
import torch.nn as nn
import torch.optim as optim

# Create model
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10)
).cuda()

optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Training loop
for epoch in range(10):
    # Simulated batch
    x = torch.randn(64, 784).cuda()
    y = torch.randint(0, 10, (64,)).cuda()

    optimizer.zero_grad()
    output = model(x)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()

    print(f"Epoch {epoch+1}/10, Loss: {loss.item():.4f}")

# Save model
torch.save(model.state_dict(), '/home/coder/workspace/model.pt')
print("Training complete!")
""", name="mnist_training")

# Wait with live logs
result = job.wait(show_logs=True)

Data Processing Pipeline

# Create workspace
session_live_gpu.shell("mkdir -p ~/workspace/data ~/workspace/output")

# Download data
session_live_gpu.shell("cd ~/workspace/data && wget https://example.com/data.csv")

# Process data
result = session_live_gpu.run("""
import pandas as pd

# Load and process data
df = pd.read_csv('/home/coder/workspace/data/data.csv')
print(f"Loaded {len(df)} rows")

# Process...
df_processed = df.dropna()
print(f"After cleaning: {len(df_processed)} rows")

# Save
df_processed.to_csv('/home/coder/workspace/output/processed.csv', index=False)
print("Saved to /home/coder/workspace/output/processed.csv")
""")

print(result.output)

License

MIT License
