Execute Python and Shell code on remote GPU sessions

Clouditia SDK

Execute Python and Shell code on remote GPU sessions.

Clouditia SDK provides a simple Python interface to run code on remote GPU-powered containers. Perfect for machine learning, deep learning, and any GPU-accelerated workloads.

PyPI version Python 3.7+ License: MIT

Installation

pip install clouditia

# With S3 support for saving outputs
pip install clouditia[s3]

Quick Start

from clouditia import GPUSession

# Connect to your GPU session
session_live_gpu = GPUSession("ck_your_api_key")

# Execute Python code on the remote session
result = session_live_gpu.run("""
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
""")

print(result.output)

Features

  • Python Execution: Run Python code on remote sessions
  • Shell Commands: Execute shell commands on the remote session pod
  • Persistent Sessions: Keep variables between executions with start()/stop()
  • Variable Transfer: Send and retrieve variables between local and remote
  • File Transfer: Upload/download files and folders between local and remote
  • S3 Output: Save outputs directly to S3 buckets
  • Async Jobs: Submit long-running tasks with real-time log monitoring
  • Jupyter Magic: Use %%clouditia magic in notebooks
  • Decorator Support: Use @session_live_gpu.remote to run functions on the remote session

Table of Contents

  1. Getting Your API Key
  2. Basic Usage
  3. Persistent Sessions
  4. Executing Python Code
  5. Shell Commands
  6. Variable Transfer
  7. File Transfer
  8. S3 Output
  9. Remote Functions (Decorator)
  10. Async Jobs (Long-Running Tasks)
  11. Jupyter Magic
  12. Error Handling
  13. API Reference

Getting Your API Key

  1. Log in to clouditia.com
  2. Start a GPU session
  3. Go to API Keys in your session dashboard
  4. Generate a new API key (starts with ck_ or sk_)

Basic Usage

Connect to a Session

from clouditia import GPUSession

# Create a session with your API key
session_live_gpu = GPUSession("ck_your_api_key_here")

# Verify the connection
info = session_live_gpu.verify()
print(f"Connected to: {info['session_name']}")
print(f"GPU: {info['gpu_type']}")
print(f"Credit remaining: {info['user_credit']}€")

Waiting for a Session to Be Ready

A GPU session can have several intermediate states before being fully usable:

  • creating: the pod is being scheduled on a compute node.
  • running but workspace still downloading: when the session is resumed from a custom environment (venv), the workspace (models, datasets, caches…) is streamed back from S3 at pod startup. A small workspace takes a few seconds; a vLLM cache with 70,000+ files and 16 GB of data can take 10+ minutes.

The SDK exposes three fields to handle this:

  • ready: bool — True only when the session is fully usable (status running AND any workspace download complete).
  • estimated_ready_in_seconds: int | None — ETA until ready.
  • workspace_sync — live progress of the workspace download: {in_progress, bytes_done, bytes_total, files_done, pct, rate_bps, eta_seconds}.
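As a rough illustration of how these fields fit together, the sketch below turns a workspace_sync dictionary into a one-line progress message. The helper name format_sync_progress is hypothetical, and the sketch assumes that rate_bps is in bytes per second and that sizes use decimal units (1 GB = 10^9 bytes); neither assumption is confirmed by the SDK.

```python
def format_sync_progress(ws):
    """Format a workspace_sync dict as a short progress line.

    Assumptions (not confirmed by the SDK): rate_bps is bytes per second,
    and sizes are reported in decimal units (1 GB = 1e9 bytes).
    """
    if not ws or not ws.get("in_progress"):
        return "No workspace download in progress"

    done_gb = ws["bytes_done"] / 1e9
    total_gb = ws["bytes_total"] / 1e9
    rate_mb = ws["rate_bps"] / 1e6

    # Prefer the reported ETA; otherwise derive it from the remaining bytes.
    eta = ws.get("eta_seconds")
    if eta is None and ws["rate_bps"] > 0:
        eta = (ws["bytes_total"] - ws["bytes_done"]) / ws["rate_bps"]

    eta_text = "unknown" if eta is None else f"{int(eta // 60)}min {int(eta % 60)}s"
    return (f"Workspace: {done_gb:.2f}/{total_gb:.2f} GB ({ws['pct']}%) "
            f"@ {rate_mb:.1f} MB/s, ETA {eta_text}")


# Example with made-up numbers (not real SDK output)
sample = {
    "in_progress": True,
    "bytes_done": 2_000_000_000,
    "bytes_total": 10_000_000_000,
    "files_done": 1200,
    "pct": 20,
    "rate_bps": 20_000_000,
    "eta_seconds": None,
}
print(format_sync_progress(sample))
# Workspace: 2.00/10.00 GB (20%) @ 20.0 MB/s, ETA 6min 40s
```

When eta_seconds is missing, the fallback is simply remaining bytes divided by the current rate, so it will fluctuate with the transfer speed.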

Quick check

info = session_live_gpu.verify()
if info['ready']:
    print("Session is ready!")
else:
    ws = info.get('workspace_sync') or {}
    if ws.get('in_progress'):
        print(f"Workspace: {ws['pct']}% ({ws['bytes_done']}/{ws['bytes_total']} bytes)")
        print(f"ETA: {info['estimated_ready_in_seconds']} seconds")
    else:
        print(f"Waiting — status={info['status']}")

Blocking helper: wait_until_ready()

The cleanest way to wait for a resumed session is to call wait_until_ready(). It polls verify() every few seconds and prints a live progress line until the session is ready or the timeout expires.

session_live_gpu = GPUSession("ck_your_api_key")

# Block until the workspace is fully restored and VS Code/Jupyter is up
if session_live_gpu.wait_until_ready(timeout=1200):  # wait max 20 min
    # Safe to run code now
    result = session_live_gpu.run("import torch; print(torch.cuda.is_available())")
else:
    print("Session failed to become ready in time")

Output during a typical vLLM resume:

⏳ Workspace: 2.34/16.23 GB (14%) @ 22.1 MB/s — ETA 10min 32s
⏳ Workspace: 3.12/16.23 GB (19%) @ 22.5 MB/s — ETA 9min 41s
⏳ Workspace: 4.01/16.23 GB (25%) @ 22.3 MB/s — ETA 9min 5s
...
✅ Session ready!

Parameters:

  • timeout: int = 1800 — max total wait time in seconds (default 30 min).
  • poll_interval: int = 5 — delay between polls in seconds.
  • verbose: bool = True — print progress updates to stdout.
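To make the three parameters concrete, here is a minimal sketch of a polling loop with the same shape. It is not the SDK's implementation: wait_until_ready_sketch is a hypothetical name, verify is any callable returning a dict like the one from verify(), and the sleep and clock arguments exist only so the loop can be exercised without real delays.

```python
import time


def wait_until_ready_sketch(verify, timeout=1800, poll_interval=5,
                            verbose=True, sleep=time.sleep,
                            clock=time.monotonic):
    """Poll verify() until info['ready'] is true or the timeout elapses.

    Returns True if the session became ready, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        info = verify()
        if info.get("ready"):
            if verbose:
                print("Session ready!")
            return True
        if clock() >= deadline:
            return False
        if verbose:
            eta = info.get("estimated_ready_in_seconds")
            print(f"Waiting: status={info.get('status')}, ETA={eta}s")
        sleep(poll_interval)


# Demo with a fake verify() that reports ready on the third poll
responses = iter([
    {"ready": False, "status": "creating", "estimated_ready_in_seconds": 10},
    {"ready": False, "status": "running", "estimated_ready_in_seconds": 5},
    {"ready": True, "status": "running", "estimated_ready_in_seconds": 0},
])
ok = wait_until_ready_sketch(lambda: next(responses), timeout=60,
                             poll_interval=0, sleep=lambda seconds: None)
print(ok)  # True
```

Using a monotonic clock for the deadline keeps the timeout stable even if the system clock is adjusted while waiting.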

Persistent Sessions

By default, each run() call executes in an isolated environment - variables don't persist between calls. Use start() and stop() to enable persistent sessions where variables are preserved.

Isolated Mode (Default)

# Without start(), variables are NOT persistent
session_live_gpu.run("x = 10")
session_live_gpu.run("print(x)")  # Error: x is not defined

Persistent Mode

# Start a persistent session
session_live_gpu.start()
print(f"Session active: {session_live_gpu.is_persistent}")  # True

# Variables now persist between run() calls
session_live_gpu.run("x = 10")
session_live_gpu.run("y = 20")
session_live_gpu.run("z = x + y")
result = session_live_gpu.run("print(f'Result: {z}')")
# Output: Result: 30

# Stop the session when done
session_live_gpu.stop()
print(f"Session active: {session_live_gpu.is_persistent}")  # False
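One way to picture the difference between the two modes is as a namespace question: isolated mode behaves as if every call started with an empty namespace, while persistent mode behaves as if all calls shared one. The sketch below is a purely local, conceptual model (run_isolated and PersistentNamespace are hypothetical names) and says nothing about how the SDK implements this remotely.

```python
def run_isolated(code):
    """Each call gets a fresh namespace, so nothing carries over."""
    exec(code, {})


class PersistentNamespace:
    """All calls share one namespace, so earlier names stay visible."""

    def __init__(self):
        self.namespace = {}

    def run(self, code):
        exec(code, self.namespace)


# Isolated: the second call cannot see x
run_isolated("x = 10")
try:
    run_isolated("print(x)")
except NameError as e:
    print(f"Isolated mode: {e}")

# Persistent: state accumulates across calls
session = PersistentNamespace()
session.run("x = 10")
session.run("y = 20")
session.run("z = x + y")
print(f"Persistent mode: z = {session.namespace['z']}")
```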

Full Example

from clouditia import GPUSession

session_live_gpu = GPUSession("ck_your_api_key")

# Start persistent session
session_live_gpu.start()

# Build up state across multiple calls
session_live_gpu.run("import torch")
session_live_gpu.run("model = torch.nn.Linear(10, 5).cuda()")
session_live_gpu.run("data = torch.randn(32, 10).cuda()")

# Use the accumulated state
result = session_live_gpu.run("""
output = model(data)
print(f"Input shape: {data.shape}")
print(f"Output shape: {output.shape}")
""")

# Clean up
session_live_gpu.stop()

Checking Session State

# Check if a persistent session is active
if session_live_gpu.is_persistent:
    print("Persistent session is running")
else:
    print("Running in isolated mode")

Executing Python Code

Simple Execution

# Run Python code and get the output
result = session_live_gpu.run("print('Hello from the GPU!')")
print(result.output)  # "Hello from the GPU!"

# Check if execution was successful
if result.success:
    print("Code executed successfully!")
else:
    print(f"Error: {result.error}")

output vs result

  • result.output — contains all output from the executed code (print statements + last expression), like a Jupyter cell
  • result.result — contains only the value of the last line if it's an expression (for programmatic use)

# Single expression
result = session_live_gpu.run("2 + 2")
print(result.output)  # "4"
print(result.result)  # "4"

# List comprehension
result = session_live_gpu.run("[i**2 for i in range(5)]")
print(result.output)  # "[0, 1, 4, 9, 16]"
print(result.result)  # "[0, 1, 4, 9, 16]"

# print() + expression: output contains everything, result contains the last value
result = session_live_gpu.run("x = 10\nprint(f'x = {x}')\nx * 2")
print(result.output)  # "x = 10\n20"
print(result.result)  # "20"

# Statements only (no expression on the last line)
result = session_live_gpu.run("print('hello')")
print(result.output)  # "hello"
print(result.result)  # None
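The output/result split mirrors what a notebook cell does: run the statements, and if the last one is a bare expression, evaluate it separately and display its value. The following local sketch reproduces that behavior with the standard ast module. It is an illustration of the idea, not the SDK's code; run_like_repl is a hypothetical name, and returning the value as a repr string is an assumption chosen to match the examples above.

```python
import ast
import contextlib
import io


def run_like_repl(code):
    """Return (output, result) in the spirit of a notebook cell.

    output: everything printed, plus the repr of the last expression.
    result: the repr of the last expression, or None if there is none.
    """
    tree = ast.parse(code)
    last_expr = None
    # Split off the final statement if it is a bare expression
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        last_expr = ast.Expression(tree.body.pop().value)

    namespace = {}
    buffer = io.StringIO()
    value = None
    with contextlib.redirect_stdout(buffer):
        exec(compile(tree, "<cell>", "exec"), namespace)
        if last_expr is not None:
            value = eval(compile(last_expr, "<cell>", "eval"), namespace)

    # A value of None is not displayed, as in an interactive session
    result = None if value is None else repr(value)
    output = buffer.getvalue()
    if result is not None:
        output += result
    return output.rstrip("\n"), result


print(run_like_repl("x = 10\nprint(f'x = {x}')\nx * 2"))
print(run_like_repl("print('hello')"))
```

The second call shows why result is None for print('hello'): the call is an expression, but its value is None, so nothing is reported beyond the printed text.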

Multi-line Code

result = session_live_gpu.run("""
import torch
import torch.nn as nn

# Create a simple model
model = nn.Linear(10, 5).cuda()
x = torch.randn(32, 10).cuda()
output = model(x)

print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")
print(f"Model parameters: {sum(p.numel() for p in model.parameters())}")
""")

print(result.output)

run() vs exec()

  • run() — returns an ExecutionResult with output, result, and success. You handle errors yourself.
  • exec() — returns True on success and raises an ExecutionError if the code fails. A shortcut for code that doesn't need a return value.

Both execute code the same way. The only difference is error handling.

Important: Each run() or exec() call is isolated — variables don't persist between calls. To persist variables, use persistent mode (see Persistent Sessions section):

# WRONG: each exec() is isolated, so torch is not defined in the second call
session_live_gpu.exec("import torch")
session_live_gpu.exec("model = torch.nn.Linear(10, 5).cuda()")  # NameError!

# CORRECT: everything in a single call
session_live_gpu.exec("""
import torch
model = torch.nn.Linear(10, 5).cuda()
optimizer = torch.optim.Adam(model.parameters())
print(f"Model parameters: {sum(p.numel() for p in model.parameters())}")
""")

# CORRECT: or use persistent mode
session_live_gpu.start()  # Enable persistent mode
session_live_gpu.exec("import torch")
session_live_gpu.exec("model = torch.nn.Linear(10, 5).cuda()")  # torch is defined
session_live_gpu.exec("optimizer = torch.optim.Adam(model.parameters())")
session_live_gpu.stop()

Shell Commands

Execute shell commands on the remote session pod:

# Check current directory
result = session_live_gpu.shell("pwd")
print(result.output)  # /home/coder/workspace

# List files (full path or ~/workspace)
result = session_live_gpu.shell("ls -la /home/coder/workspace")
print(result.output)

result = session_live_gpu.shell("ls -la ~/workspace")
print(result.output)

# Create directories and files
result = session_live_gpu.shell("mkdir -p ~/workspace/models && ls ~/workspace")
print(result.output)

# Chain multiple commands
result = session_live_gpu.shell("cd ~/workspace && mkdir -p data && ls -la")
print(result.output)

# Check disk space
result = session_live_gpu.shell("df -h")
print(result.output)

# Check memory
result = session_live_gpu.shell("free -h")
print(result.output)

# Install packages
result = session_live_gpu.shell("pip install transformers datasets")
print(result.output)

# Download files
result = session_live_gpu.shell(
    "wget https://archive.ics.uci.edu/static/public/53/iris.zip -O ~/workspace/data.zip"
)
print(result.output)

result = session_live_gpu.shell(
    "wget https://huggingface.co/datasets/scikit-learn/iris/resolve/main/Iris.csv -O ~/workspace/data.csv"
)
print(result.output)

Checking Exit Codes

result = session_live_gpu.shell("ls /nonexistent")
print(f"Exit code: {result.exit_code}")
print(f"Success: {result.success}")
print(f"result content : {result}")
print(f"result output : {result.output}")

Variable Transfer

Important: set() and get() require persistent mode (start()/stop()) so that variables persist between calls.

Sending Variables to the Remote Session

# Start persistent mode (variables persist between calls)
session_live_gpu.start()

# Send local data to the remote session
data = [1, 2, 3, 4, 5, 99]
session_live_gpu.set("my_data", data)

# Use it in remote code
session_live_gpu.run("print(f'Data: {my_data}')")
session_live_gpu.run("print(f'Sum: {sum(my_data)}')")

session_live_gpu.stop()

Retrieving Variables from the Remote Session

session_live_gpu.start()

# Compute something on the remote session
session_live_gpu.run("""
import torch
tensor = torch.randn(100, 100).cuda()
result_stats = {
    'mean': tensor.mean().item(),
    'std': tensor.std().item(),
    'shape': list(tensor.shape)
}
""")

# Get the result locally
stats = session_live_gpu.get("result_stats")
print(f"Mean: {stats['mean']:.4f}")
print(f"Std: {stats['std']:.4f}")
print(f"Shape: {stats['shape']}")

session_live_gpu.stop()

Sending Complex Objects

import numpy as np

session_live_gpu.start()

# Send numpy arrays
arr = np.random.randn(100, 100)
session_live_gpu.set("numpy_array", arr)

# Send dictionaries
config = {
    "learning_rate": 0.001,
    "batch_size": 32,
    "epochs": 100
}
session_live_gpu.set("config", config)

# Use in remote code
session_live_gpu.run("""
import torch
tensor = torch.from_numpy(numpy_array).cuda()
print(f"Learning rate: {config['learning_rate']}")
""")

session_live_gpu.stop()

File Transfer

Transfer files and folders between your local machine and the remote session.

Uploading a Single File

# Upload a local file to the remote session
session_live_gpu.upload("./data.csv", "/home/coder/workspace/data.csv")

# Upload with custom path
session_live_gpu.upload("./model.pkl", "/home/coder/workspace/models/trained_model.pkl")

# Disable progress output
session_live_gpu.upload("./config.json", "/home/coder/workspace/config.json", show_progress=False)

Downloading a Single File

# Download a file from the remote session
session_live_gpu.download("/home/coder/workspace/results.csv", "./results.csv")

# Download trained model
session_live_gpu.download("/home/coder/workspace/checkpoints/model.pt", "./local_model.pt")

# Download silently
session_live_gpu.download("/home/coder/workspace/logs.txt", "./logs.txt", show_progress=False)

Uploading a Folder

Upload an entire directory with all its contents:

# Upload a project folder
session_live_gpu.upload_folder("./my_project", "/home/coder/workspace/project")

# Upload with exclusions (default excludes: __pycache__, .git, *.pyc, .DS_Store, node_modules)
session_live_gpu.upload_folder(
    "./my_project",
    "/home/coder/workspace/project",
    exclude=["*.log", ".env", "__pycache__", ".git"]
)

# Upload data folder
session_live_gpu.upload_folder("./datasets", "/home/coder/workspace/data")
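For intuition about what an exclude list does, the sketch below filters a list of relative paths with glob-style patterns from the standard fnmatch module. The matching rule used here (a path is skipped if any of its components matches any pattern) and the choice to let a custom list replace the defaults are assumptions; the SDK's exact matching may differ. The names is_excluded and filter_paths are hypothetical.

```python
import fnmatch
from pathlib import PurePosixPath

# Default exclusions as listed in the comment above
DEFAULT_EXCLUDES = ["__pycache__", ".git", "*.pyc", ".DS_Store", "node_modules"]


def is_excluded(relative_path, patterns=DEFAULT_EXCLUDES):
    """Return True if any path component matches any glob pattern."""
    parts = PurePosixPath(relative_path).parts
    return any(fnmatch.fnmatchcase(part, pattern)
               for part in parts for pattern in patterns)


def filter_paths(paths, patterns=DEFAULT_EXCLUDES):
    """Keep only the paths that no exclusion pattern matches."""
    return [p for p in paths if not is_excluded(p, patterns)]


files = [
    "train.py",
    "__pycache__/train.cpython-311.pyc",
    ".git/config",
    "data/sample.csv",
    "logs/run.log",
]
print(filter_paths(files))
# ['train.py', 'data/sample.csv', 'logs/run.log']
print(filter_paths(files, ["*.log", ".env", "__pycache__", ".git"]))
# ['train.py', 'data/sample.csv']
```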

Downloading a Folder

Download an entire directory with all its contents:

# Download results folder
session_live_gpu.download_folder("/home/coder/workspace/results", "./local_results")

# Download checkpoints
session_live_gpu.download_folder(
    "/home/coder/workspace/checkpoints",
    "./checkpoints",
    exclude=["*.tmp", "*.log"]
)

# Download trained models
session_live_gpu.download_folder("/home/coder/workspace/models", "./downloaded_models")

Listing Remote Files

# List files in a directory
files = session_live_gpu.list_files("/home/coder/workspace")
for f in files:
    icon = "📁" if f["is_dir"] else "📄"
    print(f"{icon} {f['name']} - {f['size']} bytes")

# Filter by pattern
python_files = session_live_gpu.list_files("/home/coder/workspace", pattern="*.py")
for f in python_files:
    print(f"📄 {f['name']}")

# List with full details
files = session_live_gpu.list_files("/home/coder/workspace")
for f in files:
    print(f"Name: {f['name']}")
    print(f"  Path: {f['path']}")
    print(f"  Size: {f['size']} bytes")
    print(f"  Is Directory: {f['is_dir']}")
    print(f"  Modified: {f['modified']}")

Checking if a File Exists

# Check before downloading
if session_live_gpu.file_exists("/home/coder/workspace/model.pt"):
    session_live_gpu.download("/home/coder/workspace/model.pt", "./model.pt")
    print("Model downloaded!")
else:
    print("Model not found, training required...")

# Check multiple files
files_to_check = ["config.json", "data.csv", "model.pt"]
for filename in files_to_check:
    path = f"/home/coder/workspace/{filename}"
    exists = session_live_gpu.file_exists(path)
    status = "✓" if exists else "✗"
    print(f"{status} {filename}")

Complete Workflow Example

from clouditia import GPUSession

session_live_gpu = GPUSession("ck_your_api_key")

# 1. Upload training data and code
session_live_gpu.upload_folder("./training_code", "/home/coder/workspace/code")
session_live_gpu.upload("./data/train.csv", "/home/coder/workspace/data/train.csv")
session_live_gpu.upload("./data/test.csv", "/home/coder/workspace/data/test.csv")

# 2. Run training
result = session_live_gpu.run("""
import sys
sys.path.insert(0, '/home/coder/workspace/code')
from train import train_model

model = train_model('/home/coder/workspace/data/train.csv')
model.save('/home/coder/workspace/output/model.pt')
print("Training complete!")
""")

# 3. Check and download results
if session_live_gpu.file_exists("/home/coder/workspace/output/model.pt"):
    session_live_gpu.download("/home/coder/workspace/output/model.pt", "./trained_model.pt")
    print("Model saved locally!")

# 4. Download all outputs
session_live_gpu.download_folder("/home/coder/workspace/output", "./results")
print("All results downloaded!")

# 5. List what was created
files = session_live_gpu.list_files("/home/coder/workspace/output")
print(f"Created {len(files)} files during training")

Working with Different File Types

# CSV files
session_live_gpu.upload("./data.csv", "/home/coder/workspace/data.csv")

# Pickle files (models, data)
session_live_gpu.upload("./model.pkl", "/home/coder/workspace/model.pkl")

# PyTorch models
session_live_gpu.download("/home/coder/workspace/checkpoint.pt", "./checkpoint.pt")

# JSON configuration
session_live_gpu.upload("./config.json", "/home/coder/workspace/config.json")

# Text files
session_live_gpu.upload("./requirements.txt", "/home/coder/workspace/requirements.txt")

# Binary files
session_live_gpu.upload("./image.png", "/home/coder/workspace/image.png")

# Any file type works!
session_live_gpu.upload("./data.parquet", "/home/coder/workspace/data.parquet")
session_live_gpu.upload("./weights.h5", "/home/coder/workspace/weights.h5")

S3 Output

Save your outputs directly to Amazon S3 or compatible storage (MinIO, etc.).

Installation

To use S3 features, install with the s3 extra:

pip install clouditia[s3]

Creating an S3 Connection

from clouditia import GPUSession

session_live_gpu = GPUSession("sk_live_your_api_key")

# Create S3 connection
s3 = session_live_gpu.s3_connect(
    bucket="my-ml-outputs",
    access_key="AKIAIOSFODNN7EXAMPLE",
    secret_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    region="eu-west-1",
    prefix="experiments/run-001/"  # Optional: prefix for all uploads
)

output() vs output_file()

Two methods to save data to S3:

  • output(filename, data, s3) — saves a Python object in memory (variable, array, dict) to S3. The SDK serializes it automatically based on the file extension. The object doesn't need to exist on disk.

  • output_file(local_path, s3) — uploads an existing file from your local disk to S3. Useful for files generated by a script or downloaded.

Saving Python Objects to S3 (output)

The serialization format is auto-detected from the file extension:

  • .pt, .pth: PyTorch state dict
  • .npy: NumPy array
  • .json: JSON data
  • .pkl, .pickle: Pickle format (default)

# Save NumPy arrays
import numpy as np
embeddings = np.random.randn(1000, 768)
url = session_live_gpu.output("embeddings_aina_23052026.npy", embeddings, s3)

# Save JSON metrics
metrics = {"accuracy": 0.95, "loss": 0.05, "epoch": 100}
url = session_live_gpu.output("metrics.json", metrics, s3)

# Save any picklable object
results = {"predictions": [1, 2, 3], "embeddings": embeddings, "metrics": metrics}
url = session_live_gpu.output("results.pkl", results, s3)
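Extension-based detection can be thought of as a small lookup table with a fallback. The sketch below is a hypothetical, local illustration (detect_format and serialize are not SDK functions). It assumes that "default" means pickle is used for unknown or missing extensions and that matching is case-insensitive, and it only serializes the two formats available in the standard library.

```python
import json
import os
import pickle

# Extension-to-format table taken from the list above
FORMATS = {
    ".pt": "torch",
    ".pth": "torch",
    ".npy": "numpy",
    ".json": "json",
    ".pkl": "pickle",
    ".pickle": "pickle",
}


def detect_format(filename):
    """Pick a format from the extension; pickle is the assumed fallback."""
    ext = os.path.splitext(filename)[1].lower()
    return FORMATS.get(ext, "pickle")


def serialize(filename, data):
    """Serialize data to bytes for the stdlib-backed formats only."""
    fmt = detect_format(filename)
    if fmt == "json":
        return json.dumps(data).encode("utf-8")
    if fmt == "pickle":
        return pickle.dumps(data)
    raise NotImplementedError(f"{fmt} needs a third-party library in this sketch")


print(detect_format("embeddings.npy"))   # numpy
print(detect_format("metrics.json"))     # json
print(detect_format("results"))          # pickle (no extension)
print(serialize("metrics.json", {"accuracy": 0.95}))
```

Under these assumptions, a filename without an extension would end up pickled, so it is worth giving output files an explicit extension.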

Uploading Local Files to S3 (output_file)

# Upload a file that already exists on your local disk
url = session_live_gpu.output_file("./checkpoints/best_model.pt", s3)
print(f"Uploaded to: {url}")

# remote_filename: choose the path and name of the file on S3
# By default, the file keeps its local name (best_model.pt)
# With remote_filename, you choose the S3 path structure
url = session_live_gpu.output_file(
    "./model.pt",                                       # local file
    s3,
    remote_filename="models/production/v2.0/model.pt"   # path on S3
)
# Result on S3: s3://my-bucket/prefix/models/production/v2.0/model.pt
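The comment above suggests that the final object key is the connection prefix followed by either the local file name or remote_filename. A minimal sketch of that composition, assuming exactly this rule (build_s3_uri is a hypothetical helper, not part of the SDK):

```python
import posixpath


def build_s3_uri(bucket, prefix, local_path, remote_filename=None):
    """Compose the destination URI: bucket + connection prefix + object name."""
    # Without remote_filename, fall back to the local file's base name
    name = remote_filename or posixpath.basename(local_path)
    key = posixpath.join(prefix, name) if prefix else name
    return f"s3://{bucket}/{key}"


print(build_s3_uri("my-ml-outputs", "experiments/run-001/",
                   "./checkpoints/best_model.pt"))
# s3://my-ml-outputs/experiments/run-001/best_model.pt

print(build_s3_uri("my-ml-outputs", "experiments/run-001/", "./model.pt",
                   remote_filename="models/production/v2.0/model.pt"))
# s3://my-ml-outputs/experiments/run-001/models/production/v2.0/model.pt
```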

Saving Remote Session Data to S3 (remote_output / remote_output_file)

The output() and output_file() methods save local objects/files to S3. The remote_output() and remote_output_file() methods save objects/files from the remote session directly to S3, without transiting through your local machine.

Method | Source | Destination | Local transit
output() | local Python object (in memory) | S3 | yes
output_file() | local file (on disk) | S3 | yes
remote_output() | Python variable on remote session | S3 | no
remote_output_file() | file on remote session | S3 | no

# remote_output() requires persistent mode (variable must stay in memory)
session_live_gpu.start()

session_live_gpu.run("""
import torch
model = torch.nn.Linear(784, 10).cuda()
optimizer = torch.optim.Adam(model.parameters())
# ... training ...
results = {"accuracy": 0.95, "loss": 0.05, "epochs": 100}
torch.save(model.state_dict(), "/home/coder/workspace/model.pt")
""")

# Save the 'results' variable from the remote session to S3 (JSON format)
url = session_live_gpu.remote_output("results.json", "results", s3)

session_live_gpu.stop()

# remote_output_file() does NOT need start()/stop()
# because the file is on the pod's disk (it persists between calls)
url = session_live_gpu.remote_output_file("/home/coder/workspace/model.pt", s3)

# With a custom name on S3
url = session_live_gpu.remote_output_file(
    "/home/coder/workspace/model.pt",
    s3,
    s3_filename="models/production/v3/model.pt"
)

Using with MinIO or Other S3-Compatible Storage

# MinIO connection
s3_minio = session_live_gpu.s3_connect(
    bucket="ml-outputs",
    access_key="minio_user",
    secret_key="minio_password",
    endpoint="https://minio.endpoint.url",  # Custom endpoint, e.g. "http://minio.local:9000"
    region="us-east-1"
)

metrics_minio = {"accuracy": 0.95, "loss": 0.05, "epoch": 100}
session_live_gpu.output("metrics_minio_ok", metrics_minio, s3_minio)

Complete Training Workflow with S3 Output

from clouditia import GPUSession

session_live_gpu = GPUSession("sk_live_your_api_key")

# Configure S3 output
s3 = session_live_gpu.s3_connect(
    bucket="my-training-outputs",
    access_key="AKIA...",
    secret_key="...",
    prefix="training/experiment-001/"
)

# Start persistent session for training
session_live_gpu.start()

# Setup
session_live_gpu.run("""
import torch
import torch.nn as nn

model = nn.Linear(100, 10).cuda()
optimizer = torch.optim.Adam(model.parameters())
""")

# Training loop
session_live_gpu.run("""
for epoch in range(100):
    x = torch.randn(32, 100).cuda()
    y = model(x)
    loss = y.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
""")

# Get model state and save to S3
session_live_gpu.run("final_state = model.state_dict()")
model_state = session_live_gpu.get("final_state")

url = session_live_gpu.output("trained_model.pt", model_state, s3)
print(f"Model saved to: {url}")

# Save training metrics
metrics = {"final_loss": 0.05, "epochs": 100}
session_live_gpu.output("metrics.json", metrics, s3)

session_live_gpu.stop()

Remote Functions (Decorator)

Use the @session_live_gpu.remote decorator to run functions on the remote session:

from clouditia import GPUSession

session_live_gpu = GPUSession("ck_your_api_key")

@session_live_gpu.remote
def compute_on_gpu(data, power=2):
    import torch
    tensor = torch.tensor(data, device='cuda', dtype=torch.float32)
    result = tensor ** power
    return result.cpu().tolist()

# Call the function - it runs on the remote session!
result = compute_on_gpu([1, 2, 3, 4, 5], power=2)
print(result)  # [1.0, 4.0, 9.0, 16.0, 25.0]

Remote Function with Model

@session_live_gpu.remote
def train_step(batch_data, learning_rate=0.01):
    import torch
    import torch.nn as nn

    # Create model (or load from checkpoint)
    model = nn.Sequential(
        nn.Linear(len(batch_data), 64),
        nn.ReLU(),
        nn.Linear(64, 1)
    ).cuda()

    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    # Training step
    x = torch.tensor(batch_data, dtype=torch.float32).cuda()
    output = model(x)
    loss = output.sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    return {"loss": loss.item()}

# Call it like a normal function
result = train_step([1.0, 2.0, 3.0, 4.0], learning_rate=0.001)
print(f"Loss: {result['loss']}")

Async Remote Functions

@session_live_gpu.remote(async_mode=True)
def long_training():
    import torch
    for epoch in range(100):
        print(f"Epoch {epoch}/100")
        # ... training code ...
    return {"status": "completed"}

# Returns an AsyncJob instead of waiting
job = long_training()
print(f"Job submitted: {job.job_id}")

# Wait for completion
result = job.wait(show_logs=True)

Async Jobs (Long-Running Tasks)

For tasks that take hours or days, use async jobs:

Submitting a Job

# Submit a long-running job
job = session_live_gpu.submit("""
import torch
import time

print("Starting training...")
for epoch in range(100):
    print(f"Epoch {epoch + 1}/100")
    time.sleep(1)  # Simulate training

print("Training complete!")
torch.save({'epoch': 100}, '/home/coder/workspace/checkpoint.pt')
""", name="my_training")

print(f"Job ID: {job.job_id}")

Monitoring Progress

import time

# Poll for status
while not job.is_done():
    status = job.status()
    print(f"Status: {status}")

    # View recent logs
    if status == "running":
        logs = job.logs(tail=10)
        print(logs)

    time.sleep(30)

print("Job finished!")

Real-Time Log Streaming

# View logs as they come in
while job.is_running():
    new_logs = job.logs(new_only=True)
    if new_logs.strip():
        print(new_logs, end='')
    time.sleep(5)

Waiting for Completion

# Wait with live log output
result = job.wait(show_logs=True)

# Or wait with timeout
try:
    result = job.wait(timeout=3600)  # 1 hour max
except TimeoutError:
    print("Job taking too long, cancelling...")
    job.cancel()

Getting Results

# Wait for the job to complete before getting the result
job.wait()  # blocks until completion

# Get the result
result = job.result()

if job.status() == "running":
    print("Job still running...")
elif result.success:
    print("Job completed successfully!")
    print(result.output)
else:
    print(f"Job failed: {result.error}")

Listing Jobs

# List all jobs
jobs = session_live_gpu.jobs()
for j in jobs:
    print(f"{j.name}: {j.status()}")

# List only running jobs
running_jobs = session_live_gpu.jobs(status="running")

# List completed jobs
completed_jobs = session_live_gpu.jobs(status="completed", limit=5)

Cancelling Jobs

if job.is_running():
    job.cancel()
    print("Job cancelled")

Shell Jobs

# Submit a shell command as an async job
job = session_live_gpu.submit(
    "pip install transformers && python /home/coder/workspace/train.py",
    name="install_and_train",
    job_type="shell"
)

Jupyter Magic

Use Clouditia directly in Jupyter notebooks with magic commands.

Loading the Extension

# In a Jupyter cell
%load_ext clouditia

# Set your API key
CLOUDITIA_API_KEY = "ck_your_api_key"

Running Code on Remote Session

%%clouditia
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")

x = torch.randn(1000, 1000, device='cuda')
y = torch.randn(1000, 1000, device='cuda')
z = torch.matmul(x, y)
print(f"Result shape: {z.shape}")

Specifying API Key Directly

%%clouditia ck_your_api_key
print("Hello from GPU!")

Async Mode in Jupyter

%%clouditia --async
for epoch in range(100):
    print(f"Epoch {epoch}")
    # ... training code ...

# The job is submitted and _clouditia_job variable is set
# Check job status
_clouditia_job.status()

# View logs
print(_clouditia_job.logs())

Utility Magic Commands

# Check session status
%clouditia_status

# List recent jobs
%clouditia_jobs

# List only running jobs
%clouditia_jobs running

Error Handling

The SDK provides specific exceptions for different error types:

from clouditia import (
    GPUSession,
    ClouditiaError,
    AuthenticationError,
    SessionError,
    ExecutionError,
    TimeoutError,
    CommandBlockedError
)

session_live_gpu = GPUSession("ck_your_api_key")

try:
    result = session_live_gpu.run("import torch; print(torch.cuda.is_available())")
except AuthenticationError:
    print("Invalid API key")
except SessionError:
    print("Session not running or not accessible")
except ExecutionError as e:
    print(f"Code execution failed: {e}")
except TimeoutError:
    print("Execution timed out - consider using async jobs")
except CommandBlockedError:
    print("Command blocked by security filters")
except ClouditiaError as e:
    print(f"General error: {e}")

Using raise_for_status()

result = session_live_gpu.run("import torch; print(torch.cuda.is_available())")
result.raise_for_status()  # Raises ExecutionError if failed
print(result.output)

API Reference

GPUSession

GPUSession(
    api_key: str,
    base_url: str = "https://clouditia.com/code-editor",
    timeout: int = 120,
    poll_interval: int = 5
)

Methods:

Method | Description

Connection & Info
verify() | Verify API key and get session info
wait_until_ready(timeout=600) | Wait until session is fully ready (workspace restored)
gpu_info() | Get GPU information (name, memory, CUDA version)

Code Execution
run(code, timeout=None, stream=True) | Execute Python code (REPL-like: captures last expression)
exec(code, timeout=None) | Execute code, raises ExecutionError on failure
shell(command, timeout=None) | Execute shell command (security-filtered)

Persistent Mode
start() | Start a persistent session (variables persist between calls)
stop() | Stop the persistent session
set(name, value) | Send a variable to the remote session
get(name) | Retrieve a variable from the remote session

File Transfer
upload(local_path, remote_path, show_progress=True) | Upload a file (auto-chunked for large files)
download(remote_path, local_path, show_progress=True) | Download a file (auto-chunked for large files)
upload_folder(local_path, remote_path, exclude=None) | Upload a folder (compressed + chunked)
download_folder(remote_path, local_path, exclude=None) | Download a folder (compressed + chunked)
list_files(remote_path, pattern=None) | List files in remote directory
file_exists(remote_path) | Check if a file exists on remote

S3 Output (local)
s3_connect(bucket, access_key, secret_key, ...) | Create S3 connection
output(filename, data, s3_connection) | Save local Python object to S3
output_file(local_path, s3_connection, remote_filename=None) | Upload local file to S3

S3 Output (remote, no local transit)
remote_output(filename, variable_name, s3) | Save remote session variable directly to S3
remote_output_file(remote_path, s3, s3_filename=None) | Upload remote session file directly to S3

Async Jobs
submit(code, name=None, job_type="python") | Submit async background job
jobs(status=None, limit=10) | List async jobs

Decorator
@remote | Decorator to run a function on the remote session

Properties:

Property | Type | Description
is_persistent | bool | True if a persistent session is active
api_key | str | The API key used for authentication
base_url | str | The API base URL
timeout | int | Default timeout in seconds

ExecutionResult

ExecutionResult(
    output: str,      # all output (print statements + last expression)
    result: Any,      # value of the last line if it's an expression (None otherwise)
    error: str,       # error message if failed
    exit_code: int,   # process exit code
    success: bool     # True if execution succeeded
)

Difference between output and result:

  • output = all stdout + last expression (like a Jupyter cell)
  • result = only the last expression for programmatic use (e.g., int(result.result))

Methods:

Method | Description
raise_for_status() | Raise exception if failed
to_dict() | Convert to dictionary
__bool__() | True if success=True (use in if result:)
__str__() | Returns output if success, "Error: ..." otherwise
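To show how these methods behave together, here is a stand-in class that follows the descriptions above. It is a local sketch for illustration only: ExecutionResultSketch and the ExecutionError defined here are hypothetical mirrors of the SDK types, and the field defaults are assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


class ExecutionError(Exception):
    """Stand-in for the SDK's ExecutionError."""


@dataclass
class ExecutionResultSketch:
    output: str = ""
    result: Any = None
    error: Optional[str] = None
    exit_code: int = 0
    success: bool = True

    def raise_for_status(self):
        # Turn a failed result into an exception
        if not self.success:
            raise ExecutionError(self.error or "execution failed")

    def to_dict(self):
        return asdict(self)

    def __bool__(self):
        return self.success

    def __str__(self):
        return self.output if self.success else f"Error: {self.error}"


ok = ExecutionResultSketch(output="4", result="4")
failed = ExecutionResultSketch(error="NameError: name 'x' is not defined",
                               exit_code=1, success=False)

if ok:                 # __bool__ makes the result usable in a condition
    print(ok)          # 4
print(failed)          # Error: NameError: name 'x' is not defined
```

With this shape, `if result:` and `result.raise_for_status()` are two ways of checking the same success flag: the first branches, the second raises.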

AsyncJob

AsyncJob(session, job_id, name=None)

Methods:

Method | Description
status() | Get current status
is_done() | Check if finished
is_running() | Check if running
is_pending() | Check if pending
logs(tail=50, new_only=False) | Get logs
result() | Get final result
cancel() | Cancel the job
wait(timeout=None, show_logs=False) | Wait for completion
get_info() | Get detailed job info

S3Connection

S3Connection(
    bucket: str,           # S3 bucket name
    access_key: str,       # AWS Access Key ID
    secret_key: str,       # AWS Secret Access Key
    endpoint: str = "https://s3.amazonaws.com",  # S3 endpoint (for MinIO, etc.)
    region: str = "us-east-1",                   # AWS region
    prefix: str = ""                             # Optional prefix for uploads
)

Usage:

from clouditia import GPUSession, S3Connection

session_live_gpu = GPUSession("sk_live_...")

# Via method (recommended)
s3 = session_live_gpu.s3_connect(bucket="my-bucket", access_key="...", secret_key="...")

# Or create directly
s3 = S3Connection(bucket="my-bucket", access_key="...", secret_key="...")
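
To keep credentials out of source files, the constructor arguments can be collected from environment variables first. The sketch below only builds the keyword dictionary, so it runs without the SDK. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION follow the usual AWS naming; the CLOUDITIA_S3_* names are hypothetical and chosen for illustration.

```python
import os


def s3_kwargs_from_env(env=None):
    """Collect S3Connection keyword arguments from environment variables."""
    env = os.environ if env is None else env
    kwargs = {
        "bucket": env["CLOUDITIA_S3_BUCKET"],  # hypothetical variable name
        "access_key": env["AWS_ACCESS_KEY_ID"],
        "secret_key": env["AWS_SECRET_ACCESS_KEY"],
    }
    # Optional settings are passed only when set, so the documented defaults apply otherwise
    optional = {
        "endpoint": "CLOUDITIA_S3_ENDPOINT",  # hypothetical variable name
        "region": "AWS_DEFAULT_REGION",
        "prefix": "CLOUDITIA_S3_PREFIX",      # hypothetical variable name
    }
    for name, var in optional.items():
        if env.get(var):
            kwargs[name] = env[var]
    return kwargs


example_env = {
    "CLOUDITIA_S3_BUCKET": "my-bucket",
    "AWS_ACCESS_KEY_ID": "AKIAEXAMPLE",
    "AWS_SECRET_ACCESS_KEY": "secret",
    "AWS_DEFAULT_REGION": "eu-west-1",
}
print(sorted(s3_kwargs_from_env(example_env)))
```

The resulting dictionary could then be unpacked as S3Connection(**kwargs), since its keys match the constructor parameters listed above.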

Exceptions

Exception            Description
ClouditiaError       Base exception for all Clouditia errors
AuthenticationError  Invalid or expired API key
SessionError         Session not found, not running, or not accessible
ExecutionError       Code execution failed on the remote session
TimeoutError         Execution timed out
CommandBlockedError  Command blocked by security filters

Hierarchy: All exceptions inherit from ClouditiaError.
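
Because every exception shares one base class, handlers are best ordered from most specific to most general. The sketch below illustrates that ordering with local stand-in classes that reuse the documented names (they are not imported from the SDK); run_safely is a hypothetical helper. Note also that importing the SDK's TimeoutError by name shadows Python's built-in TimeoutError within that module.

```python
class ClouditiaError(Exception):
    """Stand-in for the documented base exception."""


class AuthenticationError(ClouditiaError):
    """Stand-in: invalid or expired API key."""


class ExecutionError(ClouditiaError):
    """Stand-in: code failed on the remote session."""


def run_safely(action):
    """Call action() and map failures to a short label, most specific first."""
    try:
        return "ok", action()
    except AuthenticationError as exc:
        return "auth", str(exc)
    except ClouditiaError as exc:
        # Catches every other SDK error, because all inherit from the base class
        return "clouditia", str(exc)


def bad_key():
    raise AuthenticationError("expired key")


def bad_code():
    raise ExecutionError("ZeroDivisionError")


print(run_safely(lambda: 1 + 1))
print(run_safely(bad_key))
print(run_safely(bad_code))
```

Placing the ClouditiaError handler first would swallow the authentication case, so the base class belongs last.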


Configuration

Environment Variables

You can supply the API key through an environment variable:

export CLOUDITIA_API_KEY="ck_your_api_key"

Then read it in Python:

import os
from clouditia import GPUSession

session_live_gpu = GPUSession(os.environ["CLOUDITIA_API_KEY"])
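
Indexing os.environ directly raises a bare KeyError when the variable is missing. A small loader can fail with a clearer message instead. The sketch below is illustrative: load_api_key is a hypothetical helper, and the prefix check relies on the ck_ and sk_ prefixes mentioned in Getting Your API Key.

```python
import os


def load_api_key(env=None, var="CLOUDITIA_API_KEY"):
    """Return the API key from the environment, failing early with a clear message."""
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before creating a session")
    if not key.startswith(("ck_", "sk_")):
        # Prefixes taken from the "Getting Your API Key" section
        raise RuntimeError(f"{var} does not look like a Clouditia key (expected ck_ or sk_ prefix)")
    return key


print(load_api_key({"CLOUDITIA_API_KEY": "ck_your_api_key"}))
```

The returned string can be passed to GPUSession in place of the direct os.environ lookup.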

Custom Base URL

session_live_gpu = GPUSession(
    "ck_your_api_key",
    base_url="https://custom.clouditia.com/code-editor"
)

Timeouts

# Set default timeout (seconds)
session_live_gpu = GPUSession("ck_your_api_key", timeout=300)

# Or per-request
result = session_live_gpu.run("long_computation()", timeout=600)

Examples

Training a Neural Network

from clouditia import GPUSession

session_live_gpu = GPUSession("ck_your_api_key")

# Submit training job
job = session_live_gpu.submit("""
import torch
import torch.nn as nn
import torch.optim as optim

# Create model
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10)
).cuda()

optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Training loop
for epoch in range(10):
    # Simulated batch
    x = torch.randn(64, 784).cuda()
    y = torch.randint(0, 10, (64,)).cuda()

    optimizer.zero_grad()
    output = model(x)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()

    print(f"Epoch {epoch+1}/10, Loss: {loss.item():.4f}")

# Save model
torch.save(model.state_dict(), '/home/coder/workspace/model.pt')
print("Training complete!")
""", name="mnist_training")

# Wait with live logs
result = job.wait(show_logs=True)
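
The training script prints one line per epoch in a fixed format, so the loss curve can be recovered from the log text afterwards. The sketch below parses that format from a plain string; how the text is obtained (for example from job.logs()) and its exact type are assumptions, and parse_losses is a hypothetical helper.

```python
import re

# Matches lines such as "Epoch 3/10, Loss: 2.2871" as printed by the training loop
LOSS_LINE = re.compile(r"Epoch (\d+)/(\d+), Loss: ([0-9.]+)")


def parse_losses(log_text):
    """Extract (epoch, loss) pairs from log text in the format printed above."""
    return [(int(m.group(1)), float(m.group(3))) for m in LOSS_LINE.finditer(log_text)]


sample = "Epoch 1/10, Loss: 2.3104\nEpoch 2/10, Loss: 2.2871\nTraining complete!"
print(parse_losses(sample))
```

Lines that do not match the pattern, such as the final "Training complete!" message, are skipped.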

Data Processing Pipeline

from clouditia import GPUSession

session_live_gpu = GPUSession("ck_your_api_key")

# Create workspace
session_live_gpu.shell("mkdir -p ~/workspace/data ~/workspace/output")

# Download data
session_live_gpu.shell("cd ~/workspace/data && wget https://example.com/data.csv")

# Process data
result = session_live_gpu.run("""
import pandas as pd

# Load and process data
df = pd.read_csv('/home/coder/workspace/data/data.csv')
print(f"Loaded {len(df)} rows")

# Process...
df_processed = df.dropna()
print(f"After cleaning: {len(df_processed)} rows")

# Save
df_processed.to_csv('/home/coder/workspace/output/processed.csv', index=False)
print("Saved to /home/coder/workspace/output/processed.csv")
""")

print(result.output)

Support


License

MIT License
