MLOps Python SDK for XCloud Service API

MLOps Python SDK

MLOps Python SDK for XCloud Service API. Manage and execute tasks with confidence.

Installation

Install the SDK from PyPI:

pip install mlops-python-sdk

Quick Start

1. Setup Authentication

You can authenticate using either an API Key or an Access Token.

Option 1: API Key (Recommended for programmatic access)

  1. Sign up at MLOps
  2. Create an API key on the API Keys page
  3. Set environment variables:

export MLOPS_API_KEY=xck_******
export MLOPS_DOMAIN=localhost:8090  # optional, default is localhost:8090

Option 2: Access Token (For user authentication)

export MLOPS_ACCESS_TOKEN=your_access_token
export MLOPS_DOMAIN=localhost:8090  # optional
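
Before constructing a client, it can help to check which credential the current process will actually see. The sketch below only inspects the environment and does not call the SDK. The helper name `detect_auth` is hypothetical, and preferring the API key when both variables are set is an assumption, not documented SDK behavior.

```python
import os
from typing import Mapping, Optional, Tuple


def detect_auth(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (auth_method, domain) based on the MLOPS_* variables.

    Hypothetical helper: the API-key-first precedence is an assumption.
    """
    env = os.environ if env is None else env
    domain = env.get("MLOPS_DOMAIN", "localhost:8090")  # documented default
    if env.get("MLOPS_API_KEY"):
        return "api_key", domain
    if env.get("MLOPS_ACCESS_TOKEN"):
        return "access_token", domain
    raise RuntimeError("Set MLOPS_API_KEY or MLOPS_ACCESS_TOKEN first")
```

Calling `detect_auth()` with no arguments reads `os.environ`; passing a plain dict makes the check easy to unit test.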

2. Basic Usage

from mlops import Task, ConnectionConfig
from mlops.api.client.models.task_status import TaskStatus

# Initialize Task client (uses environment variables by default)
task = Task()

# Or initialize with explicit configuration
config = ConnectionConfig(
    api_key="xck_******",
    domain="localhost:8090",
    debug=False
)
task = Task(config=config)

# Submit a task with script
result = task.submit(
    name="my-training-task",
    cluster_id=1,
    script="#!/bin/bash\necho 'Hello World'",
    resources={"cpu": 4, "memory": "8GB", "gpu": 1}
)

# Or submit with command
result = task.submit(
    name="my-task",
    cluster_id=1,
    command="python train.py",
    resources={"cpu": 4, "memory": "8GB"}
)

# Get task details
task_info = task.get(task_id=result.job_id, cluster_id=1)

# List tasks with filters
running_tasks = task.list(
    status=TaskStatus.RUNNING,
    cluster_id=1,
    page=1,
    page_size=20
)

# Cancel a task
task.cancel(task_id=result.job_id, cluster_id=1)

# Delete a task
task.delete(task_id=result.job_id, cluster_id=1)

API Reference

Task Class

The Task class provides a high-level interface for managing tasks.

Initialization

from mlops import Task, ConnectionConfig

# Using environment variables
task = Task()

# With explicit configuration
config = ConnectionConfig(
    api_key="xck_******",           # API key for authentication
    access_token="token_******",     # Access token (alternative to API key)
    domain="localhost:8090",         # API domain
    debug=False,                      # Enable debug mode
    request_timeout=30.0              # Request timeout in seconds
)
task = Task(config=config)

# Or pass parameters directly
task = Task(
    api_key="xck_******",
    domain="localhost:8090"
)

Methods

submit()

Submit a new task to the cluster.

result = task.submit(
    name: str,                    # Task name (required)
    cluster_id: int,              # Cluster ID (required)
    script: Optional[str] = None, # Script content (script or command required)
    command: Optional[str] = None,# Command to execute (script or command required)
    resources: Optional[dict] = None, # Resource requirements
    team_id: Optional[int] = None # Team ID (optional)
) -> TaskSubmitResponse

The resources dictionary can contain:

  • cpu or cpus_per_task: Number of CPUs
  • memory: Memory requirement (e.g., "8GB", "4096M")
  • nodes: Number of nodes
  • gres: GPU resources (e.g., "gpu:1")
  • time: Time limit (e.g., "1-00:00:00" for 1 day)
  • partition: Partition name
  • tres: TRES specification
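
The `time` value uses a days-hours:minutes:seconds layout, which is easy to get wrong by hand. A minimal sketch of a formatter follows; `slurm_time` is a hypothetical helper, not part of the SDK, and it assumes the `D-HH:MM:SS` form shown in the list above is what the service expects.

```python
from datetime import timedelta


def slurm_time(limit: timedelta) -> str:
    """Format a timedelta as D-HH:MM:SS, the style used for the time key."""
    total = int(limit.total_seconds())
    if total <= 0:
        raise ValueError("time limit must be positive")
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


# Build a resources dictionary with a two-day limit.
resources = {"cpu": 8, "memory": "16GB", "time": slurm_time(timedelta(days=2))}
```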

Example:

result = task.submit(
    name="ml-training",
    cluster_id=1,
    script="#!/bin/bash\npython train.py --epochs 100",
    resources={
        "cpu": 8,
        "memory": "16GB",
        "gpu": 1,
        "time": "2-00:00:00",  # 2 days
        "partition": "gpu"
    }
)
print(f"Task submitted: Job ID = {result.job_id}")

get()

Get task details by task ID.

task_info = task.get(
    task_id: int,    # Task ID (Slurm job ID)
    cluster_id: int  # Cluster ID (required)
) -> Task

Example:

task_info = task.get(task_id=12345, cluster_id=1)
print(f"Task status: {task_info.status}")
print(f"Task name: {task_info.name}")

list()

List tasks with optional filters and pagination.

tasks = task.list(
    page: int = 1,                           # Page number
    page_size: int = 20,                     # Items per page
    status: Optional[TaskStatus] = None,     # Filter by status
    cluster_id: Optional[int] = None,         # Filter by cluster ID
    team_id: Optional[int] = None,           # Filter by team ID
    user_id: Optional[int] = None            # Filter by user ID
) -> TaskListResponse

Example:

from mlops.api.client.models.task_status import TaskStatus

# List all running tasks
running_tasks = task.list(status=TaskStatus.RUNNING)

# List tasks in a specific cluster
cluster_tasks = task.list(cluster_id=1, page=1, page_size=10)

# List completed tasks with pagination
completed = task.list(
    status=TaskStatus.COMPLETED,
    cluster_id=1,
    page=1,
    page_size=50
)

cancel()

Cancel a running task.

task.cancel(
    task_id: int,    # Task ID (Slurm job ID)
    cluster_id: int  # Cluster ID (required)
)

Example:

task.cancel(task_id=12345, cluster_id=1)

TaskStatus Enum

Task status values for filtering:

from mlops.api.client.models.task_status import TaskStatus

TaskStatus.PENDING      # Task is pending
TaskStatus.QUEUED       # Task is queued
TaskStatus.RUNNING      # Task is running
TaskStatus.COMPLETED    # Task completed successfully
TaskStatus.SUCCEEDED    # Task succeeded
TaskStatus.FAILED       # Task failed
TaskStatus.CANCELLED    # Task was cancelled
TaskStatus.CREATED      # Task was created

Configuration

Environment Variables

The SDK reads configuration from environment variables:

  • MLOPS_API_KEY: API key for authentication
  • MLOPS_ACCESS_TOKEN: Access token for authentication (alternative to API key)
  • MLOPS_DOMAIN: API domain (default: localhost:8090)
  • MLOPS_DEBUG: Enable debug mode (true/false, default: false)
  • MLOPS_API_PATH: API path prefix (default: /api/v1)
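
To see which values these variables resolve to in a given shell, the documented defaults can be mirrored in a few lines. This is a sketch only: `EnvSettings` and `read_env_settings` are hypothetical names, and treating only a case-insensitive `true` as enabling debug mode is an assumption about how the SDK parses `MLOPS_DEBUG`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvSettings:
    domain: str
    api_path: str
    debug: bool


def read_env_settings(env: Optional[Mapping[str, str]] = None) -> EnvSettings:
    """Hypothetical helper: resolve the documented variables and defaults."""
    env = os.environ if env is None else env
    return EnvSettings(
        domain=env.get("MLOPS_DOMAIN", "localhost:8090"),
        api_path=env.get("MLOPS_API_PATH", "/api/v1"),
        # Assumption: only a case-insensitive "true" enables debug mode.
        debug=env.get("MLOPS_DEBUG", "false").strip().lower() == "true",
    )
```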

ConnectionConfig

You can also configure the connection programmatically:

from mlops import ConnectionConfig

config = ConnectionConfig(
    domain="api.example.com",
    api_key="xck_******",
    debug=True,
    request_timeout=60.0,
    api_path="/api/v1"
)

Error Handling

The SDK provides specific exception types:

from mlops.exceptions import (
    APIException,           # General API errors
    AuthenticationException, # Authentication failures
    NotFoundException,       # Resource not found
    RateLimitException,     # Rate limit exceeded
    TimeoutException,       # Request timeout
    InvalidArgumentException # Invalid arguments
)

try:
    result = task.submit(name="test", cluster_id=1, command="echo hello")
except AuthenticationException as e:
    print(f"Authentication failed: {e}")
except NotFoundException as e:
    print(f"Resource not found: {e}")
except APIException as e:
    print(f"API error: {e}")
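
Rate-limit and timeout errors are often transient, so a caller may want to retry them. The sketch below is a generic retry wrapper with exponential backoff. `call_with_retry` is a hypothetical helper, not part of the SDK, and it takes the exception types as a parameter so it does not depend on the SDK being importable.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exception types with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... base_delay
    raise AssertionError("unreachable")
```

With the SDK this could be called as `call_with_retry(lambda: task.get(task_id=12345, cluster_id=1), (RateLimitException, TimeoutException))`. Retrying `submit()` the same way may create duplicate jobs if the first request reached the server, so it is safer to limit retries to read-only calls.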

Examples

Submit a Machine Learning Training Job

from mlops import Task

task = Task()

result = task.submit(
    name="pytorch-training",
    cluster_id=1,
    script="""#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=2
#SBATCH --mem=4GB

python train.py --config config.yaml
""",
    resources={
        "cpus_per_task": 2,
        "memory": "4GB",
        "gres": "gpu:1",
        "time": "1-00:00:00",  # 1 day
        "partition": "gpu"
    }
)

print(f"Training job submitted: {result.job_id}")
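
The example above states the same limits twice, once as `#SBATCH` directives and once in `resources`. One way to keep the two in sync is to generate the script header from the dictionary. This is a sketch under assumptions: `render_script` is a hypothetical helper, the flag names are standard `sbatch` long options, and whether the service needs both the directives and the `resources` argument is not stated here.

```python
from typing import Mapping

# Slurm sbatch long options for the resource keys used in this example.
_SBATCH_FLAGS = {
    "cpus_per_task": "--cpus-per-task",
    "memory": "--mem",
    "gres": "--gres",
    "time": "--time",
    "partition": "--partition",
    "nodes": "--nodes",
}


def render_script(body: str, resources: Mapping[str, object]) -> str:
    """Hypothetical helper: build a batch script with #SBATCH lines from a dict."""
    lines = ["#!/bin/bash"]
    for key, value in resources.items():
        flag = _SBATCH_FLAGS.get(key)
        if flag is None:
            raise KeyError(f"no sbatch flag known for resource key {key!r}")
        lines.append(f"#SBATCH {flag}={value}")
    lines += ["", body.rstrip("\n"), ""]
    return "\n".join(lines)
```

The result can be passed as `script=render_script("python train.py --config config.yaml", resources)` alongside the same `resources` dictionary.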

Monitor Task Status

from mlops import Task
from mlops.api.client.models.task_status import TaskStatus
import time

task = Task()
job_id = 12345
cluster_id = 1

while True:
    task_info = task.get(task_id=job_id, cluster_id=cluster_id)
    print(f"Status: {task_info.status}")
    
    if task_info.status in [TaskStatus.COMPLETED, TaskStatus.SUCCEEDED, TaskStatus.FAILED, TaskStatus.CANCELLED]:
        break
    
    time.sleep(10)  # Check every 10 seconds
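
The loop above runs forever if the task never reaches one of the listed states. A bounded variant is sketched below. `wait_for_status` is a hypothetical helper, not part of the SDK; it takes a status getter so it can be tested without a cluster.

```python
import time
from typing import Callable, Collection


def wait_for_status(
    get_status: Callable[[], object],
    terminal: Collection[object],
    timeout: float = 3600.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> object:
    """Poll get_status() until it returns a terminal value or timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(interval)
```

With the SDK, the getter could be `lambda: task.get(task_id=job_id, cluster_id=cluster_id).status` and `terminal` a tuple such as `(TaskStatus.COMPLETED, TaskStatus.SUCCEEDED, TaskStatus.FAILED, TaskStatus.CANCELLED)`. Which statuses are actually final is an assumption based on their names.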

List and Filter Tasks

from mlops import Task
from mlops.api.client.models.task_status import TaskStatus

task = Task()

# Get all running tasks in cluster 1
running = task.list(
    status=TaskStatus.RUNNING,
    cluster_id=1
)

for t in running.tasks:
    print(f"{t.name}: {t.status} (Job ID: {t.job_id})")

# Get failed tasks
failed = task.list(status=TaskStatus.FAILED)

print(f"Total failed tasks: {failed.total}")
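
Each `list()` call returns a single page, so collecting every matching task means walking the pages. The generator below is a sketch: `iter_all_tasks` is a hypothetical helper, and it assumes each response exposes `.tasks` (the items on that page) and `.total` (the count across all pages), as the example above uses them.

```python
from typing import Any, Callable, Iterator


def iter_all_tasks(
    list_page: Callable[..., Any], page_size: int = 20, **filters: Any
) -> Iterator[Any]:
    """Yield tasks from every page of a list()-style callable.

    Assumes each response has .tasks (items on the page) and .total
    (count across all pages).
    """
    page, seen = 1, 0
    while True:
        response = list_page(page=page, page_size=page_size, **filters)
        items = list(response.tasks)
        yield from items
        seen += len(items)
        # Stop on an empty page or once every reported task has been seen.
        if not items or seen >= response.total:
            return
        page += 1
```

With the SDK this would be used as `for t in iter_all_tasks(task.list, status=TaskStatus.FAILED): ...`.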

License

MIT
