AI Factory SDK

Python SDK for the AI Factory Compute API — submit and manage HPC jobs from Python.

Features

  • Synchronous and asynchronous clients (AIFactoryClient, AsyncAIFactoryClient)
  • Typed request/response models with Pydantic validation
  • Job polling with configurable timeout and retry (client.wait())
  • Automatic retry on transient errors (429, 5xx)
  • PEP 561 compatible — full type annotation coverage
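The retry behaviour in the list above can be illustrated with a small standalone sketch. This is not the SDK's implementation: `is_transient`, `call_with_retry`, the attempt count, and the exponential backoff schedule are all hypothetical choices made for illustration. Only the set of retried status codes (429 and 5xx) comes from the feature list.

```python
import time
from typing import Callable, Tuple


def is_transient(status_code: int) -> bool:
    """Return True for the status codes the feature list names: 429 and 5xx."""
    return status_code == 429 or 500 <= status_code <= 599


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,          # assumption: the SDK's real limit is not documented here
    base_delay: float = 0.5,        # assumption: illustrative backoff base in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-transient status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send()
    for attempt in range(1, max_attempts):
        if not is_transient(status):
            break
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body
```

Passing `sleep` as a parameter keeps the helper easy to exercise in tests without real delays.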

Installation

pip install ai-factory-sdk

Or with uv:

uv add ai-factory-sdk

Pre-release versions

Development builds published from the dev branch use PEP 440 pre-release suffixes (e.g., 0.2.0.dev1). Install them with:

pip install ai-factory-sdk --pre

Quick Start

from ai_factory.sdk import AIFactoryClient, JobRequest

# Credentials via environment: AI_FACTORY_API_KEY, AI_FACTORY_SLURM_USER
# Or pass explicitly:
with AIFactoryClient(token="...", slurm_user="jane") as client:
    # Submit a job
    resp = client.submit_job(
        JobRequest(name="hello", script="#!/bin/bash\necho Hello from SLURM")
    )
    print(f"Submitted job {resp.job_id}")

    # Wait for completion
    if resp.job_id is not None:
        detail = client.wait(str(resp.job_id), timeout=3600)
        print(f"Job finished with status: {detail.status}")

Async Usage

import asyncio
from ai_factory.sdk import AsyncAIFactoryClient, JobRequest

async def main():
    async with AsyncAIFactoryClient(token="...", slurm_user="jane") as client:
        resp = await client.submit_job(
            JobRequest(name="async-job", script="#!/bin/bash\nsleep 10 && echo done")
        )
        if resp.job_id is not None:
            detail = await client.wait(str(resp.job_id))
            print(detail.status)

asyncio.run(main())

Container Jobs

from ai_factory.sdk import AIFactoryClient, ContainerJobRequest

with AIFactoryClient(token="...", slurm_user="jane") as client:
    resp = client.submit_container(
        ContainerJobRequest(
            name="gpu-training",
            image="docker://nvcr.io/nvidia/pytorch:24.01-py3",
            container_command="python train.py",
            gres="gpu:a40:1",
            time_limit=120,
        )
    )

Configuration

| Parameter | Environment Variable | Default |
|---|---|---|
| base_url | AI_FACTORY_API_URL | https://compute-api.ai-factory.datalab.tuwien.ac.at/compute-api/v1 |
| token | AI_FACTORY_API_KEY | (required) |
| slurm_user | AI_FACTORY_SLURM_USER | (required) |
| timeout | | 30.0 (HTTP timeout in seconds) |

Constructor parameters take precedence over environment variables.
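The precedence rule can be sketched as a small resolver. This is an illustration only, not the SDK's code: `resolve_setting` is a hypothetical helper, and treating an empty environment value as unset is an assumption. The environment variable names and the default URL come from the Configuration section.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://compute-api.ai-factory.datalab.tuwien.ac.at/compute-api/v1"


def resolve_setting(
    explicit: Optional[str],
    env_var: str,
    env: Optional[Mapping[str, str]] = None,
    default: Optional[str] = None,
) -> str:
    """Pick a value in the documented order: constructor argument, then environment."""
    env = os.environ if env is None else env
    if explicit is not None:
        return explicit          # constructor parameter wins
    value = env.get(env_var)
    if value:                    # assumption: an empty string counts as unset
        return value
    if default is not None:
        return default
    raise ValueError(f"{env_var} is not set and no value was passed explicitly")
```

For example, `resolve_setting(None, "AI_FACTORY_API_URL", default=DEFAULT_BASE_URL)` falls back to the default URL when the variable is absent, while a required setting such as the token raises if neither source provides it.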

API Reference

Clients

| Class | Description |
|---|---|
| AIFactoryClient | Synchronous client (context manager) |
| AsyncAIFactoryClient | Asynchronous client (async context manager) |

Methods

| Method | Description |
|---|---|
| submit_job(request) | Submit a Slurm job script |
| submit_container(request) | Submit a containerised job |
| get_job(job_id) | Get job details by ID |
| list_jobs(...) | List jobs with optional filters and pagination |
| cancel_job(job_id) | Cancel a running or pending job |
| wait(job_id, ...) | Poll until the job reaches a terminal state |
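To make "poll until the job reaches a terminal state" concrete, here is a minimal standalone sketch of such a loop. It is not how `wait()` is implemented: `poll_until_terminal`, `PollTimeout`, the polling interval, and the set of terminal state names (taken from common Slurm job states) are all assumptions for illustration.

```python
import time
from typing import Callable

# Assumption: common Slurm state names; the SDK's actual terminal states are not listed here.
TERMINAL_STATES = frozenset({"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"})


class PollTimeout(Exception):
    """Hypothetical stand-in for the SDK's WaitTimeoutError."""


def poll_until_terminal(
    get_status: Callable[[], str],
    timeout: float = 3600.0,
    interval: float = 5.0,       # assumption: illustrative polling interval in seconds
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() repeatedly until it returns a terminal state or the deadline passes."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise PollTimeout(f"job still {status!r} after {timeout} seconds")
        sleep(interval)
```

With a real client, `get_status` could be something like `lambda: client.get_job(job_id).status`, assuming `status` is a plain string.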

Request Models

| Model | Fields |
|---|---|
| JobRequest | name, script, partition, tasks, cpus_per_task, time_limit, gres, standard_output, standard_error |
| ContainerJobRequest | name, image, container_command, partition, tasks, cpus_per_task, time_limit, gres, standard_output, standard_error |

Response Models

| Model | Fields |
|---|---|
| SubmitJobResponse | job_id, output_dir, logs_url |
| JobDetail | job_id, name, status, partition, nodes, exit_code, duration, start_time, end_time, submit_time, working_directory, standard_output, standard_error, gres, output_dir, logs_url |
| JobListItem | job_id, name, status, duration, start_time, end_time |
| JobList | jobs, total, limit, offset |
| CancelJobResponse | message |

Exceptions

| Exception | When |
|---|---|
| SDKError | Base for all SDK errors |
| APIError | Non-2xx HTTP response |
| AuthError | 401 or 403 response |
| NotFoundError | 404 response |
| WaitTimeoutError | wait() exceeded its deadline |
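The mapping from HTTP status to exception type can be sketched with local stand-in classes. The classes below are defined here for illustration and are not imported from the SDK. Making AuthError and NotFoundError subclasses of APIError is an assumption, as are the `status_code` attribute and the helpers `raise_for_status` and `describe`.

```python
class SDKError(Exception):
    """Stand-in for the SDK's base error."""


class APIError(SDKError):
    """Stand-in for a non-2xx HTTP response."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code  # assumption: the real class may expose this differently


class AuthError(APIError):
    """Stand-in for 401 or 403 (assumed to subclass APIError)."""


class NotFoundError(APIError):
    """Stand-in for 404 (assumed to subclass APIError)."""


def raise_for_status(status_code: int, message: str = "") -> None:
    """Raise the exception the table associates with status_code; return None on 2xx."""
    if 200 <= status_code < 300:
        return
    if status_code in (401, 403):
        raise AuthError(status_code, message)
    if status_code == 404:
        raise NotFoundError(status_code, message)
    raise APIError(status_code, message)


def describe(status_code: int) -> str:
    """Show handler ordering: catch the most specific classes before APIError."""
    try:
        raise_for_status(status_code)
    except AuthError:
        return "check credentials"
    except NotFoundError:
        return "unknown job"
    except APIError as exc:
        return f"api error {exc.status_code}"
    return "ok"
```

If the real hierarchy matches this assumption, listing `except APIError` first would swallow the more specific errors, so the specific handlers go first.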

Requirements

License

MIT
