AI Factory SDK
Python SDK for the AI Factory Compute API — submit and manage HPC jobs from Python.
Features
- Synchronous and asynchronous clients (`AIFactoryClient`, `AsyncAIFactoryClient`)
- Typed request/response models with Pydantic validation
- Job polling with configurable timeout and retry (`client.wait()`)
- Automatic retry on transient errors (429, 5xx)
- PEP 561 compatible, with full type annotation coverage
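The automatic retry on transient errors can be pictured with a small stand-alone sketch. It is not the SDK's implementation: `is_transient`, `call_with_retry`, the attempt count, and the backoff delays are hypothetical names and values chosen for illustration. Only the 429/5xx classification comes from the feature list.

```python
import time
from typing import Callable, Tuple


def is_transient(status: int) -> bool:
    """Treat 429 and any 5xx status as retryable, as in the feature list."""
    return status == 429 or 500 <= status <= 599


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-transient status or attempts run out.

    send is a hypothetical zero-argument callable returning (status, body).
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if not is_transient(status):
            break
        # Exponential backoff (assumed policy): base_delay, 2 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.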
Installation
pip install ai-factory-sdk
Or with uv:
uv add ai-factory-sdk
Pre-release versions
Development builds published from the dev branch use PEP 440 development-release
suffixes (e.g., 0.2.0.dev1), which pip treats as pre-releases and skips by default. Install them with:
pip install ai-factory-sdk --pre
Quick Start
from ai_factory.sdk import AIFactoryClient, JobRequest

# Credentials via environment: AI_FACTORY_API_KEY, AI_FACTORY_SLURM_USER
# Or pass explicitly:
with AIFactoryClient(token="...", slurm_user="jane") as client:
    # Submit a job
    resp = client.submit_job(
        JobRequest(name="hello", script="#!/bin/bash\necho Hello from SLURM")
    )
    print(f"Submitted job {resp.job_id}")

    # Wait for completion
    if resp.job_id is not None:
        detail = client.wait(str(resp.job_id), timeout=3600)
        print(f"Job finished with status: {detail.status}")
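The submit-then-wait pattern from the Quick Start can be wrapped in a helper. This is a sketch: `submit_and_wait` is a hypothetical name, and it relies only on the `submit_job`, `wait`, `job_id`, and `status` names that appear in the Quick Start. `FakeClient` and the `"COMPLETED"` status string are stand-ins (assumptions) so the snippet runs offline; in real use, pass an `AIFactoryClient` and a `JobRequest`.

```python
from types import SimpleNamespace
from typing import Any, Optional


def submit_and_wait(client: Any, request: Any, timeout: float = 3600) -> Optional[Any]:
    """Submit request, then block until the job finishes or timeout elapses.

    Returns the job detail object, or None when no job ID was returned.
    """
    resp = client.submit_job(request)
    if resp.job_id is None:
        return None
    return client.wait(str(resp.job_id), timeout=timeout)


class FakeClient:
    """Offline stand-in for the two methods used above (not part of the SDK)."""

    def submit_job(self, request: Any) -> SimpleNamespace:
        return SimpleNamespace(job_id=42)

    def wait(self, job_id: str, timeout: float) -> SimpleNamespace:
        # "COMPLETED" is an assumed status value for illustration.
        return SimpleNamespace(job_id=job_id, status="COMPLETED")


detail = submit_and_wait(FakeClient(), request={"name": "hello"})
print(detail.status)
```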
Async Usage
import asyncio

from ai_factory.sdk import AsyncAIFactoryClient, JobRequest


async def main():
    async with AsyncAIFactoryClient(token="...", slurm_user="jane") as client:
        resp = await client.submit_job(
            JobRequest(name="async-job", script="#!/bin/bash\nsleep 10 && echo done")
        )
        if resp.job_id is not None:
            detail = await client.wait(str(resp.job_id))
            print(detail.status)


asyncio.run(main())
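One reason to pick the asynchronous client is to run several jobs at once. The sketch below fans out submissions with `asyncio.gather`. `run_all` is a hypothetical helper that uses only the `submit_job` and `wait` coroutines shown above; `FakeAsyncClient` and the `"COMPLETED"` status are offline stand-ins, not part of the SDK.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, List


async def run_all(client: Any, requests: List[Any]) -> List[Any]:
    """Submit every request concurrently and wait for each job to finish."""

    async def run_one(request: Any) -> Any:
        resp = await client.submit_job(request)
        if resp.job_id is None:
            return None
        return await client.wait(str(resp.job_id))

    # gather returns results in the same order as the input requests.
    return await asyncio.gather(*(run_one(r) for r in requests))


class FakeAsyncClient:
    """Offline stand-in for the async client (not part of the SDK)."""

    def __init__(self) -> None:
        self._next_id = 0

    async def submit_job(self, request: Any) -> SimpleNamespace:
        self._next_id += 1
        return SimpleNamespace(job_id=self._next_id)

    async def wait(self, job_id: str) -> SimpleNamespace:
        await asyncio.sleep(0)  # yield control, as a real poll would
        return SimpleNamespace(job_id=job_id, status="COMPLETED")


details = asyncio.run(run_all(FakeAsyncClient(), ["a", "b", "c"]))
print([d.status for d in details])
```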
Container Jobs
from ai_factory.sdk import AIFactoryClient, ContainerJobRequest

with AIFactoryClient(token="...", slurm_user="jane") as client:
    resp = client.submit_container(
        ContainerJobRequest(
            name="gpu-training",
            image="docker://nvcr.io/nvidia/pytorch:24.01-py3",
            container_command="python train.py",
            gres="gpu:a40:1",
            time_limit=120,
        )
    )
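The `gres` value in the container example follows Slurm's `name:type:count` convention for generic resources. A small helper can build such strings; `format_gres` is a hypothetical convenience function, not part of the SDK, and it assumes the API passes the string through to Slurm unchanged.

```python
from typing import Optional


def format_gres(name: str, count: int = 1, kind: Optional[str] = None) -> str:
    """Build a Slurm-style generic resource string such as "gpu:a40:1"."""
    if count < 1:
        raise ValueError("count must be at least 1")
    parts = [name]
    if kind:
        parts.append(kind)  # optional resource type, e.g. a GPU model
    parts.append(str(count))
    return ":".join(parts)


print(format_gres("gpu", count=1, kind="a40"))
```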
Configuration
| Parameter | Environment Variable | Default |
|---|---|---|
| `base_url` | `AI_FACTORY_API_URL` | `https://compute-api.ai-factory.datalab.tuwien.ac.at/compute-api/v1` |
| `token` | `AI_FACTORY_API_KEY` | (required) |
| `slurm_user` | `AI_FACTORY_SLURM_USER` | (required) |
| `timeout` | (none) | 30.0 (HTTP timeout in seconds) |
Constructor parameters take precedence over environment variables.
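The precedence rule can be sketched as a lookup chain: explicit argument first, then environment variable, then default. `resolve_setting` is a hypothetical helper written for illustration and does not reflect how the SDK resolves its configuration internally; the variable names and default URL are taken from the table.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://compute-api.ai-factory.datalab.tuwien.ac.at/compute-api/v1"


def resolve_setting(
    explicit: Optional[str],
    env_var: str,
    default: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return explicit if given, else the environment value, else default."""
    if explicit is not None:
        return explicit
    value = environ.get(env_var, default)
    if value is None:
        # Mirrors the "(required)" entries: no argument, no variable, no default.
        raise ValueError(f"{env_var} is required but was not provided")
    return value


env = {"AI_FACTORY_API_KEY": "from-env"}
print(resolve_setting("from-arg", "AI_FACTORY_API_KEY", environ=env))
print(resolve_setting(None, "AI_FACTORY_API_KEY", environ=env))
```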
API Reference
Clients
| Class | Description |
|---|---|
| `AIFactoryClient` | Synchronous client (context manager) |
| `AsyncAIFactoryClient` | Asynchronous client (async context manager) |
Methods
| Method | Description |
|---|---|
| `submit_job(request)` | Submit a Slurm job script |
| `submit_container(request)` | Submit a containerised job |
| `get_job(job_id)` | Get job details by ID |
| `list_jobs(...)` | List jobs with optional filters and pagination |
| `cancel_job(job_id)` | Cancel a running or pending job |
| `wait(job_id, ...)` | Poll until the job reaches a terminal state |
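To make "poll until the job reaches a terminal state" concrete, here is a minimal polling loop with a deadline. It is a sketch, not the SDK's `wait()`: `poll_until_terminal` and its `interval` parameter are hypothetical, the terminal-state set is an assumption based on common Slurm job states, and the built-in `TimeoutError` stands in for `WaitTimeoutError`.

```python
import time
from typing import Callable

# Assumed terminal states (common Slurm job states); the SDK's own set may differ.
TERMINAL_STATES = frozenset({"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"})


def poll_until_terminal(
    get_status: Callable[[], str],
    timeout: float = 3600.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() every interval seconds until a terminal state appears.

    Raises TimeoutError once timeout seconds have passed without one.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)


statuses = iter(["PENDING", "RUNNING", "COMPLETED"])
print(poll_until_terminal(lambda: next(statuses), sleep=lambda seconds: None))
```

Using a monotonic clock for the deadline avoids surprises when the wall clock is adjusted during a long wait.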
Request Models
| Model | Fields |
|---|---|
| `JobRequest` | `name`, `script`, `partition`, `tasks`, `cpus_per_task`, `time_limit`, `gres`, `standard_output`, `standard_error` |
| `ContainerJobRequest` | `name`, `image`, `container_command`, `partition`, `tasks`, `cpus_per_task`, `time_limit`, `gres`, `standard_output`, `standard_error` |
Response Models
| Model | Fields |
|---|---|
| `SubmitJobResponse` | `job_id`, `output_dir`, `logs_url` |
| `JobDetail` | `job_id`, `name`, `status`, `partition`, `nodes`, `exit_code`, `duration`, `start_time`, `end_time`, `submit_time`, `working_directory`, `standard_output`, `standard_error`, `gres`, `output_dir`, `logs_url` |
| `JobListItem` | `job_id`, `name`, `status`, `duration`, `start_time`, `end_time` |
| `JobList` | `jobs`, `total`, `limit`, `offset` |
| `CancelJobResponse` | `message` |
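The `jobs`, `total`, `limit`, and `offset` fields of `JobList` suggest offset-based pagination. The sketch below walks all pages through a caller-supplied `fetch_page(limit, offset)` function. `iter_jobs` and `fetch_page` are hypothetical; how `fetch_page` maps onto `list_jobs(...)` depends on that method's parameter names, which are not listed here, so the wiring is left to the caller. `fake_fetch` is an offline stand-in.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterator


def iter_jobs(fetch_page: Callable[[int, int], Any], page_size: int = 50) -> Iterator[Any]:
    """Yield every job by requesting pages until total items have been seen."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page.jobs
        offset += len(page.jobs)
        # Stop on an empty page too, in case total changes between requests.
        if not page.jobs or offset >= page.total:
            return


ALL_JOBS = [f"job-{i}" for i in range(5)]


def fake_fetch(limit: int, offset: int) -> SimpleNamespace:
    """Stand-in returning an object shaped like JobList (not part of the SDK)."""
    return SimpleNamespace(
        jobs=ALL_JOBS[offset:offset + limit], total=len(ALL_JOBS), limit=limit, offset=offset
    )


print(list(iter_jobs(fake_fetch, page_size=2)))
```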
Exceptions
| Exception | When |
|---|---|
| `SDKError` | Base for all SDK errors |
| `APIError` | Non-2xx HTTP response |
| `AuthError` | 401 or 403 response |
| `NotFoundError` | 404 response |
| `WaitTimeoutError` | `wait()` exceeded its deadline |
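The table's mapping from HTTP status to exception can be sketched with stand-in classes. These mirror the names in the table but are not the SDK's classes: the constructor signature, the `status_code` attribute, and the choice to make `AuthError` and `NotFoundError` subclasses of `APIError` are assumptions. Only `SDKError` being the common base is stated above.

```python
from typing import Optional


class SDKError(Exception):
    """Stand-in base class for all SDK errors."""


class APIError(SDKError):
    """Stand-in for a non-2xx HTTP response (status_code attribute is assumed)."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code


class AuthError(APIError):
    """Stand-in for a 401 or 403 response."""


class NotFoundError(APIError):
    """Stand-in for a 404 response."""


def error_for_status(status_code: int, message: str = "") -> Optional[APIError]:
    """Return the exception matching a non-2xx status, or None for 2xx."""
    if 200 <= status_code < 300:
        return None
    if status_code in (401, 403):
        return AuthError(status_code, message)
    if status_code == 404:
        return NotFoundError(status_code, message)
    return APIError(status_code, message)


err = error_for_status(404, "no such job")
print(type(err).__name__)
```

With a hierarchy like this, catching the most specific class first and `SDKError` last handles every case in one `try` block.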
Requirements
License