Python SDK for the AI Factory Compute API
AI Factory SDK
Python SDK for the AI Factory Compute API — submit and manage HPC jobs from Python.
Features
- Synchronous and asynchronous clients (AIFactoryClient, AsyncAIFactoryClient)
- Typed request/response models with Pydantic validation
- Job polling with configurable timeout and retry (client.wait())
- Automatic retry on transient errors (429, 5xx)
- PEP 561 compatible, with full type annotation coverage
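To make "automatic retry on transient errors" concrete, here is a minimal sketch of retrying on 429 and 5xx responses with exponential backoff. It is not the SDK's implementation: `is_transient`, `call_with_retry`, and their parameters are hypothetical names, and the attempt count and delays are assumed values.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, body); a stand-in for a real response


def is_transient(status_code: int) -> bool:
    """Treat 429 (rate limited) and any 5xx status as worth retrying."""
    return status_code == 429 or 500 <= status_code <= 599


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-transient status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if not is_transient(status) or last_attempt:
            return status, body
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real delays.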
Installation
pip install ai-factory-sdk
Or with uv:
uv add ai-factory-sdk
Pre-release versions
Development builds published from the dev branch use PEP 440 pre-release
suffixes (e.g., 0.2.0.dev1). Install them with:
pip install ai-factory-sdk --pre
Quick Start
from ai_factory.sdk import AIFactoryClient, JobRequest
# Credentials resolve from ~/.ai-factory/config.yaml, env vars, or constructor
# args (see "Configuration" below). Passed explicitly here for clarity:
with AIFactoryClient(token="...", slurm_user="jane") as client:
    # Submit a job
    resp = client.submit_job(
        JobRequest(name="hello", script="#!/bin/bash\necho Hello from SLURM")
    )
    print(f"Submitted job {resp.job_id}")
    # Wait for completion
    if resp.job_id is not None:
        detail = client.wait(str(resp.job_id), timeout=3600)
        print(f"Job finished with status: {detail.status}")
Async Usage
import asyncio
from ai_factory.sdk import AsyncAIFactoryClient, JobRequest
async def main():
    async with AsyncAIFactoryClient(token="...", slurm_user="jane") as client:
        resp = await client.submit_job(
            JobRequest(name="async-job", script="#!/bin/bash\nsleep 10 && echo done")
        )
        if resp.job_id is not None:
            detail = await client.wait(str(resp.job_id))
            print(detail.status)

asyncio.run(main())
Container Jobs
from ai_factory.sdk import AIFactoryClient, ContainerJobRequest
with AIFactoryClient(token="...", slurm_user="jane") as client:
    resp = client.submit_container(
        ContainerJobRequest(
            name="gpu-training",
            image="docker://nvcr.io/nvidia/pytorch:24.01-py3",
            container_command="python train.py",
            gres="gpu:a40:1",
            time_limit=120,
        )
    )
Configuration
Credentials resolve from three sources, in priority order:
- Explicit constructor arguments: Client(token=..., slurm_user=...).
- Environment variables.
- YAML config file at ~/.ai-factory/config.yaml.
If a required value is missing from all three sources, Client() raises
ValueError with a message listing all three options.
| Parameter | Environment Variable | Config File Key | Default |
|---|---|---|---|
| base_url | AI_FACTORY_API_URL | api_url | https://aifactory.ai-factory.datalab.tuwien.ac.at/compute-api/v1 |
| token | AI_FACTORY_API_KEY | api_key | (required) |
| slurm_user | AI_FACTORY_SLURM_USER | slurm_user | (required) |
| timeout | - | - | 30.0 (HTTP timeout in seconds) |
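The priority order can be sketched as a first-match lookup across the three sources. This illustrates the documented behaviour and is not the SDK's own code: `resolve_setting` is a hypothetical helper, and the dictionaries stand in for the real environment and a parsed config file.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[str],
    env_var: str,
    file_key: str,
    file_config: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the first value found: constructor arg, env var, then config file."""
    env = os.environ if environ is None else environ
    for candidate in (explicit, env.get(env_var), file_config.get(file_key)):
        if candidate:
            return candidate
    raise ValueError(
        f"Missing value: pass it explicitly, set {env_var}, "
        f"or add '{file_key}' to ~/.ai-factory/config.yaml"
    )


# Hypothetical inputs standing in for a parsed config file and the environment.
file_config = {"api_key": "from-file", "slurm_user": "jane.doe"}
environ = {"AI_FACTORY_API_KEY": "from-env"}

# The environment variable wins over the file; the file fills in the rest.
token = resolve_setting(None, "AI_FACTORY_API_KEY", "api_key", file_config, environ)
user = resolve_setting(None, "AI_FACTORY_SLURM_USER", "slurm_user", file_config, environ)
```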
Config file
Example ~/.ai-factory/config.yaml:
api_url: "https://aifactory.ai-factory.datalab.tuwien.ac.at/compute-api/v1"
api_key: "eyJhbGciOiJSUzI1NiIs..."
slurm_user: "jane.doe"
Secure the file so only your user can read it:
chmod 600 ~/.ai-factory/config.yaml
The SDK emits a UserWarning when a Client() is constructed if the file
is group- or world-accessible. A malformed or unreadable file raises
ConfigFileError (a subclass of SDKError).
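To check a file's permissions yourself before the SDK warns about it, a sketch along these lines should work on POSIX systems. `warn_if_exposed` is a hypothetical helper, not part of the SDK, and the SDK's actual check may differ.

```python
import os
import stat
import warnings


def warn_if_exposed(path: str) -> bool:
    """Warn and return True when group or others have any access to the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # S_IRWXG and S_IRWXO cover read, write, and execute bits for group and others.
    exposed = bool(mode & (stat.S_IRWXG | stat.S_IRWXO))
    if exposed:
        warnings.warn(
            f"{path} is accessible by group or others (mode {mode:o}); run chmod 600",
            UserWarning,
            stacklevel=2,
        )
    return exposed
```

A file with mode 600 passes silently, while the common default of 644 triggers the warning.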
API Reference
Clients
| Class | Description |
|---|---|
| AIFactoryClient | Synchronous client (context manager) |
| AsyncAIFactoryClient | Asynchronous client (async context manager) |
Methods
| Method | Description |
|---|---|
| submit_job(request) | Submit a Slurm job script |
| submit_container(request) | Submit a containerised job |
| get_job(job_id) | Get job details by ID |
| list_jobs(...) | List jobs with optional filters and pagination |
| cancel_job(job_id) | Cancel a running or pending job |
| wait(job_id, ...) | Poll until the job reaches a terminal state |
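The polling behind wait() can be pictured as a loop with a deadline. The sketch below is a generic illustration, not the SDK's implementation: `poll_until_terminal` is a hypothetical helper, the set of terminal states is an assumption (this README does not list them), and it raises the built-in `TimeoutError` rather than the SDK's WaitTimeoutError.

```python
import time
from typing import Callable

# Assumed terminal states; the SDK's actual set is not listed in this README.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"}


def poll_until_terminal(
    get_status: Callable[[], str],
    timeout: float = 3600.0,
    interval: float = 5.0,
) -> str:
    """Call get_status() repeatedly until it reports a terminal state."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"job still {status} after {timeout} seconds")
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

With the real client, `get_status` could wrap a get_job(job_id) call and return the status field, assuming that field compares equal to strings like these.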
Request Models
| Model | Fields |
|---|---|
| JobRequest | name, script, partition, tasks, cpus_per_task, time_limit, gres, standard_output, standard_error |
| ContainerJobRequest | name, image, container_command, partition, tasks, cpus_per_task, time_limit, gres, standard_output, standard_error |
Response Models
| Model | Fields |
|---|---|
| SubmitJobResponse | job_id, output_dir, logs_url |
| JobDetail | job_id, name, status, partition, nodes, exit_code, duration, start_time, end_time, submit_time, working_directory, standard_output, standard_error, gres, output_dir, logs_url |
| JobListItem | job_id, name, status, duration, start_time, end_time |
| JobList | jobs, total, limit, offset |
| CancelJobResponse | message |
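The jobs, total, limit, and offset fields of JobList suggest offset-based pagination. Below is a minimal sketch of walking every page. `Page` is a stand-in with the same four fields, and `fetch_page` is a hypothetical callable used in place of list_jobs, whose exact parameters are not shown in this README.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class Page:
    """Stand-in for JobList, with the same four fields."""
    jobs: List[str]
    total: int
    limit: int
    offset: int


def iter_jobs(fetch_page: Callable[[int, int], Page], limit: int = 50) -> Iterator[str]:
    """Yield every job by requesting pages until `total` items have been seen."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page.jobs
        offset += len(page.jobs)
        # Stop on an empty page as well, in case `total` changes mid-iteration.
        if not page.jobs or offset >= page.total:
            return


# Fake backend with five jobs, used here instead of a real API call.
ALL_JOBS = [f"job-{i}" for i in range(5)]


def fake_fetch(limit: int, offset: int) -> Page:
    return Page(ALL_JOBS[offset:offset + limit], len(ALL_JOBS), limit, offset)


collected = list(iter_jobs(fake_fetch, limit=2))
```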
Exceptions
| Exception | When |
|---|---|
| SDKError | Base for all SDK errors |
| APIError | Non-2xx HTTP response |
| AuthError | 401 or 403 response |
| NotFoundError | 404 response |
| WaitTimeoutError | wait() exceeded its deadline |
| ConfigFileError | ~/.ai-factory/config.yaml unreadable or malformed |
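When catching these exceptions, list the most specific handlers first. The sketch below uses local stand-in classes so that it runs without the SDK installed. It assumes AuthError and NotFoundError derive from APIError, which this README does not confirm; only SDKError as the common base is documented. `run_and_classify` is a hypothetical helper.

```python
# Stand-in hierarchy mirroring the table above (subclassing of APIError is assumed).
class SDKError(Exception): ...
class APIError(SDKError): ...
class AuthError(APIError): ...
class NotFoundError(APIError): ...
class WaitTimeoutError(SDKError): ...
class ConfigFileError(SDKError): ...


def run_and_classify(action) -> str:
    """Run action() and map any SDK error to a short hint, most specific first."""
    try:
        action()
    except AuthError:
        return "check token and slurm_user"
    except NotFoundError:
        return "unknown job id"
    except APIError:
        return "other API failure"
    except WaitTimeoutError:
        return "job still running; poll again later"
    except SDKError:
        return "other SDK failure"
    return "ok"
```

In real code the same `except` ordering would apply to the classes imported from the SDK.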
Requirements
License