A fast CLI for submitting and managing Azure ML jobs via pure REST APIs
Azure Jobs
A fast, lightweight CLI for submitting Azure ML jobs through pure REST APIs — no azure-ai-ml SDK and no amlt runtime required.
aj run adds a template inheritance layer on top of three submission backends:
- native: direct Azure ML REST (default for AML / Singularity).
- amlt: delegates to the amlt CLI for compatibility.
- volcano: submits to a Kubernetes Volcano cluster via kubectl.
Install
pipx install azure_jobs
Requires az login. The volcano backend additionally needs kubectl configured against your cluster.
Quickstart
mkdir my-project && cd my-project
aj init # scaffold .azure_jobs/, register workspace
aj pull <user>/<repo> # (optional) clone shared templates
aj run -t gpu train.py # submit using the "gpu" template
.py scripts are launched with uv run and .sh scripts with bash. To exclude paths from the code upload, drop a .codeignore (or .amltignore) file at the project root.
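The exact .codeignore syntax is not spelled out here. As a rough illustration only, the sketch below assumes gitignore-like glob patterns, one per line, with # comments, and approximates matching with Python's fnmatch (no ! negation, no leading-/ anchoring). parse_ignore and is_ignored are hypothetical helpers, not part of azure_jobs.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def parse_ignore(text: str) -> list[str]:
    """Return non-empty, non-comment patterns (hypothetical .codeignore parser)."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            # Treat "dir/" and "dir" the same way in this approximation.
            patterns.append(line.rstrip("/"))
    return patterns


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """True if the path, or any parent directory of it, matches a pattern.

    Approximation only: fnmatch globbing, not full gitignore semantics.
    """
    parts = PurePosixPath(rel_path).parts
    # Check each prefix ("data", "data/raw", ...) and each path component.
    for i in range(1, len(parts) + 1):
        prefix = "/".join(parts[:i])
        for pat in patterns:
            if fnmatch(prefix, pat) or fnmatch(parts[i - 1], pat):
                return True
    return False


ignore = parse_ignore("# build outputs\n__pycache__/\n*.ckpt\ndata\n")
print(is_ignored("src/__pycache__/train.cpython-312.pyc", ignore))  # True
print(is_ignored("outputs/model.ckpt", ignore))  # True
print(is_ignored("src/train.py", ignore))  # False
```

A path is excluded as soon as any of its components matches, so ignoring a directory name also drops everything beneath it.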
Templates
Templates live under .azure_jobs/template/ as YAML files.
- aj init: scaffolds .azure_jobs/ and (optionally) pulls a starter template repo.
- aj pull <user>/<repo>: clones a shared template repo into .azure_jobs/.
- Hand-author: drop a YAML file into .azure_jobs/template/.
Minimal leaf template (.azure_jobs/template/gpu.yaml):
base: [account.default, storage.default, environment.aml]
config:
  target:
    name: my-cluster
  jobs:
    - name: train
      sku: "{nodes}xA100-80GB"
The base list chains other YAML files in .azure_jobs/: a dotted name maps to .azure_jobs/<dir>/<name>.yaml, so account.default resolves to .azure_jobs/account/default.yaml. The {nodes} and {processes} placeholders are substituted from CLI flags. Inheritance, merge rules, and SKU formats are documented in docs/configuration.md.
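The authoritative inheritance and merge rules live in docs/configuration.md. As a mental model only, the sketch below assumes that every file has an optional base list and a config mapping, that bases are resolved depth-first in list order, that nested mappings are deep-merged with later values winning (the file's own config last), and that lists and scalars are replaced outright. dotted_to_path, deep_merge, and resolve are hypothetical names, and a loader callable stands in for reading YAML from disk.

```python
from pathlib import Path
from typing import Any, Callable

Conf = dict[str, Any]


def dotted_to_path(name: str, root: str = ".azure_jobs") -> Path:
    """Map 'account.default' to .azure_jobs/account/default.yaml (one directory level)."""
    directory, stem = name.split(".", 1)
    return Path(root, directory, f"{stem}.yaml")


def deep_merge(base: Conf, override: Conf) -> Conf:
    """Return a new dict: nested mappings merge recursively; lists and scalars are replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(name: str, load: Callable[[str], Conf]) -> Conf:
    """Merge each base (recursively, in list order), then the file's own config on top.

    No cycle detection: a template that bases on itself recurses until RecursionError.
    """
    raw = load(name)
    result: Conf = {}
    for parent in raw.get("base", []):
        result = deep_merge(result, resolve(parent, load))
    return deep_merge(result, raw.get("config", {}))


# Hypothetical in-memory stand-ins for parsed YAML files, keyed by name.
files = {
    "account.default": {"config": {"target": {"workspace": "my-ws"}}},
    "gpu": {
        "base": ["account.default"],
        "config": {"target": {"name": "my-cluster"}, "jobs": [{"name": "train"}]},
    },
}
print(resolve("gpu", files.__getitem__))
# {'target': {'workspace': 'my-ws', 'name': 'my-cluster'}, 'jobs': [{'name': 'train'}]}
```

Under these assumptions, aj template show would display the equivalent of the merged dict printed above.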
aj template list # see what's available
aj template show <name> # resolved config (after inheritance)
aj template validate # sanity check
aj template push -m "msg" # commit + push back upstream
aj run
aj run -t gpu train.py # submit via REST
aj run train.py # reuse last template
aj run -t gpu -n 4 -p 8 train.py # 4 nodes × 8 GPUs/node
aj run -d train.py # dry run — print config, don't submit
aj run --amlt -t gpu train.py # submit via amlt instead
| Flag | Purpose |
|---|---|
| -t | Template name |
| -n | Number of nodes |
| -p | GPUs per node (drives SKU + AJ_PROCESSES) |
| --ppn | Launcher processes per node (e.g. torchrun --nproc-per-node) |
| -d | Dry run |
| --amlt | Submit via amlt |
Positional args after the script are forwarded verbatim to your command.
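To make -n and -p concrete, the sketch below computes two values the README pins down: the {nodes} placeholder in a template's sku field, and AJ_PROCESSES = AJ_NODES × AJ_GPUS_PER_NODE (see AJ_* environment variables). How -p feeds the SKU itself is covered in docs/configuration.md and is not modeled here; derived_values is a hypothetical helper.

```python
def derived_values(sku_template: str, nodes: int, gpus_per_node: int) -> dict[str, str]:
    """Illustrative only: how -n / -p could feed the SKU and AJ_* variables."""
    return {
        # {nodes} in the template's sku field is substituted from -n.
        "sku": sku_template.replace("{nodes}", str(nodes)),
        "AJ_NODES": str(nodes),
        "AJ_GPUS_PER_NODE": str(gpus_per_node),
        # AJ_PROCESSES is documented as AJ_NODES x AJ_GPUS_PER_NODE.
        "AJ_PROCESSES": str(nodes * gpus_per_node),
    }


# Equivalent of: aj run -t gpu -n 4 -p 8 train.py
print(derived_values("{nodes}xA100-80GB", nodes=4, gpus_per_node=8))
# {'sku': '4xA100-80GB', 'AJ_NODES': '4', 'AJ_GPUS_PER_NODE': '8', 'AJ_PROCESSES': '32'}
```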
How it works
- Resolve the template, walk the base chain, and merge configs.
- Apply CLI overrides (-n / -p / --ppn).
- Build a normalized SubmitRequest.
- Dispatch by backend:
  - native: register the environment (SHA-deduped) → upload code (content-addressed) → PUT /jobs/{name}.
  - volcano: render the Volcano Job YAML → upload code to a PVC via kubectl exec + tar → kubectl create.
  - amlt: write a submission YAML and shell out to amlt run.
- Append a SubmitRecord to record.jsonl and print the portal URL.
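Structurally, the dispatch step is a lookup from backend name to handler, followed by appending one JSON line per submission. The sketch below shows only that shape: Request is a hypothetical, simplified stand-in for SubmitRequest, the handlers return labels for the steps listed above instead of calling Azure ML or kubectl, and the record fields are assumptions rather than the real SubmitRecord schema.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Request:
    """Hypothetical stand-in for aj's normalized SubmitRequest."""
    name: str
    backend: str  # "native" | "volcano" | "amlt"
    command: str
    nodes: int


# Stub handlers: each returns the steps its backend would perform.
def submit_native(req: Request) -> list[str]:
    return ["register environment", "upload code", f"PUT /jobs/{req.name}"]


def submit_volcano(req: Request) -> list[str]:
    return ["render Volcano Job YAML", "kubectl exec + tar to PVC", "kubectl create"]


def submit_amlt(req: Request) -> list[str]:
    return ["write submission YAML", "amlt run"]


BACKENDS: dict[str, Callable[[Request], list[str]]] = {
    "native": submit_native,
    "volcano": submit_volcano,
    "amlt": submit_amlt,
}


def submit(req: Request, record_path: Path) -> list[str]:
    """Dispatch by backend, then append one JSON line describing the submission."""
    try:
        handler = BACKENDS[req.backend]
    except KeyError:
        raise ValueError(f"unknown backend: {req.backend!r}") from None
    steps = handler(req)
    with record_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({**asdict(req), "steps": steps}) + "\n")
    return steps


with tempfile.TemporaryDirectory() as d:
    print(submit(Request("train-01", "native", "train.py", 2), Path(d, "record.jsonl")))
# ['register environment', 'upload code', 'PUT /jobs/train-01']
```

Keeping the backends behind one table means the earlier steps (template resolution, overrides, request building) stay identical no matter where the job runs.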
Code uploads are content-addressed: if the template, command, and code are identical, the hash is identical, so a re-run reuses the previously uploaded asset.
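One way to get the "identical inputs, identical hash" property is to hash the template text, the command, and every code file in a stable order. The sketch below assumes SHA-256 over sorted relative paths; aj's actual hash inputs and algorithm may differ, content_hash is a hypothetical name, and a real implementation would also skip paths excluded by .codeignore.

```python
import hashlib
import tempfile
from pathlib import Path


def _feed(h: "hashlib._Hash", data: bytes) -> None:
    # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
    h.update(len(data).to_bytes(8, "big"))
    h.update(data)


def content_hash(template_yaml: str, command: str, code_dir: str) -> str:
    """Hash template text, command, and code tree in a deterministic order."""
    h = hashlib.sha256()
    _feed(h, template_yaml.encode("utf-8"))
    _feed(h, command.encode("utf-8"))
    root = Path(code_dir)
    # Sort by POSIX-style relative path so the order is OS-independent.
    files = sorted((p.relative_to(root).as_posix(), p) for p in root.rglob("*") if p.is_file())
    for rel, path in files:
        _feed(h, rel.encode("utf-8"))
        _feed(h, path.read_bytes())
    return h.hexdigest()


with tempfile.TemporaryDirectory() as d:
    Path(d, "train.py").write_text("print('hi')\n")
    a = content_hash("base: []\n", "train.py", d)
    b = content_hash("base: []\n", "train.py", d)
    Path(d, "train.py").write_text("print('bye')\n")
    c = content_hash("base: []\n", "train.py", d)
print(a == b, a == c)  # True False
```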
AJ_* environment variables
Exported into every job. Read them in your training script.
| Variable | Meaning |
|---|---|
| AJ_NAME | Job display name |
| AJ_ID | Submission ID (matches record.jsonl) |
| AJ_TEMPLATE | Template name used |
| AJ_NODES | Number of nodes |
| AJ_GPUS_PER_NODE | -p value |
| AJ_PROCESSES | AJ_NODES × AJ_GPUS_PER_NODE |
| AJ_PROCESSES_PER_NODE | --ppn value |
| AJ_SUBMIT_TIMESTAMP_UTC | Submission timestamp |
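A training script can read these with os.environ. The sketch below parses a few of them into a small dataclass and checks the documented relation AJ_PROCESSES = AJ_NODES × AJ_GPUS_PER_NODE. AjEnv and from_env are hypothetical names, and the fallbacks (1 node, 1 GPU, name "local") are an assumption so the script still runs outside a job.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class AjEnv:
    name: str
    job_id: str
    template: str
    nodes: int
    gpus_per_node: int
    processes: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "AjEnv":
        nodes = int(env.get("AJ_NODES", "1"))
        gpus = int(env.get("AJ_GPUS_PER_NODE", "1"))
        # Fall back to the documented formula if AJ_PROCESSES is absent.
        processes = int(env.get("AJ_PROCESSES", str(nodes * gpus)))
        if processes != nodes * gpus:
            raise ValueError(f"AJ_PROCESSES={processes} != {nodes} x {gpus}")
        return cls(
            name=env.get("AJ_NAME", "local"),
            job_id=env.get("AJ_ID", ""),
            template=env.get("AJ_TEMPLATE", ""),
            nodes=nodes,
            gpus_per_node=gpus,
            processes=processes,
        )


if __name__ == "__main__":
    print(AjEnv.from_env())
```

Passing a plain dict to from_env makes the parsing easy to test without touching the real environment.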
Example: launching torchrun with the node and GPU counts requested at submission:
torchrun \
  --nnodes=$AJ_NODES \
  --nproc_per_node=$AJ_GPUS_PER_NODE \
  --node_rank=$RANK \
  --master_addr=$MASTER_ADDR \
  train.py
aj dash
Interactive TUI dashboard for browsing and managing cloud jobs.
aj dash
| Key | Action |
|---|---|
| ↑ ↓ | Move selection |
| ← → | Prev / next page |
| enter / i | Job detail panel |
| l | Open logs (auto-streams if the job is running) |
| o | Pick a different log file |
| c | Cancel the selected job |
| r | Refresh |
| f / e / w | Filter by status / experiment / workspace |
| / | Search |
| F | Clear all filters |
| esc | Help overlay |
| q | Quit |
Use as a Python SDK
The same engine the CLI uses is exposed at the package root, so you can build and submit jobs from your own scripts:
from azure_jobs import (
    Template,
    build_submit_request,
    submit_via_native,  # also: submit_via_volcano, submit_via_amlt
    get_workspace_config,
)

template = Template.from_conf_path(".azure_jobs/template/gpu.yaml")
request = build_submit_request(
    template,
    name="my-job", sid="abc123", sku="2xA100-80GB",
    user_command="train.py", user_args=(),
    workspace=get_workspace_config(),
    template_name="gpu", nodes=2, processes=8,
    code_dir="/path/to/project",  # defaults to os.getcwd()
)
result = submit_via_native(request)
print(result.status, result.portal_url)
See docs/sdk.md for the full surface and a submit_and_record example.
Documentation
| Document | Contents |
|---|---|
| Commands | aj job, aj template, aj quota, aj sku, aj dash, ... |
| SDK | Programmatic submission API |
| Architecture | Module layout, submission flow, backends |
| Configuration | Templates, inheritance, merge rules, SKU formats |
| REST API | REST client design, endpoints, job body shape |
| Comparison | aj vs amlt feature matrix |
| Roadmap | Planned features |