
AmSC Resource Orchestration Client Toolkit

Description

The American Science Cloud Infrastructure Services Resource Orchestration Toolkit (AmSC-ISRO-Toolkit, or AmSCROT) provides infrastructure orchestration for AmSC use.

Installation

pip install amscrot-py

Operation Instructions

Credentials

AmSCROT reads provider credentials from ~/.amscrot/credentials.yml. Each section key corresponds to a service type or profile name:

esnet-iri:
  api_key: <token>
  api_endpoint: https://iri.es.net/api/v1

nersc-iri:
  api_key: <token>
  api_endpoint: https://api.iri.nersc.gov/api/v1

amsc-iro:
  api_key: <token>
  api_endpoint: https://...
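Once parsed, each section is expected to provide both an api_key and an api_endpoint. A small stand-in check (a hypothetical helper, not part of the toolkit) illustrates the expected shape:

```python
def validate_profiles(profiles):
    """Return the names of credential sections missing required keys.

    `profiles` is the dict produced by parsing credentials.yml,
    e.g. {"nersc-iri": {"api_key": "...", "api_endpoint": "..."}}.
    """
    required = {"api_key", "api_endpoint"}
    return [name for name, section in profiles.items()
            if not required <= set(section or {})]

creds = {
    "nersc-iri": {"api_key": "tok", "api_endpoint": "https://api.iri.nersc.gov/api/v1"},
    "amsc-iro": {"api_key": "tok"},  # missing api_endpoint
}
print(validate_profiles(creds))  # -> ['amsc-iro']
```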

Core Concepts

  • Client: Top-level entry point; owns sessions and service clients
  • ServiceClient: Provider-specific driver (Kube/Kueue, ESnet IRI, NERSC IRI, AMSC-IRO)
  • Session: Named unit of work; groups jobs and persists state to disk
  • Job: A single compute task bound to a ServiceClient
  • JobSpec: Declares executable, arguments, resources, and provider attributes
  • DiscoveryResult: Typed result from service_client.discover()

Basic Usage

1. Set up a Client and ServiceClient

from amscrot.client.client import Client
from amscrot.serviceclient import ServiceClient
from amscrot.util.constants import Constants

client = Client()

# Choose a provider: KUBE, ESNET_IRI, NERSC_IRI, AMSC_IRO
svc = ServiceClient.create(
    type=Constants.ServiceType.NERSC_IRI,
    name="nersc-compute",
    profile="nersc-iri"           # matches credentials.yml section
)
client.add_service_client(svc)

2. Discover Available Resources

result = svc.discover()      # returns DiscoveryResult

# Iterate typed resources
for item in result.by_type("compute"):
    print(item.data["id"], item.data["name"])

# Normalized Facility objects (provider-agnostic)
for facility in result.facilities:
    print(facility.name, [c.cores for c in (facility.compute or [])])
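The shape of DiscoveryResult can be pictured with minimal stand-in classes (illustrative only; the toolkit's real types carry more fields):

```python
from dataclasses import dataclass, field

@dataclass
class ResourceItem:
    type: str    # e.g. "compute", "storage"
    data: dict   # provider-specific payload

@dataclass
class DiscoveryResultSketch:
    items: list = field(default_factory=list)

    def by_type(self, kind):
        # filter the raw resource list by resource type
        return [i for i in self.items if i.type == kind]

result = DiscoveryResultSketch(items=[
    ResourceItem("compute", {"id": "c1", "name": "node-pool-a"}),
    ResourceItem("storage", {"id": "s1", "name": "scratch"}),
])
print([i.data["id"] for i in result.by_type("compute")])  # -> ['c1']
```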

3. Define a Job

from amscrot.client.job import Job, JobSpec, JobType, JobServiceType

spec = JobSpec(
    executable="python",
    arguments=["-c", "print('hello')"],
    resources={"requests": {"cpu": "1", "memory": "4Gi"}},
    attributes={
        "container": {"image": "python:3.12-slim"},  # provider image
        "resource_id": "<compute-resource-id>"       # from discovery
    }
)

job = Job(
    name="my-job",
    type=JobType.COMPUTE,
    service_type=JobServiceType.BATCH,
    service_client=svc,
    job_spec=spec
)

4. Create a Session and Submit

session = client.create_session("my-session")
session.add_job(job)

# Validate (raises PlanError on failure)
session.plan(verbose=True)

# Submit all jobs
session.apply()

5. Wait for Completion

from amscrot.client.job import JobState

results = session.wait(
    timeout=300,
    interval=5,
    verbose=True,
)

for job_name, status in results.items():
    print(f"{job_name}: {status.state}  message={status.message}")

session.wait() polls until all jobs reach a terminal state (COMPLETED, FAILED, or CANCELED). Pass jobs=[job1, job2] to wait on a subset.
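The polling behaviour can be sketched as a plain loop (a stand-in assuming a caller-supplied get_status function; the real wait() also handles status messages and verbose reporting):

```python
import time

TERMINAL = {"COMPLETED", "FAILED", "CANCELED"}

def wait_sketch(jobs, get_status, timeout=300, interval=5):
    """Poll until every job reaches a terminal state or timeout expires."""
    deadline = time.monotonic() + timeout
    results = {}
    while time.monotonic() < deadline:
        results = {j: get_status(j) for j in jobs}
        if all(state in TERMINAL for state in results.values()):
            return results
        time.sleep(interval)
    pending = [j for j, s in results.items() if s not in TERMINAL]
    raise TimeoutError(f"jobs still running: {pending}")

# Fake status source: RUNNING on the first poll, COMPLETED afterwards
calls = {"n": 0}
def fake_status(job):
    calls["n"] += 1
    return "RUNNING" if calls["n"] == 1 else "COMPLETED"

print(wait_sketch(["my-job"], fake_status, timeout=5, interval=0))
# -> {'my-job': 'COMPLETED'}
```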

6. Clean Up

session.destroy()   # cancels running jobs and removes session state

Sessions are persisted to ~/.amscrot/sessions/<session-name>/ so they survive process restarts. An existing session is restored automatically on client.create_session(name).
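Persistence amounts to serialising session state under a per-session directory; a rough equivalent (illustrative, not the toolkit's actual on-disk format) is:

```python
import json
import pathlib
import tempfile

def save_session(root, name, state):
    """Write session state to <root>/<name>/state.json."""
    d = pathlib.Path(root) / name
    d.mkdir(parents=True, exist_ok=True)
    (d / "state.json").write_text(json.dumps(state))

def load_session(root, name):
    """Restore state if the session directory exists, else None."""
    p = pathlib.Path(root) / name / "state.json"
    return json.loads(p.read_text()) if p.exists() else None

root = tempfile.mkdtemp()
save_session(root, "my-session", {"jobs": ["my-job"]})
print(load_session(root, "my-session"))  # -> {'jobs': ['my-job']}
```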

Kubernetes / Kueue Jobs

spec = JobSpec(
    executable="sleep",
    arguments=["30"],
    resources={"requests": {"cpu": "1", "memory": "1Gi"}},
    attributes={
        "container": {"image": "busybox"},
        "namespace": "default",
        "labels": {"kueue.x-k8s.io/queue-name": "compute-queue"},
        "completions": 1,
        "restartPolicy": "Never"
    }
)
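For orientation, attributes like these map roughly onto a Kubernetes batch/v1 Job manifest. The sketch below builds the analogous dict by hand from a JobSpec-like mapping (the toolkit's actual translation may differ):

```python
def kube_manifest_sketch(spec):
    """Build an illustrative batch/v1 Job dict from a JobSpec-like mapping."""
    attrs = spec["attributes"]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "namespace": attrs.get("namespace", "default"),
            "labels": attrs.get("labels", {}),  # e.g. the Kueue queue-name label
        },
        "spec": {
            "completions": attrs.get("completions", 1),
            "template": {"spec": {
                "restartPolicy": attrs.get("restartPolicy", "Never"),
                "containers": [{
                    "name": "main",
                    "image": attrs["container"]["image"],
                    "command": [spec["executable"], *spec["arguments"]],
                    "resources": spec["resources"],
                }],
            }},
        },
    }

spec = {
    "executable": "sleep",
    "arguments": ["30"],
    "resources": {"requests": {"cpu": "1", "memory": "1Gi"}},
    "attributes": {
        "container": {"image": "busybox"},
        "namespace": "default",
        "labels": {"kueue.x-k8s.io/queue-name": "compute-queue"},
        "completions": 1,
        "restartPolicy": "Never",
    },
}
manifest = kube_manifest_sketch(spec)
print(manifest["spec"]["template"]["spec"]["containers"][0]["command"])
# -> ['sleep', '30']
```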

See scripts/kube/setup-kueue.sh to install Kueue and create the required ResourceFlavor, ClusterQueue, LocalQueue, and PriorityClass resources on your cluster.

Jupyter Notebook Examples

Interactive notebooks are provided under examples/notebooks/client/.

  • amsc_hello_world: Start here. Walks through installation, credential setup, creating a Client/Session, submitting a job, monitoring with session.wait(), and cleanup with session.destroy().
  • amsc_gpt2_training_job: Submits a GPT-2 training job to a remote IRI compute resource, polls for completion, and fetches log output.
  • amsc_iri_multisite: Demonstrates multi-site job submission across ESnet IRI East and West endpoints using a single session.
  • amsc_iro_net_xfer: Uses the AMSC-IRO backend to orchestrate networked data-transfer jobs with L2 network metadata.

Apache Airflow Support

The airflow/ subdirectory provides custom Airflow operators for submitting and monitoring IRI compute jobs as part of larger data pipelines.

Operators

  • IriJobSubmitOperator — Plans, submits, and waits for an IRI job. Accepts service_type, profile, executable, resources, and attributes. Pushes job_id to XCom on completion.
  • IriFetchOutputOperator — Downloads stdout/stderr from a completed job to a local directory.

Quick Start

cd airflow/
bash setup_airflow.sh --start   # installs deps, initialises DB, launches standalone Airflow

Open http://localhost:8080, configure credentials in ~/.amscrot/credentials.yml, then trigger the demo DAG (esnet_iri_example or gpt2_training_job) from the UI or via the REST API:

curl -X POST http://localhost:8080/api/v2/dags/esnet_iri_example/dagRuns \
     -H "Content-Type: application/json" \
     -u "admin:<password>" -d '{}'

See airflow/README.md for the full operator reference and configuration options.
