Discover providers for resources (software, workload managers) for agentic science and beyond!

resource secretary

https://github.com/converged-computing/resource-secretary/blob/main/img/resource-secretary-small.png?raw=true

Design

Different systems call for different kinds of resource discovery. In a framework with a hub and workers, it does not make sense to hard-code worker types. Instead, a worker needs to dynamically discover the providers in its environment, whether a provider is a workload manager or a software package manager. Specifically, a worker needs to come up and do two things:

  • discover hardware and other parts of the cluster environment that do not change (or change slowly)
  • discover providers (managers, software, or other providers of resources that will change state), for example Flux, Slurm with queues, or environment modules
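
The two startup steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the library's implementation: the function names and the mapping of provider names to executables (`flux`, `sinfo`, `spack`) are assumptions made for the example.

```python
import os
import platform
import shutil

# Hypothetical mapping of provider name -> executable whose presence on PATH
# suggests the provider exists. These names are assumptions for illustration.
CANDIDATE_PROVIDERS = {"flux": "flux", "slurm": "sinfo", "spack": "spack"}


def discover_static():
    """Step 1: facts that do not change (or change slowly), gathered once."""
    return {"arch": platform.machine(), "cpu_count": os.cpu_count()}


def discover_providers(which=shutil.which):
    """Step 2: return the names of providers whose executable is found.

    The lookup function is injectable so the logic can be exercised without
    any of the tools installed.
    """
    return sorted(name for name, exe in CANDIDATE_PROVIDERS.items() if which(exe))


if __name__ == "__main__":
    print(discover_static())
    print(discover_providers())
```

Static facts are collected once, while the provider list only says who to ask later, since provider state (queues, installed packages) changes over time.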

There are no rules about what a cluster is allowed to have. A cluster that has both Flux and Slurm is entirely valid! The interaction for job negotiation proceeds as before. But instead of a single hard-coded ask to the secretary, the prompt still comes in with a specific resource request and policy, and the secretary agent then needs a way to query its providers or ask questions.
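
To make the idea of querying several providers concrete, here is a small sketch. Everything in it is hypothetical (the `ResourceRequest` shape, the `StubProvider` class, the `free_nodes` query, and the policy names); it only illustrates that one request with a policy can be answered by asking each active provider, and it is not how the library negotiates jobs.

```python
from dataclasses import dataclass


@dataclass
class ResourceRequest:
    """Hypothetical request shape: what is asked for, and under which policy."""
    nodes: int
    policy: str = "first-fit"


class StubProvider:
    """Stand-in for an active workload provider (not the library's class)."""

    def __init__(self, name, free_nodes):
        self.name = name
        self._free_nodes = free_nodes

    def free_nodes(self):
        # A real provider would query its workload manager here.
        return self._free_nodes


def negotiate(request, providers):
    """Ask every provider, then order those that can satisfy the request."""
    able = [p for p in providers if p.free_nodes() >= request.nodes]
    if request.policy == "most-free":
        # Prefer the provider with the most free nodes.
        able.sort(key=lambda p: p.free_nodes(), reverse=True)
    return [p.name for p in able]
```

With the default policy, providers are returned in the order they were discovered, so a cluster with both Flux and Slurm simply yields two candidates.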

Tools (classes) for discovery

The insight that I had is that these are different classes, and the classes need to work akin to MCP servers that provide tools, where the tools are functions. For example, the providers are organized in this structure:

resource_secretary/providers/
├── container
│   ├── charliecloud.py
│   ├── oci.py          # includes docker and podman
│   ├── shifter.py
│   └── singularity.py  # includes apptainer and singularity
├── hardware
│   ├── amd.py
│   ├── cpu.py
│   ├── gpu.py
│   ├── memory.py
│   └── nvidia.py
├── network
│   ├── ethernet.py
│   ├── infiniband.py
│   ├── network.py
│   └── omnipath.py
├── parallel
│   ├── mpich.py
│   ├── openmpi.py
│   └── spectrum.py
├── provider.py
├── software
│   ├── conda.py
│   ├── modules.py      # includes lmod and environment modules
│   └── spack.py
├── storage
│   ├── beegfs.py
│   ├── local.py
│   ├── lustre.py
│   ├── nfs.py
│   └── storage.py
└── workload
    ├── cobalt.py
    ├── flux.py
    ├── kubernetes.py
    ├── moab.py
    ├── oar.py
    ├── pbs.py
    ├── slurm.py
    ├── torque.py
    └── workload.py

We need to automatically detect all providers, typed as "software", "workload", and so on, based on their base class, BaseProvider. Each provider has a probe function that returns True or False depending on whether the provider exists on the system. On startup, the secretary only keeps instances of the providers whose probe returns True. Each provider has what you'd expect: different tools (functions) along with metadata. The cool trick is that the base class exposes the functions to the agent, as with MCP, but instead of maintaining some list I decorate each function to expose with @secretary_tool.
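
The pattern described above might look something like the sketch below. Only the names `BaseProvider`, `probe`, and `@secretary_tool` come from the text; the marker attribute, the `tools` method, the `active_providers` helper, and the two example providers are assumptions for illustration and may differ from the actual implementation.

```python
import inspect


def secretary_tool(func):
    """Mark a method as a tool to expose to the agent (assumed mechanism)."""
    func._is_secretary_tool = True
    return func


class BaseProvider:
    name = "base"
    category = "unknown"

    def probe(self):
        """Return True if the provider exists on this system."""
        return False

    def tools(self):
        """Collect the names of methods marked with @secretary_tool."""
        return sorted(
            name
            for name, member in inspect.getmembers(self, predicate=inspect.ismethod)
            if getattr(member, "_is_secretary_tool", False)
        )


class EchoProvider(BaseProvider):
    """Hypothetical provider that is always present."""
    name = "echo"
    category = "software"

    def probe(self):
        return True

    @secretary_tool
    def say(self, text):
        return text

    def helper(self):
        # Not decorated, so it is not exposed to the agent.
        return None


class MissingProvider(BaseProvider):
    """Hypothetical provider that is never found."""
    name = "missing"
    category = "workload"


def active_providers():
    """Instantiate every direct subclass and keep those whose probe succeeds."""
    instances = (cls() for cls in BaseProvider.__subclasses__())
    return {p.name: p for p in instances if p.probe()}
```

Because the decorator only sets a marker on the function, adding a tool is a one-line change on the provider, and the base class needs no registry to keep in sync.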

Usage

This library will be used by agents and secretaries. You can also run it locally to detect or list providers.

Providers

$ resource-secretary providers
╭─────────────────────────────────────────╮
│ 🦊 Resource Secretary: Provider Catalog │
╰─────────────────────────────────────────╯
                                               Available Resource Providers
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Category  ┃ Name          ┃ Active ┃ Description                                                                       ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ WORKLOAD  │ COBALT        │   NO   │ Handles discovery for the Cobalt resource manager (commonly used at ALCF).        │
│ WORKLOAD  │ FLUX          │   NO   │ The Flux provider interacts with the Flux Framework using native Python bindings. │
│ WORKLOAD  │ KUBERNETES    │  YES   │ Manages interaction with Kubernetes clusters.                                     │
│ WORKLOAD  │ MOAB          │   NO   │ Handles discovery for the Moab cluster scheduler.                                 │
│ WORKLOAD  │ OAR           │   NO   │ Handles discovery for the OAR resource manager.                                   │
│ WORKLOAD  │ PBS           │   NO   │ Handles discovery and status for OpenPBS and PBS Pro.                             │
│ WORKLOAD  │ SLURM         │   NO   │ The Slurm provider manages interaction with the Slurm Workload Manager.           │
│ WORKLOAD  │ TORQUE        │   NO   │ Handles discovery for the Torque resource manager.                                │
│ SOFTWARE  │ MODULES       │   NO   │ Handles Environment Modules (Lmod or TCL).                                        │
│ SOFTWARE  │ SPACK         │   NO   │ The Spack provider handles software environment discovery and package lookups.    │
│ CONTAINER │ CHARLIECLOUD  │   NO   │ No description provided.                                                          │
│ CONTAINER │ SHIFTER       │   NO   │ No description provided.                                                          │
│ CONTAINER │ APPTAINER     │  YES   │ Provider for the Apptainer container runtime.                                     │
│ CONTAINER │ SINGULARITY   │  YES   │ Provider for the Singularity container runtime.                                   │
│ STORAGE   │ BEEGFS        │   NO   │ Handles discovery and status for BeeGFS parallel filesystems.                     │
│ STORAGE   │ LOCAL-SCRATCH │   NO   │ Identifies high-speed local filesystems (XFS, ZFS, BTRFS) used for local scratch. │
│ STORAGE   │ LUSTRE        │   NO   │ Handles discovery and status for Lustre parallel filesystems.                     │
│ STORAGE   │ NETWORK-FS    │   NO   │ Handles discovery for standard network filesystems (NFS, CIFS).                   │
│ NETWORK   │ ETHERNET      │  YES   │ Handles discovery and status for standard Ethernet interfaces.                    │
│ NETWORK   │ INFINIBAND    │   NO   │ Handles discovery and status for InfiniBand and RDMA fabrics.                     │
│ NETWORK   │ OMNI-PATH     │   NO   │ Handles discovery and status for Intel Omni-Path (OPA) fabrics.                   │
│ HARDWARE  │ AMD-GPU       │   NO   │ Handles discovery and status for AMD GPU accelerators (ROCm).                     │
│ HARDWARE  │ CPU           │  YES   │ Handles discovery of CPU architecture, core counts, and instruction sets.         │
│ HARDWARE  │ MEMORY        │  YES   │ Handles discovery of system memory (RAM).                                         │
│ HARDWARE  │ NVIDIA-GPU    │   NO   │ Handles discovery and status for NVIDIA GPU accelerators.                         │
│ PARALLEL  │ MPICH         │   NO   │ Specialized provider for MPICH implementations.                                   │
│ PARALLEL  │ OPENMPI       │   NO   │ Specialized provider for OpenMPI implementations.                                 │
│ PARALLEL  │ SPECTRUM-MPI  │   NO   │ Specialized provider for IBM Spectrum MPI.                                        │
└───────────┴───────────────┴────────┴───────────────────────────────────────────────────────────────────────────────────┘
Active = YES indicates the resource was discovered on your local system.

Detect

# Run detection for all types and interfaces
$ resource-secretary detect

# Run detection for all containers
$ resource-secretary detect container

# Just detect for singularity
╭────────────────────────────────────────────────╮
│ Resource Secretary - System Detect (Container) │
╰────────────────────────────────────────────────╯
                    Provider Manifest
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Category  ┃ Provider    ┃ Metadata (Static)           ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ CONTAINER │ SINGULARITY │ {                           │
│           │             │   "runtime": "singularity", │
│           │             │   "version": "4.2.1-noble", │
│           │             │   "cache_dir": "default"    │
│           │             │ }                           │
└───────────┴─────────────┴─────────────────────────────┘

Tool Discovery (Agent Visibility)
 • singularity: list_cache

License

HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.

See LICENSE, COPYRIGHT, and NOTICE for details.

SPDX-License-Identifier: (MIT)

LLNL-CODE- 842614
