
Python gRPC functions for the Rainbow Scheduler


rainbow (python)

🌈️ Where keebler elves and schedulers live, somewhere in the clouds, and with marshmallows

https://github.com/converged-computing/rainbow/raw/main/docs/img/rainbow.png

This is the rainbow scheduler prototype, specifically Python bindings for a gRPC client. To learn more about rainbow, visit https://github.com/converged-computing/rainbow.

Example

Assuming that you can run the server with Go, let's first do that (e.g., from the root of the repository linked above, and soon we will provide a container):

Register

make server
go run cmd/server/server.go
2024/02/12 19:38:58 creating 🌈️ server...
2024/02/12 19:38:58 ✨️ creating rainbow.db...
2024/02/12 19:38:58    rainbow.db file created
2024/02/12 19:38:58    create cluster table...
2024/02/12 19:38:58    cluster table created
2024/02/12 19:38:58    create jobs table...
2024/02/12 19:38:58    jobs table created
2024/02/12 19:38:58 starting scheduler server: rainbow v0.1.0-draft
2024/02/12 19:38:58 server listening: [::]:50051
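Before moving on, it can help to confirm that something is actually listening on the port shown in the log. The sketch below is not part of rainbow: server_is_listening is a hypothetical helper, and the localhost host and 50051 port are assumptions taken from the log line above. It only checks that a TCP connection succeeds, not that the gRPC service behind it is healthy.

```python
import socket


def server_is_listening(host: str = "localhost", port: int = 50051, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling server_is_listening() with no arguments checks the default address from the log above.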

And then let's do a registration, but this time from the Python bindings (client) here! We will use the core bindings in rainbow/client.py but run a custom command from examples. First, install everything into a venv:

python -m venv env
source env/bin/activate
pip install -e .

The command below will register and save the secret to a new configuration file. Note that if you provide an existing one, it will use or update it.

python ./examples/flux/register.py keebler --config-path ./rainbow-config.yaml
Saving rainbow config to ./rainbow-config.yaml
🤫️ The token you will need to submit jobs to this cluster is rainbow
🔐️ The secret you will need to accept jobs is 649598a9-e77b-4aa3-ab46-bfbbc5e2d606

Try running it again: you can't register the same cluster twice, although you can register other cluster names. A "cluster" can actually be a cluster, or a flux instance, or any entity that can accept jobs. The script also accepts arguments (see register.py --help):

python ./examples/flux/register.py --help

🌈️ Rainbow scheduler register

options:
  -h, --help            show this help message and exit
  --cluster CLUSTER     cluster name to register
  --host HOST           host of rainbow cluster
  --secret SECRET       Rainbow cluster registration secret
  --config-path CONFIG_PATH
                        Path to rainbow configuration file to write or use
  --cluster-nodes CLUSTER_NODES
                        Nodes to provide for registration
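The register-once behavior described above can be pictured with a small in-memory model. This is only an illustration, not how the rainbow server stores clusters: ClusterRegistry and its fields are hypothetical, the shared "rainbow" token is copied from the demo output, and uuid4 is used only because the secret in that output has the shape of a UUID.

```python
import uuid


class ClusterRegistry:
    """Toy in-memory stand-in for the server's cluster table (hypothetical)."""

    def __init__(self, token: str = "rainbow"):
        self.token = token   # token for submitting jobs, as printed in the demo
        self.secrets = {}    # cluster name -> secret for accepting jobs

    def register(self, name: str) -> dict:
        # A name can only be registered once; a second attempt is an error
        if name in self.secrets:
            raise ValueError(f"cluster {name!r} is already registered")
        secret = str(uuid.uuid4())
        self.secrets[name] = secret
        return {"token": self.token, "secret": secret}
```

Registering "keebler" twice raises an error in this model, while a second, different name gets its own secret.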

Register Subsystem

Let's now register the subsystem. Akin to register, the path to the subsystem nodes is set as a default, and the subsystem name (--subsystem) defaults to "io". This assumes you've registered your cluster and have the cluster secret in your ./rainbow-config.yaml:

python ./examples/flux/register-subsystem.py keebler --config-path ./rainbow-config.yaml
status: REGISTER_SUCCESS

In the server window you'll see the subsystem added:

...
2024/03/09 14:21:50 📝️ received subsystem register: keebler
2024/03/09 14:21:50 Preparing to load 6 nodes and 30 edges
2024/03/09 14:21:50 We have made an in memory graph (subsystem io) with 7 vertices, with 15 connections to the dominant!
{
 "keebler": {
  "Name": "keebler",
  "Counts": {
   "io": 1,
   "mtl1unit": 1,
   "mtl2unit": 1,
   "mtl3unit": 1,
   "nvme": 1,
   "shm": 1
  }
 }
}
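The "Counts" section in the server output is a tally of subsystem nodes by type. A minimal sketch of that tally follows, assuming a JSON-graph-like node layout; the node ids and the metadata/type field names here are hypothetical and may not match the real subsystem nodes file.

```python
from collections import Counter

# Hypothetical subsystem nodes; only the six type names come from the output above
nodes = {
    "io0": {"metadata": {"type": "io"}},
    "mtl1unit0": {"metadata": {"type": "mtl1unit"}},
    "mtl2unit0": {"metadata": {"type": "mtl2unit"}},
    "mtl3unit0": {"metadata": {"type": "mtl3unit"}},
    "nvme0": {"metadata": {"type": "nvme"}},
    "shm0": {"metadata": {"type": "shm"}},
}


def count_types(nodes: dict) -> dict:
    """Count how many nodes of each type the subsystem provides."""
    counts = Counter(node["metadata"]["type"] for node in nodes.values())
    return dict(sorted(counts.items()))


counts = count_types(nodes)
```

For these six nodes, each type is counted once, which mirrors the shape of the "Counts" block the server prints.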

Update State

While we likely will have clusters sending back state when they accept jobs, for now we have a separate endpoint to do a one-off request to update the state. You can test that here.

python ./examples/flux/update-state.py keebler --config-path ./rainbow-config.yaml
status: UPDATE_STATE_SUCCESS

In the server terminal (depending on your level of logging) you'll see the state update.

2024/04/05 18:45:16 We have made an in memory graph (subsystem io) with 7 vertices, with 15 connections to the dominant!
Metrics for subsystem io{
 "io": 1,
 "mtl1unit": 1,
 "mtl2unit": 1,
 "mtl3unit": 1,
 "nvme": 1,
 "shm": 1
}
2024/04/05 18:47:18 📝️ received state update: keebler
Updating state cost-per-node to 12
Updating state max-jobs to 100

Note that the path to the state metadata file is provided as a default to make the demo simple. This state metadata will be provided to the selection algorithm, which can use it as needed to choose a final cluster.
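To make the role of state concrete, here is a sketch of how key/value state could feed a selection step. It is an assumption-heavy illustration: update_state and select_cluster are hypothetical names, the "other" cluster is invented for the example, and choosing the lowest cost-per-node is just one possible policy, not rainbow's selection algorithm.

```python
def update_state(states: dict, cluster: str, updates: dict) -> None:
    """Merge new key/value state into whatever is already stored for a cluster."""
    states.setdefault(cluster, {}).update(updates)


def select_cluster(matches: list, states: dict, key: str = "cost-per-node") -> str:
    """Pick the matched cluster with the lowest value for key (hypothetical policy).

    Clusters without that key are treated as infinitely expensive.
    """
    return min(matches, key=lambda name: states.get(name, {}).get(key, float("inf")))


states = {}
update_state(states, "keebler", {"cost-per-node": 12, "max-jobs": 100})
update_state(states, "other", {"cost-per-node": 20})  # hypothetical second cluster
chosen = select_cluster(["keebler", "other"], states)
```

With these values the cheaper cluster, keebler, is chosen.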

Submit Job (Simple)

Now let's submit a job to our faux cluster. We need to provide the token we received above. Remember that this is a two-stage process:

  1. Query the graph database for one or more cluster matches.
  2. Send that request to rainbow.

The client handles both, so you (as the user) only see a single submit. We will provide basic arguments for the job, but note you can provide other arguments too:

python ./examples/flux/submit-job.py --help

🌈️ Rainbow scheduler submit

positional arguments:
  command               Command to submit

options:
  -h, --help            show this help message and exit
  --config-path CONFIG_PATH
                        config path with cluster names
  --host HOST           host of rainbow cluster
  --token TOKEN         Cluster token for permission to submit jobs
  --nodes NODES         Nodes for job (defaults to 1)
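The two stages that the client wraps can be sketched with plain dictionaries. Everything here is a stand-in: match_clusters and submit are hypothetical functions, the real first stage queries a graph database rather than comparing node counts, and the node count for keebler is made up for the example.

```python
def match_clusters(cluster_nodes: dict, nodes_needed: int) -> list:
    """Stage 1: return the names of clusters with at least nodes_needed nodes."""
    return sorted(name for name, size in cluster_nodes.items() if size >= nodes_needed)


def submit(command: list, nodes_needed: int, cluster_nodes: dict, queue: dict):
    """Run both stages; return the assigned cluster, or None if nothing matches."""
    matches = match_clusters(cluster_nodes, nodes_needed)
    if not matches:
        return None
    # Stage 2: send the request with the matches; this sketch just takes the first
    chosen = matches[0]
    queue.setdefault(chosen, []).append({"command": command, "nodes": nodes_needed})
    return chosen


cluster_nodes = {"keebler": 2}  # hypothetical size
queue = {}
assigned = submit(["echo", "hello", "world"], 1, cluster_nodes, queue)
```

From the caller's side there is a single submit call, and the matching step stays hidden inside it.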

And then submit! Remember that you need to have registered first. Note that we need to provide our cluster config path.

python examples/flux/submit-job.py --config-path ./rainbow-config.yaml --nodes 1 echo hello world
{
    "version": 1,
    "resources": [
        {
            "type": "node",
            "count": 1,
            "with": [
                {
                    "type": "slot",
                    "count": 1,
                    "label": "echo",
                    "with": [
                        {
                            "type": "core",
                            "count": 1
                        }
                    ]
                }
            ]
        }
    ],
    "tasks": [
        {
            "command": [
                "echo",
                "hello",
                "world"
            ],
            "slot": "echo",
            "count": {
                "per_slot": 1
            }
        }
    ],
    "attributes": {}
}
clusters: "keebler"
status: RESULT_TYPE_SUCCESS

status: SUBMIT_SUCCESS
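The jobspec printed above has a regular shape, so it is easy to sketch how one could be assembled from a command and a node count. This is inferred from that single output and is not the client's actual code: build_jobspec is a hypothetical helper, and using the first word of the command as the slot label is an assumption based on the "echo" label shown.

```python
import json


def build_jobspec(command: list, nodes: int = 1) -> dict:
    """Assemble a jobspec shaped like the one in the output above."""
    label = command[0]  # assumption: the slot label is the first word of the command
    return {
        "version": 1,
        "resources": [
            {
                "type": "node",
                "count": nodes,
                "with": [
                    {
                        "type": "slot",
                        "count": 1,
                        "label": label,
                        "with": [{"type": "core", "count": 1}],
                    }
                ],
            }
        ],
        "tasks": [{"command": list(command), "slot": label, "count": {"per_slot": 1}}],
        "attributes": {},
    }


jobspec = build_jobspec(["echo", "hello", "world"], nodes=1)
print(json.dumps(jobspec, indent=4))
```

The task refers to the slot by its label, which is how the command is tied to the resources it runs on.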

Submit Jobspec

We can also submit a jobspec directly, which is an advanced use case. It works mostly the same way, except we load the jobspec directly from a YAML file.

python examples/flux/submit-jobspec.py --config-path ./rainbow-config.yaml ../../docs/examples/scheduler/jobspec-io.yaml

🌈️ Rainbow scheduler submit

positional arguments:
  jobspec               Jobspec path to submit

options:
  -h, --help            show this help message and exit
  --config-path CONFIG_PATH
                        config path with cluster metadata

It largely looks the same - I'll cut most of it out. It's just a different entry point for the job definition.

clusters: "keebler"
status: RESULT_TYPE_SUCCESS

status: SUBMIT_SUCCESS

Receive Jobs

After we submit jobs, rainbow assigns them to a cluster. For this dummy example we are assigning to the same cluster (keebler) so we can also use our host "keebler" to receive the job. Here is what that looks like.

python ./examples/flux/receive-jobs.py --help

🌈️ Rainbow scheduler receive jobs

options:
  -h, --help            show this help message and exit
  --max-jobs MAX_JOBS   Maximum jobs to request (unset defaults to all)
  --config-path CONFIG_PATH
                        config path with cluster metadata

And then request and accept jobs:

python examples/flux/receive-jobs.py --config-path ./rainbow-config.yaml
Status: REQUEST_JOBS_SUCCESS
Received 1 jobs to accept...

If this were running in Flux, we would be able to run the job. The response above has told rainbow that you've accepted it, and rainbow deletes its record of the job.
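The request-then-accept flow can be modeled in a few lines. This is a toy model rather than rainbow's implementation: JobStore and its methods are hypothetical, job ids are simple counters, and the check of the cluster secret is left out entirely.

```python
class JobStore:
    """Toy model of assigned jobs waiting for a cluster to accept them (hypothetical)."""

    def __init__(self):
        self.pending = {}  # cluster name -> {job id: job}
        self.next_id = 1

    def assign(self, cluster: str, job: dict) -> int:
        """Record a job as assigned to a cluster and return its id."""
        jobid = self.next_id
        self.next_id += 1
        self.pending.setdefault(cluster, {})[jobid] = job
        return jobid

    def request_jobs(self, cluster: str, max_jobs=None) -> dict:
        """Return up to max_jobs pending jobs (all of them when max_jobs is None)."""
        items = list(self.pending.get(cluster, {}).items())
        if max_jobs is not None:
            items = items[:max_jobs]
        return dict(items)

    def accept_jobs(self, cluster: str, jobids) -> None:
        """Delete the record of each accepted job."""
        for jobid in jobids:
            self.pending.get(cluster, {}).pop(jobid, None)


store = JobStore()
first = store.assign("keebler", {"command": ["echo", "hello", "world"]})
second = store.assign("keebler", {"command": ["hostname"]})
received = store.request_jobs("keebler", max_jobs=1)
store.accept_jobs("keebler", received)
```

After the accept step only the second job is still pending, which matches the idea that accepted jobs are no longer tracked.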

License

HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.

See LICENSE, COPYRIGHT, and NOTICE for details.

SPDX-License-Identifier: (MIT)

LLNL-CODE-842614

