Python gRPC functions for the Rainbow Scheduler
rainbow (python)
🌈️ Where keebler elves and schedulers live, somewhere in the clouds, and with marshmallows
This is the rainbow scheduler prototype, specifically Python bindings for a gRPC client. To learn more about rainbow, visit https://github.com/converged-computing/rainbow.
Example
Assuming that you can run the server with Go, let's first start it (e.g., from the root of the repository linked above; a container will be provided soon):
Register
make server
go run cmd/server/server.go
2024/02/12 19:38:58 creating 🌈️ server...
2024/02/12 19:38:58 ✨️ creating rainbow.db...
2024/02/12 19:38:58 rainbow.db file created
2024/02/12 19:38:58 create cluster table...
2024/02/12 19:38:58 cluster table created
2024/02/12 19:38:58 create jobs table...
2024/02/12 19:38:58 jobs table created
2024/02/12 19:38:58 starting scheduler server: rainbow v0.1.0-draft
2024/02/12 19:38:58 server listening: [::]:50051
Next, let's do a registration, but this time from the Python bindings (client) here! We will use the core bindings in rainbow/client.py but run a custom command from examples. First, install everything into a virtual environment:
python -m venv env
source env/bin/activate
pip install -e .
The command below registers the cluster and saves the secret to a new configuration file. If you provide an existing configuration file, it will be used or updated instead.
python ./examples/flux/register.py keebler --config-path ./rainbow-config.yaml
Saving rainbow config to ./rainbow-config.yaml
🤫️ The token you will need to submit jobs to this cluster is rainbow
🔐️ The secret you will need to accept jobs is 649598a9-e77b-4aa3-ab46-bfbbc5e2d606
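The register step writes the returned secret into the configuration file, creating the file or updating an existing one. A minimal sketch of that create-or-update logic is below. The config layout (a top-level "clusters" list of name/secret entries) is a hypothetical structure for illustration, not necessarily the schema rainbow writes, and JSON stands in for YAML so the example needs only the standard library.

```python
import json
import os
import tempfile


def save_cluster_secret(config_path, cluster, secret):
    """Create or update a config file holding a cluster's secret.

    Hypothetical layout: {"clusters": [{"name": ..., "secret": ...}]}.
    JSON is used instead of YAML to stay within the standard library.
    """
    config = {"clusters": []}
    # Reuse the existing config file if one was provided
    if os.path.exists(config_path):
        with open(config_path) as fd:
            config = json.load(fd)
    clusters = config.setdefault("clusters", [])

    # Update this cluster's entry if present, otherwise append a new one
    for entry in clusters:
        if entry.get("name") == cluster:
            entry["secret"] = secret
            break
    else:
        clusters.append({"name": cluster, "secret": secret})

    with open(config_path, "w") as fd:
        json.dump(config, fd, indent=2)
    return config


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "rainbow-config.json")
    save_cluster_secret(path, "keebler", "first-secret")
    # The second call updates the existing entry instead of adding another
    print(save_cluster_secret(path, "keebler", "second-secret"))
```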
Try running the register command again: you can't register the same cluster twice, although you can register other cluster names. A "cluster" can be an actual cluster, a Flux instance, or any entity that can accept jobs. The script also accepts arguments (see register.py --help):
python ./examples/flux/register.py --help
🌈️ Rainbow scheduler register
options:
-h, --help show this help message and exit
--cluster CLUSTER cluster name to register
--host HOST host of rainbow cluster
--secret SECRET Rainbow cluster registration secret
--config-path CONFIG_PATH
Path to rainbow configuration file to write or use
--cluster-nodes CLUSTER_NODES
Nodes to provide for registration
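The rule that a cluster name can only be registered once, with each new cluster receiving its own secret, can be sketched as an in-memory registry. This is a hypothetical illustration, not rainbow's server code: the real server persists clusters in rainbow.db, and the token value "rainbow" is simply taken from the example output above. The secret printed above has the shape of a random (version 4) UUID, so uuid.uuid4 is used here as an assumption.

```python
import uuid


class ClusterRegistry:
    """Hypothetical in-memory stand-in for the server's cluster table."""

    def __init__(self, token="rainbow"):
        # Token users need to submit jobs (placeholder from the example output)
        self.token = token
        self.clusters = {}

    def register(self, name, nodes=None):
        """Register a new cluster and return (token, secret)."""
        # A cluster name can only be registered once
        if name in self.clusters:
            raise ValueError(f"cluster {name} is already registered")
        secret = str(uuid.uuid4())
        self.clusters[name] = {"secret": secret, "nodes": nodes}
        return self.token, secret


if __name__ == "__main__":
    registry = ClusterRegistry()
    print(registry.register("keebler"))
    try:
        registry.register("keebler")
    except ValueError as err:
        print(err)
```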
Register Subsystem
Let's now register the subsystem. As with register, the path to the subsystem nodes is set as a default, and the subsystem name (--subsystem) defaults to "io". This assumes you've registered your cluster and have the cluster secret in your ./rainbow-config.yaml.
python ./examples/flux/register-subsystem.py keebler --config-path ./rainbow-config.yaml
status: REGISTER_SUCCESS
In the server window you'll see the subsystem added:
...
2024/03/09 14:21:50 📝️ received subsystem register: keebler
2024/03/09 14:21:50 Preparing to load 6 nodes and 30 edges
2024/03/09 14:21:50 We have made an in memory graph (subsystem io) with 7 vertices, with 15 connections to the dominant!
{
"keebler": {
"Name": "keebler",
"Counts": {
"io": 1,
"mtl1unit": 1,
"mtl2unit": 1,
"mtl3unit": 1,
"nvme": 1,
"shm": 1
}
}
}
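The Counts block in the log is a per-type tally of the subsystem's vertices. A minimal sketch of that tally with collections.Counter is below. The node list is a hypothetical reconstruction of the six io-subsystem nodes implied by the counts above; the real node file format is not shown here.

```python
from collections import Counter


def count_vertex_types(nodes):
    """Tally subsystem vertices by their "type" field."""
    return dict(Counter(node["type"] for node in nodes))


# Hypothetical node list matching the six types reported for keebler
nodes = [
    {"id": "0", "type": "io"},
    {"id": "1", "type": "mtl1unit"},
    {"id": "2", "type": "mtl2unit"},
    {"id": "3", "type": "mtl3unit"},
    {"id": "4", "type": "nvme"},
    {"id": "5", "type": "shm"},
]

if __name__ == "__main__":
    print(count_vertex_types(nodes))
```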
Update State
While clusters will likely send back state when they accept jobs, for now there is a separate endpoint for a one-off request to update the state. You can test it here:
python ./examples/flux/update-state.py keebler --config-path ./rainbow-config.yaml
status: UPDATE_STATE_SUCCESS
In the server terminal (depending on your level of logging) you'll see the state update.
2024/04/05 18:45:16 We have made an in memory graph (subsystem io) with 7 vertices, with 15 connections to the dominant!
Metrics for subsystem io{
"io": 1,
"mtl1unit": 1,
"mtl2unit": 1,
"mtl3unit": 1,
"nvme": 1,
"shm": 1
}
2024/04/05 18:47:18 📝️ received state update: keebler
Updating state cost-per-node to 12
Updating state max-jobs to 100
Note that the path to the state metadata file is provided as a default to keep the demo simple. This state metadata will be provided to the selection algorithm, which can use it as needed to choose a final cluster.
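The selection algorithm itself is not described here. As a purely hypothetical illustration of how state metadata such as cost-per-node and max-jobs could feed a final choice, the sketch below skips clusters already at their job limit and picks the cheapest of the rest. The running-jobs field and the second cluster ("elf-tree") are invented for the example.

```python
def select_cluster(candidates, state, nodes):
    """Pick the matched cluster with the lowest total cost for the job.

    candidates: cluster names that matched the job
    state: mapping of cluster name -> state metadata (hypothetical fields)
    """
    best, best_cost = None, None
    for name in candidates:
        meta = state.get(name, {})
        # Skip clusters that have reached their job limit
        if meta.get("running-jobs", 0) >= meta.get("max-jobs", float("inf")):
            continue
        cost = meta.get("cost-per-node", 0) * nodes
        if best_cost is None or cost < best_cost:
            best, best_cost = name, cost
    return best


# keebler values come from the state update above; the rest is made up
state = {
    "keebler": {"cost-per-node": 12, "max-jobs": 100, "running-jobs": 3},
    "elf-tree": {"cost-per-node": 20, "max-jobs": 100, "running-jobs": 0},
}

if __name__ == "__main__":
    print(select_cluster(["keebler", "elf-tree"], state, nodes=1))
```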
Submit Job (Simple)
Now let's submit a job to our faux cluster. We need to provide the token we received above. Remember that this is a two-stage process:
- Query the graph database for one or more cluster matches.
- Send that request to rainbow.
The client handles both, so you (as the user) only see a single submit. We will provide basic arguments for the job, but note that you can provide other arguments too:
python ./examples/flux/submit-job.py --help
🌈️ Rainbow scheduler submit
positional arguments:
command Command to submit
options:
-h, --help show this help message and exit
--config-path CONFIG_PATH
config path with cluster names
--host HOST host of rainbow cluster
--token TOKEN Cluster token for permission to submit jobs
--nodes NODES Nodes for job (defaults to 1)
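The two stages the client hides (match, then send to rainbow) can be sketched with stand-in functions. The names match_clusters, send_to_rainbow, and submit are hypothetical and do not mirror rainbow/client.py; the match here is a simple free-node check rather than a graph database query, and the accepted token is the placeholder from the register output.

```python
def match_clusters(graph, nodes):
    """Stage 1 (stand-in): return clusters with enough free nodes."""
    return [name for name, free in graph.items() if free >= nodes]


def send_to_rainbow(clusters, jobspec, token):
    """Stage 2 (stand-in): validate the token and build the request."""
    if token != "rainbow":
        raise PermissionError("invalid token")
    return {"clusters": clusters, "jobspec": jobspec}


def submit(graph, jobspec, nodes, token):
    """What the user sees: one submit that runs both stages."""
    clusters = match_clusters(graph, nodes)
    if not clusters:
        raise RuntimeError("no cluster matches the request")
    return send_to_rainbow(clusters, jobspec, token)


if __name__ == "__main__":
    # Hypothetical free-node counts per cluster
    graph = {"keebler": 4, "tiny": 0}
    print(submit(graph, {"version": 1}, nodes=1, token="rainbow"))
```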
And then submit! Remember that you need to have registered first, and that we need to provide our cluster config path.
python examples/flux/submit-job.py --config-path ./rainbow-config.yaml --nodes 1 echo hello world
{
"version": 1,
"resources": [
{
"type": "node",
"count": 1,
"with": [
{
"type": "slot",
"count": 1,
"label": "echo",
"with": [
{
"type": "core",
"count": 1
}
]
}
]
}
],
"tasks": [
{
"command": [
"echo",
"hello",
"world"
],
"slot": "echo",
"count": {
"per_slot": 1
}
}
],
"attributes": {}
}
clusters: "keebler"
status: RESULT_TYPE_SUCCESS
status: SUBMIT_SUCCESS
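The printed jobspec follows a fixed pattern: the requested nodes, each with one slot labeled after the command and holding one core, plus a task that runs the command once per slot. The sketch below rebuilds that dictionary from a command and node count. It is an assumption about how such a jobspec can be assembled, not the code the example script uses (which may rely on Flux's own jobspec helpers); in particular, labeling the slot with the first word of the command is inferred from this one example.

```python
import json


def jobspec_from_command(command, nodes=1):
    """Build a jobspec dict shaped like the one printed by submit-job.py."""
    # Assumption: the slot label is the command name (e.g., "echo")
    label = command[0]
    return {
        "version": 1,
        "resources": [
            {
                "type": "node",
                "count": nodes,
                "with": [
                    {
                        "type": "slot",
                        "count": 1,
                        "label": label,
                        "with": [{"type": "core", "count": 1}],
                    }
                ],
            }
        ],
        "tasks": [
            {
                "command": list(command),
                "slot": label,
                "count": {"per_slot": 1},
            }
        ],
        "attributes": {},
    }


if __name__ == "__main__":
    print(json.dumps(jobspec_from_command(["echo", "hello", "world"]), indent=4))
```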
Submit Jobspec
We can also submit a jobspec directly, which is an advanced use case. It works mostly the same way, except that the Jobspec is loaded directly from a YAML file.
python examples/flux/submit-jobspec.py --config-path ./rainbow-config.yaml ../../docs/examples/scheduler/jobspec-io.yaml
🌈️ Rainbow scheduler submit
positional arguments:
jobspec Jobspec path to submit
options:
-h, --help show this help message and exit
--config-path CONFIG_PATH
config path with cluster metadata
The output largely looks the same, so most of it is cut here. This is just a different entry point for the job definition.
clusters: "keebler"
status: RESULT_TYPE_SUCCESS
status: SUBMIT_SUCCESS
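Loading a jobspec from a file amounts to a parse followed by a sanity check. The sketch below assumes a JSON-formatted file, since JSON is for practical purposes a subset of YAML 1.2 and this keeps the example within the standard library; a general YAML file would need a parser such as PyYAML's yaml.safe_load. The required keys are those visible in the jobspec printed above, and the validation rules are illustrative, not rainbow's.

```python
import json
import os
import tempfile

# Top-level keys seen in the example jobspec (illustrative check only)
REQUIRED_KEYS = ("version", "resources", "tasks")


def load_jobspec(path):
    """Load a JSON-formatted jobspec file and check its top-level structure."""
    with open(path) as fd:
        jobspec = json.load(fd)
    missing = [key for key in REQUIRED_KEYS if key not in jobspec]
    if missing:
        raise ValueError(f"jobspec is missing keys: {missing}")
    if jobspec["version"] != 1:
        raise ValueError(f"unsupported jobspec version: {jobspec['version']}")
    return jobspec


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "jobspec.json")
    with open(path, "w") as fd:
        json.dump({"version": 1, "resources": [], "tasks": []}, fd)
    print(load_jobspec(path))
```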
Receive Jobs
After we submit jobs, rainbow assigns them to a cluster. For this dummy example, jobs are assigned to the same cluster (keebler), so we can also use our host "keebler" to receive them. Here is what that looks like:
python ./examples/flux/receive-jobs.py --help
🌈️ Rainbow scheduler receive jobs
options:
-h, --help show this help message and exit
--max-jobs MAX_JOBS Maximum jobs to request (unset defaults to all)
--config-path CONFIG_PATH
config path with cluster metadata
And then request and accept jobs:
python examples/flux/receive-jobs.py --config-path ./rainbow-config.yaml
Status: REQUEST_JOBS_SUCCESS
Received 1 jobs to accept...
If this were running in Flux, we would be able to run the job. The response above has told rainbow that you've accepted it, and rainbow then deletes its record of the job.
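The receive step requests up to --max-jobs pending jobs for a cluster (all of them when unset), and accepting a job removes rainbow's record of it. A hypothetical in-memory queue capturing that behavior is below; the class and method names are invented for illustration, and the real exchange happens over gRPC using the cluster secret.

```python
class JobQueue:
    """Hypothetical stand-in for rainbow's per-cluster pending jobs."""

    def __init__(self):
        self.pending = {}

    def add(self, cluster, job_id, jobspec):
        """Record a job that rainbow has assigned to a cluster."""
        self.pending.setdefault(cluster, {})[job_id] = jobspec

    def request_jobs(self, cluster, max_jobs=None):
        """Return up to max_jobs pending jobs (all of them if unset)."""
        jobs = list(self.pending.get(cluster, {}).items())
        if max_jobs is not None:
            jobs = jobs[:max_jobs]
        return dict(jobs)

    def accept_jobs(self, cluster, job_ids):
        """Accepting jobs deletes their records from the queue."""
        queue = self.pending.get(cluster, {})
        for job_id in job_ids:
            queue.pop(job_id, None)


if __name__ == "__main__":
    queue = JobQueue()
    queue.add("keebler", 1, {"version": 1})
    jobs = queue.request_jobs("keebler")
    print(f"Received {len(jobs)} jobs to accept...")
    queue.accept_jobs("keebler", list(jobs))
    print(queue.request_jobs("keebler"))
```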
License
HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.
See LICENSE, COPYRIGHT, and NOTICE for details.
SPDX-License-Identifier: (MIT)
LLNL-CODE-842614
File details
Details for the file rainbow-scheduler-0.0.16.tar.gz.
File metadata
- Download URL: rainbow-scheduler-0.0.16.tar.gz
- Upload date:
- Size: 19.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4f0aed3aa5f12316857f819b6a4576bd9789bf70405d8a027a0c57931236442a
MD5 | 8d550303ecb96ba8003363f2f4374fd6
BLAKE2b-256 | 5a75db05abebbf9f1c2f5376cc23b612995664e3302f6c5b939852346743ec9e
File details
Details for the file rainbow_scheduler-0.0.16-py3-none-any.whl.
File metadata
- Download URL: rainbow_scheduler-0.0.16-py3-none-any.whl
- Upload date:
- Size: 27.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 651894c0c9314d779d5aed800bd6cb00191c836a4698289d662373a543252e48
MD5 | 8df807246d9df115c4b6244802c60665
BLAKE2b-256 | c06187a243027e31b2b0e71327f68dcfc7a0a0e94a24f782e27067c3e27f074f