Instance tree generation for organization or higher throughput submission
Flux Hierarchy
Create trees of Flux instances
🚧 under development and experimental 🚧
This tool generates and orchestrates Flux hierarchies, or trees of Flux instances. Such a setup enables programmatic organization and submission of commands, or high-throughput submission. Use cases we want to address:
- Creation (and organization) of a Flux Hierarchy
- Discovery of an existing Flux Hierarchy (e.g., for MCP)
Usage
Let's first create a hierarchy. The hierarchy runs as a Flux job, so you need to be inside a Flux instance where a handle is discoverable. For example, in the DevContainer:
flux start
Then create a simple, flat hierarchy with all the resources allocated to one broker.
flux-hierarchy start ./examples/hierarchy-one.yaml
You can test throughput (this also starts the hierarchy):
flux-hierarchy throughput ./examples/hierarchy-one.yaml
For either of the above, the hierarchy keeps running afterward, so you need to cancel the job when you are done:
flux cancel $(flux job last)
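If you drive the tool from a script, the same cleanup step can be wrapped in a small helper. This is a minimal sketch, assuming `flux` is on the PATH; `cancel_last_job` and its `run` parameter are hypothetical names used only for illustration and are not part of flux-hierarchy.

```python
import subprocess


def cancel_last_job(run=subprocess.run):
    """Cancel the most recently submitted Flux job.

    Mirrors `flux cancel $(flux job last)`. The `run` argument is an
    assumption of this sketch: it exists only so the function can be
    exercised without a Flux instance.
    """
    # Ask Flux for the ID of the last job that was submitted.
    last = run(
        ["flux", "job", "last"],
        check=True, capture_output=True, text=True,
    )
    jobid = last.stdout.strip()
    if not jobid:
        raise RuntimeError("flux job last returned no job ID")
    # Cancel that job (here, the job that holds the hierarchy).
    run(["flux", "cancel", jobid], check=True)
    return jobid
```

Calling `cancel_last_job()` inside a `try`/`finally` around your own work is one way to make sure the hierarchy job does not outlive the script.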
You can also view the shape of the hierarchy without running anything:
flux-hierarchy view ./examples/hierarchy-one.yaml
$ flux-hierarchy view ./examples/corona/hierarchy-2.yaml
=>
🌿 Leaf Broker Workers...{}
level1 [Nodes: 2]
├── level2 [Nodes: 1, Cores: 48]
└── level2 [Nodes: 1, Cores: 48]
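The shape printed above is an ordinary tree, so it can be reproduced from a nested structure. The following is a minimal sketch of that rendering, not the tool's implementation or its YAML schema; the `Level` class and `render` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Level:
    """One instance in the tree (hypothetical structure for illustration)."""
    name: str
    nodes: int
    cores: Optional[int] = None
    children: List["Level"] = field(default_factory=list)

    def label(self) -> str:
        parts = [f"Nodes: {self.nodes}"]
        if self.cores is not None:
            parts.append(f"Cores: {self.cores}")
        return f"{self.name} [{', '.join(parts)}]"


def render(level, prefix="", is_last=True, is_root=True):
    """Return the tree as a list of lines using box-drawing connectors."""
    if is_root:
        lines = [level.label()]
        child_prefix = ""
    else:
        connector = "└── " if is_last else "├── "
        lines = [prefix + connector + level.label()]
        # Children of a non-last entry keep a vertical bar in their prefix.
        child_prefix = prefix + ("    " if is_last else "│   ")
    for i, child in enumerate(level.children):
        last = i == len(level.children) - 1
        lines.extend(render(child, child_prefix, last, is_root=False))
    return lines


tree = Level("level1", nodes=2, children=[
    Level("level2", nodes=1, cores=48),
    Level("level2", nodes=1, cores=48),
])
print("\n".join(render(tree)))
```

With the two-node example, this prints the same three lines as the `view` output shown above (without the leaf broker header).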
To get higher throughput, we need to avoid ssh connections from the root to the workers. Instead, we launch the multiprocessing bulk runners at the node level, and they connect to the local (local://) sockets on the node rather than going over ssh (ssh://). This is done by adding the --local flag. It seems to make a huge difference!
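To make the two URI schemes concrete, here is a minimal sketch of choosing a local socket over ssh when a broker lives on the current node. It illustrates the idea only and is not how `--local` is implemented; `prefer_local`, the host names, and the socket paths are all hypothetical.

```python
import socket
from urllib.parse import urlparse


def prefer_local(uri, hostname=None):
    """Rewrite an ssh:// broker URI to local:// when it points at this host.

    Compares host names literally, so a short name will not match a fully
    qualified one. That limitation is acceptable for a sketch.
    """
    hostname = (hostname or socket.gethostname()).lower()
    parsed = urlparse(uri)
    if parsed.scheme == "ssh" and parsed.hostname == hostname:
        # Same node: connect straight to the broker's socket path.
        return f"local://{parsed.path}"
    # Different node, or already local: leave the URI untouched.
    return uri


print(prefer_local("ssh://node1/tmp/flux-abc/local-0", hostname="node1"))
print(prefer_local("ssh://node2/tmp/flux-abc/local-0", hostname="node1"))
```

The first call returns a `local://` URI because the broker is on the same node; the second is returned unchanged.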
flux-hierarchy throughput --local --njobs 1000000 ./examples/corona/hierarchy-core.yaml
=> Waiting for 96 leaf brokers...
=> Connected!
Preparing throughput test for command: true
Distributing work to 2 nodes...
Waiting for workers...
flux cancel f4gdJDdyf5
--- Throughput Results ---
number of jobs: 1000000 (on 96 workers)
submit time: 13.347s (74924.4 job/s)
script runtime: 6.685 s
job runtime: 3.706 s
throughput: 269859.1 job/s (script: 149592.4 job/s)
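The rates in the results appear to be the job count divided by the corresponding time. A minimal sketch of that arithmetic follows; the `rate` helper is a hypothetical name, and because the printed times are rounded to milliseconds, the recomputed rates differ slightly from the printed ones.

```python
def rate(njobs, seconds):
    """Jobs per second for a given wall time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return njobs / seconds


njobs = 1_000_000
# Times as printed in the results above (rounded to milliseconds).
timings = {"submit": 13.347, "script": 6.685, "job": 3.706}
for name, seconds in timings.items():
    print(f"{name}: {rate(njobs, seconds):.1f} job/s")
```

Read this way, the headline throughput figure uses the job runtime, and the "script" figure in parentheses uses the script runtime.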
Development
To build and release:
python3 -m build
# or
python3 setup.py sdist bdist_wheel
twine upload dist/flux-hierarchy-<version>*
WIP / TODO / Would be nice
- I can't remember command to get <host>:<rank> mapping (I came up with something)
- Use kvs for uris, saving results, etc. instead of the local dir.
- Have local throughput wait for results rather than rely on filesystem results (use job wait)
- Some means to deploy submit to node as a service on the node (that knows about URIs)
- Save result to kvs or similar (not filesystem)
- Should be able to read in directory of active sockets to generate tree
- Allow different job shapes / specs.
- Expose simulation duration time
- Expose other resource params
License
HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.
See LICENSE, COPYRIGHT, and NOTICE for details.
SPDX-License-Identifier: (MIT)
LLNL-CODE-842614