Instance tree generation for organization or higher throughput submission

Flux Hierarchy

Create trees of Flux instances

🚧 under development and experimental 🚧

This tool generates and orchestrates Flux hierarchies, or trees of Flux instances. Such a setup supports programmatic organization and submission of commands, as well as high-throughput submission. Use cases we want to address:

  • Creation (and organization) of a Flux Hierarchy
  • Discovery of an existing Flux Hierarchy (e.g., for MCP)

Usage

Let's first create a hierarchy, which runs as a Flux job. You need to be inside a Flux instance where a handle is discoverable, e.g., in the DevContainer:

flux start
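As a rough way to check whether a handle is likely to be discoverable, the sketch below looks for the FLUX_URI environment variable, which Flux sets for processes started inside an instance. This is only a heuristic and an assumption about your setup (a system instance may be reachable without it); the helper name flux_uri_from_env is hypothetical and not part of flux-hierarchy.

```python
import os


def flux_uri_from_env(env=None):
    """Return the value of FLUX_URI if it is set and non-empty, else None.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to try out without a running Flux instance.
    """
    env = os.environ if env is None else env
    return env.get("FLUX_URI") or None


if __name__ == "__main__":
    uri = flux_uri_from_env()
    if uri:
        print(f"Looks like a Flux instance: {uri}")
    else:
        print("FLUX_URI is not set; try running `flux start` first.")
```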

Then create a simple, flat hierarchy with all the resources allocated to one broker.

flux-hierarchy start ./examples/hierarchy-one.yaml

You can test throughput (this also starts the hierarchy):

flux-hierarchy throughput ./examples/hierarchy-one.yaml

In either case the hierarchy keeps running until you cancel the job:

flux cancel $(flux job last)

You can also view the shape of the hierarchy without running anything:

flux-hierarchy view ./examples/hierarchy-one.yaml

$ flux-hierarchy view ./examples/corona/hierarchy-2.yaml
=>
🌿 Leaf Broker Workers...{}
level1 [Nodes: 2]
    ├── level2 [Nodes: 1, Cores: 48]
    └── level2 [Nodes: 1, Cores: 48]
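
To illustrate how a tree view like the one above can be produced, here is a minimal sketch that renders a nested dictionary with box-drawing connectors. The dictionary layout (name, resources, children) and the render function are hypothetical and chosen for illustration; they are not the YAML schema or the code that flux-hierarchy uses.

```python
def render(node, prefix="", is_last=True, is_root=True):
    """Return the lines of a box-drawing tree for a nested dict node."""
    label = f"{node['name']} [{node['resources']}]"
    if is_root:
        # The root has no connector; its children are indented four spaces.
        lines = [label]
        child_prefix = "    "
    else:
        connector = "└── " if is_last else "├── "
        lines = [prefix + connector + label]
        # Keep a vertical bar going while later siblings remain.
        child_prefix = prefix + ("    " if is_last else "│   ")
    children = node.get("children", [])
    for i, child in enumerate(children):
        lines.extend(render(child, child_prefix, i == len(children) - 1, False))
    return lines


# Hypothetical in-memory tree mirroring the two-node example above.
tree = {
    "name": "level1",
    "resources": "Nodes: 2",
    "children": [
        {"name": "level2", "resources": "Nodes: 1, Cores: 48"},
        {"name": "level2", "resources": "Nodes: 1, Cores: 48"},
    ],
}

print("\n".join(render(tree)))
```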

To get higher throughput, we need to avoid ssh, and avoid submitting from the root all the way down to the workers. Instead, we launch the multiprocessing bulk runners at the node level, and each runner connects to the local (local://) sockets on its node rather than going through ssh (ssh://). Enable this by adding the --local flag. It seems to make a huge difference!
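
The difference between the two URI forms can be sketched as follows: a remote Flux URI carries a hostname, while the local form keeps only the socket path. This is an illustration of the idea, not how flux-hierarchy implements --local; the function name to_local_uri and the example hostname and path are hypothetical.

```python
from urllib.parse import urlparse


def to_local_uri(uri: str) -> str:
    """Rewrite an ssh:// Flux URI as a local:// URI by dropping the host.

    The result is only meaningful on the node that owns the socket path.
    """
    parsed = urlparse(uri)
    if parsed.scheme == "local":
        return uri  # already local, nothing to do
    if parsed.scheme != "ssh":
        raise ValueError(f"unsupported URI scheme: {parsed.scheme!r}")
    return f"local://{parsed.path}"


# Hypothetical hostname and socket path, for illustration only.
print(to_local_uri("ssh://node01/tmp/flux-abc123/local-0"))
```

With the --local flag, the throughput command looks like this: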

flux-hierarchy throughput --local --njobs 1000000 ./examples/corona/hierarchy-core.yaml
=> Waiting for 96 leaf brokers...
=> Connected!
Preparing throughput test for command: true
Distributing work to 2 nodes...
Waiting for workers...
flux cancel f4gdJDdyf5

--- Throughput Results ---
number of jobs: 1000000 (on 96 workers)
   submit time: 13.347s (74924.4 job/s)
script runtime: 6.685 s
   job runtime: 3.706 s
    throughput: 269859.1 job/s (script: 149592.4 job/s)
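
The rates in the results read as jobs divided by elapsed seconds. Assuming that is how they are computed, the sketch below reproduces them approximately from the displayed values; small differences are expected because the printed times are rounded. The helper jobs_per_second is hypothetical.

```python
def jobs_per_second(njobs: int, seconds: float) -> float:
    """Return the job rate for njobs completed in the given elapsed time."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return njobs / seconds


njobs = 1_000_000
# Elapsed times copied from the results above (already rounded).
timings = [("submit", 13.347), ("script", 6.685), ("job runtime", 3.706)]
for label, seconds in timings:
    print(f"{label:>12}: {jobs_per_second(njobs, seconds):.1f} job/s")
```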

Development

To build and release:

python3 -m build
# or
python3 setup.py sdist bdist_wheel

twine upload dist/flux-hierarchy-<version>*

WIP / TODO / Would be nice

  • I can't remember the command to get the <host>:<rank> mapping (I came up with something in the meantime)
  • Use kvs for uris, saving results, etc. instead of the local dir.
  • Have local throughput wait for results with job wait instead of relying on results written to the filesystem
  • Some means to deploy a submit service on each node (one that knows about the URIs)
  • Save result to kvs or similar (not filesystem)
  • Should be able to read in directory of active sockets to generate tree
  • Allow different job shapes / specs.
  • Expose simulation duration time
  • Expose other resource params

License

HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.

See LICENSE, COPYRIGHT, and NOTICE for details.

SPDX-License-Identifier: (MIT)

LLNL-CODE-842614
