
Process mapping for Flux jobs


fluxbind

Intelligent detection and mapping of processors for HPC


Run

Use fluxbind to run a job bound to specific cores. For Flux, this means we require an exclusive allocation and then customize the binding on each node exactly as we want it. We do this via a shape file.

Basic Examples

# Start with a first match policy
flux start --config ./examples/config/match-first.toml

# 1. Bind each task to a unique physical core, starting from core:0 (common case)
fluxbind run --shape ./examples/shape/1node/shape_packed_cores.yaml sleep 1
# Rank 0: Binds to core:0 (cpuset 0x3).
# Rank 1: Binds to core:1 (cpuset 0xc). etc

# 2. Packed PUs (hyperthreading): bind each task to a unique logical CPU (hyper-thread).
fluxbind run --shape ./examples/shape/1node/hyper_threading.yaml sleep 1

# 3. An unbound rank: tests "unbound" by leaving rank 0 unbound and packing all other ranks onto cores, shifted by one.
fluxbind run -N1 -n 3 --shape ./examples/shape/1node/unbound_rank.yaml sleep 1

# 4. L2 cache affinity. Give each task its own dedicated L2 cache to maximize cache performance.
# On mymachine, each core has its own private L2 cache.
# Therefore, binding one task per L2 cache is equivalent to binding one task per core.
fluxbind run -N1 -n 8 --shape ./examples/shape/1node/cache_affinity.yaml sleep 1
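The cpuset masks quoted in example 1 (0x3 for core:0, 0xc for core:1) can be reproduced with a little bit arithmetic. The following is a minimal sketch, not fluxbind's implementation: it assumes each core owns two PUs that are numbered consecutively (core i holds PUs 2i and 2i+1), which is consistent with the two masks shown above but may not match the PU numbering on every machine. The function names are hypothetical.

```python
def core_cpuset(core: int, pus_per_core: int = 2) -> int:
    """Bitmask covering every PU of `core`.

    Assumption: PUs are numbered consecutively within a core,
    so core i owns PUs i*pus_per_core .. i*pus_per_core + pus_per_core - 1.
    """
    one_core_mask = (1 << pus_per_core) - 1      # 0b11 when there are 2 PUs per core
    return one_core_mask << (core * pus_per_core)


def pu_cpuset(pu: int) -> int:
    """Bitmask covering a single PU (one hardware thread)."""
    return 1 << pu


if __name__ == "__main__":
    # Packed cores: rank r binds to core r.
    for rank in range(4):
        print(f"rank {rank}: core:{rank} -> cpuset {hex(core_cpuset(rank))}")
    # Packed PUs: rank r binds to PU r.
    for rank in range(4):
        print(f"rank {rank}: pu:{rank} -> cpuset {hex(pu_cpuset(rank))}")
```

Under this assumption, a packed-cores mask always has two adjacent bits set, while a packed-PUs mask has exactly one bit set.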

Kripke Examples

As we prepare to test with apps, here are some tests I'm thinking of doing.

# baseline - pack each MPI rank onto its own dedicated physical core (8.693519e-09)
fluxbind run -N 1 -n 8 --shape ./examples/shape/kripke/baseline-shapefile.yaml kripke --procs 2,2,2 --zones 16,16,16 --niter 500

# spread cores (memory bandwidth): if Kripke is limited by memory bandwidth, placing ranks on every other core reduces contention for the shared L3 cache.
# If Kripke is memory bound, this layout might be faster than packed even with half the cores. If it is compute bound, it will be worse (1.341355e-08)
fluxbind run -N 1 -n 4 --shape ./examples/shape/kripke/memory-spread-cores-shapefile.yaml kripke --procs 2,2,1 --zones 16,16,16 --niter 500

# packed PUs (each of the 8 cores has 2 PUs == 16): tests whether Kripke can benefit from SMT (simultaneous multithreading).
# Problem: we can't override Flux and ask for 16 tasks.
# Maybe better for compute-heavy workloads?
fluxbind run -N 1 -n 16 --shape ./examples/shape/kripke/packed-pus-shapefile.yaml kripke --procs 2,4,2 --zones 16,16,16 --niter 500

# hybrid model: launch just two MPI ranks and give each one a whole L3 cache domain to work with (1.966967e-08)
fluxbind run -N 1 -n 2 --env OMP_NUM_THREADS=4 --env OMP_PLACES=cores --shape ./examples/shape/kripke/hybrid-l3-shapefile.yaml kripke --zones 16,16,16 --niter 500 --procs 2,1,1 --layout GZD
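In each Kripke command above, the product of the three `--procs` values equals the task count passed to `-n`. The sketch below checks that relationship for the four runs. The run table is copied from the commands above; the helper name `decomposition_matches` and the thread count of 1 for runs that do not set `OMP_NUM_THREADS` are assumptions made for illustration.

```python
from math import prod

# (label, -n task count, --procs values, OpenMP threads per rank)
# Threads per rank is taken from OMP_NUM_THREADS where the command sets it,
# and assumed to be 1 otherwise.
RUNS = [
    ("baseline",     8,  (2, 2, 2), 1),
    ("spread cores", 4,  (2, 2, 1), 1),
    ("packed PUs",   16, (2, 4, 2), 1),
    ("hybrid L3",    2,  (2, 1, 1), 4),
]


def decomposition_matches(tasks: int, procs: tuple) -> bool:
    """True when the --procs grid accounts for exactly `tasks` ranks."""
    return prod(procs) == tasks


if __name__ == "__main__":
    for label, tasks, procs, threads in RUNS:
        status = "ok" if decomposition_matches(tasks, procs) else "MISMATCH"
        print(f"{label}: -n {tasks}, procs {procs}, "
              f"{tasks * threads} threads in total, {status}")
```

For the hybrid run, 2 ranks with 4 threads each give 8 threads in total, one per physical core on the 8-core machine described above.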

Predict

Use fluxbind to predict binding based on a job shape. This is prediction only: no application is executed. Here are some examples.

# Predict binding on this machine for 8 cores
fluxbind predict core:0-7

# Predict binding on corona (based on xml) for 2 NUMA nodes
fluxbind predict --xml ./examples/topology/corona.xml numa:0,1 x core:0-2
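The location strings above, such as `core:0-7` and `numa:0,1`, pair an object type with a list of indices. The sketch below shows one way such a string could be expanded into explicit indices. It is not fluxbind's parser: the function name `expand_location` is hypothetical, the accepted syntax (comma-separated indices and inclusive `a-b` ranges) is an assumption inferred from the two examples, and the `x` operator that combines two locations is not handled.

```python
def expand_location(spec: str):
    """Split 'type:indices' into (type, [index, ...]).

    Assumed syntax: indices are comma-separated, and 'a-b' is an
    inclusive range. Anything else raises ValueError.
    """
    obj_type, sep, index_list = spec.partition(":")
    if not sep or not obj_type or not index_list:
        raise ValueError(f"expected 'type:indices', got {spec!r}")

    indices = []
    for part in index_list.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            indices.extend(range(int(low), int(high) + 1))
        else:
            indices.append(int(part))
    return obj_type, indices


if __name__ == "__main__":
    for spec in ("core:0-7", "numa:0,1", "core:0-2"):
        print(spec, "->", expand_location(spec))
```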

License

DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.

See LICENSE, COPYRIGHT, and NOTICE for details.

SPDX-License-Identifier: (MIT)

LLNL-CODE- 842614
