dask_setup
HPC-tuned Dask helpers for NCI Gadi and other PBS/SLURM systems. Wraps dask.distributed.LocalCluster + Client with sensible defaults for single-node jobs, and extends to multi-node PBS/SLURM clusters via dask-jobqueue in v2.0.
Python 3.11+ | pip install dask-setup
Quick Start
from dask_setup import setup_dask_client
# Pick a workload type and go
client, cluster, dask_tmp = setup_dask_client("cpu") # heavy compute
client, cluster, dask_tmp = setup_dask_client("io") # heavy file I/O
client, cluster, dask_tmp = setup_dask_client("mixed") # both
dask_tmp is the path to the spill/temp directory (on $PBS_JOBFS if available). Pass it to Rechunker, Zarr, or anywhere else you want fast local I/O.
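For example, anything temporary can be pointed at dask_tmp so it lands on node-local storage (a minimal sketch; the scratch.zarr name is just an illustration):

import os
import dask.array as da

scratch_store = os.path.join(dask_tmp, "scratch.zarr")  # node-local when $PBS_JOBFS exists
x = da.random.random((4_000, 4_000), chunks=(1_000, 1_000))
da.to_zarr(x, scratch_store, overwrite=True)  # fast local writes, cleaned up with the job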
For more control, pass a DaskSetupConfig object or a named profile:
from dask_setup import setup_dask_client, DaskSetupConfig
config = DaskSetupConfig(
workload_type="cpu",
max_workers=8,
reserve_mem_gb=32.0,
spill_compression="lz4",
)
client, cluster, dask_tmp = setup_dask_client(config=config)
# Or use a built-in profile
client, cluster, dask_tmp = setup_dask_client(profile="climate_analysis")
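The returned client and cluster are the wrapped dask.distributed objects, so you can inspect the topology dask_setup chose and shut everything down explicitly when you are done (a small sketch using standard distributed calls):

# Inspect the chosen topology, then shut down cleanly.
info = client.scheduler_info()
print(len(info["workers"]), "workers")
for w in info["workers"].values():
    print(w["nthreads"], "threads,", round(w["memory_limit"] / 2**30, 1), "GiB limit")

client.close()
cluster.close()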
Multi-node (v2.0):
from dask_setup import setup_dask_client, MultiNodeConfig
client, cluster, shared_tmp = setup_dask_client(
mode="auto", # detects PBS_JOBID / SLURM_JOB_ID, falls back to local
multi_node_config=MultiNodeConfig(
workers_per_node=4,
cores_per_worker=12,
mem_per_worker_gb=32.0,
walltime="04:00:00",
),
)
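The mode="auto" behaviour described in the comment above amounts to checking the scheduler's environment variables, roughly like this (an illustrative sketch, not dask_setup's actual code):

import os

def pick_mode() -> str:
    # PBS_JOBID / SLURM_JOB_ID mean we are inside a batch job; otherwise stay local.
    if "PBS_JOBID" in os.environ:
        return "pbs"
    if "SLURM_JOB_ID" in os.environ:
        return "slurm"
    return "local"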
Workload Types
| Type | Topology | Best for |
|---|---|---|
"cpu" |
Many processes, 1 thread each | NumPy/Numba math, xarray reductions |
"io" |
1 process, 8–16 threads | Opening many NetCDF/Zarr files concurrently |
"mixed" |
Processes with 2 threads each | Pipelines that both read and compute |
"gpu" |
1 process per GPU, up to 8 threads | CuPy/RAPIDS CUDA workloads |
"auto" |
Inferred from dataset | Let dask_setup decide based on your data |
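In plain dask.distributed terms, the cpu/io/mixed rows map roughly onto LocalCluster arguments like these (an illustrative sketch only; dask_setup also sets memory limits and spill directories, and handles the gpu/auto cases itself):

import os
from dask.distributed import LocalCluster

ncores = os.cpu_count() or 1

topologies = {
    "cpu":   dict(n_workers=ncores, threads_per_worker=1),           # many processes, 1 thread each
    "io":    dict(n_workers=1, threads_per_worker=min(16, ncores)),  # 1 process, many threads
    "mixed": dict(n_workers=max(1, ncores // 2), threads_per_worker=2),
}

cluster = LocalCluster(**topologies["cpu"])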
Key Parameters
| Parameter | Default | Description |
|---|---|---|
| workload_type | "io" | Worker topology: "cpu", "io", "mixed", "gpu", "auto" |
| max_workers | all cores | Hard cap on worker count |
| reserve_mem_gb | auto (20% RAM) | Memory held back for OS/cache (GiB) |
| max_mem_gb | total RAM | Upper bound on Dask's total memory use |
| dashboard | True | Start dashboard and print SSH tunnel hint |
| profile | None | Named config profile |
| config | None | Pre-built DaskSetupConfig object (mutually exclusive with profile) |
| mode | "auto" | "local", "pbs", "slurm", or "auto" (v2.0) |
| multi_node_config | None | MultiNodeConfig for PBS/SLURM multi-node jobs (v2.0) |
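The memory parameters fit together simply: whatever is not reserved is split across the workers. A back-of-the-envelope sketch of that arithmetic (the 20% figure is the table's default; dask_setup's exact rounding may differ):

import psutil

total_gb = psutil.virtual_memory().total / 2**30
reserve_gb = 0.2 * total_gb          # default reserve_mem_gb: ~20% of RAM
n_workers = 8                        # e.g. max_workers=8

per_worker_gb = (total_gb - reserve_gb) / n_workers
print(f"~{per_worker_gb:.1f} GiB memory limit per worker")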
Common Patterns
Big xarray reductions (CPU-bound)
client, cluster, dask_tmp = setup_dask_client("cpu", reserve_mem_gb=60)
ds = ds.chunk({"time": 240, "y": 512, "x": 512})
out = ds.mean(("y", "x")).compute()
Opening many NetCDF/Zarr files (I/O-bound)
client, cluster, dask_tmp = setup_dask_client("io", reserve_mem_gb=40)
ds = xr.open_mfdataset(files, engine="netcdf4", chunks={}, parallel=True)
Rechunking with Rechunker (spill stays on fast local storage)
client, cluster, dask_tmp = setup_dask_client("cpu", reserve_mem_gb=60)
plan = rechunker.rechunk(
ds.to_array().data,
target_chunks={"time": 240, "y": 512, "x": 512},
max_mem="6GB",
target_store="out.zarr",
temp_store=f"{dask_tmp}/tmp_rechunk.zarr",
)
plan.execute()
Writing to Zarr in time windows
client, cluster, dask_tmp = setup_dask_client("cpu", max_workers=1, reserve_mem_gb=50)
ds.isel(time=slice(0, 240)).to_zarr("out.zarr", mode="w", consolidated=True)
for start in range(240, ds.sizes["time"], 240):
    stop = min(start + 240, ds.sizes["time"])
    # Append each later window along time; region= writes would require the
    # full-size store to exist already, which the mode="w" call above does not create.
    ds.isel(time=slice(start, stop)).to_zarr("out.zarr", mode="a", append_dim="time")
Built-in Profiles
Use profile= to load a named configuration:
client, cluster, dask_tmp = setup_dask_client(profile="climate_analysis")
| Profile | Workload | Reserve | Notes |
|---|---|---|---|
| climate_analysis | cpu | 60 GB | Heavy compute with large arrays |
| zarr_io_heavy | io | 40 GB | Many Zarr files |
| development | mixed | 8 GB | Local testing, 2 workers max |
| production | mixed | 80 GB | Adaptive, dashboard off |
| interactive | mixed | 20 GB | Jupyter notebooks, 4 workers |
See Configuration for how to create and save your own profiles.
Dashboard
With dashboard=True (default), the cluster prints an SSH tunnel command:
Dask dashboard: http://127.0.0.1:<PORT>/status
Tunnel from your laptop:
ssh -N -L 8787:<COMPUTE_HOST>:<PORT> gadi.nci.org.au
Then open: http://localhost:8787
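The same URL is also available programmatically from the client, which is handy for writing into a batch job's log (a small sketch; client here is the standard dask.distributed.Client returned above):

from dask_setup import setup_dask_client

client, cluster, dask_tmp = setup_dask_client("cpu")
print("Dask dashboard:", client.dashboard_link)  # e.g. http://127.0.0.1:<PORT>/status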
CLI
# Profile management
dask-setup list # list all profiles
dask-setup show climate_analysis # show profile details
dask-setup create my_profile --from-profile zarr_io_heavy
dask-setup validate my_profile
dask-setup export climate_analysis -o profile.yaml
dask-setup import https://example.com/team.yaml
dask-setup delete my_profile
# Synthetic benchmark
dask-setup benchmark --profile development --size small --operation mean
# Generate a PBS/SLURM job script (v2.0)
dask-setup submit my_analysis.py --scheduler pbs \
--workers-per-node 4 --cores-per-worker 12 --mem-per-worker 32 \
--walltime 04:00:00 --queue normal --project ab01
Documentation
Full documentation lives in the GitHub wiki:
| Page | What's covered |
|---|---|
| Configuration | DaskSetupConfig, profiles, site-wide profiles, profile inheritance, CLI, JSON Schema |
| Multi-Node | MultiNodeConfig, PBS/SLURM cluster setup, GPU topology, shared temp dirs, dask-setup submit |
| IO-Optimization | recommend_chunks, recommend_io_chunks, Zarr v3, Kerchunk, Parquet/Arrow, storage-aware chunking |
| Benchmarking | benchmark_config, scaling_analysis, chunk_impact, dask-setup benchmark |
| Internals | Resource detection, topology decisions, temp/spill routing, module layout |
| Troubleshooting | Common errors, OOM, multi-node issues, migration guide |
Installation
pip install dask-setup
For multi-node PBS/SLURM support:
pip install dask-setup dask-jobqueue
For GPU workloads (CuPy auto-detection):
pip install dask-setup cupy-cuda12x # match your CUDA version
Contributing
Bug reports, feature requests, and pull requests are welcome. Please include tests and, for performance changes, benchmarks.
License
Apache-2.0 — see LICENSE for details.