Dask array with query optimization
dask-array
Motivation
Dask Array is powerful, but requires expertise to drive effectively.
The dask-array project reimplements dask.array, but represents your
calculation at a level where it can be optimized intelligently before it is executed.
This allows the project to reorder and replace calculations so that they produce
the same result by a more efficient path.
Installation
pip install dask-array
Usage
This project looks and feels like Dask Array:
import dask_array as da
x = da.ones((1000, 1000), chunks=(100, 100))
y = x + x.T
result = y[:100, :100]
But when you compute, your calculation is rewritten to be more efficient. You can see this by looking at the query structure of the underlying array.
>>> result.pprint()
Operation          Shape         Bytes    Chunks
Getitem            (100, 100)    78 kiB   100×100
└ Add              (1000, 1000)  7.6 MiB  100×100
  ├ Ones           (1000, 1000)  7.6 MiB  100×100
  └ Transpose      (1000, 1000)  7.6 MiB  100×100
    └ Ones         (1000, 1000)  7.6 MiB  100×100
This calculation starts from a large array expression, then takes a small slice. It is more efficient to push that slice into the expression before building the task graph.
The optimize function rewrites things automatically.
>>> result.optimize().pprint()
Operation          Shape       Bytes   Chunks
FusedBlockwise     (100, 100)  78 kiB  100×100
└ Ones             (100, 100)  78 kiB  100×100
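To make the idea of pushing a slice into an expression concrete, here is a minimal sketch of such a rewrite on a toy expression tree. It is not dask-array's implementation: the node classes and the `shape`, `push_slices`, and `cells` helpers are hypothetical names invented for illustration, slices are assumed to be in-bounds `(start, stop)` pairs, and the sketch does not fuse operations the way the `FusedBlockwise` node above suggests the real optimizer does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ones:
    shape: tuple  # (rows, cols)


@dataclass(frozen=True)
class Transpose:
    child: object


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Getitem:
    child: object
    rows: tuple  # (start, stop), assumed to lie within the child's bounds
    cols: tuple  # (start, stop), assumed to lie within the child's bounds


def shape(expr):
    """Shape of the array an expression would produce."""
    if isinstance(expr, Ones):
        return expr.shape
    if isinstance(expr, Transpose):
        r, c = shape(expr.child)
        return (c, r)
    if isinstance(expr, Add):
        return shape(expr.left)
    if isinstance(expr, Getitem):
        return (expr.rows[1] - expr.rows[0], expr.cols[1] - expr.cols[0])
    raise TypeError(f"unknown node: {expr!r}")


def push_slices(expr):
    """Rewrite the tree so slices move below Add and Transpose nodes."""
    if isinstance(expr, Getitem):
        child, rows, cols = expr.child, expr.rows, expr.cols
        if isinstance(child, Add):
            # Elementwise ops commute with slicing: slice both inputs instead.
            return Add(push_slices(Getitem(child.left, rows, cols)),
                       push_slices(Getitem(child.right, rows, cols)))
        if isinstance(child, Transpose):
            # Slicing a transpose equals transposing a slice with axes swapped.
            return Transpose(push_slices(Getitem(child.child, cols, rows)))
        if isinstance(child, Ones):
            # A slice of a constant array is a smaller constant array.
            return Ones(shape(expr))
        return Getitem(push_slices(child), rows, cols)
    if isinstance(expr, Add):
        return Add(push_slices(expr.left), push_slices(expr.right))
    if isinstance(expr, Transpose):
        return Transpose(push_slices(expr.child))
    return expr


def cells(expr):
    """Total elements produced across all nodes, as a rough cost measure."""
    r, c = shape(expr)
    own = r * c
    if isinstance(expr, Ones):
        return own
    if isinstance(expr, (Transpose, Getitem)):
        return own + cells(expr.child)
    return own + cells(expr.left) + cells(expr.right)


# Mirror the example above: (x + x.T)[:100, :100]
x = Ones((1000, 1000))
result = Getitem(Add(x, Transpose(x)), (0, 100), (0, 100))
optimized = push_slices(result)

print(cells(result), "->", cells(optimized))
```

In this toy model every node of the rewritten tree has shape (100, 100), so no intermediate of size 1000×1000 is ever built.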
You don't need to call optimize yourself, though. Dask's compute/persist
machinery will do this for you. You only need to change your import.
# import dask.array as da
import dask_array as da
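If the same code has to run in environments where dask-array may not be installed, one option is to pick the module at import time. The sketch below is only an illustration: `find_array_backend` is a hypothetical helper, not part of either package, and it assumes your code uses only the API the two modules have in common.

```python
import importlib
import importlib.util


def find_array_backend(candidates=("dask_array", "dask.array")):
    """Return the first importable module name from candidates, or None."""
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except ModuleNotFoundError:
            # Raised when a parent package (for example `dask`) is missing.
            continue
    return None


backend = find_array_backend()
if backend is not None:
    # Prefer dask_array when available, otherwise fall back to dask.array.
    da = importlib.import_module(backend)
```

Code written against `da` then uses the optimizing implementation when it is available and plain dask.array otherwise.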
Native Frisky acceleration
The normal Dask scheduler path is pure Python and works without a Rust
toolchain. On platforms with a native wheel, dask-array also includes a Rust
accelerator that Frisky can use to submit compact task records instead of
materializing large Python task graphs. This is automatic when computing with a
Frisky scheduler; otherwise computations stay on the standard Dask path.
Xarray
Dask-array can replace the default dask.array by registering itself as an Xarray chunk manager.
from dask_array.xarray import register
register()
After this, all Xarray calculations will benefit from query optimization.
File details
Details for the file dask_array-0.2.0.tar.gz.
File metadata
- Download URL: dask_array-0.2.0.tar.gz
- Upload date:
- Size: 486.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 767109063cc51b2a9d7c3690fcec27066b397019bccd99439271ca9484a46607 |
| MD5 | 5b9a2a14de201adbfa9dc48fd37cbd8d |
| BLAKE2b-256 | e75b861ff3b5d475bf406f82ed6f7fa86ca54f24084124e12a5621a5a1efc1bd |
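A published digest like the SHA256 above can be checked against a downloaded file with the standard library. This is a minimal sketch: the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the local file path in the final comment is an assumption about where the file was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a local file against a published SHA256 digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 listed above for dask_array-0.2.0.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "767109063cc51b2a9d7c3690fcec27066b397019bccd99439271ca9484a46607"
)

# Example (assumes the sdist was saved to the current directory):
# matches_published_digest("dask_array-0.2.0.tar.gz", EXPECTED_SDIST_SHA256)
```

Reading in chunks keeps memory use flat regardless of file size; the same pattern applies to the wheel files listed further down, using their own digests.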
Provenance
The following attestation bundles were made for dask_array-0.2.0.tar.gz:
Publisher: publish.yml on mrocklin/dask-array
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dask_array-0.2.0.tar.gz
- Subject digest: 767109063cc51b2a9d7c3690fcec27066b397019bccd99439271ca9484a46607
- Sigstore transparency entry: 1932320915
- Sigstore integration time:
- Permalink: mrocklin/dask-array@301be20bd2d6ffdcdaeaf54ae9d91f94ef465bee
- Branch / Tag: refs/tags/0.2.0
- Owner: https://github.com/mrocklin
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@301be20bd2d6ffdcdaeaf54ae9d91f94ef465bee
- Trigger Event: push
File details
Details for the file dask_array-0.2.0-py3-none-any.whl.
File metadata
- Download URL: dask_array-0.2.0-py3-none-any.whl
- Upload date:
- Size: 356.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 55922fc1c620744f1bd42b875c1e8af12dbace948b4eac3c89510e453f4492fc |
| MD5 | 71d3a8f2113e16e1bb8b3b75b18531d5 |
| BLAKE2b-256 | d36b292376d5fbc125516eba4c46f6c1921d8d26210e5dbfca69aaaf5b18d7f5 |
Provenance
The following attestation bundles were made for dask_array-0.2.0-py3-none-any.whl:
Publisher: publish.yml on mrocklin/dask-array
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dask_array-0.2.0-py3-none-any.whl
- Subject digest: 55922fc1c620744f1bd42b875c1e8af12dbace948b4eac3c89510e453f4492fc
- Sigstore transparency entry: 1932321318
- Sigstore integration time:
- Permalink: mrocklin/dask-array@301be20bd2d6ffdcdaeaf54ae9d91f94ef465bee
- Branch / Tag: refs/tags/0.2.0
- Owner: https://github.com/mrocklin
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@301be20bd2d6ffdcdaeaf54ae9d91f94ef465bee
- Trigger Event: push
File details
Details for the file dask_array-0.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: dask_array-0.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 891.5 kB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 77a9c042cdb0538a7372cfc2b992d9e4313347e0a5732e75ada579747583730e |
| MD5 | f15e2bce29ffa97144780de75a38f61d |
| BLAKE2b-256 | 4f4567f3ab79f93cab9ffe60130deb7b7f2be0aa305b5db228c85ad3924f3cbf |
Provenance
The following attestation bundles were made for dask_array-0.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
Publisher: publish.yml on mrocklin/dask-array
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dask_array-0.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: 77a9c042cdb0538a7372cfc2b992d9e4313347e0a5732e75ada579747583730e
- Sigstore transparency entry: 1932321056
- Sigstore integration time:
- Permalink: mrocklin/dask-array@301be20bd2d6ffdcdaeaf54ae9d91f94ef465bee
- Branch / Tag: refs/tags/0.2.0
- Owner: https://github.com/mrocklin
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@301be20bd2d6ffdcdaeaf54ae9d91f94ef465bee
- Trigger Event: push
File details
Details for the file dask_array-0.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: dask_array-0.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 885.9 kB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 320d7c15a6157bc616b94f8d9553674f4842256ee2289fdfca836f6d263fc1d2 |
| MD5 | 4f46856516d0ae6e3489639b16c461f6 |
| BLAKE2b-256 | a867dfdeeb285b45b0e6b35247df14f634f4cd5aacd8c06deaa3387b47c38157 |
Provenance
The following attestation bundles were made for dask_array-0.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:
Publisher: publish.yml on mrocklin/dask-array
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dask_array-0.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Subject digest: 320d7c15a6157bc616b94f8d9553674f4842256ee2289fdfca836f6d263fc1d2
- Sigstore transparency entry: 1932321176
- Sigstore integration time:
- Permalink: mrocklin/dask-array@301be20bd2d6ffdcdaeaf54ae9d91f94ef465bee
- Branch / Tag: refs/tags/0.2.0
- Owner: https://github.com/mrocklin
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@301be20bd2d6ffdcdaeaf54ae9d91f94ef465bee
- Trigger Event: push
File details
Details for the file dask_array-0.2.0-cp310-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: dask_array-0.2.0-cp310-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 847.3 kB
- Tags: CPython 3.10+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 034d9adb685becaa5403c2b25b4e7747b16738e3b8d6bb2a2eb9b83010d3726d |
| MD5 | 7d86d89a75f3a65f459f693a453a2869 |
| BLAKE2b-256 | e72ce385000ad89e9cf86d7374054334f9d0c8da794597d8c06a537dc750018f |
Provenance
The following attestation bundles were made for dask_array-0.2.0-cp310-abi3-macosx_11_0_arm64.whl:
Publisher: publish.yml on mrocklin/dask-array
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dask_array-0.2.0-cp310-abi3-macosx_11_0_arm64.whl
- Subject digest: 034d9adb685becaa5403c2b25b4e7747b16738e3b8d6bb2a2eb9b83010d3726d
- Sigstore transparency entry: 1932321247
- Sigstore integration time:
- Permalink: mrocklin/dask-array@301be20bd2d6ffdcdaeaf54ae9d91f94ef465bee
- Branch / Tag: refs/tags/0.2.0
- Owner: https://github.com/mrocklin
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@301be20bd2d6ffdcdaeaf54ae9d91f94ef465bee
- Trigger Event: push