Dask array with query optimization

dask-array

Motivation

Dask Array is powerful, but requires expertise to drive effectively.

The dask-array project reimplements dask.array, but represents your calculation at a level where it can be optimized intelligently before it is executed. This lets the project reorder and replace calculations so that they produce the same result along a more efficient path.

Installation

pip install dask-array

Usage

This project looks and feels like Dask Array:

import dask_array as da

x = da.ones((1000, 1000), chunks=(100, 100))

y = x + x.T
result = y[:100, :100]

When you compute, however, your calculation is rewritten to be more efficient. You can see this by looking at the query structure of the underlying array.

>>> result.pprint()
Operation             Shape    Bytes   Chunks
  Getitem          (100, 100)   78 kiB  100×100
   Add          (1000, 1000)  7.6 MiB  100×100
     Ones       (1000, 1000)  7.6 MiB  100×100
     Transpose  (1000, 1000)  7.6 MiB  100×100
       Ones     (1000, 1000)  7.6 MiB  100×100

This calculation starts from a large array expression, then takes a small slice. It is more efficient to push that slice into the expression before building the task graph.

The optimize method performs this rewrite automatically.

>>> result.optimize().pprint()
Operation            Shape   Bytes   Chunks
  FusedBlockwise  (100, 100)  78 kiB  100×100
   Ones          (100, 100)  78 kiB  100×100
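The reason a slice can be pushed into the expression can be checked with plain NumPy. The sketch below does not use dask-array and does not show how its optimizer is implemented; it only illustrates the identity that makes such a rewrite valid: slicing commutes with elementwise addition, and for a transpose the row and column slices swap. The array contents and the second, non-square slice are made up for illustration.

```python
import numpy as np

# Illustrative input; any 2-D square array works.
rng = np.random.default_rng(0)
x = rng.random((1000, 1000))

# Naive path: build the full 1000x1000 result, then slice it.
full_then_slice = (x + x.T)[:100, :100]

# Pushed-down path: slice each input first, then combine the small pieces.
# For the transposed operand the slices swap: x.T[r, c] == x[c, r].T
rows, cols = slice(0, 100), slice(0, 100)
slice_then_add = x[rows, cols] + x[cols, rows].T

assert np.array_equal(full_then_slice, slice_then_add)

# The same identity with a non-square slice, where the swap is visible.
rows2, cols2 = slice(0, 100), slice(200, 350)
full2 = (x + x.T)[rows2, cols2]
pushed2 = x[rows2, cols2] + x[cols2, rows2].T
assert np.array_equal(full2, pushed2)
```

The pushed-down path only ever touches the slices it needs, which is why the optimized expression above works on (100, 100) arrays instead of (1000, 1000) ones.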

You don't need to call optimize yourself, though. Dask's compute/persist machinery does this for you. You only need to change your import.

# import dask.array as da
import dask_array as da
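If code has to run in environments where dask-array may not be installed, the import swap can be made conditional. This is a minimal sketch, not something the project prescribes: load_array_backend is a hypothetical helper, and it assumes that falling back to dask.array is acceptable for your workload.

```python
import importlib


def load_array_backend(candidates=("dask_array", "dask.array")):
    """Return the first importable module from candidates, or None.

    Hypothetical helper: prefers dask_array and falls back to dask.array.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError as well; try the next candidate.
            continue
    return None


da = load_array_backend()
if da is None:
    print("neither dask_array nor dask.array is installed")
else:
    print("using array backend:", da.__name__)
```

Code written against the returned da module then stays the same whichever backend was found.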

Native Frisky acceleration

The normal Dask scheduler path is pure Python and works without a Rust toolchain. On platforms with a native wheel, dask-array also includes a Rust accelerator that Frisky can use to submit compact task records instead of materializing large Python task graphs. This is automatic when computing with a Frisky scheduler; otherwise computations stay on the standard Dask path.

Xarray

Dask-array can replace the default dask.array by registering itself as an Xarray chunk manager.

from dask_array.xarray import register
register()

After this, all Xarray calculations will benefit from query optimization.

Download files

Source Distribution

dask_array-0.2.0.tar.gz (486.5 kB)

Uploaded: Source

Built Distributions

dask_array-0.2.0-py3-none-any.whl (356.6 kB)

Uploaded: Python 3

dask_array-0.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (891.5 kB)

Uploaded: CPython 3.10+, manylinux: glibc 2.17+ x86-64

dask_array-0.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (885.9 kB)

Uploaded: CPython 3.10+, manylinux: glibc 2.17+ ARM64

dask_array-0.2.0-cp310-abi3-macosx_11_0_arm64.whl (847.3 kB)

Uploaded: CPython 3.10+, macOS 11.0+ ARM64

File details

Details for the file dask_array-0.2.0.tar.gz.

File metadata

  • Download URL: dask_array-0.2.0.tar.gz
  • Size: 486.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dask_array-0.2.0.tar.gz
Algorithm Hash digest
SHA256 767109063cc51b2a9d7c3690fcec27066b397019bccd99439271ca9484a46607
MD5 5b9a2a14de201adbfa9dc48fd37cbd8d
BLAKE2b-256 e75b861ff3b5d475bf406f82ed6f7fa86ca54f24084124e12a5621a5a1efc1bd
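A downloaded file can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch: sha256_of and matches_sha256 are hypothetical helper names, and the file path assumes the source distribution was saved to the current directory under its published name.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for dask_array-0.2.0.tar.gz.
EXPECTED_SHA256 = "767109063cc51b2a9d7c3690fcec27066b397019bccd99439271ca9484a46607"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


sdist = Path("dask_array-0.2.0.tar.gz")  # assumed download location
if sdist.exists():
    print("SHA256 OK" if matches_sha256(sdist, EXPECTED_SHA256) else "SHA256 MISMATCH")
else:
    print(f"{sdist} not found; download it first")
```

pip can perform the same check at install time through its hash-checking mode, where a requirements file pins each package with a --hash=sha256:... option.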

Provenance

The following attestation bundles were made for dask_array-0.2.0.tar.gz:

Publisher: publish.yml on mrocklin/dask-array

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dask_array-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: dask_array-0.2.0-py3-none-any.whl
  • Size: 356.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dask_array-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 55922fc1c620744f1bd42b875c1e8af12dbace948b4eac3c89510e453f4492fc
MD5 71d3a8f2113e16e1bb8b3b75b18531d5
BLAKE2b-256 d36b292376d5fbc125516eba4c46f6c1921d8d26210e5dbfca69aaaf5b18d7f5

Provenance

The following attestation bundles were made for dask_array-0.2.0-py3-none-any.whl:

Publisher: publish.yml on mrocklin/dask-array

File details

Details for the file dask_array-0.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for dask_array-0.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 77a9c042cdb0538a7372cfc2b992d9e4313347e0a5732e75ada579747583730e
MD5 f15e2bce29ffa97144780de75a38f61d
BLAKE2b-256 4f4567f3ab79f93cab9ffe60130deb7b7f2be0aa305b5db228c85ad3924f3cbf

Provenance

The following attestation bundles were made for dask_array-0.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on mrocklin/dask-array

File details

Details for the file dask_array-0.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for dask_array-0.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 320d7c15a6157bc616b94f8d9553674f4842256ee2289fdfca836f6d263fc1d2
MD5 4f46856516d0ae6e3489639b16c461f6
BLAKE2b-256 a867dfdeeb285b45b0e6b35247df14f634f4cd5aacd8c06deaa3387b47c38157

Provenance

The following attestation bundles were made for dask_array-0.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: publish.yml on mrocklin/dask-array

File details

Details for the file dask_array-0.2.0-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for dask_array-0.2.0-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 034d9adb685becaa5403c2b25b4e7747b16738e3b8d6bb2a2eb9b83010d3726d
MD5 7d86d89a75f3a65f459f693a453a2869
BLAKE2b-256 e72ce385000ad89e9cf86d7374054334f9d0c8da794597d8c06a537dc750018f

Provenance

The following attestation bundles were made for dask_array-0.2.0-cp310-abi3-macosx_11_0_arm64.whl:

Publisher: publish.yml on mrocklin/dask-array
