dask-databricks

Cluster tools for running Dask on Databricks multi-node clusters.

Quickstart

To launch a Dask cluster on Databricks, create an init script with the following contents and configure your multi-node cluster to use it.

#!/bin/bash

# Install Dask + Dask Databricks
/databricks/python/bin/pip install --upgrade "dask[complete]" dask-databricks

# Start Dask cluster components
dask databricks run

Then, from your Databricks notebook, you can connect a Dask Client to the scheduler running on the Spark driver node.

import dask_databricks

client = dask_databricks.get_client()

Now you can submit work from your notebook to the multi-node Dask cluster.

def inc(x):
    return x + 1

x = client.submit(inc, 10)
x.result()
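
To fan out more than one call, the same submit/result pattern can be wrapped in a small helper. The sketch below is an illustration only: submit_all is a hypothetical helper name, and a standard-library ThreadPoolExecutor is used as a local stand-in so the snippet runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def inc(x):
    return x + 1


def submit_all(executor, values):
    """Submit inc once per value and collect the results in input order.

    Hypothetical helper: it only relies on executor.submit(...) returning
    a future that has a .result() method.
    """
    futures = [executor.submit(inc, v) for v in values]
    return [f.result() for f in futures]


# Local stand-in (assumption): a thread pool instead of a Dask client.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = submit_all(pool, [1, 2, 3])

print(results)  # [2, 3, 4]
```

On Databricks you would presumably pass the client returned by dask_databricks.get_client() in place of pool, since client.submit also returns futures with a result() method.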

Dashboard

You can access the Dask dashboard via the Databricks driver-node proxy. The link is shown in the repr of a Client or DatabricksCluster object and is also available via client.dashboard_link.

>>> print(client.dashboard_link)
https://dbc-dp-xxxx.cloud.databricks.com/driver-proxy/o/xxxx/xx-xxx-xxxx/8087/status
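
If you need the proxied port on its own, for example to check which port the dashboard is served on, it can be read out of the link. This is a minimal sketch: proxied_port is a hypothetical helper, and it assumes the link always ends in ".../<port>/status" as in the example above.

```python
from urllib.parse import urlsplit


def proxied_port(dashboard_link: str) -> int:
    """Return the port embedded in a driver-proxy dashboard link.

    Assumption: the path ends in ".../<port>/status", as in the example
    link shown above. Other layouts are not handled.
    """
    segments = [s for s in urlsplit(dashboard_link).path.split("/") if s]
    return int(segments[-2])


# Placeholder link copied from the example output above.
example = (
    "https://dbc-dp-xxxx.cloud.databricks.com"
    "/driver-proxy/o/xxxx/xx-xxx-xxxx/8087/status"
)
port = proxied_port(example)
print(port)  # 8087
```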

Releasing

Releases of this project are automated using GitHub Actions and the pypa/gh-action-pypi-publish action.

To create a new release, push a tag in the format x.x.x to the upstream repo. The package is then built and pushed to PyPI automatically, and is later picked up by conda-forge.

# Make sure you have an upstream remote
git remote add upstream git@github.com:dask-contrib/dask-databricks.git

# Create a tag and push it upstream
git tag x.x.x && git push upstream main --tags
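
Before tagging, it can help to check that the tag string matches the expected format. The sketch below is not part of the project's tooling: is_release_tag is a hypothetical helper, and it assumes "x.x.x" means three dot-separated non-negative integers with no prefix such as "v".

```python
import re

# Assumption: "x.x.x" means three dot-separated non-negative integers.
TAG_PATTERN = re.compile(r"\d+\.\d+\.\d+")


def is_release_tag(tag: str) -> bool:
    """Return True if the whole tag string matches the x.x.x format."""
    return TAG_PATTERN.fullmatch(tag) is not None


print(is_release_tag("0.3.2"))   # True
print(is_release_tag("v0.3.2"))  # False
print(is_release_tag("0.3"))     # False
```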

