Databricks Connect Client

Project description

Databricks Connect

Databricks Connect is a Python library for running PySpark DataFrame queries on a remote Spark cluster. It is built on Spark Connect. An application using Databricks Connect runs locally; when the results of a DataFrame query need to be evaluated, the query runs on the configured Databricks cluster.

The following Python snippet uses Databricks Connect to print a number range. The range query is executed on the Databricks cluster.

from databricks.connect import DatabricksSession

session = DatabricksSession.builder.getOrCreate()

df = session.range(1, 10)
df.show()

Specifying Connection Parameters

DatabricksSession offers several ways to specify the Databricks workspace, cluster, and user credentials, collectively referred to in the rest of this document as connection parameters. The specified credentials are used to execute DataFrame queries on the cluster, so the user they belong to must have cluster access permissions and appropriate data access permissions.

NOTE: Currently, Databricks Connect only supports credentials based on Personal Access Token. Other authentication mechanisms are coming soon.

When DatabricksSession is initialized with no additional parameters as below, connection parameters are picked up from the environment.

session = DatabricksSession.builder.getOrCreate()

First, the SPARK_REMOTE environment variable is used if it's configured.

If configured, SPARK_REMOTE must contain a Spark Connect connection string. See the Spark Connect documentation for details on the connection string format.

SPARK_REMOTE="sc://<databricks workspace url>:443/;token=<bearer token>;x-databricks-cluster-id=<cluster id>"
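The connection string can also be assembled programmatically. The following is a minimal sketch that assumes only the format shown above; `build_spark_remote` is a hypothetical helper for illustration and is not part of Databricks Connect.

```python
def build_spark_remote(workspace_host: str, token: str, cluster_id: str) -> str:
    """Assemble a connection string in the format shown above.

    Hypothetical helper for illustration; not part of Databricks Connect.
    """
    host = workspace_host
    # Accept a host given with or without an https:// scheme or trailing slash.
    if host.startswith("https://"):
        host = host[len("https://"):]
    host = host.rstrip("/")
    return f"sc://{host}:443/;token={token};x-databricks-cluster-id={cluster_id}"


if __name__ == "__main__":
    # Placeholder values; substitute your own workspace, token, and cluster.
    print(build_spark_remote("<databricks workspace url>", "<bearer token>", "<cluster id>"))
```

The resulting value can be exported as SPARK_REMOTE in the shell before the program starts.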

If this environment variable is not configured, Databricks Connect then looks for connection parameters using the Databricks SDK.

The Databricks Python SDK reads these values from two locations. It first checks environment variables. For any parameter not set there, it falls back to the 'DEFAULT' profile in the .databrickscfg configuration file, if that profile is set up. The SDK also handles OAuth token refresh and supports Service Principal client credentials on AWS and Azure. Details on the authentication process, environment variables, and other configuration options can be found in the Databricks SDK documentation.

Similar to the authentication environment variables, the Databricks SDK reads the cluster identifier from the environment variable DATABRICKS_CLUSTER_ID or from the cluster_id entry in the config file.
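The lookup order described above (environment variable first, then the configuration file) can be illustrated with the standard library alone. This is a sketch of the idea, not the SDK's actual implementation; `lookup_cluster_id` is a hypothetical name, and the config text is passed in as a string so the example does not touch any real .databrickscfg file.

```python
import configparser
import os
from typing import Mapping, Optional


def lookup_cluster_id(
    config_text: str,
    profile: str = "DEFAULT",
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the cluster ID, preferring the environment over the config file.

    Illustrative only; the Databricks SDK performs its own resolution.
    """
    env = os.environ if env is None else env
    from_env = env.get("DATABRICKS_CLUSTER_ID")
    if from_env:
        return from_env

    # interpolation=None keeps '%' characters in values from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if profile != "DEFAULT" and not parser.has_section(profile):
        return None
    # Note: configparser lets named sections inherit keys from [DEFAULT];
    # that is a property of this sketch, not a statement about the SDK.
    return parser[profile].get("cluster_id")


if __name__ == "__main__":
    sample = "[DEFAULT]\nhost = https://adb-XXX.azuredatabricks.net\ncluster_id = from-file\n"
    print(lookup_cluster_id(sample, env={}))
    print(lookup_cluster_id(sample, env={"DATABRICKS_CLUSTER_ID": "from-env"}))
```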

To use a specific profile from the configuration file, pass its name to the builder:

from databricks.connect import DatabricksSession

session = DatabricksSession.builder.profile("profile_name").getOrCreate()

Connection parameters can also be specified directly in code.

session = DatabricksSession.builder.remote(
    host="<databricks workspace url>",
    cluster_id="<databricks cluster id>",
    token="<bearer token>"
).getOrCreate()

Alternatively, the connection can be initialized from a Config object from the Databricks SDK:

from databricks.sdk.core import Config

config = Config(...)
DatabricksSession.builder.sdkConfig(config).getOrCreate()

The Spark Connect connection string can also be specified directly in code.

session = DatabricksSession.builder\
    .remote("sc://<databricks workspace url>:443/;token=<bearer token>;x-databricks-cluster-id=<cluster id>")\
    .getOrCreate()

In summary, connection parameters are collected in the following order. Evaluation stops as soon as all connection parameters are available.

1. Specified directly using remote(), either as a connection string or as keyword arguments.
2. Specified via the Databricks SDK using sdkConfig() or profile().
3. Specified in the SPARK_REMOTE environment variable.
4. Specified via the Databricks SDK's default authentication.
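The precedence above can be summarized as a small decision function. This is a simplified sketch that assumes the first available source supplies all parameters; `pick_connection_source` and its arguments are hypothetical names, not part of the library.

```python
from typing import Mapping


def pick_connection_source(
    *,
    remote_called: bool,
    sdk_configured: bool,
    env: Mapping[str, str],
) -> str:
    """Name the source that would win under the order listed above.

    Hypothetical helper: remote_called means remote() was used on the builder,
    sdk_configured means sdkConfig() or profile() was used.
    """
    if remote_called:
        return "remote()"
    if sdk_configured:
        return "sdkConfig() or profile()"
    if env.get("SPARK_REMOTE"):
        return "SPARK_REMOTE"
    return "SDK default authentication"


if __name__ == "__main__":
    # Neither builder method was used, but SPARK_REMOTE is set.
    print(pick_connection_source(
        remote_called=False,
        sdk_configured=False,
        env={"SPARK_REMOTE": "sc://<databricks workspace url>:443/"},
    ))
```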

Debugging

Databricks Connect can generate debug logs when they are needed for inspection.

Enable debug logs by setting the environment variable SPARK_CONNECT_LOG_LEVEL=debug, for example:

$ SPARK_CONNECT_LOG_LEVEL=debug python3 myprogram.py
2023-07-24 14:40:28,505 50147 DEBUG Enabled debug logs for databricks-connect
2023-07-24 14:40:28,505 50147 DEBUG IPython module is present.
2023-07-24 14:40:28,505 50147 DEBUG Falling back to default configuration from the SDK.
2023-07-24 14:40:28,505 50147 DEBUG Loaded from environment
2023-07-24 14:40:28,505 50147 DEBUG Attempting to configure auth: pat
...
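When the program is launched from another Python process (a test harness, for instance), the same variable can be passed through the child's environment. The sketch below uses only the standard library; `run_with_debug_logs` is a hypothetical helper, and the demo child process merely echoes the variable rather than running Databricks Connect.

```python
import os
import subprocess
import sys
from typing import List


def run_with_debug_logs(argv: List[str]) -> subprocess.CompletedProcess:
    """Run a command with SPARK_CONNECT_LOG_LEVEL=debug in its environment."""
    env = dict(os.environ)  # copy so the parent environment is left untouched
    env["SPARK_CONNECT_LOG_LEVEL"] = "debug"
    return subprocess.run(argv, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child that only prints the variable it received.
    child = [sys.executable, "-c", "import os; print(os.environ['SPARK_CONNECT_LOG_LEVEL'])"]
    print(run_with_debug_logs(child).stdout.strip())
```

In real use, `argv` would be something like `[sys.executable, "myprogram.py"]`.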

OAuth

The Databricks Connect module, via the Databricks SDK, supports the OAuth authentication mechanism. It is configured through configuration profiles in the .databrickscfg file. See [TBD: link here] on how to set up and use configuration profiles.

The following configuration profile snippet sets up OAuth integration via the Azure CLI, and should be added to the .databrickscfg file.

[azure-cli]
host = https://adb-XXX.azuredatabricks.net
auth_type = azure-cli
cluster_id = <databricks cluster id>

Similarly, the following snippet sets up OAuth integration via an Azure Active Directory (AAD) service principal.

[azure-aad]
host = https://adb-XXX.azuredatabricks.net
azure_tenant_id = 00000000-0000-0000-0000-000000000001
azure_client_id = 00000000-0000-0000-0000-000000000002
azure_client_secret = s0M3p@$$wrd
cluster_id = YYY

Custom Headers

DatabricksSession supports setting custom headers, in case your remote endpoint requires them:

DatabricksSession.builder.header('x-custom-header', 'value').getOrCreate()

This can be combined with other session configurations.
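When several headers are needed, they can be applied in a loop. The sketch below relies only on the `header(name, value)` call shown above and assumes it returns a builder that can be chained further; `apply_headers` and `RecordingBuilder` are hypothetical names, and the stand-in class exists only so the example runs without a cluster.

```python
from typing import Any, List, Mapping, Tuple


def apply_headers(builder: Any, headers: Mapping[str, str]) -> Any:
    """Call builder.header(name, value) once per entry and return the result."""
    for name, value in headers.items():
        builder = builder.header(name, value)
    return builder


class RecordingBuilder:
    """Stand-in for a session builder; records headers instead of connecting."""

    def __init__(self) -> None:
        self.headers: List[Tuple[str, str]] = []

    def header(self, name: str, value: str) -> "RecordingBuilder":
        self.headers.append((name, value))
        return self


if __name__ == "__main__":
    built = apply_headers(RecordingBuilder(), {"x-custom-header": "value"})
    print(built.headers)
```

With the real library, `DatabricksSession.builder` would be passed in place of the stand-in, followed by `.getOrCreate()` on the returned builder.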

Release history

14.3.1

Feb 22, 2024

14.3.0

Feb 4, 2024

14.2.1

Jan 12, 2024

14.2.0

Dec 7, 2023

14.1.1

Jan 12, 2024

14.1.0

Oct 24, 2023

14.0.2

Jan 12, 2024

14.0.1

Sep 29, 2023

14.0.0

Sep 20, 2023

13.3.3

Oct 12, 2023

This version

13.3.2

Sep 21, 2023

13.3.1

Sep 19, 2023

13.3.0

Aug 28, 2023

13.2.1

Aug 22, 2023

13.2.0

Jul 7, 2023

13.1.0

Jun 2, 2023

13.1.0b1 pre-release

May 25, 2023

13.0.1

May 10, 2023

13.0.0

Apr 17, 2023

13.0.0b7 pre-release

Apr 14, 2023

12.2.27

Apr 29, 2024

12.2.26

Apr 9, 2024

12.2.24

Mar 14, 2024

12.2.23 yanked

Mar 1, 2024

Reason this release was yanked:

12.2.26 released

12.2.22 yanked

Feb 14, 2024

Reason this release was yanked:

12.2.24 released

12.2.21 yanked

Feb 2, 2024

Reason this release was yanked:

Please upgrade to 12.3.23

12.2.20 yanked

Jan 24, 2024

Reason this release was yanked:

Please upgrade to 12.2.22

12.2.19 yanked

Dec 14, 2023

Reason this release was yanked:

12.2.21 released

12.2.18 yanked

Dec 5, 2023

Reason this release was yanked:

Please update to 12.2.20

12.2.17 yanked

Nov 13, 2023

12.2.15 yanked

Oct 18, 2023

Reason this release was yanked:

12.2.17 released

12.2.14 yanked

Sep 29, 2023

Reason this release was yanked:

12.2.16 released

12.2.13 yanked

Sep 15, 2023

Reason this release was yanked:

Please upgrade to 12.2.15

12.2.10 yanked

Aug 7, 2023

12.2.9 yanked

Jul 24, 2023

Reason this release was yanked:

12.2.11 released

12.2.8 yanked

Jul 6, 2023

Reason this release was yanked:

12.2.10 released

12.2.7 yanked

Jun 15, 2023

Reason this release was yanked:

12.2.9 released

11.3.34

Apr 29, 2024

11.3.33

Apr 9, 2024

11.3.31

Mar 14, 2024

11.3.30 yanked

Mar 1, 2024

Reason this release was yanked:

11.3.33 released

11.3.29 yanked

Feb 14, 2024

Reason this release was yanked:

11.3.31 released

11.3.26 yanked

Dec 14, 2023

Reason this release was yanked:

11.3.28 released

11.3.25 yanked

Dec 5, 2023

Reason this release was yanked:

Please update to 11.3.27

11.3.24 yanked

Nov 13, 2023

11.3.23 yanked

Oct 31, 2023

11.3.21 yanked

Sep 29, 2023

Reason this release was yanked:

11.3.23 released

11.3.20 yanked

Sep 15, 2023

Reason this release was yanked:

Please upgrade to 11.3.22

11.3.19 yanked

Aug 31, 2023

Reason this release was yanked:

Please update to 11.3.21

11.3.17 yanked

Aug 7, 2023

11.3.16 yanked

Jul 24, 2023

Reason this release was yanked:

11.3.18 released

11.3.15 yanked

Jul 6, 2023

Reason this release was yanked:

11.3.17 released

11.3.14 yanked

Jun 15, 2023

Reason this release was yanked:

11.3.16 released

11.3.12 yanked

May 16, 2023

Reason this release was yanked:

11.3.14 released

11.3.11 yanked

May 2, 2023

Reason this release was yanked:

11.3.13 released

11.3.10 yanked

Apr 12, 2023

Reason this release was yanked:

11.3.12 released

11.3.7 yanked

Mar 9, 2023

Reason this release was yanked:

11.3.11 released

10.4.49

Apr 29, 2024

10.4.48

Apr 9, 2024

10.4.46

Mar 14, 2024

10.4.45 yanked

Mar 1, 2024

Reason this release was yanked:

10.4.48 released

10.4.43 yanked

Feb 2, 2024

Reason this release was yanked:

Please upgrade to 10.4.45

10.4.41 yanked

Dec 14, 2023

Reason this release was yanked:

10.4.43 released

10.4.40 yanked

Dec 5, 2023

Reason this release was yanked:

Please update to 10.4.42

10.4.39 yanked

Nov 13, 2023

10.4.36 yanked

Sep 29, 2023

Reason this release was yanked:

10.4.38 released

10.4.35 yanked

Sep 15, 2023

Reason this release was yanked:

Please upgrade to 10.4.37

10.4.34 yanked

Aug 31, 2023

Reason this release was yanked:

10.4.36 is released

10.4.32 yanked

Aug 7, 2023

10.4.31 yanked

Jul 24, 2023

Reason this release was yanked:

10.4.33 released

10.4.30 yanked

Jul 6, 2023

Reason this release was yanked:

10.4.32 released

10.4.29 yanked

Jun 15, 2023

Reason this release was yanked:

10.4.31 released

10.4.27 yanked

May 16, 2023

Reason this release was yanked:

10.4.29 released

10.4.26 yanked

May 2, 2023

Reason this release was yanked:

10.4.28 released

10.4.25 yanked

Apr 12, 2023

Reason this release was yanked:

10.4.27 released

10.4.22 yanked

Mar 9, 2023

Reason this release was yanked:

10.4.26 released

10.4.21 yanked

Feb 22, 2023

Reason this release was yanked:

10.4.25 released

9.1.61

Apr 29, 2024

9.1.60

Apr 9, 2024

9.1.58

Mar 14, 2024

9.1.57 yanked

Mar 1, 2024

Reason this release was yanked:

9.1.60 released

9.1.56 yanked

Feb 14, 2024

Reason this release was yanked:

9.1.58 released

9.1.55 yanked

Feb 2, 2024

Reason this release was yanked:

Please upgrade to 9.1.57

9.1.54 yanked

Jan 24, 2024

Reason this release was yanked:

Please upgrade to 9.1.56

9.1.53 yanked

Dec 14, 2023

Reason this release was yanked:

9.1.55 released

9.1.52 yanked

Dec 5, 2023

Reason this release was yanked:

Please upgrade to 9.1.54

9.1.51 yanked

Nov 13, 2023

9.1.45 yanked

Aug 21, 2023

9.1.44 yanked

Aug 7, 2023

9.1.42 yanked

Jul 6, 2023

Reason this release was yanked:

9.1.44 released

9.1.41 yanked

Jun 15, 2023

Reason this release was yanked:

9.1.43 released

9.1.38 yanked

May 2, 2023

Reason this release was yanked:

9.1.40 released

8.1.14 yanked

Sep 23, 2021

Reason this release was yanked:

Unsupported release. Please use the latest release for the Databricks Runtime Version.

7.3.72

Sep 15, 2023

7.3.71

Aug 31, 2023

7.3.66 yanked

Jun 15, 2023

Reason this release was yanked:

7.3.68 released

6.4.49 yanked

Feb 24, 2022

Reason this release was yanked:

This Databricks Runtime Version is no longer supported.

5.5.3 yanked

Oct 26, 2019

Reason this release was yanked:

This Databricks Runtime Version is no longer supported.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distribution

databricks_connect-13.3.2-py2.py3-none-any.whl (2.0 MB)

Uploaded Sep 21, 2023 Python 2 Python 3

Hashes for databricks_connect-13.3.2-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`c44ed26f0de3c30e815279b8be8bcbbcab81ae16a0257acd088d22360462e67c`
MD5	`4f9d31f3ab236e174e3c6a4fa1506d5c`
BLAKE2b-256	`a81add17e5cfe1c965db23c7c2b8249ab50ed5f813aa90a954681831d729bf4c`