Databricks SDK for Python (Beta)

Beta: This SDK is supported for production use cases, but we do expect future releases to have some interface changes; see Interface stability. We are keen to hear feedback from you on these SDKs. Please file issues, and we will address them. | See also the SDK for Java | See also the SDK for Go | See also the Terraform Provider | See also cloud-specific docs (AWS, Azure, GCP) | See also the API reference on readthedocs

The Databricks SDK for Python includes functionality to accelerate development with Python for the Databricks Lakehouse. It covers all public Databricks REST API operations. The SDK's internal HTTP client is robust and handles failures on different levels by performing intelligent retries.

Getting started
Code examples
Authentication
Long-running operations
Paginated responses
Single-Sign-On (SSO) with OAuth
User Agent Request Attribution
Error handling
Logging
Interaction with dbutils
Interface stability

Getting started

Install the Databricks SDK for Python with pip install databricks-sdk, then instantiate WorkspaceClient:

from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
for c in w.clusters.list():
    print(c.cluster_name)

Databricks SDK for Python is compatible with Python 3.7 (until June 2023), 3.8, 3.9, 3.10, and 3.11.
Note: Databricks Runtime 13.1 and above includes a bundled version of the Python SDK.
It is highly recommended to upgrade to the latest version, which you can do by running the following in a notebook cell:

%pip install --upgrade databricks-sdk

followed by

dbutils.library.restartPython()

Code examples

The Databricks SDK for Python comes with a number of examples demonstrating how to use the library for various common use cases. These examples and more are located in the examples/ directory of the GitHub repository.

Some other examples of using the SDK include:

Unity Catalog Automated Migration relies heavily on the Python SDK for working with Databricks APIs.
ip-access-list-analyzer checks and prunes invalid entries from IP Access Lists.

Authentication

If you use Databricks configuration profiles or Databricks-specific environment variables for Databricks authentication, the only code required to start working with a Databricks workspace is the following code snippet, which instructs the Databricks SDK for Python to use its default authentication flow:

from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
# in an interactive session, type "w." and press <TAB> for autocompletion

The conventional name for the variable that holds the workspace-level client of the Databricks SDK for Python is w, which is shorthand for workspace.

In this section

Default authentication flow
Databricks native authentication
Azure native authentication
Google Cloud Platform native authentication
Overriding .databrickscfg
Additional authentication configuration options

Default authentication flow

If you use the Databricks Terraform Provider, the Databricks SDK for Go, the Databricks CLI, or applications that target the Databricks SDKs for other languages, they will most likely all interoperate nicely with this SDK. By default, the Databricks SDK for Python tries the following authentication methods, in the following order, until it succeeds:

Databricks native authentication
Azure native authentication
If the SDK is unsuccessful at this point, it returns an authentication error and stops running.

You can instruct the Databricks SDK for Python to use a specific authentication method by setting the auth_type argument as described in the following sections.
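The ordered fallback above, together with the auth_type override, can be pictured as a chain of providers that are tried in turn. The following is a minimal, stdlib-only sketch under stated assumptions: the provider functions, the plain-dict configuration, and the error message are hypothetical and do not reproduce the SDK's internal classes, and the Azure provider returns a placeholder instead of calling Azure AD.

```python
from typing import Callable, Dict, List, Mapping, Optional, Tuple

# Hypothetical provider type: inspects the configuration and returns HTTP
# headers, or None when the credentials it needs are not present.
Provider = Callable[[Mapping[str, str]], Optional[Dict[str, str]]]


def pat_provider(cfg: Mapping[str, str]) -> Optional[Dict[str, str]]:
    # Databricks token authentication needs both host and token.
    if cfg.get("host") and cfg.get("token"):
        return {"Authorization": f"Bearer {cfg['token']}"}
    return None


def azure_client_secret_provider(cfg: Mapping[str, str]) -> Optional[Dict[str, str]]:
    required = ("azure_client_id", "azure_client_secret", "azure_tenant_id")
    if all(cfg.get(key) for key in required):
        # A real implementation would exchange the secret for an Azure AD
        # token; this sketch returns a placeholder value instead.
        return {"Authorization": "Bearer <azure-ad-token>"}
    return None


# Order matters: Databricks native authentication first, then Azure.
PROVIDERS: List[Tuple[str, Provider]] = [
    ("pat", pat_provider),
    ("azure-client-secret", azure_client_secret_provider),
]


def resolve_auth(cfg: Mapping[str, str],
                 providers: List[Tuple[str, Provider]] = PROVIDERS) -> Tuple[str, Dict[str, str]]:
    """Return (auth_type, headers) from the first provider that succeeds."""
    forced = cfg.get("auth_type")
    for name, provider in providers:
        if forced and name != forced:
            continue  # an explicit auth_type skips every other method
        headers = provider(cfg)
        if headers is not None:
            return name, headers
    raise ValueError("no authentication method could be configured")
```

For example, resolve_auth({'host': 'https://example.cloud.databricks.com', 'token': 'dapi-example'}) returns ('pat', {'Authorization': 'Bearer dapi-example'}), while adding auth_type='azure-client-secret' to a configuration that also holds Azure credentials makes the chain skip token authentication entirely.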

For each authentication method, the SDK searches for compatible authentication credentials in the following locations, in the following order. Once the SDK finds a compatible set of credentials that it can use, it stops searching:

Credentials that are hard-coded into configuration arguments.

:warning: Caution: Databricks does not recommend hard-coding credentials into arguments, as they can be exposed in plain text in version control systems. Use environment variables or configuration profiles instead.
Credentials in Databricks-specific environment variables.
For Databricks native authentication, credentials in the .databrickscfg file's DEFAULT configuration profile from its default file location (~ for Linux or macOS, and %USERPROFILE% for Windows).
For Azure native authentication, the SDK searches for credentials through the Azure CLI as needed.
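For each method, the search stops at the first location that holds a complete set of credentials. A minimal sketch of that rule for Databricks token authentication follows. The function name and the plain-dict inputs are hypothetical; the environment variable names are the ones listed in the Databricks native authentication table.

```python
from typing import Dict, Mapping, Optional, Tuple

# Environment variables for Databricks token authentication.
ENV_VARS = {"host": "DATABRICKS_HOST", "token": "DATABRICKS_TOKEN"}
REQUIRED = ("host", "token")


def find_credentials(explicit: Mapping[str, str],
                     environ: Mapping[str, str],
                     default_profile: Mapping[str, str]) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (location, credentials) for the first complete set, else None."""
    from_env = {name: environ.get(var) for name, var in ENV_VARS.items()}
    locations = [
        ("arguments", explicit),
        ("environment", from_env),
        ("config file", default_profile),
    ]
    for location, values in locations:
        # A location only counts if it provides every required field.
        if all(values.get(name) for name in REQUIRED):
            return location, {name: values[name] for name in REQUIRED}
    return None
```

In a real script you would pass os.environ as the environ argument; a host set only through DATABRICKS_HOST, without DATABRICKS_TOKEN, is not enough on its own, so the sketch moves on to the configuration file.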

Depending on the Databricks authentication method, the SDK uses the following information. The tables below list the WorkspaceClient and AccountClient arguments (which have corresponding .databrickscfg file fields), their descriptions, and any corresponding environment variables.

Databricks native authentication

By default, the Databricks SDK for Python initially tries Databricks token authentication (auth_type='pat' argument). If the SDK is unsuccessful, it then tries Workload Identity Federation (WIF). See Supported WIF for the supported JWT token providers.

For Databricks token authentication, you must provide host and token; or their environment variable or .databrickscfg file field equivalents.
For Databricks OIDC authentication, you must provide the host, client_id and token_audience (optional) either directly, through the corresponding environment variables, or in your .databrickscfg configuration file.
For Azure DevOps OIDC authentication, the token_audience is irrelevant because the audience is always set to api://AzureADTokenExchange. Also, the System.AccessToken pipeline variable required for the OIDC request must be exposed as the SYSTEM_ACCESSTOKEN environment variable, as described in Pipeline variables.

Argument	Description	Environment variable
`host`	(String) The Databricks host URL for either the Databricks workspace endpoint or the Databricks accounts endpoint.	`DATABRICKS_HOST`
`account_id`	(String) The Databricks account ID for the Databricks accounts endpoint. Only has effect when `host` is either `https://accounts.cloud.databricks.com/` (AWS), `https://accounts.azuredatabricks.net/` (Azure), or `https://accounts.gcp.databricks.com/` (GCP).	`DATABRICKS_ACCOUNT_ID`
`token`	(String) The Databricks personal access token (PAT) (AWS, Azure, and GCP) or Azure Active Directory (Azure AD) token (Azure).	`DATABRICKS_TOKEN`
`client_id`	(String) The Databricks Service Principal Application ID.	`DATABRICKS_CLIENT_ID`
`token_audience`	(String) When using Workload Identity Federation, the audience to specify when fetching an ID token from the ID token supplier.	`TOKEN_AUDIENCE`

For example, to use Databricks token authentication:

from databricks.sdk import WorkspaceClient
w = WorkspaceClient(host=input('Databricks Workspace URL: '), token=input('Token: '))

Azure native authentication

By default, the Databricks SDK for Python first tries Azure client secret authentication (auth_type='azure-client-secret' argument). If the SDK is unsuccessful, it then tries Azure CLI authentication (auth_type='azure-cli' argument). See Manage service principals.

The Databricks SDK for Python picks up an Azure CLI token, if you've previously authenticated as an Azure user by running az login on your machine. See Get Azure AD tokens for users by using the Azure CLI.

To authenticate as an Azure Active Directory (Azure AD) service principal, you must provide one of the following. See also Add a service principal to your Azure Databricks account:

azure_workspace_resource_id, azure_client_secret, azure_client_id, and azure_tenant_id; or their environment variable or .databrickscfg file field equivalents.
azure_workspace_resource_id and azure_use_msi; or their environment variable or .databrickscfg file field equivalents.

Argument	Description	Environment variable
`azure_workspace_resource_id`	(String) The Azure Resource Manager ID for the Azure Databricks workspace, which is exchanged for a Databricks host URL.	`DATABRICKS_AZURE_RESOURCE_ID`
`azure_use_msi`	(Boolean) `true` to use Azure Managed Service Identity passwordless authentication flow for service principals. This feature is not yet implemented in the Databricks SDK for Python.	`ARM_USE_MSI`
`azure_client_secret`	(String) The Azure AD service principal's client secret.	`ARM_CLIENT_SECRET`
`azure_client_id`	(String) The Azure AD service principal's application ID.	`ARM_CLIENT_ID`
`azure_tenant_id`	(String) The Azure AD service principal's tenant ID.	`ARM_TENANT_ID`
`azure_environment`	(String) The Azure environment type (such as Public, UsGov, China, and Germany) for a specific set of API endpoints. Defaults to `PUBLIC`.	`ARM_ENVIRONMENT`

For example, to use Azure client secret authentication:

from databricks.sdk import WorkspaceClient
w = WorkspaceClient(host=input('Databricks Workspace URL: '),
                    azure_workspace_resource_id=input('Azure Resource ID: '),
                    azure_tenant_id=input('AAD Tenant ID: '),
                    azure_client_id=input('AAD Client ID: '),
                    azure_client_secret=input('AAD Client Secret: '))

Please see more examples in this document.

Google Cloud Platform native authentication

By default, the Databricks SDK for Python first tries GCP credentials authentication (auth_type='google-credentials' argument). If the SDK is unsuccessful, it then tries Google Cloud Platform (GCP) ID authentication (auth_type='google-id' argument).

The Databricks SDK for Python picks up an OAuth token in the scope of the Google Default Application Credentials (DAC) flow. This means that if you have run gcloud auth application-default login on your development machine, or if you launch the application on compute that is allowed to impersonate the Google Cloud service account specified in google_service_account, authentication should work out of the box. See Creating and managing service accounts.

To authenticate as a Google Cloud service account, you must provide one of the following:

host and google_credentials; or their environment variable or .databrickscfg file field equivalents.
host and google_service_account; or their environment variable or .databrickscfg file field equivalents.

Argument	Description	Environment variable
`google_credentials`	(String) GCP Service Account Credentials JSON or the location of these credentials on the local filesystem.	`GOOGLE_CREDENTIALS`
`google_service_account`	(String) The Google Cloud Platform (GCP) service account e-mail used for impersonation in the Default Application Credentials Flow that does not require a password.	`DATABRICKS_GOOGLE_SERVICE_ACCOUNT`

For example, to use Google ID authentication:

from databricks.sdk import WorkspaceClient
w = WorkspaceClient(host=input('Databricks Workspace URL: '),
                    google_service_account=input('Google Service Account: '))

Overriding `.databrickscfg`

For Databricks native authentication, you can override the default behavior for using .databrickscfg as follows:

Argument	Description	Environment variable
`profile`	(String) A connection profile specified within `.databrickscfg` to use instead of `DEFAULT`.	`DATABRICKS_CONFIG_PROFILE`
`config_file`	(String) A non-default location of the Databricks CLI credentials file.	`DATABRICKS_CONFIG_FILE`

For example, to use a profile named MYPROFILE instead of DEFAULT:

from databricks.sdk import WorkspaceClient
w = WorkspaceClient(profile='MYPROFILE')
# Now call the Databricks workspace APIs as desired...
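The .databrickscfg file uses an INI-style layout with one section per profile. The following sketch shows how the profile and config_file overrides from the table above could be resolved with Python's standard configparser module. It is an illustration under assumptions, not the SDK's loader: the function names and the sample values are hypothetical, and the sketch deliberately disables configparser's special treatment of [DEFAULT] so that each profile is read on its own.

```python
import configparser
from pathlib import Path
from typing import Dict, Mapping, Optional


def config_path(config_file: Optional[str], environ: Mapping[str, str]) -> Path:
    # Argument first, then DATABRICKS_CONFIG_FILE, then ~/.databrickscfg.
    if config_file:
        return Path(config_file)
    if environ.get("DATABRICKS_CONFIG_FILE"):
        return Path(environ["DATABRICKS_CONFIG_FILE"])
    return Path.home() / ".databrickscfg"


def read_profile(cfg_text: str, profile: Optional[str],
                 environ: Mapping[str, str]) -> Dict[str, str]:
    # Argument first, then DATABRICKS_CONFIG_PROFILE, then DEFAULT.
    name = profile or environ.get("DATABRICKS_CONFIG_PROFILE") or "DEFAULT"
    # configparser normally copies [DEFAULT] keys into every section;
    # renaming its default section keeps the profiles independent.
    parser = configparser.ConfigParser(default_section="__no_shared_defaults__",
                                       interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section(name):
        raise KeyError(f"profile {name!r} not found")
    return dict(parser[name])


SAMPLE = """
[DEFAULT]
host = https://default.example.com
token = dapi-default
cluster_id = 0000-000000-example

[MYPROFILE]
host = https://myprofile.example.com
token = dapi-myprofile
"""
```

To read a real file, pass config_path(None, os.environ).read_text() as cfg_text.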

Additional authentication configuration options

For all authentication methods, you can override the default behavior in client arguments as follows:

Argument	Description	Environment variable
`auth_type`	(String) When multiple auth attributes are available in the environment, use the auth type specified by this argument. This argument also holds the currently selected auth.	`DATABRICKS_AUTH_TYPE`
`http_timeout_seconds`	(Integer) Number of seconds for HTTP timeout. Default is 60.	(None)
`retry_timeout_seconds`	(Integer) Number of seconds to keep retrying HTTP requests. Default is 300 (5 minutes).	(None)
`debug_truncate_bytes`	(Integer) Truncate JSON fields in debug logs above this limit. Default is 96.	`DATABRICKS_DEBUG_TRUNCATE_BYTES`
`debug_headers`	(Boolean) `true` to debug HTTP headers of requests made by the application. Default is `false`, as headers contain sensitive data, such as access tokens.	`DATABRICKS_DEBUG_HEADERS`
`rate_limit`	(Integer) Maximum number of requests per second made to Databricks REST API.	`DATABRICKS_RATE_LIMIT`

For example, here's how you can update the overall retry timeout:

from databricks.sdk import WorkspaceClient
from databricks.sdk.core import Config
w = WorkspaceClient(config=Config(retry_timeout_seconds=600))  # default is 300 (5 minutes)
# Now call the Databricks workspace APIs as desired...
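The retry_timeout_seconds setting bounds how long the client keeps retrying. The SDK's exact retry policy (which errors are retried and how long it waits between attempts) is not described here, so the following is only a generic sketch of retrying under an overall deadline with capped exponential backoff. TransientError and retry_until_deadline are hypothetical names, and the clock and sleep functions are injectable so the behavior can be checked without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, such as HTTP 429 or 503."""


def retry_until_deadline(call: Callable[[], T],
                         retry_timeout_seconds: float = 300.0,
                         max_backoff_seconds: float = 10.0,
                         clock: Callable[[], float] = time.monotonic,
                         sleep: Callable[[float], None] = time.sleep) -> T:
    deadline = clock() + retry_timeout_seconds
    attempt = 0
    while True:
        try:
            return call()
        except TransientError as err:
            attempt += 1
            # Capped exponential backoff (1s, 2s, 4s, ...) plus a little jitter.
            delay = min(max_backoff_seconds, 2.0 ** (attempt - 1))
            delay += random.uniform(0, 0.1)
            if clock() + delay > deadline:
                raise TimeoutError(f"gave up after {attempt} attempts") from err
            sleep(delay)
        # Any other exception propagates immediately without a retry.
```

Only errors marked as transient are retried; anything else, such as a validation error, fails fast, which is the usual reason to separate the two cases.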

Long-running operations

When you invoke a long-running operation, the SDK provides a high-level API to trigger the operation and wait for the related entities to reach the correct state, or to return the error message in case of failure. All long-running operations return a generic Wait instance whose result() method returns the result of the operation once it has finished. The Databricks SDK for Python picks reasonable default timeouts for every method, but when you need a different one, you can pass a datetime.timedelta as the timeout argument to result().

There are a number of long-running operations in Databricks APIs, such as managing:

Clusters
Command execution
Jobs
Libraries
Delta Live Tables pipelines
Databricks SQL warehouses

For example, in the Clusters API, once you create a cluster, you receive a cluster ID, and the cluster is in the PENDING state. Meanwhile, Databricks provisions virtual machines from the cloud provider in the background. The cluster is only usable in the RUNNING state, so you have to wait for that state to be reached.

Another example is the API for running a job or repairing the run: right after the run starts, the run is in the PENDING state. The job is only considered to be finished when it is in either the TERMINATED or SKIPPED state. Also you would likely need the error message if the long-running operation times out and fails with an error code. Other times you may want to configure a custom timeout other than the default of 20 minutes.
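Conceptually, waiting for a long-running operation is a polling loop: fetch the entity, stop when it reaches a terminal state, report intermediate states to an optional callback, and give up when the timeout expires. The class below is a simplified, hypothetical stand-in for the SDK's Wait object, written against an injected poll function so that it runs without a workspace. The polling interval, the state names used in the usage note, and the use of RuntimeError for failed operations are assumptions.

```python
import datetime
import time
from typing import Callable, Iterable, Optional


class PollingWait:
    """Hypothetical, simplified analogue of a long-running-operation waiter."""

    def __init__(self, poll: Callable[[], str],
                 done_states: Iterable[str], failed_states: Iterable[str] = (),
                 interval_seconds: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self._poll = poll
        self._done = set(done_states)
        self._failed = set(failed_states)
        self._interval = interval_seconds
        self._clock = clock
        self._sleep = sleep

    def result(self, timeout: datetime.timedelta = datetime.timedelta(minutes=20),
               callback: Optional[Callable[[str], None]] = None) -> str:
        deadline = self._clock() + timeout.total_seconds()
        while True:
            state = self._poll()
            if state in self._done:
                return state
            if state in self._failed:
                raise RuntimeError(f"operation failed in state {state}")
            if callback is not None:
                callback(state)  # report intermediate states, e.g. PENDING
            if self._clock() + self._interval > deadline:
                raise TimeoutError(f"timed out after {timeout}")
            self._sleep(self._interval)
```

For a job run, you would construct it with done_states={'TERMINATED', 'SKIPPED'} and a poll function that returns the run's life cycle state, then call result(timeout=datetime.timedelta(minutes=15), callback=print).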

In the following example, w.clusters.create_and_wait returns ClusterInfo only once the cluster is in the RUNNING state; otherwise, it times out after 10 minutes:

import datetime
import logging
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
info = w.clusters.create_and_wait(cluster_name='Created cluster',
                                  spark_version='12.0.x-scala2.12',
                                  node_type_id='m5d.large',
                                  autotermination_minutes=10,
                                  num_workers=1,
                                  timeout=datetime.timedelta(minutes=10))
logging.info(f'Created: {info}')

See examples/starting_job_and_waiting.py for more advanced usage:

import datetime
import logging
import time

from databricks.sdk import WorkspaceClient
import databricks.sdk.service.compute as compute
import databricks.sdk.service.jobs as j

w = WorkspaceClient()

# create a dummy file on DBFS that just sleeps for 10 seconds
py_on_dbfs = f'/home/{w.current_user.me().user_name}/sample.py'
with w.dbfs.open(py_on_dbfs, write=True, overwrite=True) as f:
    f.write(b'import time; time.sleep(10); print("Hello, World!")')

# trigger one-time-run job and get waiter object
waiter = w.jobs.submit(run_name=f'py-sdk-run-{time.time()}', tasks=[
    j.SubmitTask(
        task_key='hello_world',
        # cluster specifications live in the compute service module
        new_cluster=compute.ClusterSpec(
            spark_version=w.clusters.select_spark_version(long_term_support=True),
            node_type_id=w.clusters.select_node_type(local_disk=True),
            num_workers=1
        ),
        spark_python_task=j.SparkPythonTask(
            python_file=f'dbfs:{py_on_dbfs}'
        ),
    )
])

logging.info(f'starting to poll: {waiter.run_id}')

# callback, that receives a polled entity between state updates
def print_status(run: j.Run):
    statuses = [f'{t.task_key}: {t.state.life_cycle_state}' for t in run.tasks]
    logging.info(f'workflow intermediate status: {", ".join(statuses)}')

# If you want to perform polling in a separate thread, process, or service,
# you can use w.jobs.wait_get_run_job_terminated_or_skipped(
#   run_id=waiter.run_id,
#   timeout=datetime.timedelta(minutes=15),
#   callback=print_status) to achieve the same results.
#
# Waiter interface allows for `w.jobs.submit(..).result()` simplicity in
# the scenarios, where you need to block the calling thread for the job to finish.
run = waiter.result(timeout=datetime.timedelta(minutes=15),
                    callback=print_status)

logging.info(f'job finished: {run.run_page_url}')

Paginated responses

On the platform side, the Databricks APIs have different ways to deal with pagination:

Some APIs follow the offset-plus-limit pagination
Some start their offsets from 0 and some from 1
Some use the cursor-based iteration
Others just return all results in a single response

The Databricks SDK for Python hides this complexity behind an Iterator[T] abstraction, which yields individual items from multi-page results. Python typing helps to auto-complete the individual item fields.

import logging
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
for repo in w.repos.list():
    logging.info(f'Found repo: {repo.path}')
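Because the SDK exposes an Iterator[T], callers never see page boundaries. The generators below sketch how offset-plus-limit pagination (0- or 1-based) and cursor-based pagination can both be flattened into a single lazy iterator. The fetch callables are hypothetical stand-ins for API calls, and this is not the SDK's generated code.

```python
from typing import Callable, Iterator, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def iterate_offset_limit(fetch: Callable[[int, int], List[T]],
                         limit: int = 100, first_offset: int = 0) -> Iterator[T]:
    """Offset-plus-limit pagination; first_offset is 0 or 1 depending on the API."""
    offset = first_offset
    while True:
        page = fetch(offset, limit)
        yield from page
        if len(page) < limit:  # a short (or empty) page is the last one
            return
        offset += len(page)


def iterate_cursor(fetch: Callable[[Optional[str]], Tuple[List[T], Optional[str]]]) -> Iterator[T]:
    """Cursor-based pagination: each response carries the token for the next page."""
    token: Optional[str] = None
    while True:
        page, token = fetch(token)
        yield from page
        if not token:
            return
```

Since both are generators, the next page is only requested once the caller has consumed the current one, so stopping a for loop early avoids fetching the remaining pages.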

See examples/last_job_runs.py for more advanced usage:

import logging
from collections import defaultdict
from datetime import datetime, timezone
from databricks.sdk import WorkspaceClient

latest_state = {}
all_jobs = {}
durations = defaultdict(list)

w = WorkspaceClient()
for job in w.jobs.list():
    all_jobs[job.job_id] = job
    for run in w.jobs.list_runs(job_id=job.job_id, expand_tasks=False):
        durations[job.job_id].append(run.run_duration)
        if job.job_id not in latest_state:
            latest_state[job.job_id] = run
            continue
        if run.end_time < latest_state[job.job_id].end_time:
            continue
        latest_state[job.job_id] = run

summary = []
for job_id, run in latest_state.items():
    summary.append({
        'job_name': all_jobs[job_id].settings.name,
        'last_status': run.state.result_state,
        'last_finished': datetime.fromtimestamp(run.end_time/1000, timezone.utc),
        'average_duration': sum(durations[job_id]) / len(durations[job_id])
    })

for line in sorted(summary, key=lambda s: s['last_finished'], reverse=True):
    logging.info(f'Latest: {line}')

Single-Sign-On (SSO) with OAuth

Authorization Code flow with PKCE

For a regular web app running on a server, it's recommended to use the Authorization Code Flow to obtain an Access Token and a Refresh Token. This method is considered safe because the Access Token is transmitted directly to the server hosting the app, without passing through the user's web browser and risking exposure.

To enhance the security of the Authorization Code Flow, the PKCE (Proof Key for Code Exchange) mechanism can be employed. With PKCE, the calling application generates a secret called the Code Verifier, which is verified by the authorization server. The app also creates a transform value of the Code Verifier, called the Code Challenge, and sends it over HTTPS to obtain an Authorization Code. By intercepting the Authorization Code, a malicious attacker cannot exchange it for a token without possessing the Code Verifier.
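The Code Verifier and Code Challenge are defined by RFC 7636 (S256 method): the verifier is a high-entropy random string, and the challenge is the base64url-encoded SHA-256 hash of it, without padding. The helpers below are a standalone, hypothetical illustration of that transform using only the standard library. In the flow described here, the SDK's initiate_consent() helper takes care of PKCE for you; server_accepts only imitates the check that the authorization server performs.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    # token_urlsafe returns base64url text without padding; 32 random bytes
    # give 43 characters, inside RFC 7636's 43-128 character range.
    return secrets.token_urlsafe(num_bytes)


def make_code_challenge(verifier: str) -> str:
    # S256: BASE64URL(SHA256(ASCII(code_verifier))) with '=' padding removed.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def server_accepts(verifier: str, challenge: str) -> bool:
    # What the authorization server checks when the app redeems the code.
    return secrets.compare_digest(make_code_challenge(verifier), challenge)
```

The app sends only the challenge with the authorization request and keeps the verifier private until it exchanges the Authorization Code, which is why an intercepted code is useless on its own.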

The presented sample is a Python3 script that uses the Flask web framework along with Databricks SDK for Python to demonstrate how to implement the OAuth Authorization Code flow with PKCE security. It can be used to build an app where each user uses their identity to access Databricks resources. The script can be executed with or without client and secret credentials for a custom OAuth app.

The Databricks SDK for Python exposes the oauth_client.initiate_consent() helper to acquire the user redirect URL and initiate PKCE state verification. Application developers are expected to persist SessionCredentials in the web app session and restore them with the SessionCredentials.from_dict(oauth_client, session['creds']) helper, as the sample below does.

Works for both AWS and Azure. Not supported for GCP at the moment.

import secrets

from flask import Flask, redirect, render_template_string, request, session, url_for

from databricks.sdk import WorkspaceClient
from databricks.sdk.oauth import Consent, OAuthClient, SessionCredentials

oauth_client = OAuthClient(host='<workspace-url>',
                           client_id='<oauth client ID>',
                           redirect_url='http://host.domain/callback',
                           scopes=['clusters'])

APP_NAME = 'flask-demo'
app = Flask(APP_NAME)
app.secret_key = secrets.token_urlsafe(32)


@app.route('/callback')
def callback():
    # restore the PKCE state and exchange the authorization code for tokens
    consent = Consent.from_dict(oauth_client, session['consent'])
    session['creds'] = consent.exchange_callback_parameters(request.args).as_dict()
    return redirect(url_for('index'))


@app.route('/')
def index():
    if 'creds' not in session:
        # first visit: start the consent flow and remember its state
        consent = oauth_client.initiate_consent()
        session['consent'] = consent.as_dict()
        return redirect(consent.auth_url)

    # restore the user's credentials from the session
    credentials_provider = SessionCredentials.from_dict(oauth_client, session['creds'])
    workspace_client = WorkspaceClient(host=oauth_client.host,
                                       product=APP_NAME,
                                       credentials_provider=credentials_provider)

    return render_template_string('...', w=workspace_client)

SSO for local scripts on development machines

For applications that run on developer workstations, the Databricks SDK for Python provides auth_type='external-browser', which opens a browser so that the user can go through the SSO flow. Azure support is still in an early experimental stage.

from databricks.sdk import WorkspaceClient

host = input('Enter Databricks host: ')

w = WorkspaceClient(host=host, auth_type='external-browser')
clusters = w.clusters.list()

for cl in clusters:
    print(f' - {cl.cluster_name} is {cl.state}')

Creating custom OAuth applications

To use OAuth with the Databricks SDK for Python from your own application, register the application with the account_client.custom_app_integration.create API:

import logging, getpass
from databricks.sdk import AccountClient
account_client = AccountClient(host='https://accounts.cloud.databricks.com',
                               account_id=input('Databricks Account ID: '),
                               username=input('Username: '),
                               password=getpass.getpass('Password: '))

logging.info('Enrolling all published apps...')
account_client.o_auth_enrollment.create(enable_all_published_apps=True)

status = account_client.o_auth_enrollment.get()
logging.info(f'Enrolled all published apps: {status}')

custom_app = account_client.custom_app_integration.create(
    name='awesome-app',
    redirect_urls=[f'https://host.domain/path/to/callback'],
    confidential=True)
logging.info(f'Created new custom app: '
             f'--client_id {custom_app.client_id} '
             f'--client_secret {custom_app.client_secret}')

User Agent Request Attribution

The Databricks SDK for Python uses the User-Agent header to include request metadata along with each request. By default, this includes the version of the Python SDK, the version of the Python language used by your application, and the underlying operating system. To statically add additional metadata, you can use the with_partner() and with_product() functions in the databricks.sdk.useragent module. Partners can use with_partner() to indicate that code using the Databricks SDK for Python should be attributed to a specific partner. Multiple partners can be registered at once. Partner names can contain any letter, digit, ., -, _ or +.

from databricks.sdk import useragent
useragent.with_partner("partner-abc")
useragent.with_partner("partner-xyz")

with_product() can be used to define the name and version of the product that is built with the Databricks SDK for Python. The product name has the same restrictions as the partner name above, and the product version must be a valid SemVer. Subsequent calls to with_product() replace the original product with the new user-specified one.

from databricks.sdk import useragent
useragent.with_product("databricks-example-product", "1.2.0")

If both the DATABRICKS_SDK_UPSTREAM and DATABRICKS_SDK_UPSTREAM_VERSION environment variables are defined, these will also be included in the User-Agent header.

If additional metadata needs to be specified that isn't already supported by the above interfaces, you can use the with_user_agent_extra() function to register arbitrary key-value pairs to include in the user agent. Multiple values associated with the same key are allowed. Keys have the same restrictions as the partner name above. Values must be either as described above or SemVer strings.

Additional User-Agent information can be associated with different instances of Config. To add metadata to a specific Config instance, use its with_user_agent_extra() method.
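The naming rules above can be checked mechanically. The following sketch validates names and versions with regular expressions and joins them into name/version tokens. The token layout, the partner/ key, and the simplified SemVer pattern are assumptions for illustration; the SDK's real User-Agent string also carries the SDK, Python, and OS versions and may order fields differently. build_user_agent is a hypothetical helper, not part of databricks.sdk.useragent.

```python
import re
from typing import Iterable, Tuple

# Letters, digits, '.', '-', '_' and '+', as described above.
NAME_RE = re.compile(r"[0-9A-Za-z_.+-]+")
# Simplified SemVer: MAJOR.MINOR.PATCH with optional pre-release and build parts.
SEMVER_RE = re.compile(r"\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?")


def _check(pattern: "re.Pattern[str]", value: str, what: str) -> str:
    # fullmatch rejects partial matches such as "bad name" or "1.2".
    if not pattern.fullmatch(value):
        raise ValueError(f"invalid {what}: {value!r}")
    return value


def build_user_agent(product: Tuple[str, str],
                     partners: Iterable[str] = (),
                     extras: Iterable[Tuple[str, str]] = ()) -> str:
    name, version = product
    tokens = [f"{_check(NAME_RE, name, 'product name')}/"
              f"{_check(SEMVER_RE, version, 'product version')}"]
    tokens += [f"partner/{_check(NAME_RE, p, 'partner name')}" for p in partners]
    for key, value in extras:
        # Extra values follow the name rules; SemVer strings also satisfy them.
        tokens.append(f"{_check(NAME_RE, key, 'key')}/{_check(NAME_RE, value, 'value')}")
    return " ".join(tokens)
```

For example, build_user_agent(('databricks-example-product', '1.2.0'), partners=['partner-abc', 'partner-xyz']) returns 'databricks-example-product/1.2.0 partner/partner-abc partner/partner-xyz', while a version such as '1.2' is rejected because it is not a full SemVer string.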

Error handling

The Databricks SDK for Python provides a robust error-handling mechanism that allows developers to catch and handle API errors. When an error occurs, the SDK will raise an exception that contains information about the error, such as the HTTP status code, error message, and error details. Developers can catch these exceptions and handle them appropriately in their code.

from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import ResourceDoesNotExist

w = WorkspaceClient()
try:
    w.clusters.get(cluster_id='1234-5678-9012')
except ResourceDoesNotExist as e:
    print(f'Cluster not found: {e}')

The SDK handles inconsistencies in error responses amongst the different services, providing a consistent interface for developers to work with. Simply catch the appropriate exception type and handle the error as needed. The errors returned by the Databricks API are defined in databricks/sdk/errors/platform.py.
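Catching a broad exception type or a specific one works well when specific errors subclass broader ones. The self-contained sketch below assumes such a hierarchy and maps a response's error_code (falling back to the HTTP status) onto it. The class names are borrowed from databricks.sdk.errors for readability, but these are stand-in definitions and the mapping tables are assumptions; the SDK's real definitions live in the platform.py module mentioned above.

```python
from typing import Dict, Mapping, Optional, Type


class DatabricksError(Exception):
    """Stand-in base class carrying the parsed error details."""

    def __init__(self, message: str, error_code: Optional[str] = None,
                 status_code: Optional[int] = None):
        super().__init__(message)
        self.error_code = error_code
        self.status_code = status_code


class NotFound(DatabricksError):
    pass


class ResourceDoesNotExist(NotFound):
    pass


class PermissionDenied(DatabricksError):
    pass


# Hypothetical lookup tables: the specific error_code wins over the HTTP status.
BY_ERROR_CODE: Dict[str, Type[DatabricksError]] = {
    "RESOURCE_DOES_NOT_EXIST": ResourceDoesNotExist,
    "PERMISSION_DENIED": PermissionDenied,
}
BY_STATUS: Dict[int, Type[DatabricksError]] = {404: NotFound, 403: PermissionDenied}


def error_from_response(status_code: int, body: Mapping[str, str]) -> DatabricksError:
    error_code = body.get("error_code")
    cls = BY_ERROR_CODE.get(error_code or "") or BY_STATUS.get(status_code, DatabricksError)
    return cls(body.get("message", ""), error_code=error_code, status_code=status_code)
```

With this layout, except NotFound catches both a bare 404 and the more specific ResourceDoesNotExist, while unknown errors still arrive as the base DatabricksError.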

Logging

The Databricks SDK for Python seamlessly integrates with the standard Logging facility for Python. This allows developers to easily enable and customize logging for their Databricks Python projects. To enable debug logging in your Databricks Python project, you can follow the example below:

import logging, sys
logging.basicConfig(stream=sys.stderr,
                    level=logging.INFO,
                    format='%(asctime)s [%(name)s][%(levelname)s] %(message)s')
logging.getLogger('databricks.sdk').setLevel(logging.DEBUG)

from databricks.sdk import WorkspaceClient
w = WorkspaceClient(debug_truncate_bytes=1024, debug_headers=False)
for cluster in w.clusters.list():
    logging.info(f'Found cluster: {cluster.cluster_name}')

In the above code snippet, the logging module is imported and basicConfig() configures a standard error handler with the root logger level set to INFO, while the databricks.sdk logger is set to DEBUG. This enables debug-level output from the SDK without turning on debug output for every other library. Developers can adjust these levels as needed to control the verbosity of the logging output.

With this configuration, the SDK logs all requests and responses to standard error output, using the format > for requests and < for responses. In some cases, requests or responses may be truncated due to size considerations. If this occurs, the log message includes the text ... (XXX additional elements) to indicate that the request or response has been truncated. To increase the truncation limits, developers can set the debug_truncate_bytes configuration property or the DATABRICKS_DEBUG_TRUNCATE_BYTES environment variable.

To protect sensitive data, such as authentication tokens, passwords, or any HTTP headers, the SDK automatically replaces these values with **REDACTED** in the log output. Developers can disable this redaction by setting the debug_headers configuration property to True.

2023-03-22 21:19:21,702 [databricks.sdk][DEBUG] GET /api/2.0/clusters/list
< 200 OK
< {
<   "clusters": [
<     {
<       "autotermination_minutes": 60,
<       "cluster_id": "1109-115255-s1w13zjj",
<       "cluster_name": "DEFAULT Test Cluster",
<       ... truncated for brevity
<     },
<     "... (47 additional elements)"
<   ]
< }
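The redaction and truncation described above can be pictured as a pass over the JSON body before it is logged. The function below is a hypothetical sketch of that idea, not the SDK's logging code: the set of sensitive key names, the way long strings are cut, and the max_items limit are assumptions, while the **REDACTED** marker and the "... (N additional elements)" wording follow the sample output above.

```python
import json
from typing import Any

# Assumed list of field names whose values should never appear in logs.
SENSITIVE_KEYS = {"token", "access_token", "password", "client_secret", "authorization"}


def sanitize(value: Any, max_bytes: int = 96, max_items: int = 1) -> Any:
    """Return a copy of a JSON-like value that is safe and short enough to log."""
    if isinstance(value, dict):
        return {key: "**REDACTED**" if key.lower() in SENSITIVE_KEYS
                else sanitize(item, max_bytes, max_items)
                for key, item in value.items()}
    if isinstance(value, list):
        kept = [sanitize(item, max_bytes, max_items) for item in value[:max_items]]
        if len(value) > max_items:
            kept.append(f"... ({len(value) - max_items} additional elements)")
        return kept
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) > max_bytes:
            # Cut at the byte limit without leaving half of a multi-byte character.
            return raw[:max_bytes].decode("utf-8", errors="ignore") + "..."
    return value


def log_body(body: Any, max_bytes: int = 96) -> str:
    # Render the sanitized copy; the original body is left unchanged.
    return json.dumps(sanitize(body, max_bytes), indent=2)
```

Working on a copy matters: the caller still receives the full, unredacted response, and only the logged text is shortened.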

Overall, the logging capabilities provided by the Databricks SDK for Python can be a powerful tool for monitoring and troubleshooting your Databricks Python projects. Developers can use the various logging methods and configuration options provided by the SDK to customize the logging output to their specific needs.

Interaction with `dbutils`

You can use the client-side implementation of dbutils by accessing the dbutils property on the WorkspaceClient. Most dbutils.fs operations and dbutils.secrets are implemented natively in Python within the Databricks SDK. The remaining utilities still require a Databricks cluster, which you have to specify through the cluster_id configuration attribute or the DATABRICKS_CLUSTER_ID environment variable. Don't worry if the cluster is not running: internally, the Databricks SDK for Python calls w.clusters.ensure_cluster_is_running().

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
dbutils = w.dbutils

files_in_root = dbutils.fs.ls('/')
print(f'number of files in root: {len(files_in_root)}')

Alternatively, you can import dbutils from the databricks.sdk.runtime module, but you have to make sure that all configuration is already present in environment variables:

from databricks.sdk.runtime import dbutils

for secret_scope in dbutils.secrets.listScopes():
    for secret_metadata in dbutils.secrets.list(secret_scope.name):
        print(f'found {secret_metadata.key} secret in {secret_scope.name} scope')

Interface stability

Databricks is actively working on stabilizing the Databricks SDK for Python's interfaces. API clients for all services are generated from specification files that are synchronized from the main platform. You are highly encouraged to pin the exact dependency version and read the changelog where Databricks documents the changes. Databricks may have minor documented backward-incompatible changes, such as renaming some type names to bring more consistency.

Release history

0.85.0 (this version): Feb 5, 2026
0.84.0: Feb 4, 2026
0.83.0: Feb 3, 2026
0.82.0: Jan 29, 2026
0.81.0: Jan 27, 2026
0.80.0: Jan 22, 2026
0.79.0: Jan 22, 2026
0.78.0: Jan 15, 2026
0.77.0: Jan 6, 2026
0.76.0: Dec 17, 2025
0.75.0: Dec 17, 2025
0.74.0: Dec 10, 2025
0.73.0: Nov 5, 2025
0.72.0: Nov 4, 2025
0.71.0: Oct 30, 2025
0.70.0: Oct 23, 2025
0.69.0: Oct 20, 2025
0.68.0: Oct 14, 2025
0.67.0: Sep 25, 2025
0.66.0: Sep 22, 2025
0.65.0: Sep 2, 2025
0.64.0: Aug 20, 2025
0.63.0: Aug 13, 2025
0.62.0: Aug 6, 2025
0.61.0: Jul 31, 2025
0.60.0: Jul 24, 2025
0.59.0: Jul 17, 2025
0.58.0: Jul 9, 2025
0.57.0: Jun 13, 2025
0.56.0: Jun 5, 2025
0.55.0: May 27, 2025
0.54.0: May 22, 2025
0.53.0: May 13, 2025
0.52.0: May 2, 2025
0.51.0: Apr 30, 2025
0.50.0: Apr 14, 2025
0.49.0: Mar 28, 2025
0.48.0: Mar 27, 2025
0.47.0: Mar 21, 2025
0.46.0: Mar 12, 2025
0.45.0: Mar 7, 2025
0.44.1: Feb 14, 2025
0.44.0: Feb 11, 2025
0.43.0: Feb 4, 2025
0.42.0: Jan 30, 2025
0.41.0: Jan 20, 2025
0.40.0: Dec 19, 2024
0.39.0: Dec 11, 2024
0.38.0: Nov 18, 2024
0.37.0: Nov 14, 2024
0.36.0: Oct 22, 2024
0.35.0: Oct 17, 2024
0.34.0: Oct 7, 2024
0.33.0: Sep 26, 2024
0.32.3: Sep 19, 2024
0.32.2: Sep 17, 2024
0.32.1: Sep 10, 2024
0.32.0: Sep 4, 2024
0.31.1: Aug 28, 2024
0.31.0: Aug 26, 2024
0.30.0: Aug 13, 2024
0.29.0: Jun 27, 2024
0.28.0: May 23, 2024
0.27.1: May 16, 2024
0.27.0: May 3, 2024
0.26.0: Apr 24, 2024
0.25.1: Apr 12, 2024
0.25.0: Apr 12, 2024
0.24.0: Apr 2, 2024
0.23.0: Mar 20, 2024
0.22.0: Mar 15, 2024
0.21.0: Mar 7, 2024
0.20.0: Feb 19, 2024
0.19.1: Feb 15, 2024
0.19.0: Feb 9, 2024
0.18.0: Jan 23, 2024
0.17.0: Jan 11, 2024
0.16.0: Dec 20, 2023
0.15.0: Dec 12, 2023
0.14.0: Nov 29, 2023
0.13.0: Nov 14, 2023
0.12.0: Oct 24, 2023
0.11.0: Oct 12, 2023
0.10.0: Oct 3, 2023
0.9.0: Sep 20, 2023
0.8.0: Sep 4, 2023
0.7.1: Aug 31, 2023
0.7.0: Aug 29, 2023
0.6.0: Aug 17, 2023
0.5.0: Aug 11, 2023
0.4.0: Aug 7, 2023
0.3.1: Aug 2, 2023
0.3.0: Jul 27, 2023
0.2.1: Jul 18, 2023
0.2.0: Jul 18, 2023
0.1.12: Jun 28, 2023
0.1.11: Jun 21, 2023
0.1.10: Jun 15, 2023
0.1.9: Jun 9, 2023
0.1.8: May 22, 2023
0.1.7: May 17, 2023
0.1.6: May 10, 2023
0.1.5: May 8, 2023
0.1.4: May 5, 2023
0.1.3: May 3, 2023
0.1.2: May 3, 2023
0.1.1: Apr 28, 2023
0.1.0: Apr 20, 2023
0.0.7: Mar 29, 2023
0.0.6: Mar 24, 2023

Download files


Source Distribution

databricks_sdk-0.85.0.tar.gz (846.3 kB)

Uploaded Feb 5, 2026 Source

Built Distribution


databricks_sdk-0.85.0-py3-none-any.whl (796.9 kB)

Uploaded Feb 5, 2026 Python 3
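The wheel's file name encodes its compatibility tags. Under the wheel naming convention `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`, the suffix `py3-none-any` means the wheel targets Python 3, depends on no particular ABI, and runs on any platform, which is why a single pure-Python wheel is published for every operating system. The following is a minimal standard-library sketch of that split; `WheelName` and `parse_wheel_name` are hypothetical names, and it does not validate tag values (the `packaging` library's `packaging.utils.parse_wheel_filename` is a maintained alternative).

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]  # the optional build tag; None when absent
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, ver, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, ver, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(dist, ver, build, py, abi, plat)


print(parse_wheel_name("databricks_sdk-0.85.0-py3-none-any.whl"))
# WheelName(distribution='databricks_sdk', version='0.85.0', build=None, python_tag='py3', abi_tag='none', platform_tag='any')
```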

File details

Details for the file databricks_sdk-0.85.0.tar.gz.

File metadata

Download URL: databricks_sdk-0.85.0.tar.gz
Upload date: Feb 5, 2026
Size: 846.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for databricks_sdk-0.85.0.tar.gz
Algorithm	Hash digest
SHA256	`0b5f415fba69ea0c5bfc4d0b21cb3366c6b66f678e78e4b3c94cbcf2e9e0972f`
MD5	`3e26505fa5e02c7f779a57061511d811`
BLAKE2b-256	`7d403941b6919c3854bd107e04be1686b3e0f1ce3ca4fbeea0c7fd81909bd90c`
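The published digests let you confirm that a downloaded archive was not altered in transit. Below is a minimal sketch using the standard library `hashlib` and `hmac` modules; the helper names `sha256_of` and `verify_sha256` and the local file path are illustrative assumptions, not part of the SDK. pip can enforce the same check at install time in its hash-checking mode, for example with a requirements line such as `databricks-sdk==0.85.0 --hash=sha256:<digest>` and `pip install --require-hashes -r requirements.txt`.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest published above for databricks_sdk-0.85.0.tar.gz.
EXPECTED_SDIST_SHA256 = "0b5f415fba69ea0c5bfc4d0b21cb3366c6b66f678e78e4b3c94cbcf2e9e0972f"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest matches the expected hex digest."""
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    # Hypothetical local path: adjust to wherever the archive was downloaded.
    archive = Path("databricks_sdk-0.85.0.tar.gz")
    if archive.exists():
        print("OK" if verify_sha256(archive, EXPECTED_SDIST_SHA256) else "MISMATCH")
```

The same function applies to the wheel by passing its SHA256 digest instead.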


Provenance

The following attestation bundles were made for databricks_sdk-0.85.0.tar.gz:

Publisher: release.yml on databricks/databricks-sdk-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: databricks_sdk-0.85.0.tar.gz
- Subject digest: 0b5f415fba69ea0c5bfc4d0b21cb3366c6b66f678e78e4b3c94cbcf2e9e0972f
- Sigstore transparency entry: 919180156
- Sigstore integration time: Feb 5, 2026
Source repository:
- Permalink: databricks/databricks-sdk-py@ad70a4797c1f99c9df7a41a00e3d07299d9c0417
- Branch / Tag: refs/tags/v0.85.0
- Owner: https://github.com/databricks
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ad70a4797c1f99c9df7a41a00e3d07299d9c0417
- Trigger Event: push

File details

Details for the file databricks_sdk-0.85.0-py3-none-any.whl.

File metadata

Download URL: databricks_sdk-0.85.0-py3-none-any.whl
Upload date: Feb 5, 2026
Size: 796.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for databricks_sdk-0.85.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2a2da176a55d55fb84696e0255520e99e838dd942b97b971dff724041fe00c64`
MD5	`b52ba0aaf7d334a97f8c8c20eddc9d34`
BLAKE2b-256	`e9e81a3292820762a9b48c4774d2f9297b2e2c43319dc4b5d31a585fb76e3a05`


Provenance

The following attestation bundles were made for databricks_sdk-0.85.0-py3-none-any.whl:

Publisher: release.yml on databricks/databricks-sdk-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: databricks_sdk-0.85.0-py3-none-any.whl
- Subject digest: 2a2da176a55d55fb84696e0255520e99e838dd942b97b971dff724041fe00c64
- Sigstore transparency entry: 919180173
- Sigstore integration time: Feb 5, 2026
Source repository:
- Permalink: databricks/databricks-sdk-py@ad70a4797c1f99c9df7a41a00e3d07299d9c0417
- Branch / Tag: refs/tags/v0.85.0
- Owner: https://github.com/databricks
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ad70a4797c1f99c9df7a41a00e3d07299d9c0417
- Trigger Event: push
