REST client for Databricks

databricks-client

About

A REST client for the Databricks REST API.

This module is a thin layer for building HTTP requests. It does not expose API operations as distinct methods; instead, it exposes generic methods for building API calls.

The Databricks API sometimes returns a 200 status code with HTML content when the request is not properly authenticated. The client intercepts such responses (by detecting non-JSON content) and wraps them in an exception.
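
The interception described above can be pictured with a small sketch. This is not the library's actual code: the exception class `NonJsonResponseError` and the function `parse_api_response` are hypothetical names chosen for illustration, and only the standard library is used.

```python
import json


class NonJsonResponseError(Exception):
    """Hypothetical exception for responses whose body is not JSON."""


def parse_api_response(status_code, body):
    """Return the decoded JSON body, or raise if the body is not JSON.

    The status code alone is not trusted: an unauthenticated request may
    come back as 200 with an HTML page instead of a JSON document.
    """
    try:
        return json.loads(body)
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        raise NonJsonResponseError(
            "HTTP {}: expected JSON, got: {!r}".format(status_code, body[:80])
        ) from exc
```

With this sketch, `parse_api_response(200, "<html>Login</html>")` raises `NonJsonResponseError`, while a valid JSON body is returned as a Python object.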

This open-source project is neither developed by nor affiliated with Databricks.

Installing

pip install databricks-client

Usage

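import databricks_client

# Base URL of the workspace REST API (region-specific host)
client = databricks_client.create("https://northeurope.azuredatabricks.net/api/2.0")
# pat_token holds a Databricks personal access token that you supply
client.auth_pat_token(pat_token)
# Wait until the workspace API is reachable
client.ensure_available()
clusters_list = client.get('clusters/list')
for cluster in clusters_list["clusters"]:
    print(cluster)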

Usage with a newly provisioned workspace

If using this module as part of a provisioning job, you need to call client.ensure_available().

When the first user logs in to a new Databricks workspace, workspace provisioning is triggered, and the API is not available until that job has completed (this usually takes under a minute, but can take longer depending on the network configuration). Until then, calling the API returns an error such as the following:

"Succeeded{"error_code":"INVALID_PARAMETER_VALUE","message":"Unknown worker environment WorkerEnvId(workerenv-4312344789891641)"}

The method client.ensure_available(url="instance-pools/list", retries=100, delay_seconds=6) prevents this error by attempting to connect to the provided URL, retrying as long as the workspace is in the provisioning state or until the given number of retries is exhausted.
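
The retry behaviour described above can be sketched as follows. This is an illustration of the general technique, not the library's implementation: `wait_until_available` and its `probe` argument (a zero-argument callable that performs the request and returns the response body as text) are hypothetical, and matching on the text "Unknown worker environment" is an assumption based on the error shown above.

```python
import time


def wait_until_available(probe, retries=100, delay_seconds=6, sleep=time.sleep):
    """Call probe() until it stops reporting a provisioning error.

    Returns the number of attempts that were needed. Raises TimeoutError
    if every attempt still reports the provisioning error.
    """
    for attempt in range(retries):
        body = probe()
        # Assumption: a workspace that is still provisioning answers with
        # the "Unknown worker environment" message.
        if "Unknown worker environment" not in body:
            return attempt + 1
        sleep(delay_seconds)
    raise TimeoutError(
        "workspace still provisioning after {} attempts".format(retries)
    )
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting; in normal use the default `time.sleep` applies.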

Usage with Azure Active Directory

Note: Azure AD authentication for Databricks is currently in preview.

The client generates short-lived Azure AD tokens. If you need to use your client for longer than the token lifetime (typically 30 minutes), rerun client.auth_azuread periodically.
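
One way to rerun the authentication periodically is a small wrapper that tracks when it last authenticated. The sketch below is an assumption, not part of the library: the `PeriodicReauth` class is hypothetical, and the 25-minute default is an arbitrary margin chosen to stay under the typical 30-minute lifetime.

```python
import time


class PeriodicReauth:
    """Re-run an authentication callable once a time budget has elapsed."""

    def __init__(self, authenticate, max_age_seconds=25 * 60, clock=time.monotonic):
        self._authenticate = authenticate        # e.g. a call to client.auth_azuread
        self._max_age_seconds = max_age_seconds  # assumed margin, not a library value
        self._clock = clock
        self._last_auth = None

    def ensure_fresh(self):
        """Authenticate on first use, then again whenever the budget is used up."""
        now = self._clock()
        if self._last_auth is None or now - self._last_auth >= self._max_age_seconds:
            self._authenticate()
            self._last_auth = now
```

For example, you could create `reauth = PeriodicReauth(lambda: client.auth_azuread(resource_group="my-rg", workspace_name="my-workspace"))` and call `reauth.ensure_fresh()` before each batch of API calls.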

Azure AD authentication with Azure CLI

Install the Azure CLI.

pip install databricks-client[azurecli]
az login

import databricks_client

client = databricks_client.create("https://northeurope.azuredatabricks.net/api/2.0")
client.auth_azuread("/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg/providers/Microsoft.Databricks/workspaces/my-workspace")
# or client.auth_azuread(resource_group="my-rg", workspace_name="my-workspace")
client.ensure_available()
clusters_list = client.get('clusters/list')
for cluster in clusters_list["clusters"]:
    print(cluster)

This approach is recommended for Azure DevOps Pipelines that use the Azure CLI task.

Azure AD authentication with ADAL

pip install databricks-client
pip install adal

import databricks_client
import adal

# tenant_id, client_id and client_secret are placeholders for your Azure AD application's values
authority_host_uri = 'https://login.microsoftonline.com'
authority_uri = authority_host_uri + '/' + tenant_id
context = adal.AuthenticationContext(authority_uri)

def token_callback(resource):
    return context.acquire_token_with_client_credentials(resource, client_id, client_secret)["accessToken"]

client = databricks_client.create("https://northeurope.azuredatabricks.net/api/2.0")
client.auth_azuread("/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg/providers/Microsoft.Databricks/workspaces/my-workspace", token_callback)
# or client.auth_azuread(resource_group="my-rg", workspace_name="my-workspace", token_callback=token_callback)
client.ensure_available()
clusters_list = client.get('clusters/list')
for cluster in clusters_list["clusters"]:
    print(cluster)

Example usages

Generating a PAT token

response = client.post(
    'token/create',
    json={"lifetime_seconds": 60, "comment": "Unit Test Token"}
)
pat_token = response['token_value']

Uploading a notebook

import base64

with open(notebook_file, "rb") as f:
    file_content = f.read()

client.post(
    'workspace/import',
    json={
        "content": base64.b64encode(file_content).decode('ascii'),
        "path": notebook_path,
        "overwrite": False,
        "language": "PYTHON",
        "format": "SOURCE"
    }
)
