harlequin-databricks

A Harlequin adapter for Databricks. Supports connecting to Databricks SQL warehouses or Databricks Runtime (DBR) interactive clusters.

Installation

harlequin-databricks depends on harlequin, so installing this package using any of the methods below will also install harlequin.

Using uv

The recommended way to install harlequin-databricks is using uv:

uv tool install harlequin --with harlequin-databricks

This command will install harlequin with the databricks adapter into an isolated environment and add it to your PATH so you can easily run the executable.

Alternative installation methods

Alternatively, if you know what you're doing, after installing Python 3.10 or above, install harlequin-databricks using pip, pipx, poetry, or any other program that can install Python packages from PyPI. For example:

pip install harlequin-databricks

Connecting to Databricks

To connect to Databricks, provide the following as CLI arguments:

  • server-hostname
  • http-path
  • credentials for one of the following authentication methods:
    • a personal access token (PAT)
    • a username and password
    • an OAuth U2M auth type
    • a service principal client ID and secret for OAuth M2M

Personal Access Token (PAT) authentication:

harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --access-token dapi***

Username and password (basic) authentication:

harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --username *** --password ***

OAuth U2M authentication:

For OAuth user-to-machine (U2M) authentication supply either databricks-oauth or azure-oauth to the --auth-type CLI argument:

harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --auth-type databricks-oauth

OAuth M2M authentication:

OAuth machine-to-machine (M2M) authentication requires databricks-sdk, an optional dependency of harlequin-databricks, so install it first (for example, pip install databricks-sdk). Then supply the --client-id and --client-secret CLI arguments:

harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --client-id *** --client-secret ***

Store an alias for your connection string

We recommend you include an alias for your connection string in your .bash_profile/.zprofile so you can launch harlequin-databricks with a short command like hdb each time.

Run this command (once) to create the alias:

echo 'alias hdb="harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --access-token dapi***"' >> ~/.bash_profile
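
If you would rather not write the token itself into the alias, a small shell function can read the connection details from environment variables instead. This is only a sketch: the HDB_* variable names are hypothetical (chosen for this example, not read by harlequin itself), and the function simply forwards them as the CLI arguments shown above.

```shell
# Hypothetical wrapper around the PAT connection command.
# HDB_SERVER_HOSTNAME, HDB_HTTP_PATH and HDB_ACCESS_TOKEN are illustrative
# names; export them in your shell profile or a secrets manager of your choice.
hdb() {
  # ${VAR:?message} aborts with the message if the variable is unset or empty.
  harlequin -a databricks \
    --server-hostname "${HDB_SERVER_HOSTNAME:?set HDB_SERVER_HOSTNAME}" \
    --http-path "${HDB_HTTP_PATH:?set HDB_HTTP_PATH}" \
    --access-token "${HDB_ACCESS_TOKEN:?set HDB_ACCESS_TOKEN}" \
    "$@"  # forward any extra flags to harlequin unchanged
}
```

Because of the trailing "$@", extra options are passed straight through, so hdb --no-init would launch with the same connection details plus the --no-init option.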

Using Unity Catalog and want fast Data Catalog indexing?

Supply the --skip-legacy-indexing command line flag if you do not care about legacy metastores (e.g. hive_metastore) being indexed in Harlequin's Data Catalog pane.

This flag will skip indexing of old non-Unity Catalog metastores (i.e. they won't appear in the Data Catalog pane with this flag).

Because of the way legacy Databricks metastores work, a separate SQL query is required to fetch the metadata of each table in a legacy metastore. This means indexing them for Harlequin's Data Catalog pane is slow.

Databricks's Unity Catalog upgrade brought Information Schema, which allows harlequin-databricks to fetch metadata for all Unity Catalog assets with only two SQL queries.

So if your Databricks instance is running Unity Catalog, and you no longer care about the legacy metastores, setting the --skip-legacy-indexing CLI flag is recommended as it will mean much faster indexing & refreshing of the assets in the Data Catalog pane.
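
As an illustration, the flag can be appended to whichever connection command you already use. The sketch below assumes OAuth U2M authentication and uses hypothetical placeholder values for the hostname and HTTP path; it prints the resulting command rather than running it.

```shell
# Placeholder values (hypothetical); replace with your own workspace details.
SERVER_HOSTNAME="example.cloud.databricks.com"
HTTP_PATH="/sql/1.0/endpoints/example"

# Build the argument list, ending with the flag that skips legacy metastores.
set -- -a databricks \
  --server-hostname "$SERVER_HOSTNAME" \
  --http-path "$HTTP_PATH" \
  --auth-type databricks-oauth \
  --skip-legacy-indexing

# Dry run: print the command. Remove `echo` to actually launch Harlequin.
echo harlequin "$@"
```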

Initialization Scripts

Each time you start Harlequin, it will execute SQL commands from a Databricks initialization script. For example:

USE CATALOG my_catalog;
SET TIME ZONE 'Asia/Tokyo';
DECLARE yesterday DATE DEFAULT CURRENT_DATE - INTERVAL '1' DAY;

Multi-line SQL is allowed, but must be terminated by a semicolon.

Configuring the Script Location

By default, Harlequin will execute the script found at ~/.databricksrc. However, you can provide a different path using the --init-path option (aliased to -i or -init):

harlequin -a databricks --init-path /path/to/my/script.sql
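
Putting these pieces together, an initialization script could be created from the shell as sketched below. The path ./databricks_init.sql is a hypothetical location, chosen so the example does not overwrite an existing ~/.databricksrc. The DECLARE statement is split across two lines to show a multi-line statement terminated by a semicolon.

```shell
# Hypothetical script location; any readable path can be given to --init-path.
INIT_SCRIPT="./databricks_init.sql"

# The quoted heredoc delimiter ('SQL') stops the shell from expanding anything
# inside the script, so the SQL is written to the file exactly as typed.
cat > "$INIT_SCRIPT" <<'SQL'
USE CATALOG my_catalog;
SET TIME ZONE 'Asia/Tokyo';
DECLARE yesterday DATE
  DEFAULT CURRENT_DATE - INTERVAL '1' DAY;
SQL

# Dry run: print the launch command that points Harlequin at the new script.
echo "harlequin -a databricks --init-path $INIT_SCRIPT"
```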

Disabling Initialization

If you would like to open Harlequin without running the script you have at ~/.databricksrc, you can either pass a nonexistent path (or /dev/null) to the option above, or start Harlequin with the --no-init option:

harlequin -a databricks --no-init

Other CLI options:

For more details on other command line options, run:

harlequin --help

For more information, see the harlequin-databricks Docs.

Issues, Contributions and Feature Requests

Please report bugs/issues with this adapter via the GitHub issues page. You are welcome to attempt fixes yourself by forking this repo then opening a PR.

For feature suggestions, please post in the discussions.

Special thanks to...

Ted Conbeer, Josh Temple & Tyler Hillery.
