harlequin-databricks
A Harlequin adapter for Databricks. Supports connecting to Databricks SQL warehouses or Databricks Runtime (DBR) interactive clusters.
Installation
harlequin-databricks depends on harlequin, so installing this package will also install Harlequin.
Using pip
To install this adapter into an activated virtual environment:
pip install harlequin-databricks
Using poetry
poetry add harlequin-databricks
Using pipx
If you do not already have Harlequin installed:
pipx install harlequin-databricks
If you would like to add the Databricks adapter to an existing Harlequin installation:
pipx inject harlequin harlequin-databricks
As an Extra
Alternatively, you can install Harlequin with the databricks extra:
pip install "harlequin[databricks]"
poetry add "harlequin[databricks]"
pipx install "harlequin[databricks]"
Usage and Configuration
A minimal connection requires three options:
- server-hostname
- http-path
- access-token
harlequin -a databricks --server-hostname my_databricks.cloud.databricks.com --http-path /sql/1.0/endpoints/1234567890abcdef --access-token dapi***
Authentication is also possible using a username and password (known as basic authentication):
harlequin -a databricks --server-hostname my_databricks.cloud.databricks.com --http-path /sql/1.0/endpoints/1234567890abcdef --username my_user --password my_pass
Or by using OAuth user-to-machine (U2M) authentication:
harlequin -a databricks --server-hostname my_databricks.cloud.databricks.com --http-path /sql/1.0/endpoints/1234567890abcdef --auth-type databricks-oauth
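For token authentication, the connection details can also be read from environment variables so the token is not typed into each command. The wrapper below is a minimal sketch of that idea: the function name hq_databricks and the variable names DATABRICKS_HOST, DATABRICKS_HTTP_PATH, DATABRICKS_TOKEN and HARLEQUIN_BIN are assumptions made for this example, not names the adapter reads itself. Only the command line flags come from the examples above.

```shell
# Hypothetical wrapper: reads connection details from environment variables
# (names chosen for this example) and passes them to the flags shown above.
hq_databricks() {
  # Stop early with a readable message if any value is missing.
  if [ -z "${DATABRICKS_HOST:-}" ] || [ -z "${DATABRICKS_HTTP_PATH:-}" ] || [ -z "${DATABRICKS_TOKEN:-}" ]; then
    echo "set DATABRICKS_HOST, DATABRICKS_HTTP_PATH and DATABRICKS_TOKEN first" >&2
    return 1
  fi
  # HARLEQUIN_BIN is an assumption for dry runs; it defaults to the real harlequin command.
  "${HARLEQUIN_BIN:-harlequin}" -a databricks \
    --server-hostname "$DATABRICKS_HOST" \
    --http-path "$DATABRICKS_HTTP_PATH" \
    --access-token "$DATABRICKS_TOKEN" \
    "$@"
}
```

Any extra arguments given to hq_databricks are appended unchanged, so other adapter flags can still be passed. Setting HARLEQUIN_BIN=echo prints the arguments instead of launching Harlequin, which is a quick way to check the values.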
For more details on command line options, run:
harlequin --help
For more information, see the harlequin-databricks Docs.
Using Unity Catalog and want fast Data Catalog indexing?
Supply the --skip-legacy-indexing command line flag if you do not care about legacy metastores (e.g. hive_metastore) being indexed in Harlequin's Data Catalog pane.
This flag will skip indexing of old non-Unity Catalog metastores (i.e. they won't appear in the Data Catalog pane with this flag).
Because of the way legacy Databricks metastores work, a separate SQL query is required to fetch the metadata of each table in a legacy metastore. This means indexing them for Harlequin's Data Catalog pane is slow.
Databricks's Unity Catalog upgrade brought Information Schema, which allows harlequin-databricks to fetch metadata for all Unity Catalog assets with only two SQL queries.
So if your Databricks instance is running Unity Catalog and you no longer care about the legacy metastores, setting the --skip-legacy-indexing CLI flag is recommended, as it makes indexing and refreshing of the assets in the Data Catalog pane much faster.
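As an illustration, the flag is simply appended to an ordinary connection command. The sketch below assembles such a command from the placeholder hostname and HTTP path used in the earlier examples, together with OAuth authentication; the variable names host, http_path and cmd are assumptions for this example, and the values need to be replaced with those of your own workspace.

```shell
# Placeholder connection details copied from the examples above; replace with your own.
host="my_databricks.cloud.databricks.com"
http_path="/sql/1.0/endpoints/1234567890abcdef"

# Assemble the command as a string so it can be reviewed before running.
cmd="harlequin -a databricks --server-hostname $host --http-path $http_path --auth-type databricks-oauth --skip-legacy-indexing"

# Print the command; paste the printed line into a terminal to launch Harlequin.
echo "$cmd"
```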
Issues, Contributions and Feature Requests
Please report bugs and issues with this adapter via the GitHub issues page. You are welcome to attempt fixes yourself by forking this repo and then opening a PR.
For feature suggestions, please post in the discussions.
Special thanks to...
Download files
Hashes for harlequin_databricks-0.2.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | a00f4c2ba23ae1674ab5b26eb33554e13dbddacce42f7c57993d912d5b912bbd
MD5 | b11a3585314ab6924972c7bc6f1b6e82
BLAKE2b-256 | 64d1b07f5ec1efe0fae9fa72ad4def7fca5508864aa812859fb5a8840f8867e3
Hashes for harlequin_databricks-0.2.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | accf59f2eafb3407c9137708b49453cc12ce8d0d03dd0dee754392c8b988f625
MD5 | 94913a851423bdb3d7dd9809e9b1ab83
BLAKE2b-256 | 63db4cdc9d0fb77b5ec0f143d08a2157c20b57559c92dc5a640dfeb2d2df1b3f