A DBAPI 2.0 interface and SQLAlchemy dialect for Databricks interactive clusters.
Project description
A thin wrapper around pyhive for creating a DBAPI connection to an interactive Databricks cluster.
Also provides a SQLAlchemy Dialect for Databricks interactive clusters.
Installation
Install using pip:
pip install databricks-dbapi
For SQLAlchemy support install with:
pip install databricks-dbapi[sqlalchemy]
Usage
The connect() function returns a pyhive Hive connection object, which internally wraps a thrift connection.
Using a Databricks API token (recommended):
import os
from databricks_dbapi import databricks
token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
cluster = os.environ["DATABRICKS_CLUSTER"]
connection = databricks.connect(
host=host,
cluster=cluster,
token=token,
)
cursor = connection.cursor()
cursor.execute("SELECT * FROM some_table LIMIT 100")
print(cursor.fetchone())
print(cursor.fetchall())
Using your username and password (not recommended):
import os
from databricks_dbapi import databricks
user = os.environ["DATABRICKS_USER"]
password = os.environ["DATABRICKS_PASSWORD"]
host = os.environ["DATABRICKS_HOST"]
cluster = os.environ["DATABRICKS_CLUSTER"]
connection = databricks.connect(
host=host,
cluster=cluster,
user=user,
password=password
)
cursor = connection.cursor()
cursor.execute("SELECT * FROM some_table LIMIT 100")
print(cursor.fetchone())
print(cursor.fetchall())
Connecting on Azure platform, or with http_path:
import os
from databricks_dbapi import databricks
token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
http_path = os.environ["DATABRICKS_HTTP_PATH"]
connection = databricks.connect(
host=host,
http_path=http_path,
token=token,
)
cursor = connection.cursor()
cursor.execute("SELECT * FROM some_table LIMIT 100")
print(cursor.fetchone())
print(cursor.fetchall())
The pyhive connection also provides async functionality:
import os
from databricks_dbapi import databricks
from TCLIService.ttypes import TOperationState
token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
cluster = os.environ["DATABRICKS_CLUSTER"]
connection = databricks.connect(
host=host,
cluster=cluster,
token=token,
)
cursor = connection.cursor()
cursor.execute("SELECT * FROM some_table LIMIT 100", async_=True)
status = cursor.poll().operationState
while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
logs = cursor.fetch_logs()
for message in logs:
print(message)
# If needed, an asynchronous query can be cancelled at any time with:
# cursor.cancel()
status = cursor.poll().operationState
print(cursor.fetchall())
SQLAlchemy
Once the databricks-dbapi package is installed, the databricks+pyhive dialect/driver will be registered to SQLAlchemy. Fill in the required information when passing the engine URL.
from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *
# Standard Databricks with user + password
# provide user, password, company name for url, database name, cluster name
engine = create_engine(
"databricks+pyhive://<user>:<password>@<companyname>.cloud.databricks.com:443/<database>",
connect_args={"cluster": "<cluster>"}
)
# Standard Databricks with token
# provide token, company name for url, database name, cluster name
engine = create_engine(
"databricks+pyhive://token:<databricks_token>@<companyname>.cloud.databricks.com:443/<database>",
connect_args={"cluster": "<cluster>"}
)
# Azure Databricks with user + password
# provide user, password, region for url, database name, http_path (with cluster name)
engine = create_engine(
"databricks+pyhive://<user>:<password>@<region>.azuredatabricks.net:443/<database>",
connect_args={"http_path": "<azure_databricks_http_path>"}
)
# Azure Databricks with token
# provide token, region for url, database name, http_path (with cluster name)
engine = create_engine(
"databricks+pyhive://token:<databrickstoken>@<region>.azuredatabricks.net:443/<database>",
connect_args={"http_path": "<azure_databricks_http_path>"}
)
logs = Table("my_table", MetaData(bind=engine), autoload=True)
print(select([func.count("*")], from_obj=logs).scalar())
Refer to the following documentation for more details on hostname, cluster name, and http path:
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file databricks_dbapi-0.4.0.tar.gz.
File metadata
- Download URL: databricks_dbapi-0.4.0.tar.gz
- Upload date:
- Size: 7.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.4 CPython/3.7.4 Darwin/20.1.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f952ea3722944217ea9632fbf56a95cc6d1ae4bab3131b190df9257271973c1e
|
|
| MD5 |
e803b5809b7bea05e2d0ce9a17db399c
|
|
| BLAKE2b-256 |
a79e22e1676919aeaa6167f534fdd98faecf18939f9531fd2b0ce54d5e7d1b4a
|
File details
Details for the file databricks_dbapi-0.4.0-py2.py3-none-any.whl.
File metadata
- Download URL: databricks_dbapi-0.4.0-py2.py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.4 CPython/3.7.4 Darwin/20.1.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
85e26e4b18b5f6f442f0b76ba5510c661e1e025a8dc9330bff53554b55dd452d
|
|
| MD5 |
05692c06a4c42a9598cb0454fb11ffff
|
|
| BLAKE2b-256 |
3b89ac61843d4eafb83da666357bc30b9374a2a9d66ede25bc9a42026ab456e6
|