Databricks DBAPI.
A thin wrapper around pyhive for creating a DBAPI connection to an interactive Databricks cluster.
Installation
Install with pip:
pip install databricks-dbapi
Usage
The connect() function returns a pyhive Hive connection object, which internally wraps a Thrift connection.
Using a Databricks API token (recommended):
import os
from databricks_dbapi import databricks
token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
cluster = os.environ["DATABRICKS_CLUSTER"]
connection = databricks.connect(
host=host,
cluster=cluster,
token=token,
)
cursor = connection.cursor()
cursor.execute("SELECT * FROM some_table LIMIT 100")
print(cursor.fetchone())
print(cursor.fetchall())
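Since connect() returns a DBAPI connection, generic DBAPI helpers should work with it. The sketch below turns query results into dictionaries keyed by column name, using the standard cursor.description attribute. The helper name fetch_dicts is hypothetical and not part of this package, and the in-memory sqlite3 connection is only a stand-in so the example runs without a cluster.

```python
import sqlite3
from contextlib import closing


def fetch_dicts(connection, sql):
    """Run a query and return each row as a dict keyed by column name."""
    with closing(connection.cursor()) as cursor:
        cursor.execute(sql)
        # DBAPI cursors expose column metadata in cursor.description;
        # the first item of each entry is the column name.
        columns = [entry[0] for entry in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in connection (assumption: any DBAPI connection behaves the same way).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE some_table (id INTEGER, name TEXT)")
demo.execute("INSERT INTO some_table VALUES (1, 'a'), (2, 'b')")
rows = fetch_dicts(demo, "SELECT id, name FROM some_table ORDER BY id")
print(rows)
```

With a Databricks connection, the same call would be fetch_dicts(connection, "SELECT * FROM some_table LIMIT 100"). Column names are used exactly as the server reports them.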
Using your username and password (not recommended):
import os
from databricks_dbapi import databricks
user = os.environ["DATABRICKS_USER"]
password = os.environ["DATABRICKS_PASSWORD"]
host = os.environ["DATABRICKS_HOST"]
cluster = os.environ["DATABRICKS_CLUSTER"]
connection = databricks.connect(
host=host,
cluster=cluster,
user=user,
password=password
)
cursor = connection.cursor()
cursor.execute("SELECT * FROM some_table LIMIT 100")
print(cursor.fetchone())
print(cursor.fetchall())
Connecting on the Azure platform, or with http_path:
import os
from databricks_dbapi import databricks
token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
http_path = os.environ["DATABRICKS_HTTP_PATH"]
connection = databricks.connect(
host=host,
http_path=http_path,
token=token,
)
cursor = connection.cursor()
cursor.execute("SELECT * FROM some_table LIMIT 100")
print(cursor.fetchone())
print(cursor.fetchall())
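The three examples above differ only in which keyword arguments they pass to connect(). A minimal sketch of choosing them from the environment follows. The helper name connect_kwargs is hypothetical, and the precedence rules (http_path before cluster, token before username and password) are assumptions made for illustration, not behaviour of this package.

```python
def connect_kwargs(env):
    """Build keyword arguments for databricks.connect() from a mapping."""
    kwargs = {"host": env["DATABRICKS_HOST"]}

    # Address the cluster either by http_path or by cluster name
    # (assumed precedence: http_path wins if both are set).
    if env.get("DATABRICKS_HTTP_PATH"):
        kwargs["http_path"] = env["DATABRICKS_HTTP_PATH"]
    elif env.get("DATABRICKS_CLUSTER"):
        kwargs["cluster"] = env["DATABRICKS_CLUSTER"]
    else:
        raise KeyError("set DATABRICKS_HTTP_PATH or DATABRICKS_CLUSTER")

    # Prefer the API token; fall back to username and password.
    if env.get("DATABRICKS_TOKEN"):
        kwargs["token"] = env["DATABRICKS_TOKEN"]
    else:
        kwargs["user"] = env["DATABRICKS_USER"]
        kwargs["password"] = env["DATABRICKS_PASSWORD"]
    return kwargs


# Placeholder values, used only to show the shape of the result.
example = connect_kwargs({
    "DATABRICKS_HOST": "example-host",
    "DATABRICKS_HTTP_PATH": "example/http/path",
    "DATABRICKS_TOKEN": "example-token",
})
print(sorted(example))
```

With the package installed and the variables set, the result can be passed straight through: connection = databricks.connect(**connect_kwargs(os.environ)).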
The pyhive connection also provides async functionality:
import os
from databricks_dbapi import databricks
from TCLIService.ttypes import TOperationState
token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
cluster = os.environ["DATABRICKS_CLUSTER"]
connection = databricks.connect(
host=host,
cluster=cluster,
token=token,
)
cursor = connection.cursor()
cursor.execute("SELECT * FROM some_table LIMIT 100", async_=True)
status = cursor.poll().operationState
while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
    logs = cursor.fetch_logs()
    for message in logs:
        print(message)
    # If needed, an asynchronous query can be cancelled at any time with:
    # cursor.cancel()
    status = cursor.poll().operationState
print(cursor.fetchall())
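The polling loop above runs until the query finishes, however long that takes. One possible variation, sketched here, bounds the wait and cancels the query when a deadline passes. The function wait_for_query and the FakeCursor class are hypothetical and exist only for illustration. The sketch relies solely on the poll() and cancel() calls shown above, and takes the set of "still running" states as a parameter so that it runs without pyhive installed.

```python
import time
from types import SimpleNamespace


def wait_for_query(cursor, running_states, poll_interval=1.0, timeout=60.0):
    """Poll an async query until it leaves running_states.

    Returns the final operation state. If the timeout elapses first,
    the query is cancelled and TimeoutError is raised.
    """
    deadline = time.monotonic() + timeout
    status = cursor.poll().operationState
    while status in running_states:
        if time.monotonic() >= deadline:
            cursor.cancel()
            raise TimeoutError("query still running after %.1f s" % timeout)
        time.sleep(poll_interval)
        status = cursor.poll().operationState
    return status


class FakeCursor:
    """Stand-in cursor: reports 'running' twice, then 'finished'."""

    def __init__(self):
        self._states = ["running", "running", "finished"]
        self.cancelled = False

    def poll(self):
        # Consume queued states, then keep repeating the last one.
        state = self._states.pop(0) if len(self._states) > 1 else self._states[0]
        return SimpleNamespace(operationState=state)

    def cancel(self):
        self.cancelled = True


final = wait_for_query(FakeCursor(), {"running"}, poll_interval=0.0)
print(final)
```

Against a real cursor, running_states would be (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE), as in the loop above.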