A DBAPI 2.0 interface and SQLAlchemy dialect for Databricks interactive clusters.

## Project description

A thin wrapper around pyhive and pyodbc for creating a DBAPI connection to Databricks Workspace and SQL Analytics clusters. SQL Analytics clusters require the Simba ODBC driver.

Also provides SQLAlchemy dialects using pyhive and pyodbc for Databricks clusters. Databricks SQL Analytics clusters only support the pyodbc-driven dialect.
## Installation

Install using pip. You must specify at least one of the extras (hive or odbc). The odbc extra additionally requires the Simba ODBC driver:

```
pip install databricks-dbapi[hive,odbc]
```

For SQLAlchemy support, install with:

```
pip install databricks-dbapi[hive,odbc,sqlalchemy]
```
## Usage

### PyHive

The `connect()` function returns a pyhive Hive connection object, which internally wraps a thrift connection.

Connecting with http_path, host, and a token:

```python
import os

from databricks_dbapi import hive


token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
http_path = os.environ["DATABRICKS_HTTP_PATH"]

connection = hive.connect(
    host=host,
    http_path=http_path,
    token=token,
)
cursor = connection.cursor()

cursor.execute("SELECT * FROM some_table LIMIT 100")

print(cursor.fetchone())
print(cursor.fetchall())
```
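Each fetched row comes back as a plain tuple. When dict-shaped rows are more convenient, the column names can be read from the standard DBAPI `cursor.description` attribute. A minimal sketch; `FakeCursor` is a stand-in for a live Databricks cursor, not part of this package:

```python
def rows_as_dicts(cursor):
    """Zip each row tuple with the column names from cursor.description."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


class FakeCursor:
    """Stand-in for a live DBAPI cursor, for illustration only."""
    # Per DBAPI 2.0, each description entry is a 7-tuple; name comes first.
    description = [
        ("id", "int", None, None, None, None, None),
        ("name", "string", None, None, None, None, None),
    ]

    def fetchall(self):
        return [(1, "a"), (2, "b")]


print(rows_as_dicts(FakeCursor()))
# [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
```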
The pyhive connection also provides async functionality:

```python
import os

from databricks_dbapi import hive
from TCLIService.ttypes import TOperationState


token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
cluster = os.environ["DATABRICKS_CLUSTER"]

connection = hive.connect(
    host=host,
    cluster=cluster,
    token=token,
)
cursor = connection.cursor()

cursor.execute("SELECT * FROM some_table LIMIT 100", async_=True)

status = cursor.poll().operationState
while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
    logs = cursor.fetch_logs()
    for message in logs:
        print(message)

    # If needed, an asynchronous query can be cancelled at any time with:
    # cursor.cancel()

    status = cursor.poll().operationState

print(cursor.fetchall())
```
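The loop above polls forever if the query never leaves a running state. One way to bound the wait is a generic polling helper; a minimal sketch under assumptions, where the `poll` callable and the string state names are stand-ins, not the `TOperationState` API:

```python
import time


def wait_for_finish(poll, running_states, interval=0.5, timeout=60.0):
    """Call poll() until its result leaves running_states or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        state = poll()
        if state not in running_states:
            return state
        if time.monotonic() > deadline:
            raise TimeoutError("query did not finish in time")
        time.sleep(interval)


# Simulated query that reports RUNNING twice, then FINISHED.
states = iter(["RUNNING", "RUNNING", "FINISHED"])
print(wait_for_finish(lambda: next(states), {"INITIALIZED", "RUNNING"}, interval=0.01))
# FINISHED
```

With a real cursor, `poll` would be `lambda: cursor.poll().operationState` and `running_states` the two `TOperationState` members from the loop above.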
### ODBC

The ODBC DBAPI requires the Simba ODBC driver.

Connecting with http_path, host, and a token:
```python
import os

from databricks_dbapi import odbc


token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
http_path = os.environ["DATABRICKS_HTTP_PATH"]

connection = odbc.connect(
    host=host,
    http_path=http_path,
    token=token,
    driver_path="/path/to/simba/driver",
)
cursor = connection.cursor()

cursor.execute("SELECT * FROM some_table LIMIT 100")

print(cursor.fetchone())
print(cursor.fetchall())
```
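Under the hood, pyodbc connections are driven by a `Key=Value;...` connection string assembled from attributes like these. A minimal sketch of that assembly; the attribute names here are illustrative, and the real keys accepted by the Simba driver are documented by the driver itself:

```python
def build_odbc_connection_string(attrs):
    """Join key/value pairs into the 'Key=Value;...' form pyodbc accepts."""
    return ";".join(f"{key}={value}" for key, value in attrs.items())


conn_str = build_odbc_connection_string({
    "Driver": "/path/to/simba/driver",           # illustrative keys only;
    "Host": "example.cloud.databricks.com",      # consult the Simba ODBC
    "Port": 443,                                 # driver docs for the
    "HTTPPath": "sql/protocolv1/o/0/my-cluster", # actual attribute names
})
print(conn_str)
```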
## SQLAlchemy Dialects

### databricks+pyhive

Installing registers the databricks+pyhive dialect/driver with SQLAlchemy. Fill in the required information when passing the engine URL.

```python
from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *


engine = create_engine(
    "databricks+pyhive://token:<databricks_token>@<host>:<port>/<database>",
    connect_args={"http_path": "<cluster_http_path>"}
)

logs = Table("my_table", MetaData(bind=engine), autoload=True)
print(select([func.count("*")], from_obj=logs).scalar())
```
### databricks+pyodbc

Installing registers the databricks+pyodbc dialect/driver with SQLAlchemy. Fill in the required information when passing the engine URL.

```python
from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *


engine = create_engine(
    "databricks+pyodbc://token:<databricks_token>@<host>:<port>/<database>",
    connect_args={"http_path": "<cluster_http_path>", "driver_path": "/path/to/simba/driver"}
)

logs = Table("my_table", MetaData(bind=engine), autoload=True)
print(select([func.count("*")], from_obj=logs).scalar())
```
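Databricks tokens can contain characters with special meaning in URLs, so they should be percent-encoded before being interpolated into the engine URL. A stdlib-only sketch of assembling the URL; the host and database values are placeholders:

```python
from urllib.parse import quote


def databricks_url(driver, token, host, port, database):
    """Assemble a SQLAlchemy engine URL, percent-encoding the token."""
    return (
        f"databricks+{driver}://token:{quote(token, safe='')}"
        f"@{host}:{port}/{database}"
    )


print(databricks_url("pyhive", "dapi/abc+123", "example.cloud.databricks.com", 443, "default"))
# databricks+pyhive://token:dapi%2Fabc%2B123@example.cloud.databricks.com:443/default
```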
Refer to the Databricks documentation for more details on hostname, cluster name, and HTTP path.
## File details

Details for the file databricks_dbapi-0.6.0.tar.gz.

File metadata:

- Download URL: databricks_dbapi-0.6.0.tar.gz
- Upload date:
- Size: 9.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.6 CPython/3.7.4 Darwin/20.1.0

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 8789f375bc866f04c82a720e28f125f895cdc86c7d146eb4b05cec807b1313d1 |
| MD5 | 6371551200805e24e3cd2970295880c1 |
| BLAKE2b-256 | f71bd2ce1c8f8c83cd7f9ebb4f98d9bccdc080edf4325686b3e29582f6f4eb91 |
## File details

Details for the file databricks_dbapi-0.6.0-py2.py3-none-any.whl.

File metadata:

- Download URL: databricks_dbapi-0.6.0-py2.py3-none-any.whl
- Upload date:
- Size: 9.9 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.6 CPython/3.7.4 Darwin/20.1.0

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | b0bf0dc008c58aa0b60fa10f23b448388e4eebb54128e8a70c842fa7e8338053 |
| MD5 | 0fb435f6ae09f6a58fa9d21492254226 |
| BLAKE2b-256 | c040362b5058f657bbeff0b4b991105023e4c4a6285a3c57f9cd647673f15b4a |