A DBAPI 2.0 interface and SQLAlchemy dialect for Databricks interactive clusters.

A thin wrapper around pyhive and pyodbc for creating a DBAPI connection to Databricks Workspace and SQL Analytics clusters. SQL Analytics clusters require the Simba ODBC driver.

It also provides SQLAlchemy dialects, backed by pyhive and pyodbc, for Databricks clusters. Databricks SQL Analytics clusters only support the pyodbc-driven dialect.

Installation

Install using pip:

pip install databricks-dbapi

For SQLAlchemy support install with:

pip install databricks-dbapi[sqlalchemy]

Usage

PyHive

The connect() function returns a pyhive Hive connection object, which internally wraps a thrift connection.

Connecting with http_path, host, and a token:

import os

from databricks_dbapi import hive


token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
http_path = os.environ["DATABRICKS_HTTP_PATH"]


connection = hive.connect(
    host=host,
    http_path=http_path,
    token=token,
)
cursor = connection.cursor()

cursor.execute("SELECT * FROM some_table LIMIT 100")

print(cursor.fetchone())
print(cursor.fetchall())
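
Parameterized queries work through the standard DBAPI interface. pyhive uses the pyformat paramstyle, so bind values are supplied as a mapping (the table and column names below are placeholders):

# Reusing the cursor from above; pyhive's paramstyle is "pyformat",
# so parameters are written as %(name)s and passed in a dict.
cursor.execute(
    "SELECT * FROM some_table WHERE id = %(id)s LIMIT 100",
    {"id": 42},
)
print(cursor.fetchall())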

The pyhive connection also supports asynchronous query execution. This example connects by cluster name instead of HTTP path:

import os

from databricks_dbapi import hive
from TCLIService.ttypes import TOperationState


token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
cluster = os.environ["DATABRICKS_CLUSTER"]


connection = hive.connect(
    host=host,
    cluster=cluster,
    token=token,
)
cursor = connection.cursor()

cursor.execute("SELECT * FROM some_table LIMIT 100", async_=True)

status = cursor.poll().operationState
while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
    logs = cursor.fetch_logs()
    for message in logs:
        print(message)

    # If needed, an asynchronous query can be cancelled at any time with:
    # cursor.cancel()

    status = cursor.poll().operationState

print(cursor.fetchall())
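
As with any DBAPI connection, the cursor and connection should be closed when finished; contextlib.closing keeps that tidy. A sketch reusing the hive.connect() arguments from above:

from contextlib import closing

# Close the connection and cursor even if the query raises.
with closing(hive.connect(host=host, cluster=cluster, token=token)) as connection:
    with closing(connection.cursor()) as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchone())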

ODBC

The ODBC DBAPI requires the Simba ODBC driver.

Connecting with http_path, host, and a token:

import os

from databricks_dbapi import odbc


token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
http_path = os.environ["DATABRICKS_HTTP_PATH"]


connection = odbc.connect(
    host=host,
    http_path=http_path,
    token=token,
    driver_path="/path/to/simba/driver",
)
cursor = connection.cursor()

cursor.execute("SELECT * FROM some_table LIMIT 100")

print(cursor.fetchone())
print(cursor.fetchall())
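
The driver path depends on the platform and where the Simba driver was installed. One option is to make it configurable; DATABRICKS_ODBC_DRIVER_PATH below is an arbitrary environment variable name (the library does not read it itself), and the fallback is only a typical macOS install path:

# Hypothetical environment variable for the Simba driver location;
# databricks_dbapi does not look for this itself.
driver_path = os.environ.get(
    "DATABRICKS_ODBC_DRIVER_PATH",
    "/Library/simba/spark/lib/libsparkodbc_sbu.dylib",  # common macOS location
)

connection = odbc.connect(
    host=host,
    http_path=http_path,
    token=token,
    driver_path=driver_path,
)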

SQLAlchemy Dialects

databricks+pyhive

Installing the package registers the databricks+pyhive dialect with SQLAlchemy. Fill in the placeholder values when building the engine URL.

from sqlalchemy import MetaData, Table, create_engine, func, select


engine = create_engine(
    "databricks+pyhive://token:<databricks_token>@<host>:<port>/<database>",
    connect_args={"http_path": "<cluster_http_path>"}
)

logs = Table("my_table", MetaData(bind=engine), autoload=True)
print(select([func.count("*")], from_obj=logs).scalar())
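
The resulting engine behaves like any other SQLAlchemy engine; for example, raw SQL can be issued directly (engine.execute() matches the SQLAlchemy 1.x style used above):

# Run a raw query through the engine (SQLAlchemy 1.x style).
result = engine.execute("SELECT * FROM my_table LIMIT 10")
for row in result:
    print(row)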

databricks+pyodbc

Installing the package registers the databricks+pyodbc dialect with SQLAlchemy. Fill in the placeholder values when building the engine URL.

from sqlalchemy import MetaData, Table, create_engine, func, select


engine = create_engine(
    "databricks+pyodbc://token:<databricks_token>@<host>:<port>/<database>",
    connect_args={"http_path": "<cluster_http_path>", "driver_path": "/path/to/simba/driver"}
)

logs = Table("my_table", MetaData(bind=engine), autoload=True)
print(select([func.count("*")], from_obj=logs).scalar())
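
Because the token is embedded in the URL, any URL-reserved characters in it must be percent-encoded. A sketch using urllib.parse.quote_plus, with placeholder port and database values:

import os
from urllib.parse import quote_plus

from sqlalchemy import create_engine

# Percent-encode the token in case it contains URL-reserved characters.
token = quote_plus(os.environ["DATABRICKS_TOKEN"])
host = os.environ["DATABRICKS_HOST"]

engine = create_engine(
    "databricks+pyodbc://token:{}@{}:443/default".format(token, host),
    connect_args={
        "http_path": os.environ["DATABRICKS_HTTP_PATH"],
        "driver_path": "/path/to/simba/driver",
    },
)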

Refer to the Databricks documentation for more details on the hostname, cluster name, and HTTP path values.
