databricks-dbapi

A DBAPI 2.0 interface and SQLAlchemy dialect for Databricks interactive clusters.


A thin wrapper around pyhive for creating a DBAPI connection to an interactive Databricks cluster.

Also provides a SQLAlchemy Dialect for Databricks interactive clusters.

Installation

Install using pip:

pip install databricks-dbapi

For SQLAlchemy support, install with:

pip install databricks-dbapi[sqlalchemy]
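
To confirm the install, the imports below should succeed (a minimal sanity check; no cluster is contacted):

# A failed install raises ImportError here; nothing connects to Databricks.
from databricks_dbapi import databricks
import sqlalchemy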

Usage

The connect() function returns a pyhive Hive connection object, which internally wraps a Thrift connection.

Using a Databricks API token (recommended):

import os

from databricks_dbapi import databricks


token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
cluster = os.environ["DATABRICKS_CLUSTER"]


connection = databricks.connect(
    host=host,
    cluster=cluster,
    token=token,
)
cursor = connection.cursor()

cursor.execute("SELECT * FROM some_table LIMIT 100")

print(cursor.fetchone())
print(cursor.fetchall())
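
The cursor follows DBAPI 2.0, so column metadata is available on cursor.description once a query has executed; a minimal sketch:

# cursor.description is a sequence of 7-item tuples per DBAPI 2.0;
# the first element of each tuple is the column name.
column_names = [col[0] for col in cursor.description]
print(column_names)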

Using your username and password (not recommended):

import os

from databricks_dbapi import databricks


user = os.environ["DATABRICKS_USER"]
password = os.environ["DATABRICKS_PASSWORD"]
host = os.environ["DATABRICKS_HOST"]
cluster = os.environ["DATABRICKS_CLUSTER"]


connection = databricks.connect(
    host=host,
    cluster=cluster,
    user=user,
    password=password,
)
cursor = connection.cursor()

cursor.execute("SELECT * FROM some_table LIMIT 100")

print(cursor.fetchone())
print(cursor.fetchall())
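
The cursor also accepts bound parameters via pyhive's pyformat paramstyle, which is safer than interpolating values into the SQL string. A minimal sketch, where some_table and id are placeholder names:

# Values are passed as a dict and bound by name using the DBAPI
# "pyformat" paramstyle that pyhive implements.
cursor.execute(
    "SELECT * FROM some_table WHERE id = %(id)s LIMIT 10",
    {"id": 42},
)
print(cursor.fetchall())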

Connecting on the Azure platform, or with an explicit http_path:

import os

from databricks_dbapi import databricks


token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
http_path = os.environ["DATABRICKS_HTTP_PATH"]


connection = databricks.connect(
    host=host,
    http_path=http_path,
    token=token,
)
cursor = connection.cursor()

cursor.execute("SELECT * FROM some_table LIMIT 100")

print(cursor.fetchone())
print(cursor.fetchall())

The pyhive connection also supports asynchronous query execution:

import os

from databricks_dbapi import databricks
from TCLIService.ttypes import TOperationState


token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
cluster = os.environ["DATABRICKS_CLUSTER"]


connection = databricks.connect(
    host=host,
    cluster=cluster,
    token=token,
)
cursor = connection.cursor()

cursor.execute("SELECT * FROM some_table LIMIT 100", async_=True)

status = cursor.poll().operationState
while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
    logs = cursor.fetch_logs()
    for message in logs:
        print(message)

    # If needed, an asynchronous query can be cancelled at any time with:
    # cursor.cancel()

    status = cursor.poll().operationState

print(cursor.fetchall())
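
Whichever connection style is used, close the cursor and connection once the results are consumed; a minimal sketch:

# Release the cursor and the underlying Thrift transport.
cursor.close()
connection.close()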

SQLAlchemy

Once the databricks-dbapi package is installed, the databricks+pyhive dialect/driver is registered with SQLAlchemy. Fill in the required information in the engine URL.

from sqlalchemy import MetaData, Table, create_engine, func, select


# Standard Databricks with user + password
# provide user, password, company name for url, database name, cluster name
engine = create_engine(
    "databricks+pyhive://<user>:<password>@<companyname>.cloud.databricks.com:443/<database>",
    connect_args={"cluster": "<cluster>"}
)

# Standard Databricks with token
# provide token, company name for url, database name, cluster name;
# the username portion of the URL is the literal string "token"
engine = create_engine(
    "databricks+pyhive://token:<databricks_token>@<companyname>.cloud.databricks.com:443/<database>",
    connect_args={"cluster": "<cluster>"}
)

# Azure Databricks with user + password
# provide user, password, region for url, database name, http_path (with cluster name)
engine = create_engine(
    "databricks+pyhive://<user>:<password>@<region>.azuredatabricks.net:443/<database>",
    connect_args={"http_path": "<azure_databricks_http_path>"}
)

# Azure Databricks with token
# provide token, region for url, database name, http_path (with cluster name)
engine = create_engine(
    "databricks+pyhive://token:<databrickstoken>@<region>.azuredatabricks.net:443/<database>",
    connect_args={"http_path": "<azure_databricks_http_path>"}
)


# Reflect an existing table ("my_table" is a placeholder) and count its rows
logs = Table("my_table", MetaData(bind=engine), autoload=True)
print(select([func.count("*")], from_obj=logs).scalar())
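
The engine can also run ad-hoc SQL directly in SQLAlchemy 1.x style; a minimal sketch, with my_table again a placeholder:

# Execute a raw statement through the engine and fetch the single value.
result = engine.execute("SELECT COUNT(*) FROM my_table")
print(result.scalar())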

Refer to the Databricks documentation for more details on hostname, cluster name, and HTTP path.
