A DBAPI 2.0 interface and SQLAlchemy dialect for Databricks interactive clusters.
A thin wrapper around pyhive for creating a DBAPI connection to an interactive Databricks cluster.
Also provides a SQLAlchemy Dialect for Databricks interactive clusters.
Installation
Install using pip:
pip install databricks-dbapi
For SQLAlchemy support, install with:
pip install databricks-dbapi[sqlalchemy]
Usage
The connect() function returns a pyhive Hive connection object, which internally wraps a thrift connection.
Using a Databricks API token (recommended):
import os
from databricks_dbapi import databricks
token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
cluster = os.environ["DATABRICKS_CLUSTER"]
connection = databricks.connect(
    host=host,
    cluster=cluster,
    token=token,
)
cursor = connection.cursor()
cursor.execute("SELECT * FROM some_table LIMIT 100")
print(cursor.fetchone())
print(cursor.fetchall())
Using your username and password (not recommended):
import os
from databricks_dbapi import databricks
user = os.environ["DATABRICKS_USER"]
password = os.environ["DATABRICKS_PASSWORD"]
host = os.environ["DATABRICKS_HOST"]
cluster = os.environ["DATABRICKS_CLUSTER"]
connection = databricks.connect(
    host=host,
    cluster=cluster,
    user=user,
    password=password,
)
cursor = connection.cursor()
cursor.execute("SELECT * FROM some_table LIMIT 100")
print(cursor.fetchone())
print(cursor.fetchall())
Connecting on the Azure platform, or with an http_path:
import os
from databricks_dbapi import databricks
token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
http_path = os.environ["DATABRICKS_HTTP_PATH"]
connection = databricks.connect(
    host=host,
    http_path=http_path,
    token=token,
)
cursor = connection.cursor()
cursor.execute("SELECT * FROM some_table LIMIT 100")
print(cursor.fetchone())
print(cursor.fetchall())
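Because the cursor follows DBAPI 2.0, `cursor.description` exposes column metadata after an execute. A small helper (hypothetical, not part of this package) can pair that metadata with `fetchall()` results to produce dicts keyed by column name:

```python
def rows_to_dicts(description, rows):
    """Convert DBAPI result rows to dicts using cursor.description.

    Per DBAPI 2.0, each entry in `description` is a 7-item sequence
    whose first element is the column name.
    """
    columns = [col[0] for col in description]
    return [dict(zip(columns, row)) for row in rows]
```

Usage would look like `rows_to_dicts(cursor.description, cursor.fetchall())` after any `cursor.execute(...)` call.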
The pyhive connection also provides async functionality:
import os
from databricks_dbapi import databricks
from TCLIService.ttypes import TOperationState
token = os.environ["DATABRICKS_TOKEN"]
host = os.environ["DATABRICKS_HOST"]
cluster = os.environ["DATABRICKS_CLUSTER"]
connection = databricks.connect(
    host=host,
    cluster=cluster,
    token=token,
)
cursor = connection.cursor()
cursor.execute("SELECT * FROM some_table LIMIT 100", async_=True)
status = cursor.poll().operationState
while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
    logs = cursor.fetch_logs()
    for message in logs:
        print(message)

    # If needed, an asynchronous query can be cancelled at any time with:
    # cursor.cancel()

    status = cursor.poll().operationState

print(cursor.fetchall())
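The polling loop above can be wrapped in a reusable helper with an interval and a timeout. This is a sketch of one way to do it, not part of the package; `poll` is any zero-argument callable returning an object with an `operationState` attribute (such as `cursor.poll`):

```python
import time


def wait_for_terminal_state(poll, terminal_states, interval=1.0, timeout=300.0):
    """Poll until the operation reaches a terminal state, or raise TimeoutError.

    poll            -- zero-argument callable returning a response with .operationState
    terminal_states -- collection of states that mean the query is done
    interval        -- seconds to sleep between polls
    timeout         -- overall deadline in seconds
    """
    deadline = time.monotonic() + timeout
    while True:
        state = poll().operationState
        if state in terminal_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("query did not reach a terminal state in time")
        time.sleep(interval)
```

With pyhive you might call it as `wait_for_terminal_state(cursor.poll, {TOperationState.FINISHED_STATE, TOperationState.ERROR_STATE})` before fetching results.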
SQLAlchemy
Once the databricks-dbapi package is installed, the databricks+pyhive dialect/driver is registered with SQLAlchemy. Fill in the required information when building the engine URL.
from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *
# Standard Databricks with user + password
# provide user, password, company name for url, database name, cluster name
engine = create_engine(
    "databricks+pyhive://<user>:<password>@<companyname>.cloud.databricks.com:443/<database>",
    connect_args={"cluster": "<cluster>"}
)

# Standard Databricks with token
# provide token, company name for url, database name, cluster name
engine = create_engine(
    "databricks+pyhive://token:<databricks_token>@<companyname>.cloud.databricks.com:443/<database>",
    connect_args={"cluster": "<cluster>"}
)

# Azure Databricks with user + password
# provide user, password, region for url, database name, http_path (with cluster name)
engine = create_engine(
    "databricks+pyhive://<user>:<password>@<region>.azuredatabricks.net:443/<database>",
    connect_args={"http_path": "<azure_databricks_http_path>"}
)

# Azure Databricks with token
# provide token, region for url, database name, http_path (with cluster name)
engine = create_engine(
    "databricks+pyhive://token:<databrickstoken>@<region>.azuredatabricks.net:443/<database>",
    connect_args={"http_path": "<azure_databricks_http_path>"}
)

logs = Table("my_table", MetaData(bind=engine), autoload=True)
print(select([func.count("*")], from_obj=logs).scalar())
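Since the engine URLs above follow one pattern, assembling them can be factored into a small helper. The function below is a hypothetical convenience, not part of the package; note that real credentials containing URL-reserved characters would need percent-encoding, which this sketch omits:

```python
def databricks_url(host, database, token=None, user=None, password=None, port=443):
    """Build a databricks+pyhive SQLAlchemy URL (hypothetical helper).

    Uses token auth when `token` is given, otherwise user/password.
    Credentials are interpolated verbatim; percent-encode them if they
    contain characters reserved in URLs.
    """
    if token is not None:
        credentials = "token:{}".format(token)
    else:
        credentials = "{}:{}".format(user, password)
    return "databricks+pyhive://{}@{}:{}/{}".format(credentials, host, port, database)
```

For example, `create_engine(databricks_url("<companyname>.cloud.databricks.com", "<database>", token="<databricks_token>"), connect_args={"cluster": "<cluster>"})` mirrors the token example above.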
Refer to the Databricks documentation for more details on hostname, cluster name, and http_path.
Hashes for databricks_dbapi-0.4.0-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 85e26e4b18b5f6f442f0b76ba5510c661e1e025a8dc9330bff53554b55dd452d
MD5 | 05692c06a4c42a9598cb0454fb11ffff
BLAKE2b-256 | 3b89ac61843d4eafb83da666357bc30b9374a2a9d66ede25bc9a42026ab456e6