databricks-app-utils

Utilities for building apps on Databricks: settings, authentication, SQL client, and query registry.

A lightweight Python library for building Streamlit apps on Databricks. It handles everything that sits below the business logic: reading configuration, authenticating with Databricks, executing SQL, and loading query files. Application code should depend on these abstractions rather than touching the Databricks connector directly.

License: GPL-3.0


Modules at a glance

Module                 Class / function               Responsibility
settings.py            AppSettings                    Reads all configuration from environment variables / .env
auth.py                DatabricksAuth, build_auth()   Translates settings into an auth value object
databricks_client.py   DatabricksClient               Executes SQL queries against a Databricks SQL Warehouse
query_registry.py      QueryRegistry, SqlQuery        Loads and caches .sql files from a Python package

Settings management

AppSettings is a Pydantic Settings model. It reads every value from environment variables and optionally from a .env file in the working directory. Unknown variables are silently ignored.
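
The shape of the model is roughly the following sketch (illustrative, assuming pydantic-settings v2; the real class covers every variable in the table below):

from enum import Enum

from pydantic_settings import BaseSettings, SettingsConfigDict

class AuthMethod(str, Enum):
    PAT = "pat"
    U2M = "u2m"
    OBO = "obo"

class AppSettingsSketch(BaseSettings):
    # extra="ignore" is what makes unknown environment variables harmless
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    # field names map to the upper-cased environment variables listed below
    databricks_server_hostname: str
    databricks_http_path: str
    databricks_auth_method: AuthMethod = AuthMethod.OBO
    databricks_pat: str | None = None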

Environment variables

Variable                       Required                Default         Description
DATABRICKS_SERVER_HOSTNAME     ✅                      –               Workspace hostname, e.g. adb-xxx.azuredatabricks.net (no https://)
DATABRICKS_HTTP_PATH           ✅                      –               Warehouse HTTP path, e.g. /sql/1.0/warehouses/…
DATABRICKS_AUTH_METHOD                                 obo             pat | u2m | obo
DATABRICKS_PAT                 ✅ if auth_method=pat   –               Personal access token
DATABRICKS_DEFAULT_CATALOG                             None            Applied as USE CATALOG before each query
DATABRICKS_DEFAULT_SCHEMA                              None            Applied as USE SCHEMA before each query
DATABRICKS_CONNECT_TIMEOUT_S                           30              Connection timeout in seconds
DATABRICKS_RETRY_ATTEMPTS                              1               Extra attempts on transient failures
DATABRICKS_RETRY_BACKOFF_S                             0.5             Initial backoff between retries (doubles each attempt)
QUERY_TAG                                              streamlit-app   Prepended as a SQL comment: /* streamlit-app */

.env file (recommended for local development)

Create a .env file in the project root (never commit it):

DATABRICKS_SERVER_HOSTNAME=adb-1234567890123456.7.azuredatabricks.net
DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/abcdef1234567890
DATABRICKS_AUTH_METHOD=u2m
DATABRICKS_DEFAULT_CATALOG=my_catalog
DATABRICKS_DEFAULT_SCHEMA=my_schema

For PAT authentication, add:

DATABRICKS_AUTH_METHOD=pat
DATABRICKS_PAT=dapi0123456789abcdef

Usage

from databricks_app_utils.settings import AppSettings

settings = AppSettings()
print(settings.databricks_server_hostname)
print(settings.databricks_auth_method)   # AuthMethod.U2M

In a Streamlit app, wrap it with @st.cache_resource so settings are read only once per server process:

import streamlit as st

@st.cache_resource
def get_settings() -> AppSettings:
    return AppSettings()

Authentication

See docs/authentication.md for a full technical deep-dive. The summary is:

Method   DATABRICKS_AUTH_METHOD   Best for
PAT      pat                      CI/CD, service accounts
U2M      u2m                      Local development (browser OAuth, zero secrets)
OBO      obo                      Deployed Databricks Apps

Usage

build_auth() converts settings into a DatabricksAuth value object. You rarely need to call it directly — DatabricksClient takes one as a constructor argument.

from databricks_app_utils.settings import AppSettings
from databricks_app_utils.auth import build_auth

settings = AppSettings()
auth = build_auth(settings)

PAT

DATABRICKS_AUTH_METHOD=pat
DATABRICKS_PAT=dapi0123456789abcdef
auth = build_auth(settings)
# auth.method  == AuthMethod.PAT
# auth.access_token == "dapi…"

U2M (browser OAuth — recommended for local dev)

DATABRICKS_AUTH_METHOD=u2m

No secrets needed. On the first query, a browser window opens for the user to log in. Subsequent queries within the same server process reuse the cached token silently.

auth = build_auth(settings)
# auth.method         == AuthMethod.U2M
# auth.oauth_persistence  ← in-memory OAuthPersistenceCache, held for process lifetime

OBO (Databricks Apps)

DATABRICKS_AUTH_METHOD=obo

The token is read from the X-Forwarded-Access-Token request header on every query. The token provider must be injected at runtime from the Streamlit layer:

auth = DatabricksAuth(
    method=AuthMethod.OBO,
    token_provider=lambda: st.context.headers["X-Forwarded-Access-Token"],
)

Database interface

DatabricksClient is the single interface for all SQL execution. It opens a short-lived connection per query (robust against warehouse idle timeouts) and applies USE CATALOG / USE SCHEMA automatically when defaults are configured.
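
Conceptually, each call does something like the following sketch built directly on databricks-sql-connector (illustrative; the real client also handles the auth modes and retries described elsewhere in this README):

from databricks import sql

def run_query(settings, access_token, statement):
    # a fresh connection per statement cannot go stale while the
    # warehouse is auto-stopped between user interactions
    with sql.connect(
        server_hostname=settings.databricks_server_hostname,
        http_path=settings.databricks_http_path,
        access_token=access_token,
    ) as connection:
        with connection.cursor() as cursor:
            if settings.databricks_default_catalog:
                cursor.execute(f"USE CATALOG {settings.databricks_default_catalog}")
            if settings.databricks_default_schema:
                cursor.execute(f"USE SCHEMA {settings.databricks_default_schema}")
            cursor.execute(statement)
            return cursor.fetchall()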

Query methods

Method                                          Returns            Use when
query_polars(sql, params)                       polars.DataFrame   You need a DataFrame for display or transformation
query_pandas(sql, params)                       pandas.DataFrame   Interoperability with pandas-based libraries
query(sql, params)                              list[dict]         Lightweight lookups; no Arrow overhead
merge_dataframe(df, target_table, id_columns)   None               Upsert a DataFrame into a Delta table

Named parameters

Use :name syntax in SQL. Lists are automatically expanded for IN clauses:

db.query_polars(
    "SELECT * FROM orders WHERE status = :status AND region IN :regions",
    params={"status": "shipped", "regions": ["EU", "US"]},
)
# Executes: SELECT * FROM orders WHERE status = ? AND region IN (?, ?)
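
The rewrite itself is straightforward; a minimal sketch of the idea (not the library's actual code):

import re

def expand_params(sql, params):
    # Rewrite :name placeholders to qmarks, in order of appearance,
    # expanding lists and tuples into parenthesised (?, ?, ...) groups.
    args = []

    def replace(match):
        value = params[match.group(1)]
        if isinstance(value, (list, tuple)):
            args.extend(value)
            return "(" + ", ".join("?" for _ in value) + ")"
        args.append(value)
        return "?"

    return re.sub(r":(\w+)", replace, sql), args

Applied to the query above, this yields the qmark form shown in the comment together with the argument list ["shipped", "EU", "US"].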

Polars query

from databricks_app_utils.databricks_client import DatabricksClient

db = DatabricksClient(settings=settings, auth=auth)  # settings and auth from the sections above

df = db.query_polars("SELECT id, name FROM customers LIMIT :n", params={"n": 100})
# Returns a polars.DataFrame

Pandas query

df = db.query_pandas("SELECT id, name FROM customers LIMIT :n", params={"n": 100})
# Returns a pandas.DataFrame

Plain dict query

rows = db.query("SELECT state, COUNT(*) AS cnt FROM customers GROUP BY state")
# Returns [{"state": "CA", "cnt": 1234}, …]

Upsert (MERGE)

Merge a DataFrame into a Delta table using one or more identity columns:

import polars as pl

updates = pl.DataFrame({"id": [1, 2], "score": [9.5, 7.1]})

db.merge_dataframe(
    df=updates,
    target_table="customer_scores",
    id_columns=["id"],
)

Optionally, supply a version_column for optimistic locking — rows whose version has changed since the data was read are silently skipped:

db.merge_dataframe(
    df=updates,
    target_table="customer_scores",
    id_columns=["id"],
    version_column="updated_at",
)
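
Conceptually, the version column becomes an extra predicate on the MATCHED branch, so a row whose version changed since it was read simply fails to match. Illustrative SQL only, not the exact statement the client generates:

MERGE INTO customer_scores AS t
USING updates AS s
  ON t.id = s.id
WHEN MATCHED AND t.updated_at = s.updated_at
  THEN UPDATE SET *
WHEN NOT MATCHED
  THEN INSERT *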

Retry behaviour

DatabricksClient retries failed queries with exponential backoff. Configure via settings:

DATABRICKS_RETRY_ATTEMPTS=2      # 2 extra attempts (3 total)
DATABRICKS_RETRY_BACKOFF_S=1.0   # 1 s, then 2 s
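
The loop itself is equivalent to this sketch (illustrative):

import time

def call_with_retries(run, attempts, backoff_s):
    # attempts = extra tries after the first; the delay doubles each time
    delay = backoff_s
    for tries_left in range(attempts, -1, -1):
        try:
            return run()
        except Exception:  # the real client retries transient errors only
            if tries_left == 0:
                raise
            time.sleep(delay)
            delay *= 2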

Wiring it up in Streamlit

@st.cache_resource
def get_db() -> DatabricksClient:
    settings = get_settings()
    auth = build_auth(settings)
    return DatabricksClient(settings=settings, auth=auth)

Query registry

QueryRegistry loads .sql files from a Python package directory at runtime and caches them in memory. This keeps SQL out of Python source files and makes queries easy to find, review, and test independently.

File layout

SQL files live under a queries package inside your app, organised into sub-packages:

src/<your_app>/queries/
├── __init__.py
└── customers/
    ├── list_customers.sql
    ├── list_customers_by_state.sql
    └── list_states.sql
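
For example, list_customers_by_state.sql might contain something like this (hypothetical contents, using the named-parameter syntax described above):

SELECT customerid, first_name, last_name, state
FROM customers
WHERE state IN :states
LIMIT :limit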

Loading a query

from databricks_app_utils.query_registry import QueryRegistry

registry = QueryRegistry(package="your_app.queries")
q = registry.get("customers/list_customers")

print(q.name)   # "customers/list_customers"
print(q.sql)    # "SELECT customerid, first_name …\n"

The registry is lazy — a file is read from disk only on first access, then cached for the lifetime of the instance.
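
The caching behaviour is roughly equivalent to this sketch built on importlib.resources (illustrative; it returns the raw SQL string rather than a SqlQuery):

from importlib import resources

class MiniRegistry:
    def __init__(self, package):
        self._package = package
        self._cache = {}

    def get(self, name):
        if name not in self._cache:  # hit the filesystem only on first access
            subpackage, _, stem = name.rpartition("/")
            package = f"{self._package}.{subpackage}" if subpackage else self._package
            sql = resources.files(package).joinpath(f"{stem}.sql").read_text()
            self._cache[name] = sql
        return self._cache[name]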

Passing a query to DatabricksClient

q = registry.get("customers/list_customers_by_state")
df = db.query_polars(q.sql, params={"states": ["CA", "NY"], "limit": 200})

Wiring it up in Streamlit

@st.cache_resource
def get_queries() -> QueryRegistry:
    return QueryRegistry(package="your_app.queries")
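
Putting the three cached resources together, a minimal page might look like this (assuming the query file from the layout above exists):

import streamlit as st

st.title("Customers")

db = get_db()
queries = get_queries()

q = queries.get("customers/list_customers")
st.dataframe(db.query_polars(q.sql))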

Why GPL-3.0?

We believe in open source software and want to ensure that improvements to this library remain open and available to everyone. The GPL-3.0 license guarantees that all derivatives and modifications stay free and open source.


Made with ❤️ by the contributors
