# databricks-app-utils

Utilities for building apps on Databricks: settings, authentication, SQL client, and query registry.
A lightweight Python library for building Streamlit apps on Databricks. It handles everything that sits below the business logic: reading configuration, authenticating with Databricks, executing SQL, and loading query files. Application code should depend on these abstractions rather than touching the Databricks connector directly.
License: GPL-3.0
## Modules at a glance

| Module | Class / function | Responsibility |
|---|---|---|
| `settings.py` | `AppSettings` | Reads all configuration from environment variables / `.env` |
| `auth.py` | `DatabricksAuth`, `build_auth()` | Translates settings into an auth value object |
| `databricks_client.py` | `DatabricksClient` | Executes SQL queries against a Databricks SQL Warehouse |
| `query_registry.py` | `QueryRegistry`, `SqlQuery` | Loads and caches `.sql` files from a Python package |
## Settings management
`AppSettings` is a Pydantic Settings model. It reads every value from environment variables and, optionally, from a `.env` file in the working directory. Unknown variables are silently ignored.
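That behaviour corresponds to a pydantic-settings model configured roughly like this (a minimal sketch, not the library's actual class; fields and types are abbreviated):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class AppSettingsSketch(BaseSettings):
    # Field names map 1:1 (case-insensitively) to the environment variables
    # below; values come from the process environment or a local .env file.
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    databricks_server_hostname: str
    databricks_http_path: str
    databricks_auth_method: str = "obo"
    databricks_retry_attempts: int = 1
```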
### Environment variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `DATABRICKS_SERVER_HOSTNAME` | ✅ | — | `adb-xxx.azuredatabricks.net` (no `https://`) |
| `DATABRICKS_HTTP_PATH` | ✅ | — | `/sql/1.0/warehouses/…` |
| `DATABRICKS_AUTH_METHOD` | | `obo` | `pat` \| `u2m` \| `obo` |
| `DATABRICKS_PAT` | ✅ if `auth_method=pat` | — | Personal access token |
| `DATABRICKS_DEFAULT_CATALOG` | | `None` | Applied as `USE CATALOG` before each query |
| `DATABRICKS_DEFAULT_SCHEMA` | | `None` | Applied as `USE SCHEMA` before each query |
| `DATABRICKS_CONNECT_TIMEOUT_S` | | `30` | Connection timeout in seconds |
| `DATABRICKS_RETRY_ATTEMPTS` | | `1` | Extra attempts on transient failures |
| `DATABRICKS_RETRY_BACKOFF_S` | | `0.5` | Initial backoff between retries (doubles each attempt) |
| `QUERY_TAG` | | `streamlit-app` | Prepended as a SQL comment: `/* streamlit-app */` |
### .env file (recommended for local development)

Create a `.env` file in the project root (never commit it):

```env
DATABRICKS_SERVER_HOSTNAME=adb-1234567890123456.7.azuredatabricks.net
DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/abcdef1234567890
DATABRICKS_AUTH_METHOD=u2m
DATABRICKS_DEFAULT_CATALOG=my_catalog
DATABRICKS_DEFAULT_SCHEMA=my_schema
```
For PAT authentication, set these instead:

```env
DATABRICKS_AUTH_METHOD=pat
DATABRICKS_PAT=dapi0123456789abcdef
```
### Usage

```python
from databricks_app_utils.settings import AppSettings

settings = AppSettings()
print(settings.databricks_server_hostname)
print(settings.databricks_auth_method)  # AuthMethod.U2M
```
In a Streamlit app, wrap it with `@st.cache_resource` so settings are read only once per server process:

```python
import streamlit as st

@st.cache_resource
def get_settings() -> AppSettings:
    return AppSettings()
```
## Authentication

See docs/authentication.md for a full technical deep-dive. The summary:

| Method | `DATABRICKS_AUTH_METHOD` | Best for |
|---|---|---|
| PAT | `pat` | CI/CD, service accounts |
| U2M | `u2m` | Local development: browser OAuth, zero secrets |
| OBO | `obo` | Deployed Databricks Apps |
### Usage

`build_auth()` converts settings into a `DatabricksAuth` value object. You rarely need to call it directly; `DatabricksClient` takes one as a constructor argument.

```python
from databricks_app_utils.settings import AppSettings
from databricks_app_utils.auth import build_auth

settings = AppSettings()
auth = build_auth(settings)
```
#### PAT

```env
DATABRICKS_AUTH_METHOD=pat
DATABRICKS_PAT=dapi0123456789abcdef
```

```python
auth = build_auth(settings)
# auth.method == AuthMethod.PAT
# auth.access_token == "dapi…"
```
#### U2M (browser OAuth, recommended for local dev)

```env
DATABRICKS_AUTH_METHOD=u2m
```

No secrets needed. On the first query, a browser window opens for the user to log in. Subsequent queries within the same server process reuse the cached token silently.

```python
auth = build_auth(settings)
# auth.method == AuthMethod.U2M
# auth.oauth_persistence ← in-memory OAuthPersistenceCache, held for process lifetime
```
#### OBO (Databricks Apps)

```env
DATABRICKS_AUTH_METHOD=obo
```

The token is read from the `X-Forwarded-Access-Token` request header on every query. The token provider must be injected at runtime from the Streamlit layer:

```python
import streamlit as st

# AuthMethod is assumed here to live alongside DatabricksAuth in the auth module.
from databricks_app_utils.auth import AuthMethod, DatabricksAuth

auth = DatabricksAuth(
    method=AuthMethod.OBO,
    token_provider=lambda: st.context.headers["X-Forwarded-Access-Token"],
)
```
## Database interface

`DatabricksClient` is the single interface for all SQL execution. It opens a short-lived connection per query (robust against warehouse idle timeouts) and applies `USE CATALOG` / `USE SCHEMA` automatically when defaults are configured.
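For intuition, the per-query lifecycle looks roughly like the following sketch, written directly against the databricks-sql-connector (the real client layers parameter binding, retries, and Arrow-based fetches on top):

```python
from databricks import sql

def run_query_sketch(settings, access_token: str, statement: str) -> list[dict]:
    # A fresh connection per query means an idle-stopped warehouse can never
    # leave the app holding a dead connection.
    with sql.connect(
        server_hostname=settings.databricks_server_hostname,
        http_path=settings.databricks_http_path,
        access_token=access_token,
    ) as conn, conn.cursor() as cursor:
        # Session defaults are applied before the actual statement.
        if settings.databricks_default_catalog:
            cursor.execute(f"USE CATALOG {settings.databricks_default_catalog}")
        if settings.databricks_default_schema:
            cursor.execute(f"USE SCHEMA {settings.databricks_default_schema}")
        cursor.execute(statement)
        columns = [desc[0] for desc in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
```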
### Query methods

| Method | Returns | Use when |
|---|---|---|
| `query_polars(sql, params)` | `polars.DataFrame` | You need a DataFrame for display or transformation |
| `query_pandas(sql, params)` | `pandas.DataFrame` | Interoperability with pandas-based libraries |
| `query(sql, params)` | `list[dict]` | Lightweight lookups; no Arrow overhead |
| `merge_dataframe(df, target_table, id_columns)` | `None` | Upsert a DataFrame into a Delta table |
### Named parameters

Use `:name` syntax in SQL. Lists are automatically expanded for `IN` clauses:

```python
db.query_polars(
    "SELECT * FROM orders WHERE status = :status AND region IN :regions",
    params={"status": "shipped", "regions": ["EU", "US"]},
)
# Executes: SELECT * FROM orders WHERE status = ? AND region IN (?, ?)
```
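For illustration, the rewrite from named to positional parameters can be implemented along these lines (a sketch of the idea, not the library's actual code):

```python
import re

def expand_named_params(sql: str, params: dict) -> tuple[str, list]:
    # Rewrite :name placeholders to positional ?, expanding lists for IN clauses.
    bound: list = []

    def substitute(match: re.Match) -> str:
        value = params[match.group(1)]
        if isinstance(value, (list, tuple)):
            bound.extend(value)
            return "(" + ", ".join(["?"] * len(value)) + ")"
        bound.append(value)
        return "?"

    rewritten = re.sub(r":(\w+)", substitute, sql)
    return rewritten, bound

sql, args = expand_named_params(
    "SELECT * FROM orders WHERE status = :status AND region IN :regions",
    {"status": "shipped", "regions": ["EU", "US"]},
)
# sql  == "SELECT * FROM orders WHERE status = ? AND region IN (?, ?)"
# args == ["shipped", "EU", "US"]
```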
### Polars query

```python
from databricks_app_utils.databricks_client import DatabricksClient

# `db` is a DatabricksClient instance (see "Wiring it up in Streamlit" below).
df = db.query_polars("SELECT id, name FROM customers LIMIT :n", params={"n": 100})
# Returns a polars.DataFrame
```
### Pandas query

```python
df = db.query_pandas("SELECT id, name FROM customers LIMIT :n", params={"n": 100})
# Returns a pandas.DataFrame
```
### Plain dict query

```python
rows = db.query("SELECT state, COUNT(*) AS cnt FROM customers GROUP BY state")
# Returns [{"state": "CA", "cnt": 1234}, …]
```
### Upsert (MERGE)

Merge a DataFrame into a Delta table using one or more identity columns:

```python
import polars as pl

updates = pl.DataFrame({"id": [1, 2], "score": [9.5, 7.1]})
db.merge_dataframe(
    df=updates,
    target_table="customer_scores",
    id_columns=["id"],
)
```
Optionally, supply a `version_column` for optimistic locking; rows whose version has changed since the data was read are silently skipped:

```python
db.merge_dataframe(
    df=updates,
    target_table="customer_scores",
    id_columns=["id"],
    version_column="updated_at",
)
```
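Conceptually, the optimistic-locking variant maps to a Delta `MERGE` whose `WHEN MATCHED` clause also requires the version to be unchanged. A hypothetical builder, to make the semantics concrete (`build_merge_sql` is not part of the library):

```python
def build_merge_sql(target_table: str, id_columns: list[str],
                    value_columns: list[str],
                    version_column: str | None = None) -> str:
    # Hypothetical sketch: `s` is a staged view of the DataFrame being merged.
    on = " AND ".join(f"t.{c} = s.{c}" for c in id_columns)
    # With a version column, only rows whose version is unchanged are updated;
    # rows that changed underneath are silently skipped.
    guard = f"t.{version_column} = s.{version_column}" if version_column else "TRUE"
    assignments = ", ".join(f"t.{c} = s.{c}" for c in value_columns)
    cols = ", ".join(id_columns + value_columns)
    vals = ", ".join(f"s.{c}" for c in id_columns + value_columns)
    return (
        f"MERGE INTO {target_table} t USING staged_updates s ON {on} "
        f"WHEN MATCHED AND {guard} THEN UPDATE SET {assignments} "
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})"
    )
```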
### Retry behaviour

`DatabricksClient` retries failed queries with exponential backoff. Configure via settings:

```env
DATABRICKS_RETRY_ATTEMPTS=2    # 2 extra attempts (3 total)
DATABRICKS_RETRY_BACKOFF_S=1.0 # 1 s, then 2 s
```
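The semantics of the two settings, as a minimal sketch (the actual retry logic also decides which errors count as transient):

```python
import time

def with_retries(run, attempts: int = 2, backoff_s: float = 1.0):
    # `attempts` extra tries after the first; the wait doubles each time
    # (with the values above: 1 s, then 2 s).
    for attempt in range(attempts + 1):
        try:
            return run()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(backoff_s * 2 ** attempt)
```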
### Wiring it up in Streamlit

```python
@st.cache_resource
def get_db() -> DatabricksClient:
    settings = get_settings()
    auth = build_auth(settings)
    return DatabricksClient(settings=settings, auth=auth)
```
## Query registry

`QueryRegistry` loads `.sql` files from a Python package directory at runtime and caches them in memory. This keeps SQL out of Python source files and makes queries easy to find, review, and test independently.
### File layout

SQL files live under a `queries` sub-package inside your app and are organised into sub-packages:

```
src/<your_app>/queries/
├── __init__.py
└── customers/
    ├── list_customers.sql
    ├── list_customers_by_state.sql
    └── list_states.sql
```
### Loading a query

```python
from databricks_app_utils.query_registry import QueryRegistry

registry = QueryRegistry(package="your_app.queries")
q = registry.get("customers/list_customers")
print(q.name)  # "customers/list_customers"
print(q.sql)   # "SELECT customerid, first_name …\n"
```

The registry is lazy: a file is read from disk only on first access, then cached for the lifetime of the instance.
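The lazy-load-and-cache behaviour amounts to roughly the following (a sketch using importlib.resources, not the actual implementation):

```python
from importlib import resources

class QueryRegistrySketch:
    def __init__(self, package: str):
        self._package = package
        self._cache: dict[str, str] = {}

    def get_sql(self, name: str) -> str:
        # "customers/list_customers" -> package your_app.queries.customers,
        # file list_customers.sql; read once, then served from the cache.
        if name not in self._cache:
            subpackage, _, stem = name.rpartition("/")
            pkg = (f"{self._package}.{subpackage.replace('/', '.')}"
                   if subpackage else self._package)
            self._cache[name] = (
                resources.files(pkg).joinpath(f"{stem}.sql").read_text()
            )
        return self._cache[name]
```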
### Passing a query to DatabricksClient

```python
q = registry.get("customers/list_customers_by_state")
df = db.query_polars(q.sql, params={"states": ["CA", "NY"], "limit": 200})
```
### Wiring it up in Streamlit

```python
@st.cache_resource
def get_queries() -> QueryRegistry:
    return QueryRegistry(package="your_app.queries")
```
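Putting the three cached resources together, a minimal page might look like this (illustrative; recent Streamlit versions render Polars DataFrames directly in `st.dataframe`):

```python
import streamlit as st

st.title("Customers")

q = get_queries().get("customers/list_customers")
df = get_db().query_polars(q.sql)
st.dataframe(df)
```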
## Why GPL-3.0?
We believe in open source software and want to ensure that improvements to this library remain open and available to everyone. The GPL-3.0 license guarantees that all derivatives and modifications stay free and open source.
Made with ❤️ by the contributors