# databricks-ddbxutils

`ddbxutils` extends Databricks `dbutils` with features it lacks out of the box.

For the Korean documentation, see README.ko.md.
## Features

- Jinja2 template support for `dbutils.widgets`
- `EnvConfig`: a YAML-based, environment-aware config loader (Jinja2 templates, secrets, variable references, date/number/string operations)
- `PythonFunctionDataSource`: a custom data source built on the PySpark v2 DataSource API
## Install

```bash
pip install databricks-ddbxutils
```

## Testing

```bash
uv run pytest tests/ -v
```
## EnvConfig

### Highlights

- A single YAML file that places `dev`/`stg`/`prd` values side by side, keyed by config name
- Jinja2 templates: secret references, variable references, date arithmetic, numeric arithmetic, and string operations
- Multi-pass rendering: resolves inter-variable dependencies automatically in up to 10 passes
- `project_prefix`: override any key at runtime via a `PREFIX__KEY` environment variable
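The multi-pass idea above can be illustrated with a small stdlib-only sketch. This is not the library's implementation (EnvConfig uses Jinja2); a plain `{{var}}` substitution loop stands in to show how re-rendering up to a fixed number of passes resolves keys that reference other keys.

```python
import re

def render_multipass(values: dict[str, str], max_passes: int = 10) -> dict[str, str]:
    """Sketch: resolve {{var}} references between keys by re-rendering
    the whole mapping until nothing changes (or max_passes is reached)."""
    pattern = re.compile(r"\{\{\s*(\w+)\s*\}\}")
    resolved = dict(values)
    for _ in range(max_passes):
        changed = False
        for key, value in resolved.items():
            # Replace each {{name}} with the current value of that key,
            # leaving unknown names untouched.
            new_value = pattern.sub(
                lambda m: resolved.get(m.group(1), m.group(0)), value
            )
            if new_value != value:
                resolved[key] = new_value
                changed = True
        if not changed:  # fixed point reached before max_passes
            break
    return resolved

cfg = render_multipass({
    "app": "myapp",
    "root": "s3://bucket/{{app}}",
    "path": "{{root}}/data",  # depends on root, which depends on app
})
print(cfg["path"])  # s3://bucket/myapp/data
```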
### Resolution priority

```
env var {PREFIX}__{KEY} > YAML value (after Jinja2 rendering) > ENV_DEFAULTS fallback
```
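The lookup order can be sketched as follows. This is an illustrative stand-in, not EnvConfig's actual code: `rendered_yaml` and `env_defaults` are hypothetical dicts representing the rendered YAML values and the fallback defaults.

```python
import os

def resolve(key, rendered_yaml, env_defaults, prefix=None):
    """Sketch of the stated priority:
    1. {PREFIX}__{KEY} environment variable
    2. value from the rendered YAML
    3. ENV_DEFAULTS fallback
    """
    if prefix:
        override = os.environ.get(f"{prefix}__{key.upper()}")
        if override is not None:
            return override
    if key in rendered_yaml:
        return rendered_yaml[key]
    return env_defaults.get(key)

os.environ["MYAPP__RUN_DATE"] = "2024-06-01"
resolve("run_date", {"run_date": "2024-01-15"}, {}, prefix="MYAPP")  # env var wins
resolve("catalog", {}, {"catalog": "dev_catalog"}, prefix="MYAPP")   # falls through
```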
### Initialization via environment variables

If `EnvConfig()` is called without arguments, the values below are read from environment variables. Constructor arguments take precedence when provided.

| Env var | Description | Default |
|---|---|---|
| `ENV` | Environment name (`dev` / `stg` / `prd`) | `dev` |
| `PROJECT_PREFIX` | Prefix used for `{PREFIX}__{KEY}` overrides | (none) |
| `CONFIG_PATH` | Path to the YAML config file | `conf/settings.yml` |
| `{PREFIX}__CONFIG_PATH` | Prefix-scoped config path (highest-priority env var) | - |
```bash
export ENV=prd
export PROJECT_PREFIX=MYAPP
export CONFIG_PATH=conf/my-settings.yml
```

```python
cfg = EnvConfig()  # initialized from the environment variables above
```
### Basic usage

```python
from ddbxutils import EnvConfig

# Environment inferred from the ENV variable (default: dev)
cfg = EnvConfig()

# Explicit environment + project_prefix override support
cfg = EnvConfig(env="prd", project_prefix="MYAPP")

# Accessing values (attribute / dict / get all work the same way)
cfg.run_date                       # attribute access
cfg["run_date"]                    # dict-style access
cfg.get("run_date", "2024-01-01")  # with a default

# The built-in keys (env / catalog / schema) are accessible the same way
cfg.env == cfg["env"] == cfg.get("env")
cfg.catalog == cfg["catalog"] == cfg.get("catalog")
cfg.schema == cfg["schema"] == cfg.get("schema")

# Extras
cfg.full_table_name   # "dev_catalog.analytics_dev"
cfg.keys()            # built-in keys + YAML keys
"run_date" in cfg     # membership check

# Summary output
cfg.print_summary()
```
### conf/settings.yml

```yaml
# ===========================================================
# Per-environment values
# ===========================================================
storage_path:
  dev: "s3://my-project-dev/data"
  stg: "s3://my-project-stg/data"
  prd: "s3://my-project-prd/data"

retry_count:
  dev: 1
  stg: 2
  prd: 3

log_level:
  dev: DEBUG
  stg: INFO
  prd: WARNING

# ===========================================================
# Date arithmetic (values can reference other variables)
# ===========================================================
run_date: "{{today()}}"
start_date: "{{add_days(run_date, -7)}}"
last_month: "{{add_months(run_date, -1)}}"
month_start: "{{start_of_month(run_date)}}"
month_end: "{{end_of_month(run_date)}}"

# ===========================================================
# Numeric arithmetic
# ===========================================================
batch_size:
  dev: 1000
  stg: 5000
  prd: 10000

double_batch: "{{ batch_size * 2 }}"
page_count: "{{ batch_size // 256 }}"

# ===========================================================
# String operations
# ===========================================================
app_name: "my_project"
upper_app: "{{ app_name | upper }}"
archive_path: "{{ storage_path | replace('/data', '/archive') }}"
full_table: "{{ catalog ~ '.' ~ schema ~ '.events' }}"

# ===========================================================
# Formatting + nested calls
# ===========================================================
partition_date: "{{format_date(add_days(run_date, -1), '%Y%m%d')}}"
year_month: "{{format_date(run_date, '%Y%m')}}"

# ===========================================================
# Variable references + path composition
# ===========================================================
full_path: "{{storage_path}}/processed/{{format_date(run_date, '%Y/%m/%d')}}"

# ===========================================================
# Environment / catalog references
# ===========================================================
current_env: "{{ENV}}"
table_prefix: "{{catalog}}.{{schema}}"

# ===========================================================
# Secrets (automatically calls dbutils.secrets.get)
# ===========================================================
db_password:
  dev: "{{secrets/dev-scope/db-password}}"
  stg: "{{secrets/stg-scope/db-password}}"
  prd: "{{secrets/prd-scope/db-password}}"

api_key: "{{secrets/common-scope/api-key}}"

# ===========================================================
# Shared across environments
# ===========================================================
version: "1.0.0"
```
### Overriding with project_prefix env vars

When initialized with `project_prefix="MYAPP"`, any key can be overridden via a `MYAPP__KEY` environment variable.

```bash
export MYAPP__RUN_DATE=2024-06-01
export MYAPP__CATALOG=override_catalog
export MYAPP__SCHEMA=custom_schema
```

```python
cfg = EnvConfig(env="prd", project_prefix="MYAPP")
cfg.run_date  # "2024-06-01" ← env var wins
cfg.catalog   # "override_catalog"
```
## Jinja2 template reference

### Date functions

| Syntax | Description |
|---|---|
| `{{today()}}` | Today's date |
| `{{now()}}` | Current timestamp |
| `{{make_date(2024, 1, 15)}}` | Build a specific date |
### Date arithmetic

| Syntax | Description |
|---|---|
| `{{add_days(run_date, -7)}}` | Add/subtract n days |
| `{{add_months(run_date, -1)}}` | Add/subtract n months |
| `{{add_years(run_date, 1)}}` | Add/subtract n years |
| `{{start_of_month(run_date)}}` | First day of the month |
| `{{end_of_month(run_date)}}` | Last day of the month |
| `{{format_date(run_date, '%Y%m%d')}}` | Format a date |
| `{{format_date(add_months(run_date, -3), '%Y%m')}}` | Nested calls |
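For intuition, the helpers in the table behave roughly like the stdlib-only sketches below. These are illustrative approximations, not the library's actual implementations; signatures and edge-case handling may differ.

```python
from datetime import date, datetime, timedelta
import calendar

def _to_date(d):
    """Accept either a date or an ISO 'YYYY-MM-DD' string."""
    return datetime.strptime(d, "%Y-%m-%d").date() if isinstance(d, str) else d

def add_days(d, n):
    return _to_date(d) + timedelta(days=n)

def add_months(d, n):
    d = _to_date(d)
    m = d.month - 1 + n
    year, month = d.year + m // 12, m % 12 + 1
    # Clamp the day so e.g. Jan 31 + 1 month lands on the last day of Feb
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def start_of_month(d):
    return _to_date(d).replace(day=1)

def end_of_month(d):
    d = _to_date(d)
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])

def format_date(d, fmt):
    return _to_date(d).strftime(fmt)

run_date = "2024-03-15"
print(add_days(run_date, -7))                         # 2024-03-08
print(format_date(add_months(run_date, -3), "%Y%m"))  # 202312
print(end_of_month(run_date))                         # 2024-03-31
```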
### Numeric arithmetic

| Syntax | Description |
|---|---|
| `{{ batch_size * 2 }}` | Multiplication |
| `{{ batch_size + 500 }}` | Addition |
| `{{ total // page_size }}` | Integer division |
| `{{ count % 7 }}` | Modulo |
| `{{ 2 ** 10 }}` | Exponentiation |
| `{{ 'large' if count > 100 else 'small' }}` | Conditional expression |
### String operations

| Syntax | Description |
|---|---|
| `{{ a ~ '_' ~ b }}` | String concatenation (`~` operator) |
| `{{ name \| upper }}` | Uppercase |
| `{{ name \| lower }}` | Lowercase |
| `{{ name \| title }}` | Title-case every word |
| `{{ name \| capitalize }}` | Capitalize the first letter only |
| `{{ name \| trim }}` | Strip surrounding whitespace |
| `{{ path \| replace('/raw', '/processed') }}` | String replacement |
| `{{ tags.split(',') \| join('-') }}` | Split then join |
| `{{ name \| length }}` | String length |
| `{{ code[:3] }}` | Slicing |
| `{{ 'yes' if path.startswith('/data') else 'no' }}` | `startswith` check |
| `{{ 'yes' if 'raw' in path else 'no' }}` | Substring membership |
### Variable references

| Syntax | Description |
|---|---|
| `{{storage_path}}` | Reference another YAML config value |
| `{{env.HOME}}` | Reference an OS environment variable |
| `{{ENV}}`, `{{catalog}}`, `{{schema}}` | Current environment info |
| `{{secrets/SCOPE/KEY}}` | Calls `dbutils.secrets.get('SCOPE', 'KEY')` |
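The secret syntax in the last row can be pictured as a small preprocessing step. The sketch below is hypothetical (the library's actual handling may differ), and `fake_secrets_get` is a stand-in for `dbutils.secrets.get`, which only exists on Databricks.

```python
import re

# Matches {{secrets/SCOPE/KEY}} and captures SCOPE and KEY
SECRET_RE = re.compile(r"\{\{\s*secrets/([^/\s]+)/([^}\s]+)\s*\}\}")

def resolve_secrets(value, secrets_get):
    """Replace each {{secrets/SCOPE/KEY}} in value with secrets_get(scope, key),
    mirroring a dbutils.secrets.get(SCOPE, KEY) call on Databricks."""
    return SECRET_RE.sub(lambda m: secrets_get(m.group(1), m.group(2)), value)

def fake_secrets_get(scope, key):
    # Stand-in so the sketch runs outside Databricks
    return f"<secret:{scope}/{key}>"

print(resolve_secrets("{{secrets/dev-scope/db-password}}", fake_secrets_get))
# <secret:dev-scope/db-password>
```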
## Run

### On Databricks without an init script (Serverless)

- Create a Volume for the wheel and upload it:
  `/Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/ddbxutils-<VERSION>-py3-none-any.whl`
- In the notebook's right-hand Environment panel, add the wheel file and click Apply.
- Usage:

```python
# dbutils.widgets.text('rawdate', '2025-05-24', 'Raw Date')
# dbutils.widgets.text('next_day', '{{add_days(rawdate, "%Y-%m-%d", "", 1)}}', 'Next Day')
import ddbxutils

next_day = ddbxutils.widgets.get('next_day')  # next_day: 2025-05-25

from ddbxutils import EnvConfig

cfg = EnvConfig(project_prefix="MYAPP")
cfg.print_summary()
```
### On Databricks with an init script

- Prepare the wheel and the init script:
  - `/Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/ddbxutils-<VERSION>-py3-none-any.whl`
  - `/Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/init_script_ddbxutils.sh`

```bash
#! /bin/bash
STARTUP_SCRIPT=/tmp/pyspark_startup.py
cat >> ${STARTUP_SCRIPT} << EOF
prefix = 'PYTHONSTARTUP_ddbxutils'
print(f'{prefix} custom startup script loading...')
try:
    import ddbxutils
    print(f'{prefix} Custom modules [ddbxutils] are loaded.')
except Exception as e:
    print(f'{prefix} e={e}')
    print(f'{prefix} import ddbxutils failed')
EOF
```

- Spark config:

```
spark.executorEnv.PYTHONSTARTUP /tmp/pyspark_startup.py
```

- Environment variables:

```
PYTHONSTARTUP=/tmp/pyspark_startup.py
```

- Init scripts:

```
/Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/init_script_ddbxutils.sh
```

- Usage:

```python
# dbutils.widgets.text('rawdate', '2025-05-24', 'Raw Date')
# dbutils.widgets.text('next_day', '{{add_days(rawdate, "%Y-%m-%d", "", 1)}}', 'Next Day')
next_day = ddbxutils.widgets.get('next_day')  # next_day: 2025-05-25
```