Extends Databricks dbutils

Project description

databricks-ddbxutils

ddbxutils extends Databricks dbutils with features it lacks out of the box.

For the Korean documentation, see README.ko.md.

Features

  • Jinja2 template support for dbutils.widgets
  • EnvConfig — a YAML-based, environment-aware config loader (Jinja2 templates, secrets, variable references, date/number/string operations)
  • PythonFunctionDataSource — a custom data source built on the PySpark v2 DataSource API

Install

pip install databricks-ddbxutils

Testing

uv run pytest tests/ -v

EnvConfig

Highlights

  • Single YAML file that places dev / stg / prd values side by side, keyed by config name
  • Jinja2 templates — secret references, variable references, date arithmetic, numeric arithmetic, and string operations
  • Multi-pass rendering — resolves inter-variable dependencies automatically in up to 10 passes
  • project_prefix — override any key at runtime via a PREFIX__KEY environment variable
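The multi-pass idea can be illustrated with a minimal stdlib sketch. `render_multi_pass` and its regex are illustrative only, not the package's internals: each pass substitutes `{{var}}` references from the config dict, so values that reference other templated values settle within a few passes.

```python
import re

# Illustrative sketch only -- not ddbxutils' actual implementation.
# Resolves {{var}} references against the config dict in up to 10
# passes, so a value may reference another value that itself
# contains references.
REF = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render_multi_pass(config: dict, max_passes: int = 10) -> dict:
    resolved = dict(config)
    for _ in range(max_passes):
        changed = False
        for key, value in resolved.items():
            if not isinstance(value, str):
                continue
            # Unknown references are left untouched (m.group(0))
            new = REF.sub(lambda m: str(resolved.get(m.group(1), m.group(0))), value)
            if new != value:
                resolved[key] = new
                changed = True
        if not changed:  # fixed point reached -- stop early
            break
    return resolved

cfg = {
    "app_name": "my_project",
    "storage_path": "s3://{{app_name}}-dev/data",
    "full_path": "{{storage_path}}/processed",
}
print(render_multi_pass(cfg)["full_path"])
# s3://my_project-dev/data/processed
```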

Resolution priority

env var {PREFIX}__{KEY}  >  YAML value (after Jinja2 rendering)  >  ENV_DEFAULTS fallback
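The lookup order above can be sketched in a few lines of stdlib Python. `resolve` and `ENV_DEFAULTS` here are hypothetical stand-ins for the documented behavior, not the library's real internals:

```python
import os

# Hypothetical sketch of the documented resolution priority.
ENV_DEFAULTS = {"env": "dev", "catalog": "dev_catalog"}  # illustrative fallbacks

def resolve(key, rendered_yaml, prefix="MYAPP"):
    # 1. {PREFIX}__{KEY} environment variable wins
    env_val = os.environ.get(f"{prefix}__{key.upper()}")
    if env_val is not None:
        return env_val
    # 2. YAML value (after Jinja2 rendering)
    if key in rendered_yaml:
        return rendered_yaml[key]
    # 3. ENV_DEFAULTS fallback
    return ENV_DEFAULTS.get(key)

os.environ["MYAPP__RUN_DATE"] = "2024-06-01"
print(resolve("run_date", {"run_date": "2024-05-31"}))  # 2024-06-01 (env var wins)
print(resolve("catalog", {}))                           # dev_catalog (fallback)
```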

Initialization via environment variables

If EnvConfig() is called without arguments, the values below are read from environment variables. Constructor arguments take precedence when provided.

| Env var | Description | Default |
|---|---|---|
| ENV | Environment name (dev / stg / prd) | dev |
| PROJECT_PREFIX | Prefix used for {PREFIX}__{KEY} overrides | (none) |
| CONFIG_PATH | Path to the YAML config file | conf/settings.yml |
| {PREFIX}__CONFIG_PATH | Prefix-scoped config path (highest-priority env var) | - |
export ENV=prd
export PROJECT_PREFIX=MYAPP
export CONFIG_PATH=conf/my-settings.yml
cfg = EnvConfig()   # initialized from the environment variables above

Basic usage

from ddbxutils import EnvConfig

# Environment inferred from the ENV variable (default: dev)
cfg = EnvConfig()

# Explicit environment + project_prefix override support
cfg = EnvConfig(env="prd", project_prefix="MYAPP")

# Accessing values (attribute / dict / get all work the same way)
cfg.run_date                       # attribute access
cfg["run_date"]                    # dict-style access
cfg.get("run_date", "2024-01-01")  # with a default

# The built-in keys (env / catalog / schema) are accessible the same way
cfg.env              == cfg["env"]     == cfg.get("env")
cfg.catalog          == cfg["catalog"] == cfg.get("catalog")
cfg.schema           == cfg["schema"]  == cfg.get("schema")

# Extras
cfg.full_table_name   # "dev_catalog.analytics_dev"
cfg.keys()            # built-in keys + YAML keys
"run_date" in cfg     # membership check

# Summary output
cfg.print_summary()

conf/settings.yml

# ===========================================================
# Per-environment values
# ===========================================================
storage_path:
  dev: "s3://my-project-dev/data"
  stg: "s3://my-project-stg/data"
  prd: "s3://my-project-prd/data"

retry_count:
  dev: 1
  stg: 2
  prd: 3

log_level:
  dev: DEBUG
  stg: INFO
  prd: WARNING

# ===========================================================
# Date arithmetic (values can reference other variables)
# ===========================================================
run_date: "{{today()}}"
start_date: "{{add_days(run_date, -7)}}"
last_month: "{{add_months(run_date, -1)}}"
month_start: "{{start_of_month(run_date)}}"
month_end: "{{end_of_month(run_date)}}"

# ===========================================================
# Numeric arithmetic
# ===========================================================
batch_size:
  dev: 1000
  stg: 5000
  prd: 10000
double_batch: "{{ batch_size * 2 }}"
page_count: "{{ batch_size // 256 }}"

# ===========================================================
# String operations
# ===========================================================
app_name: "my_project"
upper_app: "{{ app_name | upper }}"
archive_path: "{{ storage_path | replace('/data', '/archive') }}"
full_table: "{{ catalog ~ '.' ~ schema ~ '.events' }}"

# ===========================================================
# Formatting + nested calls
# ===========================================================
partition_date: "{{format_date(add_days(run_date, -1), '%Y%m%d')}}"
year_month: "{{format_date(run_date, '%Y%m')}}"

# ===========================================================
# Variable references + path composition
# ===========================================================
full_path: "{{storage_path}}/processed/{{format_date(run_date, '%Y/%m/%d')}}"

# ===========================================================
# Environment / catalog references
# ===========================================================
current_env: "{{ENV}}"
table_prefix: "{{catalog}}.{{schema}}"

# ===========================================================
# Secrets (automatically calls dbutils.secrets.get)
# ===========================================================
db_password:
  dev: "{{secrets/dev-scope/db-password}}"
  stg: "{{secrets/stg-scope/db-password}}"
  prd: "{{secrets/prd-scope/db-password}}"

api_key: "{{secrets/common-scope/api-key}}"

# ===========================================================
# Shared across environments
# ===========================================================
version: "1.0.0"

Overriding with project_prefix env vars

When initialized with project_prefix="MYAPP", any key can be overridden via a MYAPP__KEY environment variable.

export MYAPP__RUN_DATE=2024-06-01
export MYAPP__CATALOG=override_catalog
export MYAPP__SCHEMA=custom_schema
cfg = EnvConfig(env="prd", project_prefix="MYAPP")
cfg.run_date   # "2024-06-01"  ← env var wins
cfg.catalog    # "override_catalog"

Jinja2 template reference

Date functions

| Syntax | Description |
|---|---|
| {{today()}} | Today's date |
| {{now()}} | Current timestamp |
| {{make_date(2024, 1, 15)}} | Build a specific date |

Date arithmetic

| Syntax | Description |
|---|---|
| {{add_days(run_date, -7)}} | Add/subtract n days |
| {{add_months(run_date, -1)}} | Add/subtract n months |
| {{add_years(run_date, 1)}} | Add/subtract n years |
| {{start_of_month(run_date)}} | First day of the month |
| {{end_of_month(run_date)}} | Last day of the month |
| {{format_date(run_date, '%Y%m%d')}} | Format a date |
| {{format_date(add_months(run_date, -3), '%Y%m')}} | Nested calls |
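The date helpers above behave much like these stdlib approximations. The function bodies are illustrative re-implementations mirroring the table's names, not the package's own code; note the day-clamping behavior assumed for `add_months` at month ends:

```python
import calendar
from datetime import date, timedelta

# Stdlib approximations of the template date helpers -- illustrative
# re-implementations, not ddbxutils' own code.
def add_days(d, n):
    return d + timedelta(days=n)

def add_months(d, n):
    # Month arithmetic, clamping the day to the target month's length
    total = d.year * 12 + (d.month - 1) + n
    year, month = divmod(total, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def start_of_month(d):
    return d.replace(day=1)

def end_of_month(d):
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])

def format_date(d, fmt):
    return d.strftime(fmt)

run_date = date(2024, 3, 31)
print(format_date(add_days(run_date, -7), "%Y%m%d"))    # 20240324
print(format_date(add_months(run_date, -1), "%Y%m%d"))  # 20240229 (day clamped)
print(end_of_month(run_date))                           # 2024-03-31
```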

Numeric arithmetic

| Syntax | Description |
|---|---|
| {{ batch_size * 2 }} | Multiplication |
| {{ batch_size + 500 }} | Addition |
| {{ total // page_size }} | Integer division |
| {{ count % 7 }} | Modulo |
| {{ 2 ** 10 }} | Exponent |
| {{ 'large' if count > 100 else 'small' }} | Conditional expression |

String operations

| Syntax | Description |
|---|---|
| {{ a ~ '_' ~ b }} | String concatenation (~ operator) |
| {{ name \| upper }} | Uppercase |
| {{ name \| lower }} | Lowercase |
| {{ name \| title }} | Title-case every word |
| {{ name \| capitalize }} | Capitalize the first letter only |
| {{ name \| trim }} | Strip surrounding whitespace |
| {{ path \| replace('/raw', '/processed') }} | String replacement |
| {{ tags.split(',') \| join('-') }} | Split then join |
| {{ name \| length }} | String length |
| {{ code[:3] }} | Slicing |
| {{ 'yes' if path.startswith('/data') else 'no' }} | startswith check |
| {{ 'yes' if 'raw' in path else 'no' }} | Substring membership |

Variable references

| Syntax | Description |
|---|---|
| {{storage_path}} | Reference another YAML config value |
| {{env.HOME}} | Reference an OS environment variable |
| {{ENV}}, {{catalog}}, {{schema}} | Current environment info |
| {{secrets/SCOPE/KEY}} | Calls dbutils.secrets.get('SCOPE', 'KEY') |
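How a `{{secrets/SCOPE/KEY}}` reference can be turned into a secret lookup is sketched below with the stdlib. The regex and `fake_secrets_get` stub are illustrative; the stub stands in for `dbutils.secrets.get`, which only exists on Databricks:

```python
import re

# Illustrative sketch of rewriting "{{secrets/SCOPE/KEY}}" references
# into secret lookups. fake_secrets_get is a stand-in for
# dbutils.secrets.get(scope, key), which only exists on Databricks.
SECRET_REF = re.compile(r"\{\{\s*secrets/([^/]+)/([^}\s]+)\s*\}\}")

def fake_secrets_get(scope, key):
    # Stub: a real implementation would call dbutils.secrets.get(scope, key)
    return f"<secret:{scope}/{key}>"

def resolve_secrets(value, get=fake_secrets_get):
    # Replace each secret reference with the fetched secret value
    return SECRET_REF.sub(lambda m: get(m.group(1), m.group(2)), value)

print(resolve_secrets("{{secrets/dev-scope/db-password}}"))
# <secret:dev-scope/db-password>
```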

Run

On Databricks without an init script (Serverless)

  • Create a Volume for the wheel and upload it:
    • /Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/ddbxutils-<VERSION>-py3-none-any.whl
  • In the notebook's right-hand Environment panel, add the wheel file and click Apply.
  • Usage:
    # dbutils.widgets.text('rawdate', '2025-05-24', 'Raw Date')
    # dbutils.widgets.text('next_day', '{{add_days(rawdate, "%Y-%m-%d", "", 1)}}', 'Next Day')
    import ddbxutils
    next_day = ddbxutils.widgets.get('next_day')
    # next_day: 2025-05-25
    
    from ddbxutils import EnvConfig
    cfg = EnvConfig(project_prefix="MYAPP")
    cfg.print_summary()
    

On Databricks with an init script

  • Prepare the wheel and the init script:
    • /Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/ddbxutils-<VERSION>-py3-none-any.whl
    • /Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/init_script_ddbxutils.sh
    #! /bin/bash
    
    STARTUP_SCRIPT=/tmp/pyspark_startup.py
    
    cat >> ${STARTUP_SCRIPT} << EOF
    
    prefix = 'PYTHONSTARTUP_ddbxutils'
    print(f'{prefix} custom startup script loading...')
    try:
      import ddbxutils
      print(f'{prefix} Custom modules [ddbxutils] are loaded.')
    except Exception as e:
      print(f'{prefix} e={e}')
      print(f'{prefix} import ddbxutils failed')
    EOF
    
  • Spark config:
    spark.executorEnv.PYTHONSTARTUP /tmp/pyspark_startup.py
    
  • Environment variables:
    PYTHONSTARTUP=/tmp/pyspark_startup.py
    
  • Init scripts:
    /Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/init_script_ddbxutils.sh
    
  • Usage:
    # dbutils.widgets.text('rawdate', '2025-05-24', 'Raw Date')
    # dbutils.widgets.text('next_day', '{{add_days(rawdate, "%Y-%m-%d", "", 1)}}', 'Next Day')
    next_day = ddbxutils.widgets.get('next_day')
    # next_day: 2025-05-25
    
