
databricks-job-runner


Reusable CLI for uploading, submitting, validating, fetching logs, and cleaning Databricks job runs.

Wraps the Databricks Python SDK into a small library that each project configures with a Runner instance. One Runner gives you nine CLI subcommands — upload, download, submit, validate, logs, clean, catalog, schema, and volume — without writing any Databricks API code in your project.

Installation

uv add databricks-job-runner

Or with pip:

pip install databricks-job-runner

For local development against a checkout:

# pyproject.toml
[tool.uv.sources]
databricks-job-runner = { path = "../databricks-job-runner", editable = true }

Quick start

Create a cli/ package in your project with two files:

cli/__init__.py

from databricks_job_runner import Runner

runner = Runner(
    run_name_prefix="my_project",
    wheel_package="my_package",  # optional
)

cli/__main__.py

from cli import runner
runner.main()

Then run from your project's root (not from the databricks-job-runner repo — this is a library, not a standalone CLI):

uv run python -m cli upload --all          # upload agent_modules/*.py
uv run python -m cli upload test_hello.py  # upload a single file
uv run python -m cli upload --wheel        # build and upload wheel
uv run python -m cli upload --data         # upload data/ to UC volume
uv run python -m cli upload --data exports # upload a specific data subdirectory
uv run python -m cli download results/out.csv        # download a file from the volume
uv run python -m cli download results/out.csv -o /tmp/out.csv  # specify local destination
uv run python -m cli download --list               # list files at the volume root
uv run python -m cli download --list results       # list a subdirectory
uv run python -m cli submit test_hello.py          # submit a job and wait
uv run python -m cli submit test_hello.py --no-wait
uv run python -m cli submit test_hello.py --upload  # upload all scripts then submit
uv run python -m cli submit test_hello.py --compute serverless  # override compute mode
uv run python -m cli validate              # list remote workspace contents
uv run python -m cli validate test_hello.py  # verify a specific file is uploaded
uv run python -m cli logs                  # stdout/stderr from the most recent run
uv run python -m cli logs 12345            # stdout/stderr from a specific run
uv run python -m cli clean --yes           # clean workspace + runs
uv run python -m cli clean --runs --yes    # clean only runs

# Unity Catalog management
uv run python -m cli catalog list
uv run python -m cli catalog get my_catalog              # show storage location
uv run python -m cli catalog create my_catalog --comment "Analytics"
uv run python -m cli catalog create my_catalog --storage-root "abfss://container@account.dfs.core.windows.net/path"
uv run python -m cli catalog delete my_catalog --force --yes

uv run python -m cli schema list my_catalog
uv run python -m cli schema create my_catalog.my_schema
uv run python -m cli schema delete my_catalog.my_schema --yes

uv run python -m cli volume list my_catalog.my_schema
uv run python -m cli volume create my_catalog.my_schema.my_vol
uv run python -m cli volume create my_catalog.my_schema.ext_vol --volume-type EXTERNAL --storage-location s3://bucket/path
uv run python -m cli volume delete my_catalog.my_schema.my_vol --yes

Configuration

The runner reads a .env file from the project root. Core keys (all prefixed with DATABRICKS_ for consistency):

  • DATABRICKS_PROFILE — optional. CLI profile in ~/.databrickscfg. When unset, the SDK's unified auth falls back to env vars (DATABRICKS_HOST/DATABRICKS_TOKEN), Azure CLI, service principals, etc.
  • DATABRICKS_COMPUTE_MODE — optional, default cluster. Either cluster or serverless; selects the compute backend for submitted jobs.
  • DATABRICKS_CLUSTER_ID — required when DATABRICKS_COMPUTE_MODE=cluster. All-purpose cluster to run jobs on. Started automatically if terminated.
  • DATABRICKS_SERVERLESS_ENV_VERSION — optional, default 3. Serverless environment version (e.g. 3 for Python 3.12).
  • DATABRICKS_WORKSPACE_DIR — required. Remote workspace path (e.g. /Users/you/my_project).
  • DATABRICKS_VOLUME_PATH — required when using upload --wheel, upload --data, or download. UC Volume path for wheel/data uploads and downloads.
  • DATABRICKS_SECRET_SCOPE — required when using secret_keys. Databricks secret scope name. Values for keys listed in secret_keys are fetched from this scope at runtime instead of being passed as plaintext parameters.

Precedence: pre-existing environment variables override .env values, matching 12-factor conventions (CI/CD and shell exports can override the file).

Additional non-core keys are captured in RunnerConfig.extras and automatically passed to submitted jobs as KEY=VALUE parameters. Scripts call inject_params() at startup to load them into os.environ, then use pydantic BaseSettings to read configuration.
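For example, a submitted script might consume those parameters like this. This is a minimal sketch: the script name and the NEO4J_URI field are illustrative (taken from the example .env below), pydantic-settings is assumed to be installed on the cluster, and the injection is inlined as described under inject_params further down.

# agent_modules/run_example.py (illustrative)
import os
import sys

from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Field names match the KEY=VALUE parameters the runner injects from .env.
    neo4j_uri: str
    databricks_workspace_dir: str


# Inline equivalent of inject_params(): copy KEY=VALUE argv entries into os.environ.
for arg in sys.argv[1:]:
    if "=" in arg and not arg.startswith("-"):
        key, _, value = arg.partition("=")
        os.environ.setdefault(key, value)

settings = Settings()
print(f"Workspace dir: {settings.databricks_workspace_dir}")
print(f"Neo4j URI: {settings.neo4j_uri}")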

Compute modes

  • Classic cluster (DATABRICKS_COMPUTE_MODE=cluster, the default): jobs submit to an existing all-purpose cluster identified by DATABRICKS_CLUSTER_ID. The runner auto-starts the cluster if it is terminated, and attaches wheels via Library(whl=...).
  • Serverless (DATABRICKS_COMPUTE_MODE=serverless): jobs submit to Databricks serverless compute with a job-level environment spec. No cluster ID needed; wheels attach as Environment.dependencies entries (UC Volume paths are supported directly).

Example .env (classic cluster)

DATABRICKS_PROFILE=my-profile
DATABRICKS_CLUSTER_ID=0123-456789-abcdef
DATABRICKS_WORKSPACE_DIR=/Users/ryan.knight@example.com/my_project
DATABRICKS_VOLUME_PATH=/Volumes/catalog/schema/volume
NEO4J_URI=neo4j+s://abc123.databases.neo4j.io
NEO4J_PASSWORD=secret

Example .env (serverless)

DATABRICKS_PROFILE=my-profile
DATABRICKS_COMPUTE_MODE=serverless
DATABRICKS_SERVERLESS_ENV_VERSION=3
DATABRICKS_WORKSPACE_DIR=/Users/ryan.knight@example.com/my_project
DATABRICKS_VOLUME_PATH=/Volumes/catalog/schema/volume

All DATABRICKS_* keys listed above become typed fields on RunnerConfig; any other keys (like NEO4J_URI above) go into config.extras.

API

Runner

Runner(
    run_name_prefix: str,
    project_dir: Path | str | None = None,
    wheel_package: str | None = None,
    secret_keys: list[str] | None = None,
    scripts_dir: str = "agent_modules",
    extra_files: list[str] | None = None,
    cli_command: str = "uv run python -m cli",
)
  • run_name_prefix — Prefix for job run names and cleanup filtering.
  • project_dir — Project root (defaults to cwd()). Must contain .env and the scripts directory.
  • wheel_package — Package name for wheel builds. Enables upload --wheel. Wheels upload to <DATABRICKS_VOLUME_PATH>/wheels/.
  • secret_keys — .env key names whose values are stored in a Databricks secret scope instead of forwarded as plaintext parameters. Requires DATABRICKS_SECRET_SCOPE in .env.
  • scripts_dir — Name of the local subdirectory containing scripts to upload and submit (default: "agent_modules"). Change it to match your project layout, e.g. "jobs" or "scripts".
  • extra_files — Paths relative to project_dir that are uploaded into the remote scripts directory alongside Python scripts. Use for non-Python assets that job scripts read via Path(__file__).parent on the cluster (e.g. ["sql/gold_schema.sql"]); see the sketch after this list.
  • cli_command — Command string printed in "Next steps" hints after a job is submitted (default: "uv run python -m cli"). Override it when your project uses a custom entry point, e.g. "uv run myproject".
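A job script can then read a co-uploaded asset relative to its own location. A minimal sketch, assuming the gold_schema.sql file from the extra_files example ends up directly alongside the script in the remote scripts directory:

from pathlib import Path

# extra_files land next to the uploaded scripts, so resolve them relative to this file.
schema_sql = (Path(__file__).parent / "gold_schema.sql").read_text()
print(f"Loaded {len(schema_sql)} characters of SQL")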

RunnerConfig

Pydantic model holding parsed .env values. Frozen (immutable) after construction.

  • databricks_profile (str | None) — CLI profile name, or None for unified-auth fallback.
  • databricks_compute_mode (Literal["cluster", "serverless"]) — Compute backend ("cluster" by default).
  • databricks_cluster_id (str | None) — Cluster ID (required when databricks_compute_mode == "cluster").
  • databricks_serverless_env_version (str) — Serverless environment version (default "3").
  • databricks_workspace_dir (str) — Remote workspace root (required).
  • databricks_volume_path (str | None) — UC Volume path for wheel/data uploads and downloads.
  • secret_scope (str | None) — Databricks secret scope name (set via DATABRICKS_SECRET_SCOPE).
  • extras (dict[str, str]) — All non-core keys from .env.

The env_params() method returns DATABRICKS_WORKSPACE_DIR, DATABRICKS_VOLUME_PATH (when set), secret-scope metadata (when secret_keys is configured), and all extras as KEY=VALUE strings suitable for job parameter injection.
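With the classic-cluster example .env above and no secret_keys configured, env_params() would return roughly the following (illustrative output, not an exact reproduction):

[
    "DATABRICKS_WORKSPACE_DIR=/Users/ryan.knight@example.com/my_project",
    "DATABRICKS_VOLUME_PATH=/Volumes/catalog/schema/volume",
    "NEO4J_URI=neo4j+s://abc123.databases.neo4j.io",
    "NEO4J_PASSWORD=secret",
]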

inject_params

from databricks_job_runner import inject_params
inject_params()

Call at the top of submitted scripts to parse KEY=VALUE parameters from sys.argv into os.environ. This lets scripts use pydantic BaseSettings or os.getenv() to read configuration that the runner injected from .env. Uses setdefault so pre-existing env vars take precedence.

Note: databricks_job_runner is not available on the Databricks cluster. For standalone scripts (not part of a wheel), inline the equivalent logic instead of importing:

import os, sys
for _arg in sys.argv[1:]:
    if "=" in _arg and not _arg.startswith("-"):
        _key, _, _value = _arg.partition("=")
        os.environ.setdefault(_key, _value)

For wheel-based scripts, the wheel's entry point can call inject_params() only if databricks_job_runner is listed as a wheel dependency and installed in the cluster environment. Since submitted scripts run on Databricks where the package is not available by default, the inline approach is preferred.

RunnerError

Raised when a runner operation cannot proceed (missing config, file not found, cluster stopped, job failed). The CLI formats and exits; library callers can catch and handle.
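A library caller might wrap a runner call like this (a sketch only: the submit method name is hypothetical, and the RunnerError import assumes it is re-exported from the package root):

from databricks_job_runner import Runner, RunnerError

runner = Runner(run_name_prefix="my_project")

try:
    runner.submit("test_hello.py")  # hypothetical method name; Runner exposes one method per subcommand
except RunnerError as exc:
    print(f"Runner operation failed: {exc}")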

Project layout

The runner expects this layout in your project:

my_project/
  .env
  agent_modules/
    test_hello.py
    run_lab2.py
    ...
  cli/
    __init__.py    # Runner config
    __main__.py    # entry point

Scripts in agent_modules/ are uploaded to {DATABRICKS_WORKSPACE_DIR}/agent_modules/ on Databricks and submitted as Spark Python tasks.

Subcommands

upload

  • upload <file> — Upload a single file from agent_modules/
  • upload --all — Upload all *.py files from agent_modules/
  • upload --wheel — Build a wheel with uv build and upload to the UC Volume (requires wheel_package and DATABRICKS_VOLUME_PATH)
  • upload --data [DIR] — Upload a local data directory to the UC Volume (default: data/). Requires DATABRICKS_VOLUME_PATH. Preserves subdirectory structure under the volume path.

download

  • download <path> — Download a file from the UC Volume. Path is relative to DATABRICKS_VOLUME_PATH, or absolute (starting with /Volumes).
  • download <path> --dest/-o <local> — Specify local destination path (default: current directory, using the remote filename).
  • download --list [SUBDIR] — List files at the volume root or in an optional subdirectory. Requires DATABRICKS_VOLUME_PATH.

submit

  • submit <script> — Submit a script as a one-time Databricks job and wait for completion. Default: test_hello.py
  • submit <script> --no-wait — Submit without waiting
  • submit <script> --upload — Upload all scripts from agent_modules/ before submitting
  • submit <script> --compute cluster|serverless — Override DATABRICKS_COMPUTE_MODE for this run only

On classic mode, if the target cluster is not already RUNNING, it is started automatically and the submit waits (up to 20 minutes, the SDK default) for it to reach RUNNING. On serverless, no warm-up step is required. When submitting a script whose name starts with run_{wheel_package} (e.g. run_{wheel_package}.py, run_{wheel_package}_schema.py, run_{wheel_package}_sample.py), the runner automatically attaches the wheel — as a Library(whl=...) on classic, or as an Environment.dependencies entry on serverless. Scripts that don't follow this prefix convention (e.g. test_hello.py) are submitted without the wheel.
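For illustration, attaching a wheel on serverless amounts to building a job-level environment spec like the following (a sketch using the Databricks SDK's field names; the wheel filename and volume path are placeholders):

from databricks.sdk.service.compute import Environment
from databricks.sdk.service.jobs import JobEnvironment

env = JobEnvironment(
    environment_key="default",
    spec=Environment(
        client="3",  # DATABRICKS_SERVERLESS_ENV_VERSION
        dependencies=[
            "/Volumes/catalog/schema/volume/wheels/my_package-0.1.0-py3-none-any.whl",
        ],
    ),
)
# The environments=[env] list is then passed to jobs.submit alongside the task.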

validate

  • validate — List the remote workspace directory and its agent_modules/ subdirectory. On classic, auto-starts the cluster if needed; on serverless, this is a no-op.
  • validate <file> — Also verify that {DATABRICKS_WORKSPACE_DIR}/agent_modules/<file> exists; exits non-zero if not.

logs

  • logs — Print stdout/stderr, error, and trace from the most recent run matching {run_name_prefix}:*
  • logs <run_id> — Print output for a specific parent run ID

Output is fetched via the Jobs API's get_run_output, which returns the tail 5 MB of stdout/stderr captured per task (the API caps output size; truncation is signaled in the output). The runner resolves the parent run to its task-level run IDs automatically, so pass the parent run_id shown at submit time. Databricks auto-expires runs after 60 days.
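Conceptually this maps to the following SDK calls (a rough sketch, assuming parent run ID 12345 and default authentication):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Resolve the parent run to its task-level runs, then fetch each task's output.
parent = w.jobs.get_run(run_id=12345)
for task in parent.tasks or []:
    out = w.jobs.get_run_output(run_id=task.run_id)
    if out.logs:
        print(out.logs)  # tail of stdout/stderr captured for the task
    if out.logs_truncated:
        print("[output truncated by the API]")
    if out.error:
        print(out.error, out.error_trace)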

clean

  • clean — Delete the remote workspace directory and all matching job runs
  • clean --workspace — Delete only the workspace directory
  • clean --runs — Delete only job runs
  • clean --yes — Skip confirmation prompt

catalog

Manage Unity Catalog catalogs.

  • catalog list — List all catalogs visible to the current user
  • catalog get <name> — Show details for a catalog, including the storage root and storage location (managed location). Use this to find where a catalog's managed tables are stored.
  • catalog create <name> [--storage-root URL] [--comment TEXT] — Create a new catalog. --storage-root sets the managed storage location (equivalent to MANAGED LOCATION in SQL). Required on metastores that use per-catalog storage roots instead of a metastore-level default.
  • catalog delete <name> [--yes] [--force] — Delete a catalog. --force cascades to all schemas, tables, and volumes inside it. Prompts for confirmation unless --yes is given.
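For reference, catalog create with a storage root maps to roughly this SDK call (a sketch; values are the placeholders from the examples above):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
w.catalogs.create(
    name="my_catalog",
    comment="Analytics",
    storage_root="abfss://container@account.dfs.core.windows.net/path",
)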

Finding the managed location

To find the managed storage location for an existing catalog:

uv run python -m cli catalog get my_catalog

This prints the storage root (the URL set at creation time) and the storage location (the full resolved path where managed tables are stored). Example output:

Catalog: my_catalog
  Owner:            ryan.knight@example.com
  Storage root:     abfss://container@account.dfs.core.windows.net/path
  Storage location: abfss://container@account.dfs.core.windows.net/path/__unitystorage/catalogs/abc123
  Comment:          Analytics catalog
  Created:          2025-01-15 10:30:00

schema

Manage Unity Catalog schemas. Schema names use dotted notation: catalog.schema.

  • schema list <catalog> — List schemas in a catalog
  • schema get <catalog.schema> — Show details for a schema
  • schema create <catalog.schema> [--comment TEXT] — Create a new schema
  • schema delete <catalog.schema> [--yes] — Delete a schema. Prompts for confirmation unless --yes is given.

volume

Manage Unity Catalog volumes. Volume names use dotted notation: catalog.schema.volume.

  • volume list <catalog.schema> — List volumes in a schema
  • volume get <catalog.schema.volume> — Show details (type, owner, storage location) for a volume
  • volume create <catalog.schema.volume> [--volume-type MANAGED|EXTERNAL] [--storage-location URL] [--comment TEXT] — Create a volume. Defaults to MANAGED. EXTERNAL volumes require --storage-location.
  • volume delete <catalog.schema.volume> [--yes] — Delete a volume. Prompts for confirmation unless --yes is given.
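The external-volume example above corresponds to roughly this SDK call (a sketch):

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import VolumeType

w = WorkspaceClient()
w.volumes.create(
    catalog_name="my_catalog",
    schema_name="my_schema",
    name="ext_vol",
    volume_type=VolumeType.EXTERNAL,
    storage_location="s3://bucket/path",
)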

Releasing

Releases are published to PyPI automatically when you push a Git tag. The version in the tag becomes the package version.

git tag v0.4.8 && git push origin v0.4.8

The GitHub Actions workflow strips the v prefix, patches pyproject.toml with the version, builds the wheel and sdist, and publishes to PyPI via trusted publishing.

Requirements

  • Python 3.12+
  • Databricks authentication: either a Databricks CLI profile, or env vars (DATABRICKS_HOST/DATABRICKS_TOKEN), or any other unified-auth method
  • Either a Databricks all-purpose cluster (auto-started if terminated) or serverless compute enabled for the workspace
  • uv (for wheel building only)

Architecture

databricks-job-runner is layered into a thin CLI, an orchestrator, and a set of single-purpose action modules. Runner is the only class a consuming project needs to touch.

cli.py          argparse + dispatch (flags -> Runner method calls)
  |
runner.py       Runner: holds config, owns the WorkspaceClient,
  |             exposes one method per subcommand
  |
  |-- config.py     RunnerConfig (frozen pydantic) + .env parser
  |-- compute.py    ClassicCluster / Serverless strategies (Protocol)
  |-- inject.py     inject_params() for submitted scripts
  |-- upload.py     workspace file + wheel + data upload
  |-- download.py   UC Volume file download and directory listing
  |-- submit.py     compute-agnostic job submission
  |-- validate.py   workspace listing + file-existence checks
  |-- logs.py       per-task stdout/stderr retrieval
  |-- catalog.py    Unity Catalog catalog/schema/volume management
  |-- clean.py      workspace + run cleanup
  |-- errors.py     RunnerError

Layers

  • CLI (cli.py) owns all argparse setup and translates the parsed namespace into method calls on Runner. Formats RunnerError into friendly exit messages. No argparse knowledge lives outside this file.
  • Orchestration (runner.py) exposes the Runner class. RunnerConfig and the WorkspaceClient are built lazily on first access, so importing a project's cli/__init__.py doesn't touch Databricks. Each public method coordinates a single subcommand end-to-end.
  • Action modules (upload.py, download.py, submit.py, validate.py, logs.py, clean.py, catalog.py) are plain functions wrapping SDK calls. None know about argparse or Runner, keeping each unit composable and independently testable.
  • Compute strategies (compute.py) implement the Compute protocol. A strategy knows how to (1) validate that its backend is ready, (2) decorate a SubmitTask with backend-specific fields, and (3) produce the top-level environments[] list for jobs.submit. submit_job is compute-agnostic — swapping backends is a strategy change, not a conditional branch.
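As a rough illustration of that shape (method names and signatures here are invented for the sketch and do not mirror compute.py exactly):

from typing import Protocol

from databricks.sdk.service.jobs import JobEnvironment, SubmitTask


class Compute(Protocol):
    def validate(self) -> None:
        """Raise if the backend is not ready (e.g. the cluster is terminated)."""
        ...

    def decorate_task(self, task: SubmitTask, wheel: str | None) -> SubmitTask:
        """Return the task with backend-specific fields (cluster ID or environment key)."""
        ...

    def environments(self, wheel: str | None) -> list[JobEnvironment]:
        """Top-level environments[] for jobs.submit; empty for classic clusters."""
        ...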

Design choices

  • Strategy pattern for compute. Compute is a typing.Protocol, so adding a new backend is a new frozen dataclass that matches the shape — no changes to submit_job, Runner, or the CLI. ClassicCluster and Serverless are both frozen dataclasses for value-equality and immutability.
  • Single validation point. Required-key enforcement lives entirely in RunnerConfig.from_env_file, branching on DATABRICKS_COMPUTE_MODE (only DATABRICKS_CLUSTER_ID is required when mode is cluster). Downstream code trusts the config is valid.
  • Automatic parameter injection. All non-core .env keys are passed to submitted jobs as KEY=VALUE parameters via RunnerConfig.env_params(). Scripts call inject_params() at startup to load them into os.environ, then use pydantic BaseSettings to read configuration. No callback or per-project wiring needed.
  • Wheel convention. A submitted script whose name starts with run_{wheel_package} auto-attaches the latest wheel from dist/ — as Library(whl=...) on classic, or an Environment.dependencies entry on serverless. This covers both a single entry-point script (run_my_package.py) and per-phase scripts (run_my_package_schema.py, run_my_package_sample.py). Scripts that don't match the prefix (e.g. test_hello.py) are submitted without the wheel.
  • 12-factor .env. Pre-existing env vars override .env values, so CI/CD exports and shell overrides trump the file — matching standard .env semantics.
