
Pre-commit hook to fix bare magic commands in Databricks .py-format notebooks


databricks-notebook-linter

A pre-commit hook that fixes bare magic commands in Databricks .py-format notebooks.

Problem

Databricks exports notebooks as .py files with special comment markers. Magic commands like %pip install and !nvidia-smi appear as bare lines, which are invalid Python syntax. This breaks linters (ruff, flake8) and type checkers (ty, mypy) that try to parse these files.

Solution

This tool prefixes bare magic commands with # MAGIC, converting them to Python comments that Databricks still recognizes and executes:

# Before
%pip install some-package==1.0.4

# After
# MAGIC %pip install some-package==1.0.4

It handles:

  • Single-line magic commands (%pip, %sql, %md, %sh, %fs, %run, %python, %r, %scala)
  • Shell bang commands (!nvidia-smi)
  • dbutils.library.restartPython() calls
  • Multiline continuations (%pip install -U \)
  • Block-level magic -- if a %pip or ! command is inside an if, for, try, or other block, the entire block is prefixed
  • Nested blocks -- when magic is nested several levels deep (for example, three), every enclosing level is prefixed
  • Compound blocks -- if/elif/else, try/except/finally treated as single units
  • Mixed cells -- regular Python lines outside blocks are left untouched

The tool is idempotent -- running it twice produces the same result.
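
As a rough illustration of why idempotence holds, here is a minimal sketch that handles only unindented magic lines. The function name `prefix_top_level_magic` and the simple `startswith` test are assumptions made for this example, not the tool's actual code. Once a line is prefixed it starts with `#`, so a second pass no longer sees it as bare magic.

```python
MAGIC_PREFIX = "# MAGIC "


def prefix_top_level_magic(source: str) -> str:
    """Prefix unindented lines that start with % or ! (simplified detection)."""
    fixed = []
    for line in source.split("\n"):
        # Assumption: any unindented line starting with % or ! is bare magic.
        if line.startswith(("%", "!")):
            fixed.append(MAGIC_PREFIX + line)
        else:
            fixed.append(line)
    return "\n".join(fixed)


once = prefix_top_level_magic("%pip install some-package==1.0.4\nx = 1\n")
twice = prefix_top_level_magic(once)
print(once == twice)
```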

Usage

As a pre-commit hook

Add to your .pre-commit-config.yaml:

repos:
  - repo: https://github.com/Yipit/databricks-notebook-linter
    rev: v0.2.1
    hooks:
      - id: fix-databricks-magic
        args: [--fix]

This auto-fixes files on commit. To check without modifying files, omit args:

hooks:
  - id: fix-databricks-magic

As a CLI tool

pip install databricks-notebook-linter

# Check mode (default): report issues, exit 1 if any found
fix-databricks-magic path/to/notebook.py

# Fix mode: rewrite files in place, exit 1 if any changed
fix-databricks-magic --fix path/to/notebook.py

Check mode output

notebook.py:5: bare magic command '%pip install foo' needs '# MAGIC' prefix
notebook.py:10: line in block containing magic needs '# MAGIC' prefix
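
If you want to consume this output from a script, the `path:line: message` shape shown above can be parsed with a small regular expression. This is a sketch that assumes every diagnostic follows exactly the format of the two sample lines; `Finding` and `parse_finding` are hypothetical names and are not part of the package.

```python
import re
from typing import NamedTuple, Optional


class Finding(NamedTuple):
    path: str
    line: int
    message: str


# Assumed shape, taken from the sample output: <path>:<line>: <message>
DIAGNOSTIC = re.compile(r"^(?P<path>.+?):(?P<line>\d+): (?P<message>.+)$")


def parse_finding(text: str) -> Optional[Finding]:
    """Return a Finding for one diagnostic line, or None if it does not match."""
    match = DIAGNOSTIC.match(text)
    if match is None:
        return None
    return Finding(match["path"], int(match["line"]), match["message"])
```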

How it works

  1. Checks if the file starts with # Databricks notebook source -- skips non-notebook files
  2. Splits the file into cells on # COMMAND ---------- boundaries
  3. For each cell, scans for bare magic lines (lines starting with %pip, !, etc.)
  4. If magic is at the top level, marks just that line (and any continuation lines)
  5. If magic is indented inside a block, walks backwards to find the top-level enclosing block and forwards to find the end of compound blocks (else, except, finally), then marks every line in the block
  6. Prefixes all marked lines with # MAGIC, preserving relative indentation for block-internal lines
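
Steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions rather than the tool's real implementation: the names `is_databricks_notebook` and `split_cells` are hypothetical, and the sketch assumes the cell separator sits on a line of its own.

```python
NOTEBOOK_HEADER = "# Databricks notebook source"
CELL_SEPARATOR = "# COMMAND ----------"


def is_databricks_notebook(source: str) -> bool:
    """Step 1: only files that begin with the notebook header are processed."""
    return source.startswith(NOTEBOOK_HEADER)


def split_cells(source: str) -> list[list[str]]:
    """Step 2: group lines into cells, dropping the separator lines themselves."""
    cells: list[list[str]] = [[]]
    for line in source.split("\n"):
        if line.strip() == CELL_SEPARATOR:
            cells.append([])  # a separator starts a new cell
        else:
            cells[-1].append(line)
    return cells
```

Each cell can then be scanned independently, which keeps a block in one cell from being confused with code in the next.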

Examples

Bare magic commands

The simplest case -- a magic command on its own line gets prefixed:

# Before                              # After
%pip install transformers             # MAGIC %pip install transformers
!nvidia-smi                           # MAGIC !nvidia-smi
%sql SELECT * FROM my_table           # MAGIC %sql SELECT * FROM my_table
dbutils.library.restartPython()       # MAGIC dbutils.library.restartPython()

Multiline continuations

When a %pip install spans multiple lines with \, all continuation lines are prefixed:

# Before
%pip install -U \
  transformers==4.57.6 \
  datasets==4.5.0 \
  peft==0.18.1

# After
# MAGIC %pip install -U \
# MAGIC   transformers==4.57.6 \
# MAGIC   datasets==4.5.0 \
# MAGIC   peft==0.18.1
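
One plausible way to find the continuation lines is to keep extending the span while the current line ends with a backslash. The helper below is a sketch with a hypothetical name (`continuation_span`); the tool's own logic may differ.

```python
def continuation_span(lines: list[str], start: int) -> range:
    """Return the indices from `start` through the last continuation line."""
    end = start
    # A trailing backslash means the command continues on the next line.
    while end < len(lines) - 1 and lines[end].rstrip().endswith("\\"):
        end += 1
    return range(start, end + 1)


cell = [
    "%pip install -U \\",
    "  transformers==4.57.6 \\",
    "  datasets==4.5.0 \\",
    "  peft==0.18.1",
    "x = 1",
]
print(list(continuation_span(cell, 0)))  # [0, 1, 2, 3]
```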

Conditional installs

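When a magic command is inside a block, the entire block is prefixed -- the if statement itself and all lines inside it. This is necessary because Databricks needs the whole block to be in magic context. It also keeps the file valid for linters: prefixing only the indented line would leave the if statement with a comment-only body, which is itself a Python syntax error: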

# Before
if COMPUTE_ENV == "serverless":
    %pip install -U hf_transfer

# After
# MAGIC if COMPUTE_ENV == "serverless":
# MAGIC     %pip install -U hf_transfer
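
Because the prefix goes in front of the whole line, the original indentation survives after `# MAGIC `. A minimal sketch of that step, assuming the span of lines to mark is already known (`prefix_span` is a hypothetical helper, not the tool's API):

```python
MAGIC_PREFIX = "# MAGIC "


def prefix_span(lines: list[str], start: int, end: int) -> list[str]:
    """Prefix lines[start..end] (inclusive) and leave every other line unchanged."""
    return [
        MAGIC_PREFIX + line if start <= index <= end else line
        for index, line in enumerate(lines)
    ]


before = [
    'if COMPUTE_ENV == "serverless":',
    "    %pip install -U hf_transfer",
]
for line in prefix_span(before, 0, 1):
    print(line)
```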

Compound blocks (if/else, try/except)

The tool treats if/elif/else and try/except/finally as single units. If magic appears in any branch, the entire compound block is prefixed:

# Before
try:
    import bitsandbytes
except ImportError:
    %pip install bitsandbytes

# After
# MAGIC try:
# MAGIC     import bitsandbytes
# MAGIC except ImportError:
# MAGIC     %pip install bitsandbytes

This also works when magic only appears in a secondary branch like else or except -- the entire block from the opening if or try is prefixed.
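
The backward and forward walk can be sketched like this. It is a simplified illustration, not the tool's code: `enclosing_block` is a hypothetical name, indentation is judged only by the first character, and multi-line strings, decorators, and statements spanning several lines are ignored.

```python
import re

# Unindented keywords that continue a compound statement instead of starting one.
CONTINUATION = re.compile(r"^(elif|else|except|finally)\b")


def enclosing_block(lines: list[str], magic_index: int) -> tuple[int, int]:
    """Return (start, end) of the top-level block containing an indented magic line."""
    start = magic_index
    # Walk backwards past blank lines, indented lines, and else/except/finally.
    while start > 0 and (
        not lines[start].strip()
        or lines[start][0] in " \t"
        or CONTINUATION.match(lines[start])
    ):
        start -= 1

    end = magic_index
    index = magic_index + 1
    # Walk forwards while lines still belong to the same compound statement.
    while index < len(lines):
        line = lines[index]
        if line.strip():
            if line[0] in " \t" or CONTINUATION.match(line):
                end = index
            else:
                break  # next top-level statement: the block is over
        index += 1
    return start, end
```

Trailing blank lines are left out of the span, so only lines that belong to the block are marked.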

Mixed cells

When a cell contains both regular Python and magic commands, only the magic lines (and their enclosing blocks) are prefixed. Regular Python is left untouched:

# Before
INDEX_URL = dbutils.secrets.get("pip", "index_url")
%pip install some-package --index-url $INDEX_URL
result = process_data()

# After
INDEX_URL = dbutils.secrets.get("pip", "index_url")
# MAGIC %pip install some-package --index-url $INDEX_URL
result = process_data()

What is NOT treated as magic

The tool avoids false positives. These patterns are left alone:

# Not touched -- % inside a string
msg = "%pip is a magic command"

# Not touched -- modulo operator
result = 10 % 3

# Not touched -- % in a comment
# Use %pip to install packages

# Not touched -- if block without any magic in its body
if version == "1.0":
    print("correct")
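
A detection rule that is anchored to the start of the line (after optional indentation) is enough to skip all four cases above. The pattern below is a sketch built from the list of commands this README says are handled; it is an assumption about how detection could work, not the tool's actual regular expression, and it does not account for `%` at the start of a continuation line or inside a multi-line string.

```python
import re

# Assumed pattern: a known magic name, a shell bang, or the restartPython call,
# appearing as the first non-whitespace text on the line.
BARE_MAGIC = re.compile(
    r"^\s*(?:"
    r"%(?:pip|sql|md|sh|fs|run|python|r|scala)\b"
    r"|!\S"
    r"|dbutils\.library\.restartPython\(\)"
    r")"
)


def is_bare_magic(line: str) -> bool:
    """True if the line looks like an unprefixed magic command."""
    return BARE_MAGIC.match(line) is not None


for sample in ['msg = "%pip is a magic command"', "result = 10 % 3", "%pip install foo"]:
    print(sample, "->", is_bare_magic(sample))
```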

Non-notebook files

Files that don't start with # Databricks notebook source are skipped entirely, and non-.py files are ignored by the CLI and pre-commit hook.

Development

make setup    # install dependencies
make test     # run tests (with 100% branch coverage enforcement)
make lint     # run ruff
make format   # auto-format

Releasing

# All at once: tag, publish to PyPI, push
make release VERSION=x.y.z

# Or in two steps:
make tag-release VERSION=x.y.z    # bump, commit, tag (local only)
make push-release VERSION=x.y.z   # build, publish to PyPI, push commit + tag

tag-release validates a clean working tree on main, runs tests and lint, bumps the version in pyproject.toml and README.md, commits, and creates an annotated tag. push-release builds, publishes to PyPI, then pushes. Nothing reaches the remote until the PyPI publish succeeds.

Contributing

PRs are welcome. Run make test and make lint before submitting.

License

MIT


NOTE: This codebase dictated but not read.*

* Claude wrote most of this at my prompting. I have reviewed the logic and tests, but I am still responsible for errors in the codebase, notwithstanding the original meaning of the introductory phrase.
