Checks and fixes SQL embedded in PySpark spark.sql(...) calls using SQLFluff.

Project description

sqlfluff-pyspark

Lint and optionally fix SQL embedded in spark.sql(...) calls using SQLFluff.

Installation

pip install sqlfluff-pyspark

Command Line Usage

# Lint spark.sql strings
sqlfluff-pyspark path/to/file.py another_file.py

# Apply fixes (writes changes back to the files)
sqlfluff-pyspark --fix path/to/file.py

Exit codes:

  • 0: success (no actionable snippets, or no violations found)
  • 1: lint violations found (printed to stderr), fixes were applied but the re-lint still failed, or the sqlfluff invocation raised an unexpected error
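A hypothetical input file, showing the kind of snippet the tool extracts (file name, function name, and SQL are illustrative; `spark` is assumed to be a SparkSession):

```python
# example_job.py — illustrative input file for sqlfluff-pyspark.

def load_active_users(spark):
    # sqlfluff-pyspark finds this spark.sql(...) call and lints the
    # constant string literal passed to it.
    return spark.sql(
        """
        SELECT id, name
        FROM users
        WHERE active = true
        """
    )
```

Running `sqlfluff-pyspark example_job.py` would lint only the SQL string above, reporting violations at their positions in the Python file.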

Pre-commit Hook Integration

This project provides pre-commit hooks so you can automatically lint (and optionally fix) spark.sql strings before committing.

Add the following to your .pre-commit-config.yaml:

repos:
  - repo: https://github.com/danieltom/sqlfluff-pyspark
    rev: v0.1.0  # or the latest tag
    hooks:
      - id: sqlfluff-pyspark-lint
      # Optional fix hook (will modify files). Normally run separately or in CI.
      # - id: sqlfluff-pyspark-fix

Then install:

pre-commit install

Choosing lint vs fix hook

Use the lint hook locally to keep commits clean. Run the fix hook manually:

pre-commit run sqlfluff-pyspark-fix --all-files

or directly:

sqlfluff-pyspark --fix your_script.py

How It Works

  1. Parses Python source with ast to find calls to spark.sql(...).
  2. Extracts string literals (skips f-strings and non-constant expressions).
  3. Writes each snippet to a temporary SQL file and runs sqlfluff on it.
  4. Translates violation line numbers back to original file positions.
  5. For fixes, rewrites the original string literals preserving style (indent, quoting) when possible.

An inline sqlfluff: directive anywhere in the literal skips linting/fixing for that snippet.
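For instance, a snippet could carry the marker in a SQL comment (the exact comment wording around the `sqlfluff:` marker is illustrative):

```python
# Hypothetical snippet: any literal containing the text "sqlfluff:" is
# skipped entirely by the linter and fixer.

def fetch_raw(spark):
    return spark.sql(
        """
        -- sqlfluff: skip this snippet
        select * from raw_events
        """
    )
```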

Limitations / Notes

  • F-strings are skipped (dynamic content).
  • Concatenated string literals are supported ("SELECT" + " 1").
  • Multi-line snippets are normalized with indentation preserved.
  • Only .py files passed explicitly are processed; the tool does not auto-discover files.
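The concatenated-literal support noted above can be approximated by folding nested constant-string additions in the AST (a sketch, not the project's actual code):

```python
import ast

def fold_str_concat(node):
    """Return the combined string for nested "a" + "b" additions of
    constant strings, or None if any operand is non-constant."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        left = fold_str_concat(node.left)
        right = fold_str_concat(node.right)
        if left is not None and right is not None:
            return left + right
    return None

expr = ast.parse('"SELECT" + " 1"', mode="eval").body
print(fold_str_concat(expr))  # SELECT 1
```

Implicitly adjacent literals ("SELECT" " 1") already arrive as a single ast.Constant, so only explicit + concatenation needs folding.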

Development

pip install -e .[dev]
pytest -vv

Versioning

The hook rev should match a published tag. If using main, be aware of potential breaking changes.

License

MIT

Download files


Source Distribution

sqlfluff_pyspark-0.1.1.tar.gz (5.8 kB)

Uploaded Source

Built Distribution


sqlfluff_pyspark-0.1.1-py3-none-any.whl (7.1 kB)

Uploaded Python 3

File details

Details for the file sqlfluff_pyspark-0.1.1.tar.gz.

File metadata

  • Download URL: sqlfluff_pyspark-0.1.1.tar.gz
  • Size: 5.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sqlfluff_pyspark-0.1.1.tar.gz
Algorithm Hash digest
SHA256 fcce775e1b0a983fbd82ac2a2b5c31010f6fb062f0aa28ae26eb3e3c7917b285
MD5 91ad053d725c46d32ac75ee2ca84b32e
BLAKE2b-256 b9b31a6db2515a61b710f4eff14a1afa97ba6eaf27a1312cbee97cf5aea14aaa


Provenance

The following attestation bundles were made for sqlfluff_pyspark-0.1.1.tar.gz:

Publisher: release.yml on dan1elt0m/sqlfluff-pyspark

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlfluff_pyspark-0.1.1-py3-none-any.whl.

File hashes

Hashes for sqlfluff_pyspark-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 93df12e851b32838ea769f61e7d1c41ed74b47048c220e0568aaff301662293a
MD5 761c5d2f35e75705ae8f889f04c31e21
BLAKE2b-256 fdb49999b21dab9bf87932224129a0721d4d0c19906ef36e29fac1441beb6996


Provenance

The following attestation bundles were made for sqlfluff_pyspark-0.1.1-py3-none-any.whl:

Publisher: release.yml on dan1elt0m/sqlfluff-pyspark

