# sqlfluff-pyspark

Lint and optionally fix SQL embedded in `spark.sql(...)` calls using SQLFluff.
## Installation

```shell
pip install sqlfluff-pyspark
```
## Command Line Usage

```shell
# Lint spark.sql strings
sqlfluff-pyspark path/to/file.py another_file.py

# Apply fixes (writes changes back to the files)
sqlfluff-pyspark --fix path/to/file.py
```
Exit codes:

- `0`: success, no actionable snippets, or no violations
- `1`: lint violations found (printed to stderr), fixes applied but the re-lint failed, or an unexpected error from the sqlfluff invocation
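As a purely hypothetical illustration of what the linter targets, consider a job file like the following (the file name, the `_StubSpark` class, and the query are invented here; in a real job `spark` would be a pyspark `SparkSession`):

```python
# Hypothetical lint target, e.g. "example_job.py". A stub stands in for a
# real SparkSession so this sketch runs without pyspark installed.
class _StubSpark:
    def sql(self, query: str) -> str:
        # A real SparkSession.sql(...) would return a DataFrame;
        # the linter only cares about the string literal passed in.
        return query

spark = _StubSpark()

# sqlfluff-pyspark would extract this literal and lint it; lowercase
# keywords like "select" typically trip SQLFluff capitalisation rules.
result = spark.sql("select id, name from users")
print(result)
```

Running `sqlfluff-pyspark example_job.py` on such a file would report violations against the embedded query and exit non-zero.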
## Pre-commit Hook Integration

This project provides pre-commit hooks so you can automatically lint (and optionally fix) `spark.sql` strings before committing. Add the following to your `.pre-commit-config.yaml`:
```yaml
repos:
  - repo: https://github.com/dan1elt0m/sqlfluff-pyspark
    rev: v0.1.0  # or the latest tag
    hooks:
      - id: sqlfluff-pyspark-lint
      # Optional fix hook (will modify files). Normally run separately or in CI.
      # - id: sqlfluff-pyspark-fix
```
Then install the hooks:

```shell
pre-commit install
```
### Choosing lint vs fix hook

Use the lint hook locally to keep commits clean. Run the fix hook manually:

```shell
pre-commit run sqlfluff-pyspark-fix --all-files
```

or directly:

```shell
sqlfluff-pyspark --fix your_script.py
```
## How It Works

- Parses Python source with `ast` to find calls to `spark.sql(...)`.
- Extracts string literals (skips f-strings and non-constant expressions).
- Writes each snippet to a temporary SQL file and runs `sqlfluff` on it.
- Translates violation line numbers back to positions in the original file.
- For fixes, rewrites the original string literals, preserving style (indentation, quoting) where possible.

An inline `sqlfluff:` directive anywhere in the literal will skip linting/fixing for that snippet.
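The extraction step described above can be sketched with the standard-library `ast` module. This is a rough illustration of the approach, not the tool's actual code, and the function name is invented:

```python
import ast

def extract_spark_sql_snippets(source: str) -> list[tuple[int, str]]:
    """Collect (lineno, sql) pairs for spark.sql(...) calls whose first
    argument is a plain string literal. F-strings and other dynamic
    expressions fail the ast.Constant check and are skipped."""
    snippets = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "sql"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "spark"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            snippets.append((node.args[0].lineno, node.args[0].value))
    return snippets

src = 'a = spark.sql("SELECT 1")\nb = spark.sql(f"SELECT {col}")\n'
print(extract_spark_sql_snippets(src))  # [(1, 'SELECT 1')] -- f-string skipped
```

Keeping the original line number alongside each snippet is what makes it possible to map violations reported against the temporary SQL file back to positions in the Python source.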
## Limitations / Notes

- F-strings are skipped (dynamic content).
- Concatenated string literals (`"SELECT" + " 1"`) are supported.
- Multi-line snippets are normalized with indentation preserved.
- Only `.py` files passed explicitly are supported; the tool does not auto-discover files.
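The literal kinds distinguished in the notes above can be told apart with the `ast` module. The sketch below is illustrative only (the function and its labels are hypothetical, not the tool's implementation):

```python
import ast

def classify_sql_arg(expr: str) -> str:
    """Classify a single Python expression the way the notes above
    describe: plain literals and `+`-concatenations are lintable,
    f-strings and other dynamic expressions are skipped."""
    node = ast.parse(expr, mode="eval").body
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return "plain literal"
    if isinstance(node, ast.JoinedStr):
        return "f-string (skipped)"
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return "concatenation (supported)"
    return "dynamic (skipped)"

print(classify_sql_arg('"SELECT 1"'))        # plain literal
print(classify_sql_arg('f"SELECT {x}"'))     # f-string (skipped)
print(classify_sql_arg('"SELECT" + " 1"'))   # concatenation (supported)
```

Note that adjacent literals without `+` (`"SELECT" " 1"`) are folded into a single `ast.Constant` by the parser, so they look like a plain literal at the AST level.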
## Development

```shell
pip install -e .[dev]
pytest -vv
```
## Versioning

The hook `rev` should match a published tag. If you pin `main`, be aware of potential breaking changes.
## License

MIT
## File details

Details for the file `sqlfluff_pyspark-0.1.1.tar.gz`.

- Size: 5.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `fcce775e1b0a983fbd82ac2a2b5c31010f6fb062f0aa28ae26eb3e3c7917b285` |
| MD5 | `91ad053d725c46d32ac75ee2ca84b32e` |
| BLAKE2b-256 | `b9b31a6db2515a61b710f4eff14a1afa97ba6eaf27a1312cbee97cf5aea14aaa` |
### Provenance

The following attestation bundles were made for `sqlfluff_pyspark-0.1.1.tar.gz`:

Publisher: `release.yml` on `dan1elt0m/sqlfluff-pyspark`

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: `sqlfluff_pyspark-0.1.1.tar.gz`
- Subject digest: `fcce775e1b0a983fbd82ac2a2b5c31010f6fb062f0aa28ae26eb3e3c7917b285`
- Sigstore transparency entry: 661264673
- Permalink: `dan1elt0m/sqlfluff-pyspark@a8ef59f50bf21a901959986eed7d8ede53b58aa2`
- Branch / Tag: `refs/heads/main`
- Owner: https://github.com/dan1elt0m
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: `release.yml@a8ef59f50bf21a901959986eed7d8ede53b58aa2`
- Trigger Event: workflow_dispatch
## File details

Details for the file `sqlfluff_pyspark-0.1.1-py3-none-any.whl`.

- Size: 7.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `93df12e851b32838ea769f61e7d1c41ed74b47048c220e0568aaff301662293a` |
| MD5 | `761c5d2f35e75705ae8f889f04c31e21` |
| BLAKE2b-256 | `fdb49999b21dab9bf87932224129a0721d4d0c19906ef36e29fac1441beb6996` |
### Provenance

The following attestation bundles were made for `sqlfluff_pyspark-0.1.1-py3-none-any.whl`:

Publisher: `release.yml` on `dan1elt0m/sqlfluff-pyspark`

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: `sqlfluff_pyspark-0.1.1-py3-none-any.whl`
- Subject digest: `93df12e851b32838ea769f61e7d1c41ed74b47048c220e0568aaff301662293a`
- Sigstore transparency entry: 661264680
- Permalink: `dan1elt0m/sqlfluff-pyspark@a8ef59f50bf21a901959986eed7d8ede53b58aa2`
- Branch / Tag: `refs/heads/main`
- Owner: https://github.com/dan1elt0m
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: `release.yml@a8ef59f50bf21a901959986eed7d8ede53b58aa2`
- Trigger Event: workflow_dispatch