Find, format, and extract SQL embedded in Python files.
Project description
py-sql-cleaner
py-sql-cleaner is a CLI tool for finding, formatting, and extracting SQL
embedded in Python files.
It is built for Python codebases where long SQL queries are written directly
inside triple-quoted Python strings. The current MVP uses SQLGlot for formatting,
defaults to SQLGlot's generic dialect, and can format with database-specific
dialects that this project has explicitly enabled via --dialect.
query = """
select user_id, updated_at
from users
qualify row_number() over(partition by user_id order by updated_at desc)=1
"""
py-sql-cleaner can format that SQL in place, or extract it into an external .sql
file.
py-sql-cleaner is an early MVP. It uses SQLGlot internally for best-effort SQL
formatting. It does not connect to databases and does not execute SQL.
Dialect support means SQLGlot parser/formatter mode selection, not exhaustive
database validation. Redshift command-style statements such as COPY and
UNLOAD are preserved rather than reformatted to avoid changing load/export
options.
[!NOTE]
py-sql-cleaneris conservative by default: f-strings and Jinja-like templates are detected but skipped instead of being rewritten.
Features
- Format SQL embedded in Python triple-quoted strings
- Extract embedded SQL into external
.sqlfiles - Replace embedded SQL strings with file references
- Detect common SQL variable names such as
sql,query,*_sql, and*_query - Skip unsafe blocks, including f-strings and Jinja-like templates, by default
- Support explicit dialect selection with
--dialect/-d - Support
checkmode for CI - Support
dry-runmode before rewriting files
Installation
Install py-sql-cleaner from PyPI:
pip install py-sql-cleaner
Or install it as an isolated CLI tool with pipx:
pipx install py-sql-cleaner
Release archives and wheels are also attached to GitHub Releases.
You can also run it without installing:
uvx py-sql-cleaner --help
Quick Start
-
List embedded SQL blocks:
py-sql-cleaner list jobs/load_users.py
-
Preview formatting changes:
py-sql-cleaner format jobs/load_users.py --dry-run
-
Format embedded SQL in place:
py-sql-cleaner format jobs/load_users.py
-
Format with a database-specific dialect:
py-sql-cleaner format jobs/load_users.py -d redshift
Currently enabled dialects are
generic,mysql,postgres, andredshift. -
Extract embedded SQL into
.sqlfiles:py-sql-cleaner extract jobs/load_users.py --out-dir sql
-
Check formatting for CI:
py-sql-cleaner check jobs/load_users.py
Example
Before:
query = """
select user_id, updated_at
from users
qualify row_number() over(partition by user_id order by updated_at desc)=1
"""
After py-sql-cleaner format jobs/load_users.py:
query = """
SELECT
user_id,
updated_at
FROM users
QUALIFY
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_at DESC) = 1
"""
Exact formatting is produced by SQLGlot and may change as SQLGlot changes.
After py-sql-cleaner extract jobs/load_users.py --out-dir sql:
query = "sql/query.sql"
Supported Input
The current MVP targets Python triple-quoted strings.
Supported:
query = """
SELECT *
FROM users
"""
load_users_sql = '''
SELECT *
FROM users
'''
Not targeted in the MVP:
query = "SELECT * FROM users"
Safety
py-sql-cleaner is conservative by default. It skips unsafe blocks instead of
rewriting them.
[!NOTE] Skipped blocks are left unchanged. This is intentional: preserving runtime behavior is more important than formatting every SQL-looking string.
Skipped by default:
query = f"""
SELECT *
FROM users
WHERE user_id = {user_id}
"""
query = """
SELECT *
FROM users
WHERE ds = '{{ ds }}'
"""
py-sql-cleaner does not:
- connect to databases
- execute SQL
- validate SQL against a database
- inspect schemas
- provide autocomplete
- guarantee full database compatibility
- fully support f-strings
- fully support Jinja templates
- format every possible SQL string
Commands
| Command | Purpose | Example |
|---|---|---|
list |
List embedded SQL blocks | py-sql-cleaner list jobs/load_users.py |
format |
Format embedded SQL in place | py-sql-cleaner format jobs/load_users.py |
check |
Check whether embedded SQL is formatted | py-sql-cleaner check jobs/load_users.py |
extract |
Extract embedded SQL into .sql files |
py-sql-cleaner extract jobs/load_users.py --out-dir sql |
dialects |
List accepted dialect values | py-sql-cleaner dialects |
Documentation
Status
py-sql-cleaner is currently an early MVP.
The current focus is:
- Python files
- triple-quoted SQL strings
- SQLGlot-backed SQL formatting, defaulting to generic SQL with
--dialectsupport for explicitly enabled database-specific formatting - formatting
- extracting SQL into
.sqlfiles
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file py_sql_cleaner-0.1.0.tar.gz.
File metadata
- Download URL: py_sql_cleaner-0.1.0.tar.gz
- Upload date:
- Size: 173.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3dd93673f0959db877d85e9ec4477c53f6d4bf216826d1d1cdb3aa20526a733b
|
|
| MD5 |
4b9975ffb26295b1f24e9561caf103aa
|
|
| BLAKE2b-256 |
c0952ad0bc479364788ab7bdaa946caee64bda779380fd7f2962d90e220ab672
|
File details
Details for the file py_sql_cleaner-0.1.0-py3-none-any.whl.
File metadata
- Download URL: py_sql_cleaner-0.1.0-py3-none-any.whl
- Upload date:
- Size: 15.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
da0ffe701029e3d77a1ee939b3936bf8d0021f372c0327b0159d62442d60e34f
|
|
| MD5 |
49c22666bf7aadc6ddb7321d2e92eeee
|
|
| BLAKE2b-256 |
3920ebddf3de2c247c022d72dec1075ba19deac61ce63869e76ddafc69e53393
|