Skip to main content

Find, format, and extract SQL embedded in Python files.

Project description

py-sql-cleaner

py-sql-cleaner is a CLI tool for finding, formatting, and extracting SQL embedded in Python files.

It is built for Python codebases where long SQL queries are written directly inside triple-quoted Python strings. The current MVP uses SQLGlot for formatting, defaults to SQLGlot's generic dialect, and can format with database-specific dialects that this project has explicitly enabled via --dialect.

query = """
select user_id, updated_at
from users
qualify row_number() over(partition by user_id order by updated_at desc)=1
"""

py-sql-cleaner can format that SQL in place, or extract it into an external .sql file.

py-sql-cleaner is an early MVP. It uses SQLGlot internally for best-effort SQL formatting. It does not connect to databases and does not execute SQL. Dialect support means SQLGlot parser/formatter mode selection, not exhaustive database validation. Redshift command-style statements such as COPY and UNLOAD are preserved rather than reformatted to avoid changing load/export options.

[!NOTE] py-sql-cleaner is conservative by default: f-strings and Jinja-like templates are detected but skipped instead of being rewritten.

Features

  • Format SQL embedded in Python triple-quoted strings
  • Extract embedded SQL into external .sql files
  • Replace embedded SQL strings with file references
  • Detect common SQL variable names such as sql, query, *_sql, and *_query
  • Skip unsafe blocks, including f-strings and Jinja-like templates, by default
  • Support explicit dialect selection with --dialect / -d
  • Support check mode for CI
  • Support dry-run mode before rewriting files

Installation

Install py-sql-cleaner from PyPI:

pip install py-sql-cleaner

Or install it as an isolated CLI tool with pipx:

pipx install py-sql-cleaner

Release archives and wheels are also attached to GitHub Releases.

You can also run it without installing:

uvx py-sql-cleaner --help

Quick Start

  1. List embedded SQL blocks:

    py-sql-cleaner list jobs/load_users.py
    
  2. Preview formatting changes:

    py-sql-cleaner format jobs/load_users.py --dry-run
    
  3. Format embedded SQL in place:

    py-sql-cleaner format jobs/load_users.py
    
  4. Format with a database-specific dialect:

    py-sql-cleaner format jobs/load_users.py -d redshift
    

    Currently enabled dialects are generic, mysql, postgres, and redshift.

  5. Extract embedded SQL into .sql files:

    py-sql-cleaner extract jobs/load_users.py --out-dir sql
    
  6. Check formatting for CI:

    py-sql-cleaner check jobs/load_users.py
    

Example

Before:

query = """
select user_id, updated_at
from users
qualify row_number() over(partition by user_id order by updated_at desc)=1
"""

After py-sql-cleaner format jobs/load_users.py:

query = """
SELECT
  user_id,
  updated_at
FROM users
QUALIFY
  ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_at DESC) = 1
"""

Exact formatting is produced by SQLGlot and may change as SQLGlot changes.

After py-sql-cleaner extract jobs/load_users.py --out-dir sql:

query = "sql/query.sql"

Supported Input

The current MVP targets Python triple-quoted strings.

Supported:

query = """
SELECT *
FROM users
"""
load_users_sql = '''
SELECT *
FROM users
'''

Not targeted in the MVP:

query = "SELECT * FROM users"

Safety

py-sql-cleaner is conservative by default. It skips unsafe blocks instead of rewriting them.

[!NOTE] Skipped blocks are left unchanged. This is intentional: preserving runtime behavior is more important than formatting every SQL-looking string.

Skipped by default:

query = f"""
SELECT *
FROM users
WHERE user_id = {user_id}
"""
query = """
SELECT *
FROM users
WHERE ds = '{{ ds }}'
"""

py-sql-cleaner does not:

  • connect to databases
  • execute SQL
  • validate SQL against a database
  • inspect schemas
  • provide autocomplete
  • guarantee full database compatibility
  • fully support f-strings
  • fully support Jinja templates
  • format every possible SQL string

Commands

Command Purpose Example
list List embedded SQL blocks py-sql-cleaner list jobs/load_users.py
format Format embedded SQL in place py-sql-cleaner format jobs/load_users.py
check Check whether embedded SQL is formatted py-sql-cleaner check jobs/load_users.py
extract Extract embedded SQL into .sql files py-sql-cleaner extract jobs/load_users.py --out-dir sql
dialects List accepted dialect values py-sql-cleaner dialects

Documentation

Status

py-sql-cleaner is currently an early MVP.

The current focus is:

  • Python files
  • triple-quoted SQL strings
  • SQLGlot-backed SQL formatting, defaulting to generic SQL with --dialect support for explicitly enabled database-specific formatting
  • formatting
  • extracting SQL into .sql files

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

py_sql_cleaner-0.1.0.tar.gz (173.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

py_sql_cleaner-0.1.0-py3-none-any.whl (15.5 kB view details)

Uploaded Python 3

File details

Details for the file py_sql_cleaner-0.1.0.tar.gz.

File metadata

  • Download URL: py_sql_cleaner-0.1.0.tar.gz
  • Upload date:
  • Size: 173.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for py_sql_cleaner-0.1.0.tar.gz
Algorithm Hash digest
SHA256 3dd93673f0959db877d85e9ec4477c53f6d4bf216826d1d1cdb3aa20526a733b
MD5 4b9975ffb26295b1f24e9561caf103aa
BLAKE2b-256 c0952ad0bc479364788ab7bdaa946caee64bda779380fd7f2962d90e220ab672

See more details on using hashes here.

File details

Details for the file py_sql_cleaner-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: py_sql_cleaner-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 15.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for py_sql_cleaner-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 da0ffe701029e3d77a1ee939b3936bf8d0021f372c0327b0159d62442d60e34f
MD5 49c22666bf7aadc6ddb7321d2e92eeee
BLAKE2b-256 3920ebddf3de2c247c022d72dec1075ba19deac61ce63869e76ddafc69e53393

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page