Skip to main content

A Python library for intelligent SQL predicate manipulation using SQLGlot. This library provides tools to safely remove specific predicates from `SQL WHERE` clauses and filter SQL statements based on column availability.

Project description

Query.Farm SQL Manipulation

A Python library for intelligent SQL predicate manipulation using SQLGlot

Column Filtering with Complex Expressions

import sqlglot
from query_farm_sql_manipulation import transforms

sql = '''
SELECT * FROM users
WHERE age > 18
  AND (status = 'active' OR role = 'admin')
  AND department IN ('engineering', 'sales')
'''

# Parse the statement first
statement = sqlglot.parse_one(sql, dialect="duckdb")

# Only keep predicates involving 'age' and 'role'
allowed_columns = {'age', 'role'}

result = transforms.filter_column_references(
    statement=statement,
    selector=lambda col: col.name in allowed_columns,
)

# Result: SELECT * FROM users WHERE age > 18 AND role = 'admin'
print(result.sql())

Features

  • Predicate Removal: Safely remove specific predicates from complex SQL WHERE clauses while preserving logical structure
  • Column Filtering: Filter SQL statements to only include predicates referencing allowed columns
  • Intelligent Logic Handling: Properly handles AND/OR logic, nested expressions, CASE statements, and parentheses
  • SQLGlot Integration: Built on top of SQLGlot for robust SQL parsing and manipulation
  • Multiple Dialect Support: Works with various SQL dialects (default: DuckDB)

Installation

pip install query-farm-sql-manipulation

Requirements

  • Python >= 3.12
  • SQLGlot >= 26.33.0

Quick Start

Basic Predicate Removal

import sqlglot
from query_farm_sql_manipulation import transforms

# Parse a SQL statement
sql = 'SELECT * FROM data WHERE x = 1 AND y = 2'
statement = sqlglot.parse_one(sql, dialect="duckdb")

# Find the predicate you want to remove
predicates = list(statement.find_all(sqlglot.expressions.Predicate))
target_predicate = predicates[0]  # x = 1

# Remove the predicate
transforms.remove_expression_part(target_predicate)

# Result: SELECT * FROM data WHERE y = 2
print(statement.sql())

Column-Name Based Filtering

import sqlglot
from query_farm_sql_manipulation import transforms

# Parse SQL statement first
sql = 'SELECT * FROM data WHERE color = "red" AND size > 10 AND type = "car"'
statement = sqlglot.parse_one(sql, dialect="duckdb")

# Filter to only include predicates with allowed columns
allowed_columns = {"color", "type"}

filtered = transforms.filter_column_references(
    statement=statement,
    selector=lambda col: col.name in allowed_columns,
)

# Result: SELECT * FROM data WHERE color = "red" AND type = "car"
print(filtered.sql())

API Reference

remove_expression_part(child: sqlglot.Expression) -> None

Removes the specified SQLGlot expression from its parent, respecting logical structure.

Parameters:

  • child: The SQLGlot expression to remove

Raises:

  • ValueError: If the expression cannot be safely removed

Supported Parent Types:

  • AND/OR expressions: Replaces parent with the remaining operand
  • WHERE clauses: Removes the entire WHERE clause if it becomes empty
  • Parentheses: Recursively removes the parent
  • NOT expressions: Removes the entire NOT expression
  • CASE statements: Removes conditional branches

filter_column_references(*, statement: sqlglot.Expression, selector: Callable[[sqlglot.expressions.Column], bool]) -> sqlglot.Expression

Filters a SQL statement to remove predicates containing columns that don't match the selector criteria.

Parameters:

  • statement: The SQLGlot expression to filter
  • selector: A callable that takes a Column and returns True if it should be preserved, False if it should be removed

Returns:

  • Filtered SQLGlot expression with non-matching columns removed

Raises:

  • ValueError: If a column can't be cleanly removed due to interactions with allowed columns

where_clause_contents(statement: sqlglot.expressions.Expression) -> sqlglot.expressions.Expression | None

Extracts the contents of the WHERE clause from a SQLGlot expression.

Parameters:

  • statement: The SQLGlot expression to extract from

Returns:

  • The contents of the WHERE clause, or None if no WHERE clause exists

filter_predicates_with_right_side_column_references(statement: sqlglot.expressions.Expression) -> sqlglot.Expression

Filters out predicates that have column references on the right side of comparisons.

Parameters:

  • statement: The SQLGlot expression to filter

Returns:

  • Filtered SQLGlot expression with right-side column reference predicates removed

Examples

Complex Logic Handling

The library intelligently handles complex logical expressions:

# Original: (x = 1 AND y = 2) OR z = 3
# Remove y = 2: x = 1 OR z = 3

# Original: NOT (x = 1 AND y = 2)
# Remove x = 1: NOT y = 2 (which becomes y <> 2)

# Original: CASE WHEN x = 1 THEN 'yes' WHEN x = 2 THEN 'maybe' ELSE 'no' END
# Remove x = 1: CASE WHEN x = 2 THEN 'maybe' ELSE 'no' END

Column Filtering with Complex Expressions

sql = '''
SELECT * FROM users
WHERE age > 18
  AND (status = 'active' OR role = 'admin')
  AND department IN ('engineering', 'sales')
'''

# Only keep predicates involving 'age' and 'role'
allowed_columns = {'age', 'role'}

result = transforms.filter_column_references_statement(
    sql=sql,
    selector=lambda col: col.name in allowed_columns,
)

# Result: SELECT * FROM users WHERE age > 18 AND role = 'admin'

Error Handling

The library will raise ValueError when predicates cannot be safely removed:

import sqlglot
from query_farm_sql_manipulation import transforms

# This will raise ValueError because x = 1 is part of a larger expression
sql = "SELECT * FROM data WHERE result = (x = 1)"
statement = sqlglot.parse_one(sql, dialect="duckdb")

# Cannot remove x = 1 because it's used as a value, not a predicate
# This would raise ValueError if attempted

Supported SQL Constructs

  • Logical Operators: AND, OR, NOT
  • Comparison Operators: =, <>, <, >, <=, >=, LIKE, IN, IS NULL, etc.
  • Complex Expressions: CASE statements, subqueries, function calls
  • Nested Logic: Parentheses and nested boolean expressions
  • Multiple Dialects: DuckDB, PostgreSQL, MySQL, SQLite, and more via SQLGlot

Testing

Run the test suite:

pytest src/query_farm_sql_manipulation/test_transforms.py

The test suite includes comprehensive examples of:

  • Basic predicate removal scenarios
  • Complex logical expression handling
  • Error cases and edge conditions
  • Column filtering with various SQL constructs

Contributing

This project uses:

  • Rye for dependency management
  • pytest for testing
  • mypy for type checking
  • ruff for linting

Author

This Python module was created by Query.Farm.

License

MIT Licensed.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

query_farm_sql_manipulation-0.1.7.tar.gz (9.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

query_farm_sql_manipulation-0.1.7-py3-none-any.whl (9.7 kB view details)

Uploaded Python 3

File details

Details for the file query_farm_sql_manipulation-0.1.7.tar.gz.

File metadata

File hashes

Hashes for query_farm_sql_manipulation-0.1.7.tar.gz
Algorithm Hash digest
SHA256 6a31857fed4e507d94591b941bcd0b8aa397a46883d165b57f5e7577255bd1d3
MD5 bccbaba254efde48cfcc30dcc7d72276
BLAKE2b-256 915fd0536c2af7b6fa5ada660ff6a63e6d0c0673e63de0be021bda6f31fea025

See more details on using hashes here.

File details

Details for the file query_farm_sql_manipulation-0.1.7-py3-none-any.whl.

File metadata

File hashes

Hashes for query_farm_sql_manipulation-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 b1c7046ff506ff118fa4cc86524a778976507e6d853ee146a6d784097e5620fa
MD5 a6a0babef23a4be3b602099108ae6ea1
BLAKE2b-256 0b98a09a5d90ea27cb7b84537cb90fdf9ae2628f78ab0b965e0c5207840656a9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page