Lombardi
SQL performance analysis and optimization advisor with SQLMesh integration
Lombardi is a powerful SQL performance analysis tool that helps you identify bottlenecks, detect antipatterns, and optimize your queries across different data warehouses. Built on top of SQLGlot's semantic parsing, it provides actionable insights for better query performance.
Features
Semantic SQL Analysis
- Complexity Scoring: Quantitative assessment (0-100) based on joins, subqueries, nesting depth
- Antipattern Detection: Identifies common performance killers like SELECT *, cartesian joins, functions in WHERE clauses
- Optimization Suggestions: Actionable recommendations with specific examples
Warehouse-Specific Rules
- Snowflake: Clustering keys, result caching, VARIANT optimization, time travel suggestions
- BigQuery: Partitioning recommendations, slot efficiency, materialized views, array operations
- Universal: Cross-platform optimizations that work everywhere
Integration Ready
- SQLMesh Integration: Custom audit for performance checks during sqlmesh plan
- CLI Tool: Rich terminal output with colors, tables, and formatting
- Python API: Programmatic access for custom workflows
- Multiple Output Formats: Rich terminal, JSON, plain text
Quick Start
Installation
pip install lombardi
CLI Usage
# Analyze a SQL file
lombardi analyze query.sql --dialect snowflake
# Analyze a query string
lombardi analyze --sql "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id"
# Get just complexity metrics
lombardi complexity --sql "SELECT COUNT(*) FROM large_table WHERE date_col >= '2024-01-01'"
# Export results as JSON
lombardi analyze query.sql --format json > analysis.json
# BigQuery-specific analysis
lombardi analyze --sql "SELECT * FROM dataset.table" --dialect bigquery
Python API
from lombardi import ComplexityAnalyzer, AntipatternDetector, OptimizationSuggester
# Analyze query complexity
analyzer = ComplexityAnalyzer(dialect="snowflake")
metrics = analyzer.analyze("SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id")
print(f"Complexity Score: {metrics.complexity_score}/100")
print(f"Join Count: {metrics.join_count}")
print(f"Subqueries: {metrics.subquery_count}")
# Detect antipatterns
detector = AntipatternDetector()
issues = detector.detect("SELECT * FROM users WHERE UPPER(name) = 'JOHN'")
for issue in issues:
    print(f"{issue.pattern}: {issue.description}")

# Get optimization suggestions
suggester = OptimizationSuggester(dialect="bigquery")
suggestions = suggester.suggest("SELECT * FROM large_table WHERE date_col >= '2024-01-01'")
for suggestion in suggestions:
    print(f"{suggestion.title}: {suggestion.description}")
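For intuition, the 0-100 complexity score can be pictured as a weighted sum of structural features, clamped at 100. The weights below are illustrative assumptions, not Lombardi's actual formula:

```python
def complexity_score(joins=0, subqueries=0, ctes=0, functions=0, nesting_depth=0):
    """Illustrative 0-100 score; the weights are assumptions, not Lombardi's internals."""
    weights = {"joins": 8, "subqueries": 12, "ctes": 5, "functions": 4, "nesting_depth": 15}
    raw = (joins * weights["joins"]
           + subqueries * weights["subqueries"]
           + ctes * weights["ctes"]
           + functions * weights["functions"]
           + nesting_depth * weights["nesting_depth"])
    # Clamp to the 0-100 range the CLI reports
    return min(100.0, float(raw))

# One join plus one function, roughly like the example query above:
print(complexity_score(joins=1, functions=1))  # 12.0
```

In practice the real analyzer derives these counts from SQLGlot's parse tree rather than taking them as arguments.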
Example Output
$ lombardi analyze --sql "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE UPPER(c.name) LIKE '%ACME%'"
╭─────────────────────────── SQL Analysis Results ────────────────────────────╮
│ Complexity Score: 12.0/100                                                  │
╰─────────────────────────────────────────────────────────────────────────────╯

Complexity Metrics
┏━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Metric         ┃ Value ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ Join Count     │ 1     │
│ Subquery Count │ 0     │
│ CTE Count      │ 0     │
│ Function Count │ 1     │
│ Nesting Depth  │ 0     │
│ Table Count    │ 2     │
│ Column Count   │ 3     │
└────────────────┴───────┘

Detected Issues
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Pattern                     ┃ Severity ┃ Description                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Select Star                 │ MEDIUM   │ SELECT * can be inefficient   │
│ Function On Column In Where │ HIGH     │ Functions prevent index usage │
└─────────────────────────────┴──────────┴───────────────────────────────┘

Optimization Suggestions
┏━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Category    ┃ Priority ┃ Suggestion                                       ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Performance │ HIGH     │ Function On Column In Where: Rewrite to avoid    │
│             │          │ applying functions to indexed columns            │
│ Indexing    │ HIGH     │ JOIN Index Optimization: Ensure indexes exist on │
│             │          │ JOIN columns: o.customer_id, c.id                │
│ Performance │ MEDIUM   │ Select Star: Explicitly list required columns    │
└─────────────┴──────────┴──────────────────────────────────────────────────┘
SQLMesh Integration
Integrate Lombardi into your SQLMesh workflow for automated performance checks:
# In your SQLMesh model
MODEL (
name my_project.optimized_table,
kind FULL,
audits [performance_check]
);
SELECT
region_code,
species_code,
COUNT(*) as observation_count
FROM @DEV.my_project.raw_observations
WHERE observation_date >= '2024-01-01'
GROUP BY 1, 2;
# Configure the audit
from lombardi.integrations.sqlmesh_audit import create_performance_audit

performance_audit = create_performance_audit(
    complexity_threshold=30.0,
    min_severity="medium",
    dialect="snowflake",
)
Architecture

lombardi/
├── analyzers/                     # Core analysis engines
│   ├── complexity_analyzer.py     # Complexity scoring
│   ├── antipattern_detector.py    # Antipattern detection
│   └── optimization_suggester.py  # Optimization recommendations
├── rules/                         # Warehouse-specific optimizations
│   ├── snowflake_rules.py         # Snowflake optimizations
│   └── bigquery_rules.py          # BigQuery optimizations
├── integrations/                  # Third-party integrations
│   └── sqlmesh_audit.py           # SQLMesh audit integration
└── cli.py                         # Command-line interface
Detected Antipatterns
Lombardi identifies these common SQL performance issues:
| Pattern | Severity | Description |
|---|---|---|
| SELECT * | Medium | Can be inefficient and fragile |
| Cartesian Joins | Critical | Missing JOIN conditions |
| Functions in WHERE | High | Prevents index usage |
| Leading Wildcards | Medium | LIKE '%pattern' prevents indexes |
| Missing WHERE | Medium | Full table scans |
| Subquery in SELECT | Medium | Often better as JOINs |
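For intuition, checks like these can be approximated with simple pattern matching. The sketch below is a toy illustration only; Lombardi itself works on SQLGlot's parse tree rather than raw text:

```python
import re

# Toy antipattern catalog: (name, severity, pattern). Illustrative, not Lombardi's rule set.
ANTIPATTERNS = [
    ("select_star", "MEDIUM", re.compile(r"SELECT\s+\*", re.I)),
    ("leading_wildcard", "MEDIUM", re.compile(r"LIKE\s+'%", re.I)),
    ("function_in_where", "HIGH", re.compile(r"WHERE\s+\w+\s*\(", re.I)),
]

def detect(sql):
    """Return (name, severity) for every toy pattern found in the query text."""
    return [(name, sev) for name, sev, pat in ANTIPATTERNS if pat.search(sql)]

print(detect("SELECT * FROM users WHERE UPPER(name) = 'JOHN'"))
# [('select_star', 'MEDIUM'), ('function_in_where', 'HIGH')]
```

Regexes like these break down quickly (comments, strings, nested queries), which is exactly why a semantic parser is the right foundation for the real tool.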
Warehouse-Specific Optimizations
Snowflake
- Clustering Keys: Recommendations for large table partitioning
- Result Caching: Identify cacheable query patterns
- Warehouse Sizing: Complexity-based sizing suggestions
- Time Travel: Optimize historical queries
- VARIANT: Semi-structured data query optimization
BigQuery
- Partitioning: Date/timestamp partitioning opportunities
- Clustering: Multi-column clustering recommendations
- Slot Efficiency: Cost optimization suggestions
- Materialized Views: Pre-aggregation opportunities
- Array Operations: UNNEST and array function optimization
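As a rough illustration of what a warehouse-specific rule looks like, here is a toy check for a BigQuery partitioning opportunity. The function name and heuristic are assumptions for illustration; they are not Lombardi's BigQueryRules API:

```python
import re

def bigquery_partition_hint(sql):
    """Toy rule: if a date-like column is range-filtered, suggest partitioning on it.
    Heuristic and wording are illustrative assumptions, not Lombardi's implementation."""
    if re.search(r"WHERE\s+\w*date\w*\s*(>=|<=|>|<|BETWEEN)", sql, re.I):
        return "Consider partitioning the table on the filtered date column."
    return None  # No hint for queries without a date range filter

print(bigquery_partition_hint("SELECT * FROM large_table WHERE date_col >= '2024-01-01'"))
```

A real rule would resolve column types from table metadata instead of guessing from column names.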
Configuration
CLI Options
lombardi analyze [OPTIONS] [SQL_FILE]
Options:
--sql, -s TEXT SQL query string to analyze
--dialect, -d TEXT SQL dialect (snowflake, bigquery, etc.)
--threshold, -t FLOAT Complexity threshold (0-100) [default: 50.0]
--severity TEXT Minimum severity (low, medium, high, critical) [default: medium]
--format, -f TEXT Output format (rich, json, plain) [default: rich]
--suggestions/--no-suggestions Include optimization suggestions [default: suggestions]
Python API Configuration
# Initialize with specific dialect
analyzer = ComplexityAnalyzer(dialect="snowflake")
detector = AntipatternDetector(dialect="bigquery")
# Configure thresholds and rules
suggester = OptimizationSuggester(dialect="snowflake")
suggestions = suggester.suggest(sql_query)
# Warehouse-specific analysis
from lombardi.rules.snowflake_rules import SnowflakeRules
snowflake_rules = SnowflakeRules()
sf_suggestions = snowflake_rules.analyze(sql_query)
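A common programmatic use is gating CI on the analysis results. The sketch below shows such a gate over plain values; the tuple shape, field names, and thresholds are assumptions, not Lombardi's actual return types:

```python
def passes_gate(complexity_score, issues, max_score=50.0,
                blocked_severities=("HIGH", "CRITICAL")):
    """Return True when a query is acceptable: score under the threshold
    and no issue at a blocked severity. Shapes here are illustrative."""
    if complexity_score > max_score:
        return False
    return not any(sev in blocked_severities for _, sev in issues)

# A low-complexity query with only a medium-severity issue passes:
print(passes_gate(12.0, [("select_star", "MEDIUM")]))   # True
# The same score with a high-severity issue fails:
print(passes_gate(12.0, [("function_in_where", "HIGH")]))  # False
```

Wired into a pre-merge check, a gate like this turns the advisory output into an enforceable standard.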
Development
Setup
git clone https://github.com/Doctacon/lombardi.git
cd lombardi
uv sync
Testing
# Run tests
uv run pytest tests/ -v
# Test CLI
uv run lombardi analyze --sql "SELECT * FROM test_table"
# Test with coverage
uv run pytest tests/ --cov=lombardi
Building
# Build package
uv build
# Test locally
pip install dist/lombardi-*.whl
Performance Impact
Lombardi helps identify optimizations that can provide:
- Query Speed: 10x-100x faster execution through proper indexing
- Cost Reduction: 50-90% lower warehouse costs via efficient queries
- Resource Usage: Reduced CPU, memory, and I/O through better query patterns
- Maintainability: Cleaner, more readable SQL through antipattern detection
Contributing
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes and add tests
- Run tests: uv run pytest
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- SQLGlot: Powers our semantic SQL parsing
- SQLMesh: Inspiration for the audit integration pattern
- Rich: Beautiful terminal output
- Typer: Excellent CLI framework
Links
- Documentation: [Coming Soon]
- PyPI: https://pypi.org/project/lombardi/
- Issues: https://github.com/Doctacon/lombardi/issues
- Discussions: https://github.com/Doctacon/lombardi/discussions
Named after Vince Lombardi, who believed "Perfection is not attainable, but if we chase perfection we can catch excellence" - the same philosophy we apply to SQL optimization.