Python обертка для Java-оптимизатора SQL запросов на базе Apache Calcite
Project description
SQL Optimizer Python Package
Python обертка для Java-оптимизатора SQL запросов на базе Apache Calcite.
Возможности
- Python API: Простой в использовании Python интерфейс для оптимизации SQL
- Java Backend: Использует мощь Apache Calcite для оптимизации запросов
- Поддержка метаданных: Определение пользовательских схем таблиц и типов данных
- Правила оптимизации: Применение пользовательских стратегий оптимизации
- CLI интерфейс: Командная строка для быстрой оптимизации
- Автономный пакет: Включает Java JAR файл - не требует отдельной установки Java
Требования
- Python 3.8 или выше
- Java 1.8 или выше (автоопределяется)
Установка
Из Wheel пакета
pip install sql-optimizer-1.0.0-py3-none-any.whl
Из исходного кода
# Клонирование репозитория
git clone https://github.com/sql-optimizer/sql-optimizer-python.git
cd sql-optimizer-python
# Установка в режиме разработки
pip install -e .
Quick Start
Basic Usage
from sql_optimizer_python import SqlOptimizer
# Initialize optimizer
optimizer = SqlOptimizer()
# Optimize a simple query
result = optimizer.optimize("SELECT * FROM employees")
print(result)
With Custom Metadata
# Define table metadata
metadata = {
"custom_table": {
"columns": {
"id": {"type": "INTEGER"},
"name": {"type": "VARCHAR"},
"age": {"type": "INTEGER"}
},
"rowCount": 1000
}
}
# Optimize with metadata
result = optimizer.optimize(
"SELECT * FROM custom_table WHERE age > 25",
metadata=metadata
)
With Optimization Rules
from sql_optimizer_python import OptimizationRule
# Create optimization rules
rules = [
OptimizationRule(
name="push_down_filter",
description="Push filter operations down",
priority=100,
conditions=[{"type": "node_type", "operator": "contains", "value": "TableScan"}],
transformations=[{"action": "push_down", "target": "filter", "cost_reduction": 25.0}]
)
]
# Optimize with rules
result = optimizer.optimize(
"SELECT * FROM employees WHERE salary > 50000",
optimization_rules=rules
)
Command Line Interface
Basic Optimization
sql-optimizer "SELECT * FROM employees"
With Metadata File
sql-optimizer "SELECT * FROM custom_table" --metadata-file examples/custom_table_metadata.json
With Optimization Rules
sql-optimizer "SELECT * FROM employees" --rules-file examples/basic_optimization_rules.json
Check Java Runtime
sql-optimizer --check-java
Pretty Print Output
sql-optimizer "SELECT * FROM employees" --pretty
API Reference
SqlOptimizer
Main class for SQL query optimization.
class SqlOptimizer:
def __init__(self, java_home: Optional[str] = None)
def optimize(self, sql_query: str,
metadata: Optional[Union[Dict, str]] = None,
optimization_rules: Optional[Union[List[OptimizationRule], str]] = None,
metadata_file: Optional[str] = None,
rules_file: Optional[str] = None) -> Dict[str, Any]
def optimize_from_files(self, sql_query: str,
metadata_file: Optional[str] = None,
rules_file: Optional[str] = None) -> Dict[str, Any]
def get_version(self) -> str
def get_java_info(self) -> Dict[str, str]
OptimizationRule
Represents an optimization rule.
class OptimizationRule:
def __init__(self, name: str, description: str, priority: int = 100,
enabled: bool = True, conditions: Optional[List[Dict]] = None,
transformations: Optional[List[Dict]] = None,
metadata: Optional[Dict] = None)
def to_dict(self) -> Dict[str, Any]
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> 'OptimizationRule'
OptimizationEngine
Manages optimization rules.
class OptimizationEngine:
def __init__(self)
def add_rule(self, rule: OptimizationRule) -> None
def add_rules_from_file(self, file_path: str) -> None
def add_rules_from_json(self, rules_json: str) -> None
def get_rules_json(self) -> str
def clear_rules(self) -> None
Metadata Format
Table metadata is defined in JSON format:
{
"table_name": {
"columns": {
"column_name": {
"type": "INTEGER|VARCHAR|DOUBLE|BOOLEAN",
"nullable": true|false
}
},
"rowCount": 1000
}
}
Optimization Rules Format
Optimization rules are defined in JSON format:
[
{
"name": "rule_name",
"description": "Rule description",
"priority": 100,
"enabled": true,
"conditions": [
{
"type": "node_type|table_name|cost_threshold|field_count",
"operator": "equals|contains|greater_than|less_than",
"value": "value",
"negated": false
}
],
"transformations": [
{
"action": "push_down|reorder|optimize|prune|replace",
"target": "filter|projection|join|scan|aggregate",
"parameters": {},
"cost_reduction": 25.0
}
],
"metadata": {
"category": "filter_optimization",
"author": "system"
}
}
]
Output Format
The optimizer returns a JSON object with the following structure:
{
"originalQuery": "SELECT * FROM employees",
"metadata": { /* table metadata if provided */ },
"optimizedPlan": {
"relType": "LogicalTableScan",
"rowType": {
"fieldCount": 4,
"nullable": false,
"fields": {
"id": "INTEGER",
"name": "VARCHAR",
"department": "VARCHAR",
"salary": "INTEGER"
}
},
"cost": {
"cpu": 101.0,
"io": 0.0,
"rows": 100.0,
"infinite": false
}
},
"optimizationStats": {
"appliedRules": ["push_down_filter"],
"totalRulesConsidered": 3,
"optimizationDetails": {}
},
"explanation": "LogicalTableScan(table=[[TEST, employees]])"
}
Examples
Example 1: Basic Query Optimization
from sql_optimizer_python import SqlOptimizer
optimizer = SqlOptimizer()
result = optimizer.optimize("SELECT * FROM employees WHERE salary > 50000")
print(result["optimizedPlan"]["relType"])
Example 2: Custom Table with Metadata
metadata = {
"users": {
"columns": {
"id": {"type": "INTEGER"},
"username": {"type": "VARCHAR"},
"email": {"type": "VARCHAR"},
"active": {"type": "BOOLEAN"}
},
"rowCount": 10000
}
}
result = optimizer.optimize(
"SELECT username, email FROM users WHERE active = true",
metadata=metadata
)
Example 3: Custom Optimization Rules
from sql_optimizer_python import OptimizationRule
rules = [
OptimizationRule(
name="index_scan_optimization",
description="Use index scan for large tables",
priority=80,
conditions=[
{"type": "node_type", "operator": "equals", "value": "LogicalTableScan"},
{"type": "cost_threshold", "operator": "greater_than", "value": 1000.0}
],
transformations=[
{"action": "optimize", "target": "scan", "cost_reduction": 50.0}
]
)
]
result = optimizer.optimize(
"SELECT * FROM large_table",
optimization_rules=rules
)
Error Handling
The package provides custom exceptions for different error types:
from sql_optimizer_python import (
SqlOptimizerError, JavaRuntimeError, OptimizationError,
MetadataError, RulesError
)
try:
result = optimizer.optimize("SELECT * FROM table")
except JavaRuntimeError as e:
print(f"Java runtime issue: {e}")
except OptimizationError as e:
print(f"Optimization failed: {e}")
except MetadataError as e:
print(f"Metadata error: {e}")
except RulesError as e:
print(f"Rules error: {e}")
Development
Running Tests
pytest tests/
Building Wheel
python setup.py bdist_wheel
Installing in Development Mode
pip install -e .
License
MIT License - see LICENSE file for details.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
Support
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sql_optimizer-1.0.3-py3-none-any.whl.
File metadata
- Download URL: sql_optimizer-1.0.3-py3-none-any.whl
- Upload date:
- Size: 18.6 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
07c16a9fa9741a2a5a14160587924de8f4009ea0c68953dcdee73a07c75322c9
|
|
| MD5 |
11eed3597e949b44498028b0a5b0b6ca
|
|
| BLAKE2b-256 |
cda7d2556d1c30695828feda433c9aa29281d562072009e819e7a10b569821c1
|