This project removes the guesswork from choosing which columns to use in a ZORDER statement. It analyzes the logged execution plans for each cluster you provide and returns the top n columns that were used in filter/WHERE clauses.
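The idea of returning the "top n" filter columns can be illustrated with a small frequency count. This is only a sketch: the description above does not say how columns are ranked, so ranking by the number of queries that filter on a column is an assumption, and `top_filter_columns` and the sample data are hypothetical names, not part of auto_zorder.

```python
from collections import Counter


def top_filter_columns(filter_columns_per_query, n=5):
    """Rank columns by how many queries filtered on them.

    Hypothetical helper for illustration; not the auto_zorder implementation.
    """
    counts = Counter()
    for cols in filter_columns_per_query:
        # Count each column at most once per query.
        counts.update(set(cols))
    return [col for col, _ in counts.most_common(n)]


# Made-up filter columns, one inner list per logged query.
observed = [
    ["event_date", "customer_id"],
    ["event_date"],
    ["event_date", "region"],
    ["customer_id"],
]
top_two = top_filter_columns(observed, n=2)
```

Here `event_date` appears in three queries and `customer_id` in two, so those two columns are returned first.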
Installation
pip install in your Databricks Notebook
%pip install auto_zorder
Example Usage
Note: If cluster log delivery has only recently been enabled, there may not be enough logged data yet and you may not see any results.
Basic Usage
optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1', 'cluster_id_2'],
    optimize_table='my_db.my_table'
)
print(optimize_cmd)
>>> 'OPTIMIZE my_db.my_table ZORDER BY (col1, col2, col3, col4, col5)'
# To run the OPTIMIZE Command
spark.sql(optimize_cmd)
Limit the Number of ZORDER columns
optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1', 'cluster_id_2'],
    optimize_table='my_db.my_table',
    number_of_cols=2
)
print(optimize_cmd)
>>> 'OPTIMIZE my_db.my_table ZORDER BY (col1, col2)'
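To make the effect of `number_of_cols` concrete, the sketch below builds the same kind of command string from an already ranked list of columns. `build_optimize_cmd` is a hypothetical helper written for this example, and the default of 5 is an assumption inferred from the five columns in the basic usage output above.

```python
def build_optimize_cmd(table, ranked_cols, number_of_cols=5):
    """Build an OPTIMIZE ... ZORDER BY command from ranked column names.

    Hypothetical helper for illustration; not part of auto_zorder.
    """
    # Keep only the first `number_of_cols` columns of the ranking.
    cols = list(ranked_cols)[:number_of_cols]
    if not cols:
        raise ValueError("at least one column is required for ZORDER BY")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(cols)})"


ranked = ["col1", "col2", "col3", "col4", "col5"]
limited_cmd = build_optimize_cmd("my_db.my_table", ranked, number_of_cols=2)
```

Truncating the ranked list keeps the most frequently filtered columns and drops the rest.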
Save auto zorder analysis
optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1'],
    optimize_table='my_db.my_table',
    save_analysis='my_db.my_analysis'
)
Run auto zorder using a saved analysis instead of cluster logs
optimize_cmd = auto_zorder(
    use_analysis='my_db.my_analysis',
    optimize_table='my_db.my_table'
)
Include additional columns at specific positions in ZORDER
optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1', 'cluster_id_2'],
    optimize_table='my_db.my_table',
    use_add_cols=[('add_col1', 0), ('add_col2', 4)]
)
print(optimize_cmd)
>>> 'OPTIMIZE my_db.my_table ZORDER BY (add_col1, auto_col1, auto_col2, auto_col3, add_col2, auto_col4, auto_col5)'
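The output above suggests that each tuple in `use_add_cols` is a column name paired with the zero-based position it should occupy in the final column list. The sketch below reproduces that ordering with plain list inserts. It is an interpretation of the example output, not the library's code, and `merge_additional_cols` is a hypothetical name.

```python
def merge_additional_cols(auto_cols, add_cols):
    """Insert (name, position) pairs into a copy of the auto-selected columns.

    Hypothetical helper for illustration; not part of auto_zorder.
    """
    merged = list(auto_cols)
    # Insert in ascending position order so each column lands at its index.
    for name, position in sorted(add_cols, key=lambda pair: pair[1]):
        merged.insert(position, name)
    return merged


auto_cols = ["auto_col1", "auto_col2", "auto_col3", "auto_col4", "auto_col5"]
merged_cols = merge_additional_cols(auto_cols, [("add_col1", 0), ("add_col2", 4)])
```

With these inputs, `add_col1` ends up at index 0 and `add_col2` at index 4, matching the column order in the command shown above.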
Download files
Source Distribution
auto_zorder-0.0.4.tar.gz (5.5 kB)
Built Distribution
Hashes for auto_zorder-0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8bf4893237ada40edc21a2fc47e46dbf9317cf59c786bd153d2cc7fcca973867
MD5 | 1b03c9f287ae6bf8c1c5cd182fcd4104
BLAKE2b-256 | 8c78c94a47f66863dab78dd5e349265588581e11d6b2b5cb961f502a2c0370e0