
This project removes the guesswork from choosing columns for the ZORDER BY clause. It analyzes the logged execution plans for each cluster you provide and returns the top n columns that were used in filter/WHERE clauses.
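To make the idea concrete, here is a minimal sketch of ranking columns by how often they appear in filter expressions. It is not the package's implementation: the function name `top_filter_columns`, the sample filter strings, and the column names are all hypothetical, and it assumes the filter expressions have already been extracted from the logged plans as plain strings.

```python
import re
from collections import Counter


def top_filter_columns(filter_exprs, table_columns, n=5):
    """Rank table columns by how many filter expressions mention them.

    Hypothetical helper for illustration; not part of auto_zorder.
    """
    counts = Counter()
    for expr in filter_exprs:
        # Split into identifier tokens so 'id' does not match inside 'order_id'.
        tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expr.lower()))
        for col in table_columns:
            if col.lower() in tokens:
                counts[col] += 1
    # Columns that never appear in a filter are not returned at all.
    return [col for col, _ in counts.most_common(n)]


# Made-up filter expressions standing in for what a logged plan might contain.
filters = [
    "isnotnull(customer_id) AND (customer_id = 42)",
    "(order_date >= 2023-01-01) AND (customer_id = 7)",
    "(region = 'EU') AND (order_date < 2023-06-01)",
    "(customer_id = 9)",
]
columns = ["customer_id", "order_date", "region", "amount"]

print(top_filter_columns(filters, columns, n=2))
```

Each expression counts a column at most once, so a column that is repeated inside a single filter does not outrank one that is used across many queries.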


Installation

Install with pip in your Databricks notebook:

%pip install auto_zorder

Example Usage

Note: If cluster log delivery has not been active for long, you may not see any results.

Basic Usage

optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1', 'cluster_id_2'],
    optimize_table='my_db.my_table'
)

print(optimize_cmd)
# Output: OPTIMIZE my_db.my_table ZORDER BY (col1, col2, col3, col4, col5)

# To run the OPTIMIZE command
spark.sql(optimize_cmd)

Limit the Number of ZORDER columns

optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1', 'cluster_id_2'],
    optimize_table='my_db.my_table',
    number_of_cols=2
)

print(optimize_cmd)
# Output: OPTIMIZE my_db.my_table ZORDER BY (col1, col2)
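The effect of the column limit can be sketched as truncating a ranked column list before formatting the command. The helper `build_optimize_cmd` below is hypothetical and is not the package's code. Its default of 5 is only an assumption that mirrors the five columns shown in the basic example.

```python
def build_optimize_cmd(table, ranked_cols, number_of_cols=5):
    """Format an OPTIMIZE ... ZORDER BY statement from ranked column names.

    Hypothetical helper for illustration; not part of auto_zorder.
    """
    # Keep only the highest-ranked columns, up to the requested limit.
    cols = list(ranked_cols)[:number_of_cols]
    if not cols:
        raise ValueError("no columns to ZORDER by")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(cols)})"


ranked = ["col1", "col2", "col3", "col4", "col5"]
print(build_optimize_cmd("my_db.my_table", ranked, number_of_cols=2))
# Output: OPTIMIZE my_db.my_table ZORDER BY (col1, col2)
```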

Save auto zorder analysis

optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1'],
    optimize_table='my_db.my_table',
    save_analysis='my_db.my_analysis'
)

Run auto zorder using analysis instead of cluster logs

optimize_cmd = auto_zorder(
    use_analysis='my_db.my_analysis',
    optimize_table='my_db.my_table'
)

Include additional columns at specific positions in ZORDER

optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1', 'cluster_id_2'],
    optimize_table='my_db.my_table',
    use_add_cols=[('add_col1', 0), ('add_col2', 4)]
)

print(optimize_cmd)
# Output: OPTIMIZE my_db.my_table ZORDER BY (add_col1, auto_col1, auto_col2, auto_col3, add_col2, auto_col4, auto_col5)
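The ordering in the output above is consistent with treating each `(name, position)` pair as a zero-based insert into the automatically selected columns, applied in the order given. The sketch below reproduces that ordering under that assumption. `merge_columns` is a hypothetical name and the package may implement this differently.

```python
def merge_columns(auto_cols, add_cols):
    """Insert each (name, position) pair into a copy of auto_cols, in order.

    Hypothetical helper for illustration; not part of auto_zorder.
    """
    merged = list(auto_cols)
    for name, position in add_cols:
        # list.insert uses zero-based positions and shifts later items right.
        merged.insert(position, name)
    return merged


auto = [f"auto_col{i}" for i in range(1, 6)]
merged = merge_columns(auto, [("add_col1", 0), ("add_col2", 4)])
print(", ".join(merged))
# Output: add_col1, auto_col1, auto_col2, auto_col3, add_col2, auto_col4, auto_col5
```

Because the inserts are applied one after another, each position refers to the list as it stands after the earlier inserts, which is why `add_col2` lands after `auto_col3` rather than after `auto_col4`.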
