
The project removes the guesswork of selecting columns for a ZORDER statement. It does this by analyzing the logged execution plans of each cluster provided and returning the top n columns used in filter/where clauses.
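For intuition about what that analysis involves, here is a minimal, hypothetical sketch: scan logged Spark plan text for Filter nodes, tally how often each column appears, and keep the top n. This is an illustration of the idea only, not the package's implementation; the helper name and plan fragments below are made up.

import re
from collections import Counter

def top_filter_columns(plan_texts, n=5):
    """Illustrative only: count columns referenced on Filter lines of plan text."""
    counts = Counter()
    for plan in plan_texts:
        for line in plan.splitlines():
            if "Filter" in line:
                # Spark plans render attributes as name#exprId, e.g. col1#12
                counts.update(re.findall(r"(\w+)#\d+", line))
    return [col for col, _ in counts.most_common(n)]

# Toy plan fragments, for illustration only
plans = [
    "*(1) Filter (col1#12 = 5)\n+- FileScan parquet my_db.my_table",
    "*(1) Filter ((col1#12 > 3) AND (col2#13 = 'x'))\n+- FileScan parquet my_db.my_table",
]
print(top_filter_columns(plans, n=2))  # ['col1', 'col2']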

Project description

Installation

pip install in your Databricks Notebook

%pip install auto_zorder

Example Usage

Note: If cluster log delivery has not been active for very long, you may not see any results.

Basic Usage

from auto_zorder import auto_zorder  # import path assumed; the function is used as shown below

optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1', 'cluster_id_2'],
    optimize_table='my_db.my_table'
)

print(optimize_cmd)
>>> 'OPTIMIZE my_db.my_table ZORDER BY (col1, col2, col3, col4, col5)'

# Run the OPTIMIZE command
spark.sql(optimize_cmd)

Limit the Number of ZORDER columns

optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1', 'cluster_id_2'],
    optimize_table='my_db.my_table',
    number_of_cols=2
)

print(optimize_cmd)
>>> 'OPTIMIZE my_db.my_table ZORDER BY (col1, col2)'

Save auto zorder analysis

optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1'],
    optimize_table='my_db.my_table',
    save_analysis='my_db.my_analysis'
)
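If save_analysis persists the results under the table name given (as the argument suggests), the saved analysis can be inspected with standard Spark calls before reusing it. A small sketch under that assumption:

# Assumes the analysis was saved as a table at the name passed to save_analysis
analysis_df = spark.table('my_db.my_analysis')
analysis_df.show()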

Run auto zorder using analysis instead of cluster logs

optimize_cmd = auto_zorder(
    use_analysis='my_db.my_analysis',
    optimize_table='my_db.my_table'
)

Include additional columns at specified positions in the ZORDER

Each tuple passed to use_add_cols pairs a column name with its position (zero-based index) in the resulting ZORDER column list.

optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1', 'cluster_id_2'],
    optimize_table='my_db.my_table',
    use_add_cols=[('add_col1', 0), ('add_col2', 4)]
)

print(optimize_cmd)
>>> 'OPTIMIZE my_db.my_table ZORDER BY (add_col1, auto_col1, auto_col2, auto_col3, add_col2, auto_col4, auto_col5)'



Download files

Download the file for your platform.

Source Distribution

auto_zorder-0.1.1.tar.gz (5.5 kB)

Uploaded Source

Built Distribution

auto_zorder-0.1.1-py3-none-any.whl (4.9 kB)

Uploaded Python 3

File details

Details for the file auto_zorder-0.1.1.tar.gz.

File metadata

  • Download URL: auto_zorder-0.1.1.tar.gz
  • Upload date:
  • Size: 5.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.8

File hashes

Hashes for auto_zorder-0.1.1.tar.gz

  • SHA256: 13698054b2e6997f5c3cb7ac8e15c303ad4a8cba9cff5a1545151860c1f900ae
  • MD5: 5472329673bd51eb301930ae51b60c59
  • BLAKE2b-256: 5c3ad96eeffa8e3c798d9d9c609282b081db44b6e09a07a52bcc573efe9e3486


File details

Details for the file auto_zorder-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: auto_zorder-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 4.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.8

File hashes

Hashes for auto_zorder-0.1.1-py3-none-any.whl

  • SHA256: dbc0e5f126edc302adceeb1ebc2ad9c384ed0ed52ac4229b94c1d1cb5c8e971f
  • MD5: fc66392602d783fffb30300a43b1589c
  • BLAKE2b-256: 2ed570c40108a19f87a29f8edb163019c3aa545901ab9663ad5ec5e6e188860a

