Databricks utility to identify which column to use for z-ordering and partitioning.

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Software Development :: Libraries :: Python Modules

Project description

Databricks optimize utility

A library that makes it easier to generate OPTIMIZE statements based on the existing data.

Build from source

Building the wheel file is all that is needed. This requires the build package (installable with pip install build):

python -m build --wheel

Table optimizer

This object is initialized with a table name in the three-level namespace format ("catalog.schema.table"). The pre_optimization method computes statistics to identify the best column for Z-ordering (the column with the highest cardinality) and for repartitioning (which depends on the size of the table and the size of the generated partition files). It uses these statistics to build and print the optimization query. Two further methods perform the optimization itself: run_optimize and run_partition. Both can be called on their own, without running pre_optimization first; in that case each method computes the statistics and generates the statement itself.

from dbks_optimize.optimizer import TableOptimizer

# `spark` is the SparkSession that Databricks notebooks provide
opt = TableOptimizer(spark, "testing.bakehouse.sales_customers", force_partition_on_col="customerID")
opt.pre_optimization()  # print the optimization statement
opt.run_optimizer()  # execute OPTIMIZE on the table
opt.run_partition()  # execute the partitioning query (clones the data into a new table, then overwrites the existing one)
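The column and partition choices described above can be illustrated with a minimal, self-contained sketch in plain Python. It is not the library's implementation: the function names are hypothetical, the per-column distinct counts and the table size are passed in as plain values (on Databricks they could be computed with Spark, for example with `pyspark.sql.functions.approx_count_distinct`), and the 128 MiB target file size is an assumed value. The sketch picks the highest-cardinality column for Z-ordering, builds a Databricks `OPTIMIZE ... ZORDER BY` statement, and estimates how many partitions keep each file close to the target size.

```python
import math

# Assumed target size for each output file (128 MiB); not taken from the library.
TARGET_FILE_BYTES = 128 * 1024 * 1024


def validate_table_name(full_name: str) -> str:
    """Ensure the name uses the three-level namespace catalog.schema.table."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'catalog.schema.table', got {full_name!r}")
    return full_name


def pick_zorder_column(distinct_counts: dict[str, int]) -> str:
    """Return the column with the most distinct values (highest cardinality)."""
    if not distinct_counts:
        raise ValueError("no column statistics available")
    return max(distinct_counts, key=distinct_counts.get)


def build_optimize_statement(table: str, distinct_counts: dict[str, int]) -> str:
    """Build a Databricks OPTIMIZE ... ZORDER BY statement for the chosen column."""
    column = pick_zorder_column(distinct_counts)
    return f"OPTIMIZE {validate_table_name(table)} ZORDER BY ({column})"


def estimate_partitions(table_bytes: int, target_file_bytes: int = TARGET_FILE_BYTES) -> int:
    """Number of partitions that keeps each file near the target size (at least 1)."""
    return max(1, math.ceil(table_bytes / target_file_bytes))


# Made-up example statistics for the table used above
stats = {"customerID": 300, "city": 40, "country": 5}
print(build_optimize_statement("testing.bakehouse.sales_customers", stats))
# → OPTIMIZE testing.bakehouse.sales_customers ZORDER BY (customerID)
print(estimate_partitions(1_000_000_000))  # 1,000,000,000 bytes / 128 MiB → 8
```

Dividing the table size by a target file size is only one plausible reading of "depends on the size of the table and the size of the generated partition files"; the library may weigh these inputs differently.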

Schema optimizer

This object reuses the single-table class and iterates over the list of tables available in the schema, computing statistics for each one. pre_optimization generates the OPTIMIZE statements, while run_db_optimization runs the actual optimization.

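from dbks_optimize.optimizer import SchemaOptimizer

# `spark` is the SparkSession that Databricks notebooks provide
opt = SchemaOptimizer(spark, "testing.bakehouse")  # two-level name: catalog.schema
opt.pre_optimization()  # generate the OPTIMIZE statements for every table in the schema
opt.run_db_optimization()  # run the optimization on every table in the schema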
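At schema level the single-table choice is simply repeated for each table. The hypothetical sketch below assumes the statistics for each table have already been collected into a dictionary (on Databricks the table list could come from `SHOW TABLES IN testing.bakehouse`); the function name and the example statistics, including the second table, are made up for illustration.

```python
def build_schema_statements(schema: str, table_stats: dict[str, dict[str, int]]) -> list[str]:
    """Generate one OPTIMIZE ... ZORDER BY statement per table in `schema`.

    `table_stats` maps each table name to {column: distinct count}; this
    input format is an assumption made for the sketch.
    """
    statements = []
    for table, distinct_counts in sorted(table_stats.items()):
        if not distinct_counts:
            continue  # skip tables with no column statistics
        # Highest-cardinality column, as in the single-table case
        column = max(distinct_counts, key=distinct_counts.get)
        statements.append(f"OPTIMIZE {schema}.{table} ZORDER BY ({column})")
    return statements


# Made-up example statistics for two tables
stats = {
    "sales_customers": {"customerID": 300, "city": 40},
    "sales_franchises": {"franchiseID": 48, "country": 10},
}
for statement in build_schema_statements("testing.bakehouse", stats):
    print(statement)
```

Sorting the table names keeps the generated statements in a stable order, which makes the printed output easy to compare between runs.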

Catalog optimizer

Using the same logic, this object iterates over all schemas in the catalog and, for each schema, over all of its tables to generate the statements.

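from dbks_optimize.optimizer import CatalogOptimizer

# `spark` is the SparkSession that Databricks notebooks provide
opt = CatalogOptimizer(spark, "testing")  # catalog name only
opt.pre_optimization()  # generate the statements for every table in every schema
opt.run_catalog_optimization()  # run the optimization across the whole catalog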
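The catalog-level behavior adds one more loop, over schemas. This hypothetical sketch works on a nested dictionary of made-up statistics and skips `information_schema`, which Unity Catalog creates in every catalog and which holds views rather than tables that can be optimized; whether the library skips it too is not stated and is an assumption here.

```python
def build_catalog_statements(
    catalog: str, schema_stats: dict[str, dict[str, dict[str, int]]]
) -> list[str]:
    """Generate OPTIMIZE ... ZORDER BY statements for every table in every schema.

    `schema_stats` maps schema -> table -> {column: distinct count}
    (an input format assumed for this sketch).
    """
    statements = []
    for schema, tables in sorted(schema_stats.items()):
        if schema == "information_schema":
            continue  # system views, not optimizable tables
        for table, distinct_counts in sorted(tables.items()):
            if not distinct_counts:
                continue  # skip tables with no column statistics
            column = max(distinct_counts, key=distinct_counts.get)
            statements.append(f"OPTIMIZE {catalog}.{schema}.{table} ZORDER BY ({column})")
    return statements


# Made-up example statistics
stats = {
    "bakehouse": {"sales_customers": {"customerID": 300, "city": 40}},
    "information_schema": {"tables": {"table_name": 10}},
}
print(build_catalog_statements("testing", stats))
```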

Release history

- 0.2.2 (this version): Sep 6, 2025
- 0.2.0: Aug 24, 2025

Download files

Source Distribution

databricks_optimize-0.2.2.tar.gz (5.7 kB)

Uploaded Sep 6, 2025 Source

Built Distribution

databricks_optimize-0.2.2-py3-none-any.whl (6.4 kB)

Uploaded Sep 6, 2025 Python 3

File details

Details for the file databricks_optimize-0.2.2.tar.gz.

File metadata

Download URL: databricks_optimize-0.2.2.tar.gz
Upload date: Sep 6, 2025
Size: 5.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for databricks_optimize-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`1fe4faf391322d10297639959bc7a83908986501637b9a63513fe12394012e12`
MD5	`bba0d0a790b6d374605d99eaa81e85a1`
BLAKE2b-256	`c2ad55ea77db051e3d2726fc3c150f148b21924cc0a9eca4de2b5f72afbbaa41`


File details

Details for the file databricks_optimize-0.2.2-py3-none-any.whl.

File metadata

Download URL: databricks_optimize-0.2.2-py3-none-any.whl
Upload date: Sep 6, 2025
Size: 6.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for databricks_optimize-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`06965b532d22a0ebd8fdaa7ea58b9f8652244bac27f19da8687f726c46f87c63`
MD5	`2240dcc639085d88ae780a22a71bb3a2`
BLAKE2b-256	`d5ae9d2a69ee81258fb6310653bd727bf1639b4e30029b7bbf895798aded8eae`
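The SHA256 digests listed above can be checked after downloading either file. A minimal sketch using only the Python standard library follows; the function names are hypothetical, and the digests are copied from the tables above. (pip can also enforce hashes itself through its hash-checking mode, using --hash entries in a requirements file together with --require-hashes.)

```python
import hashlib
from pathlib import Path

# SHA256 digests published for version 0.2.2 (copied from the tables above)
EXPECTED_SHA256 = {
    "databricks_optimize-0.2.2.tar.gz": "1fe4faf391322d10297639959bc7a83908986501637b9a63513fe12394012e12",
    "databricks_optimize-0.2.2-py3-none-any.whl": "06965b532d22a0ebd8fdaa7ea58b9f8652244bac27f19da8687f726c46f87c63",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Return True if the file's SHA256 matches the digest published for it."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected


# Usage, assuming the wheel was downloaded to the current directory:
# verify_download(Path("databricks_optimize-0.2.2-py3-none-any.whl"))
```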


databricks-optimize 0.2.2
