Delta Lake helper methods

Levi

Delta Lake helper methods. No Spark dependency.

Installation

Install the latest version with pip install levi.

Delta File Stats

The delta_file_sizes function reports how many files in a Delta table fall into each size range. Example usage:

import levi
from deltalake import DeltaTable

dt = DeltaTable("some_folder/some_table")
levi.delta_file_sizes(dt)

# return value
{
    'num_files_<1mb': 345, 
    'num_files_1mb-500mb': 588,
    'num_files_500mb-1gb': 960,
    'num_files_1gb-2gb': 0, 
    'num_files_>2gb': 5
}

This output shows that the table has 345 small files, each holding less than 1 MB of data, and 5 very large files, each holding more than 2 GB. Compacting the small files and splitting up the large ones would likely make queries on this Delta table run faster.
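To judge how severe a small-file problem is, it can help to express each bucket as a share of the total file count. The following is a minimal sketch that works on the dictionary shown above; bucket_shares is a hypothetical helper written for this example and is not part of levi.

```python
# Return value copied from the delta_file_sizes example above.
stats = {
    "num_files_<1mb": 345,
    "num_files_1mb-500mb": 588,
    "num_files_500mb-1gb": 960,
    "num_files_1gb-2gb": 0,
    "num_files_>2gb": 5,
}


def bucket_shares(stats):
    """Return each bucket's share of the total file count, as a percentage.

    Hypothetical helper for illustration; levi does not provide it.
    """
    total = sum(stats.values())
    if total == 0:
        # An empty table has no files in any bucket.
        return {name: 0.0 for name in stats}
    return {name: round(100 * count / total, 1) for name, count in stats.items()}


shares = bucket_shares(stats)
print(shares["num_files_<1mb"])  # about 18% of the files are under 1 MB
```

Here roughly one file in five is under 1 MB, which supports the case for compaction.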

You can also specify the boundaries when you invoke the function to get a custom result:

levi.delta_file_sizes(dt, ["<1mb", "1mb-200mb", "200mb-800mb", "800mb-2gb", ">2gb"])
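To make the boundary labels concrete, here is a rough sketch of how labels such as "<1mb" or "1mb-200mb" could be turned into byte ranges and used to count files. This is not levi's implementation: the helper names, the use of 1024-based units, and the treatment of sizes that fall exactly on a boundary are all assumptions made for this example.

```python
import re

# Assumption: units are powers of 1024. levi may define them differently.
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}


def parse_size(text):
    """Convert a string such as '200mb' into a number of bytes."""
    match = re.fullmatch(r"(\d+)(b|kb|mb|gb)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def parse_boundary(label):
    """Turn a boundary label into a half-open byte range [low, high)."""
    if label.startswith("<"):
        return 0, parse_size(label[1:])
    if label.startswith(">"):
        # Assumption: a file of exactly this size lands in the '>' bucket.
        return parse_size(label[1:]), float("inf")
    low, high = label.split("-")
    return parse_size(low), parse_size(high)


def count_by_boundary(sizes, boundaries):
    """Count how many file sizes (in bytes) fall into each labeled range."""
    counts = {f"num_files_{label}": 0 for label in boundaries}
    ranges = [(label, *parse_boundary(label)) for label in boundaries]
    for size in sizes:
        for label, low, high in ranges:
            if low <= size < high:
                counts[f"num_files_{label}"] += 1
                break
    return counts


boundaries = ["<1mb", "1mb-200mb", "200mb-800mb", "800mb-2gb", ">2gb"]
# Made-up file sizes in bytes, for illustration only.
sizes = [500, 5 * 1024**2, 300 * 1024**2, 3 * 1024**3]
print(count_by_boundary(sizes, boundaries))
```

The keys of the result follow the same num_files_<label> pattern as the return value shown earlier.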

Skipped stats

The skipped_stats function reports how many files and how many bytes are skipped for a given set of predicates.

import levi
from deltalake import DeltaTable

dt = DeltaTable("some_folder/some_table")
levi.skipped_stats(dt, filters=[('a_float', '=', 4.5)])

# return value
{
    'num_files': 2,
    'num_files_skipped': 1,
    'num_bytes_skipped': 996
}

This predicate will skip one file and 996 bytes of data.

You can use skipped_stats to figure out the percentage of files that get skipped. You can also use this information to see if you should Z ORDER your data or otherwise rearrange it to allow for better file skipping.
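The percentage of skipped files can be computed directly from the returned dictionary. The sketch below assumes that num_files is the total number of files in the table; percent_files_skipped is a hypothetical helper and is not part of levi.

```python
# Return value copied from the skipped_stats example above.
stats = {
    "num_files": 2,
    "num_files_skipped": 1,
    "num_bytes_skipped": 996,
}


def percent_files_skipped(stats):
    """Return the percentage of files skipped by the predicates.

    Assumes 'num_files' is the table's total file count.
    """
    total = stats["num_files"]
    if total == 0:
        # Avoid dividing by zero for an empty table.
        return 0.0
    return 100 * stats["num_files_skipped"] / total


print(percent_files_skipped(stats))  # 50.0
```

A low percentage for a frequently used predicate suggests the data layout is not helping file skipping for that column.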

Get Latest Delta Table Version

The latest_version function returns the latest version number of a Delta table.

import levi
from deltalake import DeltaTable

dt = DeltaTable("some_folder/some_table")
levi.latest_version(dt)

# return value
2
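For background, a Delta table records each commit as a JSON file in its _delta_log directory, named with the version number zero-padded to 20 digits (for example 00000000000000000002.json). The sketch below shows how the latest version could be read from a table on the local file system under that convention. It is not how levi implements latest_version, and latest_version_from_log is a hypothetical name used only here.

```python
import re
from pathlib import Path

# Commit files are named <20-digit version>.json in the _delta_log directory.
COMMIT_FILE = re.compile(r"(\d{20})\.json")


def latest_version_from_log(table_path):
    """Return the highest commit version found in a local table's _delta_log.

    Hypothetical helper for illustration; handles local paths only.
    """
    log_dir = Path(table_path) / "_delta_log"
    versions = []
    for entry in log_dir.iterdir():
        match = COMMIT_FILE.fullmatch(entry.name)
        if match is not None:
            versions.append(int(match.group(1)))
    if not versions:
        raise FileNotFoundError(f"no commit files found in {log_dir}")
    return max(versions)
```

Calling latest_version_from_log("some_folder/some_table") on a table with commits 0, 1, and 2 would return 2, matching the example above.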
