
A collection of utilities and tools for accelerating PySpark development and productivity.


Patek

A collection of reusable PySpark utility functions that make development easier!

Installation

Patek is available on PyPI and can be installed with pip:

pip install patek

Usage


IO Helpers

Patek provides a set of IO helpers to quickly read and write data from/to various sources in PySpark.

Dynamic Delta Table Writer

The superDeltaWriter function writes data to a Delta table using Delta's merge capability without requiring you to spell out every update and merge condition. This is useful when you have a large number of columns and/or update conditions.

from patek.io import superDeltaWriter

# merge sparkDataframe into the Delta table at 'delta/path', matching rows
# on key_column1 and updating only update_col1 and update_col2
superDeltaWriter(sparkDataframe, ['key_column1'], 'delta/path', sparkSession, sparkContext, ['update_col1', 'update_col2'])

If update columns are not specified, all non-key columns that exist in both the source and target tables are updated. If the target table does not exist, it is created.
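
To rely on those defaults, the update-column list can simply be left out. A minimal sketch, reusing the names above and assuming the update-column argument is optional, as the default behavior implies:

from patek.io import superDeltaWriter

# omitting the update-column list updates every non-key column
# shared by the source dataframe and the target table
superDeltaWriter(sparkDataframe, ['key_column1'], 'delta/path', sparkSession, sparkContext)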

Funnel.io Schema to Spark Schema

The funnelSparkler function converts a Funnel.io schema to a Spark schema. This is useful for removing ambiguity when reading Funnel.io exports into Spark DataFrames without manually defining the schema.

from patek.io import funnelSparkler

# build the Spark schema from the Funnel.io schema file, then use it to read the export
dataframe = funnelSparkler('path/to/funnel_schema.json', 'path/to/funnel_export_data', sparkSession, sparkContext, data_file_type='csv')
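
The result is an ordinary Spark DataFrame, so you can verify the inferred types immediately (a minimal illustration):

# confirm the schema was applied as expected
dataframe.printSchema()
dataframe.show(5)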

Utility Functions

Patek provides a set of utility functions to help make development easier.

Determine Key Candidates

The determine_key_candidates function determines the key candidates for a given dataframe. This is useful when a dataframe has a large number of columns and you want to quickly find which columns, alone or in combination, are good candidates for a key.

from patek.utils import determine_key_candidates

key_candidates = determine_key_candidates(sparkDataframe)
print(key_candidates)

# Output:
# single-column key candidates: ['column1', 'column2', 'column3']
# composite key candidates: [['column1', 'column2'], ['column1', 'column3']]
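
A common next step is to deduplicate on one of the discovered candidates. A minimal sketch, assuming the first single-column candidate above is the key you want:

# drop rows that share a value in the chosen candidate column
deduplicated = sparkDataframe.dropDuplicates(['column1'])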

Clean Column Names

The column_cleaner function cleans the column names of a dataframe, removing special characters and replacing spaces with underscores.

from patek.utils import column_cleaner

# input dataframe columns: ['column?? 1', 'column: 2', 'column-3']

cleaned_dataframe = column_cleaner(sparkDataframe)

# output dataframe columns: ['column_1', 'column_2', 'column_3']
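
If you need similar behavior outside Patek, a rough equivalent can be built with a regex rename. This is an illustrative sketch (the clean_columns helper is hypothetical), not Patek's actual implementation:

import re

def clean_columns(df):
    # collapse each run of non-alphanumeric characters (spaces, punctuation)
    # into a single underscore, then trim leading/trailing underscores
    for name in df.columns:
        cleaned = re.sub(r'[^0-9a-zA-Z]+', '_', name).strip('_')
        df = df.withColumnRenamed(name, cleaned)
    return df

cleaned_dataframe = clean_columns(sparkDataframe)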
