Cleaning functions for PySpark DataFrames
Project description
pyspark-df-cleaner
Making life easier. This package cleans PySpark DataFrames and will be extended in the future.
It currently offers three main features:
- Removing leading zeros from a column -> e.g. turns "0000F45" into "F45"
- Casting an int/long column to a date -> e.g. turns an int/long column into PySpark DateType()
- Keeping only alphanumeric characters (keep_alphanumeric_string) -> e.g. turns "444-555-666" into "444555666"
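The three transformations above can be sketched as follows. This is a rough illustration, not the package's actual API: the function names, the column arguments, and the assumption that integer dates are stored as yyyyMMdd (e.g. 20230115) are all hypothetical. The plain-Python helpers show the rule applied to a single value; `clean_with_spark` applies the same rules to DataFrame columns using only built-in PySpark functions (`regexp_replace`, `to_date`).

```python
import re
from datetime import date, datetime


def strip_leading_zeros(value: str) -> str:
    """Drop every zero at the start of the string."""
    return re.sub(r"^0+", "", value)


def int_to_date(value: int, fmt: str = "%Y%m%d") -> date:
    """Parse an integer such as 20230115 as a date (the format is an assumption)."""
    return datetime.strptime(str(value), fmt).date()


def keep_alphanumeric(value: str) -> str:
    """Remove every character that is not an ASCII letter or digit."""
    return re.sub(r"[^A-Za-z0-9]", "", value)


def clean_with_spark(df, zero_col, date_col, alnum_col):
    """Apply the same three rules to DataFrame columns (hypothetical helper)."""
    # Imported here so the plain-Python helpers above work without Spark installed.
    from pyspark.sql import functions as F

    return (
        df.withColumn(zero_col, F.regexp_replace(F.col(zero_col), r"^0+", ""))
        # Assumes the int/long value encodes a date as yyyyMMdd.
        .withColumn(date_col, F.to_date(F.col(date_col).cast("string"), "yyyyMMdd"))
        .withColumn(alnum_col, F.regexp_replace(F.col(alnum_col), r"[^A-Za-z0-9]", ""))
    )


print(strip_leading_zeros("0000F45"))    # F45
print(int_to_date(20230115))             # 2023-01-15
print(keep_alphanumeric("444-555-666"))  # 444555666
```

Note that a value consisting only of zeros, such as "0000", becomes an empty string under this rule; whether the package behaves the same way is not stated in the description.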
Should you have any suggestions for additional features, just let me know.
Download files
Source Distribution
sparkcleaner-1.1.1.tar.gz (4.7 kB)
Built Distribution
Hashes for sparkcleaner-1.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | c05fbb31ad9d53b19a71ad74786b38c498f16c50025a83c3f7fe0cbbc1da43d3
MD5 | f22d9fae37546a09952ea250b9b7d2bf
BLAKE2b-256 | 93ae587f78cd2c4e5140fa20da6ff2067ccc672723642fdbee0466552eadcb63