Cleaning functions for PySpark DataFrames

Project description

pyspark-df-cleaner

This package makes life easier by providing functions for cleaning PySpark DataFrames. The module will be extended in the future.

It currently consists of three main features:

  • Removing leading zeros from a column, e.g. turns "0000F45" into "F45"
  • Casting an int/long column to a date, e.g. turns an int/long column into a PySpark DateType() column
  • Keeping only alphanumeric characters (keep_alphanumeric_string), e.g. turns "444-555-666" into "444555666"
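
The three transformations listed above can be sketched with built-in PySpark functions. This is not the package's own API (its function signatures are not shown on this page); the names strip_leading_zeros, keep_alphanumeric, int_to_date and clean_with_spark are hypothetical, and the int/long date layout is assumed to be yyyyMMdd (e.g. 20240131). The plain-Python helpers only serve as a reference for what each regex does.

```python
import re
from datetime import date, datetime

# Regex patterns that behave the same in Python's `re` and in Spark's
# Java regex engine, so one definition serves both code paths.
LEADING_ZEROS = r"^0+"
NON_ALPHANUMERIC = r"[^A-Za-z0-9]"


def strip_leading_zeros(value: str) -> str:
    """Plain-Python reference: remove every zero at the start of the string."""
    return re.sub(LEADING_ZEROS, "", value)


def keep_alphanumeric(value: str) -> str:
    """Plain-Python reference: drop everything except ASCII letters and digits."""
    return re.sub(NON_ALPHANUMERIC, "", value)


def int_to_date(value: int) -> date:
    """Plain-Python reference; ASSUMES the integer is laid out as yyyyMMdd."""
    return datetime.strptime(str(value), "%Y%m%d").date()


def clean_with_spark(df, zero_col: str, date_col: str, alnum_col: str):
    """Hypothetical helper applying the same three steps to a PySpark DataFrame."""
    # Imported lazily so the reference helpers above work without Spark installed.
    from pyspark.sql import functions as F

    return (
        df.withColumn(zero_col, F.regexp_replace(F.col(zero_col), LEADING_ZEROS, ""))
        # Cast the int/long to a string first, then parse it as a DateType column.
        .withColumn(date_col, F.to_date(F.col(date_col).cast("string"), "yyyyMMdd"))
        .withColumn(alnum_col, F.regexp_replace(F.col(alnum_col), NON_ALPHANUMERIC, ""))
    )
```

Note that with this sketch a value consisting only of zeros, such as "0000", becomes an empty string; whether the package treats that case the same way is not stated here.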

Should you have any suggestions for additional features, just let me know.

Download files

Source Distribution

sparkcleaner-1.1.1.tar.gz (4.7 kB)

Uploaded Source

Built Distribution

sparkcleaner-1.1.1-py3-none-any.whl (5.7 kB)

Uploaded Python 3