
A library containing various utility functions for playing with PySpark DataFrames


Spark-frame


What is it?

Spark-frame is a library that provides utility methods and transformation functions for PySpark DataFrames. These methods were initially part of the karadoc project used at Younited, but they don't rely on karadoc, so it makes more sense to keep them as a standalone library.

Several of these methods were my initial inspiration for the cousin project bigquery-frame, which is why you will find similar methods in transformations and data_diff in both spark_frame and bigquery_frame. The former runs on PySpark, the latter on BigQuery.

Installation

spark-frame is available on PyPI.

pip install spark-frame
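
To confirm that the install succeeded, one option is to ask Python's packaging metadata for the installed version. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of spark-frame.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if the package is not installed
# in the current environment.
print(installed_version("spark-frame"))
```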

Release notes

v0.0.3

  • New transformation: spark_frame.transformations.convert_all_maps_to_arrays.
  • New transformation: spark_frame.transformations.sort_all_arrays.
  • New transformation: spark_frame.transformations.harmonize_dataframes.
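
The release notes give only the function names. As a rough illustration of the idea behind the first two (turning maps into arrays of key/value entries and putting arrays in a deterministic order, so that two datasets become easier to compare), here is a plain-Python sketch on nested values. It is not the library's implementation, which operates on Spark DataFrames; `normalize` is a hypothetical helper, and the sketch assumes the elements inside each list are mutually comparable.

```python
from typing import Any


def normalize(value: Any) -> Any:
    """Recursively turn dicts into sorted (key, value) lists and sort every list."""
    if isinstance(value, dict):
        # A map becomes a list of (key, value) entries ordered by key.
        return sorted((key, normalize(val)) for key, val in value.items())
    if isinstance(value, list):
        # An array is sorted after its elements have been normalized.
        return sorted(normalize(item) for item in value)
    return value


# Two records with the same content but different array order.
a = {"tags": ["b", "a"], "scores": {"y": 2, "x": 1}}
b = {"scores": {"x": 1, "y": 2}, "tags": ["a", "b"]}

assert a != b                        # raw comparison is sensitive to array order
assert normalize(a) == normalize(b)  # normalized forms are identical
```

Normalizing both sides before comparing removes differences that come only from ordering, which is the kind of noise a data diff usually wants to ignore.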

