
This project performs PySpark operations on DataFrames; currently it unnests shallowly or deeply nested JSON data.

Project description

PySpark ETL

This project aims to solve common problems faced by data engineers.

One such problem is handling deeply nested JSON data and rendering it in a clean, tabular format.

This kind of semi-structured data may combine different data types, such as nested objects (structs) and arrays, which must be handled differently to flatten the data properly.
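To make the distinction concrete, here is a small plain-Python sketch (not part of pysparketl) that labels each top-level field of a parsed JSON record by the kind of handling it would need during flattening. The function name classify_fields and the labels are hypothetical, chosen only for illustration.

```python
import json
from typing import Any, Dict


def classify_fields(record: Dict[str, Any]) -> Dict[str, str]:
    """Label each top-level field of a parsed JSON object by how it would be flattened."""
    kinds = {}
    for key, value in record.items():
        if isinstance(value, dict):
            kinds[key] = "struct"  # nested object: expand into child columns
        elif isinstance(value, list):
            kinds[key] = "array"   # list: explode into one row per element
        else:
            kinds[key] = "scalar"  # string, number, bool or null: keep as-is
    return kinds


record = json.loads('{"id": 1, "user": {"name": "a"}, "tags": ["x", "y"]}')
print(classify_fields(record))
# {'id': 'scalar', 'user': 'struct', 'tags': 'array'}
```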

This package does just that!

In this initial version, the package contains one module, pysparketl, which provides one main function, flattenDF, and two utility functions, _getArrayCols and _explodeArrayCols.

Usage

Call flattenDF(df), where df is a PySpark DataFrame with nested data in its columns. The returned DataFrame has a completely flat, tabular structure.



Download files


Source Distribution

pysparketl-0.0.1.tar.gz (3.0 kB)

Uploaded Source

Built Distribution


pysparketl-0.0.1-py3-none-any.whl (3.4 kB)

Uploaded Python 3

File details

Details for the file pysparketl-0.0.1.tar.gz.

File metadata

  • Download URL: pysparketl-0.0.1.tar.gz
  • Upload date:
  • Size: 3.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for pysparketl-0.0.1.tar.gz
Algorithm Hash digest
SHA256 845c9af60a52fb2054634a8b8161a8d7dc779f486e4e8def732f0de78d44ee3a
MD5 1c7d76166432625ca7331132ccb89ce5
BLAKE2b-256 dbbea07c9afab632cc463ec11006d55499e6a26535407f510db14b4140d84979
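To check a downloaded file against the published digests, a short script using Python's standard hashlib module is enough. The helper name sha256_of and the chunk size are illustrative choices; compare its output with the SHA256 value listed for the file you downloaded (source distribution or wheel).

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Published SHA256 digest for pysparketl-0.0.1.tar.gz.
SDIST_SHA256 = "845c9af60a52fb2054634a8b8161a8d7dc779f486e4e8def732f0de78d44ee3a"

# Example check, assuming the sdist sits in the current directory:
# print(sha256_of("pysparketl-0.0.1.tar.gz") == SDIST_SHA256)
```

pip can enforce the same check: list the package in a requirements file with a --hash=sha256:<digest> option for each file it may download, then run pip install --require-hashes -r requirements.txt.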


File details

Details for the file pysparketl-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: pysparketl-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 3.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for pysparketl-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 547986f1859b0fc7bff958cb7e7296bb8c53b0a1777f7a0ed02cbb1a9abce53d
MD5 5b1110e936d2690d5fa56ff0af30afe3
BLAKE2b-256 8ad5dbe806c14ce1bae34b6a62a2b4ae83e88d2650a6ce5db739d71ab3bf21fb

