A Flake8 plugin to check for PySpark withColumn usage in loops

Project description

Flake8-pyspark-with-column

A flake8 plugin that detects usage of withColumn in a loop or inside reduce. From the PySpark documentation on the withColumn method:

  This method introduces a projection internally.
  Therefore, calling it multiple times, for instance,
  via loops in order to add multiple columns
  can generate big plans which can cause performance issues
  and even StackOverflowException.
  To avoid this, use select() with multiple columns at once.
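As a minimal sketch of the advice quoted above, the two functions below add the same derived columns first with repeated withColumn calls and then with a single select(). The function names, the "_doubled" suffix, and the doubling expression are hypothetical and chosen only for illustration; df is assumed to be a pyspark.sql.DataFrame whose listed columns are numeric.

```python
def add_doubled_with_loop(df, cols):
    """Anti-pattern: each withColumn call adds one more projection to the plan."""
    result = df
    for c in cols:
        # One projection per iteration; the plan grows with len(cols).
        result = result.withColumn(f"{c}_doubled", result[c] * 2)
    return result


def add_doubled_with_select(df, cols):
    """Preferred: build every new column expression, then project once."""
    new_columns = [(df[c] * 2).alias(f"{c}_doubled") for c in cols]
    # "*" keeps the existing columns; the new expressions are appended.
    return df.select("*", *new_columns)
```

Both functions should produce the same columns; the second does so with one projection regardless of how many columns are added.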

Rules

This plugin contains the following rules:

  • PSPRK001: Usage of withColumn in a loop detected
  • PSPRK002: Usage of withColumn inside reduce is detected
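To illustrate the kind of code these two rules describe, here is a rough sketch of a detector built on the standard-library ast module. It is not the plugin's actual implementation: the helper names (find_with_column_issues, _calls_with_column, _is_reduce) and the sample source are assumptions made for this example, and the sketch ignores comprehensions and may report nested loops more than once.

```python
import ast

# Hypothetical sample source; it is only parsed, never executed.
SAMPLE = '''
from functools import reduce

for name in names:
    df = df.withColumn(name, df[name] * 2)

df = reduce(lambda acc, name: acc.withColumn(name, acc[name] * 2), names, df)

df = df.withColumn("once", df["a"])
'''


def _calls_with_column(node):
    """Return True if a `.withColumn(...)` call appears anywhere under node."""
    return any(
        isinstance(n, ast.Call)
        and isinstance(n.func, ast.Attribute)
        and n.func.attr == "withColumn"
        for n in ast.walk(node)
    )


def _is_reduce(func):
    """Match both `reduce(...)` and `functools.reduce(...)`."""
    if isinstance(func, ast.Name):
        return func.id == "reduce"
    return isinstance(func, ast.Attribute) and func.attr == "reduce"


def find_with_column_issues(source):
    """Return sorted (line, code) pairs for the two patterns (sketch only)."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.For, ast.While)) and _calls_with_column(node):
            findings.append((node.lineno, "PSPRK001"))
        elif (
            isinstance(node, ast.Call)
            and _is_reduce(node.func)
            and _calls_with_column(node)
        ):
            findings.append((node.lineno, "PSPRK002"))
    return sorted(findings)


print(find_with_column_issues(SAMPLE))  # [(4, 'PSPRK001'), (7, 'PSPRK002')]
```

The single withColumn call on the last line of the sample is not reported, since it sits outside any loop or reduce call.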

Download files

Source Distribution

flake8_pyspark_with_column-0.0.2.tar.gz (7.2 kB)

Uploaded Source

Built Distribution

flake8_pyspark_with_column-0.0.2-py2.py3-none-any.whl (6.8 kB)

Uploaded Python 2 Python 3