A Flake8 plugin to check for PySpark withColumn usage in loops

Project description

Flake8-pyspark-with-column

A flake8 plugin that detects usage of withColumn in a loop or inside reduce. From the PySpark documentation about the withColumn method:

  This method introduces a projection internally.
  Therefore, calling it multiple times, for instance,
  via loops in order to add multiple columns
  can generate big plans which can cause performance issues
  and even StackOverflowException.
  To avoid this, use select() with multiple columns at once.
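The contrast described in the quote can be sketched as follows. This is a minimal illustration, not code from the plugin or from PySpark: the helper names `add_columns_in_loop` and `add_columns_with_select` and the `new_columns` mapping are hypothetical. Both helpers take a PySpark DataFrame and a mapping from new column name to column expression.

```python
def add_columns_in_loop(df, new_columns):
    """Anti-pattern: one withColumn call, and so one projection, per new column."""
    for name, expr in new_columns.items():
        df = df.withColumn(name, expr)
    return df


def add_columns_with_select(df, new_columns):
    """Alternative: a single select() that adds every new column at once."""
    # "*" keeps all existing columns; each expression is aliased to its new name.
    return df.select("*", *[expr.alias(name) for name, expr in new_columns.items()])


# Hypothetical usage, assuming `from pyspark.sql import functions as F`
# and an existing DataFrame `df` with a numeric column "x":
#   add_columns_with_select(df, {"x2": F.col("x") * 2, "x3": F.col("x") * 3})
```

The two helpers are not always interchangeable. withColumn replaces an existing column of the same name, and in a loop each expression can refer to a column added by an earlier iteration. With a single select, every expression sees only the columns of the original DataFrame.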

Rules

This plugin contains the following rules:

  • PSPRK001: Usage of withColumn in a loop detected
  • PSPRK002: Usage of withColumn inside reduce is detected
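To make the two rules concrete, here is a rough sketch of how such patterns could be found with Python's standard `ast` module. This page does not describe the plugin's actual detection logic, so treat this as an assumption-laden illustration: the class `WithColumnVisitor`, the function `find_with_column_misuse`, and the `SAMPLE` source are all hypothetical.

```python
import ast


class WithColumnVisitor(ast.NodeVisitor):
    """Collects (line, code) pairs for withColumn calls in loops or in reduce()."""

    def __init__(self):
        self.findings = []
        self._loop_depth = 0
        self._reduce_depth = 0

    def _visit_loop(self, node):
        # Track nesting so calls anywhere inside the loop are seen as "in a loop".
        self._loop_depth += 1
        self.generic_visit(node)
        self._loop_depth -= 1

    visit_For = visit_While = _visit_loop

    def visit_Call(self, node):
        func = node.func
        # Handles both `obj.name(...)` and bare `name(...)` calls.
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name == "withColumn":
            if self._reduce_depth:
                self.findings.append((node.lineno, "PSPRK002"))
            elif self._loop_depth:
                self.findings.append((node.lineno, "PSPRK001"))
        if name == "reduce":
            # Anything among reduce's arguments counts as "inside reduce".
            self._reduce_depth += 1
            self.generic_visit(node)
            self._reduce_depth -= 1
        else:
            self.generic_visit(node)


def find_with_column_misuse(source):
    """Parse source code and return the list of (line, rule code) findings."""
    visitor = WithColumnVisitor()
    visitor.visit(ast.parse(source))
    return visitor.findings


# Example source containing one instance of each pattern (never executed, only parsed).
SAMPLE = '''
for c in cols:
    df = df.withColumn(c, F.col(c) + 1)

df = reduce(lambda acc, c: acc.withColumn(c, F.lit(0)), cols, df)
'''

if __name__ == "__main__":
    for line, code in find_with_column_misuse(SAMPLE):
        print(f"line {line}: {code}")
```

The sketch matches on the names `withColumn` and `reduce` only, so it would also flag unrelated objects that happen to use those names. A real checker may be stricter or looser than this.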

Download files

Source Distribution

flake8_pyspark_with_column-0.0.2.tar.gz (7.2 kB)

Uploaded Source

Built Distribution

flake8_pyspark_with_column-0.0.2-py2.py3-none-any.whl (6.8 kB)

Uploaded Python 2 Python 3

File details

Details for the file flake8_pyspark_with_column-0.0.2.tar.gz.

File hashes

Hashes for flake8_pyspark_with_column-0.0.2.tar.gz
Algorithm Hash digest
SHA256 897670411f9ca6858d9f36ba328182895e65a2ea54f23e29212e9cc75f8c0dad
MD5 17f1c9c0f9fd55f628c7fa9ca434b5e0
BLAKE2b-256 363a7edfce8cc46f11c588af5e4e5599db311ecc0e307e943a35405dc19f76c7

File details

Details for the file flake8_pyspark_with_column-0.0.2-py2.py3-none-any.whl.

File hashes

Hashes for flake8_pyspark_with_column-0.0.2-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 19cfd8c7b3aab91f0cc68398206255f089be01392af02f78291ead15330078ff
MD5 b7b42657f49c19cd1c0192957ac8474a
BLAKE2b-256 6cf5acaf42f53af29b64ea30a60357b7cb370628a1fb9ad7591f0d4acd665f1d
