
pyspark helpers


bigninja


PySpark helpers to maximise data engineer productivity. Developed following the pain-driven development technique.

Setup

After running pip install bigninja, start using it with:

from bigninja import *

BigNinja works by adding extension methods to Spark's DataFrame class. All the methods start with the bn_ prefix to avoid conflicts with built-in methods.
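As a rough illustration of what "adding extension methods" can mean in Python, the sketch below attaches a prefixed method to an existing class at import time. The DataFrame class here is a stand-in, not Spark's, and bn_column_count is a hypothetical name invented for this example; it is not part of bigninja.

```python
class DataFrame:
    """Stand-in for pyspark.sql.DataFrame (assumption: only .columns is needed here)."""

    def __init__(self, columns):
        self.columns = list(columns)


def bn_column_count(self):
    """Hypothetical helper, not part of bigninja: number of columns."""
    return len(self.columns)


# Assigning the function as a class attribute makes it callable on every
# instance, which is one way an "extension method" can be added in Python.
DataFrame.bn_column_count = bn_column_count

df = DataFrame(["id", "city"])
print(df.bn_column_count())
```

If the library follows this pattern, the import statement is what registers the methods, which would explain why they are available only after bigninja has been imported.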

DataFrame

.bn_select(*pattern: str), .bn_drop(*pattern: str)

Select/drop columns using a wildcard pattern, e.g. df.bn_select("co*") returns columns starting with co. For instance:

  • bn_select("ci*") will select columns starting with ci, such as city.
  • bn_select("id*", "ci*") will select columns starting with either id or ci, and so on.
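The matching itself can be sketched with Python's standard fnmatch module. This is only an approximation of the behaviour described above, not bigninja's actual implementation; match_columns is a hypothetical helper that works on a plain list of column names.

```python
from fnmatch import fnmatchcase


def match_columns(columns, *patterns):
    """Return the columns that match at least one wildcard pattern, in original order."""
    return [c for c in columns if any(fnmatchcase(c, p) for p in patterns)]


columns = ["id", "city", "country", "name"]

# Selecting: keep only the matching columns.
selected = match_columns(columns, "id*", "ci*")

# Dropping: keep everything except the matching columns.
to_drop = set(match_columns(columns, "co*"))
remaining = [c for c in columns if c not in to_drop]

print(selected)
print(remaining)
```

With a real Spark DataFrame, the same helper could feed the built-in methods, for example df.select(*match_columns(df.columns, "ci*")).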

.bn_display()

Like .show(), but with truncate set to False and with arrays and structs converted to JSON so that they are readable.
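A comparable effect can be approximated with built-in PySpark functions. The sketch below is an assumption about how such a helper might look, not bigninja's code: complex_columns and display_as_json are hypothetical names, and pyspark is imported lazily so the pure helper can be used without a Spark installation.

```python
def complex_columns(dtypes):
    """Names of array/struct columns, given (name, type string) pairs as in df.dtypes."""
    return [name for name, type_name in dtypes
            if type_name.startswith(("array", "struct"))]


def display_as_json(df, n=20):
    """Show up to n rows untruncated, rendering array and struct columns as JSON strings."""
    from pyspark.sql import functions as F  # lazy import: requires pyspark

    nested = set(complex_columns(df.dtypes))
    cols = [F.to_json(F.col(c)).alias(c) if c in nested else F.col(c)
            for c in df.columns]
    df.select(*cols).show(n, truncate=False)


# The pure helper works on plain (name, type) pairs:
print(complex_columns([("id", "bigint"), ("tags", "array<string>")]))
```

Column names containing dots would need extra quoting, which this sketch does not handle.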

.bn_union(df: DataFrame)

Unions DataFrames even if the number of columns, their names and their types don't match, by combining the columns of both datasets and filling missing values with null.
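The column-alignment idea can be shown on plain Python rows. This is a sketch of the semantics only, using dictionaries instead of DataFrames; union_rows is a hypothetical helper and it does not attempt to reconcile mismatched types.

```python
def union_rows(left, right):
    """Union two lists of dict rows, filling columns missing from a row with None."""
    # Collect every column name seen on either side, keeping first-seen order.
    columns = list(dict.fromkeys(key for row in left + right for key in row))
    return [{c: row.get(c) for c in columns} for row in left + right]


people = [{"id": 1, "city": "Oslo"}]
orders = [{"id": 2, "amount": 9.5}]

for row in union_rows(people, orders):
    print(row)
```

In Spark 3.1 and later, the closest built-in is df.unionByName(other, allowMissingColumns=True), which fills missing columns with nulls but still expects columns with the same name to have compatible types.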

Etc


Built Distribution

bigninja-0.0.3-py3-none-any.whl (7.8 kB)

Uploaded Python 3
