pyspark helpers
bigninja
PySpark helpers to maximise data engineer productivity. The project follows the pain-driven development technique.
Setup
After `pip install bigninja`, start using it with `from bigninja import *`.
BigNinja works by adding extension methods to Spark's `DataFrame` class. All the methods start with the `bn_` prefix to avoid conflicts with built-in methods.
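Python allows a function to be attached to an existing class at runtime, which is one way to add extension methods like these. The sketch below shows the mechanism on a small stand-in class (`Frame`, hypothetical) so it runs without Spark; the helper `bn_has_column` is also hypothetical and not part of bigninja. How bigninja actually registers its methods is an assumption here.

```python
from typing import Iterable, List


class Frame:
    """Hypothetical stand-in for pyspark.sql.DataFrame so the sketch runs without Spark."""

    def __init__(self, columns: Iterable[str]) -> None:
        self.columns: List[str] = list(columns)


def bn_has_column(self: Frame, name: str) -> bool:
    """Hypothetical bn_-prefixed helper: report whether a column exists."""
    return name in self.columns


# Attach the function as a method; existing and future instances both gain it.
setattr(Frame, "bn_has_column", bn_has_column)

df = Frame(["id", "city"])
print(df.bn_has_column("city"))  # True
print(df.bn_has_column("zip"))   # False
```

With a real Spark installation, the same `setattr` call would target `pyspark.sql.DataFrame` instead of `Frame`.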
DataFrame
.bn_select(*pattern: str), .bn_drop(*pattern: str)
Select/drop columns using wildcard patterns, e.g. `df.bn_select("co*")` returns the columns whose names start with `co`. For instance, `.bn_select("ci*")` selects columns starting with `ci` (such as `city`), `.bn_select("id*", "ci*")` selects columns starting with either `id` or `ci`, and so on.
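The wildcard matching described above can be sketched in plain Python with the standard library's `fnmatch` module. This assumes shell-style, case-sensitive patterns; bigninja may match differently. The hypothetical helpers `match_columns` and `drop_columns` only compute column names, so they can be tried without Spark.

```python
from fnmatch import fnmatchcase
from typing import List


def match_columns(columns: List[str], *patterns: str) -> List[str]:
    """Return the columns matching any pattern, keeping their original order."""
    return [c for c in columns if any(fnmatchcase(c, p) for p in patterns)]


def drop_columns(columns: List[str], *patterns: str) -> List[str]:
    """Return the columns that match none of the patterns."""
    return [c for c in columns if not any(fnmatchcase(c, p) for p in patterns)]


columns = ["id", "id_hash", "city", "country", "name"]
print(match_columns(columns, "ci*"))         # ['city']
print(match_columns(columns, "id*", "ci*"))  # ['id', 'id_hash', 'city']
print(drop_columns(columns, "co*"))          # ['id', 'id_hash', 'city', 'name']
```

With a real DataFrame the result would be passed on, e.g. `df.select(*match_columns(df.columns, "ci*"))`. Keeping the DataFrame's own column order, rather than the order of the patterns, is a design choice of this sketch.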
.bn_display()
Works like `.show()`, but with `truncate` set to `False`, and with array and struct columns converted to JSON so that they are readable.
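The JSON conversion can be illustrated in plain Python: nested values (lists standing in for arrays, dicts for structs) are serialised to JSON strings while scalars pass through unchanged. This is a sketch on hypothetical row dictionaries, not bigninja's implementation.

```python
import json
from typing import Any, Dict, List


def to_display_value(value: Any) -> Any:
    """Turn arrays (lists/tuples) and structs (dicts) into compact JSON text; keep scalars as-is."""
    if isinstance(value, (list, tuple, dict)):
        return json.dumps(value, separators=(",", ":"))
    return value


def prepare_for_display(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the rows with every nested value replaced by its JSON string."""
    return [{name: to_display_value(v) for name, v in row.items()} for row in rows]


rows = [{"id": 1, "tags": ["a", "b"], "address": {"city": "Oslo", "zip": "0150"}}]
for row in prepare_for_display(rows):
    print(row)
# {'id': 1, 'tags': '["a","b"]', 'address': '{"city":"Oslo","zip":"0150"}'}
```

In PySpark itself, a plausible counterpart (untested here, and not necessarily what bigninja does) is to find `ArrayType`/`StructType` fields in `df.schema.fields`, wrap those columns with `pyspark.sql.functions.to_json` (supported for these types in recent Spark versions), and then call `.show(truncate=False)`.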
.bn_union(df: DataFrame)
Unions two DataFrames even when the number of columns, their names, or their types don't match, by combining the columns of both datasets into one superset and filling missing values with null.
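The column-alignment idea behind `.bn_union` can be sketched in plain Python: build the superset of column names from both sides, then pad each row with `None` for the columns it lacks. Tables here are hypothetical `(columns, rows)` pairs, and the sketch ignores type reconciliation, which bigninja also claims to handle.

```python
from typing import Any, List, Optional, Tuple

# Hypothetical table representation: (column names, rows as tuples).
Table = Tuple[List[str], List[Tuple[Any, ...]]]


def union_tables(left: Table, right: Table) -> Table:
    """Union two tables whose columns may differ, filling absent values with None."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    # Superset of columns: every left column, then right-only columns in their original order.
    columns = list(left_cols) + [c for c in right_cols if c not in left_cols]

    def align(cols: List[str], rows: List[Tuple[Any, ...]]) -> List[Tuple[Optional[Any], ...]]:
        # Map each column name to its position in this table's rows.
        position = {name: i for i, name in enumerate(cols)}
        # Reorder values to match `columns`, using None where this table lacks a column.
        return [tuple(row[position[c]] if c in position else None for c in columns) for row in rows]

    return columns, align(left_cols, left_rows) + align(right_cols, right_rows)


left = (["id", "city"], [(1, "Oslo")])
right = (["id", "country"], [(2, "NO")])
print(union_tables(left, right))
# (['id', 'city', 'country'], [(1, 'Oslo', None), (2, None, 'NO')])
```

For the missing-columns part alone, Spark 3.1 and later offer `DataFrame.unionByName(other, allowMissingColumns=True)`, which fills absent columns with null; mismatched types are a separate concern.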
Etc
- Inspired by [quinn](https://github.com/MrPowers/quinn), a library of PySpark methods to enhance developer productivity. Most ideas were initially taken from there.