pyspark helpers

bigninja

PySpark helpers to maximise data engineer productivity. Follows the pain-driven development technique.

Setup

After running pip install bigninja, start using it with:

from bigninja import *

BigNinja works by adding extension methods to Spark's DataFrame class. All the methods start with bn_ prefix to avoid conflicts with built-in methods.
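The extension-method idea can be sketched in plain Python as follows. This is only an illustration of the general technique (attaching functions to an existing class at import time), not bigninja's actual source. The Table class is a stand-in so the snippet runs without Spark, and add_extension and bn_column_count are hypothetical names invented for this example.

```python
def add_extension(cls, name):
    """Return a decorator that attaches the decorated function to cls as a method."""
    def decorator(func):
        setattr(cls, name, func)
        return func
    return decorator


class Table:
    """Stand-in for pyspark.sql.DataFrame so the sketch runs without Spark."""

    def __init__(self, columns):
        self.columns = list(columns)


# Hypothetical helper; the bn_ prefix keeps it clear of built-in method names.
@add_extension(Table, "bn_column_count")
def _column_count(self):
    return len(self.columns)


print(Table(["id", "city"]).bn_column_count())  # prints 2
```

With PySpark installed, the same decorator could target pyspark.sql.DataFrame instead of Table, which is presumably why a single import is enough to make the bn_ methods available.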

DataFrame

.bn_select(*pattern: str), .bn_drop(*pattern: str)

Select or drop columns using wildcard patterns, e.g. df.bn_select("co*") returns the columns whose names start with co. For instance:

  • bn_select("ci*") will select columns starting with ci, such as city.
  • bn_select("id*", "ci*") will select columns starting with either id or ci, and so on.
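The wildcard matching described above can be approximated with the standard library's fnmatch module. This is a minimal sketch that assumes shell-style, case-sensitive patterns; whether bigninja uses the same matching rules is not stated here, and match_columns is a hypothetical helper name.

```python
from fnmatch import fnmatchcase


def match_columns(columns, *patterns):
    """Return the column names matching any pattern, in their original order."""
    return [c for c in columns if any(fnmatchcase(c, p) for p in patterns)]


columns = ["id", "city", "country", "name"]
print(match_columns(columns, "ci*"))         # ['city']
print(match_columns(columns, "id*", "ci*"))  # ['id', 'city']
```

On a real DataFrame, the result could be passed to df.select(*names) for selection or df.drop(*names) for dropping.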

.bn_display()

Works like .show(), but with truncate set to False and with arrays and structs converted to JSON so that they are readable.
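A rough equivalent can be written with PySpark's built-in to_json function. This sketch is an assumption about the approach, not bigninja's implementation. is_complex and display_as_json are hypothetical names, map columns are converted as well (the description above only mentions arrays and structs), and column names containing dots are not handled.

```python
def is_complex(type_name):
    """True for Spark type strings such as 'array<string>' or 'struct<a:int>'."""
    return type_name.startswith(("array", "struct", "map"))


def display_as_json(df):
    """Show df without truncation, rendering nested columns as JSON strings."""
    # Imported lazily so the helper above can be used without Spark installed.
    from pyspark.sql import functions as F

    # df.dtypes yields (column name, type string) pairs.
    cols = [
        F.to_json(F.col(name)).alias(name) if is_complex(dtype) else F.col(name)
        for name, dtype in df.dtypes
    ]
    df.select(*cols).show(truncate=False)
```

Calling display_as_json(df) on a DataFrame with nested columns prints those columns as JSON text instead of Spark's default nested rendering.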

.bn_union(df: DataFrame)

Unions DataFrames even if the number of columns, their names, and their types don't match, by combining the columns from both datasets and filling missing values with null.
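One way to get this behaviour is to align both DataFrames to a shared column list before the union. The sketch below is an assumption about the technique rather than bigninja's code. merged_columns and union_with_nulls are hypothetical names, and reconciling mismatched column types is left to Spark's own union rules.

```python
def merged_columns(left, right):
    """Columns of left, followed by the columns that appear only in right."""
    return list(left) + [c for c in right if c not in left]


def union_with_nulls(left_df, right_df):
    """Union two DataFrames, filling columns missing on either side with null."""
    # Imported lazily so merged_columns can be used without Spark installed.
    from pyspark.sql import functions as F

    columns = merged_columns(left_df.columns, right_df.columns)

    def align(df):
        # Keep existing columns and add a null literal for each missing one.
        return df.select(
            *[F.col(c) if c in df.columns else F.lit(None).alias(c) for c in columns]
        )

    return align(left_df).union(align(right_df))


print(merged_columns(["id", "city"], ["city", "zip"]))  # ['id', 'city', 'zip']
```

On Spark 3.1 and later, left_df.unionByName(right_df, allowMissingColumns=True) covers the missing-column case directly.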

Etc

Source Distributions

No source distribution files are available for this release.

Built Distribution

bigninja-0.0.3-py3-none-any.whl (7.8 kB view details)

Uploaded Python 3

File details

Details for the file bigninja-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: bigninja-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 7.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.6

File hashes

Hashes for bigninja-0.0.3-py3-none-any.whl
  • SHA256: 8af27fb882bdbb17ffcc6e2d77277aa1adc7c9cf573817bcd82171392ecbbda5
  • MD5: eada77d4edc783cb5190f1671ac9b1dc
  • BLAKE2b-256: ad61a43fce1644a4dc70a5b04688439e05866aa9e1453c1676ad180a6a05251a