A package for PySpark utility functions.

About

sparkit is a package of utility functions for PySpark.

Installation

sparkit is available on PyPI for Python 3.8+ and Spark 3 (Java 11):

pip install sparkit

Examples

Join multiple data frames on a common key (pass single data frames and/or iterables of data frames):

>>> import sparkit
>>> from pyspark.sql import Row, SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> df1 = spark.createDataFrame([Row(id=1, x="a"), Row(id=2, x="b")])
>>> df2 = spark.createDataFrame([Row(id=1, y="c"), Row(id=2, y="d")])
>>> df3 = spark.createDataFrame([Row(id=1, z="e"), Row(id=2, z="f")])
>>> sparkit.join([df1, df2], df3, on="id").show()
+---+---+---+---+
| id|  x|  y|  z|
+---+---+---+---+
|  1|  a|  c|  e|
|  2|  b|  d|  f|
+---+---+---+---+
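
The call above mixes a list of data frames with a single data frame. As a rough illustration of how such a variadic join could be assembled, the sketch below flattens the arguments and folds them together with PySpark's `DataFrame.join`. The names `flatten` and `join_all` are hypothetical and this is not sparkit's actual implementation; it only assumes that each object exposes a `join(other, on=..., how=...)` method, as PySpark data frames do.

```python
from functools import reduce
from typing import Any, Iterator


def flatten(*items: Any) -> Iterator[Any]:
    """Yield data frames from a mix of single frames and lists/tuples of frames."""
    for item in items:
        if isinstance(item, (list, tuple)):
            # Recurse into nested containers of data frames.
            yield from flatten(*item)
        else:
            yield item


def join_all(*dataframes: Any, on: Any, how: str = "inner") -> Any:
    """Join all given data frames left to right on the same key (hypothetical helper)."""
    frames = list(flatten(*dataframes))
    if not frames:
        raise ValueError("expected at least one data frame")
    # Fold pairwise: ((df1 JOIN df2) JOIN df3) ...
    return reduce(lambda left, right: left.join(right, on=on, how=how), frames)
```

With the data frames defined above, `join_all([df1, df2], df3, on="id")` would perform the same left-to-right sequence of joins, under the assumption that the default inner join is what is wanted.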

Union multiple data frames by name (pass single data frames and/or iterables of data frames):

>>> import sparkit
>>> from pyspark.sql import Row, SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> df1 = spark.createDataFrame([Row(x=1, y=2), Row(x=3, y=4)])
>>> df2 = spark.createDataFrame([Row(x=5, y=6), Row(x=7, y=8)])
>>> df3 = spark.createDataFrame([Row(x=0, y=1), Row(x=2, y=3)])
>>> df4 = spark.createDataFrame([Row(x=5, y=3), Row(x=9, y=6)])
>>> sparkit.union(df1, [df2, df3], df4).show()
+---+---+
|  x|  y|
+---+---+
|  1|  2|
|  3|  4|
|  5|  6|
|  7|  8|
|  0|  1|
|  2|  3|
|  5|  3|
|  9|  6|
+---+---+
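
For comparison, a union by name across several data frames can be approximated with plain PySpark by folding over `DataFrame.unionByName`, which matches columns by name rather than by position. The sketch below is an assumption about one possible approach, not sparkit's own code; `collect_frames` and `union_all` are hypothetical names, and the only requirement on the inputs is a `unionByName(other)` method.

```python
from functools import reduce
from typing import Any, List


def collect_frames(*items: Any) -> List[Any]:
    """Gather single data frames and lists/tuples of data frames into one flat list."""
    frames: List[Any] = []
    for item in items:
        if isinstance(item, (list, tuple)):
            frames.extend(collect_frames(*item))
        else:
            frames.append(item)
    return frames


def union_all(*dataframes: Any) -> Any:
    """Stack all given data frames in order, matching columns by name (hypothetical helper)."""
    frames = collect_frames(*dataframes)
    if not frames:
        raise ValueError("expected at least one data frame")
    # Fold pairwise so rows keep the order in which the frames were passed.
    return reduce(lambda top, bottom: top.unionByName(bottom), frames)
```

Called as `union_all(df1, [df2, df3], df4)`, this would append the frames in the order given. Like `unionByName` itself, it does not remove duplicate rows.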

Contributing to sparkit

Your contribution is greatly appreciated!

License

sparkit was created by sparkit Developers. It is licensed under the terms of the BSD 3-Clause license.
