A package for PySpark utility functions.
About
A package for PySpark utility functions.
Installation
sparkit is available on PyPI for Python 3.8+ and Spark 3 (Java 11):
pip install sparkit
Examples
join multiple data frames on a common key (pass single data frames and/or iterables of data frames):
>>> import sparkit
>>> from pyspark.sql import Row, SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> df1 = spark.createDataFrame([Row(id=1, x="a"), Row(id=2, x="b")])
>>> df2 = spark.createDataFrame([Row(id=1, y="c"), Row(id=2, y="d")])
>>> df3 = spark.createDataFrame([Row(id=1, z="e"), Row(id=2, z="f")])
>>> sparkit.join([df1, df2], df3, on="id").show()
+---+---+---+---+
| id|  x|  y|  z|
+---+---+---+---+
|  1|  a|  c|  e|
|  2|  b|  d|  f|
+---+---+---+---+
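For readers curious how mixed arguments like `[df1, df2], df3` can be handled, here is a minimal sketch of the general pattern. It is not sparkit's actual implementation. The names `flatten` and `join_all` are hypothetical, and the sketch only assumes that each frame exposes a `join(other, on=..., how=...)` method, as `pyspark.sql.DataFrame` does. It unpacks lists and tuples only, not arbitrary iterables.

```python
from functools import reduce


def flatten(*items):
    """Yield data frames from a mix of single frames and lists/tuples of frames."""
    for item in items:
        if isinstance(item, (list, tuple)):
            # Containers may be nested, so recurse into them.
            yield from flatten(*item)
        else:
            # Anything else is treated as a single data frame.
            yield item


def join_all(*dataframes, on, how="inner"):
    """Join all given data frames left to right on the key column(s) `on`.

    Hypothetical helper: it relies only on each frame's own
    join(other, on=..., how=...) method.
    """
    frames = list(flatten(*dataframes))
    if not frames:
        raise ValueError("at least one data frame is required")
    # Fold the frames pairwise: ((df1 JOIN df2) JOIN df3) ...
    return reduce(lambda left, right: left.join(right, on=on, how=how), frames)
```

With the data frames defined above, `join_all([df1, df2], df3, on="id")` would be expected to yield the same rows as the `sparkit.join` call, assuming an inner join is the intended behavior.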
union multiple data frames by name (pass single data frames and/or iterables of data frames):
>>> import sparkit
>>> from pyspark.sql import Row, SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> df1 = spark.createDataFrame([Row(x=1, y=2), Row(x=3, y=4)])
>>> df2 = spark.createDataFrame([Row(x=5, y=6), Row(x=7, y=8)])
>>> df3 = spark.createDataFrame([Row(x=0, y=1), Row(x=2, y=3)])
>>> df4 = spark.createDataFrame([Row(x=5, y=3), Row(x=9, y=6)])
>>> sparkit.union(df1, [df2, df3], df4).show()
+---+---+
|  x|  y|
+---+---+
|  1|  2|
|  3|  4|
|  5|  6|
|  7|  8|
|  0|  1|
|  2|  3|
|  5|  3|
|  9|  6|
+---+---+
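A union "by name" matches columns by their names rather than by their positions. The sketch below shows one way such a helper could be built on PySpark's `DataFrame.unionByName`. It is an illustration under stated assumptions, not sparkit's source. The name `union_all` is hypothetical, and only lists and tuples are unpacked, one level deep.

```python
from functools import reduce
from itertools import chain


def union_all(*dataframes):
    """Union data frames by column name (hypothetical helper).

    Accepts single frames and/or lists or tuples of frames and relies only
    on each frame's unionByName(other) method.
    """
    # Wrap single frames in a one-element list so everything chains flat.
    groups = (d if isinstance(d, (list, tuple)) else [d] for d in dataframes)
    frames = list(chain.from_iterable(groups))
    if not frames:
        raise ValueError("at least one data frame is required")
    # Fold pairwise; rows keep the order in which the frames were passed.
    return reduce(lambda top, bottom: top.unionByName(bottom), frames)
```

With the data frames defined above, `union_all(df1, [df2, df3], df4)` would stack the frames in argument order. As with `unionByName` itself, duplicate rows are kept.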
Contributing to sparkit
Your contribution is greatly appreciated! See the following links to help you get started:
License
sparkit was created by sparkit Developers.
It is licensed under the terms of the BSD 3-Clause license.