PySpark validation & testing tooling
pyspark-test
Installation
pip install pyspark-val
The package is installed as pyspark-val but imported as pyspark_test.
Usage
assert_pyspark_df_equal(left_df, right_df)
Additional Arguments
check_dtype: Whether to compare the data types of the two Spark DataFrames. Default True.
check_column_names: Whether to compare column names. Default False. Not required if data types are being checked.
check_columns_in_order: Whether the columns must appear in the same order. Default False.
order_by: Column names by which both DataFrames are sorted before comparing. Default None.
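To make the interaction between the schema-related flags concrete, here is a minimal pure-Python sketch. It models a schema as a list of (column name, type name) pairs, similar in shape to what PySpark's DataFrame.dtypes returns. The function schemas_match and this representation are hypothetical illustrations of the flag descriptions above, not pyspark-test's implementation; the library may combine the flags differently.

```python
from typing import List, Tuple

# Hypothetical schema model: (column name, type name) pairs, e.g. ("col_a", "date").
Schema = List[Tuple[str, str]]


def schemas_match(
    left: Schema,
    right: Schema,
    check_dtype: bool = True,
    check_column_names: bool = False,
    check_columns_in_order: bool = False,
) -> bool:
    """Return True if the two schemas agree under the given flags (illustrative only)."""
    if not check_columns_in_order:
        # Column order is ignored, so compare both schemas in sorted order.
        left = sorted(left)
        right = sorted(right)
    if check_dtype and left != right:
        # Each pair holds a name and a type, so this also catches renamed columns,
        # which is why a separate name check is redundant when types are checked.
        return False
    if check_column_names and [name for name, _ in left] != [name for name, _ in right]:
        return False
    return True
```

With the defaults, [("col_a", "date"), ("col_b", "string")] and the same pairs in reverse order match; passing check_columns_in_order=True makes them differ.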
Example
import datetime

from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    DateType,
    DoubleType,
    LongType,
    StringType,
    StructField,
    StructType,
)

from pyspark_test import assert_pyspark_df_equal

# Run Spark locally for this example; adjust the master URL for a cluster.
conf = SparkConf().setMaster("local[1]").setAppName("pyspark-test-example")
sc = SparkContext.getOrCreate(conf=conf)
spark_session = SparkSession(sc)

# Both DataFrames share this schema: four nullable columns.
schema = StructType(
    [
        StructField('col_a', DateType(), True),
        StructField('col_b', StringType(), True),
        StructField('col_c', DoubleType(), True),
        StructField('col_d', LongType(), True),
    ]
)

# One populated row and one all-null row.
df_1 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=schema,
)

df_2 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=schema,
)

# Schemas and rows are identical, so this call should not raise.
assert_pyspark_df_equal(df_1, df_2)
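The two DataFrames above list their rows in the same order. When row order can differ, order_by sorts both sides before comparing. If that sort goes through Spark's orderBy (an assumption about the library's internals), ascending order places nulls first by default. The hypothetical sort_rows helper below imitates that on plain Python tuples shaped like the example's rows; a plain sorted() would fail on these rows with a TypeError, because None cannot be ordered against datetime.date. It is a sketch of the idea, not the library's code.

```python
import datetime
from typing import Any, List, Sequence, Tuple

Row = Tuple[Any, ...]


def sort_rows(rows: List[Row], columns: Sequence[str], order_by: Sequence[str]) -> List[Row]:
    """Hypothetical helper: sort rows ascending on order_by, with nulls first."""
    positions = [list(columns).index(name) for name in order_by]

    def key(row: Row) -> Tuple[Tuple[bool, Any], ...]:
        # (False, None) sorts before (True, value), so rows with None come first
        # and None is never ordered against a date, string or number.
        return tuple((row[i] is not None, row[i]) for i in positions)

    return sorted(rows, key=key)


# The rows from the example above, listed in opposite orders on each side.
columns = ["col_a", "col_b", "col_c", "col_d"]
populated = (datetime.date(2020, 1, 1), "demo", 1.123, 10)
empty = (None, None, None, None)
left_rows = [populated, empty]
right_rows = [empty, populated]

print(sort_rows(left_rows, columns, ["col_a"]) == sort_rows(right_rows, columns, ["col_a"]))  # True
```

Sorting both sides on the same key before comparing is what lets two row lists with identical contents but different order compare equal.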
Download files
Source distribution: pyspark_val-0.1.4.tar.gz (5.3 kB)
Built distribution: pyspark_val-0.1.4-py3-none-any.whl
Hashes for pyspark_val-0.1.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a3940167cb7ed5a2f17de29cde6448bc254278c394c6018f9bb20ec0d8c0825e
MD5 | 669a33a348209f2fecc886998c335d5f
BLAKE2b-256 | de8d07db24311ffe281afcfde3591c276a45d8f54189aa1f9f2d55f443876c47