pyspark-val
PySpark validation & testing tooling.
Installation
pip install pyspark-val
Usage
assert_pyspark_df_equal(left_df, right_df)
Additional Arguments
check_dtype: compare the data types of the DataFrame columns. Default True.
check_column_names: compare column names. Default False. Not required if we are already checking data types.
check_columns_in_order: require the columns to appear in the same order. Default False.
order_by: column names by which both DataFrames are sorted before comparing. Default None.
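The comparison these arguments control can be sketched in plain Python, with no Spark required. The helper below is illustrative only (its name and signature are not the library's actual internals): optionally check column names, optionally sort both sides by the order_by columns, then compare rows.

```python
from operator import itemgetter

def rows_equal(left_rows, left_cols, right_rows, right_cols,
               check_column_names=False, order_by=None):
    """Illustrative sketch of how assert_pyspark_df_equal-style
    comparison behaves, using plain row tuples instead of DataFrames."""
    if check_column_names and left_cols != right_cols:
        return False
    if order_by is not None:
        # Sort both sides by the given columns before comparing,
        # mirroring the order_by argument.
        key = itemgetter(*(left_cols.index(c) for c in order_by))
        left_rows = sorted(left_rows, key=key)
        right_rows = sorted(right_rows, key=key)
    return left_rows == right_rows

cols = ['col_a', 'col_b']
a = [(2, 'y'), (1, 'x')]
b = [(1, 'x'), (2, 'y')]
assert not rows_equal(a, cols, b, cols)                   # order matters by default
assert rows_equal(a, cols, b, cols, order_by=['col_a'])   # equal after sorting
```

The same pattern explains why order_by defaults to None: without it, two DataFrames with identical rows in different order are reported as unequal.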
Example
import datetime

from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.types import *

from pyspark_test import assert_pyspark_df_equal

# Reuse any existing context; no custom SparkConf is needed here.
sc = SparkContext.getOrCreate()
spark_session = SparkSession(sc)

df_1 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

df_2 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

assert_pyspark_df_equal(df_1, df_2)
Download files
Source Distribution
pyspark_val-0.1.3.tar.gz
(5.2 kB)
Built Distribution
Hashes for pyspark_val-0.1.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 9a0bc3d1090ea1d2485de127dd08676ed6c4f820b3e8c16138a6bb0dd2dd22bf
MD5 | 9d267faa5d220615d4d9a58b1d2563f4
BLAKE2b-256 | e6c1e0ae1d2db137d28686bd8e88d532e6f5375cae8927135ae39f1fce85d284