PySpark validation & testing tooling
pyspark-test
Installation
pip install pyspark-val
Usage
assert_pyspark_df_equal(left_df, actual_df)
Additional Arguments
check_dtype
: Whether to compare the data types of the Spark DataFrames. Defaults to true.
check_column_names
: Whether to compare column names. Defaults to false. Not required if data types are already being checked.
check_columns_in_order
: Whether the columns must appear in the same order. Defaults to false.
order_by
: Column names by which both DataFrames are sorted before comparing. Defaults to None.
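To make the effect of order_by and check_columns_in_order concrete, here is a minimal pure-Python sketch of the semantics described above, using lists of dicts in place of DataFrames. The helper rows_equal is hypothetical and is not part of the library. It also assumes that, without order_by, rows are compared position by position, which the description above does not state explicitly.

```python
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def _sort_key(row: Row, order_by: Sequence[str]) -> tuple:
    # Pair each value with a "not None" flag so that None sorts first
    # and is never ordered against a real value.
    return tuple((row[col] is not None, row[col]) for col in order_by)


def rows_equal(
    left: List[Row],
    right: List[Row],
    check_columns_in_order: bool = False,
    order_by: Optional[Sequence[str]] = None,
) -> bool:
    """Hypothetical helper illustrating the options on plain dicts."""
    if len(left) != len(right):
        return False
    if order_by:
        # Sort both sides by the same columns before comparing.
        left = sorted(left, key=lambda r: _sort_key(r, order_by))
        right = sorted(right, key=lambda r: _sort_key(r, order_by))
    for l_row, r_row in zip(left, right):
        if check_columns_in_order:
            # Dicts keep insertion order, so items() exposes column order.
            if list(l_row.items()) != list(r_row.items()):
                return False
        elif l_row != r_row:
            # Plain dict equality ignores key (column) order.
            return False
    return True


left = [{"id": 2, "name": "b"}, {"id": 1, "name": "a"}, {"id": None, "name": None}]
right = [{"id": None, "name": None}, {"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

print(rows_equal(left, right))                   # False: same rows, different order
print(rows_equal(left, right, order_by=["id"]))  # True: sorted before comparing
```

Sorting on a column that uniquely identifies each row keeps the comparison deterministic; with duplicate sort keys, tied rows could still end up in different positions on each side.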
Example
import datetime

from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    DateType,
    DoubleType,
    LongType,
    StringType,
    StructField,
    StructType,
)
from pyspark_test import assert_pyspark_df_equal

# Reuse the active SparkContext, or start one with default settings.
sc = SparkContext.getOrCreate()
spark_session = SparkSession(sc)

# Two DataFrames with identical schemas and rows, including an all-null row.
df_1 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)
df_2 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

# The two DataFrames match, so the assertion passes.
assert_pyspark_df_equal(df_1, df_2)
Download files
Source Distribution
pyspark_val-0.1.1.tar.gz (4.3 kB)
Built Distribution
pyspark_val-0.1.1-py3-none-any.whl (5.0 kB)
File details
Details for the file pyspark_val-0.1.1.tar.gz.
File metadata
- Download URL: pyspark_val-0.1.1.tar.gz
- Upload date:
- Size: 4.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.12.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 21d26c0580a1f2fad1cd9380d1b2ae41ecff337be79d969435cad3a589272520
MD5 | 581f92dc7e5a6c27c28da6fbbaa12caf
BLAKE2b-256 | 1449d78e0cfc85296ee9e479d89164790dbb4607bae39e7fc0620bafbfb3b3fa
File details
Details for the file pyspark_val-0.1.1-py3-none-any.whl.
File metadata
- Download URL: pyspark_val-0.1.1-py3-none-any.whl
- Upload date:
- Size: 5.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.12.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3cc243447d059ca987dcc078bed7096915bf47efde07e2ee5a44716bbdaee560
MD5 | a5fdd71d3f8705eb556efbb5efbdcd65
BLAKE2b-256 | 21b89a13f51dcb341a99d01262684bce7ed7893a720ac21f348b9768a6afaca9