PySpark validation & testing tooling
pyspark-test
PySpark validation & testing tooling.
Installation
pip install pyspark-val
Usage
assert_pyspark_df_equal(left_df, actual_df)
Additional Arguments
check_dtype
: Whether to compare the data types of the Spark DataFrames. Default: True.
check_column_names
: Whether to compare column names. Default: False. Not required if data types are being checked.
check_columns_in_order
: Whether the columns must appear in the same order. Default: False.
order_by
: Column names by which both DataFrames are sorted before comparing. Default: None.
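One way to read the options above is as a sequence of checks. The following is a minimal, Spark-free sketch of that reading, using plain Python lists in place of DataFrames so it runs without a Spark installation. The helper assert_rows_equal, its schema and row arguments, and the error messages are hypothetical and are not part of pyspark-val; the sketch only mirrors the documented defaults and makes no claim about how the library itself is implemented.

```python
from typing import Any, List, Optional, Sequence, Tuple

Schema = Sequence[Tuple[str, str]]  # (column name, data type name) pairs
Row = Sequence[Any]


def assert_rows_equal(
    left_schema: Schema,
    left_rows: Sequence[Row],
    right_schema: Schema,
    right_rows: Sequence[Row],
    check_dtype: bool = True,
    check_column_names: bool = False,
    check_columns_in_order: bool = False,
    order_by: Optional[List[str]] = None,
) -> None:
    """Hypothetical stand-in that mirrors the documented options."""
    left_names = [name for name, _ in left_schema]
    right_names = [name for name, _ in right_schema]

    if check_columns_in_order:
        assert left_names == right_names, "column order differs"
    if check_dtype:
        # (name, dtype) pairs include the names, so a separate name check
        # adds nothing when data types are compared.
        assert sorted(left_schema) == sorted(right_schema), "schemas differ"
    elif check_column_names:
        assert sorted(left_names) == sorted(right_names), "column names differ"

    assert len(left_rows) == len(right_rows), "row counts differ"

    if order_by:
        # Sort both sides by the same columns; None sorts before real values.
        def key_for(names):
            idx = [names.index(col) for col in order_by]
            return lambda row: [(row[i] is not None, row[i]) for i in idx]

        left_rows = sorted(left_rows, key=key_for(left_names))
        right_rows = sorted(right_rows, key=key_for(right_names))

    # Values are compared positionally, row by row.
    for n, (left, right) in enumerate(zip(left_rows, right_rows)):
        assert tuple(left) == tuple(right), f"row {n} differs: {left} != {right}"


schema = [("col_a", "string"), ("col_b", "bigint")]
rows = [("demo", 10), (None, None)]
shuffled = [(None, None), ("demo", 10)]

# Same rows in a different order compare equal once both sides are sorted.
assert_rows_equal(schema, rows, schema, shuffled, order_by=["col_a"])
```

In this sketch, leaving out order_by makes the last call fail, because the rows are then compared in the order given. That is the situation order_by is documented to address for DataFrames whose row order is not guaranteed.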
Example
import datetime

from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    DateType,
    DoubleType,
    LongType,
    StringType,
    StructField,
    StructType,
)
from pyspark_test import assert_pyspark_df_equal

# Reuse the active SparkContext, or create one with the default configuration.
sc = SparkContext.getOrCreate()
spark_session = SparkSession(sc)

# Two DataFrames with identical schemas and rows, including an all-null row.
df_1 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)
df_2 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

# Passes: both DataFrames have the same schema and the same rows.
assert_pyspark_df_equal(df_1, df_2)