Check that the left and right Spark DataFrames are equal.
pyspark-test
This function compares two Spark DataFrames and outputs any differences. It is inspired by the pandas testing module, adapted for PySpark, and intended for use in unit tests. Additional parameters control how strict the equality checks are.
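Reporting differences between two tables can be illustrated without Spark. The sketch below is a plain-Python illustration, not pyspark-test's implementation: it assumes both DataFrames have already been collected into sequences of tuples (for example with `DataFrame.collect()`, whose `Row` objects are tuple subclasses), and the helper name `diff_rows` is hypothetical. Rows are treated as a multiset, so row order is ignored but duplicate counts still matter.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, ...]


def diff_rows(left: Sequence[Row], right: Sequence[Row]) -> Tuple[List[Row], List[Row]]:
    """Hypothetical helper: return (rows only in left, rows only in right).

    Rows are compared as multisets: order is ignored, but a row that appears
    twice on one side and once on the other is reported once.
    """
    left_counts = Counter(left)
    right_counts = Counter(right)
    # Counter subtraction keeps only positive counts.
    only_left = list((left_counts - right_counts).elements())
    only_right = list((right_counts - left_counts).elements())
    return only_left, only_right


left = [("2020-01-01", "demo", 1.123, 10), (None, None, None, None)]
right = [(None, None, None, None), ("2020-01-01", "demo", 1.5, 10)]
only_left, only_right = diff_rows(left, right)
print(only_left)   # [('2020-01-01', 'demo', 1.123, 10)]
print(only_right)  # [('2020-01-01', 'demo', 1.5, 10)]
```

Values are compared with exact equality, so this sketch applies no floating-point tolerance.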
Installation
pip install pyspark-test
Usage
assert_pyspark_df_equal(left_df, actual_df)
Additional Arguments
check_dtype: Compare the data types of the two DataFrames. Default: True.
check_column_names: Compare the column names. Default: False. Not needed when data types are already being checked.
check_columns_in_order: Check whether the columns appear in the same order. Default: False.
order_by: Column names by which both DataFrames are sorted before comparison. Default: None.
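The options above can be modelled in plain Python to make their intent concrete. This is a hypothetical sketch, not pyspark-test's code: `Frame` and `assert_frames_equal` are invented names, a DataFrame is reduced to its (name, dtype) pairs (the shape PySpark's `DataFrame.dtypes` returns) plus a list of row tuples, and the way the options interact is an assumption based only on the descriptions above.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence, Tuple


@dataclass
class Frame:
    """Hypothetical stand-in for a DataFrame: (name, dtype) pairs plus row tuples."""

    dtypes: List[Tuple[str, str]]  # e.g. [("id", "bigint"), ("name", "string")]
    rows: List[Tuple[Any, ...]]

    @property
    def columns(self) -> List[str]:
        return [name for name, _ in self.dtypes]


def _check(condition: bool, message: str) -> None:
    # Raise explicitly so the checks still run under `python -O`.
    if not condition:
        raise AssertionError(message)


def _null_safe_key(values: Sequence[Any]) -> Tuple[Tuple[bool, Any], ...]:
    # Nulls sort first; None is never compared with a non-null value.
    return tuple((v is not None, v) for v in values)


def assert_frames_equal(
    left: Frame,
    right: Frame,
    check_dtype: bool = True,
    check_column_names: bool = False,
    check_columns_in_order: bool = False,
    order_by: Optional[Sequence[str]] = None,
) -> None:
    if check_columns_in_order:
        _check(left.columns == right.columns,
               f"column order differs: {left.columns} != {right.columns}")
    if check_column_names:
        _check(sorted(left.columns) == sorted(right.columns),
               f"column names differ: {left.columns} != {right.columns}")
    if check_dtype:
        # Compares (name, dtype) pairs, so column names are covered too.
        _check(sorted(left.dtypes) == sorted(right.dtypes),
               f"dtypes differ: {left.dtypes} != {right.dtypes}")

    # Match right's columns to left's by name when possible, else by position.
    left_rows = list(left.rows)
    if sorted(left.columns) == sorted(right.columns):
        idx = [right.columns.index(c) for c in left.columns]
        right_rows = [tuple(row[i] for i in idx) for row in right.rows]
    else:
        right_rows = list(right.rows)

    # Without order_by, rows are compared in the order given.
    if order_by:
        keys = [left.columns.index(c) for c in order_by]
        left_rows.sort(key=lambda row: _null_safe_key([row[i] for i in keys]))
        right_rows.sort(key=lambda row: _null_safe_key([row[i] for i in keys]))
    _check(left_rows == right_rows, f"rows differ: {left_rows} != {right_rows}")


left = Frame([("id", "bigint"), ("name", "string")], [(2, "b"), (1, "a"), (None, None)])
right = Frame([("name", "string"), ("id", "bigint")], [("a", 1), (None, None), ("b", 2)])

# Passes: columns are matched by name and both sides are sorted by "id".
assert_frames_equal(left, right, order_by=["id"])

# Fails: the columns are not in the same order.
try:
    assert_frames_equal(left, right, check_columns_in_order=True)
except AssertionError as exc:
    print(exc)  # column order differs: ['id', 'name'] != ['name', 'id']
```

Sorting uses a null-safe key because Python 3 cannot order None against other values. Python's sort is stable, so if the order_by columns do not uniquely identify each row, ties keep their original order and the comparison can still fail.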
Example
import datetime

from pyspark.sql import SparkSession
from pyspark.sql.types import (
    DateType,
    DoubleType,
    LongType,
    StringType,
    StructField,
    StructType,
)
from pyspark_test import assert_pyspark_df_equal

# Reuse the active Spark session, or create one with default settings.
spark_session = SparkSession.builder.getOrCreate()

# Four nullable columns: a date, a string, a double and a long.
df_1 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],  # a row of nulls
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

# Same schema and same rows as df_1.
df_2 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

# Passes silently because the DataFrames are equal.
assert_pyspark_df_equal(df_1, df_2)
Download files
Source distribution: pyspark_test-0.2.0.tar.gz (7.0 kB)
Built distribution: pyspark_test-0.2.0-py3-none-any.whl (7.4 kB)
File details for pyspark_test-0.2.0.tar.gz
File metadata
- Download URL: pyspark_test-0.2.0.tar.gz
- Upload date:
- Size: 7.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0d9d8d3a352a9b1c30761b0553a5771cb9dbb9a278955b3e7b0aed0ae13892d8
MD5 | 1e1975b8d80865b0396e5fb71db0a639
BLAKE2b-256 | f8a93ca6c0f3289da348d25693adb4f80e3d8b2389dea603f222feae4dd78e76
File details for pyspark_test-0.2.0-py3-none-any.whl
File metadata
- Download URL: pyspark_test-0.2.0-py3-none-any.whl
- Upload date:
- Size: 7.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | dd4fb03c4f438f718a870a9268a459f2f8924829c767302f5515202707c97709
MD5 | e35c397e281ab7f8908b4ecfdfe2e73d
BLAKE2b-256 | ec326d75e7d5171393ead86c2a6c0aba5b5cdff495286537732c3a1ad05575c1
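The published SHA256 digests can be used to check a downloaded file before installing it. This is a minimal sketch using only the standard library; the helper names `sha256_of` and `verify_download` are hypothetical, and the digests are copied from the file hashes listed for version 0.2.0.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for pyspark-test 0.2.0.
EXPECTED_SHA256 = {
    "pyspark_test-0.2.0.tar.gz": "0d9d8d3a352a9b1c30761b0553a5771cb9dbb9a278955b3e7b0aed0ae13892d8",
    "pyspark_test-0.2.0-py3-none-any.whl": "dd4fb03c4f438f718a870a9268a459f2f8924829c767302f5515202707c97709",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Return True only if the file name is known and its SHA256 matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

pip can enforce the same check itself in hash-checking mode, by listing `pyspark-test==0.2.0 --hash=sha256:<digest>` in a requirements file and installing with `pip install --require-hashes -r requirements.txt`.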