pyspark-test
Check that the left and right Spark DataFrames are equal.
The `assert_pyspark_df_equal` function compares two Spark DataFrames and reports any differences. It is inspired by the pandas testing module, adapted for PySpark and intended for use in unit tests. Additional parameters control how strict the equality checks are.
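To make "reports any differences" concrete, here is a minimal sketch in plain Python (no Spark) that treats each DataFrame as a list of row tuples, as you might get after collecting it, and lists the rows found on only one side. `diff_rows` is a hypothetical helper, not part of pyspark-test, and the library's own comparison and messages may differ.

```python
from collections import Counter
from typing import Any, List, Tuple

Row = Tuple[Any, ...]


def diff_rows(left: List[Row], right: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Return (rows only in left, rows only in right), ignoring row order.

    Hypothetical helper: a plain-Python stand-in for comparing the collected
    rows of two DataFrames. It is not part of pyspark-test.
    """
    left_counts = Counter(left)
    right_counts = Counter(right)
    # Counter subtraction keeps only positive counts, so duplicate rows count.
    only_left = sorted((left_counts - right_counts).elements(), key=repr)
    only_right = sorted((right_counts - left_counts).elements(), key=repr)
    return only_left, only_right


left = [("demo", 1.123, 10), (None, None, None)]
right = [(None, None, None), ("demo", 1.5, 10)]
only_left, only_right = diff_rows(left, right)
print("only in left: ", only_left)   # [('demo', 1.123, 10)]
print("only in right:", only_right)  # [('demo', 1.5, 10)]
```

Counting rows with `Counter` rather than a set means a row duplicated on one side is reported as a difference instead of being silently collapsed.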
Installation
```shell
pip install pyspark-test
```
Usage
```python
from pyspark_test import assert_pyspark_df_equal

assert_pyspark_df_equal(left_df, right_df)
```
Additional Arguments
- check_dtype: Compare the data types of the two DataFrames. Default: True.
- check_column_names: Compare the column names. Default: False. Not required if data types are being checked.
- check_columns_in_order: Require the columns to appear in the same order. Default: False.
- order_by: Column names by which both DataFrames are sorted before comparing. Default: None.
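The effect of `check_columns_in_order` and `order_by` can be modelled in plain Python. This is a sketch under stated assumptions, not pyspark-test's implementation: rows are dicts keyed by column name, columns must match as a set (or as an ordered list when `check_columns_in_order` is set), and without `order_by` rows are assumed to be compared position by position. Since Spark does not guarantee row order unless you sort, sorting both sides by the same key is what makes such a comparison deterministic. `frames_match` and its helpers are hypothetical names.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]  # one row, keyed by column name


def _null_safe_key(row: Row, columns: Sequence[str]) -> Tuple[Tuple[bool, Any], ...]:
    # Pair each value with an "is not null" flag: None sorts first and is never
    # compared to a real value with "<".
    return tuple((row[c] is not None, row[c]) for c in columns)


def _project(rows: List[Row], columns: Sequence[str]) -> List[Tuple[Any, ...]]:
    # Read each row's values in a fixed column order.
    return [tuple(row[c] for c in columns) for row in rows]


def frames_match(
    left_cols: List[str],
    left_rows: List[Row],
    right_cols: List[str],
    right_rows: List[Row],
    check_columns_in_order: bool = False,
    order_by: Optional[List[str]] = None,
) -> bool:
    """Hypothetical model of check_columns_in_order and order_by; not pyspark-test code."""
    if check_columns_in_order:
        if left_cols != right_cols:  # same columns in the same positions
            return False
    elif sorted(left_cols) != sorted(right_cols):  # same columns, any positions
        return False
    if order_by:
        # Sort both sides by the same key so the result does not depend on
        # the order in which rows arrive.
        left_rows = sorted(left_rows, key=lambda r: _null_safe_key(r, order_by))
        right_rows = sorted(right_rows, key=lambda r: _null_safe_key(r, order_by))
    # Compare row by row, reading both sides in the left frame's column order.
    return _project(left_rows, left_cols) == _project(right_rows, left_cols)


left_cols = ["id", "name"]
right_cols = ["name", "id"]
left = [{"id": 1, "name": "a"}, {"id": 2, "name": None}]
right = [{"name": None, "id": 2}, {"name": "a", "id": 1}]

unsorted_result = frames_match(left_cols, left, right_cols, right)  # False: row order differs
sorted_result = frames_match(left_cols, left, right_cols, right, order_by=["id"])  # True
strict_result = frames_match(
    left_cols, left, right_cols, right, check_columns_in_order=True, order_by=["id"]
)  # False: column order differs
```

Sorting is stable, so if the `order_by` columns do not uniquely identify rows, tied rows keep their arrival order and the comparison can still fail spuriously; choosing a unique key avoids that.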
Example
```python
import datetime

from pyspark.sql import SparkSession
from pyspark.sql.types import (
    DateType,
    DoubleType,
    LongType,
    StringType,
    StructField,
    StructType,
)

from pyspark_test import assert_pyspark_df_equal

# Reuse the active Spark session, or start one if none exists.
spark_session = SparkSession.builder.getOrCreate()

# Both DataFrames share this schema; every column is nullable.
schema = StructType(
    [
        StructField('col_a', DateType(), True),
        StructField('col_b', StringType(), True),
        StructField('col_c', DoubleType(), True),
        StructField('col_d', LongType(), True),
    ]
)

df_1 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],  # a row of nulls
    ],
    schema=schema,
)

df_2 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=schema,
)

# Should pass: both DataFrames have the same schema and the same rows.
assert_pyspark_df_equal(df_1, df_2)
```
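For the example schema, Spark's `DataFrame.dtypes` reports `df_1.dtypes` as a list of (column name, type name) pairs: `col_a` is `date`, `col_b` is `string`, `col_c` is `double` and `col_d` is `bigint`. If the data type check compares these pairs (an assumption about pyspark-test's internals, not confirmed here), then it already compares names, which would explain why `check_column_names` is not needed alongside `check_dtype`. The plain-Python sketch below contrasts the two checks; `names_match` and `dtypes_match` are hypothetical.

```python
from typing import List, Tuple

# (column name, type name) pairs, in the shape Spark's DataFrame.dtypes returns.
Dtypes = List[Tuple[str, str]]


def names_match(left: Dtypes, right: Dtypes, in_order: bool = False) -> bool:
    """Hypothetical check: compare column names only."""
    left_names = [name for name, _ in left]
    right_names = [name for name, _ in right]
    if in_order:
        return left_names == right_names
    return sorted(left_names) == sorted(right_names)


def dtypes_match(left: Dtypes, right: Dtypes, in_order: bool = False) -> bool:
    """Hypothetical check: compare (name, type) pairs, so names are compared too."""
    if in_order:
        return left == right
    return sorted(left) == sorted(right)


# df_1.dtypes for the example schema, plus two altered variants.
example = [("col_a", "date"), ("col_b", "string"), ("col_c", "double"), ("col_d", "bigint")]
renamed = [("col_a", "date"), ("col_b", "string"), ("col_c", "double"), ("col_e", "bigint")]
retyped = [("col_a", "date"), ("col_b", "string"), ("col_c", "double"), ("col_d", "int")]

renamed_caught_by_dtypes = not dtypes_match(example, renamed)  # True
retype_missed_by_names = names_match(example, retyped)  # True
```

A renamed column fails the pair-based check as well, while a changed type slips past a names-only check, so the pair comparison is the stricter of the two.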
Download files
File details
Details for the file pyspark_test-0.2.0.tar.gz.
File metadata
- Download URL: pyspark_test-0.2.0.tar.gz
- Upload date:
- Size: 7.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0d9d8d3a352a9b1c30761b0553a5771cb9dbb9a278955b3e7b0aed0ae13892d8 |
| MD5 | 1e1975b8d80865b0396e5fb71db0a639 |
| BLAKE2b-256 | f8a93ca6c0f3289da348d25693adb4f80e3d8b2389dea603f222feae4dd78e76 |
File details
Details for the file pyspark_test-0.2.0-py3-none-any.whl.
File metadata
- Download URL: pyspark_test-0.2.0-py3-none-any.whl
- Upload date:
- Size: 7.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dd4fb03c4f438f718a870a9268a459f2f8924829c767302f5515202707c97709 |
| MD5 | e35c397e281ab7f8908b4ecfdfe2e73d |
| BLAKE2b-256 | ec326d75e7d5171393ead86c2a6c0aba5b5cdff495286537732c3a1ad05575c1 |