
Check that the left and right Spark DataFrames are equal.

Project description

pyspark-test



This function compares two Spark DataFrames and reports any differences. It is inspired by the pandas testing module, adapted for PySpark, and intended for use in unit tests. Additional parameters control how strict the equality checks are.

Installation

pip install pyspark-test

Usage

from pyspark_test import assert_pyspark_df_equal

assert_pyspark_df_equal(left_df, right_df)

Additional Arguments

  • check_dtype : Compare the data types of the two DataFrames. Default: True.
  • check_column_names : Compare column names. Default: False. Not required when check_dtype is enabled.
  • check_columns_in_order : Require the columns to appear in the same order. Default: False.
  • order_by : Column names by which both DataFrames are sorted before comparing. Default: None.
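To make the effect of each flag concrete, here is a small pure-Python model of the comparisons described above. It is a sketch under stated assumptions, not pyspark-test's implementation: `sketch_frames_equal` and the `(schema, rows)` frame representation are hypothetical, and the real library's handling of reordered columns may differ. It returns a list of differences instead of raising, so an empty list means the two frames are considered equal.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-in for a DataFrame: a schema of (column_name, type_name)
# pairs plus a list of row tuples. This only models the documented flags.
Schema = List[Tuple[str, str]]
Record = Tuple[Any, ...]
Frame = Tuple[Schema, List[Record]]


def _sort_key(names: List[str], order_by: Sequence[str]) -> Callable[[Record], Record]:
    """Build a sort key that extracts the order_by columns by name."""
    positions = [names.index(col) for col in order_by]
    return lambda row: tuple(row[p] for p in positions)


def sketch_frames_equal(
    left: Frame,
    right: Frame,
    check_dtype: bool = True,
    check_column_names: bool = False,
    check_columns_in_order: bool = False,
    order_by: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return human-readable differences; an empty list means "equal"."""
    left_schema, left_rows = left
    right_schema, right_rows = right
    left_names = [name for name, _ in left_schema]
    right_names = [name for name, _ in right_schema]
    same_name_set = sorted(left_names) == sorted(right_names)
    diffs: List[str] = []

    # Column names: exact order if requested, otherwise just the same set.
    if check_column_names:
        names_match = left_names == right_names if check_columns_in_order else same_name_set
        if not names_match:
            diffs.append(f"column names differ: {left_names} vs {right_names}")

    # Data types: compare (name, type) pairs, ignoring order unless requested.
    if check_dtype:
        if check_columns_in_order:
            types_match = left_schema == right_schema
        else:
            types_match = sorted(left_schema) == sorted(right_schema)
        if not types_match:
            diffs.append(f"schemas differ: {left_schema} vs {right_schema}")

    if len(left_rows) != len(right_rows):
        diffs.append(f"row counts differ: {len(left_rows)} vs {len(right_rows)}")
        return diffs

    # If the columns match as a set, line the right-hand values up by name.
    if same_name_set and left_names != right_names:
        positions = [right_names.index(name) for name in left_names]
        right_rows = [tuple(row[p] for p in positions) for row in right_rows]
        right_names = left_names

    # Optionally sort both sides so that row order does not matter.
    if order_by and same_name_set:
        left_rows = sorted(left_rows, key=_sort_key(left_names, order_by))
        right_rows = sorted(right_rows, key=_sort_key(right_names, order_by))

    # Compare row by row and record every mismatch.
    for i, (l_row, r_row) in enumerate(zip(left_rows, right_rows)):
        if l_row != r_row:
            diffs.append(f"row {i} differs: {l_row} vs {r_row}")
    return diffs


left = ([("id", "bigint"), ("name", "string")], [(1, "a"), (2, "b")])
right = ([("name", "string"), ("id", "bigint")], [("b", 2), ("a", 1)])

print(sketch_frames_equal(left, right))  # two row mismatches: row order differs
print(sketch_frames_equal(left, right, order_by=["id"]))  # []
print(sketch_frames_equal(
    left, right, check_column_names=True, check_columns_in_order=True, order_by=["id"]
))  # column-name and schema differences only
```

In this model, `order_by` is what makes row order irrelevant, while `check_columns_in_order` only tightens the name and type checks.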

Example

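import datetime

from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.types import DateType, DoubleType, LongType, StringType, StructField, StructType

from pyspark_test import assert_pyspark_df_equal

# Local, single-threaded Spark configuration for tests.
conf = SparkConf().setMaster("local[1]").setAppName("pyspark-test-example")
sc = SparkContext.getOrCreate(conf=conf)
spark_session = SparkSession(sc)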

df_1 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

df_2 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

# Passes: df_1 and df_2 have the same schema and the same rows.
assert_pyspark_df_equal(df_1, df_2)



Download files


Source Distribution

pyspark_test-0.2.0.tar.gz (7.0 kB)

Built Distribution

pyspark_test-0.2.0-py3-none-any.whl (7.4 kB)

File details

Details for the file pyspark_test-0.2.0.tar.gz.

File metadata

  • Download URL: pyspark_test-0.2.0.tar.gz
  • Size: 7.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for pyspark_test-0.2.0.tar.gz
Algorithm Hash digest
SHA256 0d9d8d3a352a9b1c30761b0553a5771cb9dbb9a278955b3e7b0aed0ae13892d8
MD5 1e1975b8d80865b0396e5fb71db0a639
BLAKE2b-256 f8a93ca6c0f3289da348d25693adb4f80e3d8b2389dea603f222feae4dd78e76


File details

Details for the file pyspark_test-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: pyspark_test-0.2.0-py3-none-any.whl
  • Size: 7.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for pyspark_test-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 dd4fb03c4f438f718a870a9268a459f2f8924829c767302f5515202707c97709
MD5 e35c397e281ab7f8908b4ecfdfe2e73d
BLAKE2b-256 ec326d75e7d5171393ead86c2a6c0aba5b5cdff495286537732c3a1ad05575c1

