Check that the left and right Spark DataFrames are equal.

pyspark-test

This function compares two Spark DataFrames and reports any differences. It is mostly intended for use in unit tests. Additional parameters allow varying the strictness of the equality checks performed.

Installation

pip install pyspark-test

Usage

assert_pyspark_df_equal(left_df, actual_df)

Additional Arguments

  • check_dtype : Whether to compare the data types of the Spark DataFrames. Defaults to True.
  • check_column_names : Whether to compare column names. Defaults to False. Not required if data types are being checked.
  • check_columns_in_order : Whether the columns must appear in the same order. Defaults to False.
  • order_by : Column names by which both DataFrames are sorted before comparing. Defaults to None.
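
To illustrate why order_by can matter, here is a minimal plain-Python sketch. It does not use Spark and is not the library's implementation: lists of dicts stand in for DataFrames, the helper name rows_equal is hypothetical, and it assumes rows are compared position by position, which is what makes sorting both sides by the same columns useful.

```python
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def rows_equal(
    left: List[Row],
    right: List[Row],
    order_by: Optional[Sequence[str]] = None,
) -> bool:
    """Compare two row lists position by position (hypothetical helper)."""
    if order_by:
        # Sort both sides by the same columns so that row order in the
        # inputs no longer affects the positional comparison.
        def sort_key(row: Row) -> tuple:
            return tuple(row[name] for name in order_by)

        left = sorted(left, key=sort_key)
        right = sorted(right, key=sort_key)
    return left == right


# Same rows, different order.
left_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
right_rows = [{"id": 2, "name": "b"}, {"id": 1, "name": "a"}]

print(rows_equal(left_rows, right_rows))                   # False
print(rows_equal(left_rows, right_rows, order_by=["id"]))  # True
```

With real DataFrames, row order after a shuffle is generally not guaranteed, so passing order_by with columns that uniquely identify each row is one way to keep a comparison deterministic.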

Example

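import datetime

from pyspark.sql import SparkSession
from pyspark.sql.types import (
    DateType,
    DoubleType,
    LongType,
    StringType,
    StructField,
    StructType,
)

from pyspark_test import assert_pyspark_df_equal

# Assumption: a local SparkSession is sufficient for this example.
spark_session = SparkSession.builder.master('local[1]').getOrCreate()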

df_1 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

df_2 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

assert_pyspark_df_equal(df_1, df_2)

Download files

Source Distribution

pyspark_test-0.1.0.tar.gz (3.1 kB)

Uploaded Source

Built Distribution

pyspark_test-0.1.0-py3-none-any.whl (7.3 kB)

Uploaded Python 3

File details

Details for the file pyspark_test-0.1.0.tar.gz.

File metadata

  • Download URL: pyspark_test-0.1.0.tar.gz
  • Upload date:
  • Size: 3.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.9.0

File hashes

Hashes for pyspark_test-0.1.0.tar.gz
Algorithm Hash digest
SHA256 299d1507d780034d9955cdaa635a51babb8314fe9f86778950c4f27e4e8bba39
MD5 04665a2f3f7d62e0f80172bb2ae6ff88
BLAKE2b-256 254ed739a094bd55584fb7a04a6190e22a28b07d5f727ba80cc95b5d2acc397a

File details

Details for the file pyspark_test-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pyspark_test-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.9.0

File hashes

Hashes for pyspark_test-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 67b6a421d397aada301fa6622853f94c51dfe580357bc23b7b9e260a74d3481d
MD5 c5db3799d46148e412da46a7be7f6caa
BLAKE2b-256 b7c2baaa5a0972803296da2b34d0e8575149754ec0b933632808686d6d086400
