PySpark validation & testing tooling

Project description

pyspark-test

PySpark validation & testing tooling.

Installation

pip install pyspark-val

Usage

assert_pyspark_df_equal(left_df, actual_df)

Additional Arguments

  • check_dtype : whether to compare the data types of the DataFrame columns. Defaults to True.
  • check_column_names : whether to compare column names. Defaults to False. Not required if data types are being checked, since that comparison already covers the schema.
  • check_columns_in_order : whether the columns must appear in the same order. Defaults to False.
  • order_by : column names by which both DataFrames are sorted before comparing. Defaults to None.
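
These options can be combined in a single call. A minimal sketch (df_expected and df_actual are placeholder DataFrames, not defined in the example below):

assert_pyspark_df_equal(
    df_expected,
    df_actual,
    check_dtype=False,            # skip the data-type comparison
    check_columns_in_order=True,  # require the columns to appear in the same order
    order_by=['col_a'],           # sort both DataFrames by col_a before comparing
)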

Example

import datetime

from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, DateType, StringType, DoubleType, LongType

from pyspark_test import assert_pyspark_df_equal

# Reuse an existing SparkContext if one is already running.
sc = SparkContext.getOrCreate()
spark_session = SparkSession(sc)

df_1 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

df_2 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

assert_pyspark_df_equal(df_1, df_2)
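
In a test suite the assertion can be used inside ordinary test functions. A minimal pytest-style sketch, reusing df_1 and df_2 from the example above; a mismatch is expected to surface as an AssertionError:

import pytest

def test_dataframes_match():
    # Passes silently when the two DataFrames are equal.
    assert_pyspark_df_equal(df_1, df_2, order_by=['col_a'])

def test_dataframes_differ():
    # df_3 differs from df_1 in col_d, so the assertion should fail.
    df_3 = df_2.withColumn('col_d', df_2.col_d + 1)
    with pytest.raises(AssertionError):
        assert_pyspark_df_equal(df_1, df_3)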

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyspark_val-0.1.1.tar.gz (4.3 kB)

Uploaded Source

Built Distribution

pyspark_val-0.1.1-py3-none-any.whl (5.0 kB)

Uploaded Python 3

File details

Details for the file pyspark_val-0.1.1.tar.gz.

File metadata

  • Download URL: pyspark_val-0.1.1.tar.gz
  • Upload date:
  • Size: 4.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.12.1

File hashes

Hashes for pyspark_val-0.1.1.tar.gz

  • SHA256: 21d26c0580a1f2fad1cd9380d1b2ae41ecff337be79d969435cad3a589272520
  • MD5: 581f92dc7e5a6c27c28da6fbbaa12caf
  • BLAKE2b-256: 1449d78e0cfc85296ee9e479d89164790dbb4607bae39e7fc0620bafbfb3b3fa

See more details on using hashes here.
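
To check a downloaded archive against the published SHA256 digest, a short hashlib snippet is one option (a sketch; the path is assumed to point at the locally downloaded sdist):

import hashlib

# Path to the locally downloaded sdist; adjust as needed.
path = 'pyspark_val-0.1.1.tar.gz'
expected_sha256 = '21d26c0580a1f2fad1cd9380d1b2ae41ecff337be79d969435cad3a589272520'

with open(path, 'rb') as f:
    digest = hashlib.sha256(f.read()).hexdigest()

# The computed digest should match the value published above.
assert digest == expected_sha256, 'SHA256 mismatch: the file may be corrupted'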

File details

Details for the file pyspark_val-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pyspark_val-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 5.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.12.1

File hashes

Hashes for pyspark_val-0.1.1-py3-none-any.whl

  • SHA256: 3cc243447d059ca987dcc078bed7096915bf47efde07e2ee5a44716bbdaee560
  • MD5: a5fdd71d3f8705eb556efbb5efbdcd65
  • BLAKE2b-256: 21b89a13f51dcb341a99d01262684bce7ed7893a720ac21f348b9768a6afaca9

See more details on using hashes here.
