PySpark validation & testing tooling

Project description

pyspark-val

Code Style: Black License: MIT Unit Test PyPI version Downloads

PySpark validation & testing tooling.

Installation

pip install pyspark-val

Usage

assert_pyspark_df_equal(left_df, actual_df)

Additional Arguments

  • check_dtype : Compare the data types of the DataFrame columns. Default: True.
  • check_column_names : Compare column names. Default: False. Not required if data types are being checked.
  • check_columns_in_order : Require the columns to appear in the same order. Default: False.
  • order_by : Column names by which the DataFrames are sorted before comparing. Default: None.
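
How these flags interact can be pictured with a small pure-Python model. This is only a sketch and not the library's implementation: `assert_frames_equal`, the helper functions, and the representation of a frame as a `(schema, rows)` pair are hypothetical names introduced for illustration, and the order in which the checks are applied is an assumption.

```python
from typing import Any, List, Optional, Sequence, Tuple

# Hypothetical stand-in for a DataFrame: (schema, rows), where schema is a
# list of (column_name, dtype_name) pairs and each row is a tuple of values.
Schema = List[Tuple[str, str]]
Frame = Tuple[Schema, List[Tuple[Any, ...]]]


def _sort_columns(frame: Frame) -> Frame:
    """Return the frame with its columns reordered alphabetically by name."""
    schema, rows = frame
    order = sorted(range(len(schema)), key=lambda i: schema[i][0])
    return (
        [schema[i] for i in order],
        [tuple(row[i] for i in order) for row in rows],
    )


def _sort_rows(frame: Frame, order_by: Sequence[str]) -> Frame:
    """Return the frame with rows sorted by the given columns (None last)."""
    schema, rows = frame
    names = [name for name, _ in schema]
    idx = [names.index(col) for col in order_by]
    return schema, sorted(
        rows, key=lambda row: tuple((row[i] is None, row[i]) for i in idx)
    )


def assert_frames_equal(
    left: Frame,
    right: Frame,
    check_dtype: bool = True,
    check_column_names: bool = False,
    check_columns_in_order: bool = False,
    order_by: Optional[Sequence[str]] = None,
) -> None:
    """Illustrative model of the documented flags (assumed semantics)."""
    if not check_columns_in_order:
        # Column order is irrelevant, so normalise it on both sides.
        left, right = _sort_columns(left), _sort_columns(right)
    if check_dtype:
        # Comparing (name, type) pairs also covers the column names.
        assert left[0] == right[0], "schemas (names and types) differ"
    elif check_column_names:
        left_names = [name for name, _ in left[0]]
        right_names = [name for name, _ in right[0]]
        assert left_names == right_names, "column names differ"
    if order_by:
        left, right = _sort_rows(left, order_by), _sort_rows(right, order_by)
    assert left[1] == right[1], "row data differs"


left = ([("id", "bigint"), ("name", "string")], [(2, "b"), (1, "a")])
right = ([("name", "string"), ("id", "bigint")], [("a", 1), ("b", 2)])

# Passes: column order is ignored by default and rows are sorted by "id".
assert_frames_equal(left, right, order_by=["id"])
```

In this model, the same call without `order_by`, or with `check_columns_in_order=True`, fails with an `AssertionError`. That is why `order_by` is useful when the row order of a result is not deterministic.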

Example

import datetime

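from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.types import DateType, DoubleType, LongType, StringType, StructField, StructType

from pyspark_test import assert_pyspark_df_equal

# Illustrative local configuration; adjust the master and app name as needed.
conf = SparkConf().setMaster('local[1]').setAppName('pyspark-val-example')
sc = SparkContext.getOrCreate(conf=conf)
spark_session = SparkSession(sc)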

df_1 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

df_2 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

assert_pyspark_df_equal(df_1, df_2)

Download files

Source Distribution

pyspark_val-0.1.4.tar.gz (5.3 kB)

Uploaded Source

Built Distribution

pyspark_val-0.1.4-py3-none-any.whl (6.5 kB)

Uploaded Python 3

File details

Details for the file pyspark_val-0.1.4.tar.gz.

File metadata

  • Download URL: pyspark_val-0.1.4.tar.gz
  • Upload date:
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.12.1

File hashes

Hashes for pyspark_val-0.1.4.tar.gz
Algorithm    Hash digest
SHA256       babce0cd8d7f5ebe95cf232f60d5fce6d6c2dcf4149b671225561e17ece3558a
MD5          f08a641a42bc67907d0aa5d3b3bbc3d7
BLAKE2b-256  6e1ee6ba2b95f44f5f8a8956ffeb18e84d83893ebc54d25e7ae89c772e9f2cd7

File details

Details for the file pyspark_val-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: pyspark_val-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 6.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.12.1

File hashes

Hashes for pyspark_val-0.1.4-py3-none-any.whl
Algorithm    Hash digest
SHA256       a3940167cb7ed5a2f17de29cde6448bc254278c394c6018f9bb20ec0d8c0825e
MD5          669a33a348209f2fecc886998c335d5f
BLAKE2b-256  de8d07db24311ffe281afcfde3591c276a45d8f54189aa1f9f2d55f443876c47
