Extension to unittest for pySpark

unittest-pyspark

Extensions for testing pyspark with unittest and doctest.

These utils can be used in standalone Python or in Databricks notebooks.

Usage With Doctest

from unittest_pyspark import get_spark
spark = get_spark()

def go_spark():
    """
    >>> spark.sql("SELECT 'hello world'").show()
    +-----------+
    |hello world|
    +-----------+
    |hello world|
    +-----------+
    <BLANKLINE>
    >>> spark.createDataFrame([{'hello':'world'}], 'hello:string').show()
    +-----+
    |hello|
    +-----+
    |world|
    +-----+
    <BLANKLINE>
    """
    pass

import doctest
doctest.testmod()
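
The `<BLANKLINE>` markers above are needed because `show()` ends its table with an empty line, and doctest would otherwise read an empty line as the end of the expected output. The sketch below illustrates this without a Spark session. `show_table` and `run_docstring` are hypothetical helpers written for this illustration only; they are not part of unittest_pyspark or pyspark.

```python
import doctest


def show_table(header, rows):
    """Print a one-column ASCII table followed by an empty line.

    >>> show_table("hello", ["world"])
    +-----+
    |hello|
    +-----+
    |world|
    +-----+
    <BLANKLINE>
    """
    # Hypothetical stand-in that imitates the layout printed by show().
    width = max([len(header)] + [len(r) for r in rows])
    border = "+" + "-" * width + "+"
    print(border)
    print("|" + header.rjust(width) + "|")
    print(border)
    for r in rows:
        print("|" + r.rjust(width) + "|")
    print(border)
    print()  # the trailing empty line that <BLANKLINE> matches


def run_docstring(func):
    """Run the doctest examples in func's docstring and return the results."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False)
```

Calling `run_docstring(show_table)` returns a result whose `failed` count is 0 when the printed table matches the docstring. Removing the `<BLANKLINE>` line from the docstring makes the example fail.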

Usage With Unittest

Here is a simple unittest test case, which can be used as a template for PySpark test cases.

import unittest
from unittest_pyspark import as_list, get_spark
import pyspark.sql.types as pst

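class Test_Spark(unittest.TestCase):
    def setUp(self):
        # Obtain a Spark session before each test.
        self.spark = get_spark()

    def test_i_can_fly(self):
        # Build a one-row input DataFrame with columns a and b.
        rows = [pst.Row(a=1, b=2)]
        input_df = self.spark.createDataFrame(rows)

        expect = [{'a': 1}]

        # Keep only column a and convert the result to a list for comparison.
        actual_df = input_df.select("a")
        actual = as_list(actual_df)

        self.assertEqual(actual, expect)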

You can find this entire example in the tests.test_sample module. To execute it from the command line:

python -m unittest tests.test_sample
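
The test case above compares the result of `as_list` with a list of dicts. How `as_list` is implemented is not shown here. A plausible equivalent, assuming it returns one plain dict per row, could look like the sketch below. `rows_as_dicts` is a hypothetical name, not part of the package; it relies only on `DataFrame.collect()` and `Row.asDict()` from pyspark.

```python
def rows_as_dicts(df):
    """Collect a DataFrame and return its rows as a list of plain dicts.

    Assumption: this mirrors what as_list does; the real helper may differ.
    """
    # collect() brings all rows to the driver, so use this only on small
    # test DataFrames.
    return [row.asDict() for row in df.collect()]
```

Comparing plain dicts keeps assertion failures readable, because unittest prints the differing keys and values directly.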

Usage With Unittest and Databricks

To execute the unittest test cases in Databricks, add the following cell:

from unittest_pyspark.unittest import *
if __name__ == "__main__":
  execute_test_cases(discover_test_cases())

The code above automatically discovers all test cases (unittest.TestCase subclasses) defined in the global scope and executes them.
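
For readers curious how such discovery can work, here is a minimal sketch of the same idea using only the standard unittest module. It is not the package's implementation: `find_test_cases` and `run_test_cases` are hypothetical names chosen for this illustration, and the real `discover_test_cases` and `execute_test_cases` may behave differently.

```python
import unittest


def find_test_cases(namespace):
    """Return every unittest.TestCase subclass found in a namespace dict."""
    return [
        obj
        for obj in namespace.values()
        if isinstance(obj, type)
        and issubclass(obj, unittest.TestCase)
        and obj is not unittest.TestCase  # skip the base class itself
    ]


def run_test_cases(test_cases, stream=None):
    """Load the tests of all given classes into one suite and run it."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite(
        loader.loadTestsFromTestCase(case) for case in test_cases
    )
    # stream=None lets TextTestRunner write to sys.stderr, its default.
    runner = unittest.TextTestRunner(stream=stream, verbosity=2)
    return runner.run(suite)
```

In a notebook cell this would be called as `run_test_cases(find_test_cases(globals()))`. Running the suite through a runner, instead of `unittest.main()`, avoids the `sys.exit` call that would otherwise interrupt the notebook.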

Build package

You will need setuptools and wheel to build the package, and twine to upload it:

pip install --upgrade setuptools
pip install --upgrade wheel
pip install --upgrade twine

Build and upload:

python setup.py sdist bdist_wheel
python -m twine upload dist/*
