Utility for testing PySpark code

Project description

mockrdd

A Python3 module for testing PySpark code.

The MockRDD class offers similar behavior to pyspark.RDD with the following extra benefits.

  • Extensive sanity checks to identify invalid inputs
  • More meaningful error messages for debugging issues
  • Straightforward to run within pdb
  • Removes Spark dependencies from development and testing environments
  • No Spark overhead when running through a large test suite

A simple example of using MockRDD in a test:

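from mockrdd import MockRDD

def job(rdd):
    # Double every element, then keep only the results greater than 3.
    return rdd.map(lambda x: x * 2).filter(lambda x: x > 3)

assert job(MockRDD.empty()).collect() == []    # no input, no output
assert job(MockRDD.of(1)).collect() == []      # 1 * 2 = 2 is filtered out
assert job(MockRDD.of(2)).collect() == [4]     # 2 * 2 = 4 passes the filter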

Conventionally, you'd include a main method that hooks the RDD up to production sources and sinks. The tests would live in a separate file and use the unittest module to define test cases.
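As a rough sketch of what such a test file might look like, the example below wraps the same three checks in a unittest.TestCase. The class name JobTest and the test method names are made up for illustration, and job is defined inline only to keep the example self-contained; in a real project it would be imported from the job's module. The fallback MockRDD class is a hypothetical stand-in that covers only empty, of, map, filter and collect, so the sketch still runs when mockrdd is not installed. It is not the library's implementation.

```python
import unittest

try:
    from mockrdd import MockRDD
except ImportError:
    # Hypothetical stand-in (an assumption, not the real library): it only
    # mimics the handful of methods used in this example.
    class MockRDD:
        def __init__(self, items):
            self._items = list(items)

        @classmethod
        def empty(cls):
            return cls([])

        @classmethod
        def of(cls, *items):
            return cls(items)

        def map(self, f):
            return MockRDD(f(x) for x in self._items)

        def filter(self, f):
            return MockRDD(x for x in self._items if f(x))

        def collect(self):
            return list(self._items)


def job(rdd):
    # Normally imported from the module that defines the Spark job.
    return rdd.map(lambda x: x * 2).filter(lambda x: x > 3)


class JobTest(unittest.TestCase):
    def test_empty_input(self):
        self.assertEqual(job(MockRDD.empty()).collect(), [])

    def test_small_values_are_filtered_out(self):
        # 1 * 2 = 2, which is not greater than 3.
        self.assertEqual(job(MockRDD.of(1)).collect(), [])

    def test_large_values_are_kept(self):
        # 2 * 2 = 4, which passes the filter.
        self.assertEqual(job(MockRDD.of(2)).collect(), [4])
```

Saved as, say, test_job.py (a hypothetical file name), this could be run with python -m unittest test_job.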

See the docstring of mockrdd.MockRDD for more information.

Download files

Source Distribution

mockrdd-0.0.2.tar.gz (7.1 kB)

Built Distribution

mockrdd-0.0.2-py3-none-any.whl (7.3 kB, Python 3)
