Utility for testing PySpark code

Project description

mockrdd

A Python3 module for testing PySpark code.

The MockRDD class offers similar behavior to pyspark.RDD with the following extra benefits.

  • Extensive sanity checks to identify invalid inputs
  • More meaningful error messages for debugging issues
  • Straightforward to run within pdb
  • Removes Spark dependencies from development and testing environments
  • No Spark overhead when running through a large test suite

A simple example of using MockRDD in a test:

from mockrdd import MockRDD

def job(rdd):
    return rdd.map(lambda x: x*2).filter(lambda x: x>3)

assert job(MockRDD.empty()).collect() == [] 
assert job(MockRDD.of(1)).collect() == [] 
assert job(MockRDD.of(2)).collect() == [4] 

Conventionally, you'd include a main method that hooks the RDD up to production sources and sinks. The tests would live in a separate file and use the unittest module to define test cases.
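
The layout described above could look roughly like the following sketch. It is illustrative rather than the project's prescribed structure: the names main and JobTest, the application name, and the input and output paths are all assumptions. The try/except fallback is a hypothetical stand-in, not part of mockrdd, that covers only the calls used here so the sketch still runs where the package is not installed. Job and tests are shown in one block for brevity; in a real project the test class would sit in its own file.

```python
import unittest

try:
    from mockrdd import MockRDD
except ImportError:
    # Hypothetical minimal stand-in, NOT the real library. It implements only
    # the calls used below and none of MockRDD's sanity checks.
    class MockRDD:
        def __init__(self, items):
            self._items = list(items)

        @classmethod
        def empty(cls):
            return cls([])

        @classmethod
        def of(cls, *items):
            return cls(items)

        def map(self, f):
            return MockRDD(f(x) for x in self._items)

        def filter(self, f):
            return MockRDD(x for x in self._items if f(x))

        def collect(self):
            return list(self._items)


def job(rdd):
    # Pure transformation: works on a pyspark.RDD or a MockRDD alike.
    return rdd.map(lambda x: x * 2).filter(lambda x: x > 3)


def main():
    # Spark is imported lazily so the tests never need it installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="double-and-filter")  # assumed app name
    try:
        # Assumed paths; replace with the real source and sink.
        job(sc.textFile("input.txt").map(int)).saveAsTextFile("output")
    finally:
        sc.stop()


class JobTest(unittest.TestCase):
    def test_empty_input(self):
        self.assertEqual(job(MockRDD.empty()).collect(), [])

    def test_value_filtered_out(self):
        self.assertEqual(job(MockRDD.of(1)).collect(), [])

    def test_value_kept(self):
        self.assertEqual(job(MockRDD.of(2)).collect(), [4])
```

With the test class in its own file, running python -m unittest from the project directory should discover and run it. A production script would call main() under the usual if __name__ == "__main__" guard.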

See the docstring of mockrdd.MockRDD for more information.

Download files

Files for mockrdd, version 0.0.2

  • mockrdd-0.0.2-py3-none-any.whl (7.3 kB), wheel, Python version py3
  • mockrdd-0.0.2.tar.gz (7.1 kB), source distribution
