Utility for testing PySpark code
mockrdd
A Python3 module for testing PySpark code.
The MockRDD class offers behavior similar to pyspark.RDD, with the following extra benefits:
- Extensive sanity checks to identify invalid inputs
- More meaningful error messages for debugging issues
- Straightforward to run within pdb
- Removes Spark dependencies from development and testing environments
- No Spark overhead when running through a large test suite
Simple example of using MockRDD in a test.
from mockrdd import MockRDD
def job(rdd):
    return rdd.map(lambda x: x * 2).filter(lambda x: x > 3)
assert job(MockRDD.empty()).collect() == []
assert job(MockRDD.of(1)).collect() == []
assert job(MockRDD.of(2)).collect() == [4]
Conventionally, you'd include a main method to hook the RDD up to production sources and sinks. The tests would also live in a separate file and use the unittest module to define test cases.
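As a rough sketch of that layout, the job and its tests might look like the following. It is shown as one listing for brevity; in practice the job and the test case would sit in separate files. The input and output paths, the application name, and the `main` function are hypothetical, and the small fallback class only exists so the sketch runs where mockrdd is not installed. The fallback mimics just the calls used above (`empty`, `of`, `map`, `filter`, `collect`) and is not the real MockRDD implementation.

```python
import unittest

try:
    from mockrdd import MockRDD
except ImportError:
    # Assumption: minimal stand-in used only when mockrdd is unavailable.
    # It covers only the methods this sketch calls and has none of the
    # sanity checks the real MockRDD provides.
    class MockRDD:
        def __init__(self, items):
            self._items = list(items)

        @classmethod
        def empty(cls):
            return cls([])

        @classmethod
        def of(cls, *items):
            return cls(items)

        def map(self, func):
            return MockRDD(func(x) for x in self._items)

        def filter(self, func):
            return MockRDD(x for x in self._items if func(x))

        def collect(self):
            return list(self._items)


def job(rdd):
    # Pure transformation: no Spark-specific setup, so it accepts any RDD-like object.
    return rdd.map(lambda x: x * 2).filter(lambda x: x > 3)


def main():
    # Hypothetical wiring to real sources and sinks. pyspark is imported
    # here so that the tests never need Spark installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="mockrdd-example")  # hypothetical app name
    try:
        numbers = sc.textFile("input/numbers.txt").map(int)  # hypothetical path
        job(numbers).saveAsTextFile("output/doubled")  # hypothetical path
    finally:
        sc.stop()


class JobTest(unittest.TestCase):
    def test_empty_input(self):
        self.assertEqual(job(MockRDD.empty()).collect(), [])

    def test_small_value_is_filtered_out(self):
        self.assertEqual(job(MockRDD.of(1)).collect(), [])

    def test_large_value_is_doubled(self):
        self.assertEqual(job(MockRDD.of(2)).collect(), [4])
```

The tests can be run with `python -m unittest` from the directory containing the file. Keeping `job` free of any SparkContext setup is what lets the same function run against both a real RDD and a MockRDD.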
See the docstring of mockrdd.MockRDD for more information.