A test framework for Python Spark application development

Project description

spark-tests

A test framework that defines several test doubles to facilitate Python Spark application development.

spark_tests.sql module

Defines the following test doubles:

  • FakeSparkSession
    • Stubs the sql(sql_query) method to only log the SQL queries, without sending them to the database for execution;
    • The table(table_name) and createDataFrame(data[, schema, samplingRatio, verifySchema]) methods delegate execution to the real SparkSession, but return a FakeDataFrame instead of a DataFrame;
    • table(table_name) is often overridden in a subclass to return a table from a fake test database.
  • FakeDataFrame
    • write returns a FakeDFWriter;
    • Other methods work just like those of a real DataFrame, but return FakeDataFrames instead of DataFrames.
  • FakeDFWriter
    • Stubs a DataFrameWriter to only log the Rows written, without actually writing them.
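
The logging-stub idea behind these doubles can be sketched in plain Python. This is only an illustration of the pattern, not the library's code: QueryLoggingSession, FixtureSession, logged_queries and purge_inactive_customers are hypothetical names, and plain lists of dicts stand in for DataFrames so the sketch runs without Spark.

```python
class QueryLoggingSession:
    """Hypothetical stand-in; not the library's FakeSparkSession."""

    def __init__(self):
        self.logged_queries = []

    def sql(self, sql_query):
        # Record the query text instead of sending it to a database.
        self.logged_queries.append(sql_query)


class FixtureSession(QueryLoggingSession):
    """Overrides table() to serve rows from an in-memory fake database."""

    def __init__(self, tables):
        super().__init__()
        self._tables = tables

    def table(self, table_name):
        # Assumption: a list of dicts plays the role of a DataFrame here.
        return self._tables[table_name]


def purge_inactive_customers(spark):
    """Example application code under test (hypothetical)."""
    inactive = [row["id"] for row in spark.table("customers") if not row["active"]]
    for customer_id in inactive:
        spark.sql(f"DELETE FROM customers WHERE id = {customer_id}")
    return inactive


session = FixtureSession({
    "customers": [
        {"id": 1, "active": True},
        {"id": 2, "active": False},
    ]
})
purged = purge_inactive_customers(session)
```

A test can then assert on session.logged_queries to check which statements the application would have issued, without any database being touched.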

spark_tests.delta module

Defines FakeDeltaTable, which stubs merge(source, condition) to only log the merge operation, without changing any data.
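
A minimal sketch of the same idea applied to merges, assuming nothing about the library's internals: MergeLoggingTable, logged_merges and upsert_customers are hypothetical names. The no-op whenMatchedUpdateAll, whenNotMatchedInsertAll and execute methods mirror the builder chain of the real Delta Lake API, but how FakeDeltaTable handles that chain is not stated here, so treat that part as an assumption.

```python
class MergeLoggingTable:
    """Hypothetical stand-in; not the library's FakeDeltaTable."""

    def __init__(self):
        self.logged_merges = []

    def merge(self, source, condition):
        # Record the requested merge; no data is read or changed.
        self.logged_merges.append({"source": source, "condition": condition})
        return self

    # No-op builder steps so that a chained call still runs (an assumption).
    def whenMatchedUpdateAll(self):
        return self

    def whenNotMatchedInsertAll(self):
        return self

    def execute(self):
        return None


def upsert_customers(target, updates):
    """Example application code under test (hypothetical)."""
    (
        target.merge(updates, "target.id = updates.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )


table = MergeLoggingTable()
# A string stands in for the source DataFrame to keep the sketch Spark-free.
upsert_customers(table, "updates_df")
```

The test then inspects table.logged_merges to confirm the source and condition the application passed.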

spark_tests.datetime module

Defines the following test doubles:

  • FakeDatetime
    • Stubs the now() method to always return a predefined datetime.
  • FakeDate
    • Stubs the today() method to always return a predefined date.
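
One common way to build such clock doubles is to subclass the standard datetime classes and override the class methods. The sketch below shows that approach under stated assumptions: FrozenDatetime, FrozenDate, the frozen attribute and build_partition_path are hypothetical names, and the library's own classes may be configured differently.

```python
import datetime


class FrozenDatetime(datetime.datetime):
    """Hypothetical stand-in; not the library's FakeDatetime."""

    frozen = datetime.datetime(2024, 1, 15, 9, 30, 0)

    @classmethod
    def now(cls, tz=None):
        # Always return the predefined value; tz is ignored in this sketch.
        return cls.frozen


class FrozenDate(datetime.date):
    """Hypothetical stand-in; not the library's FakeDate."""

    frozen = datetime.date(2024, 1, 15)

    @classmethod
    def today(cls):
        return cls.frozen


def build_partition_path(clock=datetime.datetime):
    """Example application code that takes its clock as a parameter."""
    return clock.now().strftime("year=%Y/month=%m/day=%d")


path = build_partition_path(FrozenDatetime)
```

Passing the clock in as a parameter keeps the application code deterministic under test; patching the datetime name in the module under test is another option.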

Download files

Source Distribution

pyspark-tests-1.3.5.tar.gz (6.2 kB)

Uploaded Source

Built Distribution

pyspark_tests-1.3.5-py3-none-any.whl (7.0 kB)

Uploaded Python 3
