
pytest plugin to run tests with support for pyspark (Apache Spark).

This plugin allows you to specify the SPARK_HOME directory in pytest.ini, making "pyspark" importable in the tests executed by pytest.

You can also define "spark_options" in pytest.ini to customize pyspark, including the "spark.jars.packages" option, which lets you load external libraries (e.g. "com.databricks:spark-xml").

pytest-spark provides the session-scoped fixtures spark_context and spark_session, which can be used in your tests.

Note: there is no need to define SPARK_HOME if you've installed pyspark using pip (e.g. pip install pyspark) - it should already be importable. In this case, simply don't define SPARK_HOME either in pytest (pytest.ini / --spark_home) or as an environment variable.

Install

$ pip install pytest-spark

Usage

Set Spark location

To run tests with the required spark_home location, define it using one of the following methods:

  1. Specify the command line option "--spark_home":

    $ pytest --spark_home=/opt/spark
  2. Add “spark_home” value to pytest.ini in your project directory:

    [pytest]
    spark_home = /opt/spark
  3. Set the “SPARK_HOME” environment variable.

pytest-spark will try to import pyspark from the provided location.

Customize spark_options

Just define “spark_options” in your pytest.ini, e.g.:

[pytest]
spark_home = /opt/spark
spark_options =
    spark.app.name: my-pytest-spark-tests
    spark.executor.instances: 1
    spark.jars.packages: com.databricks:spark-xml_2.12:0.5.0
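
As a quick sanity check, a test like the following (a minimal sketch; the option names and values assume the example pytest.ini above) can confirm the options were applied to the session:

def test_spark_options_applied(spark_session):
    # Values assume the example pytest.ini shown above.
    assert spark_session.conf.get("spark.app.name") == "my-pytest-spark-tests"
    # Spark reports config values as strings.
    assert spark_session.conf.get("spark.executor.instances") == "1"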

Using the spark_context fixture

Use the spark_context fixture in your tests like any regular pytest fixture. The SparkContext instance will be created once and reused for the whole test session.

Example:

def test_my_case(spark_context):
    test_rdd = spark_context.parallelize([1, 2, 3, 4])
    # ...
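
For instance, a fuller version of the test above (a minimal sketch) might assert on the RDD contents:

def test_parallelize_sum(spark_context):
    test_rdd = spark_context.parallelize([1, 2, 3, 4])
    # Actions like sum() and count() trigger the actual computation.
    assert test_rdd.sum() == 10
    assert test_rdd.count() == 4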

Using the spark_session fixture (Spark 2.0 and above)

Use the spark_session fixture in your tests like any regular pytest fixture. A SparkSession instance with Hive support enabled will be created once and reused for the whole test session.

Example:

def test_spark_session_dataframe(spark_session):
    test_df = spark_session.createDataFrame([[1, 3], [2, 4]], "a: int, b: int")
    # ...
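
A slightly fuller sketch (the column names and filter condition are illustrative) shows typical DataFrame assertions:

def test_dataframe_filter(spark_session):
    test_df = spark_session.createDataFrame([[1, 3], [2, 4]], "a: int, b: int")
    # Keep only rows where column "a" is greater than 1.
    filtered = test_df.filter(test_df.a > 1)
    assert filtered.count() == 1
    assert filtered.collect()[0].b == 4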

Overriding default parameters of the spark_session fixture

By default, spark_session is loaded with the following configuration:

{
    'spark.app.name': 'pytest-spark',
    'spark.default.parallelism': 1,
    'spark.dynamicAllocation.enabled': 'false',
    'spark.executor.cores': 1,
    'spark.executor.instances': 1,
    'spark.io.compression.codec': 'lz4',
    'spark.rdd.compress': 'false',
    'spark.sql.shuffle.partitions': 1,
    'spark.shuffle.compress': 'false',
    'spark.sql.catalogImplementation': 'hive',
}

You can override some of these parameters in your pytest.ini. For example, to remove Hive support from the Spark session:

[pytest]
spark_home = /opt/spark
spark_options =
    spark.sql.catalogImplementation: in-memory
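
To confirm the override took effect (a minimal sketch, assuming the pytest.ini above), check the catalog implementation from within a test:

def test_hive_support_disabled(spark_session):
    # With the override above, the session uses the in-memory catalog
    # instead of Hive.
    assert spark_session.conf.get("spark.sql.catalogImplementation") == "in-memory"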

Development

Tests

Run tests locally:

$ docker-compose up --build
