pytest plugin to run tests with support for pyspark.

Project description

pytest-spark
############

pytest_ plugin to run tests with support for pyspark (`Apache Spark`_).

This plugin allows you to specify the SPARK_HOME directory in ``pytest.ini``
and thus make "pyspark" importable in your tests which are executed
by pytest.

You can also define "spark_options" in ``pytest.ini`` to customize pyspark,
including the "spark.jars.packages" option, which allows you to load external
libraries (e.g. "com.databricks:spark-xml").

pytest-spark provides the session-scoped fixtures ``spark_context`` and
``spark_session``, which can be used in your tests.


Install
=======

.. code-block:: shell

    $ pip install pytest-spark


Usage
=====

Set Spark location
------------------

To run tests with required spark_home location you need to define it by
using one of the following methods:

1. Specify command line option "--spark_home"::

       $ pytest --spark_home=/opt/spark

2. Add "spark_home" value to ``pytest.ini`` in your project directory::

       [pytest]
       spark_home = /opt/spark

3. Set the "SPARK_HOME" environment variable.

pytest-spark will try to import ``pyspark`` from provided location.


.. note::
    "spark_home" will be read in the specified order, i.e. you can
    override the ``pytest.ini`` value with the command line option.


Customize spark_options
-----------------------

Just define "spark_options" in your ``pytest.ini``, e.g.::

    [pytest]
    spark_home = /opt/spark
    spark_options =
        spark.app.name: my-pytest-spark-tests
        spark.executor.instances: 1
        spark.jars.packages: com.databricks:spark-xml_2.12:0.5.0
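
With the extra package loaded via "spark.jars.packages", tests can use it
directly. A minimal sketch (the file path, the ``rowTag`` value and the column
name are illustrative assumptions; the ``spark_session`` fixture is described
below)::

    def test_read_xml(spark_session):
        # assumes spark-xml was pulled in through "spark.jars.packages" above
        # and that a sample XML file exists at this hypothetical path
        df = (spark_session.read.format("xml")
              .option("rowTag", "book")
              .load("tests/data/books.xml"))
        assert "title" in df.columns  # illustrative column name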


Using the ``spark_context`` fixture
-----------------------------------

Use the ``spark_context`` fixture in your tests as a regular pytest fixture.
A SparkContext instance will be created once and reused for the whole test
session.

Example::

    def test_my_case(spark_context):
        test_rdd = spark_context.parallelize([1, 2, 3, 4])
        # ...
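
For instance, a complete test might assert on the result of an RDD action
(a minimal sketch; the values are illustrative)::

    def test_rdd_sum(spark_context):
        rdd = spark_context.parallelize([1, 2, 3, 4])
        assert rdd.count() == 4
        assert rdd.sum() == 10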


Using the ``spark_session`` fixture (Spark 2.0 and above)
---------------------------------------------------------

Use the ``spark_session`` fixture in your tests as a regular pytest fixture.
A SparkSession instance with Hive support enabled will be created once and
reused for the whole test session.

Example::

    def test_spark_session_dataframe(spark_session):
        test_df = spark_session.createDataFrame([[1, 3], [2, 4]], "a: int, b: int")
        # ...
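
A complete test could then assert on the resulting DataFrame, e.g. (a minimal
sketch; the values are illustrative)::

    def test_dataframe_count(spark_session):
        test_df = spark_session.createDataFrame([[1, 3], [2, 4]], "a: int, b: int")
        assert test_df.count() == 2
        assert test_df.columns == ["a", "b"]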

.. _pytest: http://pytest.org/
.. _Apache Spark: https://spark.apache.org/



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pytest-spark-0.5.0.tar.gz (5.1 kB, Source)

Built Distribution

pytest_spark-0.5.0-py3-none-any.whl (6.5 kB, Python 3)

File details

Details for the file pytest-spark-0.5.0.tar.gz.

File metadata

  • Download URL: pytest-spark-0.5.0.tar.gz
  • Upload date:
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.8.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.7.3

File hashes

Hashes for pytest-spark-0.5.0.tar.gz:

  • SHA256: 44b02899c4f15cca2e23abd4dd2f7016fdbbada8bf28d9fe067028b3c1e0a7ff
  • MD5: f96eb13ceecbc65670355d469e1eb3f5
  • BLAKE2b-256: b45534651649e64bf5a4eb127bffc1c6d22e9d632f4f073d61cfb1ad43459ca5

See more details on using hashes here.

File details

Details for the file pytest_spark-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: pytest_spark-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 6.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.8.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.7.3

File hashes

Hashes for pytest_spark-0.5.0-py3-none-any.whl:

  • SHA256: 1410a5034afaa92f2a724a40b2fbc1091e7b6039421a8fa7242f01be4927d80f
  • MD5: 48b3e1991f497446f5912df70b351da4
  • BLAKE2b-256: 381e592492795de80bed9d0601293c8712425fdd7afaa8ba25417a5d14b02050

See more details on using hashes here.
