pytest-spark
############
pytest_ plugin to run tests with support for pyspark (`Apache Spark`_).

This plugin allows you to specify the SPARK_HOME directory in ``pytest.ini``
and thus make "pyspark" importable in your tests, which are executed
by pytest.

You can also define "spark_options" in ``pytest.ini`` to customize pyspark,
including the "spark.jars.packages" option, which loads external
libraries (e.g. "com.databricks:spark-xml").

pytest-spark provides the session-scoped fixtures ``spark_context`` and
``spark_session``, which can be used in your tests.
Install
=======
.. code-block:: shell

    $ pip install pytest-spark
Usage
=====
Set Spark location
------------------
To run tests with the required spark_home location, define it using
one of the following methods:
1. Specify the command line option "--spark_home"::

       $ pytest --spark_home=/opt/spark

2. Add a "spark_home" value to ``pytest.ini`` in your project directory::

       [pytest]
       spark_home = /opt/spark

3. Set the "SPARK_HOME" environment variable.
pytest-spark will try to import ``pyspark`` from the provided location.
.. note::
    "spark_home" will be read in the order specified above, i.e. you can
    override the ``pytest.ini`` value with the command line option.
Customize spark_options
-----------------------
Just define "spark_options" in your ``pytest.ini``, e.g.::

    [pytest]
    spark_home = /opt/spark
    spark_options =
        spark.app.name: my-pytest-spark-tests
        spark.executor.instances: 1
        spark.jars.packages: com.databricks:spark-xml_2.12:0.5.0
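These options end up in the configuration of the Spark fixtures described
below, so a quick sanity check is to read one back from the session. A
minimal sketch, assuming the ``pytest.ini`` shown above (the test name and
asserted value are illustrative)::

    def test_spark_options_are_applied(spark_session):
        # "spark.app.name" was set via spark_options in pytest.ini above
        assert spark_session.conf.get("spark.app.name") == "my-pytest-spark-tests"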
Using the ``spark_context`` fixture
-----------------------------------
Use the ``spark_context`` fixture in your tests as a regular pytest fixture.
A SparkContext instance will be created once and reused for the whole test
session.
Example::

    def test_my_case(spark_context):
        test_rdd = spark_context.parallelize([1, 2, 3, 4])
        # ...
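A slightly fuller sketch with an assertion (the transformation and expected
values are illustrative, not part of the plugin)::

    def test_doubles_every_element(spark_context):
        test_rdd = spark_context.parallelize([1, 2, 3, 4])
        doubled = test_rdd.map(lambda x: x * 2).collect()
        assert doubled == [2, 4, 6, 8]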
Using the ``spark_session`` fixture (Spark 2.0 and above)
---------------------------------------------------------
Use the ``spark_session`` fixture in your tests as a regular pytest fixture.
A SparkSession instance with Hive support enabled will be created once and
reused for the whole test session.
Example::

    def test_spark_session_dataframe(spark_session):
        test_df = spark_session.createDataFrame([[1, 3], [2, 4]], "a: int, b: int")
        # ...
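Since the session is created with Hive support, you can also register a
DataFrame as a temporary view and query it with SQL. A minimal sketch (the
view name and query are illustrative)::

    def test_spark_session_sql(spark_session):
        test_df = spark_session.createDataFrame([[1, 3], [2, 4]], "a: int, b: int")
        test_df.createOrReplaceTempView("pairs")
        result = spark_session.sql("SELECT SUM(b) AS total FROM pairs").collect()
        assert result[0]["total"] == 7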
.. _pytest: http://pytest.org/
.. _Apache Spark: https://spark.apache.org/