PySpark Stubs
=============

|Build Status| |PyPI version|

A collection of the Apache Spark `stub
files <https://www.python.org/dev/peps/pep-0484/#stub-files>`__. These
files were generated by
`stubgen <https://github.com/python/mypy/blob/master/mypy/stubgen.py>`__
and manually edited to include accurate type hints.

Tests and configuration files were originally contributed to the
`Typeshed project <https://github.com/python/typeshed/>`__. Please refer
to its `contributors
list <https://github.com/python/typeshed/graphs/contributors>`__ and
`license <https://github.com/python/typeshed/blob/master/LICENSE>`__ for
details.

Motivation
----------

- Static error detection (see
  `SPARK-20631 <https://issues.apache.org/jira/browse/SPARK-20631>`__)

|SPARK-20631|

- Improved completion for chained method calls.

|Syntax completion|

Installation and usage
----------------------

Please note that the guidelines for distribution of type information are
still a work in progress (`PEP 561 - Distributing and Packaging Type
Information <https://www.python.org/dev/peps/pep-0561/>`__). Currently, the
installation script overlays existing Spark installations (``pyi`` stub
files are copied next to their ``py`` counterparts in the PySpark
installation directory). If this approach is not acceptable, you can add
stub files to the search path manually.

According to `PEP
484 <https://www.python.org/dev/peps/pep-0484/#storing-and-distributing-stub-files>`__:

    Third-party stub packages can use any location for stub storage.
    Type checkers should search for them using PYTHONPATH.

Moreover:

    A default fallback directory that is always checked is
    shared/typehints/python3.5/ (or 3.6, etc.)

Please check usage before proceeding.
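
To make the overlay layout described earlier in this section concrete, here is
a minimal sketch that reports which ``py`` modules in a directory have a ``pyi``
stub next to them. The helper name ``modules_with_stubs`` and the temporary demo
directory are hypothetical and are not part of this package.

.. code:: python

    import tempfile
    from pathlib import Path
    from typing import Dict


    def modules_with_stubs(package_dir: Path) -> Dict[str, bool]:
        """Map each .py file under package_dir to whether a sibling .pyi exists."""
        result = {}
        for module in sorted(package_dir.rglob("*.py")):
            relative = module.relative_to(package_dir).as_posix()
            result[relative] = module.with_suffix(".pyi").exists()
        return result


    # Demo on a throwaway directory that mimics an overlaid installation
    # (hypothetical file names, chosen only for illustration).
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "rdd.py").write_text("")
        (root / "rdd.pyi").write_text("")
        (root / "shell.py").write_text("")
        demo_report = modules_with_stubs(root)

    print(demo_report)

Pointing the same helper at a real PySpark installation directory should give a
quick view of which modules were overlaid.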

The package is available on PyPI:

.. code:: bash

    pip install pyspark-stubs


Depending on your environment you might also need a type checker, like `Mypy <https://github.com/python/mypy>`__
or `Pytype <https://github.com/google/pytype/>`__.

- `PyCharm <https://www.jetbrains.com/pycharm/>`__ - Works out-of-the-box, though as of today (PyCharm 2018.2.4) the built-in type checker is somewhat limited compared to Mypy.
- `Atom <https://atom.io/>`__ - Requires `atom-mypy <https://atom.io/packages/atom-mypy>`__ or equivalent.
- `Jupyter Notebooks <https://jupyter.org/>`__ - `It is possible <http://journalpanic.com/post/spice-up-thy-jupyter-notebooks-with-mypy/>`__ to use magics to type check directly in the notebook.
- Environment independent - Just use your favorite checker directly, optionally combined with a tool like `entr <http://www.entrproject.org/>`__.

Version Compatibility
---------------------

Package versions follow PySpark versions, with the exception of maintenance releases - i.e. ``pyspark-stubs==2.3.0`` should be compatible with ``pyspark>=2.3.0,<2.4.0``.
Maintenance releases (``post1``, ``post2``, ..., ``postN``) are reserved for internal annotation updates.
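
The rule above can be sketched as a small helper that derives the compatible
PySpark range from a stub package version. This is only an illustration: the
function name ``compatible_pyspark_range`` is hypothetical, and it assumes
version strings of the form ``X.Y.Z`` or ``X.Y.Z.postN``.

.. code:: python

    import re


    def compatible_pyspark_range(stubs_version: str) -> str:
        """Return the pyspark specifier implied by a pyspark-stubs version."""
        # Assumed format: major.minor.patch with an optional .postN suffix.
        match = re.match(r"^(\d+)\.(\d+)\.\d+(?:\.post\d+)?$", stubs_version)
        if match is None:
            raise ValueError(f"Unrecognized version: {stubs_version!r}")
        major, minor = int(match.group(1)), int(match.group(2))
        # Same major.minor line as the stubs, any maintenance release.
        return f"pyspark>={major}.{minor}.0,<{major}.{minor + 1}.0"


    print(compatible_pyspark_range("2.3.0"))        # pyspark>=2.3.0,<2.4.0
    print(compatible_pyspark_range("2.4.0.post2"))  # pyspark>=2.4.0,<2.5.0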

API Coverage
------------

+------------------------------------------------+---------------------+--------------------+------------+
| Module | Dynamically typed | Statically typed | Notes |
+================================================+=====================+====================+============+
| pyspark | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.accumulators | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.broadcast | ✔ | ✔ | Mixed |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.cloudpickle | ✘ | ✘ | Internal |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.conf | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.context | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.daemon | ✘ | ✘ | Internal |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.files | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.find\_spark\_home | ✘ | ✘ | Internal |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.heapq3 | ✘ | ✘ | Internal |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.java\_gateway | ✘ | ✘ | Internal |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.join | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.base | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.classification | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.clustering | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.common | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.evaluation | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.feature | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.fpm | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.image | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.linalg | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.param | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.param.\_shared\_params\_code\_gen | ✘ | ✘ | Internal |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.param.shared | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.pipeline | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.recommendation | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.regression | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.stat | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.tests | ✘ | ✘ | Tests |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.tuning | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.util | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.ml.wrapper | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.classification | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.clustering | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.common | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.evaluation | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.feature | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.fpm | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.linalg | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.linalg.distributed | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.random | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.recommendation | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.regression | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.stat | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.stat.KernelDensity | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.stat.\_statistics | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.stat.distribution | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.stat.test | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.tests | ✘ | ✘ | Tests |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.tree | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.mllib.util | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.profiler | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.rdd | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.rddsampler | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.resultiterable | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.serializers | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.shell | ✘ | ✘ | Internal |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.shuffle | ✘ | ✘ | Internal |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.sql | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.sql.catalog | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.sql.column | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.sql.conf | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.sql.context | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.sql.dataframe | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.sql.functions | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.sql.group | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.sql.readwriter | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.sql.session | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.sql.streaming | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.sql.tests | ✘ | ✘ | Tests |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.sql.types | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.sql.utils | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.sql.window | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.statcounter | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.status | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.storagelevel | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.streaming | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.streaming.context | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.streaming.dstream | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.streaming.flume | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.streaming.kafka | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.streaming.kinesis | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.streaming.listener | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.streaming.tests | ✘ | ✘ | Tests |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.streaming.util | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.taskcontext | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.tests | ✘ | ✘ | Tests |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.traceback\_utils | ✘ | ✘ | Internal |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.util | ✔ | ✘ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.version | ✘ | ✔ | |
+------------------------------------------------+---------------------+--------------------+------------+
| pyspark.worker | ✘ | ✘ | Internal |
+------------------------------------------------+---------------------+--------------------+------------+

Disclaimer
----------


Apache Spark, Spark, PySpark, Apache, and the Spark logo are `trademarks <https://www.apache.org/foundation/marks/>`__ of `The
Apache Software Foundation <http://www.apache.org/>`__. This project is not owned, endorsed, or
sponsored by The Apache Software Foundation.

.. |Build Status| image:: https://travis-ci.org/zero323/pyspark-stubs.svg?branch=master
   :target: https://travis-ci.org/zero323/pyspark-stubs
.. |PyPI version| image:: https://badge.fury.io/py/pyspark-stubs.svg
   :target: https://badge.fury.io/py/pyspark-stubs
.. |SPARK-20631| image:: https://i.imgur.com/GfDCGjv.gif
   :alt: SPARK-20631
.. |Syntax completion| image:: https://i.imgur.com/qvkLTAp.gif
   :alt: Syntax completion

