A collection of the Apache Spark stub files
A collection of the Apache Spark stub files. These files were generated by stubgen and manually edited to include accurate type hints.
Tests and configuration files were originally contributed to the Typeshed project. Please refer to its contributor list and license for details.
Motivation
Static error detection (see SPARK-20631)
Improved completion for chained method calls.
Installation and usage
Please note that the guidelines for distributing type information are still a work in progress (PEP 561 - Distributing and Packaging Type Information). Currently, the installation script overlays existing Spark installations: the pyi stub files are copied next to their py counterparts in the PySpark installation directory. If this approach is not acceptable, you can add the stub files to the search path manually.
According to PEP 484:
Third-party stub packages can use any location for stub storage. Type checkers should search for them using PYTHONPATH.
Moreover:
A default fallback directory that is always checked is shared/typehints/python3.5/ (or 3.6, etc.)
Please check usage before proceeding.
The package is available on PyPI:
pip install pyspark-stubs
Depending on your environment, you might also need a type checker, such as Mypy or Pytype.
PyCharm - Works out of the box, though as of this writing (PyCharm 2018.2.4) the built-in type checker is somewhat limited compared to Mypy.
Jupyter Notebooks - It is possible to use magics to type check directly in the notebook.
Visual Studio Code - Works with the Mypy linter.
Environment independent - Just use your favorite checker directly, optionally combined with a tool like entr.
Version Compatibility
Package versions follow PySpark versions, with the exception of maintenance releases - i.e., pyspark-stubs==2.3.0 should be compatible with pyspark>=2.3.0,<2.4.0. Maintenance releases (post1, post2, …, postN) are reserved for internal annotation updates.
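The compatibility rule above can be expressed as a small helper. This is purely illustrative (the function is hypothetical, not part of the package); it derives the matching requirement specifier from an installed PySpark version:

```python
def compatible_stubs_spec(pyspark_version: str) -> str:
    """Return the pyspark-stubs requirement matching a PySpark release.

    Stub releases track PySpark minor versions, so any 2.3.x PySpark
    should use a 2.3.x stubs release (post-releases included).
    """
    major, minor = pyspark_version.split(".")[:2]
    return f"pyspark-stubs>={major}.{minor}.0,<{major}.{int(minor) + 1}.0"


print(compatible_stubs_spec("2.3.0"))  # -> pyspark-stubs>=2.3.0,<2.4.0
```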
API Coverage
| Module | Dynamically typed | Statically typed | Notes |
|---|---|---|---|
| pyspark | ✔ | ✘ | |
| pyspark.accumulators | ✘ | ✔ | |
| pyspark.broadcast | ✔ | ✔ | Mixed |
| pyspark.cloudpickle | ✘ | ✘ | Internal |
| pyspark.conf | ✘ | ✔ | |
| pyspark.context | ✘ | ✔ | |
| pyspark.daemon | ✘ | ✘ | Internal |
| pyspark.files | ✘ | ✔ | |
| pyspark.find_spark_home | ✘ | ✘ | Internal |
| pyspark.heapq3 | ✘ | ✘ | Internal |
| pyspark.java_gateway | ✘ | ✘ | Internal |
| pyspark.join | ✘ | ✔ | |
| pyspark.ml | ✔ | ✘ | |
| pyspark.ml.base | ✘ | ✔ | |
| pyspark.ml.classification | ✘ | ✔ | |
| pyspark.ml.clustering | ✘ | ✔ | |
| pyspark.ml.common | ✔ | ✔ | Mixed |
| pyspark.ml.evaluation | ✘ | ✔ | |
| pyspark.ml.feature | ✘ | ✔ | |
| pyspark.ml.fpm | ✘ | ✔ | |
| pyspark.ml.image | ✘ | ✔ | |
| pyspark.ml.linalg | ✘ | ✔ | |
| pyspark.ml.param | ✘ | ✔ | |
| pyspark.ml.param._shared_params_code_gen | ✘ | ✘ | Internal |
| pyspark.ml.param.shared | ✘ | ✔ | |
| pyspark.ml.pipeline | ✘ | ✔ | |
| pyspark.ml.recommendation | ✘ | ✔ | |
| pyspark.ml.regression | ✘ | ✔ | |
| pyspark.ml.stat | ✘ | ✔ | |
| pyspark.ml.tests | ✘ | ✘ | Tests |
| pyspark.ml.tuning | ✘ | ✔ | |
| pyspark.ml.util | ✘ | ✔ | |
| pyspark.ml.wrapper | ✔ | ✔ | Mixed |
| pyspark.mllib | ✔ | ✘ | |
| pyspark.mllib.classification | ✘ | ✔ | |
| pyspark.mllib.clustering | ✘ | ✔ | |
| pyspark.mllib.common | ✔ | ✘ | |
| pyspark.mllib.evaluation | ✘ | ✔ | |
| pyspark.mllib.feature | ✘ | ✔ | |
| pyspark.mllib.fpm | ✘ | ✔ | |
| pyspark.mllib.linalg | ✘ | ✔ | |
| pyspark.mllib.linalg.distributed | ✘ | ✔ | |
| pyspark.mllib.random | ✘ | ✔ | |
| pyspark.mllib.recommendation | ✘ | ✔ | |
| pyspark.mllib.regression | ✘ | ✔ | |
| pyspark.mllib.stat | ✘ | ✔ | |
| pyspark.mllib.stat.KernelDensity | ✘ | ✔ | |
| pyspark.mllib.stat._statistics | ✘ | ✔ | |
| pyspark.mllib.stat.distribution | ✘ | ✔ | |
| pyspark.mllib.stat.test | ✘ | ✔ | |
| pyspark.mllib.tests | ✘ | ✘ | Tests |
| pyspark.mllib.tree | ✘ | ✔ | |
| pyspark.mllib.util | ✘ | ✔ | |
| pyspark.profiler | ✘ | ✔ | |
| pyspark.resourceinformation | ✘ | ✔ | |
| pyspark.rdd | ✘ | ✔ | |
| pyspark.rddsampler | ✘ | ✔ | |
| pyspark.resultiterable | ✘ | ✔ | |
| pyspark.serializers | ✔ | ✘ | |
| pyspark.shell | ✘ | ✘ | Internal |
| pyspark.shuffle | ✘ | ✘ | Internal |
| pyspark.sql | ✔ | ✘ | |
| pyspark.sql.catalog | ✘ | ✔ | |
| pyspark.sql.column | ✘ | ✔ | |
| pyspark.sql.conf | ✘ | ✔ | |
| pyspark.sql.context | ✘ | ✔ | |
| pyspark.sql.dataframe | ✘ | ✔ | |
| pyspark.sql.functions | ✘ | ✔ | |
| pyspark.sql.group | ✘ | ✔ | |
| pyspark.sql.readwriter | ✘ | ✔ | |
| pyspark.sql.session | ✘ | ✔ | |
| pyspark.sql.streaming | ✘ | ✔ | |
| pyspark.sql.tests | ✘ | ✘ | Tests |
| pyspark.sql.types | ✘ | ✔ | |
| pyspark.sql.utils | ✔ | ✘ | |
| pyspark.sql.window | ✘ | ✔ | |
| pyspark.statcounter | ✘ | ✔ | |
| pyspark.status | ✘ | ✔ | |
| pyspark.storagelevel | ✘ | ✔ | |
| pyspark.streaming | ✔ | ✘ | |
| pyspark.streaming.context | ✘ | ✔ | |
| pyspark.streaming.dstream | ✘ | ✔ | |
| pyspark.streaming.kinesis | ✔ | ✘ | |
| pyspark.streaming.listener | ✔ | ✘ | |
| pyspark.streaming.tests | ✘ | ✘ | Tests |
| pyspark.streaming.util | ✔ | ✘ | |
| pyspark.taskcontext | ✘ | ✔ | |
| pyspark.tests | ✘ | ✘ | Tests |
| pyspark.traceback_utils | ✘ | ✘ | Internal |
| pyspark.util | ✔ | ✘ | |
| pyspark.version | ✘ | ✔ | |
| pyspark.worker | ✘ | ✘ | Internal |
Disclaimer
Apache Spark, Spark, PySpark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. This project is not owned, endorsed, or sponsored by The Apache Software Foundation.