
Seamlessly execute pyspark code on remote clusters


Pyspark Proxy |Build Status| |PyPi|

**Under active development. Do not use in production.**


How it works

Pyspark proxy is made up of a client and a server. The client mimics the
pyspark API, but when objects are created or called, a request is made to
the API server. The API server then forwards each call it receives to the
actual pyspark APIs.
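The forwarding idea can be illustrated with a minimal sketch. This is not the actual Pyspark Proxy implementation: ``FakeServer``, ``RemoteList`` and the JSON message shape are hypothetical names made up for this example, a plain Python ``list`` stands in for a real pyspark object, and the "network" is just an in-process function call.

```python
import json


class FakeServer:
    """Hypothetical stand-in for the API server: owns the real objects."""

    def __init__(self):
        self._objects = {}
        self._next_id = 0

    def handle(self, raw_request):
        request = json.loads(raw_request)
        if request["action"] == "create":
            # A plain list stands in for a real pyspark object here.
            obj_id = str(self._next_id)
            self._next_id += 1
            self._objects[obj_id] = list(request["args"])
            return json.dumps({"id": obj_id})
        # Otherwise treat it as a call: look up the real object and
        # invoke the requested method on it.
        target = self._objects[request["id"]]
        result = getattr(target, request["method"])(*request["args"])
        return json.dumps({"result": result})


class RemoteList:
    """Hypothetical client object that forwards method calls to the server."""

    def __init__(self, transport, *items):
        self._transport = transport
        self._id = self._send({"action": "create", "args": items})["id"]

    def _send(self, payload):
        # Serialize the request, hand it to the transport, decode the reply.
        return json.loads(self._transport(json.dumps(payload)))

    def __getattr__(self, name):
        # Only reached for names not defined on the proxy itself.
        if name.startswith("_"):
            raise AttributeError(name)

        def remote_call(*args):
            payload = {"action": "call", "id": self._id,
                       "method": name, "args": args}
            return self._send(payload)["result"]

        return remote_call


server = FakeServer()
numbers = RemoteList(server.handle, 3, 1, 2)
numbers.append(5)          # executed on the server-side list
print(numbers.count(5))    # prints 1
```

The client object holds only an identifier, never the data itself, so every method call becomes a request. A real deployment would replace ``server.handle`` with an HTTP call to the API server.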

What has been implemented

Currently only basic functionality of the ``SparkContext``,
``SQLContext`` and ``DataFrame`` classes has been implemented. See the
`tests`_ for what currently works.

Getting Started

Pyspark Proxy requires a server running where your Spark installation is
located. The package is then installed locally, wherever you want to
execute code from.

On Server

Install pyspark proxy via pip:

.. code:: bash

    pip install pysparkproxy

Start the server:

.. code:: bash

    pyspark-proxy-server start

The server listens on ``localhost:8765`` by default. Check the ``pyspark-proxy-server`` help for additional options.
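Before running client code, it can help to confirm that something is accepting connections on that address. The sketch below uses only the Python standard library. ``is_listening`` is a hypothetical helper, not part of Pyspark Proxy, and it only checks that the TCP port is open, not that the process behind it is actually the proxy server.

```python
import socket


def is_listening(host="localhost", port=8765, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError if the connection is refused
        # or the attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Defaults match the address stated above.
    print(is_listening())
```

If the server was started on a different host or port, pass those values explicitly.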


Locally

Install pyspark proxy via pip:

.. code:: bash

    pip install pysparkproxy

Now you can start a spark context and do some dataframe operations.


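.. code:: python

    from pyspark_proxy import SparkContext
    from pyspark_proxy.sql import SQLContext

    # Create a proxied Spark context; calls are forwarded to the server.
    sc = SparkContext(appName='pyspark_proxy_app')

    sqlContext = SQLContext(sc)

    # Read a JSON file into a DataFrame.
    df = sqlContext.read.json('my.json')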


Then run the script with the regular ``python`` binary. The code works
the same as if you ran it via ``spark-submit``.

.. _tests:
.. _example:

.. |Build Status| image::

.. |PyPi| image::


