
Pyspark Proxy |Build Status| |PyPi|
===================================

**Under active development. Do not use in production.**

Seamlessly execute pyspark code on remote clusters.

How it works
------------

Pyspark Proxy is made up of a client and a server. The client mimics the
pyspark API, but when objects are created or methods are called, it sends a
request to the API server instead. The server then makes the corresponding
calls to the actual pyspark APIs.
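To make the client/server split concrete, here is a minimal in-process sketch of the proxy pattern. It is not Pyspark Proxy's actual protocol or code: the class names (``ProxyServer``, ``ProxyObject``, ``FakeDataFrame``) and the JSON message shape are hypothetical, a direct method call stands in for the HTTP request, and ``FakeDataFrame`` replaces a real pyspark ``DataFrame`` so the example runs without Spark.

```python
import json


class FakeDataFrame:
    """Hypothetical stand-in for a pyspark DataFrame, so the sketch runs without Spark."""

    def __init__(self, rows):
        self.rows = rows

    def count(self):
        return len(self.rows)

    def filter_gt(self, column, threshold):
        # Returns a new object, the way DataFrame transformations do.
        return FakeDataFrame([r for r in self.rows if r[column] > threshold])


class ProxyServer:
    """Server side: owns the real objects and executes the calls it receives."""

    def __init__(self):
        self.objects = {}
        self.next_id = 0

    def register(self, obj):
        obj_id = str(self.next_id)
        self.next_id += 1
        self.objects[obj_id] = obj
        return obj_id

    def handle(self, request_json):
        req = json.loads(request_json)
        target = self.objects[req["id"]]
        result = getattr(target, req["method"])(*req["args"], **req["kwargs"])
        if isinstance(result, (int, float, str, bool, list, dict, type(None))):
            return json.dumps({"value": result})
        # Non-primitive results stay on the server; the client only gets a reference.
        return json.dumps({"ref": self.register(result)})


class ProxyObject:
    """Client side: mimics the real object's API, but forwards every call to the server."""

    def __init__(self, server, obj_id):
        self._server = server  # a real client would hold a URL and send HTTP requests
        self._id = obj_id

    def __getattr__(self, name):
        # Called for any attribute not found normally, i.e. the proxied methods.
        def remote_call(*args, **kwargs):
            request = json.dumps(
                {"id": self._id, "method": name, "args": list(args), "kwargs": kwargs}
            )
            response = json.loads(self._server.handle(request))
            if "ref" in response:
                return ProxyObject(self._server, response["ref"])
            return response["value"]

        return remote_call


server = ProxyServer()
df = ProxyObject(server, server.register(FakeDataFrame([{"a": 1}, {"a": 2}, {"a": 3}])))
print(df.count())                    # 3
print(df.filter_gt("a", 1).count())  # 2
```

Plain values travel back to the client, while new objects (such as the filtered frame) stay on the server and come back as references, so chained calls keep working without shipping data to the client.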

What has been implemented
-------------------------

Currently only basic functionality of the ``SparkContext``,
``SQLContext`` and ``DataFrame`` classes has been implemented. See the
`tests`_ for more on what currently works.

Getting Started
---------------

Pyspark Proxy requires setting up a server on the machine where Spark is
located and installing the package locally, on the machine you want to
execute code from.

On Server
~~~~~~~~~

Install pyspark proxy via pip:

::

    pip install pysparkproxy

Start the server:

::

    pyspark-proxy-server start


The server listens on ``localhost:8765`` by default. Check the ``pyspark-proxy-server`` help for additional options.
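To confirm the server is up before running client code, you can try opening a TCP connection to its port. This is a generic socket check, not a Pyspark Proxy feature; the function name ``server_is_listening`` is hypothetical, and its defaults simply reuse the ``localhost:8765`` address mentioned above.

```python
import socket


def server_is_listening(host="localhost", port=8765, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: it only proves that *something* accepts connections on
    the port, not that it is pyspark-proxy-server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

Because the server binds to ``localhost`` by default, run ``server_is_listening()`` on the server machine itself; it should return ``True`` once ``pyspark-proxy-server start`` has succeeded.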

Locally
~~~~~~~

Install pyspark proxy via pip:

::

    pip install pysparkproxy

Now you can start a Spark context and run some DataFrame operations.

::

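    # Proxy client classes that mirror pyspark's SparkContext and SQLContext
    from pyspark_proxy import SparkContext
    from pyspark_proxy.sql import SQLContext

    sc = SparkContext(appName='pyspark_proxy_app')

    sc.setLogLevel('ERROR')

    sqlContext = SQLContext(sc)

    # Read a JSON file into a DataFrame and count its rows
    df = sqlContext.read.json('my.json')

    print(df.count())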

Save this as ``my_app.py`` and run it with the normal Python binary:
``python my_app.py``. The code works the same as it would if run via
``spark-submit``.
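If you want one script to run both through the proxy and directly with pyspark (for example under ``spark-submit``), one option is to choose the import at startup. This is a sketch under assumptions, not a feature of Pyspark Proxy: the environment variable ``USE_PYSPARK_PROXY`` and the helper names are hypothetical, and it assumes the proxy classes accept the same arguments as pyspark's, as in the example above.

```python
import importlib
import os


def spark_module_names(use_proxy=None):
    """Return the (core, sql) module names to import from.

    Hypothetical switch: USE_PYSPARK_PROXY=0 selects plain pyspark;
    any other value, or leaving it unset, selects the pyspark_proxy client.
    """
    if use_proxy is None:
        use_proxy = os.environ.get("USE_PYSPARK_PROXY", "1") != "0"
    base = "pyspark_proxy" if use_proxy else "pyspark"
    return base, base + ".sql"


def load_spark_classes(use_proxy=None):
    """Import and return (SparkContext, SQLContext) from the selected package."""
    core_name, sql_name = spark_module_names(use_proxy)
    core = importlib.import_module(core_name)
    sql = importlib.import_module(sql_name)
    return core.SparkContext, sql.SQLContext
```

With this in place, ``SparkContext, SQLContext = load_spark_classes()`` replaces the two import lines, and the rest of ``my_app.py`` stays unchanged.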

.. _tests: https://github.com/abronte/PysparkProxy/tree/master/tests
.. _example: https://github.com/abronte/PysparkProxy/blob/master/examples/pyspark_proxy_server.py

.. |Build Status| image:: https://travis-ci.org/abronte/PysparkProxy.svg?branch=master
   :target: https://travis-ci.org/abronte/PysparkProxy

.. |PyPi| image:: https://img.shields.io/pypi/v/pysparkproxy.svg
   :target: https://pypi.org/project/PysparkProxy/

