Pyspark Proxy |Build Status| |PyPi|
===================================
**Under active development. Do not use in production.**

Seamlessly execute pyspark code on remote clusters.
How it works
------------

Pyspark Proxy is made up of a client and a server. The client mimics the
pyspark API, but whenever an object is created or a method is called, a
request is sent to the API server. The API server then invokes the actual
pyspark APIs.
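The forwarding idea can be illustrated with a minimal, hypothetical sketch. The class and method names below (``RemoteObject``, ``FakeTransport``) are illustrative only and are not Pyspark Proxy's actual internals: a client-side stub intercepts each method call, serializes it, and hands it to a transport that would normally POST it to the API server.

::

    import json

    class FakeTransport:
        """Stands in for the HTTP client that would POST to the API server."""
        def __init__(self):
            self.requests = []

        def send(self, payload):
            # Record the serialized call; a real transport would return
            # the server's JSON response here.
            self.requests.append(json.loads(payload))
            return {"result": None}

    class RemoteObject:
        """Client-side stub: method calls are forwarded, not executed locally."""
        def __init__(self, name, transport):
            self._name = name
            self._transport = transport

        def __getattr__(self, method):
            def forward(*args, **kwargs):
                payload = json.dumps({
                    "object": self._name,
                    "method": method,
                    "args": list(args),
                    "kwargs": kwargs,
                })
                return self._transport.send(payload)
            return forward

    transport = FakeTransport()
    sc = RemoteObject("SparkContext", transport)
    sc.setLogLevel("ERROR")   # serialized and "sent" instead of run locally
    print(transport.requests[0]["method"])  # -> setLogLevel

The real client presumably returns proxies for server-side results as well, so chained calls like ``sqlContext.read.json(...)`` keep forwarding to the server.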
What has been implemented
-------------------------

Currently only basic functionality of the ``SparkContext``,
``SQLContext`` and ``DataFrame`` classes has been implemented. See the
`tests`_ for more on what currently works.
Getting Started
---------------

Pyspark Proxy requires setting up a server where your Spark installation
is located; you then simply install the package locally wherever you want
to execute code from.
On Server
~~~~~~~~~

Install Pyspark Proxy via pip:

::

    pip install pysparkproxy
Start the server:

::

    pyspark-proxy-server start

The server listens on ``localhost:8765`` by default. Check the
``pyspark-proxy-server`` help for additional options.
Locally
~~~~~~~

Install Pyspark Proxy via pip:

::

    pip install pysparkproxy
Now you can start a Spark context and do some dataframe operations.

::

    from pyspark_proxy import SparkContext
    from pyspark_proxy.sql import SQLContext

    sc = SparkContext(appName='pyspark_proxy_app')
    sc.setLogLevel('ERROR')

    sqlContext = SQLContext(sc)
    df = sqlContext.read.json('my.json')

    print(df.count())
Then run this with the normal python binary: ``python my_app.py``. The
code works the same as if you ran it via ``spark-submit``.
.. _tests: https://github.com/abronte/PysparkProxy/tree/master/tests
.. _example: https://github.com/abronte/PysparkProxy/blob/master/examples/pyspark_proxy_server.py
.. |Build Status| image:: https://travis-ci.org/abronte/PysparkProxy.svg?branch=master
:target: https://travis-ci.org/abronte/PysparkProxy
.. |PyPi| image:: https://img.shields.io/pypi/v/pysparkproxy.svg
:target: https://pypi.org/project/PysparkProxy/