
Seamlessly execute pyspark code on remote clusters

Project description

Under active development. Do not use in production.


How it works

Pyspark Proxy is made up of a client and a server. The client mimics the pyspark API, but when objects are created or called, a request is sent to the API server. The server then translates those requests into calls against the actual pyspark APIs.
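For illustration, here is a minimal sketch of that pattern. The ProxyClient class, the /call endpoint and the JSON payload shape are hypothetical and do not come from the library; the real implementation may differ.

import requests

# Hypothetical client-side stub -- NOT pyspark-proxy's actual implementation.
class ProxyClient(object):
    def __init__(self, base_url='http://localhost:8765'):
        self.base_url = base_url

    def call(self, object_id, method, *args):
        # Serialize the call and post it to the server, which looks up the
        # real pyspark object under object_id and invokes the method on it.
        payload = {'object': object_id, 'method': method, 'args': list(args)}
        response = requests.post(self.base_url + '/call', json=payload)
        return response.json()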

What has been implemented

Currently only some basic functionality of the SparkContext, SQLContext and DataFrame classes has been implemented. See the tests for details on what currently works.

Getting Started

Pyspark Proxy requires setting up a server on the machine where Spark is located; then simply install the package locally wherever you want to execute code from.

On Server

Install pyspark proxy via pip:

pip install pysparkproxy

Start the server:

pyspark-proxy-server start

The server listens on localhost:8765 by default. Check the pyspark-proxy-server help for additional options.
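Before pointing a client at the server, you can sanity-check that something is listening on the default address. The snippet below uses only the Python standard library and assumes the default localhost:8765 address; it does not use any pyspark-proxy API.

import socket

# Check whether anything is listening on the proxy server's default address.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(2)
    if s.connect_ex(('localhost', 8765)) == 0:
        print('pyspark-proxy server is reachable on localhost:8765')
    else:
        print('nothing is listening on localhost:8765 -- is the server running?')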

Locally

Install pyspark proxy via pip:

pip install pysparkproxy

Now you can create a SparkContext and perform some DataFrame operations.

from pyspark_proxy import SparkContext
from pyspark_proxy.sql import SQLContext

# Create a proxied SparkContext; the actual context lives on the server.
sc = SparkContext(appName='pyspark_proxy_app')
sc.setLogLevel('ERROR')

# Proxied SQLContext, backed by the remote SparkContext.
sqlContext = SQLContext(sc)

# Read a JSON file into a DataFrame and count its rows.
df = sqlContext.read.json('my.json')
print(df.count())

Then run the script with the normal Python binary: python my_app.py. The code works the same as if you ran it via spark-submit.
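Because the proxied objects mimic the regular pyspark API, further DataFrame operations should look the same as in plain pyspark. The snippet below continues from the example above; the age and name columns are hypothetical, and whether each method shown is already proxied depends on the current coverage (see the tests).

# Continuing from the example above; 'age' and 'name' are hypothetical
# columns, and these calls assume the corresponding methods are proxied.
adults = df.filter(df['age'] > 21).select('name', 'age')
adults.show()
print(adults.count())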

