spark-connect-proxy

A reverse proxy server which allows secure connectivity to a Spark Connect server

Setup (to run locally)

Install Python package

You can install spark-connect-proxy from PyPI or from source.

Option 1 - from PyPI

# Create the virtual environment
python3 -m venv .venv

# Activate the virtual environment
. .venv/bin/activate

pip install spark-connect-proxy

Option 2 - from source - for development

git clone https://github.com/prmoore77/spark-connect-proxy

cd spark-connect-proxy

# Create the virtual environment
python3 -m venv .venv

# Activate the virtual environment
. .venv/bin/activate

# Upgrade pip, setuptools, and wheel
pip install --upgrade pip setuptools wheel

# Install Spark Connect Proxy - in editable mode with client and dev dependencies
pip install --editable ".[client,dev]"

Note

For the following commands - if you are running from source in --editable mode (for development purposes) - you will need to set the PYTHONPATH environment variable as follows:

export PYTHONPATH="$(pwd)/src"
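
To confirm that the PYTHONPATH setting took effect, a quick import check can help. The sketch below assumes the importable module is named spark_connect_proxy (the package name with underscores); that name is an assumption, so adjust it if the source tree uses a different one.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the module on the current sys.path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for dotted names whose parent package is missing, or bad names.
        return False


# Hypothetical usage - the module name is an assumption:
# print(is_importable("spark_connect_proxy"))
```

If the check returns False, verify that you ran the export command from the root directory of the repo.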

Usage

This repo contains scripts that provision an AWS EMR Spark cluster with a secure Spark Connect Proxy server, so that you can connect to the cluster securely from a remote machine.

The scripts use the AWS CLI to provision the EMR Spark cluster - so you will need to have the AWS CLI installed and configured with your AWS credentials.
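
Before provisioning, it can be useful to check that the AWS CLI is on your PATH and that your credentials resolve. The following is a minimal sketch, not part of this project; the function names are hypothetical. It relies on the standard `aws sts get-caller-identity` command.

```python
import json
import shutil
import subprocess
from typing import Optional


def find_aws_cli(executable: str = "aws") -> Optional[str]:
    """Return the full path of the AWS CLI, or None if it is not on PATH."""
    return shutil.which(executable)


def aws_identity(executable: str = "aws", timeout: int = 30) -> dict:
    """Ask AWS who the configured credentials belong to.

    Raises subprocess.CalledProcessError if the credentials are missing or invalid.
    """
    result = subprocess.run(
        [executable, "sts", "get-caller-identity", "--output", "json"],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return json.loads(result.stdout)


# Hypothetical usage:
# if find_aws_cli() is None:
#     raise SystemExit("AWS CLI not found - install it first")
# print(aws_identity()["Account"])
```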

You can create a file called .env in your local copy of the scripts directory with the following contents:

export AWS_ACCESS_KEY_ID="put value from AWS here"
export AWS_SECRET_ACCESS_KEY="put value from AWS here"
export AWS_SESSION_TOKEN="put value from AWS here"
export AWS_REGION="us-east-2"
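
A common mistake is to leave a placeholder value in the .env file. The sketch below is an illustration only, not part of this project: it parses text in the format shown above and reports which variables are missing or still set to the placeholder. Treating all four variables as required is an assumption (a session token usually applies only to temporary credentials), and the function names are hypothetical.

```python
import re

# Variable names and placeholder text taken from the .env example above.
REQUIRED_KEYS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_REGION",
)
PLACEHOLDER = "put value from AWS here"

# Matches lines such as: export NAME="value"   or   NAME=value
_ASSIGNMENT = re.compile(r"^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        match = _ASSIGNMENT.match(line)
        if match is None:
            continue
        key, value = match.group(1), match.group(2).strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still the placeholder."""
    return [k for k in REQUIRED_KEYS if values.get(k) in (None, "", PLACEHOLDER)]


# Hypothetical usage:
# from pathlib import Path
# print(missing_keys(parse_env_text(Path("scripts/.env").read_text())))
```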

To provision the EMR Spark cluster - run the following command from the root directory of this repo:

scripts/provision_emr_spark_cluster.sh

That will output several files:

  • file: tls/ca.crt - the TLS certificate generated by the EMR Spark cluster - your PySpark client needs it to trust the Spark Connect Proxy server (because the certificate is self-signed)
  • file: scripts/output/instance_details.txt - shows the ssh command for connecting to the master node of the EMR Spark cluster
  • file: scripts/output/spark_connect_proxy_details.log - shows how to run a PySpark Ibis client example, which connects securely from your local computer to the remote EMR Spark cluster. Example command:
spark-connect-proxy-ibis-client-example \
  --host ec2-01-01-01-01.us-east-2.compute.amazonaws.com \
  --port 50051 \
  --use-tls \
  --tls-roots tls/ca.crt \
  --token honey.badger.dontcare
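
If you prefer to connect from your own PySpark code rather than the bundled example, the same host, port, TLS, and token settings can be expressed as a Spark Connect connection string. The sketch below is an assumption-laden illustration, not the project's own client: it assumes the proxy accepts the token through Spark Connect's standard `token` URL parameter, and it assumes the self-signed CA can be supplied through gRPC's `GRPC_DEFAULT_SSL_ROOTS_FILE_PATH` environment variable. The helper names are hypothetical.

```python
import os
from typing import Optional
from urllib.parse import quote


def spark_connect_url(
    host: str, port: int, token: Optional[str] = None, use_tls: bool = True
) -> str:
    """Build a Spark Connect connection string of the form sc://host:port/;k=v;k=v."""
    params = []
    if use_tls:
        params.append("use_ssl=true")
    if token:
        # Percent-encode the token so characters like ';' or '/' cannot break the URL.
        params.append(f"token={quote(token, safe='')}")
    url = f"sc://{host}:{port}/"
    if params:
        url += ";" + ";".join(params)
    return url


def connect(url: str, tls_roots: Optional[str] = None):
    """Open a remote Spark session (requires pyspark with Spark Connect support)."""
    if tls_roots:
        # Assumption: gRPC reads its trusted root certificates from this variable.
        # It must be set before the first gRPC channel is created.
        os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = tls_roots
    from pyspark.sql import SparkSession  # imported lazily; not needed to build the URL

    return SparkSession.builder.remote(url).getOrCreate()


# Hypothetical usage with the values from the example command above:
# url = spark_connect_url(
#     "ec2-01-01-01-01.us-east-2.compute.amazonaws.com", 50051,
#     token="honey.badger.dontcare",
# )
# spark = connect(url, tls_roots="tls/ca.crt")
```

Building the URL in a separate function keeps the token handling testable without a running cluster.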

Handy development commands

Version management

Bump the version of the application - (you must have installed from source with the [dev] extras)
bumpver update --patch
