A reverse proxy server which allows secure connectivity to a Spark Connect server
spark-connect-proxy
Setup (to run locally)
Install Python package
You can install spark-connect-proxy from PyPI or from source.
Option 1 - from PyPI
# Create the virtual environment
python3 -m venv .venv
# Activate the virtual environment
. .venv/bin/activate
pip install spark-connect-proxy
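To confirm that the install landed in the active virtual environment, one option is to ask Python's standard importlib.metadata for the installed version. This is only a sketch: the helper name installed_version is illustrative and is not part of spark-connect-proxy.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("spark-connect-proxy")
    if found:
        print(f"spark-connect-proxy {found} is installed")
    else:
        print("spark-connect-proxy is not installed in this environment")
```

Run it with the same interpreter that the virtual environment activated, otherwise it reports on a different environment.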
Option 2 - from source - for development
git clone https://github.com/prmoore77/spark-connect-proxy
cd spark-connect-proxy
# Create the virtual environment
python3 -m venv .venv
# Activate the virtual environment
. .venv/bin/activate
# Upgrade pip, setuptools, and wheel
pip install --upgrade pip setuptools wheel
# Install Spark Connect Proxy - in editable mode with client and dev dependencies
pip install --editable ".[client,dev]"
Note
For the following commands - if you are running from source and using --editable mode (for development purposes) - you will need to set the PYTHONPATH environment variable as follows:
export PYTHONPATH=$(pwd)/src
Usage
This repo contains scripts to let you provision an AWS EMR Spark cluster with a secure Spark Connect Proxy server to allow you to securely and remotely connect to it.
The scripts use the AWS CLI to provision the EMR Spark cluster - so you will need to have the AWS CLI installed and configured with your AWS credentials.
You can create a file called .env in your local copy of the scripts directory with the following contents:
export AWS_ACCESS_KEY_ID="put value from AWS here"
export AWS_SECRET_ACCESS_KEY="put value from AWS here"
export AWS_SESSION_TOKEN="put value from AWS here"
export AWS_REGION="us-east-2"
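Before provisioning, it can help to check that the .env file no longer contains the placeholder text. The sketch below parses lines of the form export KEY="value" and reports which keys are missing or still set to the placeholder. The function names are hypothetical, and treating all four keys as required is an assumption based on the template above, not something the provisioning script is known to enforce.

```python
import re
from typing import Dict, List, Sequence

# Keys taken from the .env template; requiring all four is an assumption.
REQUIRED_KEYS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_REGION",
)
PLACEHOLDER = "put value from AWS here"

# Matches `export KEY=value` or `KEY=value`.
_ASSIGNMENT = re.compile(r"^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple shell-style assignments into a dictionary."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = _ASSIGNMENT.match(stripped)
        if match is None:
            continue  # ignore lines that are not assignments
        key, raw = match.group(1), match.group(2).strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "\"'":
            raw = raw[1:-1]
        values[key] = raw
    return values


def missing_keys(values: Dict[str, str],
                 required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """List required keys that are absent, empty, or still the placeholder."""
    return [k for k in required if not values.get(k) or values[k] == PLACEHOLDER]


if __name__ == "__main__":
    from pathlib import Path

    env_path = Path("scripts/.env")
    if env_path.exists():
        print("Missing or placeholder keys:", missing_keys(parse_env_text(env_path.read_text())))
    else:
        print(f"{env_path} not found")
```

The parser handles only plain assignments; it does not expand shell variables or multi-line values.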
To provision the EMR Spark cluster - run the following command from the root directory of this repo:
scripts/provision_emr_spark_cluster.sh
That will output several files:
- file: tls/ca.crt - the EMR Spark cluster generated TLS certificate - needed for your PySpark client to trust the Spark Connect Proxy server (because it is self-signed)
- file: scripts/output/instance_details.txt - shows the ssh command for connecting to the master node of the EMR Spark cluster
- file: scripts/output/spark_connect_proxy_details.log - shows how to run a PySpark Ibis client example - which connects securely from your local computer to the remote EMR Spark cluster. Example command:
spark-connect-proxy-ibis-client-example \
--host ec2-01-01-01-01.us-east-2.compute.amazonaws.com \
--port 50051 \
--use-tls \
--tls-roots tls/ca.crt \
--token honey.badger.dontcare
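For readers who want to connect from their own PySpark code rather than the example client, the host, port, TLS, and token settings map naturally onto a Spark Connect connection string. The sketch below only builds that string. The function name build_spark_connect_url is hypothetical, and it is an assumption that the proxy accepts the standard sc:// connection-string parameters (use_ssl, token); the example client may assemble its connection differently.

```python
from urllib.parse import quote


def build_spark_connect_url(host: str, port: int,
                            use_tls: bool = True, token: str = "") -> str:
    """Build a Spark Connect connection string of the form sc://host:port/;k=v."""
    params = []
    if use_tls:
        params.append("use_ssl=true")
    if token:
        # Percent-encode the token so ';' or '=' cannot break the parameter list.
        params.append("token=" + quote(token, safe=""))
    suffix = "".join(";" + param for param in params)
    return f"sc://{host}:{port}/{suffix}"


if __name__ == "__main__":
    print(build_spark_connect_url(
        host="ec2-01-01-01-01.us-east-2.compute.amazonaws.com",
        port=50051,
        use_tls=True,
        token="honey.badger.dontcare",
    ))
```

The resulting string can be passed to SparkSession.builder.remote(...) in PySpark 3.4 or later. Trusting the self-signed tls/ca.crt is a separate step that this sketch does not cover; in the example command it is supplied through --tls-roots.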
Handy development commands
Version management
Bump the version of the application - (you must have installed from source with the [dev] extras)
bumpver update --patch
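As a rough illustration of what a patch bump does to a version string, the sketch below increments the last component of a MAJOR.MINOR.PATCH version. It assumes that three-part numeric pattern; the pattern this project actually configures for bumpver is not shown here, and the real tool also rewrites the version in the configured files, which this helper (a hypothetical name, bump_patch) does not do.

```python
def bump_patch(version: str) -> str:
    """Increment the PATCH component of a MAJOR.MINOR.PATCH version string."""
    parts = version.split(".")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(part) for part in parts)
    return f"{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    print(bump_patch("1.2.3"))
```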