# Dataproc Spark Connect Client
A wrapper of the Apache [Spark Connect](https://spark.apache.org/spark-connect/) client with additional functionalities that allow applications to communicate with a remote Dataproc Spark cluster using the Spark Connect protocol without requiring additional steps.
## Install
```console
pip install dataproc_spark_connect
```
## Uninstall
```console
pip uninstall dataproc_spark_connect
```
## Setup

This client requires permissions to manage [Dataproc sessions and session templates](https://cloud.google.com/dataproc-serverless/docs/concepts/iam). If you are running the client outside of Google Cloud, you must set the following environment variables:
- `GOOGLE_CLOUD_PROJECT` - The Google Cloud project you use to run Spark workloads.
- `GOOGLE_CLOUD_REGION` - The Compute Engine [region](https://cloud.google.com/compute/docs/regions-zones#available) where you run the Spark workload.
- `GOOGLE_APPLICATION_CREDENTIALS` - Your [Application Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc).
- `DATAPROC_SPARK_CONNECT_SESSION_DEFAULT_CONFIG` (Optional) - The config location, such as `tests/integration/resources/session.textproto`.
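As a hedged sketch (this helper is not part of the library), a quick pre-flight check that the required variables are set before building a session:

```python
import os

# Variables the client needs when running outside Google Cloud.
# DATAPROC_SPARK_CONNECT_SESSION_DEFAULT_CONFIG is optional, so it is
# deliberately not listed here.
REQUIRED_ENV_VARS = [
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_REGION",
    "GOOGLE_APPLICATION_CREDENTIALS",
]


def missing_env_vars(required=REQUIRED_ENV_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]


if missing_env_vars():
    print(f"Set these environment variables first: {missing_env_vars()}")
```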
## Usage
Install the latest version of Dataproc Python client and Dataproc Spark Connect modules:
```console
pip install google_cloud_dataproc --force-reinstall
pip install dataproc_spark_connect --force-reinstall
```
Add the required import into your PySpark application or notebook:
```python
from google.cloud.dataproc_spark_connect import DataprocSparkSession
```
There are two ways to create a Spark session.
Start a Spark session using properties defined in `DATAPROC_SPARK_CONNECT_SESSION_DEFAULT_CONFIG`:
```python
spark = DataprocSparkSession.builder.getOrCreate()
```
Start a Spark session with the following code instead of using a config file:
```python
from google.cloud.dataproc_v1 import SparkConnectConfig
from google.cloud.dataproc_v1 import Session

dataproc_session_config = Session()
dataproc_session_config.spark_connect_session = SparkConnectConfig()
dataproc_session_config.environment_config.execution_config.subnetwork_uri = "<subnet>"
dataproc_session_config.runtime_config.version = "3.0"
spark = DataprocSparkSession.builder.dataprocSessionConfig(dataproc_session_config).getOrCreate()
```
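The two approaches can be tied together in a small wrapper; as a sketch (`build_spark` is a hypothetical helper, not part of the library, though `dataprocSessionConfig` and `getOrCreate` are the builder calls shown above):

```python
def build_spark(builder, dataproc_session_config=None):
    """Return a Spark session from a DataprocSparkSession builder.

    Uses an explicit Session config when one is given; otherwise falls back
    to the properties in DATAPROC_SPARK_CONNECT_SESSION_DEFAULT_CONFIG.
    """
    if dataproc_session_config is not None:
        builder = builder.dataprocSessionConfig(dataproc_session_config)
    return builder.getOrCreate()
```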
## Billing

As this client runs the Spark workload on Dataproc, your project will be billed per [Dataproc Serverless Pricing](https://cloud.google.com/dataproc-serverless/pricing). This applies even if you are running the client from a non-GCE instance.
## Contributing

### Building and Deploying the SDK
Install the requirements in a virtual environment:
```console
pip install -r requirements-dev.txt
```
Build the code:
```console
python setup.py sdist bdist_wheel
```
Copy the generated `.whl` file to Cloud Storage, using the version specified in the `setup.py` file:
```sh
VERSION=<version>
gsutil cp dist/dataproc_spark_connect-${VERSION}-py2.py3-none-any.whl gs://<your_bucket_name>
```
Download the new SDK on Vertex, then uninstall the old version and install the new one:
```sh
%%bash
export VERSION=<version>
gsutil cp gs://<your_bucket_name>/dataproc_spark_connect-${VERSION}-py2.py3-none-any.whl .
yes | pip uninstall dataproc_spark_connect
pip install dataproc_spark_connect-${VERSION}-py2.py3-none-any.whl
```
File details
Details for the file dataproc_spark_connect-0.6.0.tar.gz.
File metadata
- Download URL: dataproc_spark_connect-0.6.0.tar.gz
- Upload date:
- Size: 17.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `36da1b83ab0cd2781e5ab6c8eecffd33d52d77a7cba7afb6e22c86272e411efd` |
| MD5 | `b0bf48efd075150ebb3dc7321c44316c` |
| BLAKE2b-256 | `47e4920c57830255d3eced8a69ee8d37bbd203a388be9cc91f5aaccc29dda9da` |
File details
Details for the file dataproc_spark_connect-0.6.0-py2.py3-none-any.whl.
File metadata
- Download URL: dataproc_spark_connect-0.6.0-py2.py3-none-any.whl
- Upload date:
- Size: 21.5 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `87a750c0f339af1658e31d0369b248a6eb2eb2e5645ea0297826429cab7fa9cf` |
| MD5 | `9fd2c2b5462c5ec208c8c2cd46610ad5` |
| BLAKE2b-256 | `8a0da6ee16f773163032dcc4c6c2bcb121b1f281729ffb3bd04f78049a4a9dff` |