
# Dataproc Spark Connect Client

A wrapper of the Apache [Spark Connect](https://spark.apache.org/spark-connect/) client with additional functionality that lets applications communicate with a remote Dataproc Spark session over the Spark Connect protocol without requiring additional setup steps.

## Install

```sh
pip install dataproc_spark_connect
```

## Uninstall

```sh
pip uninstall dataproc_spark_connect
```

## Setup

This client requires permissions to manage [Dataproc Sessions and Session Templates](https://cloud.google.com/dataproc-serverless/docs/concepts/iam). If you are running the client outside of Google Cloud, you must set the environment variables that identify your Google Cloud project, region, and credentials.
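A minimal sketch of setting that configuration from Python before creating a session. The variable names shown (`GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_REGION`) are common Google Cloud conventions, not names confirmed by this package; check your environment's requirements:

```python
import os

# Point the client at a project and region via environment variables.
# These names are assumptions based on common Google Cloud conventions.
os.environ['GOOGLE_CLOUD_PROJECT'] = 'my-project'
os.environ['GOOGLE_CLOUD_REGION'] = 'us-central1'

print(os.environ['GOOGLE_CLOUD_PROJECT'], os.environ['GOOGLE_CLOUD_REGION'])
```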

## Usage

  1. Install the latest version of Dataproc Python client and Dataproc Spark Connect modules:

     ```sh
     pip install google_cloud_dataproc dataproc_spark_connect --force-reinstall
     ```

  2. Add the required imports into your PySpark application or notebook and start a Spark session with the following code instead of using environment variables:

     ```python
     from google.cloud.dataproc_spark_connect import DataprocSparkSession
     from google.cloud.dataproc_v1 import Session

     session_config = Session()
     session_config.environment_config.execution_config.subnetwork_uri = '<subnet>'
     session_config.runtime_config.version = '2.2'

     spark = DataprocSparkSession.builder.dataprocSessionConfig(session_config).getOrCreate()
     ```
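The `<subnet>` placeholder is a reference to the subnetwork the session should run in. Google Cloud resources are generally addressed by their full resource path; the format below is an assumption based on that convention, so verify it against your project's actual network configuration:

```python
# Sketch: assembling a full subnetwork URI from its parts. The resource-path
# format shown is an assumption based on common Google Cloud naming.
project, region, subnet = 'my-project', 'us-central1', 'default'
subnetwork_uri = f'projects/{project}/regions/{region}/subnetworks/{subnet}'
print(subnetwork_uri)
```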

### Using Spark SQL Magic Commands (Jupyter Notebooks)

The package supports the [sparksql-magic](https://github.com/cryeo/sparksql-magic) library for executing Spark SQL queries directly in Jupyter notebooks.

Installation: To use magic commands, install the required dependencies manually:

```sh
pip install dataproc-spark-connect
pip install IPython sparksql-magic
```

  1. Load the magic extension:

     ```python
     %load_ext sparksql_magic
     ```

  2. Configure default settings (optional):

     ```python
     %config SparkSql.limit=20
     ```

  3. Execute SQL queries:

     ```python
     %%sparksql
     SELECT * FROM your_table
     ```

  4. Advanced usage with options (cache the result, store it in `df`, and create a view):

     ```python
     %%sparksql --cache --view result_view df
     SELECT * FROM your_table WHERE condition = true
     ```

Available options:

  - `--cache` / `-c`: Cache the DataFrame
  - `--eager` / `-e`: Cache with eager loading
  - `--view VIEW` / `-v VIEW`: Create a temporary view
  - `--limit N` / `-l N`: Override the default row display limit
  - `variable_name`: Store the result in a variable

See [sparksql-magic](https://github.com/cryeo/sparksql-magic) for more examples.

Note: Magic commands are optional. If you only need basic `DataprocSparkSession` functionality without Jupyter magic support, install only the base package:

```sh
pip install dataproc-spark-connect
```

## Developing

For development instructions, see the [development guide](DEVELOPING.md).

## Contributing

We’d love to accept your patches and contributions to this project. There are just a few small guidelines you need to follow.

### Contributor License Agreement

Contributions to this project must be accompanied by a Contributor License Agreement. You (or your employer) retain the copyright to your contribution; this simply gives us permission to use and redistribute your contributions as part of the project. Head over to <https://cla.developers.google.com> to see your current agreements on file or to sign a new one.

You generally only need to submit a CLA once, so if you’ve already submitted one (even if it was for a different project), you probably don’t need to do it again.

### Code reviews

All submissions, including submissions by project members, require review. We use GitHub pull requests for this purpose. Consult [GitHub Help](https://help.github.com/articles/about-pull-requests/) for more information on using pull requests.

