Dataproc client library for Spark Connect

# Dataproc Spark Connect Client

A wrapper around the Apache [Spark Connect](https://spark.apache.org/spark-connect/) client that adds functionality so applications can communicate with a remote Dataproc Spark session over the Spark Connect protocol without additional setup steps.

## Install

```sh
pip install dataproc_spark_connect
```

## Uninstall

```sh
pip uninstall dataproc_spark_connect
```

## Setup

This client requires permissions to manage [Dataproc Sessions and Session Templates](https://cloud.google.com/dataproc-serverless/docs/concepts/iam). If you are running the client outside of Google Cloud, you must set the following environment variables:
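
A script can fail fast when a required variable is missing. The sketch below is illustrative only: `missing_env_vars` is a hypothetical helper, and the names in `REQUIRED` are assumptions, not a list confirmed by this README. `GOOGLE_APPLICATION_CREDENTIALS` is the standard Application Default Credentials variable, and `GOOGLE_CLOUD_PROJECT` is a common convention.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Assumed variable names for illustration only; replace them with the
# variables your environment actually requires.
REQUIRED = ["GOOGLE_CLOUD_PROJECT", "GOOGLE_APPLICATION_CREDENTIALS"]
```

Calling `missing_env_vars(REQUIRED)` before building a session returns an empty list when everything is set, which makes it easy to raise a clear error otherwise.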

## Usage

  1. Install the latest version of Dataproc Python client and Dataproc Spark Connect modules:

     ```sh
     pip install google_cloud_dataproc dataproc_spark_connect --force-reinstall
     ```

  2. Add the required imports into your PySpark application or notebook and start a Spark session with the following code instead of using environment variables:

     ```python
     from google.cloud.dataproc_spark_connect import DataprocSparkSession
     from google.cloud.dataproc_v1 import Session

     # Describe the Dataproc session: network and runtime version.
     session_config = Session()
     session_config.environment_config.execution_config.subnetwork_uri = '<subnet>'
     session_config.runtime_config.version = '2.2'

     # Start the Spark session with this configuration.
     spark = DataprocSparkSession.builder.dataprocSessionConfig(session_config).getOrCreate()
     ```
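
Once a session exists, a quick check can confirm that it is reachable. The sketch below assumes the `spark` object from step 2 exposes the standard PySpark `range`, `count`, and `stop` calls, as a Spark Connect session normally does. `run_smoke_test` is a hypothetical helper name, not part of this library.

```python
def run_smoke_test(spark, n=5):
    """Count the rows of a small generated range, then stop the session.

    `spark` is assumed to behave like a PySpark session (an assumption,
    not something this README states explicitly).
    """
    try:
        # range(n) describes a DataFrame with rows 0..n-1; count() executes it.
        return spark.range(n).count() == n
    finally:
        # Stop the session once the check is done, even if the job failed.
        spark.stop()
```

Call `run_smoke_test(spark)` right after `getOrCreate()`. It returns `True` when the row count matches, and it always stops the session, so create a new one for real work.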

## Developing

For development instructions, see the [development guide](DEVELOPING.md).

## Contributing

We’d love to accept your patches and contributions to this project. There are just a few small guidelines you need to follow.

### Contributor License Agreement

Contributions to this project must be accompanied by a Contributor License Agreement. You (or your employer) retain the copyright to your contribution; this simply gives us permission to use and redistribute your contributions as part of the project. Head over to <https://cla.developers.google.com> to see your current agreements on file or to sign a new one.

You generally only need to submit a CLA once, so if you’ve already submitted one (even if it was for a different project), you probably don’t need to do it again.

### Code reviews

All submissions, including submissions by project members, require review. We use GitHub pull requests for this purpose. Consult [GitHub Help](https://help.github.com/articles/about-pull-requests/) for more information on using pull requests.
