Databricks Connect Client

Project description

Databricks Connect is a Spark client library that lets you connect your favorite IDE (IntelliJ, Eclipse, PyCharm, and so on), notebook server (Zeppelin, Jupyter, RStudio), and other custom applications to Databricks clusters and run Spark code.

To get started, run databricks-connect configure after installation.

Overview

Databricks Connect allows you to write jobs using Spark native APIs and have them execute remotely on a Databricks cluster instead of in the local Spark session.

For example, when you run the DataFrame command spark.read.parquet(...).groupBy(...).agg(...).show() using Databricks Connect, the parsing and planning of the job run on your local machine. Then, the logical representation of the job is sent to the Spark server running in Databricks for execution in the cluster.
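The DataFrame command above can be sketched in Python as follows. This is a minimal illustration, not the library's prescribed usage: the function names build_session and show_totals, the Parquet path, and the column names are all hypothetical, and the sketch assumes that databricks-connect is installed and that databricks-connect configure has already been run.

```python
def build_session():
    """Return a SparkSession.

    Assumption: with Databricks Connect installed and configured, the
    session obtained this way sends work to the remote cluster.
    """
    # Imported lazily so this module can be loaded without pyspark present.
    from pyspark.sql import SparkSession

    return SparkSession.builder.getOrCreate()


def show_totals(spark, path, key_column, value_column):
    """Read a Parquet dataset, sum value_column per key_column, and show it.

    The chain mirrors spark.read.parquet(...).groupBy(...).agg(...).show().
    """
    df = spark.read.parquet(path)
    # The dict form of agg maps a column name to an aggregate function name.
    totals = df.groupBy(key_column).agg({value_column: "sum"})
    totals.show()
    return totals
```

A call such as show_totals(build_session(), "/data/events.parquet", "country", "amount") would then print one summed row per country, with the path and column names standing in for your own data.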

With Databricks Connect, you can:

  • Run large-scale Spark jobs from any Python, Java, Scala, or R application. Anywhere you can import pyspark, import org.apache.spark, or require(SparkR), you can now run Spark jobs directly from your application, without needing to install any IDE plugins or use Spark submission scripts.
  • Step through and debug code in your IDE even when working with a remote cluster.
  • Iterate quickly when developing libraries. You do not need to restart the cluster after changing Python or Java library dependencies in Databricks Connect, because client sessions are isolated from one another in the cluster.
  • Shut down idle clusters without losing work. Because the client session is decoupled from the cluster, it is unaffected by cluster restarts or upgrades, which would normally cause you to lose all the variables, RDDs, and DataFrame objects defined in a notebook.

LICENSE

Copyright (2018) Databricks, Inc.

This library (the "Software") may not be used except in connection with the Licensee's use of the Databricks Platform Services pursuant to an Agreement (defined below) between Licensee (defined below) and Databricks, Inc. ("Databricks"). This Software shall be deemed part of the "Subscription Services" under the Agreement, or if the Agreement does not define Subscription Services, then the term in such Agreement that refers to the applicable Databricks Platform Services (as defined below) shall be substituted herein for "Subscription Services." Licensee's use of the Software must comply at all times with any restrictions applicable to the Subscription Services, generally, and must be used in accordance with any applicable documentation. If you have not agreed to an Agreement or otherwise do not agree to these terms, you may not use the Software. This license terminates automatically upon the termination of the Agreement or Licensee's breach of these terms.

Agreement: the agreement between Databricks and Licensee governing the use of the Databricks Platform Services, which shall be, with respect to Databricks, the Databricks Terms of Service located at www.databricks.com/termsofservice, and with respect to Databricks Community Edition, the Community Edition Terms of Service located at www.databricks.com/ce-termsofuse, in each case unless Licensee has entered into a separate written agreement with Databricks governing the use of the applicable Databricks Platform Services.

Databricks Platform Services: the Databricks services or the Databricks Community Edition services, according to where the Software is used.

Licensee: the user of the Software, or, if the Software is being used on behalf of a company, the company.

Download files

Source Distribution

databricks-connect-10.4.60.tar.gz (305.2 MB)

Uploaded Source

File details

Details for the file databricks-connect-10.4.60.tar.gz.

File metadata

  • Download URL: databricks-connect-10.4.60.tar.gz
  • Upload date:
  • Size: 305.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.6.15

File hashes

Hashes for databricks-connect-10.4.60.tar.gz

  • SHA256: 0b62bb79d65e5ed1ffc87b05317efcbe8a3197f35edf7781c8ebd49c63951683
  • MD5: 62a482565d4ee874c615a9044485d30c
  • BLAKE2b-256: 79bfd91520d7216dfac7142dd75040ff02079a1c4204c461fb3f6b3a473719e8
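
One way to use the SHA256 digest listed above is to compare it with the digest of the file you downloaded. The sketch below uses only Python's standard hashlib module; the helper names sha256_of_file and matches_published_hash are hypothetical, and the local file path is whatever location you saved the archive to.

```python
import hashlib

# SHA256 digest published on this page for databricks-connect-10.4.60.tar.gz.
EXPECTED_SHA256 = "0b62bb79d65e5ed1ffc87b05317efcbe8a3197f35edf7781c8ebd49c63951683"


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling matches_published_hash("databricks-connect-10.4.60.tar.gz") returns True only when the local file is byte-for-byte identical to the published archive. Chunked reading is used because the archive is about 305 MB.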
