Databricks Connect Client
Databricks Connect is a Spark client library that lets you connect your favorite IDE (IntelliJ, Eclipse, PyCharm, and so on), notebook server (Zeppelin, Jupyter, RStudio), and other custom applications to Databricks clusters and run Spark code.
To get started, run databricks-connect configure after installation.
Databricks Connect allows you to write jobs using Spark native APIs and have them execute remotely on a Databricks cluster instead of in the local Spark session.
For example, when you run the DataFrame command spark.read.parquet(...).groupBy(...).agg(...).show() using Databricks Connect, the parsing and planning of the job runs on your local machine. Then, the logical representation of the job is sent to the Spark server running in Databricks for execution in the cluster.
With Databricks Connect, you can:
- Run large-scale Spark jobs from any Python, Java, Scala, or R application. Anywhere you can import pyspark, import org.apache.spark, or require(SparkR), you can now run Spark jobs directly from your application, without needing to install any IDE plugins or use Spark submission scripts.
- Step through and debug code in your IDE even when working with a remote cluster.
- Iterate quickly when developing libraries. You do not need to restart the cluster after changing Python or Java library dependencies in Databricks Connect, because each client session is isolated from each other in the cluster.
- Shut down idle clusters without losing work. Because the client session is decoupled from the cluster, it is unaffected by cluster restarts or upgrades, which would normally cause you to lose all the variables, RDDs, and DataFrame objects defined in a notebook.
Copyright (2018) Databricks, Inc.
This library (the “Software”) may not be used except in connection with the Licensee’s use of the Databricks Platform Services pursuant to an Agreement (defined below) between Licensee (defined below) and Databricks, Inc. (“Databricks”). This Software shall be deemed part of the “Subscription Services” under the Agreement, or if the Agreement does not define Subscription Services, then the term in such Agreement that refers to the applicable Databricks Platform Services (as defined below) shall be substituted herein for “Subscription Services.” Licensee’s use of the Software must comply at all times with any restrictions applicable to the Subscription Services, generally, and must be used in accordance with any applicable documentation. If you have not agreed to an Agreement or otherwise do not agree to these terms, you may not use the Software. This license terminates automatically upon the termination of the Agreement or Licensee’s breach of these terms.
Agreement: the agreement between Databricks and Licensee governing the use of the Databricks Platform Services, which shall be, with respect to Databricks, the Databricks Terms of Service located at www.databricks.com/termsofservice, and with respect to Databricks Community Edition, the Community Edition Terms of Service located at www.databricks.com/ce-termsofuse, in each case unless Licensee has entered into a separate written agreement with Databricks governing the use of the applicable Databricks Platform Services.
Databricks Platform Services: the Databricks services or the Databricks Community Edition services, according to where the Software is used.
Licensee: the user of the Software, or, if the Software is being used on behalf of a company, the company.