Interact with Google Sheets from Databricks
Project description
databricks-google-sheets
Interact with Google Sheets from Databricks.
Please follow these steps. Be sure to replace all {{PLACEHOLDERS}} with appropriate values:
- Create a spreadsheet in Google Sheets, and note the spreadsheet_id (You can get this from the URL)
- Go to console.cloud.google.com and create a new project (OR select an existing project)
- Within this project, create a service account [IAM & Admin --> Service Accounts --> Create], and call it
databricks-google-sheets
. You can skip the optional steps - Note down the email address associated with your new service account. Eg.
databricks-google-sheets@{{MY-AWESOME-PROJECT}}.iam.gserviceaccount.com
- Switch to the Keys tab and create a new key. Choose JSON as the Key type. This will generate a JSON file and download it locally
- On your local machine, use the Databricks CLI to create a new secret scope. Eg.
databricks secrets create-scope --scope databricks-google-sheets --profile DEFAULT
- Create a new secret within the newly created secret scope. Eg.
databricks secrets put --scope databricks-google-sheets --key "databricks-google-sheets@{{MY-AWESOME-PROJECT}}.iam.gserviceaccount.com" --binary-file {{PATH-TO-JSON-FILE-FROM-STEP-5}} --profile DEFAULT
- Share your Google Sheet with the user
databricks-google-sheets@{{MY-AWESOME-PROJECT}}.iam.gserviceaccount.com
, and give Editor privileges
Usage within Databricks Python notebook
from databricks_google_sheets import DatabricksGoogleSheets
dgs = DatabricksGoogleSheets(dbutils) # Pass in dbutils from the global scope of your Databricks runtime
# OPTION 1 - Create a Pandas dataframe from a Spark SQL query in Python
df = spark.sql("SELECT 'Hello from databricks-google-sheets' greeting;").toPandas()
# OPTION 2 - Create a dataframe directly in Pandas
import pandas as pd
df = pd.DataFrame({'greeting': ['Hello from databricks-google-sheets']})
# Write this dataframe to Google Sheets
dgs.df_to_sheets(df, spreadsheet_id='{{SPREADSHEET-ID-FROM-STEP-1}}', sheet_name='test',
dbsecret_scope='databricks-google-sheets', dbsecret_key='databricks-google-sheets@{{MY-AWESOME-PROJECT}}.iam.gserviceaccount.com')
# Read from Google Sheets into a new dataframe
newdf = dgs.sheets_to_df(spreadsheet_id='{{SPREADSHEET-ID-FROM-STEP-1}}', sheet_name='test',
dbsecret_scope='databricks-google-sheets', dbsecret_key='databricks-google-sheets@{{MY-AWESOME-PROJECT}}.iam.gserviceaccount.com')
# Compare the new dataframe to the original to ensure they are the same
assert df.equals(newdf) == True
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file databricks-google-sheets-0.8.0.tar.gz
.
File metadata
- Download URL: databricks-google-sheets-0.8.0.tar.gz
- Upload date:
- Size: 2.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | b1ab24dbb9ad623e9f15a028b9fd1b9199ee7a8d137c50e645c51fa21bc6ec28 |
|
MD5 | 3099659cb45e5ac3414397ac337ae83e |
|
BLAKE2b-256 | 8141b3f94ab1bc570908a42720798c7b300bb780f4a59c798a16f5f1dc0319ee |