Interact with Google Sheets from Databricks
Project description
databricks-google-sheets
Interact with Google Sheets from Databricks.
Please follow these steps. Be sure to replace all {{PLACEHOLDERS}} with appropriate values:
- Create a spreadsheet in Google Sheets, and note the spreadsheet_id (You can get this from the URL)
- Go to console.cloud.google.com and create a new project (OR select an existing project)
- Within this project, create a service account [IAM & Admin --> Service Accounts --> Create], and call it
databricks-google-sheets
. You can skip the optional steps - Note down the email address associated with your new service account. Eg.
databricks-google-sheets@{{MY-AWESOME-PROJECT}}.iam.gserviceaccount.com
- Switch to the Keys tab and create a new key. Choose JSON as the Key type. This will generate a JSON file and download it locally
- On your local machine, use the Databricks CLI to create a new secret scope. Eg.
databricks secrets create-scope --scope databricks-google-sheets --profile DEFAULT
- Create a new secret within the newly created secret scope. Eg.
databricks secrets put --scope databricks-google-sheets --key "databricks-google-sheets@{{MY-AWESOME-PROJECT}}.iam.gserviceaccount.com" --binary-file {{PATH-TO-JSON-FILE-FROM-STEP-5}} --profile DEFAULT
- Share your Google Sheet with the user
databricks-google-sheets@{{MY-AWESOME-PROJECT}}.iam.gserviceaccount.com
, and give Editor privileges
Usage within Databricks Python notebook
from databricks_google_sheets import DatabricksGoogleSheets
dgs = DatabricksGoogleSheets(dbutils) # Pass in dbutils from the global scope of your Databricks runtime
# OPTION 1 - Create a Pandas dataframe from a Spark SQL query in Python
df = spark.sql("SELECT 'Hello from databricks-google-sheets' greeting;").toPandas()
# OPTION 2 - Create a dataframe directly in Pandas
import pandas as pd
df = pd.DataFrame({'greeting': ['Hello from databricks-google-sheets']})
# Write this dataframe to Google Sheets
dgs.df_to_sheets(df, spreadsheet_id='{{SPREADSHEET-ID-FROM-STEP-1}}', sheet_name='test',
dbsecret_scope='databricks-google-sheets', dbsecret_key='databricks-google-sheets@{{MY-AWESOME-PROJECT}}.iam.gserviceaccount.com')
# Read from Google Sheets into a new dataframe
newdf = dgs.sheets_to_df(spreadsheet_id='{{SPREADSHEET-ID-FROM-STEP-1}}', sheet_name='test',
dbsecret_scope='databricks-google-sheets', dbsecret_key='databricks-google-sheets@{{MY-AWESOME-PROJECT}}.iam.gserviceaccount.com')
# Compare the new dataframe to the original to ensure they are the same
assert df.equals(newdf) == True
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Close
Hashes for databricks-google-sheets-0.8.0.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | b1ab24dbb9ad623e9f15a028b9fd1b9199ee7a8d137c50e645c51fa21bc6ec28 |
|
MD5 | 3099659cb45e5ac3414397ac337ae83e |
|
BLAKE2b-256 | 8141b3f94ab1bc570908a42720798c7b300bb780f4a59c798a16f5f1dc0319ee |