gspread-pandas

A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.

author: Diego Fernandez

Overview

A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames. It enables you to easily pull data from Google spreadsheets into DataFrames as well as push data into spreadsheets from DataFrames. It leverages gspread in the backend for most of the heavy lifting, but it adds a lot of functionality for things specific to working with DataFrames, as well as some extra nice-to-have features.

The target audience is Data Analysts and Data Scientists, but it can also be used by Data Engineers or anyone trying to automate workflows with Google Sheets and Pandas.

Some key goals/features:

Be easy to use interactively, with good docstrings and auto-completion
Nicely handle headers and indexes (including multi-level headers and merged cells)
Run on Jupyter, headless server, and/or scripts
Allow storing different user credentials or using Service Accounts
Automatically handle token refreshes
Enable handling of frozen rows and columns
Enable filling in all merged cells when pulling data
Nicely handle large data sets and auto-retries
Enable creation of filters
Handle retries when exceeding 100 second user quota
When pushing DataFrames with MultiIndex columns, allow merging or flattening headers
Ability to handle Spreadsheet permissions
Ability to specify ValueInputOption and ValueRenderOption for specific columns

Installation / Usage

To install use pip:

$ pip install gspread-pandas

Or clone the repo:

$ git clone https://github.com/aiguofer/gspread-pandas.git
$ cd gspread-pandas
$ python setup.py install

Before using, you will need to download Google client credentials for your app.

Client Credentials

To allow a script to use the Google Drive API, we need to authenticate ourselves with Google. To do so, we need to create a project describing the tool and generate credentials. Open the Google console in your web browser and:

Choose Create Project in the popup menu at the top.
A dialog box appears, so give your project a name and click on the Create button.
On the left-side menu click on API Manager.
A table of available APIs is shown. Select Drive API and click on the Enable API button. Do the same for Sheets API. Other APIs can stay switched off for our purpose.
On the left-side menu click on Credentials.
In section OAuth consent screen select your email address and give your product a name. Then click on the Save button.
In section Credentials click on Add credentials and select OAuth client ID (if you want to use your own account or enable the use of multiple accounts) or Service account key (if you prefer to have a service account interacting with spreadsheets).
If you select OAuth client ID:
- Select Application type item as Desktop app and give it a name.
- Click on Create button.
- Click on Download JSON icon on the right side of created OAuth client IDs and store the downloaded file on your file system.
If you select Service account key:
- Click on Service account dropdown and select New service account
- Give it a Service account name and ignore the Role dropdown (unless you know you need this for something else, it’s not necessary for working with spreadsheets)
- Note the Service account ID as you might need to give that user permission to interact with your spreadsheets
- Leave Key type as JSON
- Click Create and store the downloaded file on your file system.
Please be aware that the file contains your private credentials, so take care of it the same way you would your private SSH key. Move the downloaded JSON to ~/.config/gspread_pandas/google_secret.json (or configure the directory and file name by calling gspread_pandas.conf.get_config directly).

Thanks to similar project df2gspread for this great description of how to get the client credentials.

You can read more about it in the configuration docs including how to change the default behavior.
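
The "move the downloaded JSON" step above can be scripted. The following is a minimal sketch using only the standard library, not part of gspread-pandas: the helper name install_secret and the constants are hypothetical, and the default location simply mirrors the path quoted above. It copies the file into place and restricts it to the owner, in the spirit of the SSH key advice.

```python
import os
import shutil
import stat
from pathlib import Path

# Default location quoted in the text above; the constant names are illustrative.
DEFAULT_CONF_DIR = Path.home() / ".config" / "gspread_pandas"
DEFAULT_FILE_NAME = "google_secret.json"


def install_secret(downloaded, conf_dir=DEFAULT_CONF_DIR, file_name=DEFAULT_FILE_NAME):
    """Copy a downloaded credentials JSON into conf_dir and return the new path.

    Hypothetical helper: gspread-pandas does not ship this function.
    """
    conf_dir = Path(conf_dir)
    conf_dir.mkdir(parents=True, exist_ok=True)
    target = conf_dir / file_name
    shutil.copyfile(downloaded, target)
    # Owner read/write only (0o600). This is effective on POSIX systems;
    # on Windows os.chmod only toggles the read-only flag.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target
```

For example, install_secret("client_secret_download.json") would place the file at the default path, where the file name passed in is whatever your browser saved.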

Example

import pandas as pd
from gspread_pandas import Spread, Client

file_name = "http://stats.idre.ucla.edu/stat/data/binary.csv"
df = pd.read_csv(file_name)

# 'Example Spreadsheet' needs to already exist and your user must have access to it
spread = Spread('Example Spreadsheet')
# This will ask to authenticate if you haven't done so before

# Display available worksheets
spread.sheets

# Save DataFrame to worksheet 'New Test Sheet', create it first if it doesn't exist
spread.df_to_sheet(df, index=False, sheet='New Test Sheet', start='A2', replace=True)
spread.update_cells('A1', 'B1', ['Created by:', spread.email])
print(spread)
# <gspread_pandas.client.Spread - User: '<example_user>@gmail.com', Spread: 'Example Spreadsheet', Sheet: 'New Test Sheet'>

# You can now first instantiate a Client separately and query folders and
# instantiate other Spread objects by passing in the Client
client = Client()
# Assuming you have a dir called 'example dir' with sheets in it
available_sheets = client.find_spreadsheet_files_in_folders('example dir')
spreads = []
for sheet in available_sheets.get('example dir', []):
    spreads.append(Spread(sheet['id'], client=client))

Troubleshooting

EOFError in Rodeo

If you’re trying to use gspread_pandas from within Rodeo you might get an "EOFError: EOF when reading a line" error when trying to pass in the verification code. The workaround for this is to first verify your account in a regular shell. Since you’re just doing this to get your OAuth token, the spreadsheet doesn’t need to be valid. Just run this in a shell:

python -c "from gspread_pandas import Spread; Spread('<user_key>','')"

Then follow the instructions to create and store the OAuth creds.

This action would increase the number of cells in the workbook above the limit of 10000000 cells.

IMO, Google Sheets is not the right tool for large datasets. However, there are probably good reasons you might have to use it in such cases. When uploading a large DataFrame, you might run into this error.

By default, Spread.df_to_sheet will add the rows and/or columns needed to accommodate the DataFrame. Since a new sheet contains a fairly large number of columns, if you’re uploading a DF with lots of rows you might exceed the max number of cells in a worksheet even if your data does not. To fix this you have two options:

The easiest is to pass replace=True, which will first resize the worksheet and clear out all values.
Another option is to first resize to 1x1 using Spread.sheet.resize(1, 1) and then do df_to_sheet

There’s a strange caveat with resizing, so going to 1x1 first is recommended (replace=True already does this). To read more see this issue
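
The arithmetic behind this error can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the function names are hypothetical, the 1000 x 26 starting grid is an assumed default for a new worksheet, the upload is assumed to start at A1 with one header row, the grid is assumed to grow but never shrink without a resize, and other worksheets in the workbook (which also count toward the limit) are ignored.

```python
CELL_LIMIT = 10_000_000  # limit quoted in the error message above


def grid_after_upload(sheet_rows, sheet_cols, df_rows, df_cols, header_rows=1):
    """Grid size if the sheet only grows to fit the data (assumed behavior)."""
    needed_rows = df_rows + header_rows
    return max(sheet_rows, needed_rows), max(sheet_cols, df_cols)


def exceeds_limit(rows, cols, limit=CELL_LIMIT):
    """True when a rows x cols grid holds more cells than the limit."""
    return rows * cols > limit


# A DataFrame with 500,000 rows and 10 columns (about 5 million cells of data).
df_rows, df_cols = 500_000, 10

# Uploading onto an assumed 1000 x 26 sheet keeps all 26 columns.
grown = grid_after_upload(1000, 26, df_rows, df_cols)

# Resizing to 1 x 1 first leaves only the columns the data needs.
resized = grid_after_upload(1, 1, df_rows, df_cols)

print(grown, exceeds_limit(*grown))      # (500001, 26) True
print(resized, exceeds_limit(*resized))  # (500001, 10) False
```

The data itself fits under the limit, but the 16 unused columns carried over from the original grid push the total to roughly 13 million cells, which is why shrinking the sheet first avoids the error.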

Release history

Version	Release date
3.3.0	Feb 13, 2024
3.2.3	Aug 31, 2023
3.2.2	Jun 29, 2022
3.2.1	Jun 29, 2022
3.2.0	Mar 20, 2022
3.0.4	Jan 18, 2022
3.0.3	Jan 5, 2022
3.0.2	Dec 29, 2021
3.0.0	Dec 29, 2021
2.3.1	Nov 30, 2021
2.3.0	Mar 22, 2021
2.2.4	Mar 22, 2021
2.2.3	Mar 26, 2020
2.2.2	Mar 21, 2020
2.2.1	Jan 6, 2020
2.2.0	Nov 18, 2019
2.1.4	Nov 18, 2019
2.1.3	Aug 25, 2019
2.1.2	Jul 11, 2019
2.1.1	Mar 22, 2021
2.1.0	Mar 22, 2021
2.0.0	Jun 14, 2019
1.3.1	May 18, 2019
1.3.0	Apr 30, 2019
1.2.2	Apr 16, 2019
1.2.1	Aug 30, 2018
1.1.3	Jul 8, 2018
1.1.2	Jun 23, 2018
1.1.1	Jun 13, 2018
1.1.0	Jun 2, 2018
1.0.5	Apr 14, 2018
1.0.4	Apr 8, 2018
1.0.3	Apr 2, 2018
1.0.2	Jun 14, 2019
1.0.1	Mar 26, 2018
1.0.0	Mar 26, 2018
0.16.4	Mar 27, 2018
0.16.3	Mar 27, 2018
0.16.2	Mar 26, 2018
0.16.1	Mar 24, 2018
0.16.0	Mar 27, 2018
0.15.6	Mar 12, 2018
0.15.5	Mar 12, 2018
0.15.4	Feb 13, 2018
0.15.3	Nov 21, 2017
0.15.2	Nov 18, 2017
0.15.1	Oct 5, 2017
0.15.0	Sep 11, 2017
0.14.3	Jun 22, 2017
0.14.2	Jun 19, 2017
0.14.1	Jun 5, 2017
0.14.0	May 25, 2017
0.13.0	Apr 28, 2017
0.12.1	Apr 25, 2017
0.12.0	Mar 31, 2017
0.11.2	Mar 22, 2017
0.11.1	Mar 22, 2017
0.11.0	Feb 15, 2017
0.10.1	Jan 26, 2017
0.10.0	Jan 18, 2017
0.9	Dec 8, 2016
0.8	Nov 11, 2016
0.7	Nov 11, 2016
0.6	Oct 27, 2016
0.5	Oct 19, 2016
0.4	Oct 19, 2016
0.3	Oct 19, 2016
0.2	Oct 12, 2016
0.1	Oct 12, 2016

Download files

Source Distribution

gspread-pandas-3.3.0.tar.gz (29.5 kB)

Uploaded Feb 13, 2024 Source

Built Distribution

gspread_pandas-3.3.0-py2.py3-none-any.whl (27.3 kB)

Uploaded Feb 13, 2024 Python 2, Python 3

File details

Details for the file gspread-pandas-3.3.0.tar.gz.

File metadata

Download URL: gspread-pandas-3.3.0.tar.gz
Upload date: Feb 13, 2024
Size: 29.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.12.1

File hashes

Hashes for gspread-pandas-3.3.0.tar.gz
Algorithm	Hash digest
SHA256	`aac84bd63594db6271ad2cfe10be64614ea5d1129d063ca57b40c2b9dcc18013`
MD5	`4d1972299a19b787f698dc5bd9260038`
BLAKE2b-256	`5e5c851abfb9adf4e70b232f3b6e0237e416776aae9257b59eaa0f44c0960084`
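
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal standard-library sketch; the helper names sha256_of and matches_published are hypothetical, and the file path is wherever you saved the download.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops at the empty bytes object returned at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == published_hex.strip().lower()
```

For the source distribution, matches_published("gspread-pandas-3.3.0.tar.gz", "aac84bd63594db6271ad2cfe10be64614ea5d1129d063ca57b40c2b9dcc18013") should return True for an intact download.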

File details

Details for the file gspread_pandas-3.3.0-py2.py3-none-any.whl.

File metadata

Download URL: gspread_pandas-3.3.0-py2.py3-none-any.whl
Upload date: Feb 13, 2024
Size: 27.3 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.12.1

File hashes

Hashes for gspread_pandas-3.3.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`a4dfddd7b1c5418742e30099a01766ecd96d1f5bbcad8b1e1060c4a5f16fd627`
MD5	`55a4abb250d3123f4527b36f63b47aef`
BLAKE2b-256	`55408839c83d13d31687e28d2eed21ae66f77889dcdbfcedc71ae6a0aa93863f`
