Access the Tables of your Google Sheets as Pandas Dataframes and write them to a database

These details have not been verified by PyPI

Project links

Project description

Google Sheets Tables extraction

Use Google Spreadsheet Tables (only Tables) as Pandas Dataframes or save them to a SQL database.

pip install gsheetstables

PyPi page: https://pypi.org/project/gsheetstables/

In case you want your distribution’s dependencies, here are what’s needed in Fedora:

dnf install -y \
    python3-unidecode \
    python3-dotmap \
    python3-pandas \
    python3-google-auth-oauthlib \
    python3-google-api-client \
    python3-jinja2 \
    python3-sqlalchemy

Command Line tool

The tool does one thing and does it well: Makes database tables of all the Google Sheets Tables (only Tables) found on the spreadsheet. On any database; tested with SQLite, MariaDB and PostgreSQL. Just make sure you have the correct SQLAlchemy driver installed. Simplest example with SQLite:

gsheetstables2db -s 1zYR...tT8

This will create the SQLite database on file tables.sqlite with all tables from GSheet 1zYR...tT8.

Execute some SQL queries after (or before, with --sql-pre) the tables were loaded/created:

gsheetstables2db -s 1zYR...tT8 \
    --table-prefix _raw_tables_ \
    --sql-split-char § \
    --sql-post "{% for table in tables %}create index if not exists idx_snapshot_{{table}} on _raw_tables_{{table}} (_gsheet_utc_timestamp) § create view if not exists {{table}} as select * from _raw_tables_{{table}} where _gsheet_utc_timestamp=(select max(_gsheet_utc_timestamp) from _raw_tables_{{table}}) § {% endfor %}"

Prepend “mysheet_” to all table names in DB, keep up to 6 snapshots of each table (after running it multiple times) and save a column with the row numbers that users see in GSpread:

gsheetstables2db -s 1zYR...tT8 \
    --table-prefix mysheet_ \
    --keep-snapshots 6 \
    --row-numbers

Write it to a MariaDB/MySQL database accessible through local socket:

pip install mysql-connector-python
gsheetstables2db -s 1zYR...tT8 --db mariadb://localhost/marketing_db

Run it regularly via `cron` or a `systemd` timer

I have the following in my personal ~/.config/systemd/user/gsheetstables.service:

[Unit]
Description=Sync tables from Investment Dashboard Google Sheet into MariaDB

[Service]
Type=oneshot
ExecStart=%h/.local/bin/gsheetstables2db \
    --sheet 1i…so \
    --db mariadb://localhost/my_db \
    --table-prefix raw__ \
    --keep-snapshots 100 \
    --sql-split-char § \
    --sql-post "\
        {% for table in tables %} \
            CREATE INDEX IF NOT EXISTS idx_snapshot_{{table}} \
            ON raw__{{table}} (_gsheet_utc_timestamp) § \
            DROP VIEW IF EXISTS {{table}} § \
            CREATE VIEW {{table}} AS \
                SELECT * FROM raw__{{table}} \
                WHERE _gsheet_utc_timestamp=( \
                    SELECT max(_gsheet_utc_timestamp) FROM raw__{{table}} \
                ) § \
        {% endfor %} \
    " \
    --rename ' \
        { \
            "table_1": { \
                "Original column name": "new_name1", \
                "Other original column name/tag": "new_name2" \
            }, \
            "table_2": { \
                "Once more an original column name": "another_new_name1", \
                "And again another original/nice column name": "new_name2" \
            } \
        } \
    ' \
    --service-account gsheets-access@my-app.iam.gserviceaccount.com \
    --service-account-private-key "MII…F4c="

And my personal ~/.config/systemd/user/gsheetstables.timer contains:

[Unit]
Description=Run Google Sheets sync frequently

[Timer]
OnCalendar=*:0/20

[Install]
WantedBy=timers.target

Every 20 minutes, it will try to sync and update my DB tables with the Google Sheets Tables. Database tables will only be updated if its correspondent GSheets Table has been modified. I’m connecting to a local MariaDB server configured to accept authenticated connections via UDP socket (mariadb://localhost/…), which eliminates the need for passwords.

This single command contains everything thats is needed for a successful run, no external config file is needed. The …very long private key… is computed and displayed to you when you run gsheetstables2db -i SERVICE_ACCOUNT_FILE -vv. Grab that long string, use it in the command line, along with --service-account, and then you can discard the service account JSON file.

This command writes and logs up to 100 versions the my GSheets Tables into my DB with prefix raw__ and then create views without that prefix with only the last snapshot of that table. Data consumers should use the view, not the raw table. The raw tables contains the time-tagged history of changes to each table.

Test it before scheduling it:

systemctl --user daemon-reload;
systemctl --user start gsheetstables.service;
systemctl --user status gsheetstables.service

When you are happy with results in your DB, enable the scheduler:

systemctl --user enable --now gsheetstables.timer

Table Foreign Keys

Here is an example with foreign keys. They will be defined for each table after the ETL, and, to avoid problems during data copy, the --sql-pre script removes temporarily the FKs and unique indices.

gsheetstables2db \
    --db postgresql+psycopg:///business_intelligence \
    --sql-pre "
        {%
            set foreign_keys = dict(
                sales = dict(
                    id_client = 'client(id)',
                    id_product = 'product(id)'
                ),
                client_addresses = dict(
                    id_client = 'client(id)'
                ),
            )
        %}

        {% for table, fks in foreign_keys.items() %}
            {% for fk, target in fks.items() %}
                DO $$$$
                BEGIN
					ALTER TABLE IF EXISTS {{table}}
					DROP CONSTRAINT IF EXISTS fk_table_{{table}}_column_{{fk}};
                END $$$$; §
            {% endfor %}
        {% endfor %}

		DROP INDEX IF EXISTS idx_product_id;
		DROP INDEX IF EXISTS idx_client_id;
    " \
    --sql-post "
        CREATE UNIQUE INDEX IF NOT EXISTS idx_client_id  ON client  (id); §
        CREATE UNIQUE INDEX IF NOT EXISTS idx_product_id ON product (id); §

        {%
            set foreign_keys = dict(
                sales = dict(
                    id_client = 'client(id)',
                    id_product = 'product(id)'
                ),
                client_addresses = dict(
                    id_client = 'client(id)'
                ),
            )
        %}

        {% for table, fks in foreign_keys.items() %}
            {% for fk, target in fks.items() %}
                DO $$$$
                BEGIN
                    IF NOT EXISTS (
                        SELECT 1
                        FROM information_schema.table_constraints
                        WHERE
                            table_schema = 'public' AND
                            table_name = '{{table}}' AND
                            constraint_name = 'fk_table_{{table}}_column_{{fk}}'
                    ) THEN
                        ALTER TABLE {{table}}
                        ADD CONSTRAINT fk_table_{{table}}_column_{{fk}}
                        FOREIGN KEY ({{fk}})
                        REFERENCES {{target}}
                        ON DELETE CASCADE;
                    END IF;
                END $$$$; §
            {% endfor %}
        {% endfor %}
    " \
    --sql-split-char § \
    --sheet 1i…so \
    --service-account gsheets-access@my-app.iam.gserviceaccount.com \
    --service-account-private-key "MII…F4c="

This Jinja template:

Defines a map of tables and their columns that are foreign keys, and which table and column it references.
Create a unique index in master tables (client, product) -- a requirement to be a foreign key reference.
Iterate over the map and defines each foreign key, but only if it was not defined yet.

This is the way of doing it idempotently on PostgreSQL. Other DBs might have simpler ways with ALTER TABLE ... ADD CONSTRAINT IF NOT EXISTS ....

SQLAlchemy Drivers Reference

Here are SQLAlchemy URL examples along with drivers required for connectors (table provided by ChatGPT and then edited a bit):

Database	Example SQLAlchemy URL	Driver / Package to install	Notes
MariaDB via local socket	`mariadb://localhost/sales_db`	`dnf install python3-mysqlclient` or `pip install mysqlclient`	Unix user must match a MariaDB user configured with unix_socket
MariaDB with regular user	`mariadb://dbuser:dbpass@mariadb.example.com:3306/sales_db`	`dnf install python3-mysqlclient` or `pip install mysqlclient`	Native MariaDB driver
MariaDB (alt)	`mysql+pymysql://dbuser:dbpass@mariadb.example.com:3306/sales_db?charset=utf8mb4`	`pip install pymysql`	Pure Python
PostgreSQL via local socket	`postgresql+psycopg:///analytics_db` (note the 3 slashes)	`dnf install python3-psycopg3 python3-sqlalchemy+postgresql` or `pip install psycopg[binary]`	Recommended
PostgreSQL	`postgresql+psycopg://dbuser:dbpass@postgres.example.com:5432/analytics_db`	`dnf install python3-psycopg3 python3-sqlalchemy+postgresql` or `pip install psycopg[binary]`	Recommended
PostgreSQL (legacy)	`postgresql+psycopg2://dbuser:dbpass@postgres.example.com:5432/analytics_db`	`pip install psycopg2-binary`	Legacy
Oracle	`oracle+oracledb://dbuser:dbpass@oracle.example.com:1521/?service_name=ORCLPDB1`	`pip install oracledb`	Thin mode (no Oracle Client)
AWS Athena	`awsathena+rest://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@athena.us-east-1.amazonaws.com:443/my_schema?s3_staging_dir=s3://my-athena-results/&work_group=primary`	`pip install sqlalchemy-athena`	Uses REST API
Databricks SQL	`databricks+connector://token:dapiXXXXXXXXXXXXXXXX@adb-123456789012.3.azuredatabricks.net:443/default?http_path=/sql/1.0/warehouses/abc123`	`pip install databricks-sql-connector sqlalchemy-databricks`	Token-based auth

API Usage

Initialize and bring all tables (only tables) from a Google Sheet:

import gsheetstables

account_file = "account.json"
gsheetid = "1zYR7Hlo7EtmY6...tT8"

tables = gsheetstables.GSheetsTables(
    gsheetid             = gsheetid,
    service_account_file = account_file,
    slugify              = True
)

This is done very efficiently, doing exactly 2 calls to Google’s API. One for table discovery and second one to retrieve all tables data at once.

See bellow how to get the service account file

Tables retrieved:

>>> tables.tables
[
    'products',
    'clients',
    'sales'
]

Use the tables as Pandas Dataframes.

tables.t('products')

ID	Name	Price
1	Laptop	999.99
2	Smartphone	699.00
3	Headphones	149.50
4	Keyboard	89.90

Sheet rows that are completeley empty will be removed from resulting dataframe. But the index will always match the Google Sheet row number as seen by spreadsheet users. So you can use loc method to get a specific sheet row number:

tables.t('products').loc[1034]

There is also:

tables.p('products')

Which returns a dict() of table properties as reported by Google Sheets API.

Handling Date and Time

Consider this:

tables.t('clients')

ID	Name	birthdate	affiliated
1	Alice Silva	1990-05-12T00:00:00-03:00	2021-03-15T10:45:00-03:00
2	Bruno Costa	1985-11-23T00:00:00-03:00	2019-08-02T14:20:00-03:00
3	Carla Mendes	1998-02-07T00:00:00-03:00	2022-01-10T09:00:00-03:00
4	Daniel Rocha	1976-09-30T00:00:00-03:00	2015-06-25T16:35:00-03:00

Notice that Google Sheets Table columns of type DATE (which may contain also time) will be converted to pandas.Timestamps and the spreadsheet timezone will be associated to it, aiming at minimum loss of data. If you want just naive dates, as they are probably formated in your sheets, use Pandas like this:

(
    tables.t('clients')
    .assign(
        birthdate  = lambda table: table.birthdate.dt.normalize().dt.tz_localize(None),
        affiliated = lambda table: table.affiliated.dt.normalize().dt.tz_localize(None),
    )
)

ID	Name	birthdate	affiliated
1	Alice Silva	1990-05-12	2021-03-15
2	Bruno Costa	1985-11-23	2019-08-02
3	Carla Mendes	1998-02-07	2022-01-10
4	Daniel Rocha	1976-09-30	2015-06-25

Remember that the complete concept of universal and portable Time always includes date, time and timezone. Displaying as just the date is an abbreviation that assumes interpretation by the reader. Information that seems to contain just a date, is actually stored as the starting midnight of that day, in the timezone of the spreadsheet. If that date is describing a business transaction, it probably didn't happen at that moment, most likely closer to the mid of the day.

Your spreadsheet must display timestamps as date and time to reduce ambiguity. Example of ambiguity is Alices‘s birthday as it is actually stored by your spreadsheet: 1990-05-12T00:00:00-03:00 (not just 1990-05-12 as spreadsheet formatting shows). This timestamp is a different day in other timezones, for example, it is the same moment in Time as timestamp 1990-05-11T23:00:00-04:00 (late night of the previous day of another time zone).

If you hide time and timezone from users, specially the ones that input data, you are increasing the chance of ambiguity. Data processing must always, ALWAYS, consider and handle time and timezone.

Column names normalization

People that edit spreadsheets can get creative when naming columns. Pass slugify=True (the default) to:

transliterate accents and international characters with Unidecode
convert /, :, #, <, >, and spaces to _
remove ?, !, ", '
lowercase all characters

So a column named Column with strange chars/letters will become column_with_strange_chars_letters.

Still pretty long and annoying for your later SQL, so in addition, you can pass a dict for custom column renaming as:

tables = gsheetstables.GSheetsTables(
    ...
    column_rename_map = {
        "table_1": {
            "Column with strange chars/letters": "short_name",
            "Other crazy column name": "other_short_name",
        },
        "table_2": {
            "Column with strange chars/letters": "short_name",
            "Other crazy column name": "other_short_name",
        }
    },
    ...
)

Pass only the columns you want to rename. Combine with slugify=True to have a complete service. Your column_rename_map dict will have priority over slugification.

What are Google Sheets Tables

Tables feature was introduced in 2024-05 and they look like this:

Google Sheets Tables

More than looks, Tables have structure:

table names are unique
columns have names
columns have types as number, date, text, dropdown (kind of categories)
cells have validation and can reference data in other tables and sheets

These are features that make data entry by humans less susceptible to errors, yet as easy and well known as editing a spreadsheet.

This Python module closes the gap of bringing all that nice and structured human-generated data back to the database or to your app.

Get a Service Account file for authorization

Go to https://console.cloud.google.com/projectcreate, make sure you are under correct Google account and create a project named My Project (or reuse a previously existing project)
On same page, edit the Project ID to make it smaller and more meanigfull (or leave defaults); this will be part of an e-mail address that we’ll use later
Activate Sheets API and Drive API (Drive is optional, just to get file modification time)
Go to https://console.cloud.google.com/apis/credentials, make sure you are in the correct project and select Create Credentials → Service account. This is like creating an operator user that will access your Google Spreadsheet; and as a user, it has an e-mail address that appears on the screen. Copy this e-mail address.
After service account created, go into its details and create a keypair (or upload the public part of an existing keypair).
Download the JSON file generated for this keypair, it contains the private part of the key, required to identify the program as your service account.
Go to the Google Sheet your program needs to extract tables, hit Share button on top right and add the virtual e-mail address of the service account you just created and copied. This is an e-mail address that looks like operator-one@my-project.iam.gserviceaccount.com

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

5.0.2

May 19, 2026

5.0.1

May 19, 2026

This version

5.0

May 15, 2026

4.1

Apr 14, 2026

4.0.1

Mar 29, 2026

4.0

Mar 29, 2026

3.2

Mar 18, 2026

3.1.5

Mar 6, 2026

3.1.4

Mar 4, 2026

3.1.3

Mar 2, 2026

3.1.2

Feb 27, 2026

3.1.1

Feb 27, 2026

3.1

Feb 27, 2026

3.0.11

Feb 27, 2026

3.0.10

Feb 27, 2026

3.0.9

Feb 27, 2026

3.0.8

Feb 27, 2026

3.0.7

Feb 27, 2026

3.0.6

Feb 27, 2026

3.0.5

Feb 24, 2026

3.0.4

Feb 24, 2026

3.0.3

Feb 24, 2026

3.0.2

Feb 23, 2026

3.0.1

Feb 21, 2026

3.0

Feb 19, 2026

2.3

Jan 14, 2026

2.2

Jan 14, 2026

2.1

Dec 31, 2025

2.0

Dec 29, 2025

1.5

Dec 24, 2025

1.4

Dec 24, 2025

1.3

Dec 23, 2025

1.2

Dec 23, 2025

1.1

Dec 22, 2025

1.0.1

Dec 21, 2025

1.0

Dec 19, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gsheetstables-5.0.tar.gz (38.0 kB view details)

Uploaded May 15, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

gsheetstables-5.0-py3-none-any.whl (32.0 kB view details)

Uploaded May 15, 2026 Python 3

File details

Details for the file gsheetstables-5.0.tar.gz.

File metadata

Download URL: gsheetstables-5.0.tar.gz
Upload date: May 15, 2026
Size: 38.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for gsheetstables-5.0.tar.gz
Algorithm	Hash digest
SHA256	`f6aba41193e63b8722b227c6784b50dd911373cd0fd38f77be3cdc18fe666dbf`
MD5	`12e7677deaddd772844668c78185d183`
BLAKE2b-256	`6576db9ea45e18d6796a7c6bd6a1832cebc3f76c40650ef5aba19325bb372ca9`

See more details on using hashes here.

File details

Details for the file gsheetstables-5.0-py3-none-any.whl.

File metadata

Download URL: gsheetstables-5.0-py3-none-any.whl
Upload date: May 15, 2026
Size: 32.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for gsheetstables-5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e7fefec2eaaba5033229cab9c0adf2774fb47115cd3c6898e936c0226bad757c`
MD5	`6ba90d2abdeb18d6069008f1c1ac2f73`
BLAKE2b-256	`3087cef141792659ce299acc1b7e7edd1b7529c49c68325a9b7ab7b870791b28`

See more details on using hashes here.

gsheetstables 5.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Google Sheets Tables extraction

Command Line tool

Run it regularly via `cron` or a `systemd` timer

Table Foreign Keys

SQLAlchemy Drivers Reference

API Usage

Handling Date and Time

Column names normalization

What are Google Sheets Tables

Get a Service Account file for authorization

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

gsheetstables 5.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Google Sheets Tables extraction

Command Line tool

Run it regularly via cron or a systemd timer

Table Foreign Keys

SQLAlchemy Drivers Reference

API Usage

Handling Date and Time

Column names normalization

What are Google Sheets Tables

Get a Service Account file for authorization

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

Run it regularly via `cron` or a `systemd` timer