Skip to main content

Lightweight helper module to extract data from Google Sheets and Excel files as list of dictionaries

Project description

Google Sheets Helper

A Python ETL driver for reading and transforming Google Sheets and Excel data from Google Drive. Simplifies the process of extracting spreadsheet data and converting it to database-ready Python lists of dictionaries with comprehensive optimization features.

PyPI version Issues Last Commit License

Features

  • Google Sheets & Excel Support: Read Google Sheets and Excel files directly from Google Drive
  • Flexible Column Naming: Choose between snake_case or camelCase column conventions, with robust ASCII normalization and automatic removal of unnamed columns (e.g., from Excel/CSV exports)
  • Progress Logging for Large Downloads: Shows download progress for large Excel files
  • Pure Python Data Cleaning and Transformation: No pandas required; all utilities work with list-of-dictionaries
  • Robust Error Handling: Comprehensive error handling with specific exceptions
  • Type Hints: Full type hint support for better IDE experience

Installation

pip install google-sheets-helper

Quick Start

1. Set up credentials

Place your Google service account credentials in secrets/client_secret.json.

2. Basic usage

from google_sheets_helper import GoogleSheetsHelper, WorksheetUtils, load_client_secret

client_secret = load_client_secret()
gs_helper = GoogleSheetsHelper(client_secret)

spreadsheet_id = "your_spreadsheet_id"
worksheet_name = "your_worksheet_name"

# Load data as list of dictionaries
data = gs_helper.load_sheet_as_json(spreadsheet_id, worksheet_name)

utils = WorksheetUtils()
data = utils.handle_missing_values(data)
data = utils.clean_text_encoding(data)
data = utils.transform_column_names(data, naming_convention="snake_case")

# Print first row
print(data[0])

# Save to CSV
import csv
with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=data[0].keys())
    writer.writeheader()
    writer.writerows(data)

Data Cleaning Pipeline

All data cleaning and transformation utilities now work with list-of-dictionaries:

from google_sheets_helper import WorksheetUtils

utils = WorksheetUtils()
data = utils.handle_missing_values(data)
data = utils.clean_text_encoding(data)
data = utils.transform_column_names(data, naming_convention="snake_case")
data = utils.remove_unnamed_and_null_columns(data)

API Reference

  • GoogleSheetsHelper: Main class for reading and transforming Google Sheets/Excel data
  • get_drive_file_metadata: Retrieve file name and MIME type from Google Drive
  • list_files_in_folder: List files in a Google Drive folder
  • load_client_secret: Loads credentials from a JSON file
  • WorksheetUtils: Pure Python utilities for cleaning and transforming list-of-dictionaries
  • Exception classes: AuthenticationError, APIError, ConfigurationError, DataProcessingError, ValidationError

Error Handling

from google_sheets_helper import (
    GoogleSheetsHelper,
    AuthenticationError,
    ValidationError,
    APIError,
    DataProcessingError,
    ConfigurationError
)

try:
    data = gs_helper.load_sheet_as_json(spreadsheet_id, worksheet_name)
except AuthenticationError:
    # Handle credential issues
    pass
except ValidationError:
    # Handle input validation errors
    pass
except APIError:
    # Handle API errors
    pass
except DataProcessingError:
    # Handle data processing errors
    pass

Examples

Check the examples/ directory for comprehensive usage examples:

  • basic_usage.py - Simple sheet extraction and cleaning

Requirements

  • Python 3.10-3.12
  • gspread >= 6.0.0
  • google-api-python-client >= 2.0.0
  • python-calamine >= 0.4.0

Development

For development installation:

git clone https://github.com/machado000/google-sheets-helper
cd google-sheets-helper
pip install -e ".[dev]"

License

MIT License. See LICENSE file for details.

Support

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

google_sheets_helper-2.0.0.tar.gz (11.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

google_sheets_helper-2.0.0-py3-none-any.whl (11.8 kB view details)

Uploaded Python 3

File details

Details for the file google_sheets_helper-2.0.0.tar.gz.

File metadata

  • Download URL: google_sheets_helper-2.0.0.tar.gz
  • Upload date:
  • Size: 11.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.4 CPython/3.12.3 Linux/5.15.167.4-microsoft-standard-WSL2

File hashes

Hashes for google_sheets_helper-2.0.0.tar.gz
Algorithm Hash digest
SHA256 743a0a74a2f592e056f32b098f45a775d94320b1106ece59e396eaab546aec80
MD5 309d90db70c1c7760200befedaddda52
BLAKE2b-256 a2847478b0b9f7a993d858eda13835c7146c5b15d188f5e0cbda8052e9c31569

See more details on using hashes here.

File details

Details for the file google_sheets_helper-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: google_sheets_helper-2.0.0-py3-none-any.whl
  • Upload date:
  • Size: 11.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.4 CPython/3.12.3 Linux/5.15.167.4-microsoft-standard-WSL2

File hashes

Hashes for google_sheets_helper-2.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6bbd32fedf671125ab4798d15f7feddd3e48fd10e1655d1363c891399626da24
MD5 e5929d9b399c995ebcea3797d3a7238c
BLAKE2b-256 2c2d3e3ff4c8777b28ed0b36998c00829162fcb630fcbad3585c7981facadfb9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page