Skip to main content

Helper module to parse data from GSheets into database-optimized DataFrames

Project description

Google Sheets Helper

A Python ETL driver for reading and transforming Google Sheets and Excel data from Google Drive. Simplifies the process of extracting spreadsheet data and converting it to database-ready pandas DataFrames with comprehensive optimization features.

PyPI version Issues Last Commit License

Features

  • Google Sheets & Excel Support: Read Google Sheets and Excel files directly from Google Drive
  • Database-Ready DataFrames: Optimized data types and encoding for seamless database storage
  • Flexible Column Naming: Choose between snake_case or camelCase column conventions, with robust ASCII normalization and automatic removal of unnamed columns (e.g., from Excel/CSV exports)
  • Smart Type Detection: Dynamic conversion of metrics to appropriate int64/float64 types
  • Configurable Missing Values: Granular control over NaN/NaT handling by column type
  • Character Encoding Cleanup: Automatic text sanitization for database compatibility
  • Robust Error Handling: Comprehensive error handling with specific exceptions
  • Progress Bar for Large Downloads: Uses tqdm to show download progress for large Excel files
  • Type Hints: Full type hint support for better IDE experience

Installation

pip install google-sheets-helper

Quick Start

1. Set up credentials

Place your Google service account credentials in secrets/client_secret.json.

2. Basic usage

from google_sheets_helper import GoogleSheetsHelper, load_client_secret, setup_logging

setup_logging()
client_secret = load_client_secret()
gs_helper = GoogleSheetsHelper(client_secret)

spreadsheet_id = "your_spreadsheet_id"
worksheet_name = "your_worksheet_name"

df = gs_helper.load_sheet_as_dataframe(spreadsheet_id, worksheet_name)
utils = DataframeUtils()

df = utils.fix_data_types(df, skip_columns=None)
df = utils.handle_missing_values(df)
df = utils.clean_text_encoding(df)
df = utils.transform_column_names(df, naming_convention="snake_case")

print(df.head(), df.dtypes)

os.makedirs("data", exist_ok=True)
filename = os.path.join("data", f"{spreadsheet_id}_{worksheet_name}.csv")

df.to_csv(filename, index=False)

Data Cleaning Pipeline

You can use the built-in DataFrame utilities for further cleaning, including robust column name normalization and unnamed column removal:

from google_sheets_helper import DataframeUtils

utils = DataframeUtils()
df = utils.fix_data_types(df)
df = utils.handle_missing_values(df)
df = utils.clean_text_encoding(df)
df = utils.transform_column_names(df, naming_convention="snake_case")
df = utils.remove_unnamed_columns(df)

API Reference

  • GoogleSheetsHelper: Main class for reading and transforming Google Sheets/Excel data
  • get_drive_file_metadata: Retrieve file name and MIME type from Google Drive
  • list_files_in_folder: List files in a Google Drive folder
  • load_client_secret: Loads credentials from a JSON file
  • setup_logging: Configures logging for the package
  • DataframeUtils: Utility class for DataFrame cleaning and optimization
  • remove_unnamed_columns: Remove columns named 'Unnamed: ...' (common in Excel/CSV)
  • Exception classes: AuthenticationError, APIError, ConfigurationError, DataProcessingError, ValidationError

Error Handling

from google_sheets_helper import (
    GoogleSheetsHelper,
    AuthenticationError,
    ValidationError,
    APIError,
    DataProcessingError,
    ConfigurationError
)

try:
    df = gs_helper.load_sheet_as_dataframe(spreadsheet_id, worksheet_name)
except AuthenticationError:
    # Handle credential issues
    pass
except ValidationError:
    # Handle input validation errors
    pass
except APIError:
    # Handle API errors
    pass
except DataProcessingError:
    # Handle data processing errors
    pass

Examples

Check the examples/ directory for comprehensive usage examples:

  • basic_usage.py - Simple sheet extraction and cleaning

Requirements

  • Python 3.9-3.12
  • pandas >= 2.0.0
  • gspread >= 5.10.0
  • google-api-python-client >= 2.0.0
  • tqdm >= 4.65.0

Development

For development installation:

git clone https://github.com/machado000/google-sheets-helper
cd google-sheets-helper
pip install -e ".[dev]"

License

MIT License. See LICENSE file for details.

Support

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

google_sheets_helper-1.1.1.tar.gz (13.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

google_sheets_helper-1.1.1-py3-none-any.whl (13.4 kB view details)

Uploaded Python 3

File details

Details for the file google_sheets_helper-1.1.1.tar.gz.

File metadata

  • Download URL: google_sheets_helper-1.1.1.tar.gz
  • Upload date:
  • Size: 13.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/5.15.167.4-microsoft-standard-WSL2

File hashes

Hashes for google_sheets_helper-1.1.1.tar.gz
Algorithm Hash digest
SHA256 9f84c104757f69a7b4fe299e655390e76954bea8468113a3ee05fabd3be673b3
MD5 06327ac511a66726e9fa80214f1247bb
BLAKE2b-256 607a8aeaaa464b2a083c34ca02b1f91c070bff25206a01b69fbbe483b11c66c6

See more details on using hashes here.

File details

Details for the file google_sheets_helper-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: google_sheets_helper-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 13.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/5.15.167.4-microsoft-standard-WSL2

File hashes

Hashes for google_sheets_helper-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 938bc99a08976e1183050b35b962ed244c8ea1c91b7a4773216f989f0f4e8c04
MD5 9b79056f41ef44de8b0b2c744c48a0b6
BLAKE2b-256 d5c18c3ffabb6e371124b343a4aa227843896abbe2e45428bbaf45c9ec4cced2

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page