Helper module to parse data from GSheets into database-optimized DataFrames
Project description
Google Sheets Helper
A Python ETL driver for reading and transforming Google Sheets and Excel data from Google Drive. Simplifies the process of extracting spreadsheet data and converting it to database-ready pandas DataFrames with comprehensive optimization features.
Features
- Google Sheets & Excel Support: Read Google Sheets and Excel files directly from Google Drive
- Database-Ready DataFrames: Optimized data types and encoding for seamless database storage
- Flexible Column Naming: Choose between snake_case or camelCase column conventions
- Smart Type Detection: Dynamic conversion of metrics to appropriate int64/float64 types
- Configurable Missing Values: Granular control over NaN/NaT handling by column type
- Character Encoding Cleanup: Automatic text sanitization for database compatibility
- Robust Error Handling: Comprehensive error handling with specific exceptions
- Type Hints: Full type hint support for better IDE experience
Installation
pip install google-sheets-helper
Quick Start
1. Set up credentials
Place your Google service account credentials in secrets/client_secret.json.
2. Basic usage
from google_sheets_helper import GoogleSheetsHelper, load_client_secret, setup_logging
setup_logging()
client_secret = load_client_secret()
gs_helper = GoogleSheetsHelper(client_secret)
spreadsheet_id = "your_spreadsheet_id"
worksheet_name = "your_worksheet_name"
df = gs_helper.read_sheet_to_df(spreadsheet_id, worksheet_name)
print(df.head())
df.to_csv("output.csv", index=False)
Data Cleaning Pipeline
You can use the built-in DataFrame utilities for further cleaning:
from google_sheets_helper import DataframeUtils
utils = DataframeUtils()
df = utils.fix_data_types(df)
df = utils.handle_missing_values(df)
df = utils.clean_text_encoding(df)
df = utils.transform_column_names(df, naming_convention="snake_case")
API Reference
GoogleSheetsHelper: Main class for reading and transforming Google Sheets/Excel dataload_client_secret: Loads credentials from a JSON filesetup_logging: Configures logging for the packageDataframeUtils: Utility class for DataFrame cleaning and optimization- Exception classes:
AuthenticationError,APIError,ConfigurationError,DataProcessingError,ValidationError
Error Handling
from google_sheets_helper import (
GoogleSheetsHelper,
AuthenticationError,
ValidationError,
APIError,
DataProcessingError,
ConfigurationError
)
try:
df = gs_helper.read_sheet_to_df(spreadsheet_id, worksheet_name)
except AuthenticationError:
# Handle credential issues
pass
except ValidationError:
# Handle input validation errors
pass
except APIError:
# Handle API errors
pass
except DataProcessingError:
# Handle data processing errors
pass
Examples
Check the examples/ directory for comprehensive usage examples:
basic_usage.py- Simple sheet extraction and cleaning
Requirements
- Python 3.9-3.12
- pandas >= 2.0.0
- gspread >= 5.10.0
- google-api-python-client >= 2.0.0
- tqdm >= 4.65.0
Development
For development installation:
git clone https://github.com/machado000/google-sheets-helper
cd google-sheets-helper
pip install -e ".[dev]"
License
MIT License. See LICENSE file for details.
Support
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file google_sheets_helper-1.0.0.tar.gz.
File metadata
- Download URL: google_sheets_helper-1.0.0.tar.gz
- Upload date:
- Size: 12.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/5.15.167.4-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8d08fe8acb6dab9620bb1d9eec99e67ade5ee5b8aebeb57b26a27f27759ce5f4
|
|
| MD5 |
93a6e4e089a7d44fe074f46c75b3f503
|
|
| BLAKE2b-256 |
0e643c7ac77b32516ea3d9b42e6306802547409120ca6dc1cd791c550cd0107c
|
File details
Details for the file google_sheets_helper-1.0.0-py3-none-any.whl.
File metadata
- Download URL: google_sheets_helper-1.0.0-py3-none-any.whl
- Upload date:
- Size: 12.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/5.15.167.4-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fd1aa4dbe99ac520fbf602a8ab1acd29b26cf62cd6e77851d575d36d570c7918
|
|
| MD5 |
3f3f9c77b6f5192150e08f52a71911d4
|
|
| BLAKE2b-256 |
5eceb8fbf234db8c105281a114b932c770b29f5b0a18e0b8b101900f964399b3
|