GSParse - Google Sheets Parser Library

A library for extracting data from Google Sheets by URL, with support for multiple worksheets in a single spreadsheet.

Features

  • Working with multiple worksheets in a single spreadsheet
  • Loading data by Google Sheets URL
  • Parsing CSV and XLSX data from buffer
  • Convenient API for working with cells and ranges
  • Data search by value or regular expression
  • Export to various formats
  • Support for XLSX and CSV formats
  • Automatic retries on loading errors
  • Export data to Python dictionaries
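
The list above mentions automatic retries but does not describe how they work. As a rough illustration of the general technique only, here is a minimal retry loop with exponential backoff. The helper `call_with_retries`, its parameters, and the choice to retry on `OSError` are all assumptions made for this sketch, not part of gsparse.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds or the attempts run out (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except OSError:
            # Built-in ConnectionError and TimeoutError are OSError subclasses.
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait 0.5 s, then 1 s, then 2 s, ... before the next attempt.
            sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")
```

A caller could wrap a loader in a zero-argument function, for example `call_with_retries(lambda: client.load_spreadsheet(url))`, although gsparse is described as retrying on its own.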

Installation

pip install gsparse

Quick Start

Basic Usage

from gsparse import GSParseClient

# Create client
client = GSParseClient()

# Load spreadsheet by URL (default format is XLSX)
url = "https://docs.google.com/spreadsheets/d/YOUR_SHEET_ID/edit"
spreadsheet = client.load_spreadsheet(url)

# Get first worksheet
worksheet = spreadsheet.get_first_worksheet()

# Work with cells
cell = worksheet.get_cell(1, 1)  # A1
print(f"Cell A1 value: {cell.value}")

# Get range
range_obj = worksheet.get_range(1, 10, 1, 3)  # A1:C10
cells = worksheet.get_cells_in_range(range_obj)

# Export to dictionary
data_dict = worksheet.get_data_as_dict()
for row in data_dict:
    print(row)

Working with Different Formats

# Reuses `client` and `url` from Basic Usage
# Load spreadsheet in XLSX format (default)
spreadsheet = client.load_spreadsheet(url, format_type="xlsx")

# Load spreadsheet in CSV format
spreadsheet = client.load_spreadsheet(url, format_type="csv")

# Load from CSV string
csv_data = """Name,Age,City
John,25,Moscow
Mary,30,St. Petersburg"""

worksheet = client.load_from_csv_string(csv_data, "My Data")

# Export to dictionary
data_dict = worksheet.get_data_as_dict()
for row in data_dict:
    print(row)
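
To make the dictionary export concrete, here is a standard-library sketch of the same idea: treat one row as the headers and turn every later row into a dictionary. It assumes only that `get_data_as_dict()` yields one mapping per data row, as the loop above suggests. The helper `rows_as_dicts` and its 1-based `headers_row` parameter are illustrative, and gsparse's actual output (value types, for example) may differ.

```python
import csv
import io


def rows_as_dicts(csv_text: str, headers_row: int = 1) -> list[dict[str, str]]:
    """Map each row after the header row to a {header: value} dictionary."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    headers = rows[headers_row - 1]  # headers_row is 1-based in this sketch
    return [dict(zip(headers, row)) for row in rows[headers_row:]]


csv_data = """Name,Age,City
John,25,Moscow
Mary,30,St. Petersburg"""

for record in rows_as_dicts(csv_data):
    print(record)
# {'Name': 'John', 'Age': '25', 'City': 'Moscow'}
# {'Name': 'Mary', 'Age': '30', 'City': 'St. Petersburg'}
```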

Data Search

# Search by value (reuses `client` and `url` from Basic Usage)
found_cells = client.find_data(url, "Moscow")

# Search by regular expression
pattern_cells = client.find_by_pattern(url, r"^\d+$")  # numbers only
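
The pattern `^\d+$` matches values made up entirely of digits. The sketch below checks it against a few sample values with Python's `re` module. How gsparse applies the pattern internally and how it converts non-string cell values is not stated in this README, so `matches_pattern` is a hypothetical stand-in.

```python
import re

DIGITS_ONLY = re.compile(r"^\d+$")


def matches_pattern(value: object, pattern: re.Pattern[str] = DIGITS_ONLY) -> bool:
    """Return True if the string form of value matches the pattern."""
    return pattern.search(str(value)) is not None


samples = ["25", "30", "Moscow", "3.14", "", 42]
digit_values = [value for value in samples if matches_pattern(value)]
print(digit_values)  # ['25', '30', 42]
```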

API Reference

GSParseClient

Main class for working with the library.

Methods

  • load_spreadsheet(url, format_type="xlsx") - Loads entire spreadsheet
  • load_worksheet(url, worksheet_name, format_type="xlsx") - Loads specific worksheet
  • load_from_csv_string(csv_string, worksheet_name) - Loads from CSV string
  • find_data(url, value) - Search cells by value
  • find_by_pattern(url, pattern) - Search by regular expression
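
Loading by URL implies pulling the spreadsheet ID out of a link like the one in Quick Start. The sketch below shows one way to do that and to build the export link that Google Sheets serves for shared spreadsheets. The function names are hypothetical, and this README does not say whether gsparse builds its download URL this way.

```python
import re

# The ID is the path segment that follows /spreadsheets/d/.
_SHEET_ID_RE = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def extract_sheet_id(url: str) -> str:
    """Return the spreadsheet ID embedded in a Google Sheets URL."""
    match = _SHEET_ID_RE.search(url)
    if match is None:
        raise ValueError(f"Not a Google Sheets URL: {url!r}")
    return match.group(1)


def build_export_url(url: str, format_type: str = "xlsx") -> str:
    """Build an export URL for the spreadsheet (illustrative only)."""
    if format_type not in {"xlsx", "csv"}:
        raise ValueError(f"Unsupported format: {format_type!r}")
    sheet_id = extract_sheet_id(url)
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format={format_type}"


print(build_export_url("https://docs.google.com/spreadsheets/d/YOUR_SHEET_ID/edit"))
# https://docs.google.com/spreadsheets/d/YOUR_SHEET_ID/export?format=xlsx
```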

Spreadsheet

Represents a Google Sheets spreadsheet with one or more worksheets.

Properties

  • title - Spreadsheet title
  • worksheets - List of worksheets
  • worksheet_count - Number of worksheets
  • worksheet_names - List of all worksheet names

Methods

  • get_worksheet(name) - Get worksheet by name
  • get_worksheet_by_index(index) - Get worksheet by index
  • get_first_worksheet() - Get first worksheet
  • export_to_dict(headers_row) - Export all worksheets to dictionary

Worksheet

Represents a single worksheet in a spreadsheet.

Properties

  • name - Worksheet name
  • row_count - Number of rows
  • column_count - Number of columns

Methods

  • get_cell(row, column) - Get cell
  • get_range(start_row, end_row, start_column, end_column) - Get range
  • get_cells_in_range(range_obj) - Get all cells in range
  • get_data_as_dict(headers_row) - Export to dictionary
  • find_cells_by_value(value) - Search cells by value
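
The argument order of `get_range` (both rows first, then both columns) is easy to mix up. As an illustration, this sketch models a range with a small dataclass and enumerates its coordinates. `CellRange` and its method are hypothetical and assume inclusive, 1-based bounds, which is what the `get_range(1, 10, 1, 3)  # A1:C10` example in Quick Start implies.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class CellRange:
    """Inclusive, 1-based rectangular range (hypothetical model)."""

    start_row: int
    end_row: int
    start_column: int
    end_column: int

    def __post_init__(self) -> None:
        if self.start_row < 1 or self.start_column < 1:
            raise ValueError("rows and columns are 1-based")
        if self.end_row < self.start_row or self.end_column < self.start_column:
            raise ValueError("end must not precede start")

    def coordinates(self) -> Iterator[tuple[int, int]]:
        """Yield (row, column) pairs row by row."""
        for row in range(self.start_row, self.end_row + 1):
            for column in range(self.start_column, self.end_column + 1):
                yield row, column


a1_c10 = CellRange(1, 10, 1, 3)  # same argument order as get_range
coords = list(a1_c10.coordinates())
print(len(coords))  # 30
print(coords[:4])   # [(1, 1), (1, 2), (1, 3), (2, 1)]
```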

Cell

Represents a single cell in a worksheet.

Properties

  • row - Row number (1-based, as in get_cell(1, 1) for A1)
  • column - Column number (1-based)
  • value - Cell value
  • address - Cell address (A1, B2, etc.)
  • is_empty - Whether cell is empty
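
The `address` property pairs a column letter with a row number. Here is a minimal sketch of that conversion, assuming 1-based row and column indices as in `get_cell(1, 1)  # A1`. The helpers `column_letters` and `to_address` are hypothetical and are not gsparse's implementation.

```python
def column_letters(column: int) -> str:
    """Convert a 1-based column index to spreadsheet letters (1 -> A, 27 -> AA)."""
    if column < 1:
        raise ValueError("column must be >= 1")
    letters = ""
    while column > 0:
        # Shift to 0-based before dividing, because the letters have no zero digit.
        column, remainder = divmod(column - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def to_address(row: int, column: int) -> str:
    """Build an A1-style address from 1-based row and column numbers."""
    if row < 1:
        raise ValueError("row must be >= 1")
    return f"{column_letters(column)}{row}"


print(to_address(1, 1))    # A1
print(to_address(2, 2))    # B2
print(to_address(10, 27))  # AA10
```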

Project Structure

src/gsparse/
├── __init__.py              # Main module
├── client.py                # Main client
├── core/                    # Core entities
│   ├── cell.py             # Cell
│   ├── range.py            # Cell range
│   ├── worksheet.py        # Worksheet
│   └── spreadsheet.py      # Spreadsheet
├── downloaders/            # Downloaders
│   └── google_sheets_downloader.py
├── parsers/                # Parsers
│   ├── base_parser.py      # Base parser
│   ├── csv_parser.py       # CSV parser
│   └── xlsx_parser.py      # XLSX parser
└── utils/                  # Utilities
    ├── url_utils.py        # URL handling
    └── data_utils.py       # Data processing

Requirements

  • Python >= 3.10
  • requests >= 2.31.0
  • chardet >= 5.2.0
  • openpyxl >= 3.1.2

Examples

The repository includes tests, and the Quick Start section doubles as usage examples:

  • Tests: Located in tests/ directory

    • test_client.py - Tests for GSParseClient
    • test_core.py - Tests for core entities (Cell, Range, Worksheet, Spreadsheet)
  • Example Usage: See the Quick Start section above for common use cases

Development

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Code check
ruff check src/

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

If you have any questions or issues, please open an issue on GitHub.
