
Simplified intelligent spreadsheet ingestion framework with automatic table detection

Project description

GridGulp

Automatically detect and extract tables from Excel, CSV, and text files.

What is GridGulp?

GridGulp finds tables in your spreadsheets, even when a sheet holds several tables or a table doesn't start at cell A1. It ships with sensible defaults and is fully configurable.
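This page doesn't describe GridGulp's own detection algorithm. To show why a table need not start at A1, here is a minimal sketch of one generic approach (not GridGulp's implementation): treat the sheet as a grid, flood-fill each group of adjacent non-empty cells, and report each group's bounding box. The function name find_tables and the zero-based (top, left, bottom, right) tuple layout are hypothetical.

```python
from collections import deque
from typing import Any


def find_tables(grid: list[list[Any]]) -> list[tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) boxes, 0-based and inclusive, one per
    group of non-empty cells connected up/down/left/right."""
    rows = len(grid)
    cols = max((len(r) for r in grid), default=0)

    def filled(r: int, c: int) -> bool:
        # Rows may be ragged; cells past the end of a row count as empty.
        return c < len(grid[r]) and grid[r][c] not in (None, "")

    seen: set[tuple[int, int]] = set()
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or not filled(r, c):
                continue
            # Breadth-first flood fill from this seed cell, growing the box.
            queue = deque([(r, c)])
            seen.add((r, c))
            top, left, bottom, right = r, c, r, c
            while queue:
                cr, cc = queue.popleft()
                top, bottom = min(top, cr), max(bottom, cr)
                left, right = min(left, cc), max(right, cc)
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen and filled(nr, nc):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            boxes.append((top, left, bottom, right))
    return boxes


# Two tables on one sheet, neither starting at A1.
sheet = [
    ["", "", "", "", ""],
    ["", "Region", "Sales", "", ""],
    ["", "North", 120, "", ""],
    ["", "South", 95, "", ""],
    ["", "", "", "", ""],
    ["", "", "", "Item", "Qty"],
    ["", "", "", "Bolt", 40],
]
print(find_tables(sheet))  # [(1, 1, 3, 2), (5, 3, 6, 4)]
```

Because a fully blank row separates the two groups, they come back as two tables. Real spreadsheets usually need extra heuristics on top of this, for example for stray notes beside a table or tables with blank gaps inside them.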

Supported formats: .xlsx, .xls, .xlsm, .xlsb, .csv, .tsv, .txt
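If you want to filter a folder before handing files to GridGulp, a check against the extensions listed above works as a first pass. This is a sketch assuming the list is complete for this release; SUPPORTED_EXTENSIONS and is_supported are illustrative names, not something GridGulp exports, and the check looks only at the file name, not its contents.

```python
from pathlib import Path
from typing import Union

# Extensions as listed on this page; the constant is illustrative,
# not part of GridGulp's API.
SUPPORTED_EXTENSIONS = {".xlsx", ".xls", ".xlsm", ".xlsb", ".csv", ".tsv", ".txt"}


def is_supported(path: Union[str, Path]) -> bool:
    """True if the file's final suffix, compared case-insensitively, is supported."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


files = ["Q1 sales.XLSX", "inventory.csv", "notes.md", "backup.xlsx.bak"]
print([f for f in files if is_supported(f)])  # ['Q1 sales.XLSX', 'inventory.csv']
```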

Installation

pip install gridgulp

Quick Start

import asyncio

from gridgulp import GridGulp


async def main() -> None:
    # Detect tables in a file
    porter = GridGulp()
    result = await porter.detect_tables("sales_report.xlsx")

    # Process results
    for sheet in result.sheets:
        print(f"{sheet.name}: {len(sheet.tables)} tables found")
        for table in sheet.tables:
            print(f"  - {table.range.excel_range}")


# detect_tables is a coroutine, so a plain script must run it in an event loop
asyncio.run(main())
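Detected ranges are reported in Excel's A1 notation through table.range.excel_range. If you need to convert between that notation and zero-based row and column indices (for example, to slice a list of rows yourself), a conversion along these lines works. column_letter and to_excel_range are hypothetical helpers, not GridGulp API, and the zero-based inclusive box is an assumed convention.

```python
def column_letter(index: int) -> str:
    """Convert a 0-based column index to Excel letters (0 -> 'A', 26 -> 'AA')."""
    if index < 0:
        raise ValueError("column index must be non-negative")
    letters = ""
    index += 1  # Excel columns are bijective base-26, counted from 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def to_excel_range(top: int, left: int, bottom: int, right: int) -> str:
    """Format a 0-based, inclusive cell box as an A1-style range."""
    return f"{column_letter(left)}{top + 1}:{column_letter(right)}{bottom + 1}"


print(to_excel_range(0, 0, 9, 3))  # A1:D10
```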

Jupyter Notebook Usage

In Jupyter notebooks, you can call the synchronous detect_tables_sync method instead of awaiting detect_tables:

from gridgulp import GridGulp

# Create GridGulp instance
gg = GridGulp()

# Use the sync method - works in Jupyter without any async complexity
result = gg.detect_tables_sync("sales_report.xlsx")

# Display results
print(f"📄 File: {result.file_info.path.name}")
print(f"📊 Total tables found: {result.total_tables}\n")

for sheet in result.sheets:
    print(f"Sheet: {sheet.name}")
    for table in sheet.tables:
        print(f"  - Table at {table.range.excel_range}")
        print(f"    Size: {table.shape[0]} rows × {table.shape[1]} columns")
        print(f"    Confidence: {table.confidence:.1%}")
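This page doesn't say how detect_tables_sync is implemented. The difficulty it avoids is that Jupyter already runs an asyncio event loop, so calling asyncio.run from a cell raises a RuntimeError. Below is a common pattern for a synchronous wrapper that works in both settings; it is an assumption for illustration, not GridGulp's code, and run_sync and fake_detect are hypothetical names.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread (plain script): start one.
        return asyncio.run(coro)
    # A loop is already running (e.g. a Jupyter cell), so asyncio.run would
    # raise here; run the coroutine on a fresh loop in a worker thread instead.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def fake_detect(path: str) -> str:
    """Hypothetical stand-in for an async detection call."""
    await asyncio.sleep(0)
    return f"detected tables in {path}"


print(run_sync(fake_detect("sales_report.xlsx")))  # detected tables in sales_report.xlsx
```

The worker thread gets its own event loop, so the call simply blocks until the result is ready instead of competing with the notebook's loop.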

Extract DataFrames

Extract detected tables as pandas DataFrames with automatic type inference and quality scoring:

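from gridgulp import GridGulp
from gridgulp.extractors import DataFrameExtractor
from gridgulp.readers import get_reader

# Example: Extract tables from a sales report
# Run detection first so that `result` holds the detected table ranges
result = GridGulp().detect_tables_sync("sales_report.xlsx")

# Read the raw cell data that the extractor slices each table from
reader = get_reader("sales_report.xlsx")
file_data = reader.read_sync()

extractor = DataFrameExtractor()
for sheet_result in result.sheets:
    # Match each detection result to the sheet data with the same name
    sheet_data = next((s for s in file_data.sheets if s.name == sheet_result.name), None)
    if sheet_data is None:
        continue

    for table in sheet_result.tables:
        df, metadata, quality = extractor.extract_dataframe(sheet_data, table.range)
        if df is not None:
            # Column labels may not be strings, so convert before joining
            headers = [str(c) for c in df.columns[:5]]
            print(f"\n📊 Extracted table from {table.range.excel_range}")
            print(f"   Shape: {df.shape} | Quality: {quality:.1%}")
            print(f"   Headers: {', '.join(headers)}{'...' if len(df.columns) > 5 else ''}")
            print("\nFirst few rows:")
            print(df.head())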
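This page doesn't document how type inference or the quality score are computed. As a rough, standard-library-only illustration of the two ideas (not GridGulp's algorithm; the quality metric here is an assumption): promote the first row to headers, infer each column as int, float, or str, and score the table by the share of non-empty data cells. infer_type and table_profile are hypothetical names.

```python
from typing import Any


def infer_type(values: list[Any]) -> type:
    """Return int, float, or str: the narrowest type all non-empty values parse as."""
    non_empty = [str(v) for v in values if v not in (None, "")]
    if not non_empty:
        return str
    for candidate in (int, float):
        try:
            for text in non_empty:
                candidate(text)
        except ValueError:
            continue  # some value does not parse; try the next, wider type
        return candidate
    return str


def table_profile(rows: list[list[Any]]) -> tuple[list[str], list[type], float]:
    """Treat rows[0] as headers; return headers, per-column types, and fill ratio."""
    if not rows:
        return [], [], 0.0
    headers = [str(h) for h in rows[0]]
    body = rows[1:]
    # Pad short rows with None so every column has one value per data row.
    columns = [[row[i] if i < len(row) else None for row in body] for i in range(len(headers))]
    types = [infer_type(col) for col in columns]
    total = len(body) * len(headers)
    filled = sum(1 for col in columns for v in col if v not in (None, ""))
    quality = filled / total if total else 0.0
    return headers, types, quality


rows = [
    ["Region", "Units", "Price"],
    ["North", "120", "9.99"],
    ["South", 95, ""],
]
headers, types, quality = table_profile(rows)
print(headers, [t.__name__ for t in types], f"{quality:.1%}")
# ['Region', 'Units', 'Price'] ['str', 'int', 'float'] 83.3%
```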

Key Features

  • Automatic Detection - Finds all tables with sensible defaults
  • Fully Configurable - Customize detection thresholds and behavior
  • Smart Headers - Detects single and multi-row headers automatically
  • Multiple Tables - Handles sheets with multiple separate tables
  • Quality Scoring - Confidence scores for each detected table
  • Fast - Processes most files in under a second
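The feature list doesn't explain how header rows are recognized. One simple heuristic, sketched here purely as an illustration (it is not GridGulp's method), counts the leading rows that contain text but no numbers. count_header_rows and its max_headers cutoff are hypothetical.

```python
from typing import Any


def _is_blank(value: Any) -> bool:
    return value is None or str(value).strip() == ""


def _is_number(value: Any) -> bool:
    """True for ints/floats (not bools) and for strings that parse as a float."""
    if isinstance(value, bool):
        return False
    if isinstance(value, (int, float)):
        return True
    try:
        float(str(value))
    except ValueError:
        return False
    return True


def count_header_rows(rows: list[list[Any]], max_headers: int = 3) -> int:
    """Heuristic: leading rows that hold text but no numbers are header rows."""
    count = 0
    for row in rows:
        cells = [v for v in row if not _is_blank(v)]
        if not cells or any(_is_number(v) for v in cells):
            break  # a blank or numeric row ends the header block
        count += 1
    # An all-text table, or an implausibly tall header block, gives no clear
    # signal; fall back to a single header row.
    if count == len(rows) or count > max_headers:
        return min(1, len(rows))
    return count


# A two-row header: a quarter band above the column names.
rows = [
    ["", "Q1", "Q2"],
    ["Region", "Units", "Units"],
    ["North", 120, 130],
    ["South", 95, 101],
]
print(count_header_rows(rows))  # 2
```

This breaks on common layouts, for instance a header row made of years such as 2023 and 2024, so treat it as a starting point only.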

Documentation

License

MIT License - see LICENSE file.

Download files


Source Distribution

gridgulp-0.3.1.tar.gz (278.9 kB)

Built Distribution


gridgulp-0.3.1-py3-none-any.whl (100.7 kB)

File details

Details for the file gridgulp-0.3.1.tar.gz.

File metadata

  • Download URL: gridgulp-0.3.1.tar.gz
  • Upload date:
  • Size: 278.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for gridgulp-0.3.1.tar.gz
Algorithm Hash digest
SHA256 e6218bb34d9f6895277bad235419a53ad5c668f219bbda38145ca5e7dbdb4f75
MD5 37f803ac196fa44a205003c4012d11d0
BLAKE2b-256 e931648835fe1eff97e63742ab0cde276e06613842c90a377d06805c1d9abfee
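To check a downloaded archive against these digests, hash the file locally and compare. A minimal standard-library sketch; sha256_of and EXPECTED_SDIST_SHA256 are illustrative names, and the archive is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path
from typing import Union

# Published SHA256 digest of gridgulp-0.3.1.tar.gz (copied from the table above).
EXPECTED_SDIST_SHA256 = "e6218bb34d9f6895277bad235419a53ad5c668f219bbda38145ca5e7dbdb4f75"


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


archive = Path("gridgulp-0.3.1.tar.gz")
if archive.exists():  # only check if the sdist has been downloaded here
    print("OK" if sha256_of(archive) == EXPECTED_SDIST_SHA256 else "MISMATCH")
```

pip can run the same check automatically in hash-checking mode: pin the release in a requirements file as gridgulp==0.3.1 --hash=sha256:<digest> and install with pip install --require-hashes -r requirements.txt. In that mode every dependency must also be pinned with a hash.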


Provenance

The following attestation bundles were made for gridgulp-0.3.1.tar.gz:

Publisher: release.yml on Ganymede-Bio/gridgulp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gridgulp-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: gridgulp-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 100.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for gridgulp-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5bba280f9931e1f1b2d48cb7bb362e2fb13f96fa4321655e8d8ea90af524b55f
MD5 487506c55fa149fbaceb241af9cbe731
BLAKE2b-256 ae91319ecb6cf2d8b602c111c2cfd7a4821919090ebbc20272f0d96165e31a43


Provenance

The following attestation bundles were made for gridgulp-0.3.1-py3-none-any.whl:

Publisher: release.yml on Ganymede-Bio/gridgulp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
