Simplified intelligent spreadsheet ingestion framework with automatic table detection

Maintainers

bensonlee5

Project description

GridGulp

Automatically detect and extract tables from Excel, CSV, and text files.

What is GridGulp?

GridGulp finds tables in your spreadsheets - even when there are multiple tables on one sheet or when tables don't start at cell A1. It comes with reasonable defaults and is fully configurable.

Supported formats: .xlsx, .xls, .xlsm, .xlsb, .csv, .tsv, .txt
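As a small illustration, the extension list above can be used to pre-filter files before handing them to GridGulp. This is a minimal sketch: the helper names `is_supported` and `supported_files` are hypothetical and not part of the GridGulp API, and the check looks only at the file extension, not the file contents.

```python
from pathlib import Path

# Extensions copied from the "Supported formats" line above.
SUPPORTED_EXTENSIONS = {".xlsx", ".xls", ".xlsm", ".xlsb", ".csv", ".tsv", ".txt"}


def is_supported(path):
    """Return True if the file extension is in the supported list (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def supported_files(paths):
    """Keep only the paths whose extension is in the supported list, preserving order."""
    return [p for p in paths if is_supported(p)]
```

For example, `supported_files(["a.csv", "b.docx"])` keeps only `"a.csv"`.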

Installation

pip install gridgulp

Quick Start

import asyncio

from gridgulp import GridGulp


async def main():
    # Detect tables in a file
    gg = GridGulp()
    result = await gg.detect_tables("sales_report.xlsx")

    # Process results
    for sheet in result.sheets:
        print(f"{sheet.name}: {len(sheet.tables)} tables found")
        for table in sheet.tables:
            print(f"  - {table.range.excel_range}")


# detect_tables is a coroutine, so a plain script must run it in an event loop
asyncio.run(main())

Jupyter Notebook Usage

In Jupyter notebooks, you can use synchronous methods for simplicity:

from gridgulp import GridGulp

# Create GridGulp instance
gg = GridGulp()

# Use the sync method - works in Jupyter without any async complexity
result = gg.detect_tables_sync("sales_report.xlsx")

# Display results
print(f"📄 File: {result.file_info.path.name}")
print(f"📊 Total tables found: {result.total_tables}\n")

for sheet in result.sheets:
    print(f"Sheet: {sheet.name}")
    for table in sheet.tables:
        print(f"  - Table at {table.range.excel_range}")
        print(f"    Size: {table.shape[0]} rows × {table.shape[1]} columns")
        print(f"    Confidence: {table.confidence:.1%}")
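The loop above can also be turned into plain records, which are easier to sort, filter, or load into a DataFrame. The sketch below relies only on the attributes already used in the example (`sheets`, `name`, `tables`, `range.excel_range`, `shape`, `confidence`); the function name `summarize_tables` and the `min_confidence` parameter are hypothetical and not part of GridGulp.

```python
def summarize_tables(result, min_confidence=0.0):
    """Flatten a detection result into one dict per table.

    Tables whose confidence is below min_confidence are skipped.
    """
    rows = []
    for sheet in result.sheets:
        for table in sheet.tables:
            if table.confidence < min_confidence:
                continue
            n_rows, n_cols = table.shape
            rows.append(
                {
                    "sheet": sheet.name,
                    "range": table.range.excel_range,
                    "rows": n_rows,
                    "columns": n_cols,
                    "confidence": table.confidence,
                }
            )
    return rows
```

Calling `summarize_tables(result, min_confidence=0.8)` would keep only the tables the detector scored at 80% or higher.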

Extract DataFrames

Extract detected tables as pandas DataFrames with automatic type inference and quality scoring:

from gridgulp import GridGulp
from gridgulp.extractors import DataFrameExtractor
from gridgulp.readers import get_reader

# Example: Extract tables from a sales report
# Detect tables first; the extraction loop below needs the detection result
result = GridGulp().detect_tables_sync("sales_report.xlsx")

reader = get_reader("sales_report.xlsx")
file_data = reader.read_sync()

extractor = DataFrameExtractor()
for sheet_result in result.sheets:
    # Match each detected sheet to its raw sheet data by name
    sheet_data = next(s for s in file_data.sheets if s.name == sheet_result.name)

    for table in sheet_result.tables:
        df, metadata, quality = extractor.extract_dataframe(sheet_data, table.range)
        if df is not None:
            # Column labels are not always strings, so convert before joining
            headers = ", ".join(str(col) for col in df.columns[:5])
            print(f"\n📊 Extracted table from {table.range.excel_range}")
            print(f"   Shape: {df.shape} | Quality: {quality:.1%}")
            print(f"   Headers: {headers}{'...' if len(df.columns) > 5 else ''}")
            print("\nFirst few rows:")
            print(df.head())
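If each extracted DataFrame should be saved to its own file, a file name can be derived from the sheet name and the table range. The helper below is a hypothetical sketch, not part of GridGulp; it assumes `excel_range` is an A1-style string such as `B3:F20`, which matches the examples above but is not confirmed for every case.

```python
import re


def table_filename(sheet_name, excel_range, ext=".csv"):
    """Build a filesystem-safe file name from a sheet name and an A1-style range."""
    # Collapse anything other than letters, digits, "_" and "-" into one underscore.
    safe_sheet = re.sub(r"[^A-Za-z0-9_-]+", "_", sheet_name).strip("_") or "sheet"
    # Replace ":" (and any other punctuation) in the range with a hyphen.
    safe_range = re.sub(r"[^A-Za-z0-9]+", "-", excel_range).strip("-")
    return f"{safe_sheet}_{safe_range}{ext}"
```

Inside the extraction loop, this could be used as `df.to_csv(table_filename(sheet_result.name, table.range.excel_range), index=False)`.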

Key Features

Automatic Detection - Finds all tables with sensible defaults
Fully Configurable - Customize detection thresholds and behavior
Smart Headers - Detects single and multi-row headers automatically
Multiple Tables - Handles sheets with multiple separate tables
Quality Scoring - Confidence scores for each detected table
Fast - Processes most files in under a second
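To give a feel for what finding several tables on one sheet involves, here is a minimal sketch of one common approach: treat non-empty cells as filled, group cells that touch each other (including diagonally) into connected regions, and report each region's bounding box as an A1-style range. This is an illustration only and is not claimed to be GridGulp's actual detection algorithm; the names `col_letter` and `find_islands` are hypothetical.

```python
from collections import deque


def col_letter(index):
    """Convert a zero-based column index to Excel letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def find_islands(grid):
    """Return A1-style bounding ranges of connected non-empty regions in a 2D list."""
    n_rows = len(grid)
    n_cols = max((len(row) for row in grid), default=0)

    def filled(r, c):
        # Short rows are treated as padded with empty cells.
        return c < len(grid[r]) and grid[r][c] not in (None, "")

    seen = set()
    ranges = []
    for r in range(n_rows):
        for c in range(n_cols):
            if (r, c) in seen or not filled(r, c):
                continue
            # Breadth-first search over the 8 neighbours of each filled cell.
            queue = deque([(r, c)])
            seen.add((r, c))
            top = bottom = r
            left = right = c
            while queue:
                cr, cc = queue.popleft()
                top, bottom = min(top, cr), max(bottom, cr)
                left, right = min(left, cc), max(right, cc)
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (
                            0 <= nr < n_rows
                            and 0 <= nc < n_cols
                            and (nr, nc) not in seen
                            and filled(nr, nc)
                        ):
                            seen.add((nr, nc))
                            queue.append((nr, nc))
            ranges.append(f"{col_letter(left)}{top + 1}:{col_letter(right)}{bottom + 1}")
    return ranges
```

A sheet with one block of data starting at B2 and a second block starting at E3 would yield two separate ranges, neither of which starts at A1. A real detector also has to deal with headers, merged cells, and tables separated by only partly empty rows, which this sketch ignores.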

Documentation

Full Usage Guide - Detailed examples and configuration
API Reference - Complete API documentation
Architecture - How GridGulp works internally

License

MIT License - see LICENSE file.

Project details

Release history

0.3.4 - Jul 30, 2025
0.3.1 - Jul 29, 2025 (this version)

Download files

Source Distribution

gridgulp-0.3.1.tar.gz (278.9 kB)

Uploaded Jul 29, 2025 Source

Built Distribution

gridgulp-0.3.1-py3-none-any.whl (100.7 kB)

Uploaded Jul 29, 2025 Python 3

File details

Details for the file gridgulp-0.3.1.tar.gz.

File metadata

Download URL: gridgulp-0.3.1.tar.gz
Upload date: Jul 29, 2025
Size: 278.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for gridgulp-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`e6218bb34d9f6895277bad235419a53ad5c668f219bbda38145ca5e7dbdb4f75`
MD5	`37f803ac196fa44a205003c4012d11d0`
BLAKE2b-256	`e931648835fe1eff97e63742ab0cde276e06613842c90a377d06805c1d9abfee`
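A downloaded file can be checked against the SHA256 digest listed above using Python's standard `hashlib` module. This is a minimal sketch; the function names `sha256_of` and `matches_published_hash` are hypothetical helpers, and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, the expected value would be the SHA256 digest from the table above, for example `matches_published_hash("gridgulp-0.3.1.tar.gz", "e6218bb34d9f6895277bad235419a53ad5c668f219bbda38145ca5e7dbdb4f75")`.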

Provenance

The following attestation bundles were made for gridgulp-0.3.1.tar.gz:

Publisher: release.yml on Ganymede-Bio/gridgulp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gridgulp-0.3.1.tar.gz
- Subject digest: e6218bb34d9f6895277bad235419a53ad5c668f219bbda38145ca5e7dbdb4f75
- Sigstore transparency entry: 327199804
- Sigstore integration time: Jul 29, 2025
Source repository:
- Permalink: Ganymede-Bio/gridgulp@44226bab1b2e978d8c9c51ddae7809a9027f66d6
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Ganymede-Bio
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@44226bab1b2e978d8c9c51ddae7809a9027f66d6
- Trigger Event: workflow_dispatch

File details

Details for the file gridgulp-0.3.1-py3-none-any.whl.

File metadata

Download URL: gridgulp-0.3.1-py3-none-any.whl
Upload date: Jul 29, 2025
Size: 100.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for gridgulp-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5bba280f9931e1f1b2d48cb7bb362e2fb13f96fa4321655e8d8ea90af524b55f`
MD5	`487506c55fa149fbaceb241af9cbe731`
BLAKE2b-256	`ae91319ecb6cf2d8b602c111c2cfd7a4821919090ebbc20272f0d96165e31a43`

Provenance

The following attestation bundles were made for gridgulp-0.3.1-py3-none-any.whl:

Publisher: release.yml on Ganymede-Bio/gridgulp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gridgulp-0.3.1-py3-none-any.whl
- Subject digest: 5bba280f9931e1f1b2d48cb7bb362e2fb13f96fa4321655e8d8ea90af524b55f
- Sigstore transparency entry: 327199864
- Sigstore integration time: Jul 29, 2025
Source repository:
- Permalink: Ganymede-Bio/gridgulp@44226bab1b2e978d8c9c51ddae7809a9027f66d6
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Ganymede-Bio
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@44226bab1b2e978d8c9c51ddae7809a9027f66d6
- Trigger Event: workflow_dispatch

gridgulp 0.3.1
