ExStruct — Excel Structured Extraction Engine
ExStruct reads Excel workbooks and outputs structured data (tables, shapes, charts) as JSON by default, with optional YAML/TOON formats. It targets both COM/Excel environments (rich extraction) and non-COM environments (cells + table candidates), with tunable detection heuristics and multiple output modes to fit LLM/RAG pipelines.
Features
- Excel → Structured JSON: cells, shapes, charts, and table candidates per sheet.
- Output modes: light (cells + table candidates only), standard (texted shapes + arrows, charts), verbose (all shapes with width/height).
- Formats: JSON (compact by default, --pretty available), YAML, TOON (optional dependencies).
- Table detection tuning: adjust heuristics at runtime via API.
- CLI rendering (Excel required): optional PDF and per-sheet PNGs.
- Graceful fallback: if Excel COM is unavailable, extraction falls back to cells + table candidates without crashing.
Installation
pip install exstruct
Optional extras:
- YAML: pip install pyyaml
- TOON: pip install python-toon
- Rendering (PDF/PNG): Excel + pip install pypdfium2
Quick Start (CLI)
exstruct input.xlsx # compact JSON (default)
exstruct input.xlsx --pretty # pretty-printed JSON
exstruct input.xlsx --format yaml # YAML (needs pyyaml)
exstruct input.xlsx --format toon # TOON (needs python-toon)
exstruct input.xlsx --mode light # cells + table candidates only
exstruct input.xlsx --pdf --image # PDF and PNGs (Excel required)
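When the CLI is driven from a script, it can help to assemble the argument list in one place. The helper below is a minimal sketch using only the flags shown above; build_exstruct_argv is a hypothetical name, not part of ExStruct, and it assumes --mode accepts standard and verbose in addition to the light value shown.

```python
from typing import List, Optional


def build_exstruct_argv(
    input_path: str,
    fmt: str = "json",
    mode: Optional[str] = None,
    pretty: bool = False,
    pdf: bool = False,
    image: bool = False,
) -> List[str]:
    """Assemble an argv list for the exstruct CLI (hypothetical helper)."""
    if fmt not in {"json", "yaml", "toon"}:
        raise ValueError(f"unsupported format: {fmt}")
    # Assumption: --mode takes the three mode names listed under Features.
    if mode is not None and mode not in {"light", "standard", "verbose"}:
        raise ValueError(f"unsupported mode: {mode}")

    argv = ["exstruct", input_path]
    if fmt != "json":  # JSON is the default, so no flag is passed for it
        argv += ["--format", fmt]
    if mode is not None:
        argv += ["--mode", mode]
    if pretty:
        argv.append("--pretty")
    if pdf:
        argv.append("--pdf")
    if image:
        argv.append("--image")
    return argv
```

The returned list can be handed to subprocess.run as-is, which avoids shell quoting problems with workbook paths that contain spaces.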
Quick Start (Python)
from pathlib import Path
from exstruct import extract, export, set_table_detection_params
# Tune table detection (optional)
set_table_detection_params(table_score_threshold=0.3, density_min=0.04)
# Extract with modes: "light", "standard", "verbose"
wb = extract("input.xlsx", mode="standard")
export(wb, Path("out.json"), pretty=False) # compact JSON
Table Detection Tuning
from exstruct import set_table_detection_params
set_table_detection_params(
    table_score_threshold=0.35,  # increase to be stricter
    density_min=0.05,
    coverage_min=0.2,
    min_nonempty_cells=3,
)
Use higher thresholds to reduce false positives; lower them if true tables are missed.
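To make the effect of the thresholds concrete, here is a small self-contained sketch of threshold filtering. It is not ExStruct's detection algorithm: the Candidate class, the metric values, and the use of >= comparisons are all assumptions made for illustration, and only the parameter names are taken from the call above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-region metrics; not ExStruct's internal data model."""
    name: str
    score: float
    density: float
    coverage: float
    nonempty_cells: int


def keep_candidates(
    candidates: List[Candidate],
    table_score_threshold: float = 0.35,
    density_min: float = 0.05,
    coverage_min: float = 0.2,
    min_nonempty_cells: int = 3,
) -> List[str]:
    """Return the names of regions that clear every threshold."""
    return [
        c.name
        for c in candidates
        if c.score >= table_score_threshold
        and c.density >= density_min
        and c.coverage >= coverage_min
        and c.nonempty_cells >= min_nonempty_cells
    ]


# Made-up regions: one clear table, one borderline table, one stray note block.
regions = [
    Candidate("real_table", score=0.80, density=0.90, coverage=0.95, nonempty_cells=40),
    Candidate("sparse_table", score=0.32, density=0.30, coverage=0.50, nonempty_cells=12),
    Candidate("stray_notes", score=0.10, density=0.04, coverage=0.10, nonempty_cells=2),
]

strict = keep_candidates(regions)                              # thresholds from the call above
loose = keep_candidates(regions, table_score_threshold=0.30)   # lower score threshold
print(strict)
print(loose)
```

With these made-up numbers, the stricter setting keeps only the clear table, while lowering the score threshold also admits the borderline one. That is the trade-off between false positives and missed tables.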
Output Modes
- light: cells + table candidates (no COM needed).
- standard: texted shapes + arrows, charts (COM if available), table candidates.
- verbose: all shapes (with width/height), charts, table candidates.
Error Handling / Fallbacks
- Excel COM unavailable → falls back to cells + table candidates; shapes/charts empty.
- Shape extraction failure → logs warning, still returns cells + table candidates.
- CLI prints errors to stdout/stderr and returns non-zero on failures.
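Because the CLI signals failure through its exit code, a calling script can branch on that rather than parsing messages. The wrapper below is a sketch built on the standard library's subprocess module; run_cli is a hypothetical helper, and treating a missing executable as an ordinary failure is a design choice of this example, not documented ExStruct behaviour.

```python
import subprocess
import sys
from typing import List, Tuple


def run_cli(argv: List[str], echo_errors: bool = False) -> Tuple[bool, str, str]:
    """Run a command and return (succeeded, stdout, stderr)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True)
    except FileNotFoundError as exc:
        # The executable is not installed or not on PATH.
        ok, out, err = False, "", str(exc)
    else:
        # A non-zero exit code is treated as failure.
        ok, out, err = proc.returncode == 0, proc.stdout, proc.stderr
    if echo_errors and not ok:
        sys.stderr.write(err)
    return ok, out, err
```

A caller might run run_cli(["exstruct", "input.xlsx", "--mode", "light"]) and parse the returned stdout only when the first element of the result is True.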
Optional Rendering
Requires Excel and pypdfium2.
exstruct input.xlsx --pdf --image --dpi 144
Creates <output>.pdf and an <output>_images/ directory containing one PNG per sheet.
Notes
- Default JSON is compact to reduce tokens; use --pretty or pretty=True when readability matters.
- Field table_candidates replaces tables; adjust downstream consumers accordingly.
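Both notes can be illustrated with the standard library alone. The sketch below compares compact and indented serialization of the same data and shows one way a downstream consumer might tolerate either field name during a migration. The sheet dictionary and the get_table_candidates helper are hypothetical; ExStruct's actual output schema and serializer settings are not shown here.

```python
import json
from typing import Any, Dict, List


def get_table_candidates(sheet: Dict[str, Any]) -> List[Any]:
    """Read table candidates from a sheet dict, accepting the older key too."""
    if "table_candidates" in sheet:
        return sheet["table_candidates"]
    return sheet.get("tables", [])  # older field name, if still present


# Hypothetical sheet payload, used only to compare the two serializations.
sheet = {"cells": [["Name", "Qty"], ["Bolt", 4]], "table_candidates": ["A1:B2"]}

compact = json.dumps(sheet, separators=(",", ":"))  # no extra whitespace
pretty = json.dumps(sheet, indent=2)                # easier to read, longer

print(len(compact), len(pretty))
print(get_table_candidates(sheet))
```

Both strings decode to the same data; the compact form is simply shorter, which matters when the output is pasted into an LLM prompt.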
License
BSD-3-Clause. See LICENSE for details.
Documentation
- API Reference (GitHub Pages): https://harumiweb.github.io/exstruct/