Extract and normalize Excel workbook artifacts (sheets, connections, formulas) into a lightweight graph.
excelminer
excelminer extracts Excel workbook artifacts into a small, normalized in-memory graph (nodes + edges) that you can serialize to deterministic JSON.
It is designed for inventory, analysis, and reproducible diffs (stable ordering), not for “opening Excel” or evaluating formulas.
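To illustrate why stable ordering matters for diffs, here is a minimal sketch using only the standard library. The dictionaries are made-up stand-ins for a serialized graph, not real excelminer output, and `to_stable_json` is a hypothetical helper rather than part of the package.

```python
import json


def to_stable_json(payload: dict) -> str:
    """Serialize with sorted keys so equal data always yields equal text."""
    return json.dumps(payload, sort_keys=True, indent=2, ensure_ascii=False)


# Same content, different key insertion order (hypothetical node fields).
a = {"nodes": [{"id": "sheet:Data", "kind": "sheet"}], "edges": []}
b = {"edges": [], "nodes": [{"kind": "sheet", "id": "sheet:Data"}]}

same_text = to_stable_json(a) == to_stable_json(b)
print(same_text)
```

Note that `sort_keys` only orders dictionary keys. List order (for example, the order of nodes) has to be made stable by whatever produces the data.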
What you can extract
From OOXML files (.xlsx/.xlsm/.xltx/.xltm) without Excel installed:
- sheets
- defined names
- connections + basic source inference
- Power Query queries (when stored as xl/queries/*.xml)
- Power Query mashup-container detection (best-effort, metadata-only)
- pivot tables + pivot caches (best-effort)
- VBA project presence for macro-enabled OOXML (.xlsm/.xltm/.xlam) (metadata-only)
- formula text + basic dependencies (via openpyxl, when enabled)
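Extraction without Excel is possible because an OOXML workbook is a zip archive of named parts. The sketch below is not excelminer's implementation; it uses only `zipfile` on a tiny in-memory archive that mimics OOXML part names (it is not a valid workbook). The helper names `list_parts` and `build_demo_package` are hypothetical.

```python
import io
import zipfile
from fnmatch import fnmatchcase


def list_parts(data: bytes) -> list[str]:
    """Return the sorted part names inside a zip-based package."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return sorted(zf.namelist())


def build_demo_package() -> bytes:
    """Build an in-memory zip that only mimics OOXML part names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/workbook.xml", "<workbook/>")
        zf.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
        zf.writestr("xl/queries/query1.xml", "<query/>")
        zf.writestr("xl/vbaProject.bin", b"")
    return buf.getvalue()


parts = list_parts(build_demo_package())
# Power Query parts stored as xl/queries/*.xml
query_parts = [p for p in parts if fnmatchcase(p, "xl/queries/*.xml")]
# Macro-enabled packages carry a binary VBA project part.
has_vba = "xl/vbaProject.bin" in parts
print(query_parts, has_vba)
```

Checking for part names like this yields metadata only: it shows that a VBA project or query exists, not what it contains.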
Optional enrichment:
- used-range “value blocks” via calamine (fast scanning)
- Windows Excel COM automation (for legacy formats like .xls/.xlsb and opt-in enrichment for modern OOXML)
Install
Base install:
pip install excelminer
Optional extras:
pip install "excelminer[calamine]" # pandas + python-calamine
pip install "excelminer[com]" # Windows + Microsoft Excel required
Quickstart
JSON output
from excelminer import AnalysisOptions, analyze_to_dict
result = analyze_to_dict(
    "workbook.xlsx",
    options=AnalysisOptions(include_formulas=True),
)
print(result["graph"]["stats"]) # counts by node kind
print(result["reports"][0]["backend"]) # per-backend reports
Graph output
from excelminer import AnalysisOptions, analyze_workbook
graph, reports, ctx = analyze_workbook(
    "workbook.xlsx",
    options=AnalysisOptions(include_formulas=True),
)
print(graph.stats())
print([r.backend for r in reports])
print(ctx.issues)
Output shape (high level)
analyze_to_dict() returns:
- path, options, issues
- reports: per-backend stats/issues
- graph: { nodes: [...], edges: [...], stats: {...} }
Common node kinds include: sheet, connection, source, powerquery, pivot_table, pivot_cache, vba_project, formula_cell, cell_block.
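As a rough illustration of working with the nodes list, the sketch below tallies nodes by kind. The graph is a hand-written stand-in, and the per-node keys `id` and `kind` are assumptions made for this example; see docs/OUTPUT.md for the actual schema. `count_by_kind` is a hypothetical helper.

```python
from collections import Counter

# Hand-written stand-in for result["graph"]; "id" and "kind" are assumed keys.
graph = {
    "nodes": [
        {"id": "n1", "kind": "sheet"},
        {"id": "n2", "kind": "sheet"},
        {"id": "n3", "kind": "connection"},
        {"id": "n4", "kind": "formula_cell"},
    ],
    "edges": [],
}


def count_by_kind(graph: dict) -> dict[str, int]:
    """Count nodes per kind, returned in sorted key order for stable output."""
    counts = Counter(node["kind"] for node in graph["nodes"])
    return dict(sorted(counts.items()))


kind_counts = count_by_kind(graph)
print(kind_counts)
```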
Default backend pipeline
By default, backends run in this order:
- OOXML zip parsing (structure)
- VBA projects (macro detection for .xlsm/.xltm/.xlam)
- Power Query (queries XML + mashup-container detection)
- Pivot tables (pivots + caches)
- Calamine (used-range/value blocks; optional)
- openpyxl (formula text)
- Excel COM (Windows-only enrichment; opt-in for modern OOXML)
You can override the pipeline via the backends= argument.
Security & privacy notes
- Connection parsing produces a sanitized key/value view (password, user id, etc. masked) in connection_kv.
- The raw connection string may also be stored in connection.raw.
Treat the output JSON as potentially sensitive. If you don’t need connections, use AnalysisOptions(include_connections=False).
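To make the idea of a masked key/value view concrete, here is a minimal sketch of the general technique. It is not excelminer's parser: the key list, the `***` mask, and the function name `sanitize_connection_string` are all assumptions chosen for illustration.

```python
# Keys whose values are masked; this set is an assumption for the example.
SENSITIVE_KEYS = {"password", "pwd", "user id", "uid"}


def sanitize_connection_string(raw: str) -> dict[str, str]:
    """Split 'key=value;key=value' pairs and mask values of sensitive keys."""
    result: dict[str, str] = {}
    for part in raw.split(";"):
        if "=" not in part:
            continue  # skip empty or malformed segments
        key, value = part.split("=", 1)
        key = key.strip()
        value = value.strip()
        if key.lower() in SENSITIVE_KEYS:
            value = "***"
        result[key] = value
    return result


raw = "Provider=SQLOLEDB;Data Source=db01;User ID=alice;Password=s3cret"
masked = sanitize_connection_string(raw)
print(masked)
```

This naive split does not handle quoted values that contain semicolons, which is one reason a sanitized view alone should not be treated as safe to share.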
Documentation (in this repo)
- docs/README.md: documentation index
- docs/USAGE.md: usage patterns + backend ordering
- docs/OPTIONS.md: AnalysisOptions flags and limits
- docs/BACKENDS.md: backend behavior and requirements
- docs/OUTPUT.md: output schema and common node/edge kinds
- docs/SECURITY.md: security & privacy notes
- docs/DEVELOPMENT.md: tests, COM opt-in, coverage profiles
Development notes
COM integration tests are opt-in because some environments can crash the Python process when Excel COM is invoked.
PowerShell:
$env:EXCELMINER_RUN_COM_TESTS='1'
pytest -m integration