Extract and normalize Excel workbook artifacts (sheets, connections, formulas) into a lightweight graph.

excelminer

excelminer extracts Excel workbook artifacts into a small, normalized in-memory graph (nodes + edges) that you can serialize to deterministic JSON.

It is designed for inventory, analysis, and reproducible diffs (stable ordering), not for “opening Excel” or evaluating formulas.
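As a rough illustration of why stable ordering matters for diffs, the sketch below serializes two stand-in payloads with sorted keys and compares them line by line. The payloads and the `to_stable_json` helper are hypothetical; this is not how excelminer serializes its graph, only a standard-library sketch of the idea.

```python
import difflib
import json


def to_stable_json(payload: dict) -> str:
    """Serialize with sorted keys and fixed indentation so equal data gives equal text."""
    return json.dumps(payload, sort_keys=True, indent=2, ensure_ascii=False)


# Two stand-in payloads that differ in one value and in key order.
before = {"stats": {"sheet": 2, "connection": 1}, "path": "workbook.xlsx"}
after = {"path": "workbook.xlsx", "stats": {"connection": 1, "sheet": 3}}

diff_lines = list(
    difflib.unified_diff(
        to_stable_json(before).splitlines(),
        to_stable_json(after).splitlines(),
        fromfile="before.json",
        tofile="after.json",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

Because key order is normalized before comparison, the diff reports only the changed sheet count and not the reordered keys.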

Requires Python 3.12+. License: MIT.

What you can extract

From OOXML files (.xlsx/.xlsm/.xltx/.xltm) without Excel installed:

  • sheets
  • defined names
  • connections + basic source inference
  • Power Query queries (when stored as xl/queries/*.xml)
  • Power Query mashup-container detection (best-effort, metadata-only)
  • pivot tables + pivot caches (best-effort)
  • VBA project presence for macro-enabled OOXML (.xlsm/.xltm/.xlam) (metadata-only)
  • formula text + basic dependencies (via openpyxl, when enabled)
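Extraction without Excel is possible because OOXML workbooks are ZIP archives of XML parts. The sketch below groups archive member names by prefix using only the standard library. The `xl/queries/` prefix comes from the list above; the other prefixes are conventional OOXML part locations and are assumptions about what to look for, and `inventory_parts` is a hypothetical helper, not part of excelminer.

```python
import io
import zipfile

# Part-name prefixes to look for. Only xl/queries/ is named in this README;
# the rest are conventional OOXML locations (assumptions for illustration).
PART_HINTS = {
    "connections": "xl/connections.xml",
    "power_query": "xl/queries/",
    "pivot_tables": "xl/pivotTables/",
    "pivot_caches": "xl/pivotCache/",
    "vba_project": "xl/vbaProject.bin",
    "worksheets": "xl/worksheets/",
}


def inventory_parts(source) -> dict[str, list[str]]:
    """Group archive member names by the hint prefix they start with."""
    found: dict[str, list[str]] = {label: [] for label in PART_HINTS}
    with zipfile.ZipFile(source) as archive:
        for name in sorted(archive.namelist()):
            for label, prefix in PART_HINTS.items():
                if name.startswith(prefix):
                    found[label].append(name)
    return found


# Demo on a tiny in-memory archive that only mimics a workbook's layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")
    archive.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
    archive.writestr("xl/vbaProject.bin", b"\x00")
buffer.seek(0)

print(inventory_parts(buffer))
```

`inventory_parts` also accepts a file path, so the same call would work on a real workbook on disk.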

Optional enrichment:

  • used-range “value blocks” via calamine (fast scanning)
  • Windows Excel COM automation (for legacy formats like .xls/.xlsb and opt-in enrichment for modern OOXML)

Install

Base install:

pip install excelminer

Optional extras:

pip install "excelminer[calamine]"  # pandas + python-calamine
pip install "excelminer[com]"       # Windows + Microsoft Excel required
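To check which optional dependencies are importable before enabling the matching backends, something like the following may help. The module names are assumptions: python-calamine is imported as `python_calamine`, and COM automation from Python is commonly provided by pywin32 (`win32com`), but this README does not state which packages the `com` extra installs.

```python
import importlib.util


def available(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Import names per extra (assumed, see the note above).
OPTIONAL_MODULES = {
    "calamine": ["pandas", "python_calamine"],
    "com": ["win32com"],
}

for extra, modules in OPTIONAL_MODULES.items():
    missing = [m for m in modules if not available(m)]
    status = "ready" if not missing else f"missing {', '.join(missing)}"
    print(f"excelminer[{extra}]: {status}")
```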

Quickstart

JSON output

from excelminer import AnalysisOptions, analyze_to_dict

result = analyze_to_dict(
    "workbook.xlsx",
    options=AnalysisOptions(include_formulas=True),
)

print(result["graph"]["stats"])          # counts by node kind
print(result["reports"][0]["backend"])    # per-backend reports

Graph output

from excelminer import AnalysisOptions, analyze_workbook

graph, reports, ctx = analyze_workbook(
    "workbook.xlsx",
    options=AnalysisOptions(include_formulas=True),
)

print(graph.stats())
print([r.backend for r in reports])
print(ctx.issues)

Output shape (high level)

analyze_to_dict() returns:

  • path, options, issues
  • reports: per-backend stats/issues
  • graph: { nodes: [...], edges: [...], stats: {...} }

Common node kinds include: sheet, connection, source, powerquery, pivot_table, pivot_cache, vba_project, formula_cell, cell_block.
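A small sketch of working with the `graph` section: tallying nodes per kind from the node list. It assumes each node is a dict with a `kind` key, which this README implies but does not spell out (see docs/OUTPUT.md for the actual schema); `count_kinds` and the sample data are hypothetical.

```python
from collections import Counter


def count_kinds(graph: dict, kind_key: str = "kind") -> dict[str, int]:
    """Tally nodes per kind, returned in sorted key order for stable output."""
    counts = Counter(node.get(kind_key, "unknown") for node in graph.get("nodes", []))
    return dict(sorted(counts.items()))


# Stand-in for result["graph"]; real node dicts will carry more fields.
sample_graph = {
    "nodes": [
        {"id": "n1", "kind": "sheet"},
        {"id": "n2", "kind": "sheet"},
        {"id": "n3", "kind": "connection"},
    ],
    "edges": [],
}

print(count_kinds(sample_graph))
```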

Default backend pipeline

By default, backends run in this order:

  1. OOXML zip parsing (structure)
  2. VBA projects (macro detection for .xlsm/.xltm/.xlam)
  3. Power Query (queries XML + mashup-container detection)
  4. Pivot tables (pivots + caches)
  5. Calamine (used-range/value blocks; optional)
  6. openpyxl (formula text)
  7. Excel COM (Windows-only enrichment; opt-in for modern OOXML)

You can override the pipeline via the backends= argument.
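Backend order can matter when a later backend builds on what an earlier one found. The toy pipeline below illustrates that idea only; the `Backend` type, both backend functions, and `run_pipeline` are hypothetical and do not reflect excelminer's backend interface or the values accepted by `backends=` (see docs/BACKENDS.md for those).

```python
from typing import Callable

# Hypothetical backend type: a function that adds nodes to a shared graph dict.
Backend = Callable[[dict], None]


def structure_backend(graph: dict) -> None:
    """Pretend to discover one sheet from the workbook structure."""
    graph["nodes"].append({"kind": "sheet", "name": "Sheet1"})


def formula_backend(graph: dict) -> None:
    """Attach one formula node per sheet that an earlier backend discovered."""
    sheets = [n for n in graph["nodes"] if n["kind"] == "sheet"]
    for sheet in sheets:
        graph["nodes"].append({"kind": "formula_cell", "sheet": sheet["name"]})


def run_pipeline(backends: list[Backend]) -> dict:
    """Run each backend in order against the same graph."""
    graph: dict = {"nodes": []}
    for backend in backends:
        backend(graph)
    return graph


in_order = run_pipeline([structure_backend, formula_backend])
reversed_order = run_pipeline([formula_backend, structure_backend])
print(len(in_order["nodes"]), len(reversed_order["nodes"]))
```

In the reversed run the formula backend sees no sheets yet, so it contributes nothing.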

Security & privacy notes

  • Connection parsing produces a sanitized key/value view in connection_kv, with sensitive fields (password, user id, and similar) masked.
  • The raw connection string may also be stored in connection.raw.

Treat the output JSON as potentially sensitive. If you don’t need connections, use AnalysisOptions(include_connections=False).
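To make the idea of a masked key/value view concrete, here is a minimal sketch that splits a connection string on semicolons and masks a few sensitive keys. It is not excelminer's implementation: the key list, the mask text, and `sanitize_connection_string` are assumptions, and the naive split does not handle quoted values that contain semicolons.

```python
# Keys treated as sensitive in this sketch (an assumed, incomplete list).
SENSITIVE_KEYS = {"password", "pwd", "user id", "uid"}


def sanitize_connection_string(raw: str, mask: str = "***") -> dict[str, str]:
    """Parse 'Key=Value;...' pairs and mask the values of sensitive keys."""
    pairs: dict[str, str] = {}
    for part in raw.split(";"):
        if "=" not in part:
            continue  # skip empty segments and bare tokens
        key, value = part.split("=", 1)
        key = key.strip()
        pairs[key] = mask if key.lower() in SENSITIVE_KEYS else value.strip()
    return pairs


example = "Provider=SQLOLEDB;Data Source=srv01;User ID=report;Password=hunter2;"
print(sanitize_connection_string(example))
```

Masking the parsed view does not remove secrets stored elsewhere, which is why the raw string still needs to be treated as sensitive.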

Documentation (in this repo)

  • docs/README.md: documentation index
  • docs/USAGE.md: usage patterns + backend ordering
  • docs/OPTIONS.md: AnalysisOptions flags and limits
  • docs/BACKENDS.md: backend behavior and requirements
  • docs/OUTPUT.md: output schema and common node/edge kinds
  • docs/SECURITY.md: security & privacy notes
  • docs/DEVELOPMENT.md: tests, COM opt-in, coverage profiles

Development notes

COM integration tests are opt-in because invoking Excel COM can crash the Python process in some environments.

PowerShell:

$env:EXCELMINER_RUN_COM_TESTS='1'
pytest -m integration
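An opt-in gate of this kind can be sketched with the standard library alone. The example below skips a test class unless the environment variable is set; treating only the value `1` as enabled is an assumption, and `com_tests_enabled` and `ComSmokeTest` are hypothetical names, not the project's actual test code. pytest collects `unittest` test cases, so a class like this would still be picked up, although the `integration` marker used above would need to be applied separately.

```python
import os
import unittest

ENV_FLAG = "EXCELMINER_RUN_COM_TESTS"


def com_tests_enabled(environ=None) -> bool:
    """Return True only when the opt-in flag is set to '1' (assumed convention)."""
    env = os.environ if environ is None else environ
    return env.get(ENV_FLAG) == "1"


@unittest.skipUnless(com_tests_enabled(), f"set {ENV_FLAG}=1 to run COM tests")
class ComSmokeTest(unittest.TestCase):
    def test_placeholder(self):
        # A real test would drive Excel through COM here.
        self.assertTrue(com_tests_enabled())
```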

Download files

Source Distribution

excelminer-0.0.1.tar.gz (34.0 kB)

Uploaded Source

Built Distribution

excelminer-0.0.1-py3-none-any.whl (31.0 kB)

Uploaded Python 3

File details

Details for the file excelminer-0.0.1.tar.gz.

File metadata

  • Download URL: excelminer-0.0.1.tar.gz
  • Upload date:
  • Size: 34.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for excelminer-0.0.1.tar.gz
Algorithm     Hash digest
SHA256        c0f9e51581ea0cebf664c9e31bd1a3c7ee2b7a5275c339a2ab97f22569b05dac
MD5           d9dc321fdce2646f9eda67a429ed139d
BLAKE2b-256   17627d83a06cc980f8c9a998ca717c30f0dc79b68b216d390609b8c9945c7f77

Provenance

The following attestation bundles were made for excelminer-0.0.1.tar.gz:

Publisher: publish.yml on brentwc/excelminer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file excelminer-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: excelminer-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 31.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for excelminer-0.0.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        b00d2129a580a97de1b07f07b50078480698ccb48cf4a15abc8318675722ea8a
MD5           4d6141dc09b711f54a213eb0ae8e02ed
BLAKE2b-256   413d56cf9be293e57ae320ebcc609428b10da5f0abde07c40ed5138548dbfe12

Provenance

The following attestation bundles were made for excelminer-0.0.1-py3-none-any.whl:

Publisher: publish.yml on brentwc/excelminer
