Excel to Markdown converter with CSV markdown output support

excel2md

Excel to Markdown converter. Reads Excel workbooks (.xlsx/.xlsm) and automatically converts them to Markdown.

Features

  • Smart Table Detection: Automatically detects Excel print areas and converts them to Markdown tables
  • CSV Markdown Output: Exports entire sheets in CSV format with validation metadata
  • Image Extraction: Extracts images from Excel files and outputs them as Markdown image links
  • Mermaid Flowcharts: Generates Mermaid diagrams from Excel shapes and tables
  • Hyperlink Support: Multiple output modes (inline, footnote, plain text)
  • Split by Sheet: Generate individual files per sheet
  • Customizable: Detailed settings for formatting, alignment, and data processing

Use Cases

  • Document Generation: Convert Excel specifications to Markdown
  • AI/LLM Processing: CSV markdown format optimized for token efficiency
  • Flowchart Extraction: Extract diagrams from Excel shapes
  • Data Migration: Export Excel data to portable Markdown format
  • Version Control: Track Excel changes in text-based format

Setup

Requirements

  • Python 3.10 or higher
  • uv package manager

Install Dependencies

# Install uv (if not already installed)
# Details: https://docs.astral.sh/uv/getting-started/installation/
curl -LsSf https://astral.sh/uv/install.sh | sh

uv sync

Usage

uv run python v2.2.0/excel_to_md.py input.xlsx

This generates:

  • input_csv.md: CSV markdown format (default)
  • input_images/: Image directory (if images exist)

Note

  • Output filenames and directories are based on the input filename (e.g., input.xlsx → input_csv.md, input_images/)
  • Output is saved in the same directory as the input file (use --csv-output-dir to change this)
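
The naming rule in the note above can be sketched in a few lines of Python. This is only an illustration: `default_output_paths` is a hypothetical helper, not part of excel2md, and it assumes the `_csv.md` and `_images` suffixes are simply appended to the input file's stem.

```python
from pathlib import Path


def default_output_paths(input_path, csv_output_dir=None):
    """Return (csv_markdown_path, image_dir) for an input workbook.

    Hypothetical helper for illustration; not excel2md's own API.
    """
    src = Path(input_path)
    # Outputs land next to the input file unless an output directory is given
    out_dir = Path(csv_output_dir) if csv_output_dir else src.parent
    return out_dir / f"{src.stem}_csv.md", out_dir / f"{src.stem}_images"


csv_md, images = default_output_paths("reports/input.xlsx")
print(csv_md)
print(images)
```

Passing `csv_output_dir="./output"` moves both paths under `./output`, mirroring what `--csv-output-dir` is described as doing.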

Common Examples

Convert with Mermaid flowchart support:

uv run python v2.2.0/excel_to_md.py input.xlsx --mermaid-enabled

Generate individual files per sheet:

uv run python v2.2.0/excel_to_md.py input.xlsx --split-by-sheet

Specify CSV markdown output directory:

uv run python v2.2.0/excel_to_md.py input.xlsx --csv-output-dir ./output
# CSV markdown: ./output/input_csv.md
# Images: ./output/input_images/

Output standard Markdown only (no CSV output):

uv run python v2.2.0/excel_to_md.py input.xlsx -o output.md --no-csv-markdown-enabled

Plain text hyperlinks (no Markdown syntax):

uv run python v2.2.0/excel_to_md.py input.xlsx --hyperlink-mode inline_plain

Reduce token count (exclude CSV summary section):

uv run python v2.2.0/excel_to_md.py input.xlsx --no-csv-include-description

Key Options

Output Control

| Option | Default | Description |
| --- | --- | --- |
| `--split-by-sheet` | false | Generate individual files per sheet |
| `--csv-markdown-enabled` | true | Enable CSV markdown output |
| `--csv-output-dir` | Same as input | Output directory for CSV markdown and images |
| `--csv-include-description` | true | Include summary section in CSV output |
| `--csv-include-metadata` | true | Include validation metadata in CSV output |
| `--image-extraction` | true | Enable image extraction |
| `-o`, `--output` | - | Output file path for standard Markdown |
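
When converting many workbooks from a script, the options above can be assembled programmatically. The sketch below is an assumption about how one might wrap the CLI. `build_command` is a hypothetical helper, and it uses only flags that appear in this README.

```python
import shlex


def build_command(input_file, *, split_by_sheet=False, csv_output_dir=None,
                  include_description=True, mermaid=False):
    """Assemble the argument list for one conversion run (hypothetical helper)."""
    cmd = ["uv", "run", "python", "v2.2.0/excel_to_md.py", input_file]
    if split_by_sheet:
        cmd.append("--split-by-sheet")
    if csv_output_dir is not None:
        cmd += ["--csv-output-dir", csv_output_dir]
    if not include_description:
        cmd.append("--no-csv-include-description")
    if mermaid:
        cmd.append("--mermaid-enabled")
    return cmd


cmd = build_command("input.xlsx", csv_output_dir="./output",
                    include_description=False)
print(shlex.join(cmd))
# To execute the command: subprocess.run(cmd, check=True)
```

Building a list rather than a shell string avoids quoting problems when filenames contain spaces.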

Hyperlink Formats

| Mode | Description | Output Example |
| --- | --- | --- |
| inline | Markdown format | `[text](URL)` |
| inline_plain | Plain text format | `text (URL)` |
| footnote | Footnote format | `[text][^1]` + `[^1]: URL` |
| text_only | Display text only | `text` |
| both | Inline + footnote | Both formats |
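
To make the five modes concrete, here is a minimal sketch that renders one hyperlink per mode. `render_link` is a hypothetical function, not excel2md's implementation, and the `both` format is inferred from the sample output later in this README (`[Supplier](https://example.com)[^1]`).

```python
def render_link(text, url, mode="inline", note_id=1):
    """Return (cell_text, footnote_definition_or_None) for one hyperlink.

    Illustrative only; excel2md's internal names and details may differ.
    """
    footnote = f"[^{note_id}]: {url}"
    if mode == "inline":
        return f"[{text}]({url})", None
    if mode == "inline_plain":
        return f"{text} ({url})", None
    if mode == "footnote":
        return f"[{text}][^{note_id}]", footnote
    if mode == "text_only":
        return text, None
    if mode == "both":
        # Assumed shape: inline link followed by a footnote marker
        return f"[{text}]({url})[^{note_id}]", footnote
    raise ValueError(f"unknown hyperlink mode: {mode}")


for mode in ("inline", "inline_plain", "footnote", "text_only", "both"):
    print(mode, render_link("Supplier", "https://example.com", mode))
```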

Mermaid Flowcharts

| Option | Default | Description |
| --- | --- | --- |
| `--mermaid-enabled` | false | Enable Mermaid conversion |
| `--mermaid-detect-mode` | shapes | Detection mode: shapes, column_headers, heuristic |
| `--mermaid-direction` | TD | Flowchart direction: TD, LR, BT, RL |
| `--mermaid-keep-source-table` | true | Output original table along with Mermaid |
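
As a rough illustration of what Mermaid conversion produces, the sketch below turns a list of (source, target) pairs into flowchart text. It is not excel2md's detection logic. `to_mermaid` and the `N1`, `N2` node identifiers are assumptions made for this example, and labels containing double quotes are not handled.

```python
def to_mermaid(edges, direction="TD"):
    """Render (source, target) label pairs as Mermaid flowchart text (sketch)."""
    if direction not in {"TD", "LR", "BT", "RL"}:
        raise ValueError(f"unsupported direction: {direction}")
    ids = {}
    lines = [f"flowchart {direction}"]
    for src, dst in edges:
        for label in (src, dst):
            if label not in ids:
                # Declare each node once, with a generated identifier
                ids[label] = f"N{len(ids) + 1}"
                lines.append(f'    {ids[label]}["{label}"]')
        lines.append(f"    {ids[src]} --> {ids[dst]}")
    return "\n".join(lines)


print(to_mermaid([("Start", "Review"), ("Review", "Done")], direction="LR"))
```

The `direction` argument corresponds to the values accepted by `--mermaid-direction`.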

Table Processing

| Option | Default | Description |
| --- | --- | --- |
| `--header-detection` | first_row | Treat first row as header |
| `--align-detection` | numbers_right | Right-align numeric columns |
| `--max-cells-per-table` | 200000 | Maximum cells per table |
| `--no-print-area-mode` | used_range | Behavior when print area not set |
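
The `first_row` and `numbers_right` defaults can be illustrated with a small sketch. It assumes every cell has already been converted to a string and treats a column as numeric when all of its non-empty body cells parse as numbers. `to_markdown_table` is a hypothetical name, and the real detection rules may be more elaborate.

```python
def is_number(value):
    """Return True if the string parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def to_markdown_table(rows):
    """Render rows of strings as a Markdown table (first row is the header)."""
    header, *body = rows
    seps = []
    for col in range(len(header)):
        values = [row[col] for row in body if row[col] != ""]
        numeric = bool(values) and all(is_number(v) for v in values)
        seps.append("---:" if numeric else "---")  # right-align numeric columns
    lines = ["| " + " | ".join(header) + " |", "| " + " | ".join(seps) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


rows = [["Item", "Quantity", "Notes"],
        ["Apple", "10", "Supplier"],
        ["Orange", "5", ""]]
print(to_markdown_table(rows))
```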

Output Examples

Standard Markdown Output

# Conversion Result: sample.xlsx

- Spec Version: 2.0
- Sheet Count: 2
- Sheet List: Sheet1, Summary

---

## Sheet1

### Table 1 (A1:C4)
| Item | Quantity | Notes |
| --- | ---: | --- |
| Apple | 10 | [Supplier](https://example.com)[^1] |
| Orange | 5 |  |

[^1]: https://example.com

CSV Markdown Output

# CSV Output: sample.xlsx

## Summary

### File Information
- Original Excel filename: sample.xlsx
- Sheet count: 2
- Generated at: 2025-01-05 10:00:00

### About This File
This CSV markdown file is designed to help AI understand Excel content...

---

## Sheet1

```csv
Item,Quantity,Notes
Apple,10,Supplier
Orange,5,
```

---

## Validation Metadata

- **Generated at**: 2025-01-05 10:00:00
- **Original Excel file**: sample.xlsx
- **Validation status**: OK
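
The per-sheet section of the sample above (a heading followed by a fenced `csv` block) could be produced along these lines. This is a sketch using Python's standard `csv` module, not the tool's actual writer, and `sheet_to_csv_block` is a hypothetical name.

```python
import csv
import io

FENCE = "`" * 3  # built at runtime so this listing contains no literal fence


def sheet_to_csv_block(sheet_name, rows):
    """Return a Markdown section holding the sheet's rows as fenced CSV."""
    buf = io.StringIO()
    # csv.writer quotes cells that contain commas, quotes, or newlines
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return f"## {sheet_name}\n\n{FENCE}csv\n{buf.getvalue()}{FENCE}"


print(sheet_to_csv_block("Sheet1", [["Item", "Quantity", "Notes"],
                                    ["Apple", "10", "Supplier"],
                                    ["Orange", "5", ""]]))
```

Delegating quoting to the `csv` module keeps cells that contain commas or quotes from corrupting the block.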

Image Extraction

Images in Excel files are automatically processed:

  1. Automatic Extraction: Images from each sheet are saved as external files

    • Filename format: {sheet_name}_img_{number}.{extension}
    • Example: Sheet1_img_1.png, Sheet1_img_2.jpg
  2. Save Location: Output to same directory as CSV markdown

    • Directory name: {input_filename}_images/
    • Example: input.xlsx → input_images/ directory
    • Use --csv-output-dir option to change output location
  3. Markdown Links: Generates Markdown image links for cells with images

    • Format: ![alt text](relative_path)
    • Uses cell value as alt text if available
    • Auto-generates alt text like Image at A1 if cell is empty
  4. Supported Formats: PNG, JPEG, GIF

Example:

If a company logo image is placed at cell B2:

  • Image file: saved as input_images/Sheet1_img_1.png
  • CSV output: ![Company Logo](input_images/Sheet1_img_1.png)
  • Cell text "Company Logo" is used as alt text
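
The naming and alt-text rules above can be expressed as a short sketch. Both functions are hypothetical helpers written for this README, not excel2md's API. They assume the image directory sits next to the CSV markdown file, so the relative path starts with `{input_filename}_images/`.

```python
def image_filename(sheet_name, number, extension):
    """Build a name in the {sheet_name}_img_{number}.{extension} format."""
    return f"{sheet_name}_img_{number}.{extension}"


def image_link(input_stem, sheet_name, number, extension, cell_ref,
               cell_value=None):
    """Return the Markdown image link for an image anchored at cell_ref."""
    # Use the cell value as alt text, or fall back to a generated label
    alt = cell_value if cell_value else f"Image at {cell_ref}"
    path = f"{input_stem}_images/{image_filename(sheet_name, number, extension)}"
    return f"![{alt}]({path})"


print(image_link("input", "Sheet1", 1, "png", "B2", "Company Logo"))
print(image_link("input", "Sheet1", 2, "jpg", "A1"))
```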

Advanced Options

List all options:

uv run python v2.2.0/excel_to_md.py --help

Key advanced options:

  • Cell merge policy
  • Date/number format control
  • Whitespace handling
  • Markdown escape level
  • Hidden row/column policy
  • Locale-specific formatting

Directory Structure

excel2md/
├── v2.2.0/                     # Latest version
│   ├── excel_to_md.py          # Entry point
│   ├── excel2md/               # Main package
│   ├── tests/                  # Test suite
│   ├── spec.md                 # Specification
│   └── spec_appendix.md        # Specification appendix
├── v2.1.1/                     # Previous version (frozen snapshot, not published to PyPI)
├── v2.1.0/                     # Previous version (frozen snapshot)
├── v2.0.1/                     # Previous version
├── v2.0/                       # Previous version
├── v1.8/                       # Legacy version
│   ├── excel_to_md.py          # Main conversion program
│   ├── spec.md                 # Specification
│   └── tests/                  # Test suite
├── v1.7/                       # Legacy version
│   ├── excel_to_md.py          # Main conversion program
│   ├── spec.md                 # Specification
│   └── tests/                  # Test suite
├── docs/                   # Documentation
├── pyproject.toml          # Project metadata
├── LICENSE                 # MIT License
├── README.md / _ja.md     # README (English / Japanese)
├── CONTRIBUTING.md / _ja.md # Contribution guide (English / Japanese)
├── SECURITY.md / _ja.md   # Security policy (English / Japanese)
└── CHANGELOG.md / _ja.md  # Version history (English / Japanese)

Security

For security concerns, please see SECURITY.md.

Key security notes:

  • Only process Excel files from trusted sources
  • Use read_only=True mode to prevent file modification
  • Excel macros are not executed
  • Sanitize Markdown output to prevent injection
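
As one possible reading of the last point, cell text can be escaped before it is written into a Markdown table so that it cannot break the table or inject raw HTML. The sketch below is an assumption about what such sanitizing might look like. `escape_cell` is a hypothetical function and does not reflect the tool's own escape levels.

```python
def escape_cell(text):
    """Escape characters that would break a Markdown table cell (sketch)."""
    text = text.replace("\\", "\\\\")  # escape backslashes first
    text = text.replace("|", "\\|")    # keep pipes from splitting the cell
    text = text.replace("<", "&lt;").replace(">", "&gt;")  # neutralize raw HTML
    # Table cells must stay on one line
    return text.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")


print(escape_cell("a|b"))
print(escape_cell("<script>alert(1)</script>"))
```

Whether this is sufficient depends on where the Markdown is rendered; a renderer that allows raw HTML may need stricter handling.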

Contributing

Contributions are welcome! See CONTRIBUTING.md for details.

  • Report bugs via GitHub Issues
  • Submit pull requests for improvements
  • Follow existing code style
  • Add tests for new features

Changelog

See CHANGELOG.md for details.

Background

This tool was created during the development of IXV, an AI development support tool targeting Japanese development documents and specifications.

IXV addresses challenges in understanding, structuring, and utilizing Japanese documents in system development. This repository publicly shares a portion of that work.

License

MIT License - See LICENSE for details.
