eqword2llm

Equation Word → LLM: Convert equation-heavy Word documents (.docx) to Markdown with LaTeX math for LLM recognition.

Why eqword2llm?

Most Word-to-Markdown converters ignore or break mathematical equations. eqword2llm is specifically designed for scientific and technical documents where math equations are critical.

[Figure: Word to Markdown conversion flow]

Features

  • 🔢 Math equation conversion - OMML to LaTeX (inline $...$ and block $$...$$)
  • 🔖 Automatic equation numbering - Block equations get \tag{N} (can be disabled)
  • 🤖 LLM-optimized output - Clean Markdown that LLMs can understand
  • 🌍 Full Unicode support - Japanese, Chinese, Korean, and more
  • 📊 Tables, lists, headings, formatting support
  • 🐍 Zero dependencies - Python standard library only

Installation

# PyPI
pip install eqword2llm

# or with uv
uv add eqword2llm

Quick Start

Command Line

# Output to stdout (with equation numbers by default)
eqword2llm document.docx

# Output to file
eqword2llm document.docx -o output.md

# Disable equation numbering
eqword2llm document.docx -o output.md --no-equation-numbers

Python API

from eqword2llm import WordToMarkdownConverter

# With equation numbers (default)
converter = WordToMarkdownConverter("research_paper.docx")
markdown = converter.convert()

# Without equation numbers
converter = WordToMarkdownConverter("research_paper.docx", equation_numbers=False)
markdown = converter.convert()
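
For a folder of documents, the API above can be wrapped in a small batch helper. The sketch below is illustrative only: `convert_folder`, `_default_convert`, and the `convert` parameter are hypothetical names, not part of eqword2llm. The default path relies solely on the `WordToMarkdownConverter(path).convert()` call shown above and assumes the constructor takes the path as a string.

```python
from pathlib import Path
from typing import Callable, Optional


def _default_convert(docx_path: Path) -> str:
    # Imported lazily so the helper can be exercised without the package installed.
    from eqword2llm import WordToMarkdownConverter

    return WordToMarkdownConverter(str(docx_path)).convert()


def convert_folder(
    src_dir: str,
    out_dir: str,
    convert: Optional[Callable[[Path], str]] = None,
) -> list[Path]:
    """Convert every .docx in src_dir and write one .md per document to out_dir."""
    convert = convert or _default_convert
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for docx_path in sorted(Path(src_dir).glob("*.docx")):
        # Word creates "~$name.docx" lock files while a document is open; skip them.
        if docx_path.name.startswith("~$"):
            continue
        target = out / (docx_path.stem + ".md")
        target.write_text(convert(docx_path), encoding="utf-8")
        written.append(target)
    return written
```

Passing a custom `convert` callable makes it possible to post-process the Markdown (or to test the helper) without touching the conversion step itself.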

With LLM APIs

import anthropic
from eqword2llm import WordToMarkdownConverter

# Convert Word document with equations
converter = WordToMarkdownConverter("math_paper.docx")
markdown = converter.convert()

# Send to Claude - equations are now readable!
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": f"Explain the equations in this document:\n\n{markdown}"}
    ]
)

# Print the reply (the first content block is assumed to be text)
print(response.content[0].text)

Equation Numbering

Block equations are automatically numbered using LaTeX \tag{N} syntax:

With numbering (default):

$$
E = mc^{2} \tag{1}
$$

$$
F = ma \tag{2}
$$

Without numbering (equation_numbers=False or --no-equation-numbers):

$$
E = mc^{2}
$$

$$
F = ma
$$
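
If numbered Markdown has already been written to disk, the tags can also be removed afterwards without re-running the converter. The following is a minimal sketch, assuming the tags always have the `\tag{N}` form with an integer `N` as in the examples above; `strip_equation_tags` is a hypothetical helper, not part of eqword2llm.

```python
import re

# Matches the " \tag{N}" suffix shown in the numbered examples (N is an integer),
# together with any spaces or tabs directly before it.
_TAG = re.compile(r"[ \t]*\\tag\{\d+\}")


def strip_equation_tags(markdown: str) -> str:
    """Remove \\tag{N} markers from Markdown, leaving the equations intact."""
    return _TAG.sub("", markdown)


numbered = "$$\nE = mc^{2} \\tag{1}\n$$\n\n$$\nF = ma \\tag{2}\n$$\n"
print(strip_equation_tags(numbered))
```

For new conversions, `equation_numbers=False` or `--no-equation-numbers` remains the simpler route.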

Supported Math Elements

| Element | LaTeX Output |
|---|---|
| Fraction | \frac{a}{b} |
| Superscript | x^{2} |
| Subscript | x_{i} |
| Radical | \sqrt{x}, \sqrt[n]{x} |
| Integral | \int_{a}^{b} f(x) dx |
| Summation | \sum_{i=1}^{n} x_i |
| Matrix | \begin{pmatrix}...\end{pmatrix} |
| Greek letters | \alpha, \beta, \gamma ... |
| Functions | \sin, \cos, \log, \lim ... |
| Brackets | \left(...\right) |
| Accents | \hat{x}, \vec{v}, \bar{x} |
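
To give a feel for what an OMML to LaTeX mapping involves, here is a small sketch that handles four of the elements listed above (fraction, superscript, subscript, radical). It is not eqword2llm's implementation: `omml_to_latex` is a hypothetical function written for illustration, and it only assumes the standard OMML element names (`m:f`, `m:sSup`, `m:sSub`, `m:rad`, `m:t`) and namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by Office Math Markup Language (OMML) elements in .docx files.
M = "{http://schemas.openxmlformats.org/officeDocument/2006/math}"


def omml_to_latex(node: ET.Element) -> str:
    """Translate a small subset of OMML (runs, fractions, scripts, radicals) to LaTeX."""
    tag = node.tag.replace(M, "")

    def child(name: str) -> str:
        found = node.find(M + name)
        return "" if found is None else omml_to_latex(found)

    if tag == "t":  # literal text inside a math run
        return node.text or ""
    if tag == "f":  # fraction: numerator over denominator
        return f"\\frac{{{child('num')}}}{{{child('den')}}}"
    if tag == "sSup":  # superscript: base ^ exponent
        return f"{child('e')}^{{{child('sup')}}}"
    if tag == "sSub":  # subscript: base _ index
        return f"{child('e')}_{{{child('sub')}}}"
    if tag == "rad":  # radical, with an optional degree
        degree = child("deg")
        body = child("e")
        return f"\\sqrt[{degree}]{{{body}}}" if degree else f"\\sqrt{{{body}}}"
    # Containers (oMath, r, num, den, e, ...): concatenate the converted children.
    return "".join(omml_to_latex(c) for c in node)


sample = (
    '<m:oMath xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">'
    "<m:f><m:num><m:r><m:t>a</m:t></m:r></m:num>"
    "<m:den><m:r><m:t>b</m:t></m:r></m:den></m:f>"
    "<m:r><m:t>+</m:t></m:r>"
    "<m:sSup><m:e><m:r><m:t>x</m:t></m:r></m:e>"
    "<m:sup><m:r><m:t>2</m:t></m:r></m:sup></m:sSup>"
    "</m:oMath>"
)
print(omml_to_latex(ET.fromstring(sample)))  # \frac{a}{b}+x^{2}
```

A full converter also has to handle integrals, matrices, delimiters, accents, and function names, which this sketch deliberately leaves out.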

Multilingual Support

Full support for documents in any language:

| Language | Support |
|---|---|
| Japanese (日本語) | ✅ Hiragana, Katakana, Kanji |
| Chinese (中文) | ✅ Simplified and Traditional |
| Korean (한국어) | ✅ Hangul |
| Arabic (العربية) | ✅ RTL text |
| Cyrillic (Русский) | ✅ Russian, Ukrainian, etc. |

Math symbols (α, β, ∑, ∫, etc.) are converted to LaTeX while preserving surrounding text.
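
The symbol conversion can be pictured as a lookup applied only to the contents of a math expression, so that body text in any script is left alone. The sketch below is a simplified illustration, not the library's code: `SYMBOLS` is a tiny hand-picked subset and `symbols_to_latex` is a hypothetical helper.

```python
# A few Unicode math symbols and their LaTeX commands (illustrative subset only).
SYMBOLS = {
    "α": r"\alpha",
    "β": r"\beta",
    "γ": r"\gamma",
    "∑": r"\sum",
    "∫": r"\int",
    "≤": r"\leq",
    "∞": r"\infty",
}


def symbols_to_latex(expr: str) -> str:
    """Replace known Unicode math symbols in a math expression with LaTeX commands."""
    out = []
    for i, ch in enumerate(expr):
        if ch not in SYMBOLS:
            out.append(ch)
            continue
        out.append(SYMBOLS[ch])
        # A command followed directly by a letter would merge with it
        # ("\alphax"), so insert a separating space in that case.
        nxt = expr[i + 1 : i + 2]
        if nxt.isascii() and nxt.isalpha():
            out.append(" ")
    return "".join(out)


print(symbols_to_latex("αx + β"))  # \alpha x + \beta
```

Because the function is meant to be called on the expression between the math delimiters, a sentence such as 速度は $αx$ です keeps its Japanese text unchanged while the expression becomes `\alpha x`.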

Development

# Clone and setup
git clone https://github.com/manabelab/eqword2llm.git
cd eqword2llm
uv sync --dev

# Run tests
uv run pytest tests/ -v

# Lint and type check
uv run ruff check src tests
uv run mypy src

Comparison with Other Tools

Feature eqword2llm mammoth pandoc
Math equations ✅ LaTeX △ Partial
Equation numbering
Zero dependencies
LLM-optimized
Unicode support

Limitations

  • Images are not currently supported
  • Complex layouts (multiple columns, text boxes) are simplified
  • Some special math symbols may not be converted
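
Since some symbols may pass through unconverted, it can help to scan the output before sending it to an LLM. The following heuristic sketch lists non-ASCII characters that remain inside `$...$` or `$$...$$` spans; `unconverted_symbols` is a hypothetical helper, and the regular expression assumes that dollar signs appear only as math delimiters in the Markdown.

```python
import re

# Block math ($$...$$) is tried first so that it is not mistaken for inline math ($...$).
_MATH = re.compile(r"\$\$(.+?)\$\$|\$(.+?)\$", re.DOTALL)


def unconverted_symbols(markdown: str) -> set[str]:
    """Return non-ASCII characters that remain inside math delimiters."""
    leftovers = set()
    for match in _MATH.finditer(markdown):
        body = match.group(1) or match.group(2)
        leftovers.update(ch for ch in body if not ch.isascii())
    return leftovers


text = "速度 $v = ∂x/∂t$ and\n\n$$\nE = mc^{2} \\tag{1}\n$$\n"
print(unconverted_symbols(text))  # {'∂'}
```

Text outside the math spans is ignored, so multilingual prose does not trigger false positives. Non-ASCII text that legitimately sits inside an equation would still be reported, so the result is a prompt for manual review rather than an error list.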

License

MIT License - See LICENSE for details.

Contributing

Issues and Pull Requests are welcome!
