eqword2llm

Equation Word → LLM: Convert equation-heavy Word documents (.docx) to Markdown with LaTeX math for LLM recognition.

Why eqword2llm?

Most Word-to-Markdown converters ignore or break mathematical equations. eqword2llm is specifically designed for scientific and technical documents where math equations are critical.

[Figure: Word to Markdown conversion flow]

Features

  • 🔢 Math equation conversion - OMML to LaTeX (inline $...$ and block $$...$$)
  • 🔖 Automatic equation numbering - Block equations get \tag{N} (can be disabled)
  • 🤖 LLM-optimized output - Clean Markdown that LLMs can understand
  • 🌍 Full Unicode support - Japanese, Chinese, Korean, and more
  • 📊 Tables, lists, headings, formatting support
  • 🐍 Zero dependencies - Python standard library only
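To illustrate why the standard library is enough to read a .docx file, here is a minimal sketch. A .docx is a ZIP archive whose body is stored in word/document.xml, and Word stores equations there as OMML elements. The helper name count_equations is hypothetical and is not part of eqword2llm; treating m:oMathPara as a block equation and a bare m:oMath as an inline one is an assumption about how the two kinds could be told apart.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI of Office Math Markup Language (OMML) elements.
OMML_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def count_equations(docx):
    """Return (block, inline) equation counts for a .docx path or file object.

    Hypothetical helper for illustration only; not part of eqword2llm.
    """
    # A .docx file is a ZIP archive; the document body is word/document.xml.
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    # Display equations are wrapped in <m:oMathPara>; inline ones are bare <m:oMath>.
    paragraphs = list(root.iter(f"{{{OMML_NS}}}oMathPara"))
    in_blocks = sum(len(p.findall(f".//{{{OMML_NS}}}oMath")) for p in paragraphs)
    total = sum(1 for _ in root.iter(f"{{{OMML_NS}}}oMath"))
    return len(paragraphs), total - in_blocks
```

Calling count_equations("document.docx") before converting gives a quick estimate of how much math a file contains.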

Installation

# PyPI
pip install eqword2llm

# or with uv
uv add eqword2llm

Quick Start

Command Line

# Output to stdout (with equation numbers by default)
eqword2llm document.docx

# Output to file
eqword2llm document.docx -o output.md

# Disable equation numbering
eqword2llm document.docx -o output.md --no-equation-numbers

Python API

from eqword2llm import WordToMarkdownConverter

# With equation numbers (default)
converter = WordToMarkdownConverter("research_paper.docx")
markdown = converter.convert()

# Without equation numbers
converter = WordToMarkdownConverter("research_paper.docx", equation_numbers=False)
markdown = converter.convert()
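For a folder of documents, the same two calls can be wrapped in a loop. The sketch below assumes only the constructor and convert() method shown above. convert_folder and its convert_one parameter are hypothetical names introduced here, not part of eqword2llm.

```python
from pathlib import Path


def convert_folder(src_dir, out_dir, convert_one=None):
    """Convert every .docx in src_dir to a .md file in out_dir.

    Hypothetical helper, not part of eqword2llm. convert_one maps a .docx
    path to a Markdown string; by default it wraps WordToMarkdownConverter.
    """
    if convert_one is None:
        # Imported lazily so the helper can be exercised without the package.
        from eqword2llm import WordToMarkdownConverter

        def convert_one(path):
            return WordToMarkdownConverter(str(path)).convert()

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for docx in sorted(Path(src_dir).glob("*.docx")):
        # Word creates "~$name.docx" lock files while a document is open; skip them.
        if docx.name.startswith("~$"):
            continue
        target = out / (docx.stem + ".md")
        target.write_text(convert_one(docx), encoding="utf-8")
        written.append(target)
    return written
```

Passing a custom convert_one makes it easy to try the loop on dummy files first, or to build a converter with equation_numbers=False.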

With LLM APIs

import anthropic
from eqword2llm import WordToMarkdownConverter

# Convert Word document with equations
converter = WordToMarkdownConverter("math_paper.docx")
markdown = converter.convert()

# Send to Claude - equations are now readable!
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": f"Explain the equations in this document:\n\n{markdown}"}
    ]
)

# Print the text of Claude's reply
print(response.content[0].text)

Equation Numbering

Block equations are automatically numbered using LaTeX \tag{N} syntax:

With numbering (default):

$$
E = mc^{2} \tag{1}
$$

$$
F = ma \tag{2}
$$

Without numbering (equation_numbers=False or --no-equation-numbers):

$$
E = mc^{2}
$$

$$
F = ma
$$
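If the Markdown has already been generated, the tags can also be adjusted afterwards. The sketch below is a post-processing illustration and not eqword2llm's own implementation. strip_tags and renumber are hypothetical helpers, and the regular expression assumes block equations are delimited by $$ as in the examples above, with no two inline equations written back to back.

```python
import itertools
import re

# A block equation: everything between a pair of $$ delimiters.
BLOCK = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
# A \tag{...} command together with the whitespace in front of it.
TAG = re.compile(r"\s*\\tag\{[^{}]*\}")


def strip_tags(markdown):
    """Remove the tag command from every $$...$$ block."""
    return BLOCK.sub(lambda m: "$$" + TAG.sub("", m.group(1)) + "$$", markdown)


def renumber(markdown, start=1):
    """Drop existing tags and number the $$...$$ blocks sequentially."""
    numbers = itertools.count(start)

    def add_tag(match):
        body = TAG.sub("", match.group(1))
        stripped = body.rstrip()
        trailing = body[len(stripped):]  # keep the newline before the closing $$
        return f"$${stripped} \\tag{{{next(numbers)}}}{trailing}$$"

    return BLOCK.sub(add_tag, markdown)
```

renumber is useful after merging Markdown from several documents, where each file started its numbering at 1.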

Supported Math Elements

| Element | LaTeX Output |
|---|---|
| Fraction | \frac{a}{b} |
| Superscript | x^{2} |
| Subscript | x_{i} |
| Radical | \sqrt{x}, \sqrt[n]{x} |
| Integral | \int_{a}^{b} f(x) dx |
| Summation | \sum_{i=1}^{n} x_i |
| Matrix | \begin{pmatrix}...\end{pmatrix} |
| Greek letters | \alpha, \beta, \gamma ... |
| Functions | \sin, \cos, \log, \lim ... |
| Brackets | \left(...\right) |
| Accents | \hat{x}, \vec{v}, \bar{x} |
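To show the general shape of an OMML to LaTeX mapping, here is a minimal recursive sketch covering four of the elements above (fraction, superscript, subscript, radical). It is an illustration under stated assumptions and not the converter eqword2llm ships. to_latex and tag are hypothetical names, while the OMML element names (m:f, m:num, m:den, m:sSup, m:sSub, m:rad, m:deg, m:e, m:t) come from the Office Open XML math schema.

```python
import xml.etree.ElementTree as ET

# Namespace URI of Office Math Markup Language (OMML) elements.
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def tag(name):
    """Qualified ElementTree tag for an OMML element name."""
    return f"{{{M}}}{name}"


def to_latex(node):
    """Recursively convert a small subset of OMML to LaTeX (illustrative only)."""

    def child(name):
        found = node.find(tag(name))
        return to_latex(found) if found is not None else ""

    local = node.tag.split("}")[-1]
    if local == "t":  # literal text inside a math run
        return node.text or ""
    if local == "f":  # fraction: numerator over denominator
        return rf"\frac{{{child('num')}}}{{{child('den')}}}"
    if local == "sSup":  # base with superscript
        return f"{child('e')}^{{{child('sup')}}}"
    if local == "sSub":  # base with subscript
        return f"{child('e')}_{{{child('sub')}}}"
    if local == "rad":  # radical, with an optional degree
        deg = child("deg")
        return rf"\sqrt[{deg}]{{{child('e')}}}" if deg else rf"\sqrt{{{child('e')}}}"
    # Containers (oMath, r, num, den, e, ...): join the converted children.
    # Property elements such as fPr or radPr hold formatting, not math text.
    return "".join(to_latex(c) for c in node if not c.tag.endswith("Pr"))
```

A full converter also needs integrals, sums, matrices, delimiters, and accents, which follow the same pattern of one branch per OMML element.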

Multilingual Support

Full support for documents in any language:

| Language | Support |
|---|---|
| Japanese (日本語) | ✅ Hiragana, Katakana, Kanji |
| Chinese (中文) | ✅ Simplified and Traditional |
| Korean (한국어) | ✅ Hangul |
| Arabic (العربية) | ✅ RTL text |
| Cyrillic (Русский) | ✅ Russian, Ukrainian, etc. |

Math symbols (α, β, ∑, ∫, etc.) are converted to LaTeX while preserving surrounding text.
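The symbol mapping can be pictured as a lookup table applied to math text only. This is a minimal sketch with a five-entry table. SYMBOLS and symbols_to_latex are hypothetical names, and the real table in eqword2llm is presumably much larger. The function is meant to be called on the contents of an equation, which is how the surrounding prose stays untouched.

```python
# Tiny illustrative table; a real converter would cover many more symbols.
SYMBOLS = {
    "α": r"\alpha",
    "β": r"\beta",
    "γ": r"\gamma",
    "∑": r"\sum",
    "∫": r"\int",
}


def symbols_to_latex(math_text):
    """Replace Unicode math symbols in math_text with LaTeX commands."""
    out = []
    for i, ch in enumerate(math_text):
        if ch not in SYMBOLS:
            out.append(ch)
            continue
        out.append(SYMBOLS[ch])
        nxt = math_text[i + 1 : i + 2]
        # A letter directly after a command would be read as part of its
        # name (\alphax), so separate the two with a space.
        if nxt.isalpha() and nxt not in SYMBOLS:
            out.append(" ")
    return "".join(out)
```

For example, symbols_to_latex("αx+β") inserts a space after \alpha so that the following x is not absorbed into the command name.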

Development

# Clone and setup
git clone https://github.com/manabelab/eqword2llm.git
cd eqword2llm
uv sync --dev

# Run tests
uv run pytest tests/ -v

# Lint and type check
uv run ruff check src tests
uv run mypy src

Comparison with Other Tools

| Feature | eqword2llm | mammoth | pandoc |
|---|---|---|---|
| Math equations | ✅ LaTeX | | △ Partial |
| Equation numbering | | | |
| Field code handling | | | |
| Markdown headings | | | |
| Zero dependencies | | | |
| LLM-optimized | | | |
| Unicode support | | | |

Concrete Examples

1. Determinant (Matrix with vertical bars)

Pandoc output (verbose, non-standard):

$$|A| = \left| \begin{matrix}
a & b \\
c & d
\end{matrix} \right| = ad - bc$$

eqword2llm output (concise, standard LaTeX):

$$
\left|A\right|=\begin{vmatrix}a & b \\ c & d\end{vmatrix}=ad-bc
$$

| Aspect | Pandoc | eqword2llm |
|---|---|---|
| Syntax | \left| \begin{matrix}...\right| | \begin{vmatrix}... |
| Characters | 62 | 45 (-27%) |
| LaTeX standard | ⚠️ Non-standard combination | ✅ Standard amsmath environment |

2. Word Field Codes (SEQ Equation)

Pandoc output (broken):

$$E = mc^{2}\#(\ SEQ\ Equation\ \backslash*\ ARABIC\ 1)$$

eqword2llm output (clean):

$$
E=mc^{2}
$$
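For Markdown that was already produced by Pandoc, the residue shown above can be removed after the fact. The sketch below is a workaround for that case and says nothing about how eqword2llm handles field codes internally. strip_seq_residue is a hypothetical helper, and the pattern is written against the exact escaped form in the Pandoc example, so other field types would need their own patterns.

```python
import re

# Matches the escaped SEQ field text Pandoc left in the example above, e.g.
#   \#(\ SEQ\ Equation\ \backslash*\ ARABIC\ 1)
SEQ_RESIDUE = re.compile(r"\s*\\#\(\\ SEQ\\ [^)]*\)")


def strip_seq_residue(latex):
    """Remove leftover SEQ field-code text from a LaTeX string."""
    return SEQ_RESIDUE.sub("", latex)
```

Applied to the Pandoc output above, this leaves only the equation between the $$ delimiters.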

3. Vector notation

Pandoc output (verbose):

$$\overset{\rightarrow}{v}$$

eqword2llm output (standard):

$$\vec{v}$$
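The same rewrite can be applied to existing Pandoc output with one substitution. This is a minimal sketch and not part of eqword2llm. normalize_vectors is a hypothetical helper, and the pattern only handles arguments without nested braces, so v is covered but v_{1} is not.

```python
import re

# \overset{\rightarrow}{X} where X contains no nested braces.
OVERSET_ARROW = re.compile(r"\\overset\{\\rightarrow\}\{([^{}]*)\}")


def normalize_vectors(latex):
    """Rewrite the overset-arrow form of a vector as the shorter vec command."""
    return OVERSET_ARROW.sub(lambda m: r"\vec{" + m.group(1) + "}", latex)
```

Using a function as the replacement avoids backslash-escaping surprises in the replacement string.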

📖 See detailed comparison with more examples →

Limitations

  • Images are not currently supported
  • Complex layouts (multiple columns, text boxes) are simplified
  • Some special math symbols may not be converted

License

MIT License - See LICENSE for details.

Contributing

Issues and Pull Requests are welcome!
