Convert equation-heavy Word documents (.docx) to Markdown with LaTeX math for LLM recognition.

eqword2llm

Equation Word → LLM: Convert equation-heavy Word documents (.docx) to Markdown with LaTeX math for LLM recognition.

Why eqword2llm?

Most Word-to-Markdown converters ignore or break mathematical equations. eqword2llm is specifically designed for scientific and technical documents where math equations are critical.

📄 Word Document          →    📝 Markdown + LaTeX
┌─────────────────────┐        ┌─────────────────────────┐
│ x² + y² = z²        │        │ $x^{2} + y^{2} = z^{2}$ │
│                     │   →    │                         │
│    a                │        │ $$\frac{a}{b}$$         │
│   ───               │        │                         │
│    b                │        │                         │
└─────────────────────┘        └─────────────────────────┘
                                      ↓
                               🤖 LLM can understand!

Features

  • 🔢 Math equation conversion - OMML to LaTeX (inline $...$ and block $$...$$)
  • 🤖 LLM-optimized output - Clean Markdown that LLMs can understand
  • 🌐 Full Unicode support - Japanese, Chinese, Korean, and more
  • 📊 Tables, lists, headings, formatting support
  • 🐍 Zero dependencies - Python standard library only

Installation

# PyPI
pip install eqword2llm

# or with uv
uv add eqword2llm

Quick Start

Command Line

# Output to stdout
eqword2llm document.docx

# Output to file
eqword2llm document.docx -o output.md
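
To convert a whole folder, the documented invocation can be wrapped in a short script. The sketch below assumes only the `eqword2llm INPUT -o OUTPUT` form shown above; the helper names `build_commands` and `convert_all` are hypothetical and not part of the package.

```python
import subprocess
from pathlib import Path


def build_commands(src_dir: str, out_dir: str) -> list[list[str]]:
    """Return one `eqword2llm INPUT -o OUTPUT` command per .docx file in src_dir."""
    commands = []
    for docx in sorted(Path(src_dir).glob("*.docx")):
        # Skip Word's temporary lock files such as "~$report.docx".
        if docx.name.startswith("~$"):
            continue
        target = Path(out_dir) / (docx.stem + ".md")
        commands.append(["eqword2llm", str(docx), "-o", str(target)])
    return commands


def convert_all(src_dir: str, out_dir: str) -> None:
    """Run the CLI once per document, stopping at the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(src_dir, out_dir):
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(cmd, check=True)
```

Calling `convert_all("papers", "markdown")` would then write one `.md` file per document. Building the command list separately makes it possible to inspect what will run before anything is executed.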

Python API

from eqword2llm import WordToMarkdownConverter

converter = WordToMarkdownConverter("research_paper.docx")
markdown = converter.convert()
print(markdown)

With LLM APIs

import anthropic
from eqword2llm import WordToMarkdownConverter

# Convert Word document with equations
converter = WordToMarkdownConverter("math_paper.docx")
markdown = converter.convert()

# Send to Claude - equations are now readable!
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": f"Explain the equations in this document:\n\n{markdown}"}
    ]
)
print(response.content[0].text)
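
Before sending a long document to a model, it can help to check how many equations made it into the Markdown. This is a minimal sketch that assumes only the `$...$` and `$$...$$` delimiters listed under Features; `count_equations` is a hypothetical helper, not part of eqword2llm, and it would miscount text that uses `$` as a currency sign.

```python
import re

# Block math is matched first so "$$...$$" is not read as inline spans.
BLOCK_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
# Inline math: a single-line span between two "$" characters.
INLINE_MATH = re.compile(r"\$([^$\n]+)\$")


def count_equations(markdown: str) -> dict[str, int]:
    """Count block and inline math spans in converted Markdown."""
    blocks = BLOCK_MATH.findall(markdown)
    # Strip block math before searching for inline math.
    remainder = BLOCK_MATH.sub("", markdown)
    inline = INLINE_MATH.findall(remainder)
    return {"block": len(blocks), "inline": len(inline)}


sample = "Pythagoras: $x^{2} + y^{2} = z^{2}$.\n\n$$\\frac{a}{b}$$\n"
print(count_equations(sample))  # {'block': 1, 'inline': 1}
```

A count of zero for a document known to contain math is a hint that the equations were stored as images or in a form the converter did not recognize.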

Supported Math Elements

| Element | LaTeX Output |
|---|---|
| Fraction | \frac{a}{b} |
| Superscript | x^{2} |
| Subscript | x_{i} |
| Radical | \sqrt{x}, \sqrt[n]{x} |
| Integral | \int_{a}^{b} f(x) dx |
| Summation | \sum_{i=1}^{n} x_i |
| Matrix | \begin{pmatrix}...\end{pmatrix} |
| Greek letters | \alpha, \beta, \gamma ... |
| Functions | \sin, \cos, \log, \lim ... |
| Brackets | \left(...\right) |
| Accents | \hat{x}, \vec{v}, \bar{x} |
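
To make the mapping concrete, here is a rough sketch of how a few OMML elements (the math markup Word stores inside a .docx) could be turned into LaTeX with the standard library alone. It is an illustration, not eqword2llm's actual implementation: the function names are hypothetical, and only runs, fractions, superscripts, and subscripts are handled.

```python
import xml.etree.ElementTree as ET

# Namespace of Office Math Markup Language (OMML) elements.
M = "{http://schemas.openxmlformats.org/officeDocument/2006/math}"


def omml_to_latex(node: ET.Element) -> str:
    """Convert a small subset of OMML (runs, fractions, scripts) to LaTeX."""
    tag = node.tag.removeprefix(M)
    if tag == "t":  # literal text inside a run
        return node.text or ""
    if tag == "f":  # fraction: numerator over denominator
        return r"\frac{%s}{%s}" % (_child(node, "num"), _child(node, "den"))
    if tag == "sSup":  # base with superscript
        return "%s^{%s}" % (_child(node, "e"), _child(node, "sup"))
    if tag == "sSub":  # base with subscript
        return "%s_{%s}" % (_child(node, "e"), _child(node, "sub"))
    # Containers (oMath, r, num, den, e, ...) concatenate their children.
    return "".join(omml_to_latex(child) for child in node)


def _child(node: ET.Element, name: str) -> str:
    """Convert the named direct child, or return "" if it is absent."""
    child = node.find(M + name)
    return omml_to_latex(child) if child is not None else ""


sample = (
    '<m:oMath xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">'
    "<m:f><m:num><m:r><m:t>a</m:t></m:r></m:num>"
    "<m:den><m:r><m:t>b</m:t></m:r></m:den></m:f>"
    "</m:oMath>"
)
print(omml_to_latex(ET.fromstring(sample)))  # \frac{a}{b}
```

The recursive shape mirrors the nesting of OMML itself, so a fraction inside a superscript falls out of the same two rules without special handling.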

Multilingual Support

Full support for documents in any language:

| Language | Support |
|---|---|
| Japanese (日本語) | ✅ Hiragana, Katakana, Kanji |
| Chinese (中文) | ✅ Simplified and Traditional |
| Korean (한국어) | ✅ Hangul |
| Arabic (العربية) | ✅ RTL text |
| Cyrillic (Русский) | ✅ Russian, Ukrainian, etc. |

Math symbols (α, β, ∑, ∫, etc.) are converted to LaTeX while preserving surrounding text.
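
The idea can be sketched as a character-level lookup. The table and function below are hypothetical and cover only a handful of symbols; they are meant to show the behaviour described above, not how eqword2llm implements it.

```python
# Hypothetical lookup table; a real converter would cover many more symbols.
SYMBOLS = {
    "α": r"\alpha",
    "β": r"\beta",
    "γ": r"\gamma",
    "∑": r"\sum",
    "∫": r"\int",
}


def symbols_to_latex(text: str) -> str:
    """Replace known math symbols with LaTeX commands; keep all other text."""
    out = []
    for i, ch in enumerate(text):
        latex = SYMBOLS.get(ch)
        if latex is None:
            out.append(ch)  # Japanese, Cyrillic, etc. pass through untouched
            continue
        out.append(latex)
        # Add a space only when an ASCII letter follows, so "\alpha" and "x"
        # do not fuse into the undefined command "\alphax".
        nxt = text[i + 1 : i + 2]
        if nxt.isascii() and nxt.isalpha():
            out.append(" ")
    return "".join(out)


print(symbols_to_latex("角度 α と β"))  # 角度 \alpha と \beta
print(symbols_to_latex("αx"))  # \alpha x
```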

Development

# Clone and setup
git clone https://github.com/manabelab/eqword2llm.git
cd eqword2llm
uv sync --dev

# Run tests
uv run pytest tests/ -v

# Lint and type check
uv run ruff check src tests
uv run mypy src

Comparison with Other Tools

| Feature | eqword2llm | mammoth | pandoc |
|---|---|---|---|
| Math equations | ✅ LaTeX | ❌ | △ Partial |
| Zero dependencies | ✅ | ❌ | ❌ |
| LLM-optimized | ✅ | ❌ | ❌ |
| Unicode support | ✅ | ✅ | ✅ |

Limitations

  • Images are not currently supported
  • Complex layouts (multiple columns, text boxes) are simplified
  • Some special math symbols may not be converted
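
Because images are skipped, it may be worth checking whether a document contains any before relying on the converted text. A .docx file is a ZIP archive, and embedded media is typically stored under `word/media/`. The helper below is a hypothetical standard-library sketch built on that convention; it is not part of eqword2llm.

```python
import zipfile


def list_embedded_images(docx_path) -> list[str]:
    """Return the archive names of media files stored inside a .docx file."""
    with zipfile.ZipFile(docx_path) as archive:
        return [
            name
            for name in archive.namelist()
            # Word conventionally keeps pictures in this folder.
            if name.startswith("word/media/")
        ]
```

If `list_embedded_images("paper.docx")` returns a non-empty list, figures (and any equations pasted as pictures) will be missing from the Markdown output.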

License

MIT License - See LICENSE for details.

Contributing

Issues and Pull Requests are welcome!
