Convert equation-heavy Word documents (.docx) to Markdown with LaTeX math for LLM recognition.
eqword2llm
Equation Word → LLM: Convert equation-heavy Word documents (.docx) to Markdown with LaTeX math for LLM recognition.
Why eqword2llm?
Most Word-to-Markdown converters ignore or break mathematical equations. eqword2llm is specifically designed for scientific and technical documents where math equations are critical.
| Word Document | Markdown + LaTeX |
|---|---|
| x² + y² = z² | $x^{2} + y^{2} = z^{2}$ |
| a over b (stacked fraction) | $$\frac{a}{b}$$ |
The converted output is text that an LLM can understand.
Features
- Math equation conversion - OMML to LaTeX (inline `$...$` and block `$$...$$`)
- LLM-optimized output - Clean Markdown that LLMs can understand
- Full Unicode support - Japanese, Chinese, Korean, and more
- Tables, lists, headings, and formatting support
- Zero dependencies - Python standard library only
Installation
# PyPI
pip install eqword2llm
# or with uv
uv add eqword2llm
Quick Start
Command Line
# Output to stdout
eqword2llm document.docx
# Output to file
eqword2llm document.docx -o output.md
Python API
from eqword2llm import WordToMarkdownConverter
converter = WordToMarkdownConverter("research_paper.docx")
markdown = converter.convert()
print(markdown)
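To convert a whole folder rather than a single file, the same two calls can be wrapped in a small loop. The sketch below is only an illustration: `convert_folder` and its `convert` parameter are hypothetical helpers, not part of eqword2llm, and the only library calls assumed are `WordToMarkdownConverter(path)` and `.convert()` as shown above.

```python
from pathlib import Path
from typing import Callable


def _convert_with_eqword2llm(path: Path) -> str:
    # Imported lazily so the helper can be exercised with a stand-in converter.
    from eqword2llm import WordToMarkdownConverter

    return WordToMarkdownConverter(str(path)).convert()


def convert_folder(
    src: Path,
    dst: Path,
    convert: Callable[[Path], str] = _convert_with_eqword2llm,
) -> list[Path]:
    """Convert every .docx directly inside `src` and write .md files to `dst`."""
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for docx in sorted(src.glob("*.docx")):
        # Word creates "~$name.docx" lock files while a document is open; skip them.
        if docx.name.startswith("~$"):
            continue
        out = dst / docx.with_suffix(".md").name
        out.write_text(convert(docx), encoding="utf-8")
        written.append(out)
    return written
```

Calling `convert_folder(Path("papers"), Path("markdown"))` would then write one `.md` file per document; passing a different `convert` callable makes it easy to try the loop without real Word files.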
With LLM APIs
import anthropic
from eqword2llm import WordToMarkdownConverter
# Convert a Word document that contains equations
converter = WordToMarkdownConverter("math_paper.docx")
markdown = converter.convert()
# Send to Claude; the equations are now readable LaTeX text
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": f"Explain the equations in this document:\n\n{markdown}"}
    ],
)
# Print the text of the first content block in the reply
print(response.content[0].text)
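Before sending a long document to an LLM, it can help to check which equations actually made it into the Markdown. The following is a minimal sketch, assuming the output uses `$...$` for inline math and `$$...$$` for block math as described under Features; `extract_equations` is a hypothetical helper, not part of eqword2llm.

```python
import re

# Block math: everything between a pair of $$ delimiters (may span lines).
BLOCK = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
# Inline math: a single $ on each side, on one line, not part of a $$ pair.
INLINE = re.compile(r"(?<!\$)\$(?!\$)(.+?)(?<!\$)\$(?!\$)")


def extract_equations(markdown: str) -> dict[str, list[str]]:
    """Return the LaTeX source of block and inline equations found in `markdown`."""
    blocks = [body.strip() for body in BLOCK.findall(markdown)]
    # Remove block math first so its delimiters are not mistaken for inline math.
    without_blocks = BLOCK.sub(" ", markdown)
    inline = [body.strip() for body in INLINE.findall(without_blocks)]
    return {"block": blocks, "inline": inline}
```

Note that plain dollar amounts in prose (for example "$5 and $10") would also match the inline pattern, so the result is best treated as a quick sanity check rather than a parser.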
Supported Math Elements
| Element | LaTeX Output |
|---|---|
| Fraction | \frac{a}{b} |
| Superscript | x^{2} |
| Subscript | x_{i} |
| Radical | \sqrt{x}, \sqrt[n]{x} |
| Integral | \int_{a}^{b} f(x) dx |
| Summation | \sum_{i=1}^{n} x_i |
| Matrix | \begin{pmatrix}...\end{pmatrix} |
| Greek letters | \alpha, \beta, \gamma ... |
| Functions | \sin, \cos, \log, \lim ... |
| Brackets | \left(...\right) |
| Accents | \hat{x}, \vec{v}, \bar{x} |
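To make the mapping in the table more concrete, here is a rough sketch of how a few OMML elements could be turned into LaTeX with only the standard library. It is not eqword2llm's actual implementation: the function names are made up for illustration, and only fractions (`m:f`), superscripts (`m:sSup`), and subscripts (`m:sSub`) are handled.

```python
import xml.etree.ElementTree as ET

# Namespace used by Office Math Markup Language (OMML) inside .docx files.
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def q(tag: str) -> str:
    """Qualified tag name in ElementTree's {namespace}tag form."""
    return f"{{{M}}}{tag}"


def children_latex(node) -> str:
    """Concatenate the LaTeX of all child elements (empty if node is missing)."""
    if node is None:
        return ""
    return "".join(to_latex(child) for child in node)


def to_latex(node: ET.Element) -> str:
    tag = node.tag
    if tag == q("t"):  # text inside a math run
        return node.text or ""
    if tag == q("f"):  # fraction: numerator and denominator
        num = children_latex(node.find(q("num")))
        den = children_latex(node.find(q("den")))
        return f"\\frac{{{num}}}{{{den}}}"
    if tag == q("sSup"):  # superscript: base and exponent
        base = children_latex(node.find(q("e")))
        sup = children_latex(node.find(q("sup")))
        return f"{base}^{{{sup}}}"
    if tag == q("sSub"):  # subscript: base and index
        base = children_latex(node.find(q("e")))
        sub = children_latex(node.find(q("sub")))
        return f"{base}_{{{sub}}}"
    # Any other element (runs, containers, properties): recurse into children.
    return children_latex(node)


SAMPLE = f"""
<m:oMath xmlns:m="{M}">
  <m:sSup><m:e><m:r><m:t>x</m:t></m:r></m:e><m:sup><m:r><m:t>2</m:t></m:r></m:sup></m:sSup>
  <m:r><m:t>+</m:t></m:r>
  <m:f><m:num><m:r><m:t>a</m:t></m:r></m:num><m:den><m:r><m:t>b</m:t></m:r></m:den></m:f>
</m:oMath>
"""

print(to_latex(ET.fromstring(SAMPLE.strip())))  # x^{2}+\frac{a}{b}
```

A recursive walk like this keeps each element type in its own small branch, which is one plausible way to grow coverage toward the full table above.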
Multilingual Support
Full support for documents in any language:
| Language | Support |
|---|---|
| Japanese (日本語) | ✅ Hiragana, Katakana, Kanji |
| Chinese (中文) | ✅ Simplified and Traditional |
| Korean (한국어) | ✅ Hangul |
| Arabic (العربية) | ✅ RTL text |
| Cyrillic (Русский) | ✅ Russian, Ukrainian, etc. |
Math symbols (α, β, ∑, ∫, etc.) are converted to LaTeX while preserving surrounding text.
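The idea of converting symbols while leaving the surrounding text alone can be sketched with a simple character map. This is an assumption-laden illustration, not the library's own table: `SYMBOLS` and `symbols_to_latex` are hypothetical names, and the map covers only a handful of characters.

```python
# Hypothetical, deliberately tiny symbol table for illustration only.
SYMBOLS = {
    "α": r"\alpha",
    "β": r"\beta",
    "γ": r"\gamma",
    "∑": r"\sum",
    "∫": r"\int",
    "∞": r"\infty",
}


def symbols_to_latex(text: str) -> str:
    """Replace known math symbols with LaTeX commands; leave all other text as-is."""
    parts = []
    for i, ch in enumerate(text):
        cmd = SYMBOLS.get(ch)
        if cmd is None:
            parts.append(ch)  # Japanese, Cyrillic, digits, etc. pass through unchanged
            continue
        nxt = text[i + 1 : i + 2]
        # A command directly followed by an ASCII letter would merge with it
        # (\alphax), so separate the two with a space.
        needs_space = nxt.isascii() and nxt.isalpha()
        parts.append(cmd + (" " if needs_space else ""))
    return "".join(parts)


print(symbols_to_latex("係数 α と β の和"))  # 係数 \alpha と \beta の和
print(symbols_to_latex("αx"))  # \alpha x
```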
Development
# Clone and setup
git clone https://github.com/manabelab/eqword2llm.git
cd eqword2llm
uv sync --dev
# Run tests
uv run pytest tests/ -v
# Lint and type check
uv run ruff check src tests
uv run mypy src
Comparison with Other Tools
| Feature | eqword2llm | mammoth | pandoc |
|---|---|---|---|
| Math equations | ✅ LaTeX | ❌ | △ Partial |
| Zero dependencies | ✅ | ❌ | ❌ |
| LLM-optimized | ✅ | ❌ | ❌ |
| Unicode support | ✅ | ✅ | ✅ |
Limitations
- Images are not currently supported
- Complex layouts (multiple columns, text boxes) are simplified
- Some special math symbols may not be converted
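Because some symbols may pass through unconverted, a quick scan of the output can show which ones to review by hand. The sketch below is a hypothetical check, not a feature of eqword2llm; it assumes that any non-ASCII character in Unicode category `Sm` (math symbol) left in the Markdown is worth a look.

```python
import unicodedata


def leftover_math_symbols(markdown: str) -> dict[str, int]:
    """Count non-ASCII Unicode math symbols that remain in the converted text."""
    counts: dict[str, int] = {}
    for ch in markdown:
        # ASCII operators such as + and = are also category Sm, so skip them.
        if ord(ch) > 127 and unicodedata.category(ch) == "Sm":
            counts[ch] = counts.get(ch, 0) + 1
    return counts
```

An empty result does not prove that every equation converted cleanly (Greek letters, for instance, fall in a different Unicode category), but a non-empty one points at specific characters to inspect.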
License
MIT License - See LICENSE for details.
Contributing
Issues and Pull Requests are welcome!