Open-source, production-grade LaTeX -> Microsoft Word (.docx) converter with native OMML math and live fields

latex2word

An open-source, cross-platform LaTeX → Microsoft Word (.docx) converter that produces genuinely editable Word: native paragraph styles, native OMML equations (editable in Word's equation editor, not images), and live, auto-renumbering fields for equation/figure/table numbers and cross-references.

Status: production-grade. Foundation, math core (direct LaTeX→OMML), the live cross-reference/field plumbing (the differentiator), image embedding, the BibTeX bibliography, and the robustness layer (math cascade, coverage report, OOXML validator, round-trip manifest) are all in. See CHANGELOG.md for the release history.

Why

Pandoc/texmath is the open-source reference, but it drops equation numbers, can dump raw LaTeX for labelled equations, and emits static cross-references. No open tool produces all three together: editable styles, native OMML, and live field-based numbering. That gap is the product.

Install & use

Requires Python 3.12+.

From PyPI:

pip install tex2word                 # core (PNG/JPEG figures)
pip install "tex2word[pdf]"          # + PDF figure rasterisation (pypdfium2, Apache-2.0)
pip install "tex2word[mathml]"       # + LaTeX->MathML->OMML for hard math (latex2mathml)
pip install "tex2word[csl]"          # + real CSL citation styles (citeproc-py)
pip install "tex2word[pdf,mathml,csl,mathimg]"   # everything

latex2word convert paper.tex -o paper.docx
latex2word convert paper.tex -o paper.docx --report report.json
latex2word convert paper.tex -o paper.docx --reference-doc journal.docx

Or, for a development checkout with uv:

uv sync --all-extras
uv run latex2word convert paper.tex -o paper.docx

Or from Python:

from latex2word import convert_source, convert_file

out_path, result = convert_file("paper.tex")
print(result.report.summary())   # math coverage + warnings
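
For batch jobs, the command-line flags shown above can be assembled from a script. The following is a minimal sketch that uses only the `convert`, `-o`, `--report` and `--reference-doc` options documented in this section; the helper names `build_convert_command` and `convert_all`, and the `.report.json` naming, are hypothetical and not part of the package.

```python
import subprocess
from pathlib import Path


def build_convert_command(tex, out, report=None, reference_doc=None):
    """Build the argv list for `latex2word convert` (hypothetical helper)."""
    cmd = ["latex2word", "convert", str(tex), "-o", str(out)]
    if report is not None:
        cmd += ["--report", str(report)]
    if reference_doc is not None:
        cmd += ["--reference-doc", str(reference_doc)]
    return cmd


def convert_all(folder, dry_run=True):
    """Convert every .tex file in `folder`; with dry_run, only return the commands."""
    commands = []
    for tex in sorted(Path(folder).glob("*.tex")):
        cmd = build_convert_command(
            tex,
            tex.with_suffix(".docx"),
            report=tex.with_suffix(".report.json"),  # naming is an assumption
        )
        commands.append(cmd)
        if not dry_run:
            # Requires the latex2word executable to be on PATH.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `convert_all("papers", dry_run=True)` returns the commands without running anything, which is a cheap way to check the arguments before a long batch.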

What works today

Reference Word templates ★: --reference-doc TEMPLATE.docx adopts a journal/corporate template's styles, theme and page geometry (size + margins), so the output matches the required look — while keeping the live fields below. Our custom styles are merged in so nothing renders unstyled.
Structure & styles: \title/\author/\date/abstract, \section… \subparagraph → Word Title/Heading 1–4 (visible in the Navigation pane), paragraphs, \textbf/\emph/\texttt/\underline/\textsc, quotes, code. Sections are auto-numbered (multilevel 1 / 1.1 / 1.1.1) like LaTeX, with \section* unnumbered; \ref to a section shows its live number. In book/report documents \chapter is the top level (sections nest under it) and \appendix switches to lettered headings (A, A.1).
Math (direct LaTeX→OMML): inline $…$, display \[…\], equation/align/gather; fractions, sub/superscripts, roots, \sum/\int with limits, accents, \left…\right delimiters, matrices/cases, Greek and hundreds of symbols, \mathbb/\mathcal/\mathbf, functions (\sin, \lim). align*/aligned line up at the & (a column-justified matrix); numbered align keeps a live number per line.
Live fields ★: numbered equations get SEQ Equation fields inside bookmarks; \ref/\eqref/\pageref become REF/PAGEREF fields; figure and table captions get SEQ Figure/SEQ Table. Numbers auto-renumber in Word on field refresh. --number-by-section switches to N.M per-section numbering (STYLEREF + SEQ \s), book/report style.
Table of contents ★: \tableofcontents → a live Word TOC field (rebuilds from heading styles on refresh); \listoffigures/\listoftables → caption-sequence lists. Schema-valid and round-tripping.
Lists, tables, figures: itemize/enumerate, tabular/longtable with booktabs, \multicolumn→column span, \multirow→vertical merge, and repeating header rows; captioned figure/table, \includegraphics (PNG/JPEG embedded directly; PDF figures rasterised to PNG when the optional tex2word[pdf] extra — pypdfium2 — is installed). An \includegraphics in running text (an icon/logo) is embedded inline.
Custom macros: \newcommand/\renewcommand/\def are expanded before parsing. Common mathtools/physics math (\abs, \norm, \dv, \ket, …) and siunitx (\SI{9.81}{\meter\per\second\squared} → 9.81 m/s², \num, \ang) work as built-ins when not user-defined. Acronyms (glossaries): \newacronym + \gls/\acrshort/\acrlong/\acrfull expand with the first-use "long (short)" rule.
Footnotes: \footnote → native Word footnotes (footnotes.xml), not inlined text; footnote bodies keep their formatting and math.
Inline verbatim & smart refs: \verb|...| → literal monospace; \cref/\Cref/\autoref add cleveref-style type prefixes ("fig. N" / "Figure N").
Theorem environments: theorem/lemma/proof/definition/… render with a bold numbered lead (live SEQ per kind), optional [title], and a QED mark for proofs; \ref to a theorem shows its number.
Algorithms: algorithm + algorithmic/algpseudocode/algorithm2e → numbered, indented pseudocode with bold keywords, inline OMML math, and a live SEQ Algorithm caption.
Graceful degradation: unknown constructs never abort; they pass through best-effort and are logged to the conversion report (math coverage telemetry included). The math decision-cascade (direct OMML → LaTeX→MathML→OMML secondary path → image fallback --math-image-fallback → raw) records which path each equation took.
Round-trip: the IR is embedded as a JSON manifest custom part, so the exact IR can be recovered from the .docx (latex2word.roundtrip.recover_ir) and converted back to LaTeX (latex2word to-latex out.docx); the corpus latex→docx→latex keeps the same block structure. Reconcile (on by default) merges Word edits against the manifest, and Word Track Changes are accepted on read (insertions kept, deletions dropped).
Reports & validation: --report report.json|report.html writes a coverage report; latex2word.validate.validate_docx structurally validates output; latex2word benchmark <dir> reports a quantitative baseline (math-OMML %, validity, warnings, 0-abort) across a paper set (CI-gated on the corpus + UATs: currently 100% native-OMML math, 100% valid, 0 aborts).
Reproducible: set SOURCE_DATE_EPOCH and the same input yields byte-identical output (the .docx ZIP is built deterministically).
Live citations (opt-in --citations zotero): emit ADDIN ZOTERO_ITEM CSL_CITATION / CSL_BIBLIOGRAPHY fields so citations are editable by Zotero/Mendeley in Word (default is static formatted text).
Real CSL styles (opt-in --csl style.csl, needs tex2word[csl]): a genuine citeproc-py engine formats in-text citations and the reference list against any .csl style, with proper sorting; the built-in heuristic is the fallback. \nocite{key}/\nocite{*} are honoured.
Front-end choice: the default pure front-end (pylatexenc-based) is the validated engine — it converts the corpus and three real-paper UATs at 100% native-OMML math, 100% valid output, 0 aborts. --frontend latexml is experimental: it shells out to a real latexml install for genuine TeX expansion, but is not yet proven end-to-end (it silently falls back to pure on any failure; see the advisory real-tool CI lane).
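
To make the "Live fields" item above concrete, here is a rough sketch of what a SEQ field inside a bookmark and a matching REF field look like in WordprocessingML. It uses the standard library's `xml.etree.ElementTree` and the simple-field form (`w:fldSimple`). This is an illustration of the general OOXML mechanism, not the converter's actual output, which may well use complex fields instead; the helper names and the bookmark name `eq_energy` are made up for the example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def w(tag):
    """Qualify a tag or attribute name with the WordprocessingML namespace."""
    return f"{{{W}}}{tag}"


def simple_field(instr, cached_text):
    """A w:fldSimple element; Word recomputes its result from `instr` on refresh."""
    fld = ET.Element(w("fldSimple"), {w("instr"): instr})
    run = ET.SubElement(fld, w("r"))
    ET.SubElement(run, w("t")).text = cached_text  # last computed value
    return fld


def numbered_equation_tag(bookmark, bookmark_id, number):
    """Paragraph whose equation number is a SEQ field wrapped in a bookmark."""
    p = ET.Element(w("p"))
    ET.SubElement(p, w("bookmarkStart"),
                  {w("id"): str(bookmark_id), w("name"): bookmark})
    p.append(simple_field(" SEQ Equation \\* ARABIC ", str(number)))
    ET.SubElement(p, w("bookmarkEnd"), {w("id"): str(bookmark_id)})
    return p


def cross_reference(bookmark, cached_text):
    """A REF field that displays the current content of the bookmark."""
    return simple_field(f" REF {bookmark} \\h ", cached_text)


if __name__ == "__main__":
    print(ET.tostring(numbered_equation_tag("eq_energy", 0, 1), encoding="unicode"))
    print(ET.tostring(cross_reference("eq_energy", "1"), encoding="unicode"))
```

Because the REF field points at the bookmark rather than at a literal number, inserting an earlier equation and refreshing fields in Word updates both the SEQ value and every reference to it.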

Architecture

LaTeX ─▶ front-end (preprocess, macro-expand, pylatexenc walk) ─▶ IR
      ─▶ transforms (cross-reference resolution) ─▶ IR
      ─▶ back-end (raw OOXML via lxml: document/styles/numbering) ─▶ .docx

The IR (src/latex2word/ir.py) is the format-neutral seam, so a LaTeXML front-end can replace the static parser post-V1 without touching the back-end.
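
The three-stage flow above can be pictured with a toy IR. The sketch below is purely illustrative: the `Equation` and `Paragraph` classes and the `resolve_cross_references` function are hypothetical and are not the contents of src/latex2word/ir.py. It only shows why a format-neutral IR lets a transform pass such as cross-reference resolution run independently of both the parser and the .docx writer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Equation:
    latex: str
    label: Optional[str] = None
    number: Optional[int] = None  # filled in by the transform pass


@dataclass
class Paragraph:
    text: str
    refs: list = field(default_factory=list)      # labels cited via \ref
    resolved: dict = field(default_factory=dict)  # label -> equation number


def resolve_cross_references(blocks):
    """Transform pass: number equations, then resolve references to them."""
    numbers = {}
    counter = 0
    for block in blocks:
        if isinstance(block, Equation):
            counter += 1
            block.number = counter
            if block.label:
                numbers[block.label] = counter
    # Second pass, so that forward references are resolved too.
    for block in blocks:
        if isinstance(block, Paragraph):
            block.resolved = {r: numbers[r] for r in block.refs if r in numbers}
    return blocks
```

A front-end would produce the list of blocks and a back-end would consume it; neither needs to know how the other works, which is the property the paragraph above relies on.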

Development

uv run pytest          # tests
uv run ruff check src tests
uv run mypy src
uv run pre-commit install   # optional: run the lint/type gate on every commit

Releases: pushing a vX.Y.Z tag builds the wheel/sdist and publishes to PyPI (via the Release workflow, using PyPI Trusted Publishing). Notable changes are recorded in CHANGELOG.md.

License

MIT — see LICENSE.

Author

Yifan Yang yfyang.86@hotmail.com

Maintainers

yifanyang

Release history

0.8.2 (Jun 5, 2026)

0.8.1 (Jun 5, 2026, this version)

Download files

Source Distribution

tex2word-0.8.1.tar.gz (122.0 kB)

Uploaded Jun 5, 2026 Source

Built Distribution

tex2word-0.8.1-py3-none-any.whl (142.2 kB)

Uploaded Jun 5, 2026 Python 3

File details

Details for the file tex2word-0.8.1.tar.gz.

File metadata

Download URL: tex2word-0.8.1.tar.gz
Upload date: Jun 5, 2026
Size: 122.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tex2word-0.8.1.tar.gz
Algorithm	Hash digest
SHA256	`4c2d60afb403647f30353fcac40028d17b7088bfd8f18c75b115757ff777f333`
MD5	`ba02ce1e6442a3c847cd4114fa8a3bea`
BLAKE2b-256	`6aea31cc93a5a8fb85f815eeb8ffd13c2dbf12a8fe79757b4db9e1cd553ee78f`
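
A downloaded file can be checked against the SHA256 digest listed above before installing it. The following is a small sketch using the standard library's `hashlib`; the function names are illustrative, and the local file path is whatever the download was saved as.

```python
import hashlib

# SHA256 digest listed above for tex2word-0.8.1.tar.gz
EXPECTED_SDIST_SHA256 = "4c2d60afb403647f30353fcac40028d17b7088bfd8f18c75b115757ff777f333"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SDIST_SHA256):
    """True if the file at `path` has the expected SHA256 digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("tex2word-0.8.1.tar.gz")` should return True for an intact copy of the source distribution.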

Provenance

The following attestation bundles were made for tex2word-0.8.1.tar.gz:

Publisher: release.yml on yfyang86/tex2word

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tex2word-0.8.1.tar.gz
- Subject digest: 4c2d60afb403647f30353fcac40028d17b7088bfd8f18c75b115757ff777f333
- Sigstore transparency entry: 1731597046
- Sigstore integration time: Jun 5, 2026
Source repository:
- Permalink: yfyang86/tex2word@4034f5e969a7d0256166cf1713f96a4e0b328dd9
- Branch / Tag: refs/tags/v0.8.1
- Owner: https://github.com/yfyang86
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@4034f5e969a7d0256166cf1713f96a4e0b328dd9
- Trigger Event: push

File details

Details for the file tex2word-0.8.1-py3-none-any.whl.

File metadata

Download URL: tex2word-0.8.1-py3-none-any.whl
Upload date: Jun 5, 2026
Size: 142.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tex2word-0.8.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5383bb3c98062d7117f783f415952d53dd397bde3b162bb9722c3ab3c5a598e7`
MD5	`17317f8c28d7cd519d9a9df08394c6fe`
BLAKE2b-256	`fd20f0fd4158f346e68a3c240c32d0969bcf5d3ac7eca095f15ea6ead2fc473a`

Provenance

The following attestation bundles were made for tex2word-0.8.1-py3-none-any.whl:

Publisher: release.yml on yfyang86/tex2word

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tex2word-0.8.1-py3-none-any.whl
- Subject digest: 5383bb3c98062d7117f783f415952d53dd397bde3b162bb9722c3ab3c5a598e7
- Sigstore transparency entry: 1731597104
- Sigstore integration time: Jun 5, 2026
Source repository:
- Permalink: yfyang86/tex2word@4034f5e969a7d0256166cf1713f96a4e0b328dd9
- Branch / Tag: refs/tags/v0.8.1
- Owner: https://github.com/yfyang86
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@4034f5e969a7d0256166cf1713f96a4e0b328dd9
- Trigger Event: push
