Open-source, production-grade LaTeX -> Microsoft Word (.docx) converter with native OMML math and live fields

tex2word

An open-source, cross-platform LaTeX → Microsoft Word (.docx) converter that produces genuinely editable Word: native paragraph styles, native OMML equations (editable in Word's equation editor, not images), and live, auto-renumbering fields for equation/figure/table numbers and cross-references. Chinese/Japanese/Korean documents (XeLaTeX/xeCJK) are supported — the configured CJK fonts carry through to Word.

Status: 1.0 — stable. Math core (direct LaTeX→OMML), the live cross-reference/field plumbing (the differentiator), image embedding, TikZ figure rendering, CJK/XeLaTeX support, the BibTeX bibliography, and the robustness layer (math cascade, coverage report, OOXML validator, round-trip manifest) are all in. See CHANGELOG.md for the release history.

Why

Pandoc/texmath is the open-source reference, but it drops equation numbers, can dump raw LaTeX for labelled equations, and emits static cross-references. No open tool produces all three together: editable styles, native OMML, and live field-based numbering. That gap is the product.

Install & use

Requires Python 3.12+.

From PyPI:

pip install tex2word                 # core (PNG/JPEG figures)
pip install "tex2word[pdf]"          # + PDF figure rasterisation (pypdfium2, Apache-2.0)
pip install "tex2word[mathml]"       # + LaTeX->MathML->OMML for hard math (latex2mathml)
pip install "tex2word[csl]"          # + real CSL citation styles (citeproc-py)
pip install "tex2word[pdf,mathml,csl,mathimg]"   # everything

tex2word convert paper.tex -o paper.docx
tex2word convert paper.tex -o paper.docx --report report.json
tex2word convert paper.tex -o paper.docx --reference-doc journal.docx

Or, for a development checkout with uv:

uv sync --all-extras
uv run tex2word convert paper.tex -o paper.docx

Or from Python:

from tex2word import convert_source, convert_file

out_path, result = convert_file("paper.tex")
print(result.report.summary())   # math coverage + warnings
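
The CLI calls shown earlier can also be driven from a script. What follows is a minimal sketch, assuming the tex2word executable is on PATH. It only uses the convert subcommand and the -o, --report and --reference-doc options that appear above; the helper names build_convert_command and convert_directory, and the .report.json naming, are hypothetical and not part of tex2word.

```python
import subprocess
from pathlib import Path
from typing import Optional


def build_convert_command(
    tex_path: Path,
    report: bool = False,
    reference_doc: Optional[Path] = None,
) -> list[str]:
    """Build a `tex2word convert` argument list for one .tex file."""
    cmd = ["tex2word", "convert", str(tex_path), "-o", str(tex_path.with_suffix(".docx"))]
    if report:
        # Hypothetical naming choice: paper.tex -> paper.report.json
        cmd += ["--report", str(tex_path.with_suffix(".report.json"))]
    if reference_doc is not None:
        cmd += ["--reference-doc", str(reference_doc)]
    return cmd


def convert_directory(root: Path, **options) -> dict[Path, int]:
    """Run the CLI on every .tex file directly under `root`; map file -> exit code."""
    results = {}
    for tex_path in sorted(root.glob("*.tex")):
        completed = subprocess.run(build_convert_command(tex_path, **options))
        results[tex_path] = completed.returncode
    return results
```

Calling convert_directory(Path("papers"), report=True) would then convert each file in turn and leave a JSON report next to every .docx. Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.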

Chinese / CJK documents (XeLaTeX)

xeCJK documents convert out of the box — the fonts you select in the preamble are mapped onto Word's font slots, so Chinese/Japanese/Korean text (in prose, headings, tables and equations) renders in the intended font:

\documentclass{article}
\usepackage{xeCJK}
\setmainfont{Times New Roman}   % Latin  -> Word ascii/hAnsi
\setCJKmainfont{SimSun}         % CJK    -> Word eastAsia (body text)
\setCJKsansfont{SimHei}         % CJK    -> headings
\begin{document}
测试中文字体。Formula $\sum E = m c^2 \text{（公式）}$。
\end{document}

tex2word convert zhongwen.tex -o zhongwen.docx

The font name is recorded as written, so it must be installed on the machine that opens the .docx (e.g. SimSun/宋体, or any installed CJK font such as Noto Serif CJK SC). The choices round-trip back to a XeLaTeX preamble.
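
To make the font-slot mapping concrete: in WordprocessingML a w:rFonts element carries separate w:ascii, w:hAnsi and w:eastAsia attributes, which is where the Latin and CJK choices end up. The sketch below is an illustration under assumptions, not tex2word's own code; the regex-based preamble reader and the function names are made up for this example, and it handles only \setmainfont and \setCJKmainfont.

```python
import re
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)

# Matches \setmainfont{...} and \setCJKmainfont{...}, with an optional [options] group.
FONT_CMD = re.compile(r"\\(setmainfont|setCJKmainfont)\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}")


def read_preamble_fonts(preamble: str) -> dict[str, str]:
    """Collect the font names given to \\setmainfont and \\setCJKmainfont."""
    fonts = {}
    for line in preamble.splitlines():
        code = line.split("%", 1)[0]  # drop TeX comments (naive: ignores escaped \%)
        for command, name in FONT_CMD.findall(code):
            fonts[command] = name.strip()
    return fonts


def build_rfonts(fonts: dict[str, str]) -> ET.Element:
    """Build a w:rFonts element: Latin font -> ascii/hAnsi, CJK font -> eastAsia."""
    rfonts = ET.Element(f"{{{W_NS}}}rFonts")
    if "setmainfont" in fonts:
        rfonts.set(f"{{{W_NS}}}ascii", fonts["setmainfont"])
        rfonts.set(f"{{{W_NS}}}hAnsi", fonts["setmainfont"])
    if "setCJKmainfont" in fonts:
        rfonts.set(f"{{{W_NS}}}eastAsia", fonts["setCJKmainfont"])
    return rfonts


PREAMBLE = r"""
\usepackage{xeCJK}
\setmainfont{Times New Roman}   % Latin
\setCJKmainfont{SimSun}         % CJK
"""

rfonts = build_rfonts(read_preamble_fonts(PREAMBLE))
print(ET.tostring(rfonts, encoding="unicode"))
```

Because the name is copied through unchanged, Word substitutes a fallback font if nothing called SimSun is installed on the machine that opens the file.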

Web app (GUI)

Prefer a browser to the CLI? tex2word-gui is a companion web application — a thin adapter around tex2word's IR and outputs — for editing, converting and previewing LaTeX projects without touching a terminal. Highlights:

Convert & download a single file or a multi-file project to a real .docx, with the tex2word coverage report (math OMML %, diagnostics, validity). Conversions run as background jobs and outputs are stored durably in SQLite (they survive restarts).
Projects with a collapsible folder tree (add / rename / delete files in place), multi-file \input/\include resolution, and a LaTeX/Markdown editor with syntax highlighting; a gallery with search, kind/state filters, and per-card favourite / archive / rename / delete.
Live preview with typeset math (native MathML), a structure outline and statistics, in a resizable source⇄preview split with two-way scroll sync.
Figures — upload/import images; \includegraphics embeds them in the .docx, and PDF/EPS and inline TikZ/PGF are rasterised for the preview through tex2word's own backend.
Import from Markdown, an arXiv source bundle, or a .zip/.tar.gz project archive; round-trip a tex2word-generated .docx back to LaTeX.
tex-copilot assistant — fix / polish / explain / ask over the current file and report via a configurable LLM (Anthropic or any OpenAI-compatible), with streaming replies, diff-based accept/reject, @file and /-skill autocomplete, and persisted multi-turn chat.
Coverage dashboard aggregating tex2word metrics across all projects.
Optional accounts ([auth]: scrypt-hashed passwords, RS256 JWT sessions, owner-scoped projects) and hardening (per-IP rate limiting, request body-size cap); both are off by default and configurable.

The GUI depends on this package for the actual conversion, so everything in What works today below applies there too.

What works today

Reference Word templates ★: --reference-doc TEMPLATE.docx adopts a journal/corporate template's styles, theme and page geometry (size + margins), so the output matches the required look — while keeping the live fields below. Our custom styles are merged in so nothing renders unstyled.
Structure & styles: \title/\author/\date/abstract, \section… \subparagraph → Word Title/Heading 1–4 (visible in the Navigation pane), paragraphs, \textbf/\emph/\texttt/\underline/\textsc, quotes, code. Sections are auto-numbered (multilevel 1 / 1.1 / 1.1.1) like LaTeX, with \section* unnumbered; \ref to a section shows its live number. In book/report documents \chapter is the top level (sections nest under it) and \appendix switches to lettered headings (A, A.1).
Math (direct LaTeX→OMML): inline $…$, display \[…\], equation/align/gather; fractions, sub/superscripts, roots, \sum/\int with limits, accents, \left…\right delimiters, matrices/cases, Greek and hundreds of symbols, \mathbb/\mathcal/\mathbf, functions (\sin, \lim). align*/aligned line up at the & (a column-justified matrix); numbered align keeps a live number per line.
Live fields ★: numbered equations get SEQ Equation fields inside bookmarks; \ref/\eqref/\pageref become REF/PAGEREF fields; figure and table captions get SEQ Figure/SEQ Table. Numbers auto-renumber in Word on field refresh. --number-by-section switches to N.M per-section numbering (STYLEREF + SEQ \s), book/report style.
Table of contents ★: \tableofcontents → a live Word TOC field (rebuilds from heading styles on refresh); \listoffigures/\listoftables → caption-sequence lists. Schema-valid and round-tripping.
Lists, tables, figures: itemize/enumerate, tabular/longtable with booktabs, \multicolumn→column span, \multirow→vertical merge, and repeating header rows; captioned figure/table, \includegraphics (PNG/JPEG embedded directly; PDF figures rasterised to PNG when the optional tex2word[pdf] extra — pypdfium2 — is installed). An \includegraphics in running text (an icon/logo) is embedded inline.
TikZ / PGF figures ★: a tikzpicture/pgfpicture/… is compiled with a TeX engine (xelatex/lualatex/pdflatex) into a cropped standalone PDF and rasterised to an embedded PNG (needs the tex2word[pdf] extra). With no TeX toolchain it degrades to a caption-only figure (the report says why).
Multi-column layout: \documentclass[twocolumn] (and a \twocolumn command or multicols{N} environment) lays the body out in N Word columns; --columns N overrides. Starred floats (figure*/table*) and the title/abstract span the full page width (via continuous section breaks that switch the column count around each spanning region). Limitation: a mid-document \onecolumn/\twocolumn switch is not modelled — the largest column count seen applies to the whole body.
Custom macros: \newcommand/\renewcommand/\def are expanded before parsing. Common mathtools/physics math (\abs, \norm, \dv, \ket, …) and siunitx (\SI{9.81}{\meter\per\second\squared} → 9.81 m/s², \num, \ang) work as built-ins when not user-defined. Acronyms (glossaries): \newacronym + \gls/\acrshort/\acrlong/\acrfull expand with the first-use "long (short)" rule.
CJK / XeLaTeX fonts: \usepackage{xeCJK} with \setmainfont, \setCJKmainfont, \setCJKsansfont and \setCJKmonofont are honoured — the Latin font becomes the Word ascii/hAnsi default and the CJK font the eastAsia default (sans on headings, mono on code), so Chinese/Japanese/Korean text renders in the intended font. The font name is recorded as written, so it must match a font installed on the machine that opens the .docx.
Footnotes: \footnote → native Word footnotes (footnotes.xml), not inlined text; footnote bodies keep their formatting and math.
Inline verbatim & smart refs: \verb|...| → literal monospace; \cref/\Cref/\autoref add cleveref-style type prefixes ("fig. N" / "Figure N").
Theorem environments: theorem/lemma/proof/definition/… render with a bold numbered lead (live SEQ per kind), optional [title], and a QED mark for proofs; \ref to a theorem shows its number.
Algorithms: algorithm + algorithmic/algpseudocode/algorithm2e → numbered, indented pseudocode with bold keywords, inline OMML math, and a live SEQ Algorithm caption.
Graceful degradation: unknown constructs never abort; they pass through best-effort and are logged to the conversion report (math coverage telemetry included). The math decision-cascade (direct OMML → LaTeX→MathML→OMML secondary path → image fallback --math-image-fallback → raw) records which path each equation took.
Round-trip: the IR is embedded as a JSON manifest custom part, so the exact IR can be recovered from the .docx (tex2word.roundtrip.recover_ir) and converted back to LaTeX (tex2word to-latex out.docx); the corpus latex→docx→latex keeps the same block structure. Reconcile (on by default) merges Word edits against the manifest, and Word Track Changes are accepted on read (insertions kept, deletions dropped).
Reports & validation: --report report.json|report.html writes a coverage report; tex2word.validate.validate_docx structurally validates output; tex2word benchmark <dir> reports a quantitative baseline (math-OMML %, validity, warnings, 0-abort) across a paper set (CI-gated on the corpus + UATs: currently 100% native-OMML math, 100% valid, 0 aborts).
Reproducible: set SOURCE_DATE_EPOCH and the same input yields byte-identical output (the .docx ZIP is built deterministically).
Live citations (opt-in --citations zotero): emit ADDIN ZOTERO_ITEM CSL_CITATION / CSL_BIBLIOGRAPHY fields so citations are editable by Zotero/Mendeley in Word (default is static formatted text).
Real CSL styles (opt-in --csl style.csl, needs tex2word[csl]): a genuine citeproc-py engine formats in-text citations and the reference list against any .csl style, with proper sorting; the built-in heuristic is the fallback. \nocite{key}/\nocite{*} are honoured.
Front-end choice: the default pure front-end (pylatexenc-based) is the validated engine — it converts the corpus and three real-paper UATs at 100% native-OMML math, 100% valid output, 0 aborts. --frontend latexml is experimental: it shells out to a real latexml install for genuine TeX expansion, but is not yet proven end-to-end (it silently falls back to pure on any failure; see the advisory real-tool CI lane).
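
The "Live fields" item in the list above is the core idea, so here is a small sketch of what a SEQ field inside a bookmark, plus a REF field pointing at it, can look like in WordprocessingML. It is an assumption-laden illustration, not tex2word's back-end: it uses the simple w:fldSimple form (the real output may use complex fields), the bookmark name eq_energy is invented, and the equation itself is left out.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Qualify a tag or attribute name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def simple_field(instruction: str, cached_text: str) -> ET.Element:
    """A w:fldSimple field: Word recomputes it on refresh; cached_text shows until then."""
    field = ET.Element(w("fldSimple"), {w("instr"): instruction})
    run = ET.SubElement(field, w("r"))
    ET.SubElement(run, w("t")).text = cached_text
    return field


def numbered_equation_label(bookmark: str, bookmark_id: int, number: int) -> list[ET.Element]:
    """A SEQ field wrapped in a bookmark, so REF fields can point at the number."""
    start = ET.Element(w("bookmarkStart"), {w("id"): str(bookmark_id), w("name"): bookmark})
    seq = simple_field(r" SEQ Equation \* ARABIC ", str(number))
    end = ET.Element(w("bookmarkEnd"), {w("id"): str(bookmark_id)})
    return [start, seq, end]


def cross_reference(bookmark: str, cached_number: int) -> ET.Element:
    """A REF field that shows whatever the bookmarked SEQ field currently evaluates to."""
    return simple_field(rf" REF {bookmark} \h ", str(cached_number))


paragraph = ET.Element(w("p"))
paragraph.extend(numbered_equation_label("eq_energy", 1, 1))
paragraph.append(cross_reference("eq_energy", 1))
print(ET.tostring(paragraph, encoding="unicode"))
```

The cached text is only a placeholder. When an equation is inserted earlier in the document and the fields are refreshed, Word re-evaluates the SEQ counter and every REF that targets the bookmark follows it, which is what makes the numbering live rather than static.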

Architecture

LaTeX ─▶ front-end (preprocess, macro-expand, pylatexenc walk) ─▶ IR
      ─▶ transforms (cross-reference resolution) ─▶ IR
      ─▶ back-end (raw OOXML via lxml: document/styles/numbering) ─▶ .docx

The IR (src/tex2word/ir.py) is the format-neutral seam, so a LaTeXML front-end can replace the static parser post-V1 without touching the back-end.
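
As a rough picture of why a format-neutral IR helps, the sketch below models a tiny IR and a cross-reference transform that runs between the front-end and the back-end. Every class and function name here is hypothetical; the real node types live in src/tex2word/ir.py and are certainly richer. For simplicity, every Equation is treated as a numbered equation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Equation:
    latex: str
    label: Optional[str] = None
    number: Optional[int] = None  # filled in by the transform


@dataclass
class Ref:
    target: str
    resolved: Optional[int] = None  # filled in by the transform


@dataclass
class Document:
    blocks: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


def resolve_cross_references(doc: Document) -> Document:
    """Number equations in document order, then point each Ref at its target's number."""
    numbers = {}
    counter = 0
    for block in doc.blocks:
        if isinstance(block, Equation):
            counter += 1
            block.number = counter
            if block.label is not None:
                numbers[block.label] = counter
    # Second pass, so forward references resolve as well.
    for block in doc.blocks:
        if isinstance(block, Ref):
            block.resolved = numbers.get(block.target)
            if block.resolved is None:
                # Record the problem and keep going instead of aborting.
                doc.warnings.append(f"unresolved reference: {block.target}")
    return doc


doc = resolve_cross_references(Document(blocks=[
    Ref("eq:second"),                        # forward reference
    Equation("E = mc^2", label="eq:first"),
    Equation("a^2 + b^2 = c^2"),             # numbered but unlabelled
    Equation("F = ma", label="eq:second"),
    Ref("eq:missing"),
]))
print([b.resolved for b in doc.blocks if isinstance(b, Ref)], doc.warnings)
# prints: [3, None] ['unresolved reference: eq:missing']
```

Since the transform only reads and writes IR nodes, a different front-end that produces the same nodes could feed it unchanged, and the back-end never needs to know which parser was used.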

Development

uv run pytest          # tests
uv run ruff check src tests
uv run mypy src
uv run pre-commit install   # optional: run the lint/type gate on every commit

Releases: pushing a vX.Y.Z tag builds the wheel/sdist and publishes to PyPI (via the Release workflow, using PyPI Trusted Publishing). Notable changes are recorded in CHANGELOG.md.

License

MIT — see LICENSE.

Author

Yifan Yang <yfyang.86@hotmail.com>

Maintainers

yifanyang

Release history

1.0.6 (this version): Jul 23, 2026
1.0.5: Jul 12, 2026
1.0.4: Jul 3, 2026
1.0.3: Jun 28, 2026
1.0.2: Jun 22, 2026
1.0.1: Jun 8, 2026
1.0.0: Jun 7, 2026
0.9.1: Jun 7, 2026
0.9.0: Jun 7, 2026
0.8.2: Jun 5, 2026
0.8.1: Jun 5, 2026

Download files

Source Distribution

tex2word-1.0.6.tar.gz (153.4 kB)

Uploaded Jul 23, 2026 Source

Built Distribution

tex2word-1.0.6-py3-none-any.whl (174.7 kB)

Uploaded Jul 23, 2026 Python 3

File details

Details for the file tex2word-1.0.6.tar.gz.

File metadata

Download URL: tex2word-1.0.6.tar.gz
Upload date: Jul 23, 2026
Size: 153.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.14

File hashes

Hashes for tex2word-1.0.6.tar.gz
Algorithm	Hash digest
SHA256	`27312870ece2e6928e7ce973769e5ab4c8632003353923ff80b2445d3f5f419b`
MD5	`253c3eeff8e7188379eb4367a39f03e0`
BLAKE2b-256	`313434eb9a87b1c3a3c6b9a58407fd9bbb539ca89b97d5b3cc94a927e7b78429`
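
A downloaded archive can be checked against the SHA256 digest listed above with the standard library alone. This is a minimal sketch; the function names are illustrative, and it assumes the file was saved under its published name in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for tex2word-1.0.6.tar.gz
EXPECTED_SHA256 = "27312870ece2e6928e7ce973769e5ab4c8632003353923ff80b2445d3f5f419b"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the published hex digest, ignoring case."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    archive = Path("tex2word-1.0.6.tar.gz")
    if archive.exists():
        print("OK" if matches_published_hash(archive) else "MISMATCH")
```

pip can enforce the same check at install time through hash-checking mode, where a requirements file pins each package with --hash=sha256:... entries.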

Provenance

The following attestation bundles were made for tex2word-1.0.6.tar.gz:

Publisher: release.yml on yfyang86/tex2word

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tex2word-1.0.6.tar.gz
- Subject digest: 27312870ece2e6928e7ce973769e5ab4c8632003353923ff80b2445d3f5f419b
- Sigstore transparency entry: 2223866869
- Sigstore integration time: Jul 23, 2026
Source repository:
- Permalink: yfyang86/tex2word@3c278ba82ffe3496d71f61950a693d728dacbf58
- Branch / Tag: refs/tags/v1.0.6
- Owner: https://github.com/yfyang86
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@3c278ba82ffe3496d71f61950a693d728dacbf58
- Trigger Event: push

File details

Details for the file tex2word-1.0.6-py3-none-any.whl.

File metadata

Download URL: tex2word-1.0.6-py3-none-any.whl
Upload date: Jul 23, 2026
Size: 174.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.14

File hashes

Hashes for tex2word-1.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`70d3f04c507b3c9f7bdf143281a5334a0ae20bcbdd3f44e8ae52aded201e2852`
MD5	`8c7467efe837ee3f85e1f25eca5d83e6`
BLAKE2b-256	`127caebc476a58290c66411af183c6e2311c774b0ef486c3428b0e61bf7d41e3`

Provenance

The following attestation bundles were made for tex2word-1.0.6-py3-none-any.whl:

Publisher: release.yml on yfyang86/tex2word

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tex2word-1.0.6-py3-none-any.whl
- Subject digest: 70d3f04c507b3c9f7bdf143281a5334a0ae20bcbdd3f44e8ae52aded201e2852
- Sigstore transparency entry: 2223867358
- Sigstore integration time: Jul 23, 2026
Source repository:
- Permalink: yfyang86/tex2word@3c278ba82ffe3496d71f61950a693d728dacbf58
- Branch / Tag: refs/tags/v1.0.6
- Owner: https://github.com/yfyang86
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@3c278ba82ffe3496d71f61950a693d728dacbf58
- Trigger Event: push
