A project to convert LaTeX to DOCX
LaTeX to Word Conversion Tool
In daily work, colleagues or supervisors unfamiliar with LaTeX may request Word documents for review and collaboration. This project provides a Python script that uses Pandoc and Pandoc-Crossref to automatically convert LaTeX files into Word documents following a specified format. Although there's no perfect method for converting LaTeX to Word, the output generated by this project meets informal review needs. However, around 5% of the content (such as author information) may require manual corrections post-conversion.
Features
- Supports formula conversion
- Automatically numbers and cross-references images, tables, formulas, and citations
- Converts multi-figure LaTeX files
- Outputs Word files in a specified format
- Supports Chinese
Example results are available in the tests directory.
Quick Start
Ensure Pandoc and Pandoc-Crossref are correctly installed (see Install Dependencies). Execute the following command in your terminal:
tex2docx convert --input-texfile <your_texfile> --output-docxfile <your_docxfile>
Replace <...> in the command with the corresponding file path and name.
Install Dependencies
Ensure you have installed Pandoc, Pandoc-Crossref, and related Python libraries.
Pandoc
Install Pandoc as described in the official documentation. It is recommended to download the latest release from Pandoc Releases.
Pandoc-Crossref
Install Pandoc-Crossref by following the official documentation. Ensure compatibility between Pandoc and Pandoc-Crossref and configure the path correctly.
Python Libraries
Install from PyPI:
pip install tex2docx
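After installing, it can help to confirm that the external tools are actually reachable from Python before running a conversion. The following is a minimal sketch, not part of tex2docx: the helper names `find_missing_tools` and `tool_version` are hypothetical, and it assumes both executables are expected on PATH under the names `pandoc` and `pandoc-crossref`.

```python
import shutil
import subprocess

# Executable names assumed to be on PATH (an assumption of this sketch).
REQUIRED_TOOLS = ("pandoc", "pandoc-crossref")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def tool_version(name):
    """Return the first line of `<name> --version`, or None if unavailable."""
    if shutil.which(name) is None:
        return None
    result = subprocess.run(
        [name, "--version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    # Print both versions so they can be compared by eye for compatibility.
    for tool in REQUIRED_TOOLS:
        print(tool_version(tool))
```

Comparing the two version lines is a quick manual check that the installed Pandoc-Crossref release matches the installed Pandoc.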
Usage and Examples
This tool supports both command-line and script-based usage. Ensure the required dependencies are installed.
Command-Line Usage
Run the following command in your terminal:
tex2docx convert --input-texfile <your_texfile> --output-docxfile <your_docxfile> --reference-docfile <your_reference_docfile> --bibfile <your_bibfile> --cslfile <your_cslfile>
Use tex2docx convert --help to see details on these parameters.
For example, using tests/en:
tex2docx convert --input-texfile ./tests/en/main.tex --output-docxfile ./tests/en/main_cli.docx --reference-docfile ./my_temp.docx --bibfile ./tests/ref.bib --cslfile ./ieee.csl
This will generate the Word file main_cli.docx in the tests/en directory.
Script Usage
from tex2docx import LatexToWordConverter
config = {
'input_texfile': '<your_texfile>',
'output_docxfile': '<your_docxfile>',
'reference_docfile': '<your_reference_docfile>',
'cslfile': '<your_cslfile>',
'bibfile': '<your_bibfile>',
'fix_table': True,
'debug': False
}
converter = LatexToWordConverter(**config)
converter.convert()
For more examples, refer to tests/test_integration.py.
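When several LaTeX files share the same template, bibliography, and CSL file, the script interface can be wrapped in a small loop. The sketch below is an illustration rather than a tex2docx feature: `build_configs` and `convert_all` are hypothetical helpers, and the converter class is passed in as an argument so the loop only relies on the `LatexToWordConverter(**config).convert()` pattern shown above.

```python
from pathlib import Path


def build_configs(tex_files, **shared_options):
    """Build one config dict per .tex file, placing the .docx next to it."""
    configs = []
    for tex in map(Path, tex_files):
        config = {
            "input_texfile": str(tex),
            "output_docxfile": str(tex.with_suffix(".docx")),
        }
        # Options such as bibfile or cslfile are shared by every file.
        config.update(shared_options)
        configs.append(config)
    return configs


def convert_all(tex_files, converter_cls, **shared_options):
    """Run converter_cls(**config).convert() per file and collect failures."""
    failures = {}
    for config in build_configs(tex_files, **shared_options):
        try:
            converter_cls(**config).convert()
        except Exception as exc:  # keep going so one bad file does not stop the batch
            failures[config["input_texfile"]] = exc
    return failures
```

With the package installed, a call would look like `convert_all(["tests/en/main.tex"], LatexToWordConverter, bibfile="tests/ref.bib")`; the returned dictionary maps each failed input file to the exception it raised.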
FAQ
- Inconsistent Multi-Figure Layout
The relative positions of sub-figures may differ between LaTeX compilation and Word conversion. This may result from a redefined page size or other layout parameters in the LaTeX file. To address this, adjust the MULTIFIG_TEXFILE_TEMPLATE variable. Below is an example for reference:
import tex2docx

my_multifig_texfile_template = r"""
\documentclass[preview,convert,convert={outext=.png,command=\unexpanded{pdftocairo -r 600 -png \infile}}]{standalone}
\usepackage{graphicx}
\usepackage{subfig}
\usepackage{xeCJK}
\usepackage{geometry}
\newgeometry{
    top=25.4mm, bottom=33.3mm, left=20mm, right=20mm,
    headsep=10.4mm, headheight=5mm, footskip=7.9mm,
}
\graphicspath{{%s}}
\begin{document}
\thispagestyle{empty}
%s
\end{document}
"""

config = {
    'input_texfile': 'tests/en/main.tex',
    'output_docxfile': 'tests/en/main.docx',
    'reference_docfile': 'my_temp.docx',
    'cslfile': 'ieee.csl',
    'bibfile': 'tests/ref.bib',
    'multifig_texfile_template': my_multifig_texfile_template,
}

converter = tex2docx.LatexToWordConverter(**config)
converter.convert()
- The Word Output Doesn't Meet Formatting Requirements
Use Word's style management tools to adjust the styles in my_temp.docx.
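The multi-figure template in the first FAQ entry contains two %s placeholders: one inside \graphicspath and one in the document body. The sketch below shows one plausible way such a template could be filled, assuming Python's %-style string formatting; this is an assumption, not confirmed by the project, and `render_multifig` and the trimmed `TEMPLATE` are hypothetical names used only for illustration.

```python
# Hypothetical, trimmed-down template with the same two %s placeholders.
TEMPLATE = r"""
\documentclass[preview]{standalone}
\usepackage{graphicx}
\graphicspath{{%s}}
\begin{document}
%s
\end{document}
"""


def render_multifig(template, graphics_dir, figure_code):
    """Fill the graphics path and figure body into the two %s slots."""
    if template.count("%s") != 2:
        raise ValueError("template must contain exactly two %s placeholders")
    # Assumes %-style formatting; a literal percent sign would be written as %%.
    return template % (graphics_dir, figure_code)


print(render_multifig(TEMPLATE, "figs/", r"\includegraphics{a.png}"))
```

Under that assumption, a customized template should keep both placeholders in the same order and avoid stray % characters such as LaTeX comments.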
Implementation Details
This project relies on Pandoc and Pandoc-Crossref to convert LaTeX files to Word documents. The core command used is:
pandoc texfile -o docxfile \
--lua-filter resolve_equation_labels.lua \
--filter pandoc-crossref \
--reference-doc=temp.docx \
--number-sections \
-M autoEqnLabels \
-M tableEqns \
-M reference-section-title=Reference \
--bibliography=ref.bib \
--citeproc --csl ieee.csl \
-t docx+native_numbering
- --lua-filter resolve_equation_labels.lua handles equation numbering and cross-references, inspired by Constantin Ahlmann-Eltze's script.
- --filter pandoc-crossref handles cross-references for other elements.
- --reference-doc=temp.docx applies the styles from the reference document to the generated Word file. Two template files are included: TIE-temp.docx (for TIE journal submission, double-column format) and my_temp.docx (single-column, designed for easier annotation).
- --number-sections adds numbering to section headings.
- -M autoEqnLabels numbers display equations automatically, and -M tableEqns typesets each equation together with its number in a table.
- -M reference-section-title=Reference adds a section title for references.
- --bibliography=ref.bib generates the bibliography from ref.bib.
- --citeproc --csl ieee.csl formats citations and the bibliography using the IEEE citation style.
- -t docx+native_numbering improves captions for images and tables.
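To make the core command easier to reuse from Python, the same options can be assembled as an argument list. This is a minimal sketch under the assumption that the files are passed as plain paths; `build_pandoc_command` is a hypothetical helper and not necessarily how tex2docx builds its call internally.

```python
import shlex


def build_pandoc_command(texfile, docxfile, reference_doc, bibfile, cslfile,
                         lua_filter="resolve_equation_labels.lua"):
    """Assemble the core pandoc command as a list suitable for subprocess."""
    return [
        "pandoc", texfile, "-o", docxfile,
        "--lua-filter", lua_filter,
        "--filter", "pandoc-crossref",   # listed before --citeproc on purpose
        f"--reference-doc={reference_doc}",
        "--number-sections",
        "-M", "autoEqnLabels",
        "-M", "tableEqns",
        "-M", "reference-section-title=Reference",
        f"--bibliography={bibfile}",
        "--citeproc", "--csl", cslfile,
        "-t", "docx+native_numbering",
    ]


cmd = build_pandoc_command("main.tex", "main.docx", "temp.docx", "ref.bib", "ieee.csl")
print(shlex.join(cmd))  # shell-quoted form of the same command
# To run it for real: subprocess.run(cmd, check=True)
```

Pandoc applies filters and citeproc in the order they appear on the command line, so the list keeps pandoc-crossref ahead of --citeproc, matching the command above.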
The conversion for multi-figure LaTeX content may not be perfect. This project extracts multi-figure code from the LaTeX file and uses the convert and pdftocairo tools to compile the figures into a single large PNG file, replacing the original LaTeX image code and updating references to ensure smooth import into Word.
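As a rough illustration of the extraction step, multi-figure blocks can be located with a regular expression before they are compiled separately. The sketch below is a simplification and not the project's actual parser: `find_multifigure_blocks` is a hypothetical helper that treats any figure environment with more than one \includegraphics as a multi-figure, and it does not skip commented-out LaTeX.

```python
import re

# Matches figure and figure* environments, spanning multiple lines.
FIGURE_ENV = re.compile(r"\\begin\{figure\*?\}.*?\\end\{figure\*?\}", re.DOTALL)
INCLUDEGRAPHICS = re.compile(r"\\includegraphics\b")


def find_multifigure_blocks(tex_source):
    """Return figure environments containing more than one \\includegraphics."""
    return [
        block
        for block in FIGURE_ENV.findall(tex_source)
        if len(INCLUDEGRAPHICS.findall(block)) > 1
    ]


sample = r"""
\begin{figure}
\includegraphics{a.png}
\end{figure}
\begin{figure}
\subfloat[A]{\includegraphics{b.png}}
\subfloat[B]{\includegraphics{c.png}}
\end{figure}
"""
print(len(find_multifigure_blocks(sample)))
```

Each block found this way could then be placed into a standalone template, rendered to PNG, and substituted back as a single image.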
Known Issues
- Captions for figures and tables in Chinese still start with "Figure" and "Table".
- Author information is not fully converted.
Changelog
v1.3.0
- Major code refactoring: complete modular restructuring of the codebase
  - Split monolithic tex2docx.py (1139 lines) into 8 specialized modules for better maintainability
  - Introduced clear separation of concerns with dedicated modules for configuration, parsing, conversion, etc.
  - Enhanced type annotations and error handling throughout the codebase
- Improved testing infrastructure:
  - Renamed and reorganized test files for better clarity:
    - test_tex2docx_refactored.py → test_unit.py (unit tests for individual components)
    - test_tex2docx.py → test_integration.py (end-to-end integration tests)
  - Added comprehensive test documentation in tests/README.md
  - Enhanced pytest configuration with proper markers and test discovery
- Critical bug fixes:
  - Fixed LaTeX reference line break issue where \ref{} commands were incorrectly split as \nef{}
  - Resolved CLI import errors with proper module structure
  - Enhanced reference numbering accuracy for tables, figures, and equations
- Developer experience improvements:
  - Better project structure with clear module boundaries
  - Comprehensive documentation updates
  - Cleaner development workflow with organized test suites
  - Preserved all existing functionality while improving code quality
v1.2.4
- Added support for \include in LaTeX files. (#3)
- Enhanced the display of figures and tables for better formatting and presentation.
- Fixed conflicts between cm and varwidth in tables.
- Resolved conflicts between subfig and varwidth.
v1.2.3
- Added an option to fix tables (issue #2).
v1.2.2
- Fixed comments bug (#1).
v1.2.1
- Improved default value settings, including built-in Word style templates and ieee.csl (used as default values).
v1.2.0
- Fixed module import issues, improving stability.
- Enhanced the command-line tool for a more intuitive and efficient user experience.
- Switched to pyproject.toml for dependency management, replacing setup.py.
- Released on PyPI; users can install via pip install tex2docx.
Miscellaneous
There are two kinds of people: those who use LaTeX and those who don't. The latter often ask the former for a Word version. Hence, the following command:
pandoc input.tex -o output.docx \
--filter pandoc-crossref \
--reference-doc=my_temp.docx \
--number-sections \
-M autoEqnLabels -M tableEqns \
-M reference-section-title=Reference \
--bibliography=my_ref.bib \
--citeproc --csl ieee.csl \
-t docx+native_numbering