A project to convert LaTeX to DOCX

LaTeX to Word Conversion Tool

Colleagues or supervisors who are unfamiliar with LaTeX often ask for a Word document for review and collaboration. This project provides a Python script that uses Pandoc and Pandoc-Crossref to convert LaTeX files automatically into Word documents that follow a specified format. No method converts LaTeX to Word perfectly, but the output of this project is good enough for informal review. Around 5% of the content (such as author information) may still need manual correction after conversion.

Project GitHub

Features

  • Supports formula conversion
  • Automatically numbers and cross-references images, tables, formulas, and citations
  • Converts multi-figure LaTeX files
  • Outputs Word files in a specified format
  • Supports Chinese

Example results are available in the tests directory.

Quick Start

Ensure Pandoc and Pandoc-Crossref are correctly installed (see Install Dependencies). Execute the following command in your terminal:

tex2docx convert --input-texfile <your_texfile> --output-docxfile <your_docxfile> 

Replace <...> in the command with the corresponding file path and name.

Install Dependencies

Ensure you have installed Pandoc, Pandoc-Crossref, and related Python libraries.

Pandoc

Install Pandoc as described in the official documentation. It is recommended to download the latest release from Pandoc Releases.

Pandoc-Crossref

Install Pandoc-Crossref by following the official documentation. Make sure the Pandoc-Crossref release is compatible with your installed Pandoc version and that both executables are on your PATH.
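
Before running a conversion, it can help to confirm that both executables are discoverable. The following is a minimal sketch using only the Python standard library; the helper name find_tools is hypothetical and is not part of tex2docx.

```python
import shutil


def find_tools(names=("pandoc", "pandoc-crossref")):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        # A missing path means the tool is not installed or PATH is not configured.
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

This only checks that the tools can be found. It does not verify that the two versions are compatible with each other.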

Python Libraries

Install from PyPI:

pip install tex2docx

Usage and Examples

This tool supports both command-line and script-based usage. Ensure the required dependencies are installed.

Command-Line Usage

Run the following command in your terminal:

tex2docx convert --input-texfile <your_texfile> --output-docxfile <your_docxfile> --reference-docfile <your_reference_docfile> --bibfile <your_bibfile> --cslfile <your_cslfile>

Use tex2docx convert --help to see details on these parameters.

For example, using tests/en:

tex2docx convert --input-texfile ./tests/en/main.tex --output-docxfile ./tests/en/main_cli.docx --reference-docfile ./my_temp.docx --bibfile ./tests/ref.bib --cslfile ./ieee.csl

This will generate the Word file main_cli.docx in the tests/en directory.

Script Usage

from tex2docx import LatexToWordConverter

config = {
    'input_texfile': '<your_texfile>',
    'output_docxfile': '<your_docxfile>',
    'reference_docfile': '<your_reference_docfile>',
    'cslfile': '<your_cslfile>',
    'bibfile': '<your_bibfile>',
    'fix_table': True,
    'debug': False
}

converter = LatexToWordConverter(**config)
converter.convert()

For more examples, refer to tests/test_tex2docx.py.
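
The script interface also makes it straightforward to convert several files in one run. The sketch below is one possible way to do that. It assumes that options other than input_texfile and output_docxfile fall back to the built-in defaults, as in Quick Start. The helpers build_config and convert_all and the "_converted" suffix are hypothetical names chosen for illustration.

```python
from pathlib import Path


def build_config(texfile, suffix="_converted"):
    """Build keyword arguments for one conversion; the .docx is placed next to the .tex file."""
    tex = Path(texfile)
    return {
        "input_texfile": str(tex),
        "output_docxfile": str(tex.with_name(tex.stem + suffix + ".docx")),
    }


def convert_all(texfiles):
    """Convert each LaTeX file in turn, relying on default templates and styles."""
    # Imported lazily so build_config can be used without tex2docx installed.
    from tex2docx import LatexToWordConverter

    for texfile in texfiles:
        LatexToWordConverter(**build_config(texfile)).convert()
```

For example, convert_all(["tests/en/main.tex"]) would write tests/en/main_converted.docx.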

FAQ

  1. Inconsistent Multi-Figure Layout
    The relative positions of sub-figures may differ between the LaTeX-compiled output and the converted Word file.

    This can happen when the LaTeX file redefines the page size or other layout parameters. To address it, override the default MULTIFIG_TEXFILE_TEMPLATE by passing your own template as multifig_texfile_template, so that its page settings match those of your document. Below is an example for reference:

    import tex2docx
    
    my_multifig_texfile_template = r"""
    \documentclass[preview,convert,convert={outext=.png,command=\unexpanded{pdftocairo -r 600 -png \infile}}]{standalone}
    \usepackage{graphicx}
    \usepackage{subfig}
    \usepackage{xeCJK}
    \usepackage{geometry}
    \newgeometry{
        top=25.4mm, bottom=33.3mm, left=20mm, right=20mm,
        headsep=10.4mm, headheight=5mm, footskip=7.9mm,
    }
    \graphicspath{{%s}}
    
    \begin{document}
    \thispagestyle{empty}
    %s
    \end{document}
    """
    
    config = {
        'input_texfile': 'tests/en/main.tex',
        'output_docxfile': 'tests/en/main.docx',
        'reference_docfile': 'my_temp.docx',
        'cslfile': 'ieee.csl',
        'bibfile': 'tests/ref.bib',
        'multifig_texfile_template': my_multifig_texfile_template,
    }
    
    converter = tex2docx.LatexToWordConverter(**config)
    converter.convert()
    
  2. The Word Output Doesn't Meet Formatting Requirements
    Use Word's style management tools to adjust the styles in my_temp.docx.
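
The custom template in FAQ item 1 contains two %s placeholders: one inside \graphicspath and one for the figure body. The sketch below shows how such a template could be filled. It assumes Python %-style string formatting, which this document does not confirm. The function fill_template and the sample values are hypothetical.

```python
def fill_template(template, graphics_dir, figure_code):
    """Fill a two-placeholder template: first the graphics path, then the figure code."""
    # Assumption: the placeholders are substituted in this order with %-formatting.
    return template % (graphics_dir, figure_code)


# A shortened stand-in for the full template shown in FAQ item 1.
template = r"""
\graphicspath{{%s}}
\begin{document}
%s
\end{document}
"""

filled = fill_template(template, "figs/", r"\subfloat[A]{\includegraphics{a.png}}")
print(filled)
```

If the placeholders are indeed filled this way, any literal percent sign in a custom template (for example a LaTeX comment) would need to be written as %% so that it is not mistaken for a placeholder.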

Implementation Details

This project relies on Pandoc and Pandoc-Crossref to convert LaTeX files to Word documents. The core command used is:

pandoc texfile -o docxfile \
    --lua-filter resolve_equation_labels.lua \
    --filter pandoc-crossref \
    --reference-doc=my_temp.docx \
    --number-sections \
    -M autoEqnLabels \
    -M tableEqns \
    -M reference-section-title=Reference \
    --bibliography=ref.bib \
    --citeproc --csl ieee.csl \
    -t docx+native_numbering

  1. --lua-filter resolve_equation_labels.lua handles equation numbering and cross-references, inspired by Constantin Ahlmann-Eltze's script.
  2. --filter pandoc-crossref handles cross-references for other elements.
  3. --reference-doc=my_temp.docx applies the styles from my_temp.docx to the generated Word file. Two template files are included: TIE-temp.docx (for TIE journal submission, double-column format) and my_temp.docx (single-column, designed for easier annotation).
  4. --number-sections adds numbering to section headings.
  5. -M autoEqnLabels numbers all display equations automatically; -M tableEqns typesets each numbered equation as a table, with the number in its own cell, instead of embedding the number in the equation itself.
  6. -M reference-section-title=Reference adds a section title for references.
  7. --bibliography=ref.bib generates the bibliography from ref.bib.
  8. --citeproc --csl ieee.csl formats citations and the bibliography using the IEEE citation style.
  9. -t docx+native_numbering enables Word's native numbering for figure and table captions.
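
The command above can also be assembled as an argument list and run from Python. This is a minimal sketch, not the package's actual implementation. The function names build_pandoc_command and run_pandoc are hypothetical, and the default file names simply mirror the ones used in the command.

```python
import subprocess


def build_pandoc_command(texfile, docxfile, reference_doc="my_temp.docx",
                         bibfile="ref.bib", cslfile="ieee.csl",
                         lua_filter="resolve_equation_labels.lua"):
    """Return the pandoc invocation as a list of arguments, in the same order as above."""
    return [
        "pandoc", texfile, "-o", docxfile,
        "--lua-filter", lua_filter,           # equation labels and references
        "--filter", "pandoc-crossref",        # other cross-references
        f"--reference-doc={reference_doc}",   # Word styles
        "--number-sections",
        "-M", "autoEqnLabels",
        "-M", "tableEqns",
        "-M", "reference-section-title=Reference",
        f"--bibliography={bibfile}",
        "--citeproc", "--csl", cslfile,       # citations, after pandoc-crossref
        "-t", "docx+native_numbering",
    ]


def run_pandoc(texfile, docxfile, **options):
    """Run pandoc and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_pandoc_command(texfile, docxfile, **options), check=True)
```

Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces. The order of the filter options is kept as in the original command, because Pandoc applies filters and citeproc in the order they appear on the command line.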

Pandoc's conversion of multi-figure LaTeX content may not be perfect. To work around this, the project extracts the multi-figure code from the LaTeX file and uses the convert and pdftocairo tools to compile each multi-figure into a single large PNG file. It then replaces the original LaTeX figure code with that image and updates the references, so that the figure imports smoothly into Word.
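
The extraction step can be illustrated with a short sketch. This is not the package's actual code. It assumes that multi-figure content is written as figure environments containing \subfloat (from the subfig package), and the names FIGURE_RE and extract_multifig_blocks are hypothetical.

```python
import re

# Matches a whole figure or figure* environment, non-greedily, across lines.
FIGURE_RE = re.compile(r"\\begin\{figure\*?\}.*?\\end\{figure\*?\}", re.DOTALL)


def extract_multifig_blocks(tex_source):
    """Return the figure environments that contain at least one subfloat command."""
    return [
        match.group(0)
        for match in FIGURE_RE.finditer(tex_source)
        if "\\subfloat" in match.group(0)
    ]
```

Each returned block could then be placed into a standalone template, compiled to PNG, and swapped for a plain \includegraphics in the main file. The sketch ignores commented-out LaTeX code, which a real implementation would need to handle.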

Known Issues

  1. In Chinese documents, figure and table captions still begin with the English words "Figure" and "Table".
  2. Author information is not fully converted.

Changelog

v1.2.4

  1. Added support for \include in LaTeX files (#3).
  2. Improved the formatting and presentation of figures and tables.
  3. Fixed conflicts between cm and varwidth in tables.
  4. Fixed conflicts between subfig and varwidth.

v1.2.3

  1. Added a feature and option to fix tables (issue #2).

v1.2.2

  1. Fixed the comments bug (#1).

v1.2.1

  1. Improved default value settings, including built-in Word style templates and ieee.csl (used as default values).

v1.2.0

  1. Fixed module import issues, improving stability.
  2. Enhanced the command-line tool for a more intuitive and efficient user experience.
  3. Switched to pyproject.toml for dependency management, replacing setup.py.
  4. Released on PyPI; users can install via pip install tex2docx.

Miscellaneous

There are two kinds of people: those who use LaTeX and those who don't. The latter often ask the former for a Word version. Hence, the following command:

pandoc input.tex -o output.docx \
  --filter pandoc-crossref \
  --reference-doc=my_temp.docx \
  --number-sections \
  -M autoEqnLabels -M tableEqns \
  -M reference-section-title=Reference \
  --bibliography=my_ref.bib \
  --citeproc --csl ieee.csl \
  -t docx+native_numbering
