A tool to convert a Typst project to Microsoft Word format.

typ2docx: Convert Math-Rich Typst Project to Microsoft Word Format

typ2docx is a command-line tool that converts a Typst project to Microsoft Word .docx format, with tables, cross-references, most styles, and most importantly the math markup preserved. It combines the mature, comprehensive document conversion of standard PDF-to-Word tools with Pandoc's high-quality mathematical formula export.

You're encouraged to read this document thoroughly before using the tool, as I employed many non-trivial hacks for this non-trivial problem! (It involves 6 different programming languages!)

If this tool enhanced your workflow, especially if it helped with your academic publication, please consider crediting this project or sponsoring me. ❤️

[!NOTE]

If you don't care about the quality of math export, this tool is no different from other PDF-to-Word converters.

If your project doesn't use any non-basic features in Typst, try Pandoc first.

Installation

[!NOTE]

You can now try this tool through a web app, without installing it. Note that the web app runs on my limited free-tier API quota, so please install the tool if you intend to use it more than once!

Prerequisite

This tool is distributed via PyPI. Installation via uv is recommended. You may also use pipx or other similar tools to install and run this program.

For more details, read uv's guide on using tools.

Tool Installation

You may execute the following command:

uv tool install typ2docx

[!NOTE]

Installing the package is expected to take some time, since it requires compiling a Rust extension. Read on to learn why.

If you want to tinker with this program:

git clone git@github.com:sghng/typ2docx.git
cd typ2docx
# do your modifications...
uv tool install .

Runtime Dependencies

The following runtime dependencies are also required:

  • Pandoc, a universal document converter, should be available in PATH.
  • One of the supported engines, as described in the PDF -> DOCX Engines section.
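
Before running a conversion, it can help to confirm that Pandoc is actually discoverable. The following is a minimal sketch, not part of typ2docx itself; the function name `missing_tools` is made up for illustration, and it only checks that an executable with the given name is on PATH.

```python
import shutil


def missing_tools(tools=("pandoc",)):
    """Return the names in `tools` that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All checked runtime dependencies were found.")
```

This does not verify the Pandoc version or the availability of an engine; it only catches the most common setup mistake.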

Usage

Once the tool is installed, invoke it with the path to the entry point of your Typst project and specify an engine to convert it into Microsoft Word .docx format. For example:

typ2docx main.typ -e acrobat

Run typ2docx --help to see the help info on how to use this tool.

PDF -> DOCX Engines

You need to specify the engine used to convert a PDF to a .docx file. Currently there are two supported engines:

  • Adobe Acrobat: Pass -e acrobat to use this engine. It uses the Acrobat desktop app with some GUI automation to export a PDF to .docx. Either the free Acrobat Reader or the paid Acrobat Pro works. This engine is currently supported only on macOS.

    [!WARNING]

    GUI automation is currently quite unstable, as it relies on finding and clicking the right button at the right moment. Launching Acrobat before starting this tool can help. If it doesn't work the first time, retrying a few times might solve it. Be sure to close any dialogs that have popped up before retrying.

  • Adobe PDFServices API: Pass -e pdfservices to use this engine. It requires an internet connection and valid PDFServices API credentials. This service comes with 500 free conversions per month, which should be enough for most people. You will also need to set the PDF_SERVICES_CLIENT_ID and PDF_SERVICES_CLIENT_SECRET environment variables for this engine to work. For example:

    PDF_SERVICES_CLIENT_ID=xxx PDF_SERVICES_CLIENT_SECRET=xxx typ2docx main.typ -e pdfservices
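
If you call the tool from a script, checking the credentials up front gives a clearer error than a failed conversion. The sketch below is an illustration only: the helper names (`missing_credentials`, `build_command`, `convert`) are hypothetical, and the command line is limited to the flags shown in this document.

```python
import os
import subprocess

REQUIRED_VARS = ("PDF_SERVICES_CLIENT_ID", "PDF_SERVICES_CLIENT_SECRET")


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def build_command(entry_point, engine):
    """Build the argument list for the invocation shown above."""
    return ["typ2docx", entry_point, "-e", engine]


def convert(entry_point="main.typ"):
    """Run the pdfservices engine, failing early if credentials are missing."""
    missing = missing_credentials()
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # check=True raises CalledProcessError if typ2docx exits with an error.
    subprocess.run(build_command(entry_point, "pdfservices"), check=True)
```

Calling `convert("main.typ")` is then equivalent to the shell example, assuming typ2docx is installed and the variables are exported in the current environment.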
    

What It Does and Does Not Do

There are some known issues, which may or may not matter depending on your use case. Read the Motivation section to understand why I built this tool.

  • Text in SVG/PDF images is distorted.
  • Some spaces between inline equations and regular text are missing.
  • Not all styling is preserved. (This is expected, as with any file format conversion.)

Similar Tools

  • Adobe Acrobat does a great job of converting PDF to .docx, but the math equations are completely messed up.
  • Microsoft Word can also convert a PDF to .docx format. In my experience, it doesn't work as well as Adobe Acrobat.
  • The pdf2docx Python library doesn't work for most of my PDF files.
  • Pandoc provides superb support for math markup when converting a .typ file to .docx, but its support for Typst is very limited. For example, it doesn't recognize basic functions like #stroke. It also doesn't support the latest Typst features, such as embedding a PDF as an image.
  • typlite is a tool developed by the author of tinymist. Its support for conversion to .docx is limited, as it relies on HTML as an intermediary. Styles and cross-references are lost, and math is rendered as images.

Motivation

This tool was developed to produce a .docx export that meets the basic requirements of academic paper submission. These requirements are quite loose, since publishers have their own processes for making a manuscript publication-ready.

  • Cross-referencing is NOT required.
  • Figures are NOT required. They can be included as standalone attachments, as long as the names are matched.
  • NO typesetting required.

That said, the only true requirement is that the math equations are preserved faithfully. In Microsoft Word, this means they should be in Office Math Markup Language (OMML).

This tool was developed primarily to address the equation output problem.

Solution

The idea is to export with both Adobe Acrobat and Pandoc, then merge the best parts of the two exports.

  • Branch 1
    • A preamble is injected into the Typst project entry point, so that all math is rendered as markers.
    • Typst compiles the project into a PDF file.
    • Adobe Acrobat converts this PDF into .docx format. This process is automated with an AppleScript.
  • Branch 2
    • A Rust library extracts all math source code in a Typst project.
    • The extracted source is put into a new Typst source file, in order of appearance in the original document.
    • This source file is converted to .docx with Pandoc, which produces cleanly formatted native Word math (OMML).
  • The two Microsoft Word files are unpacked. An XSLT script merges the document.xml files by examining the markers. The result is finally repacked into a .docx file as output.
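
To make the merge step more concrete, here is a simplified Python sketch of the idea. It is not the tool's implementation, which uses an XSLT script. The marker format `@@MATH:n@@` is invented for illustration, and the sketch assumes each marker survives as one contiguous string in the XML, which a real Word export may not guarantee.

```python
import re
import zipfile

# Hypothetical marker format; the markers typ2docx actually injects may differ.
MARKER = re.compile(r"@@MATH:(\d+)@@")


def read_document_xml(docx):
    """Read word/document.xml from a .docx file (a ZIP archive)."""
    with zipfile.ZipFile(docx) as archive:
        return archive.read("word/document.xml").decode("utf-8")


def merge_by_markers(layout_xml, equations):
    """Replace each marker in the layout export with the matching equation.

    `equations` holds the math fragments from the Pandoc export,
    in order of appearance. Markers without a match are left untouched.
    """
    def substitute(match):
        index = int(match.group(1))
        if index < len(equations):
            return equations[index]
        return match.group(0)

    return MARKER.sub(substitute, layout_xml)
```

For example, `merge_by_markers("<w:t>@@MATH:0@@</w:t>", ["<m:oMath/>"])` swaps the marker for the equation fragment. A string-level replacement like this ignores XML structure, which is one reason a structure-aware approach such as XSLT is a better fit for the real merge.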

The math source code cannot be extracted with purely static analysis or regex matching, since the location where content is defined can differ from where it shows up in the document, and multiple source files can be involved via #include and #import. This necessitates the use of the typst and typst-eval Rust crates for parsing as well as evaluating the Typst project.

Contribution

You are more than welcome to contribute by raising issues or opening pull requests!
