Skip to main content

A fast DOCX-to-PDF converter powered by Skia, written in Rust

Project description

dxpdf — Fast DOCX to PDF Converter in Rust

Convert Microsoft Word DOCX files to PDF without Microsoft Office, LibreOffice, or any cloud API.

dxpdf is an open-source, standalone DOCX-to-PDF conversion engine written in Rust and powered by Skia. It reads .docx files and produces high-fidelity PDF output — preserving text formatting, tables, images, headers, footers, hyperlinks, and page layout. Available as a CLI tool, a Rust library, and a Python package.

Crates.io Documentation License: MIT

Built by nerdy.pro.


Key Features

  • Blazing fast — converts multi-page documents in under 100 ms on modern hardware
  • High fidelity — Flutter-inspired measure → layout → paint pipeline with pixel-accurate baseline positioning
  • Type-safe — compile-time dimensional type system (Twips, Pt, Emu) prevents unit mixing bugs
  • Standalone — no Office installation, no LibreOffice, no external services needed
  • Cross-platform — runs natively on macOS, Linux, and Windows
  • Three interfaces — use as a CLI tool, Rust library (use dxpdf;), or Python package (import dxpdf)
  • ISO 29500 compliant — validated against the Office Open XML specification

Installation

Command-Line Tool

cargo install dxpdf

Rust Library

Add to your Cargo.toml:

[dependencies]
dxpdf = "0.2.20"

Python Package

pip install dxpdf

Usage

CLI — Convert DOCX to PDF from the Terminal

dxpdf input.docx                  # produces input.pdf
dxpdf input.docx -o output.pdf    # specify output path

Rust — Convert DOCX to PDF Programmatically

let docx_bytes = std::fs::read("document.docx")?;
let pdf_bytes = dxpdf::convert(&docx_bytes)?;
std::fs::write("output.pdf", &pdf_bytes)?;

You can also inspect or transform the parsed document model before conversion:

use dxpdf::{docx, model, render};

let document = docx::parse(&std::fs::read("document.docx")?)?;

for block in &document.blocks {
    match block {
        model::Block::Paragraph(p) => { /* inspect paragraph content */ }
        model::Block::Table(t) => { /* inspect table structure */ }
    }
}

let pdf_bytes = render::render(&document)?;

Python — Convert DOCX to PDF in Python

import dxpdf

# Bytes in, bytes out
pdf_bytes = dxpdf.convert(open("input.docx", "rb").read())

# File path to file path
dxpdf.convert_file("input.docx", "output.pdf")

Supported DOCX Features

dxpdf handles the most common DOCX features found in real-world business documents, reports, and forms:

Category Features
Text formatting Bold, italic, underline, highlighting, font size/family/color, character spacing, superscript/subscript, run shading
Paragraphs Alignment (left/center/right), spacing (before/after/line with auto/exact/atLeast), indentation, tab stops, paragraph borders, paragraph shading
Tables Column widths, cell margins (3-level cascade), merged cells (gridSpan + vMerge), row heights, borders, cell shading, nested tables
Images Inline images (PNG, JPEG, BMP, WebP), floating/anchored images with alignment and percentage-based positioning
Styles Paragraph and character styles, basedOn inheritance, document defaults, theme fonts
Headers & footers Text, images, page numbers via PAGE/NUMPAGES field codes
Lists Bullets, decimal, lower/upper letter, lower/upper roman numbering with counter tracking
Hyperlinks Clickable PDF link annotations with URL resolution
Page layout Multiple page sizes/margins, section breaks, portrait and landscape orientation
Pagination Automatic page breaking, word wrapping, line spacing modes, floating image text flow

Performance Benchmarks

Benchmarked on Apple M3 Max with hyperfine (20 runs, 3 warmup):

Document type Pages Conversion time Memory usage
Short form with tables and images 2 48 ms 20 MB
Multi-page report 7 52 ms 24 MB
Image-heavy document (60+ images) 24 353 ms 76 MB

dxpdf processes most business documents in under 100 ms, making it suitable for batch processing, server-side conversion, and CI/CD pipelines.

Building from Source

Prerequisites

  • Rust toolchain (1.70+)

  • clang (required by skia-safe for building Skia bindings)

  • Linux only: libfontconfig1-dev and libfreetype-dev

    sudo apt-get install -y libfontconfig1-dev libfreetype-dev
    

Build

cargo build --release

The release binary will be at target/release/dxpdf.

Run Tests

cargo test

Architecture

dxpdf follows a measure → layout → paint pipeline inspired by Flutter's rendering model:

DOCX (ZIP) → Parse → Document Model → Measure → Layout → Paint → PDF
             Twips/Emu/HalfPoints       ←── Pt throughout ──→   Skia

Type-safe dimensions flow through the entire pipeline: OOXML units (Twips, Emu, HalfPoints) in the parsed model, Pt (typographic points) in layout, and f32 only at the Skia rendering boundary.

Each layout element (paragraphs, table cells, headers/footers) goes through three phases:

  1. Measure — collect text fragments, fit lines, produce draw commands with relative coordinates
  2. Layout — assign absolute positions, handle page breaks, distribute heights (e.g., vertically merged cells)
  3. Paint — emit draw commands at final positions (shading → content → borders)

Module Overview

Module Purpose
dimension Type-safe dimensional units (Twips, HalfPoints, EighthPoints, Emu, Pt) with compile-time unit safety
geometry Spatial types (Offset, Size, Rect, EdgeInsets, LineSegment) — generic over unit, with Skia interop
model Algebraic data types representing the full document tree (Document, Block, Inline, etc.)
docx DOCX ZIP extraction, declarative serde-based XML parser, style and numbering resolution
render/layout Measure → layout → paint pipeline: fragment-based line fitting, paragraph layout, three-pass table layout, header/footer handling
render/painter Skia canvas operations for PDF output
render/fonts Font resolution with metric-compatible substitution (e.g., Calibri → Carlito, Cambria → Caladea)

OOXML Feature Coverage

Validated against ISO 29500 (Office Open XML). 37 features fully implemented, 9 partial, 13 planned.

Full feature matrix (click to expand)

Text Formatting (w:rPr)

Feature Status
Bold, italic ✅ with toggle support
Underline ✅ font-proportional stroke width
Font size, family, color
Superscript/subscript
Character spacing
Character scaling (w:w horizontal compression/expansion)
Run shading
Strikethrough ⚠️ parsed, not yet rendered
Highlighting ✅ full ST_HighlightColor palette
Caps, smallCaps
Shadow, outline, emboss, imprint
Hidden text

Paragraph Properties (w:pPr)

Feature Status
Alignment (left, center, right)
Alignment (justify) ⚠️ parsed, renders left-aligned
Spacing before/after, line spacing ✅ auto/exact/atLeast
Indentation (left, right, first-line, hanging)
Tab stops (left)
Tab stops (center, right)
Tab stops (decimal) ⚠️ rendered as left-aligned
Paragraph shading
Paragraph borders ✅ with adjacent border merging, w:space offset
Keep with next, widow/orphan control

Styles

Feature Status
Paragraph styles, character styles
basedOn inheritance
Document defaults, theme fonts

Tables

Feature Status
Grid columns, cell widths (dxa)
Cell widths (pct, auto) ⚠️ fall back to grid
Cell margins (3-level cascade)
Merged cells (gridSpan, vMerge)
Row heights ✅ min / ⚠️ exact treated as min
Table borders (per-cell, per-table)
Border styles (single)
Border styles (double, dashed, dotted) ⚠️ render as single
Cell shading (solid)
Cell shading (patterns)
Vertical alignment (top / center / bottom) ✅ incl. vMerge-aware bottom alignment
Row splitting across page breaks ✅ §17.4.1 cantSplit honored
Nested tables

Images

Feature Status
Inline images ✅ PNG, JPEG, BMP, WebP
Floating images ✅ offset, align, wp14:pctPos
Wrap modes ✅ none/square/tight/through
VML images

Page Layout

Feature Status
Page size and orientation
Page margins (all 6)
Section breaks (nextPage)
Section breaks (continuous) ✅ continues on current page
Section breaks (even, odd) ⚠️ treated as nextPage
Multi-column, page borders, doc grid

Headers & Footers

Feature Status
Default header/footer
First page, even/odd, per-section

Lists

Feature Status
Bullet, decimal, letter, roman
Multi-level lists ⚠️ levels parsed, nesting limited

Fields

Feature Status
PAGE, NUMPAGES
Hyperlinks ✅ clickable PDF annotations
Unknown fields ✅ cached value fallback
TOC, MERGEFIELD, DATE

Other

Feature Status
Footnotes/endnotes ❌ warned
Comments, tracked changes ❌ / ⚠️
DrawingML shapes (preset geometry, fill, stroke, outer shadow) ⚠️ Tier 0-1 coverage
Text boxes, SmartArt, charts
RTL text, automatic hyphenation

Dependencies

Crate Purpose
quick-xml Declarative XML parsing via serde deserializers
zip DOCX ZIP archive reading
skia-safe PDF rendering, text measurement, link annotations
clap CLI argument parsing
thiserror Error types
log + env_logger Logging for unsupported features (RUST_LOG=warn)
pyo3 (optional) Python bindings via maturin

Frequently Asked Questions

How do I convert a DOCX file to PDF?

Install dxpdf with cargo install dxpdf, then run dxpdf input.docx. The PDF will be created in the same directory. You can also specify an output path with -o output.pdf.

Does dxpdf require Microsoft Office or LibreOffice?

No. dxpdf is a standalone converter that reads DOCX files directly and renders PDF output using Skia. No Office installation or external service is needed.

Can I use dxpdf as a library in my Rust or Python project?

Yes. In Rust, add dxpdf as a dependency and call dxpdf::convert(&docx_bytes). In Python, install with pip install dxpdf and call dxpdf.convert(bytes) or dxpdf.convert_file("input.docx", "output.pdf").

What DOCX features are supported?

dxpdf supports text formatting, paragraphs, tables (including nested and merged cells), inline and floating images, styles with inheritance, headers/footers, lists, hyperlinks, section breaks, and automatic pagination. See the full feature matrix above.

How fast is dxpdf?

On Apple M3 Max, dxpdf converts a typical multi-page business document in under 100 ms. A 24-page image-heavy document takes about 350 ms. It is designed for batch processing and server-side use.

What platforms does dxpdf support?

dxpdf runs on macOS, Linux, and Windows. On Linux, you need libfontconfig1-dev and libfreetype-dev installed.

Used By

Contributing

Contributions are welcome. Please open an issue before submitting large PRs.

Built by nerdy.pro.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dxpdf-0.2.20.tar.gz (409.6 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

dxpdf-0.2.20-cp38-abi3-win_amd64.whl (4.5 MB view details)

Uploaded CPython 3.8+Windows x86-64

dxpdf-0.2.20-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.8 MB view details)

Uploaded CPython 3.8+manylinux: glibc 2.17+ x86-64

dxpdf-0.2.20-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (5.8 MB view details)

Uploaded CPython 3.8+manylinux: glibc 2.17+ ARM64

dxpdf-0.2.20-cp38-abi3-macosx_11_0_arm64.whl (4.6 MB view details)

Uploaded CPython 3.8+macOS 11.0+ ARM64

dxpdf-0.2.20-cp38-abi3-macosx_10_12_x86_64.whl (4.8 MB view details)

Uploaded CPython 3.8+macOS 10.12+ x86-64

File details

Details for the file dxpdf-0.2.20.tar.gz.

File metadata

  • Download URL: dxpdf-0.2.20.tar.gz
  • Upload date:
  • Size: 409.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.13.1

File hashes

Hashes for dxpdf-0.2.20.tar.gz
Algorithm Hash digest
SHA256 447bf4e8dde6b59cb106bf9ad35d03ec1542b3826f59915ce1e7facab5cf06f7
MD5 9884c43ae94d9631cbc3f8045d6da5a4
BLAKE2b-256 766e1ff0381089d430fed59ec090f403de386b56d5b105c4eb3ae54d897a4150

See more details on using hashes here.

File details

Details for the file dxpdf-0.2.20-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: dxpdf-0.2.20-cp38-abi3-win_amd64.whl
  • Upload date:
  • Size: 4.5 MB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.13.1

File hashes

Hashes for dxpdf-0.2.20-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 7aef1f9d9715f52f5703639fac0289ef924ac045864412a0d136a6192be72be6
MD5 38e85dacfafc4c15e5084ebfda55c43d
BLAKE2b-256 fe62c6e9056729f942359335dbb926df1b2b39b6035fe385c09529e720beb1ab

See more details on using hashes here.

File details

Details for the file dxpdf-0.2.20-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for dxpdf-0.2.20-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 090f1377bf4b66a400b26148df8959381cb8a8cfb772623a3a8b8d1c70f46cc7
MD5 30ac87ebfdd484ea89a3174bb3457c9e
BLAKE2b-256 2a01e89378914bda9d6ff993bbd8b5e4f81386681e1b5ae4325497db45d876d7

See more details on using hashes here.

File details

Details for the file dxpdf-0.2.20-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for dxpdf-0.2.20-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 4991f205ceadd4c478ade877d8ea3840fec49eb46ce3403653c79c402799c8a1
MD5 294966cf961954263ffbfc1e42fd5595
BLAKE2b-256 0badf02eb37d324985a68789aefb50c16116f328ad7688bdff73f6e87dfbc41f

See more details on using hashes here.

File details

Details for the file dxpdf-0.2.20-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for dxpdf-0.2.20-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 753e5f2c171f7b9a85a312df097fcc5e683de8e374ccc6a0631a96cc1452d32d
MD5 e680464da90c86cb2c70bfc0986ab171
BLAKE2b-256 49ade504f8a87582f6851af735a5765a29b545773f3842478120cec832d79256

See more details on using hashes here.

File details

Details for the file dxpdf-0.2.20-cp38-abi3-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for dxpdf-0.2.20-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 9b3c34cd138f0d9824f7004ffbbd51358c14d789d1ccb67b9f41f524703f7d41
MD5 574c41c0e155bacfbf75d668b97fe881
BLAKE2b-256 1ccd4abb6fa1736401f3a5d9affb6e584294477700dfeeb66463b1634953fd98

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page