A fast DOCX-to-PDF converter powered by Skia, written in Rust

dxpdf

A fast, lightweight DOCX-to-PDF converter written in Rust, powered by Skia.

Convert Microsoft Word .docx files to high-fidelity PDF documents — from the command line or as a Rust library. No Microsoft Office, LibreOffice, or cloud API required.

Built by nerdy.pro.

Why dxpdf?

  • Fast — converts a 7-page report in ~52 ms on Apple Silicon
  • Accurate — Flutter-inspired measure→layout→paint pipeline with proper baseline positioning
  • Type-safe — dimensional type system (Twips, Pt, Emu) prevents unit confusion at compile time
  • Standalone — no external dependencies beyond Skia; no Office installation needed
  • Cross-platform — runs on macOS, Linux, and Windows
  • Dual-use — works as a CLI tool, Rust library (use dxpdf;), or Python package (import dxpdf)

Quick Start

Install

cargo install dxpdf

Convert a file

dxpdf input.docx                  # outputs input.pdf
dxpdf input.docx -o output.pdf    # specify output path

Use as a library

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let docx_bytes = std::fs::read("document.docx")?;
    let pdf_bytes = dxpdf::convert(&docx_bytes)?;
    std::fs::write("output.pdf", &pdf_bytes)?;
    Ok(())
}

You can also inspect or transform the parsed document model:

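use dxpdf::{parse, model};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let document = parse::parse(&std::fs::read("document.docx")?)?;

    // Walk the top-level blocks of the parsed document.
    for block in &document.blocks {
        match block {
            model::Block::Paragraph(p) => { /* p: &Box<Paragraph> */ }
            model::Block::Table(t) => { /* t: &Box<Table> */ }
        }
    }

    // Render the (possibly transformed) model and write the PDF.
    let pdf_bytes = dxpdf::convert_document(&document)?;
    std::fs::write("output.pdf", &pdf_bytes)?;
    Ok(())
}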

Use from Python

pip install dxpdf

import dxpdf

# Bytes in, bytes out
pdf_bytes = dxpdf.convert(open("input.docx", "rb").read())

# File to file
dxpdf.convert_file("input.docx", "output.pdf")

What's Supported

dxpdf handles the most common DOCX features used in real-world business documents:

| Category | Features |
| --- | --- |
| Text | Bold, italic, underline, font size/family/color, character spacing, superscript/subscript, run shading |
| Paragraphs | Alignment (left/center/right), spacing (before/after/line with auto/exact/atLeast), indentation, tab stops, paragraph borders, paragraph shading |
| Tables | Column widths, cell margins (3-level cascade), merged cells (gridSpan + vMerge with height distribution), row heights, borders, cell shading, nested tables |
| Images | Inline (PNG/JPEG/BMP/WebP), floating/anchored with alignment and percentage-based positioning |
| Styles | Paragraph styles, character styles (including built-in Hyperlink), basedOn inheritance, document defaults, theme fonts |
| Headers/Footers | Text, images, page numbers (PAGE/NUMPAGES field codes) |
| Lists | Bullets, decimal, lower/upper letter, lower/upper roman with counter tracking |
| Hyperlinks | Clickable PDF link annotations with URL resolution from relationships |
| Sections | Multiple page sizes/margins, section breaks, portrait/landscape |
| Layout | Automatic pagination, word wrapping at spaces and hyphens, line spacing modes, floating image text flow |
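
To make the list formats above concrete, here is a minimal sketch of turning a list counter into its label. It is an illustration, not dxpdf's actual code: `NumFormat`, `format_counter`, and the two helpers are hypothetical names. The letter format repeats the letter after `z` (`aa`, `bb`, ...), which is how the OOXML letter formats are generally described.

```rust
/// Numbering formats named in the table above (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NumFormat {
    Decimal,
    LowerLetter,
    UpperLetter,
    LowerRoman,
    UpperRoman,
}

/// Lower-case roman numeral for n >= 1 (empty string for 0).
fn to_roman(mut n: u32) -> String {
    const TABLE: [(u32, &str); 13] = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ];
    let mut out = String::new();
    for &(value, symbol) in TABLE.iter() {
        while n >= value {
            out.push_str(symbol);
            n -= value;
        }
    }
    out
}

/// Lower-case letter label: a..z, then aa, bb, ... (the letter is repeated).
fn to_letters(n: u32) -> String {
    if n == 0 {
        return String::new();
    }
    let letter = (b'a' + ((n - 1) % 26) as u8) as char;
    let repeats = ((n - 1) / 26 + 1) as usize;
    letter.to_string().repeat(repeats)
}

/// Format a 1-based list counter in the given numbering format.
pub fn format_counter(n: u32, format: NumFormat) -> String {
    match format {
        NumFormat::Decimal => n.to_string(),
        NumFormat::LowerLetter => to_letters(n),
        NumFormat::UpperLetter => to_letters(n).to_uppercase(),
        NumFormat::LowerRoman => to_roman(n),
        NumFormat::UpperRoman => to_roman(n).to_uppercase(),
    }
}
```

For example, `format_counter(4, NumFormat::UpperRoman)` yields `IV` and `format_counter(27, NumFormat::LowerLetter)` yields `aa`.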

Building from Source

Prerequisites

  • Rust toolchain (1.70+)
  • clang (required by skia-safe for building Skia bindings)
  • Linux only: libfontconfig1-dev and libfreetype-dev (e.g., sudo apt-get install -y libfontconfig1-dev libfreetype-dev)

Build

cargo build --release

The release binary will be at target/release/dxpdf.

Architecture

dxpdf follows a measure→layout→paint pipeline inspired by Flutter's rendering model:

DOCX (ZIP) → Parse → Document Model → Measure → Layout → Paint → Skia PDF
             Twips/Emu/HalfPoints       ←── Pt throughout ──→      f32

Type-safe dimensions flow through the entire pipeline: OOXML units (Twips, Emu, HalfPoints) in the model, Pt (typographic points) in layout, and f32 only at the Skia rendering boundary.
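
The idea behind the dimensional types can be sketched with plain newtypes. This is a minimal illustration under assumptions, not the crate's real `dimension` module: the field types, the `From` conversions, and `to_skia_scalar` are hypothetical. The conversion factors themselves are standard (20 twips, 2 half-points, and 12,700 EMU per point).

```rust
use std::ops::Add;

/// Twentieths of a point (OOXML page and paragraph measurements).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Twips(pub i32);

/// English Metric Units (DrawingML extents); 914,400 per inch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Emu(pub i64);

/// Half-points (OOXML font sizes).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct HalfPoints(pub u32);

/// Typographic points, the single unit used during layout.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Pt(pub f32);

impl From<Twips> for Pt {
    fn from(t: Twips) -> Pt {
        Pt(t.0 as f32 / 20.0)
    }
}

impl From<Emu> for Pt {
    fn from(e: Emu) -> Pt {
        Pt(e.0 as f32 / 12_700.0)
    }
}

impl From<HalfPoints> for Pt {
    fn from(h: HalfPoints) -> Pt {
        Pt(h.0 as f32 / 2.0)
    }
}

// Only `Pt + Pt` is defined, so `Twips(720) + Pt(12.0)` fails to compile.
impl Add for Pt {
    type Output = Pt;
    fn add(self, rhs: Pt) -> Pt {
        Pt(self.0 + rhs.0)
    }
}

/// The one place a raw f32 leaves the type system (e.g. for a Skia call).
pub fn to_skia_scalar(pt: Pt) -> f32 {
    pt.0
}
```

With this shape, mixing units requires an explicit conversion such as `Pt::from(Twips(720)) + Pt::from(HalfPoints(24))`, which evaluates to `Pt(48.0)`.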

Each layout element (paragraphs, table cells, headers/footers) goes through three phases:

  1. Measure — collect fragments, fit lines, produce draw commands with relative coordinates
  2. Layout — assign absolute positions, handle page breaks, distribute heights (e.g., vMerge spans)
  3. Paint — emit draw commands at final positions (shading → content → borders)
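
The three phases can be sketched on a toy paragraph. This is a simplified illustration, not dxpdf's implementation: every name below is hypothetical, text is measured with a fixed advance per character instead of real font metrics, and strings stand in for Skia canvas calls.

```rust
/// Output of the measure phase: one fitted line and its height.
#[derive(Debug, Clone, PartialEq)]
pub struct MeasuredLine {
    pub text: String,
    pub height: f32,
}

/// Output of the layout phase: a line with its final page and y position.
#[derive(Debug, Clone, PartialEq)]
pub struct PlacedLine {
    pub page: usize,
    pub y: f32,
    pub text: String,
}

/// Measure: greedily fit words into lines no wider than `max_width`.
pub fn measure(text: &str, max_width: f32, char_width: f32, line_height: f32) -> Vec<MeasuredLine> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        // Width of the line if this word were appended after a space.
        let needed = current.chars().count() + 1 + word.chars().count();
        if !current.is_empty() && needed as f32 * char_width > max_width {
            lines.push(MeasuredLine { text: std::mem::take(&mut current), height: line_height });
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        lines.push(MeasuredLine { text: current, height: line_height });
    }
    lines
}

/// Layout: assign absolute positions, starting a new page on overflow.
pub fn layout(lines: &[MeasuredLine], page_height: f32) -> Vec<PlacedLine> {
    let mut placed = Vec::new();
    let (mut page, mut y) = (0usize, 0.0f32);
    for line in lines {
        if y > 0.0 && y + line.height > page_height {
            page += 1;
            y = 0.0;
        }
        placed.push(PlacedLine { page, y, text: line.text.clone() });
        y += line.height;
    }
    placed
}

/// Paint: emit one draw command per placed line.
pub fn paint(placed: &[PlacedLine]) -> Vec<String> {
    placed
        .iter()
        .map(|l| format!("page {} y={:.1}: {}", l.page, l.y, l.text))
        .collect()
}
```

Keeping measurement free of absolute coordinates is what lets the layout phase move whole lines to the next page without re-fitting them.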

Modules

| Module | Description |
| --- | --- |
| dimension | Type-safe dimensional units: Twips, HalfPoints, EighthPoints, Emu, Pt with compile-time unit safety |
| geometry | Spatial types: Offset, Size, Rect, EdgeInsets, LineSegment — generic over unit, with Skia interop |
| model | Algebraic data types representing the document tree (Document, Block, Inline, etc.) |
| parse | DOCX ZIP extraction, event-driven XML parser, style/numbering resolution |
| render/layout | Measure→layout→paint pipeline: fragment (shared line fitting), paragraph, table (three-pass), header_footer |
| render/painter | Skia canvas operations for PDF output — the only f32 unwrap boundary |
| render/fonts | Font resolution: tries requested font first, falls back to metric-compatible substitutes |
| units | String constants and rendering defaults |
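
The font resolution order described for render/fonts can be sketched as follows. This is only an illustration: the function names are hypothetical, and the substitution table is an assumption based on well-known metric-compatible font pairs, not a list taken from dxpdf.

```rust
/// Hypothetical substitution table of metric-compatible families.
/// The pairs are common open-source replacements, assumed for illustration.
fn metric_compatible_substitute(family: &str) -> Option<&'static str> {
    match family {
        "Calibri" => Some("Carlito"),
        "Cambria" => Some("Caladea"),
        "Arial" => Some("Liberation Sans"),
        "Times New Roman" => Some("Liberation Serif"),
        "Courier New" => Some("Liberation Mono"),
        _ => None,
    }
}

/// Try the requested family first, then its metric-compatible substitute.
/// Returns `None` when neither is installed, leaving the final default
/// font choice to the caller.
pub fn resolve_font(requested: &str, installed: &[&str]) -> Option<String> {
    if installed.contains(&requested) {
        return Some(requested.to_owned());
    }
    metric_compatible_substitute(requested)
        .filter(|substitute| installed.contains(substitute))
        .map(str::to_owned)
}
```

A metric-compatible substitute keeps the same glyph advances as the original, so line breaks and page counts stay the same even though the letterforms differ.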

Running Tests

cargo test

The test suite includes 184 unit tests, 59 API compatibility tests, and 9 integration tests covering dimensions, geometry, layout, tables, lists, floats, headers/footers, hyperlinks, superscript/subscript, field codes, and end-to-end conversion.

Visual regression tests compare rendered PDFs against Word-generated references using pixel matching (see VISUAL_COMPARISON.md).
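
As a rough idea of what pixel matching involves, the sketch below computes the fraction of differing pixels between two rasterized pages. It is an assumption-laden illustration, not the project's actual comparison code: `mismatch_ratio` is a hypothetical name, and both inputs are assumed to be same-sized RGBA8 buffers.

```rust
/// Fraction of pixels that differ by more than `tolerance` in any RGBA
/// channel. Returns `None` if the buffers are empty, differ in length,
/// or are not a whole number of 4-byte pixels.
pub fn mismatch_ratio(a: &[u8], b: &[u8], tolerance: u8) -> Option<f64> {
    if a.is_empty() || a.len() != b.len() || a.len() % 4 != 0 {
        return None;
    }
    let differing = a
        .chunks_exact(4)
        .zip(b.chunks_exact(4))
        .filter(|(pa, pb)| {
            pa.iter()
                .zip(pb.iter())
                .any(|(x, y)| (*x as i16 - *y as i16).abs() > tolerance as i16)
        })
        .count();
    Some(differing as f64 / (a.len() / 4) as f64)
}
```

A regression test would then assert that the ratio stays below some threshold; a small per-channel tolerance absorbs anti-aliasing differences between renderers.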

OOXML Feature Coverage

Validated against ISO 29500 (Office Open XML). 35 features fully implemented, 6 partial, 15 not implemented.

Full feature matrix:

Text Formatting (w:rPr)

| Feature | Status |
| --- | --- |
| Bold, italic | ✅ with toggle support |
| Underline | ✅ font-proportional stroke width |
| Font size, family, color | |
| Superscript/subscript | |
| Character spacing | |
| Run shading | |
| Strikethrough | |
| Highlighting | |
| Caps, smallCaps | |
| Shadow, outline, emboss, imprint | |
| Hidden text | |

Paragraph Properties (w:pPr)

| Feature | Status |
| --- | --- |
| Alignment (left, center, right) | |
| Alignment (justify) | ⚠️ parsed, renders left-aligned |
| Spacing before/after, line spacing | ✅ auto/exact/atLeast |
| Indentation (left, right, first-line, hanging) | |
| Tab stops (left) | |
| Tab stops (center, right, decimal) | ⚠️ parsed, render as left |
| Paragraph shading | |
| Paragraph borders | ✅ with adjacent border merging, w:space offset |
| Keep with next, widow/orphan control | |
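
The three line spacing modes listed above differ in how the raw `w:line` value is interpreted. A minimal sketch, assuming the usual OOXML units (240ths of a line for auto, twips for exact and atLeast); `LineRule` and `line_height_pt` are hypothetical names, and `natural_pt` stands for the line height the font would produce on its own:

```rust
/// Line spacing rule from `w:spacing/@w:lineRule`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LineRule {
    Auto,
    Exact,
    AtLeast,
}

/// Resolve the final line height in points.
/// `line` is the raw `w:line` value: 240ths of a line for `Auto`,
/// twips (1/20 pt) for `Exact` and `AtLeast`.
pub fn line_height_pt(natural_pt: f32, line: u32, rule: LineRule) -> f32 {
    match rule {
        // Multiplier on the natural height: 240 = single, 360 = 1.5 lines.
        LineRule::Auto => natural_pt * line as f32 / 240.0,
        // Fixed height regardless of content.
        LineRule::Exact => line as f32 / 20.0,
        // Minimum height that grows with taller content.
        LineRule::AtLeast => natural_pt.max(line as f32 / 20.0),
    }
}
```

For a 12 pt natural line, `line = 360` with `Auto` gives 18 pt, while `line = 280` with `Exact` gives 14 pt.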

Styles

| Feature | Status |
| --- | --- |
| Paragraph styles, character styles | |
| basedOn inheritance | |
| Document defaults, theme fonts | |

Tables

| Feature | Status |
| --- | --- |
| Grid columns, cell widths (dxa) | |
| Cell widths (pct, auto) | ⚠️ fall back to grid |
| Cell margins (3-level cascade) | |
| Merged cells (gridSpan, vMerge) | |
| Row heights | ✅ min / ⚠️ exact treated as min |
| Table borders (per-cell, per-table) | |
| Border styles (single) | |
| Border styles (double, dashed, dotted) | ⚠️ render as single |
| Cell shading (solid) | |
| Cell shading (patterns), vertical alignment | |
| Nested tables | |
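
A three-level cell margin cascade can be sketched as a per-side lookup through optional overrides. The source does not name the three levels, so the ones used here (cell, then table, then table style) are an assumption, as are all type and function names; the final fallback is passed in rather than hard-coded.

```rust
/// Margins that one level may or may not specify, per side, in twips.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct MarginOverrides {
    pub top: Option<i32>,
    pub right: Option<i32>,
    pub bottom: Option<i32>,
    pub left: Option<i32>,
}

/// Fully resolved margins in twips.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Margins {
    pub top: i32,
    pub right: i32,
    pub bottom: i32,
    pub left: i32,
}

/// Resolve each side independently: the most specific level that sets a
/// value wins (assumed order: cell, table, style), else the fallback.
pub fn resolve_margins(
    cell: MarginOverrides,
    table: MarginOverrides,
    style: MarginOverrides,
    fallback: Margins,
) -> Margins {
    Margins {
        top: cell.top.or(table.top).or(style.top).unwrap_or(fallback.top),
        right: cell.right.or(table.right).or(style.right).unwrap_or(fallback.right),
        bottom: cell.bottom.or(table.bottom).or(style.bottom).unwrap_or(fallback.bottom),
        left: cell.left.or(table.left).or(style.left).unwrap_or(fallback.left),
    }
}
```

Resolving per side matters because a cell may override only its left margin while inheriting the other three from the table or style.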

Images

| Feature | Status |
| --- | --- |
| Inline images | ✅ PNG, JPEG, BMP, WebP |
| Floating images | ✅ offset, align, wp14:pctPos |
| Wrap modes | ✅ none/square/tight/through |
| VML images | |

Page Layout

| Feature | Status |
| --- | --- |
| Page size and orientation | |
| Page margins (all 6) | |
| Section breaks (nextPage) | |
| Section breaks (continuous, even, odd) | ⚠️ treated as nextPage |
| Multi-column, page borders, doc grid | |

Headers/Footers

| Feature | Status |
| --- | --- |
| Default header/footer | |
| First page, even/odd, per-section | |

Lists

| Feature | Status |
| --- | --- |
| Bullet, decimal, letter, roman | |
| Multi-level lists | ⚠️ levels parsed, nesting limited |

Fields

| Feature | Status |
| --- | --- |
| PAGE, NUMPAGES | |
| Hyperlinks | ✅ clickable PDF annotations |
| Unknown fields | ✅ cached value fallback |
| TOC, MERGEFIELD, DATE | |
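
The field behavior in this table (PAGE and NUMPAGES evaluated, anything unrecognized rendered from its cached value) can be sketched as a small dispatch on the field name. This is an illustration only: `evaluate_field` and its parameters are hypothetical, and the field name is assumed to be the first whitespace-delimited token of the instruction text.

```rust
/// Evaluate a field instruction such as `PAGE \* MERGEFORMAT`.
/// `cached` is the last result stored in the document for this field and
/// is used whenever the field name is not recognized.
pub fn evaluate_field(instruction: &str, cached: &str, page: usize, total_pages: usize) -> String {
    // Field names are matched case-insensitively on the first token.
    let name = instruction
        .split_whitespace()
        .next()
        .unwrap_or("")
        .to_ascii_uppercase();
    match name.as_str() {
        "PAGE" => page.to_string(),
        "NUMPAGES" => total_pages.to_string(),
        _ => cached.to_string(),
    }
}
```

NUMPAGES is only known once pagination has finished, which is one reason to resolve fields late in the pipeline rather than at parse time.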

Other

| Feature | Status |
| --- | --- |
| Footnotes/endnotes | ❌ warned |
| Comments, tracked changes | ❌ / ⚠️ |
| Text boxes, shapes, SmartArt, charts | |
| RTL text, automatic hyphenation | |

Performance

Benchmarked on Apple M3 Max with hyperfine (20 runs, 3 warmup):

| Document | Pages | Mean time | Peak RSS |
| --- | --- | --- | --- |
| 2-page form (11 tables, 2 images) | 2 | 48 ms | 20 MB |
| 7-page inspection report | 7 | 52 ms | 24 MB |
| 24-page product sheet (61 images) | 24 | 353 ms | 76 MB |

See BENCHMARKS.md for full history.

Dependencies

| Crate | Purpose |
| --- | --- |
| quick-xml | Event-driven XML parsing |
| zip | DOCX ZIP archive reading |
| skia-safe | PDF rendering, text measurement, link annotations |
| clap | CLI argument parsing |
| thiserror | Error types |
| log + env_logger | Warnings for unsupported features (RUST_LOG=warn) |
| pyo3 (optional) | Python bindings via maturin |

Contributing

Contributions are welcome. Please open an issue before submitting large PRs.

License

MIT
