A fast DOCX-to-PDF converter powered by Skia, written in Rust

dxpdf

A fast, lightweight DOCX-to-PDF converter written in Rust, powered by Skia.

Convert Microsoft Word .docx files to high-fidelity PDF documents — from the command line or as a Rust library. No Microsoft Office, LibreOffice, or cloud API required.

Built by nerdy.pro.

Why dxpdf?

  • Fast — converts a 7-page report in ~52 ms on Apple Silicon (see Performance)
  • Accurate — Flutter-inspired measure→layout→paint pipeline with proper baseline positioning
  • Type-safe — dimensional type system (Twips, Pt, Emu) prevents unit confusion at compile time
  • Standalone — no external dependencies beyond Skia; no Office installation needed
  • Cross-platform — runs on macOS, Linux, and Windows
  • Dual-use — works as a CLI tool, Rust library (use dxpdf;), or Python package (import dxpdf)

Quick Start

Install

cargo install dxpdf

Convert a file

dxpdf input.docx                  # outputs input.pdf
dxpdf input.docx -o output.pdf    # specify output path

Use as a library

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let docx_bytes = std::fs::read("document.docx")?;
    let pdf_bytes = dxpdf::convert(&docx_bytes)?;
    std::fs::write("output.pdf", &pdf_bytes)?;
    Ok(())
}

You can also inspect or transform the parsed document model:

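use dxpdf::{model, parse};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse the DOCX bytes into the document model.
    let document = parse::parse(&std::fs::read("document.docx")?)?;

    for block in &document.blocks {
        match block {
            model::Block::Paragraph(p) => { /* p: &Box<Paragraph> */ }
            model::Block::Table(t) => { /* t: &Box<Table> */ }
        }
    }

    // Render the (possibly transformed) model to PDF bytes.
    let pdf_bytes = dxpdf::convert_document(&document)?;
    Ok(())
}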

Use from Python

pip install dxpdf

import dxpdf

# Bytes in, bytes out
with open("input.docx", "rb") as f:
    pdf_bytes = dxpdf.convert(f.read())

# File to file
dxpdf.convert_file("input.docx", "output.pdf")

What's Supported

dxpdf handles the most common DOCX features used in real-world business documents:

| Category | Features |
| --- | --- |
| Text | Bold, italic, underline, font size/family/color, character spacing, superscript/subscript, run shading |
| Paragraphs | Alignment (left/center/right), spacing (before/after/line with auto/exact/atLeast), indentation, tab stops, paragraph borders, paragraph shading |
| Tables | Column widths, cell margins (3-level cascade), merged cells (gridSpan + vMerge with height distribution), row heights, borders, cell shading, nested tables |
| Images | Inline (PNG/JPEG/BMP/WebP), floating/anchored with alignment and percentage-based positioning |
| Styles | Paragraph styles, character styles (including built-in Hyperlink), basedOn inheritance, document defaults, theme fonts |
| Headers/Footers | Text, images, page numbers (PAGE/NUMPAGES field codes) |
| Lists | Bullets, decimal, lower/upper letter, lower/upper roman with counter tracking |
| Hyperlinks | Clickable PDF link annotations with URL resolution from relationships |
| Sections | Multiple page sizes/margins, section breaks, portrait/landscape |
| Layout | Automatic pagination, word wrapping at spaces and hyphens, line spacing modes, floating image text flow |
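
The "word wrapping at spaces and hyphens" entry can be illustrated with a small greedy line breaker. This is a sketch under stated assumptions, not dxpdf's implementation: `break_opportunities`, `wrap`, and the `measure` callback are hypothetical names, and a real layout engine would measure text with shaped font metrics rather than a caller-supplied closure.

```rust
/// Split text into chunks that end at a legal break point:
/// right after a space or right after a hyphen.
fn break_opportunities(text: &str) -> Vec<&str> {
    let mut chunks = Vec::new();
    let mut start = 0;
    for (i, ch) in text.char_indices() {
        if ch == ' ' || ch == '-' {
            let end = i + ch.len_utf8();
            chunks.push(&text[start..end]);
            start = end;
        }
    }
    if start < text.len() {
        chunks.push(&text[start..]);
    }
    chunks
}

/// Greedy wrapping: keep appending chunks until the line would exceed `max_width`.
/// `measure` is a hypothetical stand-in for real text measurement.
pub fn wrap(text: &str, max_width: f32, measure: impl Fn(&str) -> f32) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for chunk in break_opportunities(text) {
        let candidate = format!("{}{}", current, chunk);
        // Trailing spaces are allowed to hang past the margin, so measure without them.
        if !current.is_empty() && measure(candidate.trim_end()) > max_width {
            lines.push(current.trim_end().to_string());
            current = chunk.to_string();
        } else {
            current = candidate;
        }
    }
    if !current.trim_end().is_empty() {
        lines.push(current.trim_end().to_string());
    }
    lines
}
```

With a measure function that counts characters, `wrap("a well-known fact", 8.0, ...)` breaks after the hyphen and keeps the hyphen on the first line.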

Building from Source

Prerequisites

  • Rust toolchain (1.70+)
  • clang (required by skia-safe for building Skia bindings)
  • Linux only: libfontconfig1-dev and libfreetype-dev (e.g., sudo apt-get install -y libfontconfig1-dev libfreetype-dev)

Build

cargo build --release

The release binary will be at target/release/dxpdf.

Architecture

dxpdf follows a measure→layout→paint pipeline inspired by Flutter's rendering model:

DOCX (ZIP) → Parse → Document Model → Measure → Layout → Paint → Skia PDF
             Twips/Emu/HalfPoints       ←── Pt throughout ──→      f32

Type-safe dimensions flow through the entire pipeline: OOXML units (Twips, Emu, HalfPoints) in the model, Pt (typographic points) in layout, and f32 only at the Skia rendering boundary.
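
To make the idea of compile-time unit safety concrete, here is a minimal newtype sketch. The type names mirror the ones listed above, but the definitions are assumptions for illustration and are not taken from dxpdf's `dimension` module; `to_skia_scalar` is a hypothetical helper. The conversion factors are the standard ones: 20 twips per point, 12,700 EMU per point, and 2 half-points per point.

```rust
// Hypothetical sketch: illustrative definitions, not dxpdf's actual types.

/// Twentieths of a point (OOXML "dxa"): 20 twips = 1 pt.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Twips(pub i64);

/// English Metric Units: 914,400 EMU = 1 inch, so 12,700 EMU = 1 pt.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Emu(pub i64);

/// Half-points, used by OOXML for font sizes: 2 half-points = 1 pt.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct HalfPoints(pub i64);

/// Typographic points (1/72 inch), the unit used during layout.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct Pt(pub f32);

impl From<Twips> for Pt {
    fn from(t: Twips) -> Pt {
        Pt(t.0 as f32 / 20.0)
    }
}

impl From<Emu> for Pt {
    fn from(e: Emu) -> Pt {
        Pt(e.0 as f32 / 12_700.0)
    }
}

impl From<HalfPoints> for Pt {
    fn from(h: HalfPoints) -> Pt {
        Pt(h.0 as f32 / 2.0)
    }
}

impl std::ops::Add for Pt {
    type Output = Pt;
    fn add(self, rhs: Pt) -> Pt {
        Pt(self.0 + rhs.0)
    }
}

/// Only the rendering boundary should unwrap a dimension to a raw f32.
pub fn to_skia_scalar(p: Pt) -> f32 {
    p.0
}
```

Because `Add` is only implemented for `Pt`, an expression such as `Twips(720) + Emu(457_200)` fails to compile; both values have to be converted explicitly with `Pt::from` first.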

Each layout element (paragraphs, table cells, headers/footers) goes through three phases:

  1. Measure — collect fragments, fit lines, produce draw commands with relative coordinates
  2. Layout — assign absolute positions, handle page breaks, distribute heights (e.g., vMerge spans)
  3. Paint — emit draw commands at final positions (shading → content → borders)
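
The three phases above can be sketched with a toy pipeline. Everything here is hypothetical (`Measured`, `Placed`, `measure`, `layout`, `paint` are illustrative names, not dxpdf's API), and line fitting is reduced to stacking lines at a fixed height. The point is only the separation of concerns: relative coordinates from measure, absolute positions from layout, final draw calls from paint.

```rust
// Illustrative only: these types and functions are not dxpdf's API.

/// Output of the measure phase: a block's height plus draw commands
/// whose y-coordinates are relative to the block's top edge.
pub struct Measured {
    pub height: f32,
    pub commands: Vec<(f32, String)>, // (relative y, text)
}

/// Output of the layout phase: a measured block pinned to a page and offset.
pub struct Placed {
    pub page: usize,
    pub y: f32,
    pub block: Measured,
}

/// Measure: stack lines at a fixed line height.
pub fn measure(lines: &[&str], line_height: f32) -> Measured {
    let commands = lines
        .iter()
        .enumerate()
        .map(|(i, text)| (i as f32 * line_height, text.to_string()))
        .collect();
    Measured { height: lines.len() as f32 * line_height, commands }
}

/// Layout: assign absolute positions, starting a new page when a block does not fit.
pub fn layout(blocks: Vec<Measured>, page_height: f32) -> Vec<Placed> {
    let (mut page, mut cursor) = (0usize, 0.0f32);
    let mut placed = Vec::new();
    for block in blocks {
        if cursor > 0.0 && cursor + block.height > page_height {
            page += 1;
            cursor = 0.0;
        }
        let y = cursor;
        cursor += block.height;
        placed.push(Placed { page, y, block });
    }
    placed
}

/// Paint: resolve relative coordinates into final (page, y, text) draw calls.
pub fn paint(placed: &[Placed]) -> Vec<(usize, f32, String)> {
    placed
        .iter()
        .flat_map(|p| {
            p.block
                .commands
                .iter()
                .map(move |(dy, text)| (p.page, p.y + *dy, text.clone()))
        })
        .collect()
}
```

Keeping measure free of absolute positions means a block can be measured once and then placed wherever pagination puts it, without recomputing its contents.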

Modules

| Module | Description |
| --- | --- |
| dimension | Type-safe dimensional units: Twips, HalfPoints, EighthPoints, Emu, Pt with compile-time unit safety |
| geometry | Spatial types: Offset, Size, Rect, EdgeInsets, LineSegment; generic over unit, with Skia interop |
| model | Algebraic data types representing the document tree (Document, Block, Inline, etc.) |
| parse | DOCX ZIP extraction, event-driven XML parser, style/numbering resolution |
| render/layout | Measure→layout→paint pipeline: fragment (shared line fitting), paragraph, table (three-pass), header_footer |
| render/painter | Skia canvas operations for PDF output; the only f32 unwrap boundary |
| render/fonts | Font resolution: tries requested font first, falls back to metric-compatible substitutes |
| units | String constants and rendering defaults |
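
The font resolution order described for render/fonts (requested font first, then a metric-compatible substitute) might look roughly like the sketch below. The substitution table uses widely known open-source pairings and is an assumption; dxpdf's actual list is not documented here. `resolve_font`, `metric_compatible_substitute`, and the `is_installed` callback are hypothetical names.

```rust
/// Metric-compatible substitutes for common Office fonts.
/// Assumed pairings for illustration; dxpdf's actual table may differ.
pub fn metric_compatible_substitute(family: &str) -> Option<&'static str> {
    match family.to_ascii_lowercase().as_str() {
        "arial" => Some("Liberation Sans"),
        "times new roman" => Some("Liberation Serif"),
        "courier new" => Some("Liberation Mono"),
        "calibri" => Some("Carlito"),
        "cambria" => Some("Caladea"),
        _ => None,
    }
}

/// Try the requested family, then its substitute, then a caller-chosen default.
/// `is_installed` is a hypothetical stand-in for a font-manager lookup.
pub fn resolve_font<'a>(
    requested: &'a str,
    default_family: &'a str,
    is_installed: impl Fn(&str) -> bool,
) -> &'a str {
    if is_installed(requested) {
        return requested;
    }
    match metric_compatible_substitute(requested) {
        Some(sub) if is_installed(sub) => sub,
        _ => default_family,
    }
}
```

A metric-compatible substitute has the same glyph advance widths as the original, so line breaks and pagination stay the same even though the glyph shapes differ.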

Running Tests

cargo test

The test suite includes 184 unit tests, 59 API compatibility tests, and 9 integration tests covering dimensions, geometry, layout, tables, lists, floats, headers/footers, hyperlinks, superscript/subscript, field codes, and end-to-end conversion.

Visual regression tests compare rendered PDFs against Word-generated references using pixel matching (see VISUAL_COMPARISON.md).
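
As a rough idea of what pixel matching involves, the sketch below computes the fraction of pixels that differ between two rasterized pages. It is an assumption about the general technique, not the procedure documented in VISUAL_COMPARISON.md; `mismatch_ratio` and the per-channel `tolerance` parameter are hypothetical.

```rust
/// Fraction of RGBA pixels where any channel differs by more than `tolerance`.
/// Returns None when the buffers are empty, differ in length,
/// or are not a whole number of 4-byte pixels.
pub fn mismatch_ratio(a: &[u8], b: &[u8], tolerance: u8) -> Option<f64> {
    if a.is_empty() || a.len() != b.len() || a.len() % 4 != 0 {
        return None;
    }
    let differing = a
        .chunks_exact(4)
        .zip(b.chunks_exact(4))
        .filter(|(pa, pb)| {
            pa.iter()
                .zip(pb.iter())
                .any(|(x, y)| x.abs_diff(*y) > tolerance)
        })
        .count();
    Some(differing as f64 / (a.len() / 4) as f64)
}
```

A test harness could then fail a page when the ratio exceeds a chosen threshold, with a small tolerance absorbing anti-aliasing differences between renderers.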

OOXML Feature Coverage

Validated against ISO 29500 (Office Open XML). 35 features fully implemented, 6 partial, 15 not implemented.

Full feature matrix

Text Formatting (w:rPr)

| Feature | Status |
| --- | --- |
| Bold, italic | ✅ with toggle support |
| Underline | ✅ font-proportional stroke width |
| Font size, family, color | |
| Superscript/subscript | |
| Character spacing | |
| Run shading | |
| Strikethrough | |
| Highlighting | |
| Caps, smallCaps | |
| Shadow, outline, emboss, imprint | |
| Hidden text | |

Paragraph Properties (w:pPr)

| Feature | Status |
| --- | --- |
| Alignment (left, center, right) | |
| Alignment (justify) | ⚠️ parsed, renders left-aligned |
| Spacing before/after, line spacing | ✅ auto/exact/atLeast |
| Indentation (left, right, first-line, hanging) | |
| Tab stops (left) | |
| Tab stops (center, right, decimal) | ⚠️ parsed, render as left |
| Paragraph shading | |
| Paragraph borders | ✅ with adjacent border merging, w:space offset |
| Keep with next, widow/orphan control | |
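
The three line spacing modes (auto/exact/atLeast) differ in how the raw `w:line` value is interpreted. In WordprocessingML, `auto` expresses the value in 240ths of a line, while `exact` and `atLeast` express it in twips. The sketch below shows that interpretation; `LineRule` and `line_height_pt` are hypothetical names, and how dxpdf derives the natural line height from font metrics is not covered.

```rust
/// Mirrors the values of the w:lineRule attribute.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LineRule {
    Auto,
    Exact,
    AtLeast,
}

/// Resolve a line height in points.
/// `line` is the raw w:line value; `natural_pt` is the font-derived line height.
pub fn line_height_pt(rule: LineRule, line: u32, natural_pt: f32) -> f32 {
    match rule {
        // 240 means single spacing, 360 means 1.5 lines, 480 means double.
        LineRule::Auto => natural_pt * line as f32 / 240.0,
        // Fixed height in twips (20 twips per point), regardless of content.
        LineRule::Exact => line as f32 / 20.0,
        // Minimum height in twips; taller content still grows the line.
        LineRule::AtLeast => natural_pt.max(line as f32 / 20.0),
    }
}
```

For example, `w:line="360"` with `auto` on a 12 pt natural line height gives 18 pt, while the same rule with `exact` and `w:line="240"` always gives 12 pt.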

Styles

| Feature | Status |
| --- | --- |
| Paragraph styles, character styles | |
| basedOn inheritance | |
| Document defaults, theme fonts | |

Tables

| Feature | Status |
| --- | --- |
| Grid columns, cell widths (dxa) | |
| Cell widths (pct, auto) | ⚠️ fall back to grid |
| Cell margins (3-level cascade) | |
| Merged cells (gridSpan, vMerge) | |
| Row heights | ✅ min / ⚠️ exact treated as min |
| Table borders (per-cell, per-table) | |
| Border styles (single) | |
| Border styles (double, dashed, dotted) | ⚠️ render as single |
| Cell shading (solid) | |
| Cell shading (patterns), vertical alignment | |
| Nested tables | |

Images

| Feature | Status |
| --- | --- |
| Inline images | ✅ PNG, JPEG, BMP, WebP |
| Floating images | ✅ offset, align, wp14:pctPos |
| Wrap modes | ✅ none/square/tight/through |
| VML images | |

Page Layout

| Feature | Status |
| --- | --- |
| Page size and orientation | |
| Page margins (all 6) | |
| Section breaks (nextPage) | |
| Section breaks (continuous, even, odd) | ⚠️ treated as nextPage |
| Multi-column, page borders, doc grid | |

Headers/Footers

| Feature | Status |
| --- | --- |
| Default header/footer | |
| First page, even/odd, per-section | |

Lists

| Feature | Status |
| --- | --- |
| Bullet, decimal, letter, roman | |
| Multi-level lists | ⚠️ levels parsed, nesting limited |
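
The decimal, letter, and roman formats amount to rendering a running counter in different notations. The sketch below is one way to do that and is not taken from dxpdf; `NumFmt` and `format_counter` are hypothetical names. It assumes Word's convention for letters past the 26th item (aa, bb, cc, ...), which repeats the letter rather than counting in base 26.

```rust
#[derive(Debug, Clone, Copy)]
pub enum NumFmt {
    Decimal,
    LowerLetter,
    UpperLetter,
    LowerRoman,
    UpperRoman,
}

/// a, b, ..., z, aa, bb, ... (the letter is repeated after each pass through the alphabet).
fn letters(n: u32) -> String {
    if n == 0 {
        return String::new();
    }
    let letter = (b'a' + ((n - 1) % 26) as u8) as char;
    let repeats = ((n - 1) / 26 + 1) as usize;
    std::iter::repeat(letter).take(repeats).collect()
}

/// Standard subtractive roman numerals in lower case.
fn roman(mut n: u32) -> String {
    const TABLE: [(u32, &str); 13] = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ];
    let mut out = String::new();
    for &(value, glyph) in TABLE.iter() {
        while n >= value {
            out.push_str(glyph);
            n -= value;
        }
    }
    out
}

/// Render the 1-based counter `n` in the given number format.
pub fn format_counter(fmt: NumFmt, n: u32) -> String {
    match fmt {
        NumFmt::Decimal => n.to_string(),
        NumFmt::LowerLetter => letters(n),
        NumFmt::UpperLetter => letters(n).to_uppercase(),
        NumFmt::LowerRoman => roman(n),
        NumFmt::UpperRoman => roman(n).to_uppercase(),
    }
}
```

Counter tracking then reduces to keeping one integer per list level and passing it through `format_counter` when the list label is drawn.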

Fields

| Feature | Status |
| --- | --- |
| PAGE, NUMPAGES | |
| Hyperlinks | ✅ clickable PDF annotations |
| Unknown fields | ✅ cached value fallback |
| TOC, MERGEFIELD, DATE | |
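
The field handling summarized above (PAGE and NUMPAGES computed at render time, unknown fields falling back to their cached value) can be sketched as a single dispatch function. `resolve_field` is a hypothetical name and the parsing is deliberately naive: it looks only at the first token of the field instruction and ignores switches such as `\* MERGEFORMAT`.

```rust
/// Resolve a field instruction to display text.
/// `cached` is the result text stored in the DOCX when the field was last updated.
pub fn resolve_field(
    instruction: &str,
    page: u32,
    total_pages: u32,
    cached: Option<&str>,
) -> String {
    // The first whitespace-separated token is the field name; switches follow it.
    let name = instruction
        .split_whitespace()
        .next()
        .unwrap_or("")
        .to_ascii_uppercase();
    match name.as_str() {
        "PAGE" => page.to_string(),
        "NUMPAGES" => total_pages.to_string(),
        // Unknown field: show whatever value was cached in the document, if any.
        _ => cached.unwrap_or("").to_string(),
    }
}
```

The cached fallback means a field such as DATE still shows the value Word stored at the last update, even though it is not recomputed.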

Other

| Feature | Status |
| --- | --- |
| Footnotes/endnotes | ❌ warned |
| Comments, tracked changes | ❌ / ⚠️ |
| Text boxes, shapes, SmartArt, charts | |
| RTL text, automatic hyphenation | |

Performance

Benchmarked on Apple M3 Max with hyperfine (20 runs, 3 warmup):

| Document | Pages | Mean time | Peak RSS |
| --- | --- | --- | --- |
| 2-page form (11 tables, 2 images) | 2 | 48 ms | 20 MB |
| 7-page inspection report | 7 | 52 ms | 24 MB |
| 24-page product sheet (61 images) | 24 | 353 ms | 76 MB |

See BENCHMARKS.md for full history.

Dependencies

| Crate | Purpose |
| --- | --- |
| quick-xml | Event-driven XML parsing |
| zip | DOCX ZIP archive reading |
| skia-safe | PDF rendering, text measurement, link annotations |
| clap | CLI argument parsing |
| thiserror | Error types |
| log + env_logger | Warnings for unsupported features (RUST_LOG=warn) |
| pyo3 (optional) | Python bindings via maturin |

Contributing

Contributions are welcome. Please open an issue before submitting large PRs.

License

MIT
