dxpdf

A fast, lightweight DOCX-to-PDF converter written in Rust, powered by Skia.

Convert Microsoft Word .docx files to high-fidelity PDF documents — from the command line or as a Rust library. No Microsoft Office, LibreOffice, or cloud API required.

Built by nerdy.pro.

Why dxpdf?

  • Fast — converts a 7-page report in ~52ms on Apple Silicon
  • Accurate — Flutter-inspired measure→layout→paint pipeline with pixel-level fidelity
  • Standalone — no external dependencies beyond Skia; no Office installation needed
  • Cross-platform — runs on macOS, Linux, and Windows
  • Dual-use — works as a CLI tool, Rust library (use dxpdf;), or Python package (import dxpdf)

Quick Start

Install

cargo install dxpdf

Convert a file

dxpdf input.docx                  # outputs input.pdf
dxpdf input.docx -o output.pdf    # specify output path

Use as a library

let docx_bytes = std::fs::read("document.docx")?;
let pdf_bytes = dxpdf::convert(&docx_bytes)?;
std::fs::write("output.pdf", &pdf_bytes)?;

You can also inspect or transform the parsed document model:

use dxpdf::{parse, model};

let document = parse::parse(&std::fs::read("document.docx")?)?;

for block in &document.blocks {
    match block {
        model::Block::Paragraph(p) => { /* ... */ }
        model::Block::Table(t) => { /* ... */ }
    }
}

let pdf_bytes = dxpdf::convert_document(&document)?;

Use from Python

pip install dxpdf

import dxpdf

# Bytes in, bytes out
with open("input.docx", "rb") as f:
    pdf_bytes = dxpdf.convert(f.read())

# File to file
dxpdf.convert_file("input.docx", "output.pdf")
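
For converting a whole directory, a small wrapper around the file-to-file call can help. The sketch below is illustrative only: `convert_tree` is a hypothetical helper, not part of dxpdf, and it takes the converter as a parameter so it only assumes the `(input_path, output_path)` call shape shown above.

```python
from pathlib import Path
from typing import Callable, List


def convert_tree(src_dir: str, out_dir: str,
                 convert_file: Callable[[str, str], None]) -> List[Path]:
    """Convert every .docx under src_dir, mirroring the layout in out_dir."""
    src, out = Path(src_dir), Path(out_dir)
    written = []
    for docx in sorted(src.rglob("*.docx")):
        if docx.name.startswith("~$"):
            continue  # skip Word lock files left next to open documents
        target = out / docx.relative_to(src).with_suffix(".pdf")
        target.parent.mkdir(parents=True, exist_ok=True)
        convert_file(str(docx), str(target))
        written.append(target)
    return written
```

With the package installed, this would be called as `convert_tree("reports", "pdfs", dxpdf.convert_file)`.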

What's Supported

dxpdf handles the most common DOCX features used in real-world business documents:

| Category | Features |
| --- | --- |
| Text | Bold, italic, underline, font size/family/color, character spacing, superscript/subscript, run shading |
| Paragraphs | Alignment (left/center/right), spacing (before/after/line with auto/exact/atLeast), indentation, tab stops, paragraph borders, paragraph shading |
| Tables | Column widths, cell margins (3-level cascade), merged cells (gridSpan + vMerge with height distribution), row heights, borders, cell shading, nested tables |
| Images | Inline (PNG/JPEG/BMP/WebP), floating/anchored with alignment and percentage-based positioning |
| Styles | Paragraph styles, character styles (including built-in Hyperlink), basedOn inheritance, document defaults, theme fonts |
| Headers/Footers | Text, images, page numbers (PAGE/NUMPAGES field codes) |
| Lists | Bullets, decimal, lower/upper letter, lower/upper roman with counter tracking |
| Hyperlinks | Clickable PDF link annotations with URL resolution from relationships |
| Sections | Multiple page sizes/margins, section breaks, portrait/landscape |
| Layout | Automatic pagination, word wrapping at spaces and hyphens, line spacing modes, floating image text flow |
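
To make "word wrapping at spaces and hyphens" concrete, here is a minimal greedy line-fitting sketch. It is not dxpdf's implementation: `break_chunks` and `wrap` are hypothetical names, and width is measured in characters via `len`, whereas a real renderer would measure glyph advances.

```python
import re

# Split text into chunks that end at a break opportunity: after a hyphen
# or after a run of whitespace. Joining the chunks reproduces the input.
_CHUNK = re.compile(r"[^\s-]*(?:-|\s+|$)")


def break_chunks(text):
    return [m.group(0) for m in _CHUNK.finditer(text) if m.group(0)]


def wrap(text, max_width, measure=len):
    """Greedily pack chunks into lines no wider than max_width."""
    lines, current = [], ""
    for chunk in break_chunks(text):
        candidate = current + chunk
        # Trailing whitespace does not count against the line width.
        if current and measure(candidate.rstrip()) > max_width:
            lines.append(current.rstrip())
            current = chunk
        else:
            current = candidate
    if current.rstrip():
        lines.append(current.rstrip())
    return lines
```

A chunk wider than `max_width` is placed on its own line rather than split, which is a simplification of this sketch.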

Building from Source

Prerequisites

  • Rust toolchain (1.70+)
  • clang (required by skia-safe for building Skia bindings)
  • Linux only: libfontconfig1-dev and libfreetype-dev (e.g., sudo apt-get install -y libfontconfig1-dev libfreetype-dev)

Build

cargo build --release

The release binary will be at target/release/dxpdf.

Architecture

dxpdf follows a measure→layout→paint pipeline inspired by Flutter's rendering model:

DOCX (ZIP) → Parse → Document Model (ADT) → Measure → Layout → Paint → Skia PDF
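
The first two stages can be illustrated with the Python standard library alone. This sketch is not dxpdf's parser (which is written in Rust on quick-xml); `paragraph_texts` is a hypothetical helper, and it assumes the main document part sits at the conventional `word/document.xml` path.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(docx_bytes: bytes):
    """Return the plain text of each w:p paragraph in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        xml_bytes = archive.read("word/document.xml")
    paragraphs, runs = [], []
    # iterparse yields each element when its end tag is read (event-driven).
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == f"{{{W_NS}}}t":
            runs.append(elem.text or "")
        elif elem.tag == f"{{{W_NS}}}p":
            paragraphs.append("".join(runs))
            runs = []
    return paragraphs
```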

Each layout element (paragraphs, table cells, headers/footers) goes through three phases:

  1. Measure — collect fragments, fit lines, produce draw commands with relative coordinates
  2. Layout — assign absolute positions, handle page breaks, distribute heights (e.g., vMerge spans)
  3. Paint — emit draw commands at final positions (shading → content → borders)
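
The three phases can be sketched with a toy model in which every block is just a label and a line count. All names here (`Measured`, `Placed`, `measure`, `layout`, `paint`) are hypothetical and chosen for illustration; the real pipeline works on fragments and Skia draw commands.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Measured:
    label: str
    height: float  # block height in points


@dataclass
class Placed:
    label: str
    page: int
    y: float  # absolute offset from the top of the page content area


def measure(blocks: List[Tuple[str, int]], line_height: float) -> List[Measured]:
    """Phase 1: size each block; here height is line count times line height."""
    return [Measured(label, lines * line_height) for label, lines in blocks]


def layout(measured: List[Measured], page_height: float) -> List[Placed]:
    """Phase 2: assign absolute positions, breaking to a new page when needed."""
    placed, page, y = [], 0, 0.0
    for m in measured:
        if y > 0 and y + m.height > page_height:
            page, y = page + 1, 0.0
        placed.append(Placed(m.label, page, y))
        y += m.height
    return placed


def paint(placed: List[Placed]) -> List[str]:
    """Phase 3: emit draw commands at the final positions."""
    return [f"page {p.page}: draw {p.label} at y={p.y:g}" for p in placed]
```

Keeping measurement free of absolute coordinates means a block can be measured once and then placed on whichever page it lands on. This toy version never splits a block across pages.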

Modules

| Module | Description |
| --- | --- |
| model | Algebraic data types representing the document tree (Document, Block, Inline, etc.) |
| parse | DOCX ZIP extraction, event-driven XML parser, style/numbering resolution |
| render/layout | Measure→layout→paint pipeline: fragment (shared line fitting), paragraph, table (three-pass), header_footer |
| render/painter | Skia canvas operations for PDF output |
| render/fonts | Font resolution: tries requested font first, falls back to metric-compatible substitutes |
| units | OOXML unit conversions (twips, EMUs, half-points); spec-defined constants only |
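
The unit conversions mentioned for the units module follow from fixed definitions: a twip is 1/20 of a point, there are 914,400 EMUs per inch and 72 points per inch, and font sizes are stored in half-points. The Python sketch below only illustrates the arithmetic; the function names are hypothetical and do not mirror the Rust module's API.

```python
TWIPS_PER_POINT = 20          # a twip is one twentieth of a point
POINTS_PER_INCH = 72
EMU_PER_INCH = 914_400
EMU_PER_POINT = EMU_PER_INCH // POINTS_PER_INCH  # 12_700


def twips_to_points(twips: int) -> float:
    return twips / TWIPS_PER_POINT


def emu_to_points(emu: int) -> float:
    return emu / EMU_PER_POINT


def half_points_to_points(half_points: int) -> float:
    return half_points / 2
```

For example, a US Letter page width of 12240 twips is 612 points, and a font size stored as 24 half-points is 12 pt.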

Running Tests

cargo test

The test suite includes 104 unit tests and 9 integration tests covering layout, tables, lists, floats, headers/footers, hyperlinks, superscript/subscript, field codes, and end-to-end conversion.

Visual regression tests compare rendered PDFs against Word-generated references using pixel matching (see VISUAL_COMPARISON.md).
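
As a rough illustration of pixel matching, the sketch below computes the fraction of pixels that differ between two rasterized pages. It is an assumption-laden toy, not the procedure from VISUAL_COMPARISON.md: `mismatch_ratio` is a hypothetical function, and images are plain lists of RGB tuples of equal length.

```python
def mismatch_ratio(rendered, reference, tolerance=0):
    """Fraction of pixels with any channel differing by more than tolerance."""
    if len(rendered) != len(reference):
        raise ValueError("images must have the same number of pixels")
    if not rendered:
        return 0.0
    differing = sum(
        1
        for a, b in zip(rendered, reference)
        if any(abs(ca - cb) > tolerance for ca, cb in zip(a, b))
    )
    return differing / len(rendered)
```

A small per-channel tolerance keeps anti-aliasing noise from counting as a regression.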

OOXML Feature Coverage

Validated against ISO 29500 (Office Open XML). 34 features fully implemented, 6 partial, 15 not implemented.

Full feature matrix

Text Formatting (w:rPr)

| Feature | Status |
| --- | --- |
| Bold, italic | ✅ with toggle support |
| Underline | ✅ font-proportional stroke width |
| Font size, family, color | |
| Superscript/subscript | |
| Character spacing | |
| Run shading | |
| Strikethrough | |
| Highlighting | |
| Caps, smallCaps | |
| Shadow, outline, emboss, imprint | |
| Hidden text | |

Paragraph Properties (w:pPr)

| Feature | Status |
| --- | --- |
| Alignment (left, center, right) | |
| Alignment (justify) | ⚠️ parsed, renders left-aligned |
| Spacing before/after, line spacing | ✅ auto/exact/atLeast |
| Indentation (left, right, first-line, hanging) | |
| Tab stops (left) | |
| Tab stops (center, right, decimal) | ⚠️ parsed, render as left |
| Paragraph shading | |
| Paragraph borders | ✅ with adjacent border merging |
| Keep with next, widow/orphan control | |

Styles

| Feature | Status |
| --- | --- |
| Paragraph styles, character styles | |
| basedOn inheritance | |
| Document defaults, theme fonts | |
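
basedOn inheritance means a style's effective properties are its own settings layered over those of its ancestors, with document defaults at the bottom. The sketch below shows one way to resolve such a chain. It is illustrative only: `resolve_style` and the dictionary layout (`"basedOn"`, `"props"`) are assumptions made for the example, not dxpdf's data model.

```python
def resolve_style(style_id, styles, defaults=None):
    """Merge properties along the basedOn chain; nearer styles override ancestors."""
    chain, seen = [], set()
    current = style_id
    while current is not None and current in styles:
        if current in seen:
            raise ValueError(f"cyclic basedOn chain at {current!r}")
        seen.add(current)
        chain.append(styles[current])
        current = styles[current].get("basedOn")
    resolved = dict(defaults or {})
    for style in reversed(chain):  # apply the root ancestor first
        resolved.update(style.get("props", {}))
    return resolved
```

For instance, a `Heading1` style based on `Normal` inherits the font from `Normal` while overriding its size.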

Tables

| Feature | Status |
| --- | --- |
| Grid columns, cell widths (dxa) | |
| Cell widths (pct, auto) | ⚠️ fall back to grid |
| Cell margins (3-level cascade) | |
| Merged cells (gridSpan, vMerge) | |
| Row heights | ✅ min / ⚠️ exact treated as min |
| Table borders (per-cell, per-table) | |
| Border styles (single) | |
| Border styles (double, dashed, dotted) | ⚠️ render as single |
| Cell shading (solid) | |
| Cell shading (patterns), vertical alignment | |
| Nested tables | |

Images

| Feature | Status |
| --- | --- |
| Inline images | ✅ PNG, JPEG, BMP, WebP |
| Floating images | ✅ offset, align, wp14:pctPos |
| Wrap modes | ✅ none/square/tight/through |
| VML images | |

Page Layout

| Feature | Status |
| --- | --- |
| Page size and orientation | |
| Page margins (all 6) | |
| Section breaks (nextPage) | |
| Section breaks (continuous, even, odd) | ⚠️ treated as nextPage |
| Multi-column, page borders, doc grid | |

Headers/Footers

| Feature | Status |
| --- | --- |
| Default header/footer | |
| First page, even/odd, per-section | |

Lists

| Feature | Status |
| --- | --- |
| Bullet, decimal, letter, roman | |
| Multi-level lists | ⚠️ levels parsed, nesting limited |
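
The numbered list formats map a running counter to marker text. The sketch below shows one plausible formatter keyed by the OOXML number format names (`decimal`, `lowerLetter`, `upperLetter`, `lowerRoman`, `upperRoman`). The function names are hypothetical, and the letter sequence follows the Word convention of repeating a single letter after `z` (`aa`, `bb`, ...).

```python
_ROMAN = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"), (90, "xc"),
    (50, "l"), (40, "xl"), (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]


def to_roman(n: int) -> str:
    if not 0 < n < 4000:
        raise ValueError("roman numerals cover 1..3999")
    out = []
    for value, symbol in _ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_letter(n: int) -> str:
    if n < 1:
        raise ValueError("counter must be positive")
    index, repeat = (n - 1) % 26, (n - 1) // 26 + 1
    return chr(ord("a") + index) * repeat  # 27 -> "aa", 28 -> "bb"


def format_marker(num_fmt: str, n: int) -> str:
    """Render counter value n in the given number format."""
    formatters = {
        "decimal": str,
        "lowerLetter": to_letter,
        "upperLetter": lambda v: to_letter(v).upper(),
        "lowerRoman": to_roman,
        "upperRoman": lambda v: to_roman(v).upper(),
    }
    return formatters[num_fmt](n)
```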

Fields

| Feature | Status |
| --- | --- |
| PAGE, NUMPAGES | |
| Hyperlinks | ✅ clickable PDF annotations |
| Unknown fields | ✅ cached value fallback |
| TOC, MERGEFIELD, DATE | |
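
The "cached value fallback" row can be read as: compute the fields the renderer understands, and otherwise reuse the result text that Word stored in the document. A minimal sketch of that idea follows. `resolve_field` is a hypothetical function, and how dxpdf actually dispatches on field instructions is not shown in this document.

```python
def resolve_field(instruction: str, cached: str, page: int, num_pages: int) -> str:
    """Return the text to render for a field instruction such as ' PAGE '."""
    parts = instruction.split()
    name = parts[0].upper() if parts else ""
    if name == "PAGE":
        return str(page)
    if name == "NUMPAGES":
        return str(num_pages)
    # Unknown field: fall back to the value cached in the document.
    return cached
```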

Other

| Feature | Status |
| --- | --- |
| Footnotes/endnotes | ❌ warned |
| Comments, tracked changes | ❌ / ⚠️ |
| Text boxes, shapes, SmartArt, charts | |
| RTL text, automatic hyphenation | |

Performance

Benchmarked on Apple M3 Max with hyperfine (20 runs, 3 warmup):

| Document | Pages | Mean time | Peak RSS |
| --- | --- | --- | --- |
| 3-page form (11 tables, 2 images) | 3 | 53 ms | 19 MB |
| 7-page inspection report | 7 | 52 ms | 24 MB |
| 24-page product sheet | 24 | 349 ms | 76 MB |

See BENCHMARKS.md for full history.
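
Dividing mean time by page count shows that cost per page varies with content rather than being constant. The snippet below just redoes that arithmetic on the three benchmark rows above; `per_page_ms` is a hypothetical helper.

```python
# (document, pages, mean time in ms) taken from the benchmark table
BENCHMARKS = [
    ("3-page form", 3, 53),
    ("7-page inspection report", 7, 52),
    ("24-page product sheet", 24, 349),
]


def per_page_ms(rows):
    """Mean conversion time per page, rounded to one decimal place."""
    return {name: round(ms / pages, 1) for name, pages, ms in rows}
```

The table-heavy form comes out at about 17.7 ms per page, the report at about 7.4 ms, and the product sheet at about 14.5 ms.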

Dependencies

| Crate | Purpose |
| --- | --- |
| quick-xml | Event-driven XML parsing |
| zip | DOCX ZIP archive reading |
| skia-safe | PDF rendering, text measurement, link annotations |
| clap | CLI argument parsing |
| thiserror | Error types |
| log + env_logger | Warnings for unsupported features (RUST_LOG=warn) |
| pyo3 (optional) | Python bindings via maturin |

Contributing

Contributions are welcome. Please open an issue before submitting large PRs.


License

MIT

Download files


Source Distribution

dxpdf-0.1.4.tar.gz (78.8 kB)

Uploaded: Source

Built Distributions

dxpdf-0.1.4-cp312-cp312-macosx_11_0_arm64.whl (2.1 MB)

Uploaded: CPython 3.12, macOS 11.0+ ARM64

dxpdf-0.1.4-cp311-cp311-manylinux_2_28_x86_64.whl (3.1 MB)

Uploaded: CPython 3.11, manylinux (glibc 2.28+), x86-64

File details

Details for the file dxpdf-0.1.4.tar.gz.

File metadata

  • Download URL: dxpdf-0.1.4.tar.gz
  • Upload date:
  • Size: 78.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.12.6

File hashes

Hashes for dxpdf-0.1.4.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 1f5d806ba275760633bf630ef244c0f1cea2cccd22512b3c5d93f41d8aee746d |
| MD5 | a221f2aea09f9fd5e62407b1d426b5ef |
| BLAKE2b-256 | 227956c2e78f6716c35b183167a5ecc5f440574a44906d99c28a3ca0f7063049 |
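
A downloaded file can be checked against the published SHA256 digest with the standard library. This is a generic sketch, not a dxpdf feature; `sha256_of` and `verify` are hypothetical helper names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA256 with an expected hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, the call would be `verify("dxpdf-0.1.4.tar.gz", "1f5d806ba275760633bf630ef244c0f1cea2cccd22512b3c5d93f41d8aee746d")`.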


File details

Details for the file dxpdf-0.1.4-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for dxpdf-0.1.4-cp312-cp312-macosx_11_0_arm64.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 4eef382665a26c601912fc42bd1fed1fd771413009435b27044ff0d1e021b4ce |
| MD5 | 9fb758c7d84a1176351c2fa3478ec7e9 |
| BLAKE2b-256 | 42a12943c041995bdba9933f1e0292316a440e2c5f1ff1b969517245aa530bae |


File details

Details for the file dxpdf-0.1.4-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for dxpdf-0.1.4-cp311-cp311-manylinux_2_28_x86_64.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 531e6a7fe9847fbf71c40192040a0d62015032e413a6393556125beb0ec1eb9e |
| MD5 | 71c447eeab4652c7686648e2c8e12ce7 |
| BLAKE2b-256 | 87e7a17fd5d6f98cf1c63920fe52df2cc45172c42909af5127ec97eb5160c040 |
