dxpdf
A fast, lightweight DOCX-to-PDF converter written in Rust, powered by Skia.
Convert Microsoft Word .docx files to high-fidelity PDF documents — from the command line or as a Rust library. No Microsoft Office, LibreOffice, or cloud API required.
Built by nerdy.pro.
Why dxpdf?
- Fast — converts a 7-page report in ~52 ms on Apple Silicon (see Performance)
- Accurate — Flutter-inspired measure→layout→paint pipeline with proper baseline positioning
- Type-safe — dimensional type system (Twips, Pt, Emu) prevents unit confusion at compile time
- Standalone — no external dependencies beyond Skia; no Office installation needed
- Cross-platform — runs on macOS, Linux, and Windows
- Dual-use — works as a CLI tool, Rust library (use dxpdf;), or Python package (import dxpdf)
Quick Start
Install
cargo install dxpdf
Convert a file
dxpdf input.docx # outputs input.pdf
dxpdf input.docx -o output.pdf # specify output path
Use as a library
let docx_bytes = std::fs::read("document.docx")?;
let pdf_bytes = dxpdf::convert(&docx_bytes)?;
std::fs::write("output.pdf", &pdf_bytes)?;
You can also inspect or transform the parsed document model:
use dxpdf::{parse, model};
let document = parse::parse(&std::fs::read("document.docx")?)?;
for block in &document.blocks {
    match block {
        model::Block::Paragraph(p) => { /* p: &Box<Paragraph> */ }
        model::Block::Table(t) => { /* t: &Box<Table> */ }
    }
}
let pdf_bytes = dxpdf::convert_document(&document)?;
Use from Python
pip install dxpdf
import dxpdf
# Bytes in, bytes out (the with-block closes the input file promptly)
with open("input.docx", "rb") as f:
    pdf_bytes = dxpdf.convert(f.read())
# File to file
dxpdf.convert_file("input.docx", "output.pdf")
What's Supported
dxpdf handles the most common DOCX features used in real-world business documents:
| Category | Features |
|---|---|
| Text | Bold, italic, underline, font size/family/color, character spacing, superscript/subscript, run shading |
| Paragraphs | Alignment (left/center/right), spacing (before/after/line with auto/exact/atLeast), indentation, tab stops, paragraph borders, paragraph shading |
| Tables | Column widths, cell margins (3-level cascade), merged cells (gridSpan + vMerge with height distribution), row heights, borders, cell shading, nested tables |
| Images | Inline (PNG/JPEG/BMP/WebP), floating/anchored with alignment and percentage-based positioning |
| Styles | Paragraph styles, character styles (including built-in Hyperlink), basedOn inheritance, document defaults, theme fonts |
| Headers/Footers | Text, images, page numbers (PAGE/NUMPAGES field codes) |
| Lists | Bullets, decimal, lower/upper letter, lower/upper roman with counter tracking |
| Hyperlinks | Clickable PDF link annotations with URL resolution from relationships |
| Sections | Multiple page sizes/margins, section breaks, portrait/landscape |
| Layout | Automatic pagination, word wrapping at spaces and hyphens, line spacing modes, floating image text flow |
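The "word wrapping at spaces and hyphens" entry in the Layout row can be illustrated with a small greedy line fitter. This is a minimal sketch, not dxpdf's implementation: the function names `chunks` and `wrap` are hypothetical, and a fixed per-character width stands in for real font metrics.

```rust
/// Split text into breakable chunks: a chunk ends right after a space or a hyphen.
fn chunks(text: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut start = 0;
    for (i, c) in text.char_indices() {
        if c == ' ' || c == '-' {
            let end = i + c.len_utf8();
            out.push(&text[start..end]);
            start = end;
        }
    }
    if start < text.len() {
        out.push(&text[start..]);
    }
    out
}

/// Greedy line fitting (hypothetical helper, not part of dxpdf's API).
/// `max_width` and `char_width` share one unit; every character is assumed
/// to be `char_width` wide.
pub fn wrap(text: &str, max_width: f32, char_width: f32) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for chunk in chunks(text) {
        // A trailing space may hang past the margin, so measure without it.
        let candidate_len = current.chars().count() + chunk.trim_end().chars().count();
        if !current.is_empty() && candidate_len as f32 * char_width > max_width {
            lines.push(current.trim_end().to_string());
            current = String::new();
        }
        current.push_str(chunk);
    }
    if !current.trim_end().is_empty() {
        lines.push(current.trim_end().to_string());
    }
    lines
}
```

With a width of 7 characters, `wrap("a well-known fact", 7.0, 1.0)` breaks after the hyphen and yields the lines `a well-`, `known`, and `fact`. A chunk longer than the line is left to overflow in this sketch; a real layout engine would need a policy for that case.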
Building from Source
Prerequisites
- Rust toolchain (1.70+)
- clang (required by skia-safe for building Skia bindings)
- Linux only: libfontconfig1-dev and libfreetype-dev (e.g., sudo apt-get install -y libfontconfig1-dev libfreetype-dev)
Build
cargo build --release
The release binary will be at target/release/dxpdf.
Architecture
dxpdf follows a measure→layout→paint pipeline inspired by Flutter's rendering model:
DOCX (ZIP) → Parse → Document Model → Measure → Layout → Paint → Skia PDF
Twips/Emu/HalfPoints ←── Pt throughout ──→ f32
Type-safe dimensions flow through the entire pipeline: OOXML units (Twips, Emu, HalfPoints) in the model, Pt (typographic points) in layout, and f32 only at the Skia rendering boundary.
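To make the unit flow concrete, here is a minimal sketch of how such newtypes could look. The struct layouts and the `to_f32` method are assumptions for illustration and may differ from dxpdf's actual `dimension` module; the conversion factors themselves are standard (20 twips, 12,700 EMU, and 2 half-points per point).

```rust
// Hypothetical newtypes for illustration; dxpdf's real `dimension` module may differ.

/// Twentieths of a point (OOXML "dxa" units).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Twips(pub i64);

/// English Metric Units: 914,400 per inch, i.e. 12,700 per point.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Emu(pub i64);

/// Half-points, used by OOXML for font sizes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct HalfPoints(pub i64);

/// Typographic points (1/72 inch), the unit used during layout.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct Pt(pub f32);

impl From<Twips> for Pt {
    fn from(t: Twips) -> Self {
        Pt(t.0 as f32 / 20.0)
    }
}

impl From<Emu> for Pt {
    fn from(e: Emu) -> Self {
        Pt(e.0 as f32 / 12_700.0)
    }
}

impl From<HalfPoints> for Pt {
    fn from(h: HalfPoints) -> Self {
        Pt(h.0 as f32 / 2.0)
    }
}

impl std::ops::Add for Pt {
    type Output = Pt;
    fn add(self, rhs: Pt) -> Pt {
        Pt(self.0 + rhs.0)
    }
}

impl Pt {
    /// The one place where the unit wrapper is dropped, mirroring the
    /// "f32 only at the rendering boundary" rule described above.
    pub fn to_f32(self) -> f32 {
        self.0
    }
}
```

Because `Add` is only defined for `Pt + Pt`, an expression such as `Twips(20) + Emu(12_700)` does not compile; each value has to be converted to `Pt` explicitly first, which is the compile-time protection the type system is meant to provide.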
Each layout element (paragraphs, table cells, headers/footers) goes through three phases:
- Measure — collect fragments, fit lines, produce draw commands with relative coordinates
- Layout — assign absolute positions, handle page breaks, distribute heights (e.g., vMerge spans)
- Paint — emit draw commands at final positions (shading → content → borders)
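The three phases can be sketched with a toy model. Everything below is hypothetical (the types `Measured`, `RelCommand`, `Placed`, and `Layer`, and the functions `measure`, `layout`, and `paint` are not dxpdf's API); it only illustrates relative coordinates in measure, absolute placement with page breaks in layout, and layer ordering in paint.

```rust
// Hypothetical types for illustration only; not dxpdf's actual API.

#[derive(Debug, Clone, PartialEq)]
pub enum Layer { Shading, Content, Border }

/// A draw command positioned relative to its block's top edge.
#[derive(Debug, Clone)]
pub struct RelCommand { pub layer: Layer, pub dy: f32, pub label: String }

/// Output of the measure phase: block height plus relative commands.
#[derive(Debug, Clone)]
pub struct Measured { pub height: f32, pub commands: Vec<RelCommand> }

/// Output of the layout phase: a command with a page index and absolute y.
#[derive(Debug, Clone, PartialEq)]
pub struct Placed { pub page: usize, pub y: f32, pub layer: Layer, pub label: String }

/// Measure: one background plus one stacked text line per entry.
pub fn measure(lines: &[&str], line_height: f32) -> Measured {
    let mut commands = vec![RelCommand { layer: Layer::Shading, dy: 0.0, label: "bg".into() }];
    for (i, text) in lines.iter().enumerate() {
        commands.push(RelCommand {
            layer: Layer::Content,
            dy: i as f32 * line_height,
            label: text.to_string(),
        });
    }
    Measured { height: lines.len() as f32 * line_height, commands }
}

/// Layout: stack blocks top to bottom; start a new page when a block does not fit.
pub fn layout(blocks: &[Measured], page_height: f32) -> Vec<Placed> {
    let (mut page, mut cursor) = (0usize, 0.0f32);
    let mut placed = Vec::new();
    for block in blocks {
        if cursor > 0.0 && cursor + block.height > page_height {
            page += 1;
            cursor = 0.0;
        }
        for cmd in &block.commands {
            placed.push(Placed {
                page,
                y: cursor + cmd.dy,
                layer: cmd.layer.clone(),
                label: cmd.label.clone(),
            });
        }
        cursor += block.height;
    }
    placed
}

/// Paint: order commands per page as shading, then content, then borders.
/// The sort is stable, so commands within a layer keep their layout order.
pub fn paint(mut placed: Vec<Placed>) -> Vec<Placed> {
    let rank = |l: &Layer| match l {
        Layer::Shading => 0,
        Layer::Content => 1,
        Layer::Border => 2,
    };
    placed.sort_by_key(|p| (p.page, rank(&p.layer)));
    placed
}
```

Calling `paint(layout(&[measure(&["one", "two"], 10.0), measure(&["three"], 10.0)], 25.0))` places the second block on a new page because it would overflow the 25-unit page. Sorting by layer across a whole page is a simplification chosen for brevity; the phases described above apply the ordering per layout element.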
Modules
| Module | Description |
|---|---|
| dimension | Type-safe dimensional units: Twips, HalfPoints, EighthPoints, Emu, Pt with compile-time unit safety |
| geometry | Spatial types: Offset, Size, Rect, EdgeInsets, LineSegment — generic over unit, with Skia interop |
| model | Algebraic data types representing the document tree (Document, Block, Inline, etc.) |
| parse | DOCX ZIP extraction, event-driven XML parser, style/numbering resolution |
| render/layout | Measure→layout→paint pipeline: fragment (shared line fitting), paragraph, table (three-pass), header_footer |
| render/painter | Skia canvas operations for PDF output — the only f32 unwrap boundary |
| render/fonts | Font resolution: tries requested font first, falls back to metric-compatible substitutes |
| units | String constants and rendering defaults |
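The font resolution strategy listed for render/fonts (requested font first, then metric-compatible substitutes) might look roughly like the sketch below. The function names and the substitution table are assumptions: the listed pairs are widely known metric-compatible families, but whether dxpdf uses these exact pairs is not stated here.

```rust
use std::collections::HashSet;

/// Hypothetical substitution table. Which pairs dxpdf actually uses is an
/// assumption; these are commonly cited metric-compatible families.
fn metric_compatible(requested: &str) -> &'static [&'static str] {
    match requested {
        "Calibri" => &["Carlito"],
        "Cambria" => &["Caladea"],
        "Arial" => &["Liberation Sans", "Arimo"],
        "Times New Roman" => &["Liberation Serif", "Tinos"],
        "Courier New" => &["Liberation Mono", "Cousine"],
        _ => &[],
    }
}

/// Pick the requested family if installed, else the first installed
/// substitute, else `default_family`.
pub fn resolve_font(
    requested: &str,
    installed: &HashSet<String>,
    default_family: &str,
) -> String {
    if installed.contains(requested) {
        return requested.to_string();
    }
    for candidate in metric_compatible(requested) {
        if installed.contains(*candidate) {
            return candidate.to_string();
        }
    }
    default_family.to_string()
}
```

On a machine with only Carlito installed, a request for Calibri resolves to Carlito, so line breaks stay close to the original because the glyph advances match.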
Running Tests
cargo test
The test suite includes 184 unit tests, 59 API compatibility tests, and 9 integration tests covering dimensions, geometry, layout, tables, lists, floats, headers/footers, hyperlinks, superscript/subscript, field codes, and end-to-end conversion.
Visual regression tests compare rendered PDFs against Word-generated references using pixel matching (see VISUAL_COMPARISON.md).
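As a rough illustration of pixel matching, the sketch below computes the fraction of pixels that differ between two rasterized pages. It is not the project's comparison tool: the function name, the RGBA8 buffer layout, and the per-channel tolerance are all assumptions made for this example.

```rust
/// Fraction of pixels with any channel differing by more than `tolerance`.
/// Both buffers are assumed to be tightly packed RGBA8 of identical size.
/// Returns `None` when the buffers are empty, mismatched, or not RGBA-aligned.
pub fn mismatch_ratio(a: &[u8], b: &[u8], tolerance: u8) -> Option<f64> {
    if a.is_empty() || a.len() != b.len() || a.len() % 4 != 0 {
        return None;
    }
    let total = a.len() / 4;
    let differing = a
        .chunks_exact(4)
        .zip(b.chunks_exact(4))
        .filter(|(pa, pb)| {
            pa.iter()
                .zip(pb.iter())
                .any(|(x, y)| x.abs_diff(*y) > tolerance)
        })
        .count();
    Some(differing as f64 / total as f64)
}
```

A regression test could then assert that the ratio stays below a chosen threshold for each page; a small tolerance absorbs anti-aliasing differences between renderers.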
OOXML Feature Coverage
Validated against ISO/IEC 29500 (Office Open XML). 35 features fully implemented, 6 partial, 15 not implemented.
Full feature matrix
Text Formatting (w:rPr)
| Feature | Status |
|---|---|
| Bold, italic | ✅ with toggle support |
| Underline | ✅ font-proportional stroke width |
| Font size, family, color | ✅ |
| Superscript/subscript | ✅ |
| Character spacing | ✅ |
| Run shading | ✅ |
| Strikethrough | ❌ |
| Highlighting | ❌ |
| Caps, smallCaps | ❌ |
| Shadow, outline, emboss, imprint | ❌ |
| Hidden text | ❌ |
Paragraph Properties (w:pPr)
| Feature | Status |
|---|---|
| Alignment (left, center, right) | ✅ |
| Alignment (justify) | ⚠️ parsed, renders left-aligned |
| Spacing before/after, line spacing | ✅ auto/exact/atLeast |
| Indentation (left, right, first-line, hanging) | ✅ |
| Tab stops (left) | ✅ |
| Tab stops (center, right, decimal) | ⚠️ parsed, render as left |
| Paragraph shading | ✅ |
| Paragraph borders | ✅ with adjacent border merging, w:space offset |
| Keep with next, widow/orphan control | ❌ |
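The three line spacing rules differ in how the raw w:line value is interpreted. The sketch below follows the OOXML definitions (auto is expressed in 240ths of a line; exact and atLeast are in twips); the enum and function names are hypothetical and not taken from dxpdf.

```rust
/// Line rule from the paragraph spacing properties (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LineRule { Auto, Exact, AtLeast }

/// Resolve a line's height in points.
/// `line` is the raw w:line value; `natural_pt` is the height derived from
/// the fonts on the line.
pub fn line_height_pt(rule: LineRule, line: u32, natural_pt: f32) -> f32 {
    match rule {
        // auto: value is in 240ths of a line, so 240 = single, 480 = double.
        LineRule::Auto => natural_pt * (line as f32 / 240.0),
        // exact: value is in twips (1/20 pt) and is used regardless of content.
        LineRule::Exact => line as f32 / 20.0,
        // atLeast: value is a minimum in twips; taller content still grows the line.
        LineRule::AtLeast => natural_pt.max(line as f32 / 20.0),
    }
}
```

For a line whose natural height is 14 pt, a value of 360 under auto gives 21 pt (1.5 spacing), 240 under exact gives 12 pt even though the text is taller, and 240 under atLeast leaves the height at 14 pt.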
Styles
| Feature | Status |
|---|---|
| Paragraph styles, character styles | ✅ |
| basedOn inheritance | ✅ |
| Document defaults, theme fonts | ✅ |
Tables
| Feature | Status |
|---|---|
| Grid columns, cell widths (dxa) | ✅ |
| Cell widths (pct, auto) | ⚠️ fall back to grid |
| Cell margins (3-level cascade) | ✅ |
| Merged cells (gridSpan, vMerge) | ✅ |
| Row heights | ✅ min / ⚠️ exact treated as min |
| Table borders (per-cell, per-table) | ✅ |
| Border styles (single) | ✅ |
| Border styles (double, dashed, dotted) | ⚠️ render as single |
| Cell shading (solid) | ✅ |
| Cell shading (patterns), vertical alignment | ❌ |
| Nested tables | ✅ |
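A multi-level margin cascade can be expressed as per-side fallback between levels. The sketch below assumes the three levels are cell, table, and table style, in that order of precedence; that ordering and the `Margins` type are assumptions for illustration, not a description of dxpdf's internals.

```rust
/// Per-side margins in twips; `None` means "not specified at this level".
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Margins {
    pub top: Option<i32>,
    pub left: Option<i32>,
    pub bottom: Option<i32>,
    pub right: Option<i32>,
}

impl Margins {
    /// Fill every unspecified side from `fallback`, side by side.
    pub fn or(self, fallback: Margins) -> Margins {
        Margins {
            top: self.top.or(fallback.top),
            left: self.left.or(fallback.left),
            bottom: self.bottom.or(fallback.bottom),
            right: self.right.or(fallback.right),
        }
    }
}

/// Assumed precedence: cell overrides table, table overrides style.
pub fn cascade(cell: Margins, table: Margins, style: Margins) -> Margins {
    cell.or(table).or(style)
}
```

Resolving per side rather than per level matters: a cell that sets only its left margin still inherits top, bottom, and right from the table or style.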
Images
| Feature | Status |
|---|---|
| Inline images | ✅ PNG, JPEG, BMP, WebP |
| Floating images | ✅ offset, align, wp14:pctPos |
| Wrap modes | ✅ none/square/tight/through |
| VML images | ❌ |
Page Layout
| Feature | Status |
|---|---|
| Page size and orientation | ✅ |
| Page margins (all 6) | ✅ |
| Section breaks (nextPage) | ✅ |
| Section breaks (continuous, even, odd) | ⚠️ treated as nextPage |
| Multi-column, page borders, doc grid | ❌ |
Headers/Footers
| Feature | Status |
|---|---|
| Default header/footer | ✅ |
| First page, even/odd, per-section | ❌ |
Lists
| Feature | Status |
|---|---|
| Bullet, decimal, letter, roman | ✅ |
| Multi-level lists | ⚠️ levels parsed, nesting limited |
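List labels combine a per-level counter with a number format. The following is a minimal sketch under stated assumptions: `NumFmt`, `format_number`, and `Counters` are hypothetical names, the letter format repeats the letter after z (aa, bb, ...) as Word does, and deeper levels restart whenever a shallower level advances.

```rust
#[derive(Debug, Clone, Copy)]
pub enum NumFmt { Decimal, LowerLetter, UpperLetter, LowerRoman, UpperRoman }

fn roman(mut n: u32) -> String {
    const TABLE: [(u32, &str); 13] = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
        (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ];
    let mut s = String::new();
    for &(value, symbol) in TABLE.iter() {
        while n >= value {
            s.push_str(symbol);
            n -= value;
        }
    }
    s
}

/// a..z, then the letter is repeated: aa, bb, ..., zz, aaa, ...
fn letter(n: u32) -> String {
    if n == 0 {
        return String::new();
    }
    let ch = (b'a' + ((n - 1) % 26) as u8) as char;
    let reps = ((n - 1) / 26 + 1) as usize;
    std::iter::repeat(ch).take(reps).collect()
}

pub fn format_number(fmt: NumFmt, n: u32) -> String {
    match fmt {
        NumFmt::Decimal => n.to_string(),
        NumFmt::LowerLetter => letter(n),
        NumFmt::UpperLetter => letter(n).to_uppercase(),
        NumFmt::LowerRoman => roman(n).to_lowercase(),
        NumFmt::UpperRoman => roman(n),
    }
}

/// One counter per list level (OOXML defines nine levels, 0 through 8).
pub struct Counters { levels: [u32; 9] }

impl Counters {
    pub fn new() -> Self {
        Counters { levels: [0; 9] }
    }

    /// Advance `level` (0-based, must be below 9) and restart all deeper levels.
    pub fn next(&mut self, level: usize) -> u32 {
        self.levels[level] += 1;
        for deeper in self.levels[level + 1..].iter_mut() {
            *deeper = 0;
        }
        self.levels[level]
    }
}
```

A paragraph at level 1 that follows a new level-0 item therefore starts again at 1, which `format_number` would render as `a`, `i`, or `1` depending on the level's format.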
Fields
| Feature | Status |
|---|---|
| PAGE, NUMPAGES | ✅ |
| Hyperlinks | ✅ clickable PDF annotations |
| Unknown fields | ✅ cached value fallback |
| TOC, MERGEFIELD, DATE | ❌ |
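Field handling as summarized in the table (PAGE and NUMPAGES computed, other fields falling back to a cached value) could be modeled as below. The `Field` enum and both functions are hypothetical; the sketch assumes the field name is the first whitespace-separated token of the instruction and that unsupported fields render their cached result text when one is present.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Field {
    Page,
    NumPages,
    /// Any other instruction, with the result text cached in the DOCX, if any.
    Other { instruction: String, cached: Option<String> },
}

/// Classify a field instruction such as ` PAGE \* MERGEFORMAT `.
pub fn parse_field(instr: &str, cached: Option<&str>) -> Field {
    // The field name comes first; switches such as `\* MERGEFORMAT` follow it.
    let name = instr.split_whitespace().next().map(|s| s.to_ascii_uppercase());
    match name.as_deref() {
        Some("PAGE") => Field::Page,
        Some("NUMPAGES") => Field::NumPages,
        _ => Field::Other {
            instruction: instr.trim().to_string(),
            cached: cached.map(str::to_string),
        },
    }
}

/// Produce the text to render for a field on page `page` of `total`.
pub fn resolve(field: &Field, page: usize, total: usize) -> String {
    match field {
        Field::Page => page.to_string(),
        Field::NumPages => total.to_string(),
        // Fall back to whatever the authoring application last computed.
        Field::Other { cached, .. } => cached.clone().unwrap_or_default(),
    }
}
```

NUMPAGES can only be resolved once pagination is complete, so in a design like this headers and footers would be resolved after the body layout has determined the total page count.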
Other
| Feature | Status |
|---|---|
| Footnotes/endnotes | ❌ warned |
| Comments, tracked changes | ❌ / ⚠️ |
| Text boxes, shapes, SmartArt, charts | ❌ |
| RTL text, automatic hyphenation | ❌ |
Performance
Benchmarked on Apple M3 Max with hyperfine (20 runs, 3 warmup):
| Document | Pages | Mean time | Peak RSS |
|---|---|---|---|
| 2-page form (11 tables, 2 images) | 2 | 48 ms | 20 MB |
| 7-page inspection report | 7 | 52 ms | 24 MB |
| 24-page product sheet (61 images) | 24 | 353 ms | 76 MB |
See BENCHMARKS.md for full history.
Dependencies
| Crate | Purpose |
|---|---|
| quick-xml | Event-driven XML parsing |
| zip | DOCX ZIP archive reading |
| skia-safe | PDF rendering, text measurement, link annotations |
| clap | CLI argument parsing |
| thiserror | Error types |
| log + env_logger | Warnings for unsupported features (RUST_LOG=warn) |
| pyo3 (optional) | Python bindings via maturin |
Contributing
Contributions are welcome. Please open an issue before submitting large PRs.
License
MIT