docwow

Pure Python Word (DOCX) ↔ HTML conversion with guaranteed round-trip fidelity.

docwow converts Word documents to a self-contained HTML representation and back again — without losing a single paragraph indent, table merge, list level, or inline image.

Why docwow?

Existing libraries solve half the problem:

Library | DOCX → HTML | HTML → DOCX | Round-trip
mammoth | good | - | -
python-docx | - | basic | -
docwow | yes | yes | guaranteed

The key insight: docwow embeds every piece of Word metadata into data-dw-* HTML attributes alongside the visual CSS. The browser renders the CSS; when you convert back to DOCX, docwow reads the data attributes and reconstructs the original Word XML exactly.
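
To make the idea concrete, here is a minimal sketch of the reading side, using only the standard library. The attribute names (`data-dw-style-id`, `data-dw-indent-left`) and the sample markup are hypothetical, chosen for illustration; docwow's real attribute names and values may differ.

```python
from html.parser import HTMLParser


class DwAttrCollector(HTMLParser):
    """Collect data-dw-* attributes per element, ignoring the visual CSS."""

    PREFIX = "data-dw-"

    def __init__(self):
        super().__init__()
        self.elements = []  # list of (tag, {metadata_key: value})

    def handle_starttag(self, tag, attrs):
        # Keep only the metadata attributes and strip the shared prefix.
        meta = {
            name[len(self.PREFIX):]: value
            for name, value in attrs
            if name.startswith(self.PREFIX)
        }
        if meta:
            self.elements.append((tag, meta))


# Hypothetical fragment: the style attribute is what a browser renders,
# the data-dw-* attributes are what a converter would read back.
sample = (
    '<p style="margin-left: 36pt" '
    'data-dw-style-id="Heading1" data-dw-indent-left="720">Title</p>'
)

collector = DwAttrCollector()
collector.feed(sample)
print(collector.elements)
# [('p', {'style-id': 'Heading1', 'indent-left': '720'})]
```

The point of the split is that the CSS can be lossy or approximate without harming the return trip, because the converter never has to infer Word properties from styling.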

Install

pip install docwow

Quick Start

import docwow

# DOCX → HTML
html = docwow.to_html("document.docx")

# HTML → DOCX (round-trip)
docwow.to_docx(html, "output.docx")

# Or use the Document object for programmatic editing
doc = docwow.open("document.docx")
para = doc.paragraphs.add_paragraph()
para.runs.add_text("Hello world", bold=True)
doc.to_docx("output.docx")
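
One way to spot-check a round trip yourself is to compare the main document part of the input and output files. The sketch below relies only on the fact that a DOCX file is a ZIP package whose body lives in `word/document.xml`; the helper names are hypothetical and not part of docwow. Byte-for-byte equality is the strictest possible reading of "round-trip", and docwow's guarantee may be defined more loosely (for example, semantically equivalent XML), so treat a mismatch as a prompt to diff, not as proof of a bug.

```python
import zipfile


def main_document_xml(docx):
    """Return the raw bytes of word/document.xml from a DOCX package.

    `docx` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        return package.read("word/document.xml")


def same_main_document(docx_a, docx_b):
    """True if both packages carry byte-identical main document XML."""
    return main_document_xml(docx_a) == main_document_xml(docx_b)
```

After running the Quick Start above, `same_main_document("document.docx", "output.docx")` would compare the original with the round-tripped file.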

Feature Support

✅ Supported

Feature | Notes
Paragraphs | Text, alignment, indentation, spacing, keep-together/with-next, page-break-before
Run formatting | Bold, italic, underline, strikethrough, font name/size, colour, highlight, superscript/subscript
Inline images | PNG, JPEG, GIF, BMP, TIFF, WebP, SVG, EMF, WMF
Tables | Column spans, row spans (vMerge), column/row widths, table-level styles
Lists | Bullet and numbered, up to 9 nesting levels, decimal/lowerLetter/upperLetter/lowerRoman/upperRoman formats
Hyperlinks | External URLs, mailto links
Paragraph styles | Style ID round-trip, Heading 1–9 and custom styles
Page geometry | Page size, margins
Programmatic API | Open, edit, and save documents in pure Python
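
As background for the page geometry row: Word stores page size and margins in twips (twentieths of a point, so 1440 per inch), while CSS works in points, inches, or pixels. The sketch below shows only that unit relationship. The function names are hypothetical, and how docwow itself emits page dimensions is not stated here.

```python
TWIPS_PER_POINT = 20    # a twip is one twentieth of a point
TWIPS_PER_INCH = 1440   # 72 points per inch * 20 twips per point


def twips_to_points(twips):
    """Convert a WordprocessingML length in twips to points."""
    return twips / TWIPS_PER_POINT


def twips_to_inches(twips):
    """Convert a WordprocessingML length in twips to inches."""
    return twips / TWIPS_PER_INCH


def page_size_css(width_twips, height_twips):
    """Format a page size as a CSS 'width height' value in points."""
    width = twips_to_points(width_twips)
    height = twips_to_points(height_twips)
    return f"{width:g}pt {height:g}pt"


# US Letter is 12240 x 15840 twips, i.e. 8.5 x 11 inches.
print(page_size_css(12240, 15840))  # 612pt 792pt
```

Keeping the original twip value alongside the CSS value avoids rounding drift when a length does not divide evenly into the CSS unit.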

🚧 In Progress

Nothing currently — check back soon.

🗓 Planned

Feature | Notes
Headers & footers | Including page numbers
Table of contents | Requires bookmark support
Bookmarks | In-document anchor links and TOC targets
Comments | Annotations / review marks
Track changes | Accept/reject revision marks
Footnotes & endnotes | -
General HTML → DOCX | Best-effort conversion of arbitrary HTML (not just docwow HTML)

Documentation

Full documentation at docwow.readthedocs.io.

Requirements

  • Python 3.10+
  • lxml
  • Pillow

Built with Claude Code

This library was vibe coded using Claude Code. Community suggestions, bug reports, and PRs are very welcome.

License

MIT
