
Pure Python Word (DOCX) ↔ HTML conversion with guaranteed round-trip fidelity


docwow


docwow converts Word documents to a self-contained HTML representation and back again — without losing a single paragraph indent, table merge, list level, or inline image.

Why docwow?

Existing libraries solve half the problem:

Library       DOCX → HTML   HTML → DOCX   Round-trip
mammoth       good          no            no
python-docx   no            basic         no
docwow        yes           yes           guaranteed

The key insight: docwow embeds every piece of Word metadata into data-dw-* HTML attributes alongside the visual CSS. The browser renders the CSS; when you convert back to DOCX, docwow reads the data attributes and reconstructs the original Word XML exactly.
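The README does not document which attribute names docwow actually emits, so the following is only a toy illustration of the general pattern, not docwow's implementation. Each paragraph carries a visual `style` for the browser plus `data-dw-*` attributes that hold the raw properties, and the reverse direction reads only the data attributes. The property names (`indent_left_twips`, `style_id`) and the functions `encode_paragraph` and `decode_paragraphs` are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

PREFIX = "data-dw-"  # attribute prefix named in the README


def encode_paragraph(text, props):
    """Render one paragraph as <p> with visual CSS plus data-dw-* metadata.

    `props` maps hypothetical property names (snake_case, no hyphens) to values.
    Only `indent_left_twips` gets a visual CSS rule in this toy; every property
    is also stored verbatim in a data-dw-* attribute so nothing is lost.
    """
    attrs = []
    if "indent_left_twips" in props:
        # Word measures indents in twips; 20 twips = 1 pt.
        points = int(props["indent_left_twips"]) / 20
        attrs.append(f'style="margin-left: {points:g}pt"')
    for key, value in sorted(props.items()):
        name = PREFIX + key.replace("_", "-")
        attrs.append(f'{name}="{escape(str(value))}"')
    attr_text = "".join(" " + attr for attr in attrs)
    return f"<p{attr_text}>{escape(text)}</p>"


class DataDwExtractor(HTMLParser):
    """Collect the data-dw-* attributes of each <p>, ignoring the CSS entirely."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag != "p":
            return
        # HTMLParser has already unescaped the attribute values.
        props = {
            name[len(PREFIX):].replace("-", "_"): value
            for name, value in attrs
            if name.startswith(PREFIX)
        }
        self.paragraphs.append(props)


def decode_paragraphs(html_text):
    """Return one property dict per <p>, rebuilt only from data-dw-* attributes."""
    parser = DataDwExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.paragraphs
```

For example, with `props = {"indent_left_twips": "720", "style_id": "Heading1"}`, `encode_paragraph("Hello", props)` produces a `<p>` whose `style` indents it by 36pt, and `decode_paragraphs` on that output returns `[props]` unchanged. Values come back as strings, so a real converter would also need typed parsing per property.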

Install

pip install docwow

Quick Start

import docwow

# DOCX → HTML
html = docwow.to_html("document.docx")

# HTML → DOCX (round-trip)
docwow.to_docx(html, "output.docx")

# Or use the Document object
doc = docwow.open("document.docx")
html = doc.to_html()
doc.to_docx("output.docx")
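One way to spot-check the round-trip claim on your own files is to compare XML parts of the input and output packages after running the Quick Start. A DOCX file is a ZIP package, and its main body conventionally lives at `word/document.xml`. The helpers below (`canonical_part`, `parts_match`) are hypothetical and use only the standard library, not docwow. They canonicalize each part with C14N 2.0, so attribute order and empty-element syntax do not cause false mismatches.

```python
import xml.etree.ElementTree as ET
import zipfile


def canonical_part(docx_path, part="word/document.xml"):
    """Return the C14N 2.0 canonical form of one XML part inside a DOCX package."""
    with zipfile.ZipFile(docx_path) as package:
        data = package.read(part)  # raw bytes, XML declaration included
    # Canonicalization sorts attributes and writes <a/> as <a></a>, so only
    # real content differences make two parts compare unequal.
    return ET.canonicalize(xml_data=data)


def parts_match(original_path, roundtrip_path, parts=("word/document.xml",)):
    """Map each part name to True if its canonical XML is identical in both files."""
    return {
        part: canonical_part(original_path, part) == canonical_part(roundtrip_path, part)
        for part in parts
    }
```

For example, `parts_match("document.docx", "output.docx")` reports whether the main document part survived unchanged. Pass more names (such as `"word/styles.xml"`) to widen the check. This is stricter than visual equivalence, so any `False` marks a part worth diffing by hand rather than proof of a defect.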

Documentation

Full documentation at docwow.readthedocs.io.

Requirements

  • Python 3.10+
  • lxml
  • Pillow
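If you want to confirm an environment meets these requirements before importing docwow (for instance after installing with dependency resolution turned off), a minimal sketch follows. The function `missing_requirements` is hypothetical. Note that Pillow is imported under the name `PIL`, so the check looks up import names rather than distribution names.

```python
import sys
from importlib.util import find_spec

# Distribution name -> import name (Pillow installs the `PIL` package).
REQUIRED_MODULES = {"lxml": "lxml", "Pillow": "PIL"}
MIN_PYTHON = (3, 10)


def missing_requirements(min_python=MIN_PYTHON, modules=REQUIRED_MODULES):
    """Return descriptions of unmet requirements; an empty list means all are met."""
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        problems.append(
            f"Python {'.'.join(map(str, min_python))}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    for dist_name, import_name in modules.items():
        # find_spec returns None when a top-level module cannot be located.
        if find_spec(import_name) is None:
            problems.append(f"{dist_name} (import name '{import_name}') is not installed")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print("missing:", problem)
```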

Built with Claude Code

This library was vibe coded using Claude Code. Community suggestions, bug reports, and PRs are very welcome.

License

MIT



Download files


Source Distribution

docwow-0.2.0.tar.gz (47.8 kB)

Built Distribution


docwow-0.2.0-py3-none-any.whl (60.0 kB)

File details

Details for the file docwow-0.2.0.tar.gz.

File metadata

  • Download URL: docwow-0.2.0.tar.gz
  • Size: 47.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.13

File hashes

Hashes for docwow-0.2.0.tar.gz
Algorithm     Hash digest
SHA256        bce2ced91c32e6e146b3b0ecedb9f947c80f19751b4799308a3207a57f95918c
MD5           ea273f1fd5ad25672e948f9c3d59a893
BLAKE2b-256   fa4d2fe3fbc1cdc297389e39a3b8661faaad05104ae56e667a1eb5c27b90ac93


File details

Details for the file docwow-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: docwow-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 60.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.13

File hashes

Hashes for docwow-0.2.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        3d92a21972195c06ede552fd756664b0e456e8d692329a34eed4ea3e09b8427f
MD5           a495c25b6a39e365fa733dbf9c22a4d0
BLAKE2b-256   fe2effd7d132ad556884f71a35f2c0ae0e5b6b5359776e2eeb5d2a9cf85c330a
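To check a downloaded file against the SHA256 digests listed for docwow 0.2.0, you can hash it locally with the standard library. The helper names `sha256_of` and `matches_published` are hypothetical; the digests in `PUBLISHED` are copied from the hash tables for the two distribution files.

```python
import hashlib

# SHA256 digests published for the docwow 0.2.0 distribution files.
PUBLISHED = {
    "docwow-0.2.0.tar.gz": "bce2ced91c32e6e146b3b0ecedb9f947c80f19751b4799308a3207a57f95918c",
    "docwow-0.2.0-py3-none-any.whl": "3d92a21972195c06ede552fd756664b0e456e8d692329a34eed4ea3e09b8427f",
}


def sha256_of(path, chunk_size=1 << 16):
    """Stream a file through SHA-256 so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's SHA-256 with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published("docwow-0.2.0.tar.gz", PUBLISHED["docwow-0.2.0.tar.gz"])` should return `True` for an untampered download in the current directory.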

