A package to convert DOCX to HTML and HTML to DOCX with formatting preservation.

DOCX-HTML Converter

This package provides tools to convert DOCX documents to HTML and HTML back to DOCX while preserving formatting such as tables, lists, and paragraphs. It also supports in-memory processing: conversions can take binary input and produce binary output via BytesIO objects.

Features

  • Convert DOCX to HTML with support for paragraphs, lists, tables, and inline formatting.
  • Convert HTML to DOCX with support for lists, tables, inline styles (bold, italic), and more.
  • New support for in-memory conversions using binary inputs/outputs (BytesIO).
  • Preserve complex formatting such as text alignment, indentation, and font styles during conversion.

Installation

Install the package from PyPI with pip:

pip install docxhtml-converter
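
After installing, a quick way to confirm the package is visible to the current interpreter is to look up its import name. This is a minimal sketch using only the standard library; the import name docxhtml_converter is taken from the usage examples in this README, and the helper is_installed is a hypothetical name introduced here.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Import name as used in the examples of this README.
print(is_installed("docxhtml_converter"))
```

If this prints False, the package was probably installed into a different environment than the one running the script.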

Usage

Convert DOCX to HTML

Use the htmlifier function to convert a DOCX file into HTML:

from docxhtml_converter.docxhtml import htmlifier

docx_path = "document.docx"
output_html = "output.html"
htmlifier(docx_path, output_html)
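
To convert a whole folder, the same two-argument call can be wrapped in a loop. The sketch below is not part of the package: convert_folder is a hypothetical helper, and it takes the converter as a parameter, assuming only that it is called as convert(input_path, output_path) like htmlifier in the example above.

```python
from pathlib import Path
from typing import Callable, List


def convert_folder(src_dir, dst_dir, convert: Callable[[str, str], object]) -> List[Path]:
    """Run convert(input_path, output_path) for every .docx file in src_dir."""
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    written = []
    for docx_file in sorted(src.glob("*.docx")):
        # Keep the file name, swap the extension: report.docx -> report.html
        target = dst / (docx_file.stem + ".html")
        convert(str(docx_file), str(target))
        written.append(target)
    return written
```

With the package installed, calling convert_folder("docs", "html_out", htmlifier) would produce one HTML file per DOCX file in docs. Passing the converter in keeps the helper independent of the package, so it can be exercised with a stub function.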

Convert HTML to DOCX

Use the docxifier function to convert an HTML file back into DOCX:

from docxhtml_converter.htmldocx import docxifier

input_html = "output.html"
output_docx = "regenerated.docx"
docxifier(input_html, output_docx)

Convert DOCX (Binary) to HTML String

For in-memory conversions, use the get_html_from_docx_binary function to convert binary DOCX data (for example, the bytes read from a file or held in a BytesIO object) into an HTML string:

from docxhtml_converter.docxhtml import get_html_from_docx_binary

with open("document.docx", "rb") as f:
    docx_binary = f.read()

html_content = get_html_from_docx_binary(docx_binary)
print(html_content)
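
The example above passes raw bytes. When the document arrives as a stream instead (an upload, a BytesIO buffer), it can be normalized to bytes first. This is a sketch under that assumption; to_bytes is a hypothetical helper, and whether get_html_from_docx_binary also accepts a stream directly is not shown in this README.

```python
import io
from typing import BinaryIO, Union


def to_bytes(source: Union[bytes, bytearray, BinaryIO]) -> bytes:
    """Return the full binary content of source as a bytes object."""
    if isinstance(source, (bytes, bytearray)):
        return bytes(source)
    if isinstance(source, io.BytesIO):
        # getvalue() returns the whole buffer regardless of the read position.
        return source.getvalue()
    # Generic binary file object: rewind when possible, then read everything.
    if source.seekable():
        source.seek(0)
    return source.read()
```

The result can then be passed on in the same way as docx_binary in the example, for instance get_html_from_docx_binary(to_bytes(uploaded_stream)), where uploaded_stream stands for any binary stream.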

Convert HTML String to DOCX (Binary Output)

To convert an HTML string directly to a DOCX binary (useful for working with in-memory files), use the docxifier_from_html_string function:

from docxhtml_converter.htmldocx import docxifier_from_html_string

html_string = "<html><body><p>Hello, World!</p></body></html>"
docx_binary = docxifier_from_html_string(html_string)

# Save to a file; the returned object is read like a file (for example, a BytesIO)
with open("output.docx", "wb") as f:
    f.write(docx_binary.read())
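
One detail worth knowing when saving a BytesIO result: read() returns data from the current position onward, so a buffer whose position is at the end yields nothing, while getvalue() always returns the full content. The sketch below uses only the standard library, with a stand-in buffer rather than real converter output; save_stream is a hypothetical helper name.

```python
import io
from pathlib import Path


def save_stream(stream: io.BytesIO, path) -> int:
    """Write the full buffer to path and return the number of bytes written."""
    data = stream.getvalue()  # independent of the current read position
    Path(path).write_bytes(data)
    return len(data)


# Stand-in buffer to show the difference between read() and getvalue().
buffer = io.BytesIO()
buffer.write(b"demo")
print(buffer.read())      # b'' because the position is at the end after write()
print(buffer.getvalue())  # b'demo'
```

Using getvalue(), or calling seek(0) before read(), avoids writing an empty file if the buffer has already been read once.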

Example Script

Here’s an example script demonstrating both file-based and in-memory conversions:

from docxhtml_converter.docxhtml import htmlifier, get_html_from_docx_binary
from docxhtml_converter.htmldocx import docxifier, docxifier_from_html_string

# Convert DOCX to HTML file
docx_path = "document.docx"
output_html = "output.html"
htmlifier(docx_path, output_html)

# Convert HTML file back to DOCX
input_html = "output.html"
output_docx = "regenerated.docx"
docxifier(input_html, output_docx)

# Convert DOCX binary to HTML string
with open(docx_path, "rb") as f:
    docx_binary = f.read()

html_string = get_html_from_docx_binary(docx_binary)
print(html_string)

# Convert HTML string to DOCX binary and save to file
docx_binary_output = docxifier_from_html_string(html_string)
with open("new_output.docx", "wb") as f:
    f.write(docx_binary_output.read())
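
When working with in-memory data, it can help to sanity-check that a binary blob is plausibly a DOCX file before or after conversion. A DOCX file is a ZIP archive, and the main document part is conventionally stored as word/document.xml. The sketch below relies on that convention and only the standard library; looks_like_docx is a hypothetical helper, not part of this package, and it does not validate the document's content.

```python
import io
import zipfile


def looks_like_docx(data: bytes) -> bool:
    """Return True if data is a ZIP archive containing word/document.xml."""
    stream = io.BytesIO(data)
    if not zipfile.is_zipfile(stream):
        return False
    stream.seek(0)
    with zipfile.ZipFile(stream) as archive:
        return "word/document.xml" in archive.namelist()
```

In the script above, a check such as looks_like_docx(docx_binary) could run before the bytes are handed to get_html_from_docx_binary.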

Download files

Source Distribution

docxhtml-converter-0.1.2.tar.gz (7.3 kB)

Uploaded Source

Built Distribution

docxhtml_converter-0.1.2-py3-none-any.whl (8.0 kB)

Uploaded Python 3

File details

Details for the file docxhtml-converter-0.1.2.tar.gz.

File metadata

  • Download URL: docxhtml-converter-0.1.2.tar.gz
  • Upload date:
  • Size: 7.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.11

File hashes

Hashes for docxhtml-converter-0.1.2.tar.gz

Algorithm    Hash digest
SHA256       3057dac45e00ed62960f830347285c3cc430639c8bbb0c647682a4c99843dfcd
MD5          c9902fcf9323aefe9e2052cd5ebce3a0
BLAKE2b-256  d5a815b87e7e271c98e06ea8a0473bdb1770746120020aaa65cc13b2c556e830

File details

Details for the file docxhtml_converter-0.1.2-py3-none-any.whl.

File metadata

File hashes

Hashes for docxhtml_converter-0.1.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       53e01afc719c02fd549e56ed34a774eb128d595662028235cc3269889b478db0
MD5          e34957c56628632bb9f1f91af5381aba
BLAKE2b-256  0bc01f2447df6162c2c3248b1b0a16d6b06e859ab93b97845af1e322dc868bb2
