A package to convert DOCX to HTML and HTML to DOCX with formatting preservation.
DOCX-HTML Converter
This package provides tools to convert DOCX documents to HTML and HTML back to DOCX while preserving formatting such as tables, lists, and paragraphs. It now also supports conversions from binary input and to binary output via BytesIO objects, enabling in-memory processing.
Features
- Convert DOCX to HTML with support for paragraphs, lists, tables, and inline formatting.
- Convert HTML to DOCX with support for lists, tables, inline styles (bold, italic), and more.
- New support for in-memory conversions using binary inputs/outputs (BytesIO).
- Preserve complex formatting such as text alignment, indentation, and font styles during conversion.
Installation
Install the package from PyPI with pip:
pip install docxhtml-converter
Usage
Convert DOCX to HTML
Use the htmlifier function to convert a DOCX file into HTML:
from docxhtml_converter.docxhtml import htmlifier
docx_path = "document.docx"
output_html = "output.html"
htmlifier(docx_path, output_html)
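When several documents need converting, the single-file call above can be wrapped in a small loop. The following is a minimal sketch, not part of the package: convert_folder is a hypothetical helper, and it assumes only that the converter is called as convert(docx_path, output_html), the same shape as the htmlifier call shown above.

```python
from pathlib import Path
from typing import Callable, List


def convert_folder(
    src_dir: str,
    out_dir: str,
    convert: Callable[[str, str], object],
) -> List[Path]:
    """Call convert(docx_path, html_path) for every .docx file in src_dir.

    `convert` is any callable with the (input path, output path) shape
    used by htmlifier above; nothing else about it is assumed.
    """
    src = Path(src_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written: List[Path] = []
    for docx_path in sorted(src.glob("*.docx")):
        # Word leaves "~$name.docx" lock files beside open documents; skip them.
        if docx_path.name.startswith("~$"):
            continue
        html_path = out / (docx_path.stem + ".html")
        convert(str(docx_path), str(html_path))
        written.append(html_path)
    return written
```

With the import shown above, a call such as convert_folder("docs", "html", htmlifier) would write one .html file per .docx file. Passing the converter as an argument keeps the helper independent of the package, so it can be exercised with a stand-in function.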
Convert HTML to DOCX
Use the docxifier function to convert an HTML file back into DOCX:
from docxhtml_converter.htmldocx import docxifier
input_html = "output.html"
output_docx = "regenerated.docx"
docxifier(input_html, output_docx)
Convert DOCX (Binary) to HTML String
For in-memory conversions, use the get_html_from_docx_binary function to convert a binary DOCX object (like a BytesIO object) into an HTML string:
from docxhtml_converter.docxhtml import get_html_from_docx_binary
with open("document.docx", "rb") as f:
    docx_binary = f.read()
html_content = get_html_from_docx_binary(docx_binary)
print(html_content)
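In-memory input can arrive in different shapes: a path, raw bytes, or a stream such as BytesIO. The example above passes the bytes returned by f.read(); whether get_html_from_docx_binary also accepts a stream directly is not confirmed here. The sketch below uses only the standard library, and read_docx_bytes is a hypothetical helper that normalizes any of these inputs to bytes first.

```python
import io
import os
from typing import BinaryIO, Union


def read_docx_bytes(
    source: Union[str, os.PathLike, bytes, bytearray, BinaryIO],
) -> bytes:
    """Return raw DOCX bytes from a path, a bytes object, or a binary stream."""
    if isinstance(source, (bytes, bytearray)):
        return bytes(source)
    if isinstance(source, (str, os.PathLike)):
        with open(source, "rb") as f:
            return f.read()
    # File-like object (for example io.BytesIO): rewind when the stream
    # supports it, so that reading starts at the first byte.
    if hasattr(source, "seekable") and source.seekable():
        source.seek(0)
    return source.read()
```

The result can then be passed on as in the example above, for instance get_html_from_docx_binary(read_docx_bytes(source)). If a stream turns out to be required instead, wrapping the bytes as io.BytesIO(data) gives one.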
Convert HTML String to DOCX (Binary Output)
To convert an HTML string directly to a DOCX binary (useful for working with in-memory files), use the docxifier_from_html_string function:
from docxhtml_converter.htmldocx import docxifier_from_html_string
html_string = "<html><body><p>Hello, World!</p></body></html>"
docx_binary = docxifier_from_html_string(html_string)
# Save to a file
with open("output.docx", "wb") as f:
    f.write(docx_binary.read())
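The call to docx_binary.read() above implies that the function returns a file-like object such as BytesIO. A stream's read() starts at its current position, so a stream that was written to and not rewound yields an empty result. Whether docxifier_from_html_string rewinds the stream before returning it is not stated here, so the sketch below, with a hypothetical save_stream helper, rewinds explicitly before saving.

```python
import io
from typing import BinaryIO


def save_stream(stream: BinaryIO, path: str) -> int:
    """Write the full content of a seekable binary stream to path.

    Returns the number of bytes written.
    """
    stream.seek(0)  # read() starts at the current position, so rewind first
    data = stream.read()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

Used in place of the last two lines above, this would read save_stream(docx_binary, "output.docx"). A return value of 0 signals that the stream was empty.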
Example Script
Here’s an example script demonstrating both file-based and in-memory conversions:
from docxhtml_converter.docxhtml import htmlifier, get_html_from_docx_binary
from docxhtml_converter.htmldocx import docxifier, docxifier_from_html_string
# Convert DOCX to HTML file
docx_path = "document.docx"
output_html = "output.html"
htmlifier(docx_path, output_html)
# Convert HTML file back to DOCX
input_html = "output.html"
output_docx = "regenerated.docx"
docxifier(input_html, output_docx)
# Convert DOCX binary to HTML string
with open(docx_path, "rb") as f:
    docx_binary = f.read()
html_string = get_html_from_docx_binary(docx_binary)
print(html_string)
# Convert HTML string to DOCX binary and save to file
docx_binary_output = docxifier_from_html_string(html_string)
with open("new_output.docx", "wb") as f:
    f.write(docx_binary_output.read())
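After a round trip like the one above, it can be useful to check that the output is at least a structurally plausible DOCX before handing it on. A DOCX file is a ZIP archive, and the main document body conventionally lives at word/document.xml. The sketch below uses only the standard library; looks_like_docx is a hypothetical helper, and it checks the container only, not whether formatting survived.

```python
import io
import zipfile

# Parts expected in a typical DOCX package. The document.xml location is the
# conventional one, not a guarantee for every producer.
REQUIRED_PARTS = ("[Content_Types].xml", "word/document.xml")


def looks_like_docx(data: bytes) -> bool:
    """Return True if data is a ZIP archive containing the usual DOCX parts."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        names = set(archive.namelist())
    return all(part in names for part in REQUIRED_PARTS)
```

For example, reading new_output.docx back in binary mode and passing the bytes to looks_like_docx gives a quick sanity check in a script or test.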
File details
Details for the file docxhtml-converter-0.1.2.tar.gz.
File metadata
- Download URL: docxhtml-converter-0.1.2.tar.gz
- Upload date:
- Size: 7.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3057dac45e00ed62960f830347285c3cc430639c8bbb0c647682a4c99843dfcd
MD5 | c9902fcf9323aefe9e2052cd5ebce3a0
BLAKE2b-256 | d5a815b87e7e271c98e06ea8a0473bdb1770746120020aaa65cc13b2c556e830
File details
Details for the file docxhtml_converter-0.1.2-py3-none-any.whl.
File metadata
- Download URL: docxhtml_converter-0.1.2-py3-none-any.whl
- Upload date:
- Size: 8.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 53e01afc719c02fd549e56ed34a774eb128d595662028235cc3269889b478db0
MD5 | e34957c56628632bb9f1f91af5381aba
BLAKE2b-256 | 0bc01f2447df6162c2c3248b1b0a16d6b06e859ab93b97845af1e322dc868bb2