Convert HTML to DOCX quickly and easily

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

dfop02

These details have not been verified by PyPI

Project description

HTML FOR DOCX

Convert HTML to DOCX. This project is a fork of the discontinued pqzx/html2docx.

How to install

pip install html-for-docx

Usage

The basic usage

Add HTML-formatted content to an existing .docx document

from html4docx import HtmlToDocx

parser = HtmlToDocx()
html_string = '<h1>Hello world</h1>'
parser.add_html_to_document(html_string, filename_docx)

You can also use python-docx to manipulate the document directly. Here is an example:

from docx import Document
from html4docx import HtmlToDocx

document = Document()
parser = HtmlToDocx()

html_string = '<h1>Hello world</h1>'
parser.add_html_to_document(html_string, document)

document.save('your_file_name.docx')

Or add new HTML to the document incrementally and save it when finished. New content is always appended at the end:

from docx import Document
from html4docx import HtmlToDocx

document = Document()
parser = HtmlToDocx()

for part in ['First', 'Second', 'Third']:
    parser.add_html_to_document(f'<h1>{part} Part</h1>', document)

parser.save('your_file_name.docx')

When you pass a Document object, you can use either document.save() from python-docx or parser.save() from html4docx; both work.

Both also support saving in memory, using BytesIO.

from io import BytesIO
from docx import Document
from html4docx import HtmlToDocx

buffer = BytesIO()
document = Document()
parser = HtmlToDocx()

html_string = '<h1>Hello world</h1>'
parser.add_html_to_document(html_string, document)

# Save the document to the in-memory buffer
parser.save(buffer)

# If you need to read from the buffer again after saving,
# you might need to reset its position to the beginning
buffer.seek(0)

Convert files directly

from html4docx import HtmlToDocx

parser = HtmlToDocx()
parser.parse_html_file(input_html_file_path, output_docx_file_path)
# You can also specify an encoding; the default is utf-8
parser.parse_html_file(input_html_file_path, output_docx_file_path, 'utf-8')
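To convert a whole folder, the call above can be wrapped in a small loop. The following is a minimal sketch, not part of html4docx: `convert_tree` is a hypothetical helper, and it takes the conversion function as an argument so that it does not depend on any particular parser. Passing the output path with a `.docx` suffix is an assumption; check how `parse_html_file` treats the output name in your installed version.

```python
from pathlib import Path
from typing import Callable, List


def convert_tree(src_dir: str, dst_dir: str,
                 convert: Callable[[str, str], None]) -> List[Path]:
    """Call convert(input_path, output_path) for every .html file in src_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    outputs = []
    for html_path in sorted(src.glob("*.html")):
        # Assumption: the output keeps the input's stem and gets a .docx suffix.
        out_path = dst / html_path.with_suffix(".docx").name
        convert(str(html_path), str(out_path))
        outputs.append(out_path)
    return outputs
```

With the parser created above, `convert_tree('pages', 'out', parser.parse_html_file)` would be one way to call it; the directory names are placeholders.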

Convert files from a string

from html4docx import HtmlToDocx

parser = HtmlToDocx()
docx = parser.parse_html_string(input_html_file_string)

Change table styles

Tables are not styled by default. Set the table_style attribute on the parser before converting the HTML. The style is applied to all tables.

from html4docx import HtmlToDocx

parser = HtmlToDocx()
parser.table_style = 'Light Shading Accent 4'
docx = parser.parse_html_string(input_html_file_string)

To add borders to tables, use the Table Grid style:

parser.table_style = 'Table Grid'

All table styles we support can be found here.

Options

There are 7 options you can use to customize the conversion:

Disable Images: Ignore all images.
Disable Tables: Render only the raw content of tables.
Disable Styles: Ignore all CSS styles. Also disables Style-Map.
Disable Fix-HTML: Fix-HTML uses BeautifulSoup to fix possibly missing HTML tags; disabling it skips that step.
Disable Style-Map: Ignore the mapping of CSS classes to Word styles.
Disable Tag-Override: Ignore HTML tag overrides.
Disable HTML-Comments: Ignore all "<!-- ... -->" comments from the HTML.

This is how you could disable them if you want:

from html4docx import HtmlToDocx

parser = HtmlToDocx()
parser.options['images'] = False # Default True
parser.options['tables'] = False # Default True
parser.options['styles'] = False # Default True
parser.options['fix-html'] = False # Default True
parser.options['html-comments'] = False # Default False
parser.options['style-map'] = False # Default True
parser.options['tag-override'] = False # Default True
docx = parser.parse_html_string(input_html_file_string)

Extended Styling Features

CSS Class to Word Style Mapping

Map HTML CSS classes to Word document styles:

from html4docx import HtmlToDocx

style_map = {
    'code-block': 'Code Block',
    'numbered-heading-1': 'Heading 1 Numbered',
    'finding-critical': 'Finding Critical'
}

parser = HtmlToDocx(style_map=style_map)
parser.add_html_to_document(html, document)

Tag Style Overrides

Override default tag-to-style mappings:

tag_overrides = {
    'h1': 'Custom Heading 1',  # All <h1> use this style
    'pre': 'Code Block'        # All <pre> use this style
}

parser = HtmlToDocx(tag_style_overrides=tag_overrides)

Custom styles from a Word template: Use a document created from a .docx that already defines the styles (e.g. "Code Block", "Custom Markdown"). Pass that same document to the parser and save it so the custom styles are preserved:

from docx import Document
from html4docx import HtmlToDocx

doc = Document("path/to/template.docx")  # template has Code Block, Custom Markdown, etc.
parser = HtmlToDocx(tag_style_overrides={"code": "Custom Markdown", "pre": "Code Block"})
parser.add_html_to_document(html, doc)
doc.save("output.docx")  # save the template-based doc so custom styles are preserved

If you save a different document (for example, by creating a new Document() instead of loading your template), the output file will not contain the template’s custom styles.

If a referenced custom style does not exist in the document at generation time, a warning will be logged to help you detect the missing style.
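Rather than waiting for the warning, you can compare the style names you reference against the styles the template defines before converting. The sketch below is an illustration only: `missing_styles` is a hypothetical helper, not an html4docx function, and it works on plain lists of names.

```python
from typing import Dict, Iterable, Optional, Set


def missing_styles(available: Iterable[str],
                   style_map: Optional[Dict[str, str]] = None,
                   tag_overrides: Optional[Dict[str, str]] = None) -> Set[str]:
    """Return the referenced Word style names that are not in `available`."""
    referenced = set((style_map or {}).values()) | set((tag_overrides or {}).values())
    return referenced - set(available)


# Style names as they might appear in a template (made-up sample data).
template_styles = ["Normal", "Heading 1", "Code Block"]

print(missing_styles(
    template_styles,
    style_map={"code-block": "Code Block", "finding-critical": "Finding Critical"},
    tag_overrides={"pre": "Code Block"},
))
# prints {'Finding Critical'}
```

With python-docx, the list of available names can be collected from a loaded document with `[style.name for style in doc.styles]`.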

Default Paragraph Style

Set custom default paragraph style:

# Use 'Body' as default (default behavior)
parser = HtmlToDocx(default_paragraph_style='Body')

# Use Word's default 'Normal' style
parser = HtmlToDocx(default_paragraph_style=None)

Inline CSS Styles

Full support for inline CSS styles on any element:

<p style="color: red; font-size: 14pt">Red 14pt paragraph</p>
<span style="font-weight: bold; color: blue">Bold blue text</span>

Supported CSS properties:

color
font-size
font-weight (bold)
font-style (italic)
text-decoration (underline, line-through)
font-family
text-align
background-color
border (for tables)
vertical-align (for tables)

!important Flag Support

Proper CSS precedence with !important:

<span style="color: gray">
  Gray text with <span style="color: red !important">red important</span>.
</span>

The !important flag ensures highest priority.
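To make the inline-style handling concrete, here is a minimal sketch of how a style attribute can be split into declarations while keeping track of the !important flag. It is an illustration under simplifying assumptions, not the parser html4docx uses: `parse_inline_style` is a hypothetical name, and values that contain a semicolon (for example some data URLs) are not handled.

```python
from typing import Dict, Tuple


def parse_inline_style(style: str) -> Dict[str, Tuple[str, bool]]:
    """Map each CSS property to a (value, is_important) pair."""
    result: Dict[str, Tuple[str, bool]] = {}
    for declaration in style.split(";"):
        # Split on the first colon only, so values such as URLs keep theirs.
        name, sep, value = declaration.partition(":")
        name, value = name.strip().lower(), value.strip()
        if not sep or not name or not value:
            continue  # skip empty or malformed declarations
        important = value.lower().endswith("!important")
        if important:
            value = value[: -len("!important")].strip()
        result[name] = (value, important)
    return result


print(parse_inline_style("color: red !important; font-size: 14pt"))
# prints {'color': ('red', True), 'font-size': ('14pt', False)}
```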

Style Precedence Order

Styles are applied in this order (lowest to highest priority):

Base HTML tag styles (<b>, <em>, <code>)
Parent span styles
CSS class-based styles (from style_map)
Inline CSS styles (from style attribute)
!important inline CSS styles (highest priority)
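The order above can be modelled as five layers merged from lowest to highest priority, where a later layer overwrites an earlier one. This is a sketch of the listed rule only; `resolve_styles` and its argument names are hypothetical, and the library's internal implementation may differ.

```python
from typing import Dict, List, Tuple

Declaration = Tuple[str, str, bool]  # (property, value, is_important)


def resolve_styles(tag_styles: Dict[str, str],
                   parent_styles: Dict[str, str],
                   class_styles: Dict[str, str],
                   inline: List[Declaration]) -> Dict[str, str]:
    """Merge the five layers from the list above, lowest priority first."""
    inline_normal = {prop: value for prop, value, important in inline if not important}
    inline_important = {prop: value for prop, value, important in inline if important}

    resolved: Dict[str, str] = {}
    for layer in (tag_styles, parent_styles, class_styles,
                  inline_normal, inline_important):
        resolved.update(layer)  # later layers overwrite earlier ones
    return resolved


print(resolve_styles(
    tag_styles={"font-weight": "bold"},
    parent_styles={"color": "gray"},
    class_styles={"color": "green", "font-size": "12pt"},
    inline=[("color", "red", True), ("color", "blue", False), ("font-size", "14pt", False)],
))
# prints {'font-weight': 'bold', 'color': 'red', 'font-size': '14pt'}
```

Here the inline `color: blue` beats the class and parent colours, but the `!important` declaration for red wins over it.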

Metadata

You're able to read or set docx metadata:

from docx import Document
from html4docx import HtmlToDocx

document = Document()
parser = HtmlToDocx()
parser.set_initial_attrs(document)
metadata = parser.metadata

# You can get metadata as a dict
metadata_json = metadata.get_metadata()
print(metadata_json['author'])  # the document's current author
# or print all metadata if you want
metadata.get_metadata(print_result=True)

# Set new metadata
metadata.set_metadata(author="Jane", created="2025-07-18T09:30:00")
document.save('your_file_name.docx')

You can find all available metadata attributes here.

Logging

html4docx uses Python's standard logging module with named, hierarchical loggers (for example, html4docx.h4d). The library never logs directly to the root logger and installs a NullHandler by default, so it remains silent unless your application configures logging.

Silence all html4docx logs:

import logging

logging.getLogger("html4docx").setLevel(logging.ERROR)  # suppresses WARNING and below

Enable debug logging:

import logging

logging.basicConfig(level=logging.DEBUG)
logging.getLogger("html4docx").setLevel(logging.DEBUG)

Django / framework LOGGING dict — add an entry for the html4docx parent and it applies to all sub-loggers:

LOGGING = {
    "version": 1,
    "loggers": {
        "html4docx": {
            "level": "ERROR",   # suppresses WARNING and below
            "propagate": False,
        },
    },
}

Note: Unrecognised CSS properties (e.g. letter-spacing, margin, padding) are intentionally logged at DEBUG level because they are expected skips for any real-world HTML, not problems. Only genuinely unexpected situations (missing styles, unsupported colour formats, etc.) are logged at WARNING.
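The effect of setting a level on the parent logger can be checked with the standard library alone. The sketch below does not import html4docx; it only creates loggers with the same names, using the sub-logger name html4docx.h4d mentioned in this section.

```python
import logging

parent = logging.getLogger("html4docx")
child = logging.getLogger("html4docx.h4d")

parent.setLevel(logging.ERROR)

# The child has no level of its own, so it inherits ERROR from its parent.
print(child.getEffectiveLevel() == logging.ERROR)  # prints True
print(child.isEnabledFor(logging.WARNING))         # prints False
print(child.isEnabledFor(logging.ERROR))           # prints True
```

This is why a single entry for "html4docx" is enough to cover every sub-logger, as long as no sub-logger has been given a level of its own.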

Why

My goal in forking and fixing/updating this package was to complete my current task at work, which involves converting HTML to DOCX. The original package lacked a few features and had some bugs, preventing me from completing the task. Instead of creating a new package from scratch, I preferred to update this one.

Differences (fixes and new features)

Fixes

Fix table_style not working | Dfop02 from Issue
Handle missing run for leading br tag | dashingdove from PR
Fix base64 images | djplaner from Issue
Handle img tag without src attribute | johnjor from PR
Fix bug when any style has !important | Dfop02
Fix 'style lookup by style_id is deprecated.' | Dfop02
Fix background-color not working | Dfop02
Fix crashes when img or bookmark is created without paragraph | Dfop02
Fix Ordered and Unordered Lists | TaylorN15 from PR
Fix styles being applied only to span tags. | Dfop02 from Issue
Fix bug in style parsing when a style contains multiple colons. | Dfop02
Fixed highlighting a single word | Lynuxen
Fix color parsing failing due to invalid colors, falling back to black. | dfop02 from Issue
Fix logging noise: replace root-logger calls with named module loggers so consumers can silence or configure html4docx output independently. | dfop02 from Issue

New Features

Add Width/Height style to images | maifeeulasad from PR
Support px, cm, pt, in, rem, em, mm, pc and % units for styles | Dfop02
Improve performance on large tables | dashingdove from PR
Support for HTML Pagination | Evilran from PR
Support Table style | Evilran from PR
Support alternative encoding | HebaElwazzan from PR
Support colors by name | Dfop02
Support font-size given as a keyword, e.g. small, medium, etc. | Dfop02
Support for internal links (Anchor) | Dfop02
Support for rowspan and colspan in tables. | Dfop02 from Issue
Support for 'vertical-align' in table cells. | Dfop02
Support for metadata | Dfop02
Add support for table cell styles (border, background-color, width, height, margin) | Dfop02
Support inline images in the same paragraph. | Dfop02
Refactor tests to be more consistent and rely less on 'human validation' | Dfop02
Support for common CSS properties for text | Lynuxen
Support for CSS classes to Word Styles | raithedavion
Support for HTML tag style overrides | raithedavion

To-Do

These are the ideas I'm planning to work on in the future to make this project even better:

Add support for the <style> tag, including all classes, and ensure they are correctly applied throughout the file.
Add support for the <link> tag to load external CSS and apply it properly across the file.

Known Issues

Maximum Nesting Depth: Ordered lists support up to 3 nested levels. Any additional depth beyond level 3 will be treated as level 3.
Counter Reset Behavior:
- At level 1, starting a new ordered list will reset the counter.
- At levels 2 and 3, the counter will continue from the previous item unless explicitly reset.

Project Guidelines

This project is primarily designed for compatibility with Microsoft Word, but it currently works well with LibreOffice and Google Docs, based on our testing. The goal is to maintain this cross-platform harmony while continuing to implement fixes and updates.

⚠️ However, please note that Microsoft Word is the priority. Bugs or issues specific to other editors (e.g., LibreOffice or Google Docs) may be considered, but fixing them is secondary to maintaining full compatibility with Word.

License

This project is licensed under the MIT License - see the LICENSE file for details

Project details

Release history

Version	Release date
1.1.6 (this version)	Jun 24, 2026
1.1.5	Apr 17, 2026
1.1.4	Feb 27, 2026
1.1.3	Dec 16, 2025
1.1.2	Dec 8, 2025
1.1.1	Nov 26, 2025
1.1.0	Nov 2, 2025
1.0.10	Aug 20, 2025
1.0.9	Jul 18, 2025
1.0.8	Jul 4, 2025
1.0.7	Jun 17, 2025
1.0.6	Apr 2, 2025
1.0.5	Feb 24, 2025
1.0.4	Aug 7, 2024
1.0.3	Feb 27, 2024
1.0.2	Feb 20, 2024
1.0.1	Feb 5, 2024
1.0.0	Feb 5, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

html_for_docx-1.1.6.tar.gz (61.0 kB)

Uploaded Jun 24, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

html_for_docx-1.1.6-py3-none-any.whl (57.5 kB)

Uploaded Jun 24, 2026 Python 3

File details

Details for the file html_for_docx-1.1.6.tar.gz.

File metadata

Download URL: html_for_docx-1.1.6.tar.gz
Upload date: Jun 24, 2026
Size: 61.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for html_for_docx-1.1.6.tar.gz
Algorithm	Hash digest
SHA256	`b3dd650de95bc433b15ee818584b31b2b229ade3f70c8e532914640554056c87`
MD5	`0c36ccddf83ee884d99551c3b084ac4d`
BLAKE2b-256	`99fe1a0ac13e4f92fa60f92bb3a241bbc8f4d261acc53d7d6d8ef37f15074ed3`

See more details on using hashes here.
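A downloaded file can be checked against the SHA256 digest in the table above with Python's hashlib. This is a minimal sketch: `sha256_of` and `matches` are hypothetical helper names, and the file path you pass is wherever you saved the download.

```python
import hashlib

# SHA256 digest listed above for html_for_docx-1.1.6.tar.gz
EXPECTED_SHA256 = "b3dd650de95bc433b15ee818584b31b2b229ade3f70c8e532914640554056c87"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's digest equals the expected one (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `matches('html_for_docx-1.1.6.tar.gz')` should return True for an unmodified copy of the source distribution.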

Provenance

The following attestation bundles were made for html_for_docx-1.1.6.tar.gz:

Publisher: pypi.yml on dfop02/html4docx

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: html_for_docx-1.1.6.tar.gz
- Subject digest: b3dd650de95bc433b15ee818584b31b2b229ade3f70c8e532914640554056c87
- Sigstore transparency entry: 1937973017
- Sigstore integration time: Jun 24, 2026
Source repository:
- Permalink: dfop02/html4docx@0edb311053536ba0eda902e53b1017c53c38059c
- Branch / Tag: refs/tags/1.1.6
- Owner: https://github.com/dfop02
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@0edb311053536ba0eda902e53b1017c53c38059c
- Trigger Event: push

File details

Details for the file html_for_docx-1.1.6-py3-none-any.whl.

File metadata

Download URL: html_for_docx-1.1.6-py3-none-any.whl
Upload date: Jun 24, 2026
Size: 57.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for html_for_docx-1.1.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5d3977e843f9d914ec65fdd7a749be0f4beebe451613399091a0de1a9d0bff07`
MD5	`9b64622372b4aa3c87a76ffb6fd1e2ab`
BLAKE2b-256	`4a3ec1c73f0c654b79ccf3bd5d9e1a30c50e4fd9525d917796a15929aab6e798`

See more details on using hashes here.

Provenance

The following attestation bundles were made for html_for_docx-1.1.6-py3-none-any.whl:

Publisher: pypi.yml on dfop02/html4docx

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: html_for_docx-1.1.6-py3-none-any.whl
- Subject digest: 5d3977e843f9d914ec65fdd7a749be0f4beebe451613399091a0de1a9d0bff07
- Sigstore transparency entry: 1937973263
- Sigstore integration time: Jun 24, 2026
Source repository:
- Permalink: dfop02/html4docx@0edb311053536ba0eda902e53b1017c53c38059c
- Branch / Tag: refs/tags/1.1.6
- Owner: https://github.com/dfop02
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@0edb311053536ba0eda902e53b1017c53c38059c
- Trigger Event: push
