
Convert HTML to DOCX quickly and easily


HTML FOR DOCX

Convert HTML to DOCX. This project is a fork of the discontinued pqzx/html2docx.

How to install

pip install html-for-docx

Usage

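Basic usage: add HTML-formatted content to an existing DOCX document

from docx import Document
from html4docx import HtmlToDocx

document = Document('existing_file.docx')  # open the existing .docx with python-docx
parser = HtmlToDocx()
html_string = '<h1>Hello world</h1>'
parser.add_html_to_document(html_string, document)  # expects a Document, not a filename
document.save('existing_file.docx')  # write the changes back to disk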

You can also use python-docx to manipulate the document. Here is an example:

from docx import Document
from html4docx import HtmlToDocx

document = Document()
new_parser = HtmlToDocx()

html_string = '<h1>Hello world</h1>'
new_parser.add_html_to_document(html_string, document)

document.save('your_file_name.docx')

Convert files directly

from html4docx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.parse_html_file(input_html_file_path, output_docx_file_path)

Convert HTML from a string

from html4docx import HtmlToDocx

new_parser = HtmlToDocx()
document = new_parser.parse_html_string(input_html_file_string)  # returns a python-docx Document
document.save('your_file_name.docx')

Change table styles

Tables are not styled by default. Use the table_style attribute on the parser to set a table style. The style is used for all tables.

from html4docx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.table_style = 'Light Shading Accent 4'

To add borders to tables, use the TableGrid style:

new_parser.table_style = 'TableGrid'

Default table styles can be found here: https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html#table-styles-in-default-template

Why

My goal in forking and fixing/updating this package was to complete a task at work that involves converting HTML to DOCX. The original package couldn't handle it because it lacked a few features and had some bugs, so instead of creating a package from scratch, I preferred to update this one.

Differences (fixes and new features)

Fixes

  • Handle missing run for leading br tag | dashingdove from PR
  • Fix base64 images | djplaner from Issue
  • Handle img tag without src attribute | johnjor from PR
  • Fix bug when a style contains !important | Dfop02
  • Fix 'style lookup by style_id is deprecated.' | Dfop02
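Two of the fixes above concern input that used to break the parser: inline styles carrying `!important`, and images embedded as base64 `data:` URIs (or `<img>` tags with no `src` at all). The sketch below shows one way to handle these cases with the Python standard library only. `parse_inline_style` and `decode_data_uri` are hypothetical helpers written for illustration; they are not part of the html4docx API, and the package may handle these cases differently.

```python
import base64
import re

# Hypothetical helpers for illustration; not part of html4docx.

# Matches a trailing "!important" flag, allowing spaces and any letter case.
_IMPORTANT = re.compile(r'\s*!\s*important\s*$', re.IGNORECASE)
# Matches "data:<mime>;base64,<payload>" (the payload may span lines).
_DATA_URI = re.compile(r'data:([\w/+.-]+);base64,(.*)', re.DOTALL)


def parse_inline_style(style):
    """Parse an inline CSS style attribute into a {property: value} dict.

    A trailing "!important" is dropped so the bare value (e.g. "red") can be
    used directly. This sketch assumes a plain split on ";" is good enough,
    which would break on values such as url(data:...;base64,...).
    """
    declarations = {}
    for chunk in style.split(';'):
        if ':' not in chunk:
            continue  # skip empty or malformed declarations
        prop, value = chunk.split(':', 1)
        declarations[prop.strip().lower()] = _IMPORTANT.sub('', value.strip())
    return declarations


def decode_data_uri(src):
    """Return (mime_type, raw_bytes) for a base64 data URI, else None.

    None is also returned for a missing or empty src, e.g. an <img> without one.
    """
    if not src:
        return None
    match = _DATA_URI.match(src.strip())
    if match is None:
        return None  # a regular URL or file path, not an embedded image
    # b64decode discards non-alphabet characters such as line breaks by default
    return match.group(1), base64.b64decode(match.group(2))


print(parse_inline_style('color: red !important; font-size: 12px;'))
# {'color': 'red', 'font-size': '12px'}
```

In a real conversion, the decoded bytes could be wrapped in `io.BytesIO` and passed to python-docx's `add_picture`, which accepts a file-like object as well as a path.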

New Features

  • Add width/height styles to images | maifeeulasad from PR
  • Support px, cm, pt and % units for the margin-left style on paragraphs | Dfop02
  • Improve performance on large tables | dashingdove from PR
  • Support for HTML pagination | Evilran from PR
  • Support table styles | Evilran from PR
  • Support alternative encodings | HebaElwazzan from PR
  • Support colors by name | Dfop02
  • Support keyword font sizes, e.g. small, medium | Dfop02
  • Support internal links (anchors) | Dfop02
  • Refactor tests to be more consistent and rely less on manual (human) validation | Dfop02
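Several of the new features boil down to translating CSS values into the units Word works with: `margin-left` given in px, cm, pt or %, keyword font sizes such as `small` or `medium`, and named colors. The sketch below shows one possible mapping to points and to `RRGGBB` hex strings. The helper names, the 96 px per inch convention, the 12 pt `medium` baseline scaled by the CSS Fonts Level 4 factors, and the short color table are all assumptions for illustration; html4docx may use different values internally.

```python
import re

# Assumptions for this sketch (not taken from html4docx):
# CSS reference pixel of 96 per inch, so 1px = 72/96 = 0.75pt.
PT_PER_UNIT = {'pt': 1.0, 'px': 0.75, 'cm': 72 / 2.54}

# CSS absolute-size keywords, scaled from medium = 12pt using the
# CSS Fonts Level 4 factors (3/5, 3/4, 8/9, 1, 6/5, 3/2, 2).
FONT_SIZE_KEYWORDS = {
    'xx-small': 12 * 3 / 5,
    'x-small': 12 * 3 / 4,
    'small': 12 * 8 / 9,
    'medium': 12.0,
    'large': 12 * 6 / 5,
    'x-large': 12 * 3 / 2,
    'xx-large': 24.0,
}

# A few CSS named colors as RRGGBB strings (the full CSS list is much longer).
NAMED_COLORS = {
    'black': '000000',
    'white': 'FFFFFF',
    'red': 'FF0000',
    'green': '008000',  # CSS "green" is #008000, not #00FF00
    'blue': '0000FF',
}

_LENGTH = re.compile(r'\s*(-?\d+(?:\.\d+)?)\s*(px|cm|pt|%)\s*', re.IGNORECASE)


def css_length_to_pt(value, reference_pt=None):
    """Convert a CSS length such as '10px', '1.5cm', '12pt' or '50%' to points.

    Percentages are resolved against reference_pt (for margin-left this
    would be something like the available text width).
    """
    match = _LENGTH.fullmatch(value)
    if match is None:
        raise ValueError(f'unsupported length: {value!r}')
    number, unit = float(match.group(1)), match.group(2).lower()
    if unit == '%':
        if reference_pt is None:
            raise ValueError('a percentage needs reference_pt')
        return number / 100 * reference_pt
    return number * PT_PER_UNIT[unit]


def css_font_size_to_pt(value):
    """Accept either a keyword ('small', 'medium', ...) or a CSS length."""
    keyword = value.strip().lower()
    if keyword in FONT_SIZE_KEYWORDS:
        return FONT_SIZE_KEYWORDS[keyword]
    return css_length_to_pt(value)


def css_color_to_hex(value):
    """Return an 'RRGGBB' string for a named color, #rgb or #rrggbb."""
    color = value.strip().lower()
    if color in NAMED_COLORS:
        return NAMED_COLORS[color]
    if re.fullmatch(r'#[0-9a-f]{6}', color):
        return color[1:].upper()
    if re.fullmatch(r'#[0-9a-f]{3}', color):
        return ''.join(ch * 2 for ch in color[1:]).upper()  # #0f0 -> 00FF00
    raise ValueError(f'unsupported color: {value!r}')
```

With python-docx, the results could then be applied through `docx.shared.Pt` (for example `paragraph.paragraph_format.left_indent = Pt(css_length_to_pt('20px'))`) and `docx.shared.RGBColor.from_string(css_color_to_hex('red'))`.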

License

This project is licensed under the MIT License. See the LICENSE file for details.



Download files


Source Distribution

html_for_docx-1.0.4.tar.gz (18.6 kB)

Uploaded Source

Built Distribution

html_for_docx-1.0.4-py3-none-any.whl (18.2 kB)

Uploaded Python 3

File details

Details for the file html_for_docx-1.0.4.tar.gz.

File metadata

  • Download URL: html_for_docx-1.0.4.tar.gz
  • Upload date:
  • Size: 18.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.4

File hashes

Hashes for html_for_docx-1.0.4.tar.gz
Algorithm Hash digest
SHA256 91997baf1d0b3fe5e6213b7966a736d2ac388a4537c58c19a4ba0e420ca050de
MD5 b8cbdeb913bf266e959c2c95e4790ccc
BLAKE2b-256 afad62c2b12aa48426b0ec8bc6718cd246bf581118953d0beaa9bf340e157884


File details

Details for the file html_for_docx-1.0.4-py3-none-any.whl.

File metadata

File hashes

Hashes for html_for_docx-1.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 9b648fe94e9a38f0530ef4fc8296fc0677840f223f9926800b7f4470a222fdab
MD5 dfd5e25c99ae31d8ad22ecf87d30fffa
BLAKE2b-256 cdb5d6887d485dd8d480652eb884c903ebd82a5805422ed7fe3d146259b0c45c

