Convert HTML to Docx easily and quickly

HTML FOR DOCX

Converts HTML to docx. This project is a fork of the discontinued pqzx/html2docx.

How to install

pip install html-for-docx

Usage

Basic usage: add formatted HTML to an existing Docx

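from docx import Document
from html4docx import HtmlToDocx

# add_html_to_document expects a python-docx Document object, so open the
# existing file first (the path below is a placeholder)
filename_docx = Document('path/to/existing.docx')

parser = HtmlToDocx()
html_string = '<h1>Hello world</h1>'
parser.add_html_to_document(html_string, filename_docx)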

You can also use python-docx to manipulate the file. Here is an example:

from docx import Document
from html4docx import HtmlToDocx

document = Document()
new_parser = HtmlToDocx()

html_string = '<h1>Hello world</h1>'
new_parser.add_html_to_document(html_string, document)

document.save('your_file_name')
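
Because add_html_to_document appends to a regular python-docx Document, native python-docx content and converted HTML can be mixed in one file. The following is a minimal sketch of that idea, not part of the package: render_items_html and build_document are hypothetical helper names, and the heading and paragraph text are made up for illustration.

```python
from html import escape


def render_items_html(items):
    """Build an HTML bullet list, escaping each item so it is safe to embed."""
    list_items = ''.join(f'<li>{escape(item)}</li>' for item in items)
    return f'<ul>{list_items}</ul>'


def build_document(items, output_path):
    """Combine native python-docx content with converted HTML, then save."""
    # Imported here so render_items_html can be used without the
    # third-party packages installed.
    from docx import Document
    from html4docx import HtmlToDocx

    document = Document()
    document.add_heading('Shopping list', level=1)  # added with python-docx
    HtmlToDocx().add_html_to_document(render_items_html(items), document)
    document.add_paragraph('End of list.')  # added with python-docx
    document.save(output_path)
```

Calling build_document(['Bread & butter', 'Milk'], 'shopping_list.docx') should write a file containing the heading, the converted list, and the closing paragraph, in that order.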

Convert files directly

from html4docx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.parse_html_file(input_html_file_path, output_docx_file_path)

Convert files from a string

from html4docx import HtmlToDocx

new_parser = HtmlToDocx()
docx = new_parser.parse_html_string(input_html_file_string)
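
The object returned by parse_html_string still has to be written to disk. The sketch below assumes the return value is a python-docx Document (so it has a save method); docx_path_for and convert_html_file are hypothetical helper names, not part of the package.

```python
from pathlib import Path


def docx_path_for(html_path):
    """Return the sibling .docx path for an HTML file path."""
    return Path(html_path).with_suffix('.docx')


def convert_html_file(html_path, encoding='utf-8'):
    """Read an HTML file, convert its contents, and save the result next to it."""
    # Imported here so docx_path_for can be used without html-for-docx installed.
    from html4docx import HtmlToDocx

    html = Path(html_path).read_text(encoding=encoding)
    # Assumption: parse_html_string returns a python-docx Document.
    document = HtmlToDocx().parse_html_string(html)
    target = docx_path_for(html_path)
    document.save(str(target))
    return target
```

Reading the file yourself keeps control over the encoding and the output name, at the cost of a few extra lines compared with parse_html_file.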

Change table styles

Tables are not styled by default. Use the table_style attribute on the parser to set a table style. The style is used for all tables.

from html4docx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.table_style = 'Light Shading Accent 4'

To add borders to tables, use the TableGrid style:

new_parser.table_style = 'TableGrid'

Default table styles can be found here: https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html#table-styles-in-default-template
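
As a rough end-to-end illustration of table_style, the sketch below builds a small HTML table from Python data and converts it with the TableGrid style. The helper names rows_to_html_table and save_styled_table are hypothetical, and the example assumes parse_html_string returns a python-docx Document.

```python
from html import escape


def rows_to_html_table(header, rows):
    """Render a header and data rows as a plain HTML table string."""
    def cells(tag, values):
        return ''.join(f'<{tag}>{escape(str(v))}</{tag}>' for v in values)

    lines = [f'<tr>{cells("th", header)}</tr>']
    lines += [f'<tr>{cells("td", row)}</tr>' for row in rows]
    return '<table>' + ''.join(lines) + '</table>'


def save_styled_table(header, rows, output_path, style='TableGrid'):
    """Convert the table to docx, applying one style to every table."""
    # Imported here so rows_to_html_table works without html-for-docx installed.
    from html4docx import HtmlToDocx

    parser = HtmlToDocx()
    parser.table_style = style  # set before parsing; applies to all tables
    document = parser.parse_html_string(rows_to_html_table(header, rows))
    document.save(output_path)
```

The style name must exist in the document's template; the default python-docx template includes the styles listed at the link above.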

Why

I forked this package to fix and update it for a task at work that involves converting HTML to docx. The original package could not handle that task because it lacked a few features and had some bugs, so instead of creating a package from scratch, I preferred to update this one.

Differences (fixes and new features)

Fixes

  • Handle missing run for leading br tag | dashingdove from PR
  • Fix base64 images | djplaner from Issue
  • Handle img tag without src attribute | johnjor from PR
  • Fix bug when any style has !important | Dfop02
  • Fix 'style lookup by style_id is deprecated.' | Dfop02

New Features

  • Add Width/Height style to images | maifeeulasad from PR
  • Support px, cm, pt and % for style margin-left to paragraphs | Dfop02
  • Improve performance on large tables | dashingdove from PR
  • Support for HTML Pagination | Evilran from PR
  • Support Table style | Evilran from PR
  • Support alternative encoding | HebaElwazzan from PR
  • Refactor tests to be more consistent and rely less on 'human validation' | Dfop02
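
To try the margin-left support listed above, something like the following sketch could be used. It only builds the input HTML and hands it to the parser; how each unit is translated into a Word indent is up to the package and is not shown here. The names indented_paragraph and save_indented are hypothetical, and the unit check simply mirrors the four units named in the list.

```python
import re
from html import escape

# The four units named in the feature list: px, cm, pt and %.
_MARGIN_PATTERN = re.compile(r'\d+(\.\d+)?(px|cm|pt|%)')


def indented_paragraph(text, margin_left):
    """Return a <p> element with an inline margin-left style."""
    if not _MARGIN_PATTERN.fullmatch(margin_left):
        raise ValueError(f'unsupported margin-left value: {margin_left!r}')
    return f'<p style="margin-left: {margin_left}">{escape(text)}</p>'


def save_indented(paragraphs, output_path):
    """Convert (text, margin) pairs to a docx file."""
    # Imported here so indented_paragraph works without html-for-docx installed.
    from html4docx import HtmlToDocx

    html = ''.join(indented_paragraph(text, margin) for text, margin in paragraphs)
    # Assumption: parse_html_string returns a python-docx Document.
    document = HtmlToDocx().parse_html_string(html)
    document.save(output_path)
```

For example, save_indented([('Two centimetres', '2cm'), ('Ten percent', '10%')], 'indent.docx') would produce one paragraph per unit, which makes it easy to compare the results in Word.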

License

This project is licensed under the MIT License. See the LICENSE file for details.
