Convert HTML to DOCX easily and quickly
HTML FOR DOCX
Convert HTML to DOCX. This project is a fork of the discontinued pqzx/html2docx.
How to install
pip install html-for-docx
Usage
Basic usage: add HTML-formatted content to an existing DOCX document
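from docx import Document
from html4docx import HtmlToDocx
document = Document('existing_file.docx')  # open the existing DOCX with python-docx
parser = HtmlToDocx()
html_string = '<h1>Hello world</h1>'
parser.add_html_to_document(html_string, document)  # second argument is a python-docx Document
document.save('existing_file.docx')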
You can also use python-docx to manipulate the document directly. For example:
from docx import Document
from html4docx import HtmlToDocx
document = Document()
new_parser = HtmlToDocx()
html_string = '<h1>Hello world</h1>'
new_parser.add_html_to_document(html_string, document)
document.save('your_file_name.docx')
Convert files directly
from html4docx import HtmlToDocx
new_parser = HtmlToDocx()
new_parser.parse_html_file(input_html_file_path, output_docx_file_path)
Convert from a string
from html4docx import HtmlToDocx
new_parser = HtmlToDocx()
document = new_parser.parse_html_string(input_html_string)  # returns a python-docx Document
document.save('your_file_name.docx')
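If the HTML source is not UTF-8 encoded, one option is to decode it yourself and pass the resulting string to parse_html_string. The sketch below uses a hypothetical helper, read_html, which is not part of html4docx. It assumes that trying UTF-8 first and falling back to Latin-1 suits your files; note that Latin-1 accepts any byte sequence, so text in some other encoding may be decoded without an error but incorrectly.

```python
from pathlib import Path
from typing import Sequence, Union


def read_html(path: Union[str, Path], encodings: Sequence[str] = ("utf-8", "latin-1")) -> str:
    """Hypothetical helper: decode an HTML file, trying each encoding in order."""
    raw = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # this candidate failed, try the next one
    raise ValueError(f"could not decode {path} with any of: {', '.join(encodings)}")
```

Usage might look like new_parser.parse_html_string(read_html('page.html')), where 'page.html' is a placeholder path.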
Change table styles
Tables are not styled by default. Use the table_style attribute on the parser to set a table style; the style is applied to all tables.
from html4docx import HtmlToDocx
new_parser = HtmlToDocx()
new_parser.table_style = 'Light Shading Accent 4'
To add borders to tables, use the TableGrid style:
new_parser.table_style = 'TableGrid'
Default table styles can be found here: https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html#table-styles-in-default-template
Why
My goal in forking and fixing/updating this package was to complete a task at work that involves converting HTML to DOCX. The original package couldn't handle it because it lacked a few features and had some bugs, so instead of creating a package from scratch, I preferred to update this one.
Differences (fixes and new features)
Fixes
- Handle missing run for leading br tag | dashingdove from PR
- Fix base64 images | djplaner from Issue
- Handle img tag without src attribute | johnjor from PR
- Fix bug when any style has !important | Dfop02
- Fix 'style lookup by style_id is deprecated.' | Dfop02
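The !important fix amounts to stripping the flag before a CSS value is interpreted. Below is a minimal sketch of that idea using a hypothetical parse_inline_style helper; it is not html4docx's actual code. It assumes declarations are separated by ';' and that no value contains a semicolon (a data URI inside url(...) would break this naive split).

```python
def parse_inline_style(style: str) -> dict:
    """Parse an inline style attribute into {property: value}.

    A trailing '!important' flag is dropped, so 'color: red !important'
    yields {'color': 'red'}.
    """
    declarations = {}
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue  # skip empty or malformed chunks, e.g. after a trailing ';'
        prop, value = declaration.split(":", 1)
        value = value.strip()
        if value.lower().endswith("!important"):
            value = value[: -len("!important")].strip()
        declarations[prop.strip().lower()] = value
    return declarations
```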
New Features
- Add Width/Height style to images | maifeeulasad from PR
- Support px, cm, pt and % units for the margin-left style on paragraphs | Dfop02
- Improve performance on large tables | dashingdove from PR
- Support for HTML Pagination | Evilran from PR
- Support Table style | Evilran from PR
- Support alternative encoding | HebaElwazzan from PR
- Refactor tests to be more consistent and less reliant on 'human validation' | Dfop02
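Supporting several margin-left units means converting each CSS length to a single Word unit. A minimal sketch, assuming the CSS reference ratios 1in = 96px = 72pt = 2.54cm and assuming that % is resolved against a caller-supplied width; css_length_to_pt is a hypothetical name, not the library's API.

```python
import re

# Points per unit, derived from the CSS ratios 1in = 96px = 72pt = 2.54cm.
_PT_PER_UNIT = {
    "pt": 1.0,
    "px": 72 / 96,    # 0.75pt per px
    "cm": 72 / 2.54,  # about 28.35pt per cm
}

_LENGTH_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(px|cm|pt|%)\s*$", re.IGNORECASE)


def css_length_to_pt(value: str, reference_width_pt: float) -> float:
    """Convert a length such as '10px', '1.5cm', '12pt' or '5%' to points.

    Percentages are taken relative to reference_width_pt (an assumption,
    e.g. the usable page width).
    """
    match = _LENGTH_RE.match(value)
    if match is None:
        raise ValueError(f"unsupported length: {value!r}")
    number, unit = float(match.group(1)), match.group(2).lower()
    if unit == "%":
        return number / 100 * reference_width_pt
    return number * _PT_PER_UNIT[unit]
```

For example, a reference width of 468pt corresponds to 6.5in, a Letter page with 1in margins. With python-docx, the result could be applied via paragraph.paragraph_format.left_indent = Pt(...), using docx.shared.Pt.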
License
This project is licensed under the MIT License. See the LICENSE file for details.
Source Distribution
File details
Details for the file html-for-docx-1.0.2.tar.gz.
File metadata
- Download URL: html-for-docx-1.0.2.tar.gz
- Upload date:
- Size: 17.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | a2cd761feeb4809b3068394d94989707cc36799f3bb6dfb6a732ee51b961dba2
MD5 | 8fdf520d85eedbe1fb4d08420b17cb83
BLAKE2b-256 | e31ae15d5b75acb86f9fe1c9d3996b89dfd1c09574e9d051161e8750a594df1f
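To check a downloaded archive against the published SHA256 digest, here is a short sketch using Python's hashlib. The local file path is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path
from typing import Union


def file_sha256(path: Union[str, Path], chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Fixed-size chunks keep memory use flat for large archives.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return file_sha256(path) == expected_hex.strip().lower()
```

For example: verify_sha256('html-for-docx-1.0.2.tar.gz', 'a2cd761feeb4809b3068394d94989707cc36799f3bb6dfb6a732ee51b961dba2'). pip's hash-checking mode (--require-hashes) is an alternative when installing from a requirements file.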