A library for converting DOCX documents to HTML and plain text
Docx Parser and Converter 📄✨
A powerful library for converting DOCX documents into HTML and plain text, with detailed parsing of document properties and styles.
Table of Contents
- Introduction 🌟
- Project Overview 🛠️
- Key Features 🌟
- Installation 💾
- Usage 🚀
- Quick Start Guide 📖
- Examples 📚
- API Reference 📜
Introduction 🌟
Welcome to the DOCX-HTML-TXT Converter project! This library allows you to easily convert DOCX documents into HTML and plain text formats, extracting detailed properties and styles using Pydantic models.
Project Overview 🛠️
The project is structured to parse DOCX files, convert their content into structured data using Pydantic models, and provide conversion utilities to transform this data into HTML or plain text.
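To make the parse, model, convert pipeline concrete, here is a minimal sketch that uses only the standard library. It is not this library's implementation: the `Run` and `Paragraph` dataclasses are hypothetical stand-ins for the Pydantic models, and `parse_docx`, `to_html`, and `to_text` are illustrative names. It relies only on the fact that a DOCX file is a ZIP archive whose main body lives in `word/document.xml`.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from html import escape

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


@dataclass
class Run:
    """Hypothetical stand-in for a run model: a span of text with one format."""
    text: str
    bold: bool = False


@dataclass
class Paragraph:
    """Hypothetical stand-in for a paragraph model."""
    runs: list = field(default_factory=list)


def parse_docx(docx_bytes: bytes) -> list:
    """Parse the raw bytes of a DOCX file into a list of Paragraph objects."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        para = Paragraph()
        for r in p.findall("w:r", NS):
            text = "".join(t.text or "" for t in r.findall("w:t", NS))
            # Simplification: any <w:b> element counts as bold,
            # ignoring an explicit w:val="0".
            bold = r.find("w:rPr/w:b", NS) is not None
            para.runs.append(Run(text=text, bold=bold))
        paragraphs.append(para)
    return paragraphs


def to_html(paragraphs: list) -> str:
    """Render the structured data as one <p> element per paragraph."""
    parts = []
    for para in paragraphs:
        inner = "".join(
            f"<b>{escape(run.text)}</b>" if run.bold else escape(run.text)
            for run in para.runs
        )
        parts.append(f"<p>{inner}</p>")
    return "\n".join(parts)


def to_text(paragraphs: list) -> str:
    """Render the structured data as plain text, one line per paragraph."""
    return "\n".join("".join(run.text for run in para.runs) for para in paragraphs)
```

Keeping the intermediate model separate from the renderers is what lets one parse feed both output formats. The real library additionally resolves styles, numbering, and document defaults, which this sketch leaves out.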
Key Features 🌟
- Convert DOCX documents to HTML or plain text.
- Parse and extract detailed document properties and styles.
- Structured data representation using Pydantic models.
Installation 💾
To install the library, use pip:
pip install docx-parser-converter
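To confirm that the installation succeeded, one option is to ask Python's packaging metadata for the installed version. This is a small sketch using the standard-library `importlib.metadata` module; the helper name `installed_version` is hypothetical and not part of this library.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("docx-parser-converter"))
```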
Usage 🚀
Importing the Library
To start using the library, import the necessary modules:
from docx_html_txt.docx_to_html import DocxToHtmlConverter
from docx_html_txt.docx_to_txt import DocxToTxtConverter
from docx_html_txt.docx_parsers.utils import read_binary_from_file_path
Quick Start Guide 📖
- Convert to HTML:
from docx_html_txt.docx_to_html import DocxToHtmlConverter
from docx_html_txt.docx_parsers.utils import read_binary_from_file_path

docx_path = "path_to_your_docx_file.docx"
html_output_path = "output.html"

docx_file_content = read_binary_from_file_path(docx_path)
converter = DocxToHtmlConverter(docx_file_content, use_default_values=True)
html_output = converter.convert_to_html()
converter.save_html_to_file(html_output, html_output_path)
- Convert to Plain Text:
from docx_html_txt.docx_to_txt import DocxToTxtConverter
from docx_html_txt.docx_parsers.utils import read_binary_from_file_path

docx_path = "path_to_your_docx_file.docx"
txt_output_path = "output.txt"

docx_file_content = read_binary_from_file_path(docx_path)
converter = DocxToTxtConverter(docx_file_content, use_default_values=True)
txt_output = converter.convert_to_txt(indent=True)
converter.save_txt_to_file(txt_output, txt_output_path)
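The two snippets above each convert a single file. For a folder of documents, a small wrapper along the following lines might be useful. This is a sketch, not part of the library: `convert_directory` is a hypothetical helper, and the conversion step is passed in as a callable so the helper itself depends only on the standard library.

```python
from pathlib import Path
from typing import Callable


def convert_directory(src_dir, out_dir, convert: Callable[[bytes], str], suffix: str) -> list:
    """Convert every .docx file in src_dir and write the results to out_dir.

    `convert` receives the raw bytes of one DOCX file and returns the
    converted document as a string. Returns the list of written paths.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for docx_path in sorted(src.glob("*.docx")):
        target = out / (docx_path.stem + suffix)
        target.write_text(convert(docx_path.read_bytes()), encoding="utf-8")
        written.append(target)
    return written


# Possible wiring with the converters shown above (requires the library):
#
#   from docx_html_txt.docx_to_html import DocxToHtmlConverter
#
#   convert_directory(
#       "docs", "html_out",
#       lambda data: DocxToHtmlConverter(data, use_default_values=True).convert_to_html(),
#       ".html",
#   )
```

Passing `lambda data: DocxToTxtConverter(data, use_default_values=True).convert_to_txt(indent=True)` with a `.txt` suffix would give the plain-text equivalent.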
Examples 📚
Original DOCX File
Converted to HTML
Converted to Plain Text
API Reference 📜
For detailed API documentation, please visit our Read the Docs page.
Enjoy using DOCX-HTML-TXT Converter! 🚀✨