A library for converting DOCX documents to HTML and plain text

Docx Parser and Converter 📄✨

A powerful library for converting DOCX documents into HTML and plain text, with detailed parsing of document properties and styles.

Introduction 🌟

Welcome to the DOCX-HTML-TXT Converter project! This library allows you to easily convert DOCX documents into HTML and plain text formats, extracting detailed properties and styles using Pydantic models.

Project Overview 🛠️

The project is structured to parse DOCX files, convert their content into structured data using Pydantic models, and provide conversion utilities to transform this data into HTML or plain text.
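The parse, structure, and convert stages described above can be illustrated with a minimal standard-library sketch. This is not the library's implementation: it relies only on the fact that a DOCX file is a ZIP archive whose main body lives in `word/document.xml`, it uses a plain dataclass as a stand-in for the library's Pydantic models, and the names `Paragraph`, `parse_paragraphs`, `to_html`, and `to_txt` are hypothetical. Styles, numbering, and tables are ignored.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from html import escape

# WordprocessingML namespace used by word/document.xml inside a DOCX archive.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


@dataclass
class Paragraph:
    """Hypothetical stand-in for one of the library's Pydantic models."""
    text: str


def parse_paragraphs(docx_bytes: bytes) -> list[Paragraph]:
    """Stage 1 and 2: read the archive and build structured paragraph data."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{W_NS}p"):
        # A paragraph's visible text is spread over its <w:t> elements.
        text = "".join(t.text or "" for t in p.iter(f"{W_NS}t"))
        paragraphs.append(Paragraph(text=text))
    return paragraphs


def to_html(paragraphs: list[Paragraph]) -> str:
    """Stage 3a: render each paragraph as an escaped <p> element."""
    return "\n".join(f"<p>{escape(p.text)}</p>" for p in paragraphs)


def to_txt(paragraphs: list[Paragraph]) -> str:
    """Stage 3b: render each paragraph as one line of plain text."""
    return "\n".join(p.text for p in paragraphs)
```

Keeping the structured data separate from the renderers is what lets one parse feed both output formats.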

Key Features 🌟

  • Convert DOCX documents to HTML or plain text.
  • Parse and extract detailed document properties and styles.
  • Structured data representation using Pydantic models.

Installation 💾

To install the library, use pip:

pip install docx-parser-converter

Usage 🚀

Importing the Library

To start using the library, import the necessary modules:

from docx_html_txt.docx_to_html import DocxToHtmlConverter
from docx_html_txt.docx_to_txt import DocxToTxtConverter
from docx_html_txt.docx_parsers.utils import read_binary_from_file_path

Quick Start Guide 📖

  1. Convert to HTML:

    from docx_html_txt.docx_to_html import DocxToHtmlConverter
    from docx_html_txt.docx_parsers.utils import read_binary_from_file_path

    docx_path = "path_to_your_docx_file.docx"
    html_output_path = "output.html"

    # Read the DOCX file as raw bytes
    docx_file_content = read_binary_from_file_path(docx_path)

    # Convert to HTML and write the result to disk
    converter = DocxToHtmlConverter(docx_file_content, use_default_values=True)
    html_output = converter.convert_to_html()
    converter.save_html_to_file(html_output, html_output_path)
    
  2. Convert to Plain Text:

    from docx_html_txt.docx_to_txt import DocxToTxtConverter
    from docx_html_txt.docx_parsers.utils import read_binary_from_file_path

    docx_path = "path_to_your_docx_file.docx"
    txt_output_path = "output.txt"

    # Read the DOCX file as raw bytes
    docx_file_content = read_binary_from_file_path(docx_path)

    # Convert to plain text and write the result to disk
    converter = DocxToTxtConverter(docx_file_content, use_default_values=True)
    txt_output = converter.convert_to_txt(indent=True)
    converter.save_txt_to_file(txt_output, txt_output_path)
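Both quick-start recipes follow the same pattern: read bytes, convert, write a file. For a folder of documents, that pattern could be wrapped in a small helper such as the sketch below. The helper `convert_directory` and its parameters are hypothetical and not part of the library. It takes the conversion step as a callable, so it makes no assumption about the converter beyond the calls shown in the quick start.

```python
from pathlib import Path
from typing import Callable


def convert_directory(
    src_dir: str,
    out_dir: str,
    convert: Callable[[bytes], str],
    suffix: str = ".html",
) -> list[Path]:
    """Convert every .docx file in src_dir and write the results to out_dir.

    `convert` receives the raw bytes of one DOCX file and returns its text
    or HTML. Returns the paths of the files written, in sorted order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for docx_path in sorted(Path(src_dir).glob("*.docx")):
        target = out / (docx_path.stem + suffix)
        target.write_text(convert(docx_path.read_bytes()), encoding="utf-8")
        written.append(target)
    return written


# With the library installed, `convert` could wrap the quick-start calls, e.g.:
#   convert = lambda data: DocxToHtmlConverter(
#       data, use_default_values=True
#   ).convert_to_html()
```

Passing `suffix=".txt"` together with a callable built on `DocxToTxtConverter(...).convert_to_txt(indent=True)` would give the plain-text variant.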
    

Examples 📚

Original DOCX File

[Image: Original DOCX File in LibreOffice]

Converted to HTML

[Image: Converted HTML Output]

Converted to Plain Text

[Image: Converted TXT Output]

API Reference 📜

For detailed API documentation, please visit our Read the Docs page.

Enjoy using DOCX-HTML-TXT Converter! 🚀✨
