A library for converting DOCX documents to HTML and plain text

Docx Parser and Converter 📄✨

A powerful library for converting DOCX documents into HTML and plain text, with detailed parsing of document properties and styles.

Introduction 🌟

Welcome to the DOCX-HTML-TXT Converter project! This library allows you to easily convert DOCX documents into HTML and plain text formats, extracting detailed properties and styles using Pydantic models.

Project Overview 🛠️

The project is structured to parse DOCX files, convert their content into structured data using Pydantic models, and provide conversion utilities to transform this data into HTML or plain text.
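
To make the parse, model, convert flow concrete, here is a minimal standard-library sketch. It is not the library's implementation: it uses dataclasses as a stand-in for Pydantic models, handles only paragraph text and a simplified bold flag, and every name in it (Run, Paragraph, parse_paragraphs, to_html, to_txt) is hypothetical. It relies on two facts about the format: a DOCX file is a ZIP archive, and the main body lives in word/document.xml in the WordprocessingML namespace.

```python
import html
import io
import zipfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# WordprocessingML namespace used by word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


@dataclass
class Run:
    """A span of text sharing the same formatting (hypothetical model)."""
    text: str
    bold: bool = False


@dataclass
class Paragraph:
    """A paragraph made of runs (hypothetical model)."""
    runs: List[Run] = field(default_factory=list)


def parse_paragraphs(docx_bytes: bytes) -> List[Paragraph]:
    """Parse paragraphs from raw DOCX bytes into simple models."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for p in root.iter(f"{W}p"):
        runs = []
        for r in p.iter(f"{W}r"):
            text = "".join(t.text or "" for t in r.iter(f"{W}t"))
            props = r.find(f"{W}rPr")
            # Simplification: any <w:b> element counts as bold,
            # ignoring an explicit w:val="false" and inherited styles.
            bold = props is not None and props.find(f"{W}b") is not None
            runs.append(Run(text=text, bold=bold))
        paragraphs.append(Paragraph(runs=runs))
    return paragraphs


def to_html(paragraphs: List[Paragraph]) -> str:
    """Render the models as one <p> element per paragraph."""
    parts = []
    for p in paragraphs:
        inner = "".join(
            f"<b>{html.escape(r.text)}</b>" if r.bold else html.escape(r.text)
            for r in p.runs
        )
        parts.append(f"<p>{inner}</p>")
    return "\n".join(parts)


def to_txt(paragraphs: List[Paragraph]) -> str:
    """Render the models as plain text, one line per paragraph."""
    return "\n".join("".join(r.text for r in p.runs) for p in paragraphs)
```

Keeping the parsed models separate from the renderers is what lets one parse feed both output formats, which is the design the overview describes.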

Key Features 🌟

  • Convert DOCX documents to HTML or plain text.
  • Parse and extract detailed document properties and styles.
  • Structured data representation using Pydantic models.

Installation 💾

To install the library from PyPI, use pip:

pip install docx-parser-converter

Usage 🚀

Importing the Library

To start using the library, import the necessary modules:

from docx_html_txt.docx_to_html import DocxToHtmlConverter
from docx_html_txt.docx_to_txt import DocxToTxtConverter
from docx_html_txt.docx_parsers.utils import read_binary_from_file_path

Quick Start Guide 📖

  1. Convert to HTML:

    from docx_html_txt.docx_to_html import DocxToHtmlConverter
    from docx_html_txt.docx_parsers.utils import read_binary_from_file_path

    docx_path = "path_to_your_docx_file.docx"
    html_output_path = "output.html"

    # Read the DOCX file as raw bytes.
    docx_file_content = read_binary_from_file_path(docx_path)

    # Convert to HTML and write the result to disk.
    converter = DocxToHtmlConverter(docx_file_content, use_default_values=True)
    html_output = converter.convert_to_html()
    converter.save_html_to_file(html_output, html_output_path)
    
  2. Convert to Plain Text:

    from docx_html_txt.docx_to_txt import DocxToTxtConverter
    from docx_html_txt.docx_parsers.utils import read_binary_from_file_path

    docx_path = "path_to_your_docx_file.docx"
    txt_output_path = "output.txt"

    # Read the DOCX file as raw bytes.
    docx_file_content = read_binary_from_file_path(docx_path)

    # Convert to plain text and write the result to disk.
    converter = DocxToTxtConverter(docx_file_content, use_default_values=True)
    txt_output = converter.convert_to_txt(indent=True)
    converter.save_txt_to_file(txt_output, txt_output_path)
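
Both quick-start snippets follow the same read, convert, save pattern, so they generalize to a whole folder. The helper below is a hypothetical sketch and not part of the library: convert_folder and its parameters are names introduced here. It takes the conversion step as a callable, so it can wrap either converter and still runs without the library installed.

```python
from pathlib import Path
from typing import Callable, List


def convert_folder(
    src_dir: str,
    out_dir: str,
    convert: Callable[[bytes], str],
    suffix: str,
) -> List[Path]:
    """Convert every .docx file in src_dir and write the results to out_dir.

    `convert` receives the raw bytes of one DOCX file and returns the
    converted text; `suffix` is the output extension, e.g. ".html".
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for docx_path in sorted(Path(src_dir).glob("*.docx")):
        # Skip the "~$name.docx" lock files Word leaves next to open documents.
        if docx_path.name.startswith("~$"):
            continue
        content = docx_path.read_bytes()
        target = out / (docx_path.stem + suffix)
        target.write_text(convert(content), encoding="utf-8")
        written.append(target)
    return written


# Plugging in the library, using only the calls shown in the quick start:
#
# from docx_html_txt.docx_to_html import DocxToHtmlConverter
#
# def docx_to_html(content: bytes) -> str:
#     converter = DocxToHtmlConverter(content, use_default_values=True)
#     return converter.convert_to_html()
#
# convert_folder("docs", "out", docx_to_html, ".html")
```

Passing the converter in as a function keeps the file handling independent of the output format, so the same helper would also serve DocxToTxtConverter with a ".txt" suffix.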
    

Examples 📚

Original DOCX File

[Image: original DOCX file opened in LibreOffice]

Converted to HTML

[Image: converted HTML output]

Converted to Plain Text

[Image: converted TXT output]

API Reference 📜

For detailed API documentation, please visit our Read the Docs page.

Enjoy using DOCX-HTML-TXT Converter! 🚀✨
