A library for converting DOCX documents to HTML and plain text

Docx Parser and Converter 📄✨

A powerful library for converting DOCX documents into HTML and plain text, with detailed parsing of document properties and styles.

Introduction 🌟

Welcome to the DOCX-HTML-TXT Converter project! This library allows you to easily convert DOCX documents into HTML and plain text formats, extracting detailed properties and styles using Pydantic models.

Project Overview 🛠️

The project is structured to parse DOCX files, convert their content into structured data using Pydantic models, and provide conversion utilities to transform this data into HTML or plain text.
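
The parse, model, convert pipeline described above can be illustrated with a minimal standard-library sketch. This is not the library's implementation: the names `Run`, `Paragraph`, `parse_paragraphs`, `to_html`, `to_text`, and `make_sample_docx` are hypothetical, plain dataclasses stand in for the Pydantic models, and only paragraph text and a simplified bold flag are handled.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from html import escape

# WordprocessingML namespace used by word/document.xml inside a DOCX archive
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


@dataclass
class Run:
    """A span of text with uniform formatting (hypothetical stand-in model)."""
    text: str
    bold: bool = False


@dataclass
class Paragraph:
    """A paragraph made of runs (hypothetical stand-in model)."""
    runs: list


def parse_paragraphs(docx_bytes: bytes) -> list:
    """Step 1: read word/document.xml from the archive and build models."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{W_NS}p"):
        runs = []
        for r in p.iter(f"{W_NS}r"):
            text = "".join(t.text or "" for t in r.iter(f"{W_NS}t"))
            props = r.find(f"{W_NS}rPr")
            # Simplification: any <w:b> element counts as bold; w:val is ignored.
            bold = props is not None and props.find(f"{W_NS}b") is not None
            runs.append(Run(text=text, bold=bold))
        paragraphs.append(Paragraph(runs=runs))
    return paragraphs


def to_html(paragraphs) -> str:
    """Step 2a: render the structured data as HTML."""
    parts = []
    for para in paragraphs:
        inner = "".join(
            f"<b>{escape(run.text)}</b>" if run.bold else escape(run.text)
            for run in para.runs
        )
        parts.append(f"<p>{inner}</p>")
    return "\n".join(parts)


def to_text(paragraphs) -> str:
    """Step 2b: render the structured data as plain text."""
    return "\n".join("".join(run.text for run in para.runs) for para in paragraphs)


def make_sample_docx() -> bytes:
    """Build a tiny archive holding only word/document.xml.

    Enough for this sketch, but not a complete DOCX that Word would open.
    """
    document_xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:body>"
        "<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Hello</w:t></w:r>"
        '<w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>'
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


paragraphs = parse_paragraphs(make_sample_docx())
print(to_html(paragraphs))
print(to_text(paragraphs))
```

Keeping the parsed structure separate from the renderers is what lets a single parse feed both output formats. The real library additionally resolves styles, numbering, and document properties, which this sketch leaves out.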

Key Features 🌟

  • Convert DOCX documents to HTML or plain text.
  • Parse and extract detailed document properties and styles.
  • Structured data representation using Pydantic models.

Installation 💾

To install the library, use pip:

pip install docx-parser-converter

Usage 🚀

Importing the Library

To start using the library, import the necessary modules:

from docx_parser_converter.docx_to_html import DocxToHtmlConverter
from docx_parser_converter.docx_to_txt import DocxToTxtConverter
from docx_parser_converter.docx_parsers.utils import read_binary_from_file_path

Quick Start Guide 📖

  1. Convert to HTML:

    from docx_parser_converter.docx_to_html import DocxToHtmlConverter
    from docx_parser_converter.docx_parsers.utils import read_binary_from_file_path

    docx_path = "path_to_your_docx_file.docx"
    html_output_path = "output.html"

    # Read the DOCX file as raw bytes
    docx_file_content = read_binary_from_file_path(docx_path)

    # Convert to HTML and write the result to disk
    converter = DocxToHtmlConverter(docx_file_content, use_default_values=True)
    html_output = converter.convert_to_html()
    converter.save_html_to_file(html_output, html_output_path)
    
  2. Convert to Plain Text:

    from docx_parser_converter.docx_to_txt import DocxToTxtConverter
    from docx_parser_converter.docx_parsers.utils import read_binary_from_file_path

    docx_path = "path_to_your_docx_file.docx"
    txt_output_path = "output.txt"

    # Read the DOCX file as raw bytes
    docx_file_content = read_binary_from_file_path(docx_path)

    # Convert to plain text and write the result to disk
    converter = DocxToTxtConverter(docx_file_content, use_default_values=True)
    txt_output = converter.convert_to_txt(indent=True)
    converter.save_txt_to_file(txt_output, txt_output_path)
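
The single-file steps above extend naturally to a whole folder. The following is a minimal sketch, not part of the library: `convert_directory` and `docx_bytes_to_html` are hypothetical helper names, and the wrapper assumes that `convert_to_html()` returns a string, which the Quick Start suggests but does not state. The library import is deferred into the wrapper so the directory helper can be used with any bytes-to-string converter.

```python
from pathlib import Path
from typing import Callable


def docx_bytes_to_html(content: bytes) -> str:
    """Hypothetical wrapper around the calls shown in the Quick Start Guide."""
    # Imported lazily so this module loads even where the library is absent.
    from docx_parser_converter.docx_to_html import DocxToHtmlConverter

    # Assumption: convert_to_html() returns the HTML as a str.
    return DocxToHtmlConverter(content, use_default_values=True).convert_to_html()


def convert_directory(
    src_dir,
    out_dir,
    convert: Callable[[bytes], str],
    suffix: str = ".html",
) -> list:
    """Convert every *.docx file in src_dir and write the results to out_dir."""
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for docx_path in sorted(src.glob("*.docx")):
        target = out / (docx_path.stem + suffix)
        target.write_text(convert(docx_path.read_bytes()), encoding="utf-8")
        written.append(target)
    return written


# Example wiring (requires docx-parser-converter to be installed):
# convert_directory("docs_in", "docs_out", docx_bytes_to_html)
```

Passing the converter in as a callable keeps the file handling independent of the output format, so a plain-text wrapper built on `DocxToTxtConverter` could be used the same way with `suffix=".txt"`.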
    

Examples 📚

Original DOCX File

[Image: Original DOCX File in LibreOffice]

Converted to HTML

[Image: Converted HTML Output]

Converted to Plain Text

[Image: Converted TXT Output]

API Reference 📜

For detailed API documentation, please visit our Read the Docs page.

Enjoy using DOCX-HTML-TXT Converter! 🚀✨

Download files

Source Distribution

docx-parser-converter-0.5.tar.gz (487.4 kB)

Uploaded Source

Built Distribution

docx_parser_converter-0.5-py3-none-any.whl (62.2 kB)

Uploaded Python 3

File details

Details for the file docx-parser-converter-0.5.tar.gz.

File metadata

  • Download URL: docx-parser-converter-0.5.tar.gz
  • Size: 487.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.0

File hashes

Hashes for docx-parser-converter-0.5.tar.gz
Algorithm Hash digest
SHA256 efdd8bf81031d69a228e9e1ee437badd418d91578bdeeebb79777afc0f5caa89
MD5 23194da5d10ad97a782fd41974c8caa1
BLAKE2b-256 751c2f54c43c2207ea2281077ccc7e6e7123f719b590ac3b33fbd00687641b2c

File details

Details for the file docx_parser_converter-0.5-py3-none-any.whl.

File hashes

Hashes for docx_parser_converter-0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 324eea537f392c385919cbe3eb7bec019de1624eca5f3b6a44c32c6e9bba9965
MD5 97e52b2043cdb05b0aa4ffeccaf1cd97
BLAKE2b-256 db4b826be9289e6f241e10bd872ff308f72bf5027f1aeb1453d0bd35dad2e11f
