A Python package to convert .doc files to .txt using the Antiword binary.

Project description

Doc2txt-antiword

Overview

doc2txt-antiword (Antiword Converter) is a Python package that converts Microsoft Word .doc files to plain text using the Antiword binary. It provides a simple interface that either returns the converted text or writes it to an output file.

Features:

  • Converts .doc files to plain .txt format.
  • Uses the bundled Antiword binary for text extraction.
  • Simple API: a single method call converts a document.
  • Error handling for common issues like missing files or incorrect formats.
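
The package's own error handling is not documented in detail here, so the following is only a caller-side sketch of the two checks the list mentions (missing file, wrong extension). The helper name `check_doc_path` is hypothetical and is not part of doc2txt-antiword.

```python
from pathlib import Path


def check_doc_path(path):
    """Return the path as a Path object if it looks like a usable .doc file.

    Hypothetical helper (not part of doc2txt-antiword): raises
    FileNotFoundError when the file is missing and ValueError when the
    extension is not .doc (for example a .docx file).
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No such file: {p}")
    if p.suffix.lower() != ".doc":
        raise ValueError(f"Expected a .doc file, got: {p.name}")
    return p
```

Running a check like this before conversion gives the caller a specific exception to handle instead of a generic conversion failure.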

Requirements

  • Python 3.6+
  • Antiword binary (included in the package)

Installation

To install the doc2txt-antiword Python package from PyPI using pip:

pip install doc2txt-antiword

To install the doc2txt-antiword package from source instead, follow these steps:

  1. Clone the repository:

    git clone https://github.com/Bhola-kumar/doc2txt-antiword.git
    
  2. Navigate into the directory:

    cd doc2txt-antiword
    
  3. Install the package using pip:

    pip install .
    

    This will install the doc2txt-antiword package and its dependencies.
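
After either installation route, one way to confirm that the distribution is visible to the current interpreter is to query its installed version. This is a minimal sketch using the standard-library `importlib.metadata` module, which requires Python 3.8 or later; the helper name `installed_version` is an illustrative assumption.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("doc2txt-antiword")
    if version is None:
        print("doc2txt-antiword is not installed in this environment.")
    else:
        print("doc2txt-antiword", version, "is installed.")
```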

Usage

Once installed, you can use the AntiwordConverter class to convert .doc files into .txt format.

Example: Converting a .doc File and Returning the Text

from doc2txt.converter import AntiwordConverter

# Initialize the converter
converter = AntiwordConverter()

# Path to the input .doc file
doc_path = '/path/to/your/document.doc'

# Convert the .doc file to text and print the result
converted_text = converter.convert_doc_to_txt(doc_path)

if converted_text:
    print("Converted Text:")
    print(converted_text)
else:  # converted_text is None
    print("Conversion failed.")
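
Building on the example above, the returned text can be written to disk by the caller. The sketch below converts every .doc file in a folder and saves each result as a .txt file. It assumes, as the example's comment suggests, that the conversion call returns None on failure. The helper `convert_folder` and its parameters are hypothetical and not part of the package; the conversion function is passed in as an argument so the helper itself does not depend on doc2txt-antiword.

```python
from pathlib import Path


def convert_folder(src_dir, out_dir, convert):
    """Convert each .doc file in src_dir and write a .txt file into out_dir.

    `convert` is any callable that takes a file path string and returns the
    extracted text, or None on failure (assumed behaviour).
    Returns two lists: the .txt paths written and the .doc paths that failed.
    """
    src = Path(src_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written, failed = [], []
    for doc in sorted(src.glob("*.doc")):
        text = convert(str(doc))
        if text is None:
            failed.append(doc)
            continue
        target = out / (doc.stem + ".txt")
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written, failed


# Possible usage with the package (paths are placeholders):
#
#   from doc2txt.converter import AntiwordConverter
#   converter = AntiwordConverter()
#   written, failed = convert_folder(
#       "/path/to/docs", "/path/to/output", converter.convert_doc_to_txt
#   )
```

Testing for None rather than truthiness means an empty document still produces an (empty) output file instead of being reported as a failure.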

Download files

Source Distribution

doc2txt_antiword-0.1.3.tar.gz (183.0 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4
  • SHA256: a7b5648d7fe8f5c734d40460ff147fb27153543a3845e1978640b345b130c27c
  • MD5: 41549d226c7eca446a1a39e04bec9818
  • BLAKE2b-256: ff1c001e8798f6fceb5163c2b57beb0bc1eb22c2c4a5f9761a6065007090588d

Built Distribution

doc2txt_antiword-0.1.3-py3-none-any.whl (255.2 kB)

  • Tags: Python 3
  • SHA256: f0327eef7ebf1c4e7d8c7a85cef0c66c1a3f4abfe5da56de1b2855627b645a14
  • MD5: 40979f08c2bb53ab0a51114676abfaf5
  • BLAKE2b-256: 2d0f627cd82eb19a058e6639db11a4127d83c2adb7af821c4fffb3f65603bfae
