A Python package to convert .doc files to .txt using the Antiword binary.

Doc2txt-antiword

Overview

Doc2txt-antiword is a Python package that converts Microsoft Word .doc files to plain text using the Antiword binary. It provides a simple interface that either returns the converted text or writes it to an output file.

Features:

  • Converts .doc files to plain .txt format.
  • Uses the Antiword binary for text extraction.
  • Simple API for converting documents with just one function.
  • Error handling for common issues like missing files or incorrect formats.
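As a rough illustration of the approach described above, the sketch below calls an Antiword executable through the standard library and returns None for the failure cases the feature list mentions. It is not the package's implementation: the function name run_antiword and the binary parameter are hypothetical, and it assumes an antiword executable is reachable on the PATH, whereas the package ships its own copy.

```python
import subprocess
from pathlib import Path
from typing import Optional


def run_antiword(doc_path: str, binary: str = "antiword") -> Optional[str]:
    """Return the text Antiword extracts from doc_path, or None on failure.

    Hypothetical helper for illustration; not part of doc2txt-antiword.
    """
    path = Path(doc_path)
    if not path.is_file():
        return None  # missing input file
    if path.suffix.lower() != ".doc":
        return None  # Antiword reads the legacy .doc format, not .docx

    try:
        # Antiword writes the extracted text to standard output.
        result = subprocess.run(
            [binary, str(path)],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except OSError:
        return None  # binary not found or not executable

    if result.returncode != 0:
        return None  # Antiword reported an error
    # Assumption: the output is UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")
```

Returning None instead of raising mirrors the check used in the usage example, where a falsy result is treated as a failed conversion.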

Requirements

  • Python 3.6+
  • Antiword binary (included in the package)

Installation

To install the doc2txt-antiword Python package from PyPI using pip:

pip install doc2txt-antiword

Alternatively, to install the doc2txt-antiword package from source, follow these steps:

  1. Clone the repository:

    git clone https://github.com/Bhola-kumar/doc2txt-antiword.git
    
  2. Navigate into the directory:

    cd doc2txt-antiword
    
  3. Install the package using pip:

    pip install .
    

    This will install the doc2txt-antiword package and its dependencies.
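After either installation route, a quick check like the following can confirm that the package is visible to the current interpreter. This is a minimal sketch: the module name doc2txt is taken from the import in the usage example, and is_installed is a hypothetical helper, not something the package provides.

```python
import importlib.util


def is_installed(module_name: str = "doc2txt") -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("doc2txt importable:", is_installed())
```

If this reports False, the package was probably installed into a different Python environment than the one running the script.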

Usage

Once installed, you can use the AntiwordConverter class to convert .doc files into .txt format.

Example: Converting a .doc File and Returning the Text

from doc2txt.converter import AntiwordConverter

# Initialize the converter
converter = AntiwordConverter()

# Path to the input .doc file
doc_path = '/path/to/your/document.doc'

# Convert the .doc file to text and print the result
converted_text = converter.convert_doc_to_txt(doc_path)

if converted_text:
    print("Converted Text:")
    print(converted_text)
else:  # converted_text is None or empty
    print("Conversion failed.")
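The overview says the package can also write the result to an output file, but the parameters for that are not shown here. The sketch below therefore uses only the returned text and the standard library to save it. The helper save_converted_text is hypothetical; it accepts any callable that maps a path to text (or None), such as the convert_doc_to_txt method from the example above.

```python
from pathlib import Path
from typing import Callable, Optional


def save_converted_text(
    convert: Callable[[str], Optional[str]],
    doc_path: str,
    txt_path: Optional[str] = None,
) -> Optional[Path]:
    """Convert doc_path with `convert` and write the text to a .txt file.

    Hypothetical helper, not part of the package. Returns the output path,
    or None if the conversion produced no text.
    """
    text = convert(doc_path)
    if not text:
        return None
    # Default to the input name with a .txt extension (report.doc -> report.txt).
    out = Path(txt_path) if txt_path else Path(doc_path).with_suffix(".txt")
    out.write_text(text, encoding="utf-8")
    return out


# Possible usage, assuming the package is installed:
#   from doc2txt.converter import AntiwordConverter
#   converter = AntiwordConverter()
#   save_converted_text(converter.convert_doc_to_txt, "/path/to/your/document.doc")
```

Passing the converter as a callable keeps the helper independent of the package, so it can be exercised with a stand-in function when Antiword is unavailable.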

Download files

Source Distribution

doc2txt_antiword-0.1.2.tar.gz (183.0 kB)

Uploaded Source

Built Distribution

doc2txt_antiword-0.1.2-py3-none-any.whl (255.1 kB)

Uploaded Python 3

File details

Details for the file doc2txt_antiword-0.1.2.tar.gz.

File metadata

  • Download URL: doc2txt_antiword-0.1.2.tar.gz
  • Size: 183.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for doc2txt_antiword-0.1.2.tar.gz
Algorithm     Hash digest
SHA256        07c9a04f59f5b237129c5a87eb8b2e5b62066d4abb7e41a979357063186fe8c0
MD5           8d05364a14161df534e1400da5fedc3c
BLAKE2b-256   8c1e724351f4bf1bf587e4278de4b44323909cb19bf2e9b0dbbdc891bb57d612

File details

Details for the file doc2txt_antiword-0.1.2-py3-none-any.whl.

File hashes

Hashes for doc2txt_antiword-0.1.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        b113f54dcc18001bdfb266f6f8d06dfcdffd4450175c13aba7bbd56dad7dad6a
MD5           cbd260924da5d28c5c0335ce618d49a8
BLAKE2b-256   4faabc1261eb162d39480fc47f8b2981717861afc86319b4926ab7172214cf74
