A Python package to convert .doc files to .txt using the Antiword binary.
Doc2txt-antiword
Overview
Doc2txt-antiword is a Python package that converts Microsoft Word .doc files to plain text using the Antiword binary. It provides a simple interface that either returns the converted text or writes it to an output file.
Features:
- Converts .doc files to plain .txt format.
- Uses the Antiword binary for high-accuracy extraction.
- Simple API for converting documents with just one function.
- Error handling for common issues like missing files or incorrect formats.
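The package's own error handling is not documented in detail here. As a rough sketch of the kind of check a caller could run before conversion, the snippet below tests that a path exists, has a .doc extension, and starts with the OLE2 compound-file signature used by legacy Word documents. The names `looks_like_doc` and `OLE2_MAGIC` are hypothetical helpers for illustration and are not part of doc2txt-antiword; very old Word files that predate the OLE2 container may be rejected by this check.

```python
from pathlib import Path

# Signature found at the start of OLE2 compound files,
# the container format used by legacy .doc documents.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_doc(path):
    """Return True if path exists, ends in .doc, and starts with the OLE2 signature."""
    p = Path(path)
    # Reject missing paths, directories, and other extensions (e.g. .docx).
    if not p.is_file() or p.suffix.lower() != ".doc":
        return False
    # Compare the first 8 bytes of the file against the signature.
    with p.open("rb") as fh:
        return fh.read(len(OLE2_MAGIC)) == OLE2_MAGIC
```

A check like this only screens out obvious mismatches, such as a renamed text file; it does not guarantee that Antiword can parse the document.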
Requirements
- Python 3.6+
- Antiword binary (included in the package)
Installation
To install the doc2txt-antiword Python package with pip:
pip install doc2txt-antiword
To install the doc2txt-antiword package from source instead, follow these steps:
- Clone the repository:
  git clone https://github.com/Bhola-kumar/doc2txt-antiword.git
- Navigate into the cloned directory:
  cd doc2txt-antiword
- Install the package using pip:
  pip install .
This will install the doc2txt-antiword package and its dependencies.
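To confirm that the installation succeeded, one option is to ask Python for the installed distribution's version. This is a minimal sketch, not something the package itself provides: `installed_version` is a hypothetical helper, and `importlib.metadata` is only in the standard library from Python 3.8 onward, so it will not run on 3.6 or 3.7 without the `importlib_metadata` backport.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name="doc2txt-antiword"):
    """Return the installed version string of dist_name, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version())
```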
Usage
Once installed, you can use the AntiwordConverter class to convert .doc files into .txt format.
Example: Converting a .doc File and Returning the Text
from doc2txt.converter import AntiwordConverter
# Initialize the converter
converter = AntiwordConverter()
# Path to the input .doc file
doc_path = '/path/to/your/document.doc'
# Convert the .doc file to text and print the result
converted_text = converter.convert_doc_to_txt(doc_path)
if converted_text:
    print("Converted Text:")
    print(converted_text)
else:  # converted_text is None
    print("Conversion failed.")
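The example above only prints the result. If the text should end up in a .txt file, one way is to write the returned string yourself, as sketched below. `convert_and_save` is a hypothetical helper, not part of the package; it assumes only what the example above shows, namely that `convert_doc_to_txt` returns the text as a string or `None` on failure. The choice of UTF-8 for the output file is also an assumption.

```python
from pathlib import Path


def convert_and_save(converter, doc_path, txt_path=None):
    """Convert doc_path with converter and write the text to txt_path.

    If txt_path is omitted, the output goes next to the input with a .txt suffix.
    Returns the output Path, or None if the conversion produced no text.
    """
    doc_path = Path(doc_path)
    out_path = Path(txt_path) if txt_path else doc_path.with_suffix(".txt")
    text = converter.convert_doc_to_txt(str(doc_path))
    if not text:
        # Nothing to write: the converter returned None or an empty string.
        return None
    out_path.write_text(text, encoding="utf-8")
    return out_path


# Usage with the real converter (requires the package to be installed):
# from doc2txt.converter import AntiwordConverter
# convert_and_save(AntiwordConverter(), "/path/to/your/document.doc")
```

Passing the converter in as an argument keeps the helper independent of how the converter is constructed, which also makes it easy to exercise with a stand-in object.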
File details
Details for the file doc2txt_antiword-0.1.2.tar.gz.
File metadata
- Download URL: doc2txt_antiword-0.1.2.tar.gz
- Upload date:
- Size: 183.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 07c9a04f59f5b237129c5a87eb8b2e5b62066d4abb7e41a979357063186fe8c0 |
| MD5 | 8d05364a14161df534e1400da5fedc3c |
| BLAKE2b-256 | 8c1e724351f4bf1bf587e4278de4b44323909cb19bf2e9b0dbbdc891bb57d612 |
File details
Details for the file doc2txt_antiword-0.1.2-py3-none-any.whl.
File metadata
- Download URL: doc2txt_antiword-0.1.2-py3-none-any.whl
- Upload date:
- Size: 255.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b113f54dcc18001bdfb266f6f8d06dfcdffd4450175c13aba7bbd56dad7dad6a |
| MD5 | cbd260924da5d28c5c0335ce618d49a8 |
| BLAKE2b-256 | 4faabc1261eb162d39480fc47f8b2981717861afc86319b4926ab7172214cf74 |