Python wrapper for antiword with bundled binary and data files
pyantiword
A Python wrapper for antiword with the antiword binary and data files bundled for easy use in any environment.
Features
- Bundles the antiword binary and required data files
- No external dependencies or system requirements
- Simple Python interface to extract text from .doc (Microsoft Word) files
Installation
pip install pyantiword
Usage
from pyantiword.antiword_wrapper import extract_text
# Extract text from a .doc file
text = extract_text("path/to/document.doc")
print(text)
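To process several documents at once, the same call can be applied to every .doc file in a folder. The following is a minimal sketch, not part of pyantiword: the helper name extract_all and its extractor parameter are hypothetical, and the extractor is passed in so the helper works with any function that takes a path and returns text.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_all(folder: str, extractor: Callable[[str], str]) -> Dict[str, str]:
    """Map each .doc file name in `folder` to the text returned by `extractor`.

    `extract_all` is a hypothetical helper, not a pyantiword function.
    """
    results = {}
    # "*.doc" matches names ending in .doc only, so .docx files are skipped.
    for path in sorted(Path(folder).glob("*.doc")):
        results[path.name] = extractor(str(path))
    return results


# Usage with pyantiword (requires the package to be installed):
# from pyantiword.antiword_wrapper import extract_text
# texts = extract_all("path/to/folder", extract_text)
```

Files are visited in sorted order so the result is stable between runs.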
Requirements
- Python 3.6+
- No external dependencies
How it works
This package includes the antiword binary and its required data files. When you install pyantiword, everything you need is included, so there is no need to install antiword separately.
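As an illustration of the general idea, a wrapper around a bundled binary can build the path to the executable inside the package directory and run it as a subprocess. The sketch below is not the actual pyantiword implementation: the function names, the binary name "antiword", and the assumption that the data files sit next to the binary are all hypothetical. It relies on antiword printing the extracted text to standard output and on the ANTIWORDHOME environment variable, which antiword consults when looking for its mapping files.

```python
import os
import subprocess
from pathlib import Path


def build_antiword_call(doc_path, package_dir, binary_name="antiword"):
    """Return (command, environment) for running a bundled antiword binary.

    Hypothetical helper: the layout of `package_dir` is an assumption.
    """
    package_dir = Path(package_dir)
    command = [str(package_dir / binary_name), str(doc_path)]
    # Point antiword at the bundled data files via ANTIWORDHOME.
    env = dict(os.environ, ANTIWORDHOME=str(package_dir))
    return command, env


def run_bundled_antiword(doc_path, package_dir):
    """Run antiword on `doc_path` and return the text it prints to stdout."""
    command, env = build_antiword_call(doc_path, package_dir)
    completed = subprocess.run(
        command,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout
```

Separating command construction from execution keeps the path and environment logic easy to inspect without running the binary.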
Building and Publishing
For now, the build_antiword.sh script is only guaranteed to work on Ubuntu, with Python 3.6 or later.
In the future, we might make the build more reproducible with a mandatory virtual environment step or a Docker container.
To build and publish pyantiword to PyPI, follow these steps:
1. Build the antiword binary
Before packaging, ensure the antiword binary and data files are present by running:
./build_antiword.sh
2. Install build tools
If you haven't already, install the required Python packaging tools:
pip install --upgrade build twine
3. Build the package
This will create both a source distribution and a wheel in the dist/ directory:
python -m build
4. Publish to PyPI
Upload the package to PyPI using Twine:
twine upload dist/*
You will need a PyPI account and credentials to complete this step. In most cases, the credential is a PyPI API token.
Note:
Always ensure the antiword binary and data files are up-to-date in the pyantiword/ directory before building the package.
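A small pre-build check can catch a missing binary before python -m build runs. This is a sketch under stated assumptions: the helper missing_bundle_files is hypothetical, and the default file name "antiword" inside pyantiword/ is a guess at the layout, so adjust the required names to match what build_antiword.sh actually produces. It checks for presence only, not whether the files are up to date.

```python
from pathlib import Path


def missing_bundle_files(package_dir="pyantiword", required=("antiword",)):
    """Return the names in `required` that are not regular files in `package_dir`.

    Hypothetical helper; the default file names are assumptions.
    """
    pkg = Path(package_dir)
    return [name for name in required if not (pkg / name).is_file()]


if __name__ == "__main__":
    missing = missing_bundle_files()
    if missing:
        raise SystemExit("Missing bundled files: " + ", ".join(missing))
    print("All bundled files are present.")
```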
License
Author
Vitor Hugo Moreira Reis