Python wrapper for antiword with bundled binary and data files

Project description

pyantiword

A Python wrapper for antiword with the antiword binary and data files bundled for easy use in any environment.

Features

  • Bundles the antiword binary and required data files
  • No external dependencies or system requirements
  • Simple Python interface to extract text from .doc (Microsoft Word) files

Installation

pip install pyantiword

Usage

from pyantiword.antiword_wrapper import extract_text

# Extract text from a .doc file
text = extract_text("path/to/document.doc")
print(text)
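
For more than one document, the same call can be applied across a directory. The sketch below is only an illustration: extract_folder is a hypothetical helper, not part of pyantiword, and it takes the extraction function as an argument so it makes no assumption about how extract_text reports errors.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_folder(folder: str, extract: Callable[[str], str]) -> Dict[str, str]:
    """Apply `extract` to every .doc file directly inside `folder`.

    Returns a mapping of file name to extracted text.
    `extract` is any callable that takes a path string and returns text.
    """
    results: Dict[str, str] = {}
    # sorted() keeps the processing order stable across runs.
    for path in sorted(Path(folder).glob("*.doc")):
        results[path.name] = extract(str(path))
    return results
```

With pyantiword installed, this could be called as extract_folder("path/to/folder", extract_text), where the folder path is a placeholder.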

Requirements

  • Python 3.6+
  • No external dependencies

How it works

This package includes the antiword binary and its required data files. When you install pyantiword, everything you need is included—no need to install antiword separately.
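
As a rough illustration of the idea, a wrapper around a bundled binary might look like the sketch below. This is not pyantiword's actual implementation: the function names and the assumption that a binary called "antiword" sits beside its data files in one directory are hypothetical. The one antiword-specific detail used is the ANTIWORDHOME environment variable, which antiword checks first when looking for its data files.

```python
import os
import subprocess
from pathlib import Path


def build_invocation(package_dir, doc_path):
    """Return (argv, env) for running a bundled antiword binary.

    Assumes (hypothetically) that the binary is named 'antiword' and
    lives in `package_dir` together with its data files.
    """
    package_dir = Path(package_dir)
    argv = [str(package_dir / "antiword"), str(doc_path)]
    env = dict(os.environ)
    # Point antiword at the bundled data files instead of system locations.
    env["ANTIWORDHOME"] = str(package_dir)
    return argv, env


def run_bundled(package_dir, doc_path):
    """Run the bundled binary and return its standard output as text."""
    argv, env = build_invocation(package_dir, doc_path)
    completed = subprocess.run(
        argv,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.decode("utf-8", errors="replace")
```

Splitting command construction from execution keeps the first function easy to inspect without a binary present.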

Building and Publishing

For now, the build_antiword.sh script is only guaranteed to work on Ubuntu with Python 3.6 or later. In the future, we might make the build more robust with a mandatory virtual environment step or a Docker container.

To build and publish pyantiword to PyPI, follow these steps:

1. Build the antiword binary

Before packaging, ensure the antiword binary and data files are present by running:

./build_antiword.sh

2. Install build tools

If you haven't already, install the required Python packaging tools:

pip install --upgrade build twine

3. Build the package

This will create both a source distribution and a wheel in the dist/ directory:

python -m build

4. Publish to PyPI

Upload the package to PyPI using Twine:

twine upload dist/*

You will need a PyPI account and credentials to complete this step, most likely an API token. When using a token, enter __token__ as the username and the token itself as the password.


Note:
Always ensure the antiword binary and data files are up-to-date in the pyantiword/ directory before building the package.
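
One way to make this check routine is a small pre-build script. The sketch below is an assumption-laden example: missing_bundle_files is a hypothetical helper, and the file names passed to it must be whatever build_antiword.sh actually produces, which this document does not list.

```python
from pathlib import Path
from typing import Iterable, List


def missing_bundle_files(package_dir: str, required: Iterable[str]) -> List[str]:
    """Return the names in `required` that do not exist in `package_dir`."""
    root = Path(package_dir)
    return [name for name in required if not (root / name).exists()]
```

For example, missing_bundle_files("pyantiword", ["antiword"]) returns an empty list when a file named antiword is present in that directory ("antiword" being an assumed file name); a non-empty result means the build should be stopped.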

License

MIT License

Author

Vitor Hugo Moreira Reis

Download files

Source Distribution

pyantiword-0.1.2.tar.gz (144.6 kB)

Uploaded Source

Built Distribution

pyantiword-0.1.2-py3-none-any.whl (208.5 kB)

Uploaded Python 3

File details

Details for the file pyantiword-0.1.2.tar.gz.

File metadata

  • Download URL: pyantiword-0.1.2.tar.gz
  • Size: 144.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for pyantiword-0.1.2.tar.gz
Algorithm Hash digest
SHA256 4e64b1994b4e059ed8bbf25b49880df4d761e515a3b6fa0934023d325cb33eb3
MD5 5956f824883e83a776ce836011c58bef
BLAKE2b-256 91ac42b884dfc042459cb056172b39afe005c9f17f9a85fdd608e6e8518f52f0
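
A downloaded file can be checked against the published SHA256 digest with the standard library. This is a minimal sketch; sha256_of and matches_published are hypothetical helper names, and the local file path is whatever location the file was saved to.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with a published value, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Passing the path of the downloaded archive and the SHA256 value listed above to matches_published returns True only if the file is byte-for-byte identical to the published one.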

File details

Details for the file pyantiword-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: pyantiword-0.1.2-py3-none-any.whl
  • Size: 208.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for pyantiword-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 da2f8810f14b4f48ca7fd95e8580cb057b0a30147b4f33bb48d12f0a97069b6b
MD5 c289c6afaad1708347990a21f15b1db6
BLAKE2b-256 4872319d8589e8f93e654bcbb26d8b5391767f4ef347681bb90467adee191dc9
