
A simple .doc-to-string converter for Python

Project description

doc2python

Extracts the text from .doc files as a string. This project is in early development and has very limited functionality.

Installation

    pip install doc2python

Use

    from doc2python import reader

    text = reader.toString('path/to/file.doc')

doc2python reads the file's contents as a UTF-8 encoded byte stream and converts it to a readable string. Currently, some special characters are not supported, and metadata may be extracted along with the text.
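The general idea of decoding a .doc file's bytes as UTF-8 can be illustrated with a small standalone sketch. This is not doc2python's actual implementation: the function names `extract_text` and `doc_file_to_string` and the `min_run` threshold are hypothetical. It also shows why metadata can leak into the output: any run of decodable, printable bytes is kept, whether it belongs to the document body or not.

```python
import re


def extract_text(data: bytes, min_run: int = 4) -> str:
    """Hypothetical sketch: decode raw bytes as UTF-8 and keep printable runs.

    Not doc2python's implementation; it only illustrates the approach.
    """
    # Bytes that are not valid UTF-8 (binary headers, tables) are silently dropped.
    decoded = data.decode("utf-8", errors="ignore")
    # Keep runs of at least `min_run` non-control characters; shorter runs are
    # usually binary noise that happened to decode. Metadata strings that are
    # stored as plain text will still pass this filter.
    runs = re.findall(r"[^\x00-\x1f\x7f]{%d,}" % min_run, decoded)
    return "\n".join(runs)


def doc_file_to_string(path: str) -> str:
    """Hypothetical helper: read a file in binary mode and extract its text."""
    with open(path, "rb") as f:
        return extract_text(f.read())
```

Raising `min_run` removes more binary noise but can also drop very short words that sit between control bytes.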

Roadmap

  • support for more special characters
  • add a parameter that lets users supply their own byte-to-character conversion tables
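The second roadmap item could look roughly like the sketch below. It is hypothetical and not part of doc2python's API: the name `bytes_to_text` and the `table` parameter are assumptions made for illustration. Each byte found in the user-supplied table is replaced by its mapped string, printable ASCII passes through, and all other bytes are dropped.

```python
from typing import Mapping, Optional


def bytes_to_text(data: bytes, table: Optional[Mapping[int, str]] = None) -> str:
    """Hypothetical sketch of a user-supplied byte -> character conversion table.

    Not a doc2python API. Bytes present in `table` are replaced by their mapped
    string; printable ASCII, tab and newline pass through; other bytes are skipped.
    """
    table = table or {}
    out = []
    for b in data:  # iterating over bytes yields ints in range 0..255
        if b in table:
            out.append(table[b])
        elif 0x20 <= b < 0x7F or b in (0x09, 0x0A):
            out.append(chr(b))
        # any other byte is treated as binary noise and skipped
    return "".join(out)
```

For example, a table such as `{0x93: "\u201c", 0x94: "\u201d", 0x0D: "\n", 0xE9: "\u00e9"}` maps the Windows-1252 curly quotes and "é" to their Unicode characters and turns carriage returns into newlines.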



Download files


Source Distribution

doc2python-0.0.7.tar.gz (76.5 kB)


Built Distribution

doc2python-0.0.7-py3-none-any.whl (3.1 kB)

