A simple .doc-to-string converter for Python
Project description
doc2python
Extracts the text from .doc files as a string. This project is in early development and has very limited functionality.
Installation
pip install doc2python
Usage
from doc2python import reader
text = reader.toString('path/to/file.doc')
doc2python reads the UTF-8 encoded bitstream contained in the file and converts it into a readable string. Some special characters are currently not supported, and metadata may be extracted alongside the text.
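To illustrate the general idea, here is a minimal sketch of turning a file's raw bytes into readable text by decoding them as UTF-8 and filtering out unreadable characters. It is not doc2python's implementation: the helper names `bytes_to_text` and `file_to_text` are hypothetical, and treating printable ASCII as "readable" is an assumption made for this example.

```python
import string

# Assumption for this sketch: "readable" means printable ASCII plus common
# whitespace. This is not doc2python's actual rule set.
_READABLE = set(string.printable)


def bytes_to_text(raw: bytes) -> str:
    """Decode raw bytes as UTF-8 and keep only readable characters.

    Hypothetical helper; not part of doc2python.
    """
    # Byte sequences that are not valid UTF-8 are silently dropped.
    decoded = raw.decode("utf-8", errors="ignore")
    # Control characters and anything outside printable ASCII are removed.
    return "".join(ch for ch in decoded if ch in _READABLE)


def file_to_text(path: str) -> str:
    """Read a file in binary mode and pass its bytes to bytes_to_text."""
    with open(path, "rb") as fh:
        return bytes_to_text(fh.read())
```

Because this filter keeps only ASCII, a character such as "é" is dropped, which mirrors the limitation on special characters noted above. Any bytes from the file's binary structure that happen to be printable also pass through, which is one way metadata can end up in the output.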
Roadmap
- Support for more special characters
- A parameter that lets users supply their own byte-to-character conversion sheets
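The planned conversion-sheet parameter could look roughly like the sketch below. Everything here is hypothetical: the function `convert`, the parameter `conversion_sheet`, and the default entries are assumptions for illustration, not part of doc2python's API. The defaults are taken from the Windows-1252 code page, where 0x93 and 0x94 are curly double quotes and 0x96 is an en dash.

```python
from typing import Dict, Mapping, Optional

# Hypothetical default sheet (assumption): a few Windows-1252 bytes mapped to
# the characters they represent in that code page.
DEFAULT_SHEET: Dict[int, str] = {
    0x93: "\u201c",  # left double quotation mark
    0x94: "\u201d",  # right double quotation mark
    0x96: "\u2013",  # en dash
}


def convert(raw: bytes, conversion_sheet: Optional[Mapping[int, str]] = None) -> str:
    """Convert bytes to text using a byte -> character sheet.

    `conversion_sheet` is a hypothetical user-supplied mapping whose entries
    override DEFAULT_SHEET. Not part of doc2python.
    """
    sheet: Dict[int, str] = dict(DEFAULT_SHEET)
    if conversion_sheet:
        sheet.update(conversion_sheet)

    out = []
    for byte in raw:  # iterating over bytes yields ints in 0-255
        if byte in sheet:
            out.append(sheet[byte])
        elif 0x20 <= byte < 0x7F or byte in (0x09, 0x0A, 0x0D):
            # Printable ASCII, tab, LF and CR pass through unchanged.
            out.append(chr(byte))
        # Any other byte has no mapping and is dropped.
    return "".join(out)
```

For example, `convert(b"caf\xe9", {0xE9: "é"})` returns `"café"`, while calling it without the sheet drops the unmapped byte. Letting user entries override the defaults keeps the common case simple while still allowing per-document fixes.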
Download files
Source Distribution
doc2python-0.0.7.tar.gz (76.5 kB)
Built Distribution
Hashes for doc2python-0.0.7-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f821d901b9f6c951f25a06a950211c336f71a5e61a252788c5aea44a59e26e85
MD5 | 4e1e419e89e5fc1470c24f233e20ff3f
BLAKE2b-256 | 7710462db23b8a85819771d4622ad5063c7b252817bb559a8b5c9bd55e96e3bb