GoText is a universal text extraction and preprocessing tool for Python that supports a wide variety of document formats.
GoText v0.9
Install
pip install gotext
How To Use
from gotext import GoDocument

# Process a single document
doc_path = 'docs/test.docx'
go_obj = GoDocument(doc_path=doc_path)
print(go_obj._text)  # text extracted from the document
print(go_obj.preprocess())  # preprocesses the document and returns the preprocessed text

# Process all documents within a directory
docs_dir = 'docs/'
go_obj = GoDocument(docs_dir=docs_dir)
print(go_obj._text)  # list of texts extracted from all the documents
print(go_obj.preprocess())  # preprocesses the documents and returns a list of preprocessed texts
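The README does not say in which order directory mode returns its texts, or how to tell which text came from which file. If you need that association, one option is to process the files one at a time. The following is a minimal sketch, not part of GoText: `extract_per_file` is a hypothetical helper that assumes only that `GoDocument(doc_path=...)` and the `_text` attribute behave as in the usage example, and it accepts an optional factory so it can be tried without GoText installed.

```python
import os
from typing import Any, Callable, Dict, Optional


def extract_per_file(docs_dir: str,
                     make_doc: Optional[Callable[[str], Any]] = None) -> Dict[str, str]:
    """Return a {file name: extracted text} mapping for the files in docs_dir.

    Hypothetical helper: assumes GoDocument(doc_path=...) exposes the
    extracted text as `_text`, as in the README usage example.
    """
    factory = make_doc
    if factory is None:
        # Imported lazily so the helper can be exercised with a stub factory.
        from gotext import GoDocument

        def _default_factory(path: str) -> Any:
            return GoDocument(doc_path=path)

        factory = _default_factory

    texts: Dict[str, str] = {}
    for name in sorted(os.listdir(docs_dir)):  # sorted for a stable order
        path = os.path.join(docs_dir, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories and other non-files
        texts[name] = factory(path)._text
    return texts
```

For example, `extract_per_file('docs/')['test.docx']` would give the text of the single document used above. How GoText handles files in formats it does not support is not documented here, so you may want to filter by extension before calling it.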
Feedback / Queries
For any queries or feedback, feel free to write to vaibhavhaswani@gmail.com
Download files
Source Distribution
gotext-0.9.5.tar.gz (4.9 kB)
File details
Details for the file gotext-0.9.5.tar.gz.
File metadata
- Download URL: gotext-0.9.5.tar.gz
- Upload date:
- Size: 4.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7209793f12d5641bf415adae0ec1f840682b1ba0deebbd0928c9e85ea1b001b8 |
| MD5 | 2732d0981aa873b2994a82cfa27a50cd |
| BLAKE2b-256 | 0fa5271e711fad20c7b2c515ffbb2b6a778cfa599602ef3a8dff5d2ec361eeb3 |
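The published digests let you check that a downloaded archive is intact. A minimal sketch using only Python's standard `hashlib` module, assuming `gotext-0.9.5.tar.gz` has already been downloaded; the helper names `sha256_of` and `verify` are illustrative and not part of GoText:

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest published for gotext-0.9.5.tar.gz.
EXPECTED_SHA256 = "7209793f12d5641bf415adae0ec1f840682b1ba0deebbd0928c9e85ea1b001b8"


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA-256 matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

Calling `verify("gotext-0.9.5.tar.gz")` returns `True` only if the archive matches the published digest. pip can enforce the same check at install time through its hash-checking mode, for example with a requirements line such as `gotext==0.9.5 --hash=sha256:7209793f12d5641bf415adae0ec1f840682b1ba0deebbd0928c9e85ea1b001b8`.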