GoText is a universal text extraction and preprocessing tool for Python that supports a wide variety of document formats.

Project description

GoText v0.9

Install

pip install gotext

How To Use

from gotext import GoDocument

# Process a single document
doc_path = 'docs/test.docx'
go_obj = GoDocument(doc_path=doc_path)
print(go_obj._text)         # text extracted from the document
print(go_obj.preprocess())  # preprocesses the document and returns the preprocessed text

# Process all the documents within a directory
docs_dir = 'docs/'
go_obj = GoDocument(docs_dir=docs_dir)
print(go_obj._text)         # list of texts extracted from all the documents
print(go_obj.preprocess())  # preprocesses the documents and returns a list of preprocessed texts
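
The calls above can be combined into a small batch-export helper. The following is only a sketch under stated assumptions: `export_preprocessed` and `_gotext_extract` are hypothetical names that are not part of GoText, and the sketch assumes that `preprocess()` returns a string for a single document. The extractor is passed in as a parameter so that the helper can be tried without GoText installed.

```python
from pathlib import Path


def _gotext_extract(path):
    """Default extractor: GoDocument used as in the example above."""
    # Imported lazily so the helper can be loaded without gotext installed.
    from gotext import GoDocument
    # Assumption: preprocess() returns a string for a single document.
    return GoDocument(doc_path=path).preprocess()


def export_preprocessed(docs_dir, out_dir, extract=_gotext_extract):
    """Write the preprocessed text of each file in docs_dir to out_dir.

    Hypothetical helper, not part of GoText. Returns the written paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(docs_dir).iterdir()):
        if not src.is_file():
            continue  # skip sub-directories
        text = extract(str(src))
        target = out / (src.name + ".txt")
        target.write_text(str(text), encoding="utf-8")
        written.append(target)
    return written
```

A call such as `export_preprocessed('docs/', 'out/')` would then write one `.txt` file per document. Processing the files one at a time, rather than passing `docs_dir` to `GoDocument`, keeps each text paired with its source file name, since the description above does not state how the returned list is ordered.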

Feedback / Queries

For any queries or feedback, feel free to write to vaibhavhaswani@gmail.com.

Download files

Source Distribution

gotext-0.9.5.tar.gz (4.9 kB)

File details

Details for the file gotext-0.9.5.tar.gz.

File metadata

  • Download URL: gotext-0.9.5.tar.gz
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for gotext-0.9.5.tar.gz

Algorithm    Hash digest
SHA256       7209793f12d5641bf415adae0ec1f840682b1ba0deebbd0928c9e85ea1b001b8
MD5          2732d0981aa873b2994a82cfa27a50cd
BLAKE2b-256  0fa5271e711fad20c7b2c515ffbb2b6a778cfa599602ef3a8dff5d2ec361eeb3
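
A downloaded archive can be checked against the SHA256 digest listed above before it is installed. The following is a minimal sketch using only Python's standard `hashlib` module; the function names `sha256_of` and `matches_expected` are illustrative and not part of GoText, and the file location in the usage note is an assumption.

```python
import hashlib

# SHA256 digest listed above for gotext-0.9.5.tar.gz
EXPECTED_SHA256 = "7209793f12d5641bf415adae0ec1f840682b1ba0deebbd0928c9e85ea1b001b8"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

Assuming the archive was saved to the current directory, `matches_expected("gotext-0.9.5.tar.gz")` should return `True` for an unmodified download.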
