
A Python package that runs OCR on files and extracts text, with optional insights.

Project description

About The Project

PYOSTIE is short for Python Open Source Text Information Extractor.

A simple, elegant library for extracting text from many file formats.

This module can extract text from PDFs, Office files, text files, and image files. We also generate an Excel file that gives you deeper insights into the text. Insights are currently extracted for image formats only. (More to come soon.)

Installation

  1. Clone the repo
git clone https://github.com/anirudhpnbb/Pyostie.git
  2. Install using pip or pip3
pip3 install pyostie

(or)

pip install pyostie
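
To confirm that the install step worked, a minimal sketch like the one below can be used. It relies only on the standard library's importlib.metadata; the helper name installed_version is hypothetical and not part of Pyostie, and it assumes the distribution is registered under the name used in the pip command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent.

    Hypothetical helper for illustration; it is not part of Pyostie.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "pyostie" is the name used in the pip command above.
    print(installed_version("pyostie"))
```

If this prints None, the package is not visible to the interpreter that ran the script, which usually means pip installed it into a different environment.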

Usage

import pyostie

# For image files with insights.

output = pyostie.extract(filename, insights=True, extension="jpg")  # The extension can also be "tif" or "png"
df, text = output.start()

# For image files without insights.

output = pyostie.extract(filename, insights=False, extension="jpg")
text = output.start()

# For PDF files:

output = pyostie.extract(filename, extension="pdf")
text = output.start()


# For Excel files

output = pyostie.extract(filename, extension="xlsx")
text = output.start() 

# For Word files

output = pyostie.extract(filename, extension="docx")
text = output.start()
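
The calls above differ only in the extension argument and, for images, the insights flag. As a rough sketch, the extension could be derived from the file name instead of being typed by hand. The helpers extension_of, extract_kwargs, and extract_text, as well as the IMAGE_EXTENSIONS set, are hypothetical names introduced here for illustration; only pyostie.extract(...) and output.start() come from the examples above.

```python
from pathlib import Path

# Image extensions mentioned in the usage examples above. This set is an
# assumption for illustration, not the library's authoritative list.
IMAGE_EXTENSIONS = {"jpg", "tif", "png"}


def extension_of(filename):
    """Return the lower-case file extension without the leading dot."""
    return Path(filename).suffix.lower().lstrip(".")


def extract_kwargs(filename, insights=False):
    """Build keyword arguments for pyostie.extract (hypothetical helper)."""
    ext = extension_of(filename)
    kwargs = {"extension": ext}
    # The insights flag is only passed for images, mirroring the examples above.
    if ext in IMAGE_EXTENSIONS:
        kwargs["insights"] = insights
    return kwargs


def extract_text(filename, insights=False):
    """Call pyostie.extract with arguments derived from the file name."""
    import pyostie  # imported lazily so the helpers above work without it

    output = pyostie.extract(filename, **extract_kwargs(filename, insights))
    # For images with insights=True, the examples above unpack two values
    # (df, text); otherwise start() returns the text alone.
    return output.start()
```

Keeping the argument-building step separate from the library call makes it easy to check the extension logic without running OCR.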

Future Work

In this version, we can only extract text from PDF, Excel, TXT, and CSV formats. We will soon be adding doc, ppt, pptx, and many more. Watch this space for updates.

Contact

Anirudh Palaparthi - @anirudh8889 - pnbbanirudh - aniruddhapnbb@gmail.com

NSK - nskpramod - nsk.pramod@gmail.com

Project Link: https://github.com/anirudhpnbb/Pyostie


Download files

Source Distribution

Pyostie-2.3.tar.gz (6.8 kB)

Uploaded Source

Built Distribution

Pyostie-2.3-py3-none-any.whl (7.5 kB)

Uploaded Python 3

File details

Details for the file Pyostie-2.3.tar.gz.

File metadata

  • Download URL: Pyostie-2.3.tar.gz
  • Upload date:
  • Size: 6.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for Pyostie-2.3.tar.gz
Algorithm Hash digest
SHA256 7a7cecc17e993ec10efce64efffb4ded2745c31c6935e05c17eb3642e6f87200
MD5 a043b381ababdf3c60adeee14eea5cdf
BLAKE2b-256 96a658a63f2b9cac205854122e99fe0bf9fc42da3b282d49dedb731df22262c3
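
A downloaded file can be checked against the SHA256 digest listed above with the standard library's hashlib. The following is a minimal sketch; sha256_of and matches_published_hash are hypothetical helper names, and the file is assumed to be in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, the call would be matches_published_hash("Pyostie-2.3.tar.gz", "7a7cecc17e993ec10efce64efffb4ded2745c31c6935e05c17eb3642e6f87200"), which returns True only if the local file is byte-for-byte identical to the published one.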

File details

Details for the file Pyostie-2.3-py3-none-any.whl.

File metadata

  • Download URL: Pyostie-2.3-py3-none-any.whl
  • Upload date:
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for Pyostie-2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 358a4c0764d4cb65bd749a30d093c92bcdff790558392373c1e047e67dc25eaf
MD5 caebb91ce919e0fe8fbe14ab15d6f7f6
BLAKE2b-256 af89d31e337a49ec1a32fddf480ec0279e75fc2dda671d5dc51f365e5215a30d
