A Python package to OCR files and extract their text, with insights.

Project description

About The Project

PYOSTIE is short for Python Open Source Text Information Extractor.

A very elegant and simple library to extract text from many file formats.

This module can extract text from PDFs, Office files, text files and image files. It can also generate an Excel file that gives deeper insights into the text. Insights are currently extracted for image formats only. (More to come soon.)

Installation

  1. Clone the repo
git clone https://github.com/anirudhpnbb/Pyostie.git
  2. Install using pip or pip3
pip3 install pyostie

(or)

pip install pyostie

Usage

import pyostie

# For image files with insights.

output = pyostie.extract(filename, insights=True, extension="jpg")  # The extension can also be "tif" or "png"
df, text = output.start()

# For image files without insights.

output = pyostie.extract(filename, insights=False, extension="jpg")
text = output.start()

# For PDF files:

output = pyostie.extract(filename, extension="pdf")
text = output.start()


# For Excel files

output = pyostie.extract(filename, extension="xlsx")
text = output.start() 

# For Word files

output = pyostie.extract(filename, extension="docx")
text = output.start()
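
The calls above differ only in the `extension` and `insights` arguments, so they can be wrapped in one helper. The following is a minimal sketch, not part of Pyostie: `extension_for`, `build_extract_kwargs` and `extract_text` are hypothetical names, the extension sets only list values shown in the usage examples above, and it assumes `start()` returns `(df, text)` when `insights=True` and plain text otherwise, as those examples suggest.

```python
from pathlib import Path

# Extensions taken from the usage examples above; anything else is
# unconfirmed by this README and is rejected by the sketch.
IMAGE_EXTENSIONS = {"jpg", "tif"}
DOCUMENT_EXTENSIONS = {"pdf", "xlsx", "docx"}


def extension_for(filename):
    """Return the lowercase file extension without the leading dot."""
    return Path(filename).suffix.lower().lstrip(".")


def build_extract_kwargs(filename, insights=False):
    """Build keyword arguments for pyostie.extract (hypothetical helper)."""
    ext = extension_for(filename)
    if ext not in IMAGE_EXTENSIONS | DOCUMENT_EXTENSIONS:
        raise ValueError(f"Extension not shown in the README: {ext!r}")
    kwargs = {"extension": ext}
    # The README passes `insights` only for image files.
    if ext in IMAGE_EXTENSIONS:
        kwargs["insights"] = insights
    return kwargs


def extract_text(filename, insights=False):
    """Return (text, df); df is None unless insights were requested."""
    import pyostie  # imported lazily so the helpers above work without it

    kwargs = build_extract_kwargs(filename, insights)
    result = pyostie.extract(filename, **kwargs).start()
    if kwargs.get("insights"):
        df, text = result  # assumed return order, as in the example above
        return text, df
    return result, None
```

With this in place, `extract_text("scan.jpg", insights=True)` would follow the same path as the first usage example, and `extract_text("report.pdf")` the PDF one.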

Future Work

In this version, we can extract text only from PDF, Excel, TXT and CSV formats. We will soon add doc, ppt, pptx and many more. Watch this space for updates.

Contact

Anirudh Palaparthi - @anirudh8889 - pnbbanirudh - aniruddhapnbb@gmail.com

NSK - nskpramod - nsk.pramod@gmail.com

Project Link: https://github.com/anirudhpnbb/Pyostie

Download files

Source Distribution

Pyostie-1.0.tar.gz (6.4 kB view details)

Uploaded Source

Built Distribution

Pyostie-1.0-py3-none-any.whl (7.5 kB view details)

Uploaded Python 3

File details

Details for the file Pyostie-1.0.tar.gz.

File metadata

  • Download URL: Pyostie-1.0.tar.gz
  • Upload date:
  • Size: 6.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.6.9

File hashes

Hashes for Pyostie-1.0.tar.gz
Algorithm Hash digest
SHA256 f8ff40679bfd93c752d4acb8073a9c6b2564124582300b9f3b0fb887a716354a
MD5 ecf724b2fa6618603d871e50d92808f7
BLAKE2b-256 ccffd57c7f5c89f0c021878ac0fa5bd5814d7007ff3d6a68fe486d1719733642
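
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch; `sha256_of` and `matches_published_hash` are hypothetical helper names, and the local file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("Pyostie-1.0.tar.gz", "f8ff40679bfd93c752d4acb8073a9c6b2564124582300b9f3b0fb887a716354a")` should return `True` for an intact copy of the source distribution.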

File details

Details for the file Pyostie-1.0-py3-none-any.whl.

File metadata

  • Download URL: Pyostie-1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.6.9

File hashes

Hashes for Pyostie-1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 02fe13aa3a135cd38e661e8011eea04e875559ae5fe4ae5c01e879bdb2c91b38
MD5 bd2691714396bd39dc3aaf63128aefb9
BLAKE2b-256 d4a6d385df75307c1891db7bf7fe232f875adb0165f753e6b0bb0bc7867f9394
