A Python package to OCR data and extract text, with insights.

Project description

About The Project

PYOSTIE is short for Python Open Source Text Information Extractor.

A very elegant and simple library to extract text from many file formats.

This module can extract text from PDFs, Office files, text files, and image files. It also generates an Excel file that gives you deeper insights into the text. Currently, insights are extracted for image formats only. (More to come soon.)

Installation

  1. Clone the repo
git clone https://github.com/anirudhpnbb/Pyostie.git
  2. Install using pip or pip3
pip3 install pyostie

(or)

pip install pyostie
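
To confirm that the install worked before running the usage examples, a small check like the one below may help. It is only a sketch: the helper name is_importable is hypothetical and not part of Pyostie, and it assumes the import name is pyostie, as in the Usage section.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found.

    Hypothetical helper, not part of Pyostie. It only looks the module up;
    it does not import or run it.
    """
    return importlib.util.find_spec(module_name) is not None


# Assumption: the package is imported as "pyostie", as in the Usage section.
print("pyostie importable:", is_importable("pyostie"))
```

If this prints False, the package was most likely installed into a different Python environment than the one running the script.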

Usage

import pyostie

# For image files with insights.

output = pyostie.extract(filename, insights=True, extension="jpg")  # The extension can also be "tif" or "png"
df, text = output.start()

# For image files without insights.

output = pyostie.extract(filename, insights=False, extension="jpg")
text = output.start()

# For PDF files:

output = pyostie.extract(filename, extension="pdf")
text = output.start()


# For Excel files

output = pyostie.extract(filename, extension="xlsx")
text = output.start() 

# For Word files

output = pyostie.extract(filename, extension="docx")
text = output.start()
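
Since every call above passes the extension by hand, it can be convenient to derive it from the file name. The following is a minimal sketch, not part of Pyostie: guess_extension and extract_text are hypothetical helpers, and the extension sets only list values that appear in the examples above, so the library may accept more. The only Pyostie calls used are pyostie.extract(...) and .start(), exactly as shown in this section.

```python
from pathlib import Path

# Extensions that appear in the usage examples above (assumption: the real
# library may support more; extend these sets as needed).
IMAGE_EXTENSIONS = {"jpg", "tif"}
DOCUMENT_EXTENSIONS = {"pdf", "xlsx", "docx"}


def guess_extension(filename: str) -> str:
    """Return the lowercase extension without the dot.

    Hypothetical helper. Raises ValueError when the extension is missing
    or not in the sets above.
    """
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in IMAGE_EXTENSIONS | DOCUMENT_EXTENSIONS:
        raise ValueError(f"Unsupported or missing extension: {filename!r}")
    return ext


def extract_text(filename: str, insights: bool = False):
    """Hypothetical wrapper around pyostie.extract(...).start().

    Returns whatever start() returns: per the examples above, (df, text)
    for images with insights=True, otherwise text.
    """
    import pyostie  # imported lazily so guess_extension works without it

    ext = guess_extension(filename)
    if ext in IMAGE_EXTENSIONS:
        output = pyostie.extract(filename, insights=insights, extension=ext)
    else:
        output = pyostie.extract(filename, extension=ext)
    return output.start()
```

With this in place, extract_text("scan.jpg", insights=True) would forward extension="jpg" and insights=True, while extract_text("report.pdf") would forward only extension="pdf".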

Future Work

In this version, we can extract text only from PDF, Excel, TXT and CSV formats. Soon, we will be adding doc, ppt, pptx and many more. Watch this space for more updates.

Contact

Anirudh Palaparthi - @anirudh8889 - pnbbanirudh - aniruddhapnbb@gmail.com

NSK - nskpramod - nsk.pramod@gmail.com

Project Link: https://github.com/anirudhpnbb/Pyostie

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

Pyostie-1.1-py3-none-any.whl (7.5 kB view details)

Uploaded Python 3

File details

Details for the file Pyostie-1.1-py3-none-any.whl.

File metadata

  • Download URL: Pyostie-1.1-py3-none-any.whl
  • Upload date:
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.6.9

File hashes

Hashes for Pyostie-1.1-py3-none-any.whl

Algorithm     Hash digest
SHA256        8b9429523736a4ffb1f7330fb656dc412168d42a6151649fc3fbd148dcb18601
MD5           554bee022f53c30d7b2891bf48c539a5
BLAKE2b-256   8a6e441336039f122ea405f0e7fc8fdef2c282f641765515df0323769454cd74
