
A Python package that OCRs files and extracts their text, with optional insights.


About The Project

PYOSTIE is short for Python Open Source Text Information Extractor.

A very elegant and simple library to extract text from many file formats.

This module can extract text from PDFs, Office files, text files, and image files. It can also generate an Excel file that gives deeper insights into the text. Insights are currently available only for image and PDF formats (more to come soon).

Installation

  1. Clone the repo
git clone https://github.com/anirudhpnbb/Pyostie.git
  2. Install using pip or pip3
pip3 install Pyostie

(or)

pip install Pyostie

Usage

import pyostie

# For image files with insights.

output = pyostie.extract(filename, insights=True, extension="jpg")  # extension can also be "tif" or "png"
df, text = output.start()

# For image files without insights.

output = pyostie.extract(filename, insights=False, extension="jpg")
text = output.start()

# For PDF files:

output = pyostie.extract(filename, extension="pdf")
text = output.start()

# For PDF files with insights:
output = pyostie.extract(filename, insights=True, extension="pdf")
text = output.start()


# For Excel files

output = pyostie.extract(filename, extension="xlsx")
text = output.start() 

# For word files

output = pyostie.extract(filename, extension="docx")
text = output.start()
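
Every call above repeats the same pattern: pass the file, an `extension` string, and optionally `insights`. The sketch below wraps that pattern in one helper. The names `extension_of` and `extract_text` are hypothetical and not part of Pyostie. It assumes the `extension` string matches the file suffix (as with "jpg", "pdf", "xlsx" and "docx" above) and that `start()` returns a two-item tuple when insights are produced, as in the image example. Treat it as a starting point rather than the library's own API.

```python
from pathlib import Path


def extension_of(path):
    """Return the lowercase file extension without the leading dot."""
    return Path(path).suffix.lower().lstrip(".")


def extract_text(path, insights=False, extract=None):
    """Run a Pyostie-style extractor on `path` and always return (df, text).

    `extract` is the extractor factory; it defaults to `pyostie.extract`.
    `df` is None when start() returns the text alone.
    """
    if extract is None:
        # Imported lazily so the helper can be exercised without Pyostie installed.
        import pyostie
        extract = pyostie.extract

    # Assumption: the extension string Pyostie expects equals the file suffix.
    result = extract(path, insights=insights, extension=extension_of(path))
    output = result.start()

    # Assumption: insights come back as a (df, text) tuple, as in the
    # image example above; anything else is treated as plain text.
    if isinstance(output, tuple):
        df, text = output
        return df, text
    return None, output
```

With this helper, `df, text = extract_text("scan.jpg", insights=True)` and `_, text = extract_text("report.docx")` share one call shape, so callers do not need to track which return form applies.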

Future Work

In this version we can extract text only from PDF, Excel, TXT and CSV formats. Support for doc, ppt, pptx and many more formats is coming soon. Watch this space for updates.

Contact

Anirudh Palaparthi - @anirudh8889 - pnbbanirudh - aniruddhapnbb@gmail.com

NSK - nskpramod - nsk.pramod@gmail.com

Project Link: https://github.com/anirudhpnbb/Pyostie

Download files

Source Distribution

Pyostie-2.4.1.tar.gz (6.8 kB)

Uploaded Source

Built Distribution

Pyostie-2.4.1-py3-none-any.whl (7.6 kB)

Uploaded Python 3

File details

Details for the file Pyostie-2.4.1.tar.gz.

File metadata

  • Download URL: Pyostie-2.4.1.tar.gz
  • Upload date:
  • Size: 6.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for Pyostie-2.4.1.tar.gz
Algorithm Hash digest
SHA256 96c99f594a02577d95babc9779ff82695914595dacdab7867a9b2d9bcea591ef
MD5 69f26762d34f0c0dbd48a8e63661f375
BLAKE2b-256 6a52e5102139ec3525b6bf9b3156d44cc159a7a2d7208d07de957dc002259d0a
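
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library. The helper names `sha256_of` and `matches_expected` are hypothetical, and the local path passed in is whatever location you saved the file to.

```python
import hashlib

# SHA256 digest listed above for Pyostie-2.4.1.tar.gz.
EXPECTED_SHA256 = "96c99f594a02577d95babc9779ff82695914595dacdab7867a9b2d9bcea591ef"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("Pyostie-2.4.1.tar.gz")` should return True for an intact download saved under that name in the current directory.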

File details

Details for the file Pyostie-2.4.1-py3-none-any.whl.

File metadata

  • Download URL: Pyostie-2.4.1-py3-none-any.whl
  • Upload date:
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for Pyostie-2.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1b2969adafab9b0c784a326745af4cc35fa2ca2b7adb64e8b8aaa40320b8b520
MD5 75ca41506105a881c2840a82e15149cd
BLAKE2b-256 184677313f8e13b62036995b1a1c7674fdc16083accc0daac669364bd0487624
