A Python package to OCR files and extract text, with optional insights.

Project description

About The Project

PYOSTIE is short for Python Open Source Text Information Extractor.

A very elegant and simple library to extract text from many file formats.

This module can extract text from PDFs, Office files, text files, and image files. It can also generate an Excel file that gives you deeper insights into the text. Insights are currently extracted for image formats only. (More to come soon.)

Installation

  1. Clone the repo
git clone https://github.com/anirudhpnbb/Pyostie.git
  2. Install using pip or pip3
pip3 install pyostie

(or)

pip install pyostie
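
To confirm that the install succeeded, one option is to look up the installed distribution's version with the standard library. This is only a minimal sketch: `installed_version` is a hypothetical helper name introduced here and is not part of Pyostie.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pyostie")
    print(f"pyostie version: {version}" if version else "pyostie is not installed")
```

Returning `None` instead of raising keeps the check easy to use in a setup script.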

Usage

import pyostie

# For image files with insights.

output = pyostie.extract(filename, insights=True, extension="jpg")  # The extension can also be "tif" or "png"
df, text = output.start()

# For image files without insights.

output = pyostie.extract(filename, insights=False, extension="jpg")
text = output.start()

# For PDF files:

output = pyostie.extract(filename, extension="pdf")
text = output.start()


# For Excel files

output = pyostie.extract(filename, extension="xlsx")
text = output.start() 

# For Word files

output = pyostie.extract(filename, extension="docx")
text = output.start()
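
The calls above all follow one pattern: pass the file name and its extension, then call `start()`. The sketch below wraps that pattern so the extension is derived from the file name. `extension_of`, `extract_text`, and `IMAGE_EXTENSIONS` are hypothetical names introduced here, not part of Pyostie. The sketch assumes, based only on the examples above, that `start()` returns a `(df, text)` pair when `insights=True` and plain text otherwise.

```python
from pathlib import Path

# "jpg" and "tif" come from the usage examples; "png" is assumed here.
IMAGE_EXTENSIONS = {"jpg", "tif", "png"}


def extension_of(filename: str) -> str:
    """Return the lower-case file extension without the leading dot."""
    return Path(filename).suffix.lstrip(".").lower()


def extract_text(filename: str, insights: bool = False):
    """Call pyostie.extract with an extension inferred from the file name.

    Returns (df, text); df is None when insights are off or the file
    is not an image.
    """
    import pyostie  # imported lazily so extension_of works without it

    ext = extension_of(filename)
    if ext in IMAGE_EXTENSIONS:
        result = pyostie.extract(filename, insights=insights, extension=ext).start()
        # Assumed from the examples: a (df, text) pair only when insights=True.
        return result if insights else (None, result)
    # Non-image formats are shown without the insights argument.
    return None, pyostie.extract(filename, extension=ext).start()
```

With these helpers, a call such as `df, text = extract_text("scan.jpg", insights=True)` mirrors the first example; the file name is a placeholder.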

Future Work

In this version we can extract text only from PDF, Excel, TXT and CSV formats. Soon, we will be adding doc, ppt, pptx and many more. Watch this space for more updates.

Contact

Anirudh Palaparthi - @anirudh8889 - pnbbanirudh - aniruddhapnbb@gmail.com

NSK - nskpramod - nsk.pramod@gmail.com

Project Link: https://github.com/anirudhpnbb/Pyostie

Download files

Source Distribution

Pyostie-2.4.tar.gz (6.8 kB)

Uploaded Source

Built Distribution

Pyostie-2.4-py3-none-any.whl (7.6 kB)

Uploaded Python 3

File details

Details for the file Pyostie-2.4.tar.gz.

File metadata

  • Download URL: Pyostie-2.4.tar.gz
  • Size: 6.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for Pyostie-2.4.tar.gz
Algorithm    Hash digest
SHA256       e01a9a4ba3bf0992757a36ca3457ea7d0b7b5653c5c5a4c2ff006dd859a7e0ef
MD5          03caac58eb6205b67c2d543845022f97
BLAKE2b-256  492b09397da09ee5f8ff1ddc02e6c4cab2c14299c7cbe7761cc3acc562f95424
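
A downloaded archive can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch: `sha256_of` and `matches_digest` are hypothetical helper names, and where the download is saved locally is an assumption left to the caller.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Calling `matches_digest` with the local path of Pyostie-2.4.tar.gz and the SHA256 value above should return `True` for an intact download.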

File details

Details for the file Pyostie-2.4-py3-none-any.whl.

File metadata

  • Download URL: Pyostie-2.4-py3-none-any.whl
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for Pyostie-2.4-py3-none-any.whl
Algorithm    Hash digest
SHA256       5d8dec21f4d50e7d09ebc81aff7aa8aab92da1e3a154c744a2582a6a395e00d0
MD5          f339c64ed4993560a3b582737972651e
BLAKE2b-256  3a943afa0e921014fe0c5905444be16533c335858c0f5f5816057af7dac05ea3

