A Python package to OCR data and extract text, with insights.
About The Project
PYOSTIE is short for Python Open Source Text Information Extractor.
A very elegant and simple library to extract text from many file formats.
This module can extract text from PDFs, Office files, text files, and image files. It also generates an Excel file that gives deeper insights into the text. Insights are currently extracted for image formats only (more to come soon).
Installation
- Clone the repo
git clone https://github.com/anirudhpnbb/Pyostie.git
- Install using pip or pip3
pip3 install pyostie
(or)
pip install pyostie
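To confirm that the install succeeded before running the examples, a quick check can be sketched with the standard library. This is a minimal sketch: `is_installed` is a hypothetical helper, not part of Pyostie, and it assumes the import name is `pyostie`, as used in the Usage section.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "pyostie" is the import name used in the Usage section.
    if is_installed("pyostie"):
        print("pyostie is available")
    else:
        print("pyostie not found; try: pip install pyostie")
```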
Usage
import pyostie
# For image files with insights.
# The extension can also be "tif" or "png".
output = pyostie.extract(filename, insights=True, extension="jpg")
df, text = output.start()
# For image files without insights.
output = pyostie.extract(filename, insights=False, extension="jpg")
text = output.start()
# For PDF files:
output = pyostie.extract(filename, extension="pdf")
text = output.start()
# For Excel files
output = pyostie.extract(filename, extension="xlsx")
text = output.start()
# For Word files
output = pyostie.extract(filename, extension="docx")
text = output.start()
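The calls above all follow the same pattern: pass the file name and its extension to `pyostie.extract`, then call `start()`. As a minimal sketch, assuming only the `extract(filename, insights=..., extension=...)` and `start()` calls shown above, the extension can be inferred from the file name instead of being typed by hand. The names `infer_extension`, `extract_text`, `IMAGE_EXTENSIONS`, and `KNOWN_EXTENSIONS` are hypothetical helpers, not part of Pyostie, and the extension sets list only some of the values that appear in this README.

```python
from pathlib import Path

# Extensions taken from the usage examples in this README. Treating them as
# the complete supported set is an assumption of this sketch; extend as needed.
IMAGE_EXTENSIONS = {"jpg", "tif"}
KNOWN_EXTENSIONS = IMAGE_EXTENSIONS | {"pdf", "xlsx", "docx"}


def infer_extension(filename):
    """Return the lowercase extension without the dot, e.g. 'report.PDF' -> 'pdf'."""
    ext = Path(filename).suffix.lstrip(".").lower()
    if ext not in KNOWN_EXTENSIONS:
        raise ValueError(f"Unsupported or missing extension: {filename!r}")
    return ext


def extract_text(filename, insights=False):
    """Hypothetical wrapper around pyostie.extract(...).start()."""
    # Imported lazily so infer_extension works even without pyostie installed.
    import pyostie

    ext = infer_extension(filename)
    if ext in IMAGE_EXTENSIONS:
        # Only the image examples pass the insights flag.
        output = pyostie.extract(filename, insights=insights, extension=ext)
    else:
        output = pyostie.extract(filename, extension=ext)
    return output.start()
```

With `insights=True` on an image file, the wrapper returns whatever `start()` returns, which per the examples above is the `(df, text)` pair; otherwise it returns the text alone.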
Future Work
This version extracts text only from PDF, Excel, TXT, and CSV formats. Support for doc, ppt, pptx, and many more formats will be added soon. Watch this space for more updates.
Contact
Anirudh Palaparthi - @anirudh8889 - pnbbanirudh - aniruddhapnbb@gmail.com
NSK - nskpramod - nsk.pramod@gmail.com
Project Link: https://github.com/anirudhpnbb/Pyostie
File details
Details for the file Pyostie-2.3.tar.gz.
File metadata
- Download URL: Pyostie-2.3.tar.gz
- Upload date:
- Size: 6.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7a7cecc17e993ec10efce64efffb4ded2745c31c6935e05c17eb3642e6f87200
MD5 | a043b381ababdf3c60adeee14eea5cdf
BLAKE2b-256 | 96a658a63f2b9cac205854122e99fe0bf9fc42da3b282d49dedb731df22262c3
File details
Details for the file Pyostie-2.3-py3-none-any.whl.
File metadata
- Download URL: Pyostie-2.3-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 358a4c0764d4cb65bd749a30d093c92bcdff790558392373c1e047e67dc25eaf
MD5 | caebb91ce919e0fe8fbe14ab15d6f7f6
BLAKE2b-256 | af89d31e337a49ec1a32fddf480ec0279e75fc2dda671d5dc51f365e5215a30d