Kannada OCR with column separation

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

AksharaJaana

AksharaJaana is a package that uses Tesseract OCR as its backend to convert Kannada text into an editable format. Its special feature is that it can separate the columns on a page. The sample code below can be used on Ubuntu.
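The description does not say how AksharaJaana separates columns. One common technique is a vertical projection profile: on a binarized page, find vertical strips with no ink that are wide enough to be gutters, and split the page there. The sketch below illustrates that idea only; it is not necessarily the package's own method, and the function name column_ranges and the min_gap parameter are hypothetical.

```python
import numpy as np


def column_ranges(binary, min_gap=20):
    """Return (start, stop) x-ranges of text columns on a binarized page.

    Illustrative sketch, not AksharaJaana's implementation.
    binary:  2-D array (rows x columns), nonzero where there is ink.
    min_gap: minimum width in pixels of an empty vertical strip that counts
             as a gutter; narrower gaps (e.g. between words) are merged.
    """
    # True for every pixel column that contains at least one ink pixel
    has_ink = np.asarray(binary).astype(bool).any(axis=0)
    ranges = []
    start = None  # x where the current column began
    gap = 0       # width of the empty run seen since the last ink column
    for x, ink in enumerate(has_ink):
        if ink:
            if start is None:
                start = x
            elif gap >= min_gap:
                # A wide enough gutter ended at x: close the previous column
                ranges.append((start, x - gap))
                start = x
            gap = 0
        elif start is not None:
            gap += 1
    if start is not None:
        ranges.append((start, len(has_ink) - gap))
    return ranges
```

Each range can then be cropped with page[:, start:stop] and recognized separately, so that lines from neighbouring columns are not merged into one line of text.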

Requirements (a Conda environment is preferred for smooth use)

python >= 3.6.9
opencv-python >= 4.2.0.32
numpy==1.18.4
pdf2image==1.13.1
pytesseract
tesseract
poppler
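Before installing the Python packages, it can help to confirm that the system-level requirements are in place. The sketch below is an assumption-based check, not part of AksharaJaana: it looks for the tesseract executable and for pdftoppm (the poppler command that pdf2image calls) on PATH, and asks Tesseract whether the Kannada language data ("kan") is installed. The function name check_environment is hypothetical.

```python
import shutil
import subprocess
import sys


def check_environment():
    """Report whether Python, Tesseract (with Kannada data) and poppler are usable."""
    report = {
        "python_ok": sys.version_info >= (3, 6, 9),
        "tesseract": shutil.which("tesseract"),  # full path, or None if not on PATH
        "pdftoppm": shutil.which("pdftoppm"),    # poppler tool used by pdf2image
        "kannada": False,
    }
    if report["tesseract"]:
        result = subprocess.run(
            [report["tesseract"], "--list-langs"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # text mode that also works on Python 3.6
        )
        # Older Tesseract versions print the list to stderr, so check both streams
        listing = result.stdout + result.stderr
        report["kannada"] = "kan" in listing.split()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, value)
```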

Installation details (complete installation, including requirements)

--------------------------------------------------------------Ubuntu------------------------------------------------------------

1. Install tesseract-ocr with the Kannada language data: open a terminal and run
   sudo apt-get update -y
   sudo apt-get install -y tesseract-ocr tesseract-ocr-kan
2. Install poppler: in the terminal, run
   sudo apt-get install -y poppler-utils
3. Install Python 3 and pip (if pip is not installed): in the terminal, run
   sudo apt install -y python3 python3-pip
4. Install the Python packages for AksharaJaana: in the terminal, run
   python3 -m pip install opencv-python==4.2.0.32
   python3 -m pip install numpy==1.18.4
   python3 -m pip install pdf2image==1.13.1
   python3 -m pip install AksharaJaana==0.1.2.9
   python3 -m pip install pytesseract
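To confirm that the pinned versions were actually installed (pip may resolve a different version inside an existing environment), a small check like the following can be run. It is a sketch that assumes Python 3.8 or newer, because importlib.metadata is not available on 3.6/3.7; the name check_pins is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the installation steps
PINS = {
    "opencv-python": "4.2.0.32",
    "numpy": "1.18.4",
    "pdf2image": "1.13.1",
    "AksharaJaana": "0.1.2.9",
}


def check_pins(pins):
    """Map each package to (wanted, installed or None, matches)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_pins(PINS).items():
        print(f"{name}: wanted {wanted}, installed {installed}, {'OK' if ok else 'MISMATCH'}")
```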

------------------------------------------------------------- Windows-----------------------------------------------------------

Installing tesseract-ocr in the system ---> go to https://github.com/UB-Mannheim/tesseract/wiki ---> download tesseract-ocr-w64-setup-v5.0.0-alpha.20200328.exe (64 bit), or the 32-bit installer for a 32-bit system ---> run the downloaded file ---> when the installer offers a choice of languages, select Kannada under both the additional script data and the additional language data ---> after the installation finishes, check whether the folder C:\Program Files\Tesseract-OCR\ exists ---> if it does, add it to PATH as follows:

Add C:\Program Files\Tesseract-OCR\ to your system PATH:
1---> Click the Windows Start button, search for "Edit the system environment variables", and click Environment Variables...
2---> Under System variables, find and double-click PATH, then click New.
3---> Add C:\Program Files\Tesseract-OCR and click OK.

---> If the folder does not exist, manually copy the Tesseract-OCR folder (from the downloaded files, after extraction) into Program Files on the C drive, then follow the same procedure.
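A process only sees the PATH that was set when it started, so after editing PATH it is worth checking from a newly opened command prompt that the folder is really listed. The sketch below is a hypothetical helper (dir_on_path is not part of AksharaJaana); it compares entries case-insensitively on Windows and ignores trailing separators and surrounding quotes.

```python
import os


def dir_on_path(directory, path_value=None):
    """Return True if directory appears in PATH (or in path_value, if given)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def norm(p):
        # Drop quotes/whitespace, trailing separators and (on Windows) case differences
        return os.path.normcase(os.path.normpath(p.strip().strip('"')))

    target = norm(directory)
    return any(
        norm(entry) == target
        for entry in path_value.split(os.pathsep)
        if entry.strip()  # skip empty PATH entries
    )


if __name__ == "__main__":
    print(dir_on_path(r"C:\Program Files\Tesseract-OCR"))
```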

Installing poppler in the system ---> go to http://blog.alivate.com.au/poppler-windows/ ---> download a poppler build (the page offers poppler-0.54_x86; the paths below use poppler-0.68.0_x86, so substitute the folder name of the version you actually download) ---> extract the downloaded archive ---> check whether the folder C:\Program Files\poppler-0.68.0_x86\ exists ---> if it does, add its bin subfolder to your system PATH:
1---> Click the Windows Start button, search for "Edit the system environment variables", and click Environment Variables...
2---> Under System variables, find and double-click PATH, then click New.
3---> Add C:\Program Files\poppler-0.68.0_x86\bin and click OK.
---> If the folder does not exist, manually copy the extracted poppler-0.68.0_x86 folder into Program Files on the C drive, then follow the same procedure.
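Because the downloaded poppler folder name depends on the version, a quick way to find the exact bin folder to add to PATH is to search for it. This is a sketch under the assumption that poppler was extracted directly under C:\Program Files; find_poppler_bin is a hypothetical helper. If several versions are present it picks the last one in alphabetical order, which is not a true version comparison.

```python
from pathlib import Path


def find_poppler_bin(root=r"C:\Program Files"):
    """Return the bin folder of a poppler-* directory under root, or None."""
    # Matches e.g. poppler-0.54_x86\bin or poppler-0.68.0_x86\bin
    candidates = sorted(Path(root).glob("poppler-*/bin"))
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    print(find_poppler_bin())
```

The printed folder is the one to add to PATH. For standalone scripts, pdf2image's convert_from_path also accepts a poppler_path argument pointing at that folder.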
Installing Python and pip in the system (if pip is not installed) ---> go to the Python download page and download Python (any version >= 3.6.9) ---> run the installer and allow changes when Windows asks for permission ---> click Next, accept, and finally OK. pip is installed along with Python.
Installing packages for AksharaJaana ---> open a command prompt and run:
pip install opencv-python==4.2.0.32
pip install numpy==1.18.4
pip install pdf2image==1.13.1
pip install AksharaJaana==0.1.2.9
pip install pytesseract
Restart the system before using the package so that the PATH changes take effect.

Python Script

This script is also available as test.py in the GitHub repository.

import AksharaJaana.main as ak
from AksharaJaana.utils import utils

# Run OCR on a PDF (replace the path with your own file)
text = ak.ocr_engine('/home/navaneeth/Desktop/NandD/OCR_kannada/CamScanner 06-28-2020 12.12.10.pdf')

# Save the recognized text as an RTF document
u = utils()
u.write_as_RTF(text, saving_path='/home/navaneeth/Desktop/1.rtf')
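The same two calls can be wrapped to process a whole folder of PDFs. This is a hypothetical helper (ocr_folder is not part of AksharaJaana) that uses only ak.ocr_engine and utils().write_as_RTF exactly as in the script above; the ocr and write_rtf parameters are an assumption added so the loop can be exercised without Tesseract or poppler.

```python
from pathlib import Path


def ocr_folder(pdf_dir, out_dir, ocr=None, write_rtf=None):
    """OCR every *.pdf in pdf_dir and save one .rtf per PDF in out_dir.

    ocr / write_rtf default to the AksharaJaana calls from the script above;
    stand-ins can be passed instead (e.g. in tests).
    """
    if ocr is None or write_rtf is None:
        # Imported lazily so the helper loads even without AksharaJaana installed
        import AksharaJaana.main as ak
        from AksharaJaana.utils import utils

        u = utils()
        ocr = ocr or ak.ocr_engine
        write_rtf = write_rtf or (lambda text, path: u.write_as_RTF(text, saving_path=path))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # sorted() gives a stable, alphabetical processing order
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        target = out_dir / (pdf.stem + ".rtf")
        text = ocr(str(pdf))
        write_rtf(text, str(target))
        written.append(target)
    return written
```

Note that the *.pdf pattern is case-sensitive on Linux, so files ending in .PDF would be skipped there.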


Release history

1.0.1.3 (Nov 8, 2023)
1.0.1.2 (Nov 8, 2023)
1.0.1.1 (Apr 15, 2023)
1.0.0.1 (Jun 13, 2022)
0.2.1.0 (Apr 15, 2023)
0.1.9.9 (Sep 14, 2021)
0.1.9.3 (Jun 1, 2021)
0.1.8 (Feb 5, 2021)
0.1.7 (Dec 23, 2020)
0.1.6 (Nov 3, 2020)
0.1.5 (Oct 29, 2020)
0.1.2.9 (Oct 29, 2020), this version
0.1.2.8 (Oct 29, 2020)
0.1.2.7 (Oct 29, 2020)
0.1.2.6 (Oct 29, 2020)
0.1.2.4 (Aug 24, 2020)
0.1.2.3 (Aug 24, 2020)
0.1.2.2 (Aug 20, 2020)
0.1.2.1 (Aug 20, 2020)
0.1.0 (Aug 6, 2020)

Download files


Source Distribution

AksharaJaana-0.1.2.9.tar.gz (5.8 kB)

Uploaded Oct 29, 2020 Source


Hashes for AksharaJaana-0.1.2.9.tar.gz
Algorithm	Hash digest
SHA256	`426900a7a3f5ea0dd45d3b7f4018d5b47e0cddadf14162a6c57c6c2278f90c69`
MD5	`5245df35c89e732737e8335e0e0dbd9a`
BLAKE2b-256	`75c42f35de27bd35035d0c18c6ef9c378d6e44d49bf77d3c557aafacb863c1e5`
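The SHA256 digest above can be used to confirm that a downloaded AksharaJaana-0.1.2.9.tar.gz is intact. A minimal sketch using Python's standard hashlib module (sha256_matches is a hypothetical helper name) reads the file in chunks and compares digests:

```python
import hashlib

# SHA256 digest listed above for AksharaJaana-0.1.2.9.tar.gz
EXPECTED = "426900a7a3f5ea0dd45d3b7f4018d5b47e0cddadf14162a6c57c6c2278f90c69"


def sha256_matches(path, expected_hex, chunk_size=1 << 16):
    """Return True if the SHA256 digest of the file at path equals expected_hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


if __name__ == "__main__":
    print(sha256_matches("AksharaJaana-0.1.2.9.tar.gz", EXPECTED))
```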