Kannada OCR with column Separation

Project description

AksharaJaana

AksharaJaana is Optical Character Recognition (OCR) software developed jointly by the Department of Electronics and Communication, NMAM Institute of Technology, Nitte, Karnataka, India, and Kannada GaNaka Parishattu, Bengaluru. Ongoing work focuses on adding features to the engine and making it more user-friendly.

The Requirements [ a Conda environment is preferred for smooth use ]

python >= 3.6.9
opencv-python >= 4.2.0.32
numpy==1.18.4
pdf2image==1.13.1
pytesseract
tesseract
poppler
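
Before running the engine, it can help to confirm that the requirements above are visible from the current environment. The following is a minimal sketch, not part of AksharaJaana: the helper names are hypothetical, the import names (`cv2` for opencv-python) are the usual ones for these packages, and `pdftoppm` is assumed to be the executable that a poppler installation provides.

```python
import importlib.util
import shutil

# Import names can differ from package names (opencv-python is imported as cv2).
PYTHON_MODULES = ["cv2", "numpy", "pdf2image", "pytesseract"]
# Executables assumed to come from the tesseract and poppler installations.
EXECUTABLES = ["tesseract", "pdftoppm"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_executables(names):
    """Return the executable names that are not found on PATH."""
    return [n for n in names if shutil.which(n) is None]


if __name__ == "__main__":
    print("Missing Python modules:", missing_modules(PYTHON_MODULES) or "none")
    print("Missing executables:", missing_executables(EXECUTABLES) or "none")
```

This check only reports whether each item can be found; it does not compare installed versions against the pinned ones.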

Details for installation (Complete installation including requirements)

--------------------------------------------------------------Ubuntu------------------------------------------------------------

  1. Installing tesseract-ocr in the system
     ---> Open a terminal
     ---> Run: sudo apt-get update -y
     ---> Run: sudo apt-get install -y tesseract-ocr

  2. Installing poppler in the system
     ---> Open a terminal
     ---> Run: sudo apt-get install -y poppler-utils

  3. Installing python and pip (if they are not already installed; any python >= 3.6.9 works)
     ---> Open a terminal
     ---> Run: sudo apt-get install -y python3 python3-pip

  4. Installing packages for AksharaJaana
     ---> Open a terminal
     ---> Run: pip install opencv-python==4.2.0.32
     ---> Run: pip install numpy==1.18.4
     ---> Run: pip install pdf2image==1.13.1
     ---> Run: pip install AksharaJaana==0.1.5
     ---> Run: pip install pytesseract
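
After the Ubuntu steps, one way to confirm that Tesseract can see Kannada data is to read the output of `tesseract --list-langs` and look for the code `kan`. The sketch below is an illustration with hypothetical helper names. It assumes the list is printed one language per line after a "List of available languages" header. If `kan` is missing, the Kannada data may need to be installed separately; on Ubuntu it is typically packaged as `tesseract-ocr-kan`, which the steps above do not mention.

```python
import shutil
import subprocess


def parse_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.append(line)
    return langs


def installed_langs():
    """Return the installed language codes, or [] if tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return []
    result = subprocess.run(
        [exe, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Some versions print the list to stderr, so both streams are parsed.
    return parse_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    print("Kannada data installed:", "kan" in installed_langs())
```

Because both output streams are parsed, any warning that Tesseract prints would also show up as an entry in the list; the `"kan" in ...` check is unaffected by that.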

------------------------------------------------------------- Windows-----------------------------------------------------------

  1. Installing tesseract-ocr in the system
     ---> Go to https://github.com/UB-Mannheim/tesseract/wiki
     ---> Download tesseract-ocr-w64-setup-v5.0.0-alpha.20200328.exe (64 bit)
     ---> Open the downloaded file and start the installer
     ---> When the installer offers a choice of languages, choose Kannada under both the script data and the language data
     ---> After installation, check whether the folder C:\Program Files\Tesseract-OCR\ is present
     ---> If yes, add C:\Program Files\Tesseract-OCR\ to your system PATH:
          1 ---> Click on the Windows start button, search for Edit the system environment variables, and click on Environment Variables...
          2 ---> Under System variables, double-click on PATH and click on New
          3 ---> Add C:\Program Files\Tesseract-OCR and click OK
     ---> If no, manually copy the tesseract-ocr folder from your download location (after extraction) into Program Files on the C drive, then follow the same PATH procedure

  2. Installing poppler in the system
     ---> Go to http://blog.alivate.com.au/poppler-windows/
     ---> Download the poppler archive (the link is named poppler-0.54_x86 in these instructions, while the folder names below use poppler-0.68.0_x86; substitute the name of the version you download)
     ---> Extract the downloaded archive
     ---> Check whether the folder C:\Program Files\poppler-0.68.0_x86\ is present
     ---> If yes, add C:\Program Files\poppler-0.68.0_x86\bin to your system PATH:
          1 ---> Click on the Windows start button, search for Edit the system environment variables, and click on Environment Variables...
          2 ---> Under System variables, double-click on PATH and click on New
          3 ---> Add C:\Program Files\poppler-0.68.0_x86\bin and click OK
     ---> If no, manually copy the extracted poppler-0.68.0_x86 folder from your download location into Program Files on the C drive, then follow the same PATH procedure

  3. Installing python and pip in the system (if pip is not installed)
     ---> Go to the Python download page and download python (any version >= 3.6.9)
     ---> Run the installer and click allow when Windows asks for permission to make changes
     ---> Click next, accept, and finally OK; pip is installed along with python

  4. Installing packages for AksharaJaana
     ---> Open a command prompt
     ---> Run: pip install opencv-python==4.2.0.32
     ---> Run: pip install numpy==1.18.4
     ---> Run: pip install pdf2image==1.13.1
     ---> Run: pip install AksharaJaana==0.1.5
     ---> Run: pip install pytesseract

  5. Reboot the system before starting to use AksharaJaana
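
The PATH edits in the Windows steps are easy to get slightly wrong, so a quick check after the reboot can save time. This is a minimal sketch with a hypothetical helper name; it only compares folder names in the PATH string and does not verify that the executables inside those folders work. The two folder paths are the ones named in the steps above and should be adjusted to the versions actually installed.

```python
import ntpath
import os


def path_contains(path_value, folder):
    """Return True if `folder` is one of the entries in a Windows PATH string."""
    def norm(p):
        # Ignore case, slash direction, surrounding quotes and trailing separators.
        return ntpath.normcase(ntpath.normpath(p.strip().strip('"')))

    target = norm(folder)
    return any(norm(entry) == target
               for entry in path_value.split(";") if entry.strip())


if __name__ == "__main__":
    # Intended to be run on Windows, where PATH entries are separated by ";".
    for folder in (r"C:\Program Files\Tesseract-OCR",
                   r"C:\Program Files\poppler-0.68.0_x86\bin"):
        found = path_contains(os.environ.get("PATH", ""), folder)
        print(folder, "on PATH:", found)
```

Using `ntpath` rather than `os.path` keeps the comparison rules Windows-specific even if the helper is imported on another platform.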

Python Script

It is in test.py in the GitHub repo.

import AksharaJaana.main as ak
from AksharaJaana.utils import utils

# Run OCR on a PDF file; replace the path with the location of your own file.
text = ak.ocr_engine('/home/navaneeth/Desktop/NandD/OCR_kannada/CamScanner 06-28-2020 12.12.10.pdf')

# Save the recognised text as an RTF file.
u = utils()
u.write_as_RTF(text, saving_path='/home/navaneeth/Desktop/1.rtf')
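
The script above handles one file. To process a folder of scans, the same two calls can be wrapped in a loop. The following is a sketch under stated assumptions: `rtf_path_for`, `ocr_folder`, and `main` are hypothetical helpers, the folder names `scans` and `rtf_out` are placeholders, and the only AksharaJaana calls used are `ak.ocr_engine(path)` and `utils().write_as_RTF(text, saving_path=...)` exactly as they appear in the script above.

```python
from pathlib import Path


def rtf_path_for(pdf_path, out_dir):
    """Return the .rtf output path matching a PDF's file name."""
    return Path(out_dir) / (Path(pdf_path).stem + ".rtf")


def ocr_folder(in_dir, out_dir, ocr, write_rtf):
    """Run `ocr` on every PDF in `in_dir` and save each result with `write_rtf`."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(in_dir).glob("*.pdf")):
        target = rtf_path_for(pdf, out_dir)
        text = ocr(str(pdf))
        write_rtf(text, saving_path=str(target))
        written.append(target)
    return written


def main():
    # Imported here so the helpers above work without AksharaJaana installed.
    import AksharaJaana.main as ak
    from AksharaJaana.utils import utils

    # "scans" and "rtf_out" are placeholder folder names.
    ocr_folder("scans", "rtf_out", ak.ocr_engine, utils().write_as_RTF)
```

Calling `main()` runs the batch. Passing the OCR and writer functions in as arguments keeps the loop independent of the engine, so it can be tried with stand-in functions first.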


Download files

Source Distribution

AksharaJaana-0.1.6.tar.gz (5.9 kB)

Uploaded Source

File details

Details for the file AksharaJaana-0.1.6.tar.gz.

File metadata

  • Download URL: AksharaJaana-0.1.6.tar.gz
  • Upload date:
  • Size: 5.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.3.1 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.6.9

File hashes

Hashes for AksharaJaana-0.1.6.tar.gz
Algorithm Hash digest
SHA256 0621212a218ffec9ddd23fd1eac82d082d3ee47c56a4bf2520f065a244ea9a09
MD5 e01836022ad5c30fbe01cba299242ebf
BLAKE2b-256 d9bde8880a365edf4bac90431b3ac606609a4995cc0034d91777fe818a3e770f
