Kannada OCR with Column Separation
AksharaJaana
AksharaJaana is Optical Character Recognition (OCR) software jointly developed by the Department of Electronics and Communication, NMAM Institute of Technology, Nitte, Karnataka, India, and Kannada GaNaka Parishattu, Bengaluru. The focus is on adding more features to the engine and making it more user-friendly.
Requirements (a Conda environment is preferred for smooth use)
python >= 3.6.9
opencv-python >= 4.2.0.32
numpy==1.18.4
pdf2image==1.13.1
pytesseract
tesseract
poppler
Installation details (complete installation, including the requirements)
--------------------------------------------------------------Ubuntu------------------------------------------------------------
- Installing tesseract-ocr: open a terminal and run
    sudo apt-get update -y
    sudo apt-get install -y tesseract-ocr
  On Ubuntu the Kannada language data is packaged separately; install it with
    sudo apt-get install -y tesseract-ocr-kan
- Installing poppler: open a terminal and run
    sudo apt-get install -y poppler-utils
- Installing Python and pip (if pip is not installed): open a terminal and run
    sudo apt install -y python3 python3-pip
  then check that python3 --version reports 3.6.9 or later.
- Installing the packages for AksharaJaana: open a terminal and run
    pip install opencv-python==4.2.0.32
    pip install numpy==1.18.4
    pip install pdf2image==1.13.1
    pip install AksharaJaana==0.1.5
    pip install pytesseract
  (If pip is not on your PATH, use pip3 or python3 -m pip instead.)
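Since AksharaJaana works on Kannada text, Tesseract presumably needs its Kannada (kan) language data. One way to confirm it is installed, sketched below as an illustration rather than part of AksharaJaana (the helper names are hypothetical), is to run tesseract --list-langs and look for kan in the output. The sketch reads stdout and falls back to stderr in case a particular Tesseract build prints the list there.

```python
import shutil
import subprocess

HEADER_PREFIX = "List of available languages"


def parse_list_langs(output):
    """Parse `tesseract --list-langs` output into a set of language codes."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header line
        if line and not line.startswith(HEADER_PREFIX):
            langs.add(line)
    return langs


def tesseract_languages():
    """Return the languages installed for the tesseract on PATH (empty set if it is missing)."""
    exe = shutil.which("tesseract")
    if exe is None:
        return set()
    proc = subprocess.run(
        [exe, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    # Fall back to stderr in case this Tesseract build prints the list there
    return parse_list_langs(proc.stdout or proc.stderr)


if __name__ == "__main__":
    if "kan" in tesseract_languages():
        print("Kannada language data (kan) is installed")
    else:
        print("Kannada language data (kan) was not found")
```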
------------------------------------------------------------- Windows-----------------------------------------------------------
- Installing tesseract-ocr: go to https://github.com/UB-Mannheim/tesseract/wiki and download tesseract-ocr-w64-setup-v5.0.0-alpha.20200328.exe (64-bit). Run the downloaded installer; when it offers a choice of languages, select Kannada under both the script data and the language data. After the installation finishes, check whether the folder C:\Program Files\Tesseract-OCR\ exists.
  If it does, add C:\Program Files\Tesseract-OCR\ to your system PATH as follows:
  1. Click the Windows Start button, search for "Edit the system environment variables", and click Environment Variables...
  2. Under System variables, find and double-click PATH, then click New.
  3. Add C:\Program Files\Tesseract-OCR and click OK.
  If it does not, manually copy the Tesseract-OCR folder (from your downloads folder, after extraction) into C:\Program Files and follow the same procedure.
- Installing poppler: go to http://blog.alivate.com.au/poppler-windows/, click poppler-0.54_x86 to download it, and extract the archive. Check whether the folder C:\Program Files\poppler-0.68.0_x86\ exists (adjust the folder name to match the version you downloaded).
  If it does, add C:\Program Files\poppler-0.68.0_x86\bin to your system PATH as follows:
  1. Click the Windows Start button, search for "Edit the system environment variables", and click Environment Variables...
  2. Under System variables, find and double-click PATH, then click New.
  3. Add C:\Program Files\poppler-0.68.0_x86\bin and click OK.
  If it does not, manually copy the extracted poppler-0.68.0_x86 folder (from your downloads folder) into C:\Program Files and follow the same procedure.
- Installing Python and pip (if pip is not installed): go to the Python download page and download Python (any version >= 3.6.9). Run the installer, allow it to make changes when Windows asks, then click Next, accept, and finally OK. pip is installed along with Python.
- Installing the packages for AksharaJaana: open a command prompt and run
    pip install opencv-python==4.2.0.32
    pip install numpy==1.18.4
    pip install pdf2image==1.13.1
    pip install AksharaJaana==0.1.5
    pip install pytesseract
- Reboot the system before you start using AksharaJaana.
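As an alternative to editing PATH, both Python libraries can be pointed at the executables directly: pytesseract through its pytesseract.pytesseract.tesseract_cmd setting, and pdf2image through the poppler_path argument of convert_from_path. The sketch below is illustrative and not part of AksharaJaana; the helper names are hypothetical, the folder locations are the ones used in the Windows steps above, and whether AksharaJaana's own engine benefits from the tesseract_cmd setting depends on it calling pytesseract in the same Python process, which is an assumption.

```python
import os
import shutil

# Install locations from the Windows steps above (adjust the poppler version as needed)
TESSERACT_EXE = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
POPPLER_BIN = r"C:\Program Files\poppler-0.68.0_x86\bin"


def find_tesseract(fallback=TESSERACT_EXE):
    """Return tesseract from PATH, else the fallback file if it exists, else None."""
    return shutil.which("tesseract") or (fallback if os.path.isfile(fallback) else None)


def find_poppler_bin(fallback=POPPLER_BIN):
    """Return the value to pass as poppler_path.

    None means pdftoppm is already on PATH (pdf2image's default lookup);
    otherwise the fallback folder is returned if it contains pdftoppm.exe.
    """
    if shutil.which("pdftoppm"):
        return None
    if os.path.isfile(os.path.join(fallback, "pdftoppm.exe")):
        return fallback
    raise FileNotFoundError("pdftoppm was not found on PATH or in " + fallback)


def configure_and_render(pdf_path):
    """Point pytesseract at tesseract, then render pdf_path to images with pdf2image."""
    # Imported here so the helpers above work without these packages installed
    import pytesseract
    from pdf2image import convert_from_path

    exe = find_tesseract()
    if exe is not None:
        pytesseract.pytesseract.tesseract_cmd = exe
    return convert_from_path(pdf_path, poppler_path=find_poppler_bin())
```

Setting tesseract_cmd only affects the current Python process, so editing PATH remains the simpler option when several programs need Tesseract.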
Python Script
This script is also available as test.py in the GitHub repository.
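import AksharaJaana.main as ak
from AksharaJaana.utils import utils

# Run the OCR engine on a PDF (replace the path with your own file)
text = ak.ocr_engine('/home/navaneeth/Desktop/NandD/OCR_kannada/CamScanner 06-28-2020 12.12.10.pdf')

# Save the recognized text as an RTF file (replace the path with your own output location)
u = utils()
u.write_as_RTF(text, saving_path='/home/navaneeth/Desktop/1.rtf')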
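For readers curious about what happens between the PDF and the RTF file, the general shape of such a pipeline can be approximated with pdf2image and pytesseract directly. This is a rough sketch under stated assumptions, not AksharaJaana's implementation: it performs no column separation, assumes the Kannada language data (kan) is installed, and the function name is hypothetical. The 300 DPI default follows Tesseract's documented advice that recognition works best on images of at least 300 DPI.

```python
def ocr_pdf_kannada(pdf_path, dpi=300, lang="kan"):
    """Rough sketch: render each PDF page and OCR it with Tesseract's Kannada model.

    Not AksharaJaana's implementation; it performs no column separation.
    """
    # Imported inside the function so the module loads even without these packages
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path, dpi=dpi)  # one PIL image per page
    texts = [pytesseract.image_to_string(page, lang=lang) for page in pages]
    # Join pages with a form-feed character as a page separator (an arbitrary choice)
    return "\f".join(texts)
```

Comparing this output with ak.ocr_engine on a multi-column page is one way to see what the column separation adds.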
Download files
Source Distribution
File details
Details for the file AksharaJaana-0.1.7.tar.gz
File metadata
- Download URL: AksharaJaana-0.1.7.tar.gz
- Upload date:
- Size: 6.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.3.1 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.6.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5509f432ec1aec2b933d74f26fbe1735dd5df86627030d551b747b4f930d81a0
MD5 | ced721989ccfa97d7b06d2fcacfea03d
BLAKE2b-256 | 886cfb8cc22d11c5e3cd859ec86c79fb41fa1981b545cce92f2bfa2be622f088