A package for Chinese OCR that can be used right after installation, without training your own OCR model

Project description

cnocr

A Python package for Chinese OCR that ships with trained models, so it can be used directly after installation.

The current CRNN model reaches an accuracy of about 98.7%.

The project grew out of internal needs at our company (爱因互动 Ein+). Thanks for the internal support.

Changes

Most of the code is adapted from crnn-mxnet-chinese-text-recognition. Many thanks to the author.

The main changes are:

  • Use the native MXNet CTC loss instead of the WarpCTC loss, so no complicated installation is needed.
  • Publish a pre-trained model for anyone to use, so there is no need to spend days on training.
  • Add an online prediction function and script for ease of use.

Installation

pip install cnocr

Please use Python 3 (3.4, 3.5, and 3.6 should work). Python 2 is not tested.

Usage

Predict

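from cnocr import CnOcr

# Create the recognizer; the pre-trained model is fetched automatically if needed.
ocr = CnOcr()
# Recognize a single line of text; the result is a list of characters.
res = ocr.ocr_for_single_line('examples/rand_cn1.png')
print("Predicted Chars:", res)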

When you run the code above, the model files are downloaded automatically from Dropbox to ~/.cnocr. The zip file is then extracted, and the resulting model files are placed in ~/.cnocr/models by default. If the automatic download fails, you can download the zip file manually from Baidu NetDisk (extraction code: pg26) and put it in ~/.cnocr. The code will take care of the rest.
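For the manual fallback, the following is a minimal sketch that copies a downloaded zip file into the data directory and lists whatever ends up in the models directory. It uses only the standard library and the paths mentioned above (~/.cnocr and ~/.cnocr/models). The helper names `place_model_zip` and `list_model_files` are hypothetical and are not part of cnocr.

```python
import shutil
from pathlib import Path

# Default data directory as described above.
CNOCR_ROOT = Path.home() / ".cnocr"


def place_model_zip(zip_path, root=CNOCR_ROOT):
    """Copy a manually downloaded zip into the data directory (hypothetical helper)."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    target = root / Path(zip_path).name
    shutil.copy(str(zip_path), str(target))
    return target


def list_model_files(root=CNOCR_ROOT):
    """Return sorted file names under <root>/models, or [] if the directory is missing."""
    models_dir = Path(root) / "models"
    if not models_dir.is_dir():
        return []
    return sorted(p.name for p in models_dir.iterdir() if p.is_file())
```

An empty list from `list_model_files()` suggests that the archive has not been extracted yet. According to the description above, extraction should happen the next time the prediction code runs.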

Try the predict command for examples/rand_cn1.png:

python scripts/cnocr_predict.py --file examples/rand_cn1.png

You will get:

Predicted Chars: ['笠', '淡', '嘿', '骅', '谧', '鼎', '皋', '姚', '歼', '蠢', '驼', '耳', '胬', '挝', '涯', '狗', '蒽', '子', '犷']
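Since the result is a list of single characters, it usually has to be joined into a string before further use. The sketch below does this for several image files in a row. It assumes only the `ocr_for_single_line` method shown earlier and that it returns a list of characters as in the output above. The helper name `recognize_lines` is hypothetical.

```python
def recognize_lines(ocr, image_paths):
    """Run single-line OCR on each image and join the characters into text.

    `ocr` is any object with an `ocr_for_single_line(path)` method that
    returns a list of characters, such as a CnOcr instance.
    """
    results = []
    for path in image_paths:
        chars = ocr.ocr_for_single_line(path)
        results.append((path, "".join(chars)))
    return results
```

With the package installed, a call could look like `recognize_lines(CnOcr(), ['examples/rand_cn1.png'])`, which returns a list of (path, text) pairs.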

(Not Necessary) Train

You can use the package without any training. If you really want to train your own model, run the following:

python scripts/cnocr_train.py --cpu 2 --num_proc 4 --loss ctc --dataset cn_ocr

Future Work

  • Support recognition of spaces
  • Fix bugs
  • Add tests
  • Maybe rewrite the model without using MXNet symbols
  • Try other models such as DenseNet and ResNet

Download files

Source Distribution

cnocr-0.1.1.tar.gz (16.7 kB)

Uploaded Source

Built Distribution

cnocr-0.1.1-py3-none-any.whl (32.1 kB)

Uploaded Python 3

File details

Details for the file cnocr-0.1.1.tar.gz.

File metadata

  • Download URL: cnocr-0.1.1.tar.gz
  • Size: 16.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.23.3 CPython/3.6.5

File hashes

Hashes for cnocr-0.1.1.tar.gz
Algorithm Hash digest
SHA256 d58adb8d340c55a9bce7e54ed07985f3f3449b7880ad3bae4baf0a6d21ced58d
MD5 48fce165f81dda0461a015c17d69c2ac
BLAKE2b-256 38e984fc884b33b87ea8d376395db804c33149d9edc0887d78d623b50f6b796e

File details

Details for the file cnocr-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: cnocr-0.1.1-py3-none-any.whl
  • Size: 32.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.23.3 CPython/3.6.5

File hashes

Hashes for cnocr-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7d5754f9bdbd93e283e6893b9153f5b224fb07787f28ec2b887b51809fc57e41
MD5 17778ce84a31339b349a8bf41195efad
BLAKE2b-256 f5f84da355ec579d61b756ab1bd355b78cbc7697e1c4f5fc1b9dec8057737325
