Library for extracting information from Thai personal identity cards
ThaiPersonalCardExtract
Library for extracting information from Thai personal identity cards, implemented on top of easyocr and tesseract.
New Features in v1.3 🎁
- Improved performance.
- Support for Thai driving licenses (Beta). Data can be extracted from photos of only some driving license layouts: the Department of Land Transport issues cards in many formats, and each format places its fields in different positions, so accuracy is currently low.
- Changed the system file structure.
- Support for Thai Government Lottery (future work).
Examples
Original image file.
Image cropped with warpPerspective.
Keypoints detected in the image.
Regions of interest extracted by the library.
Recommendations ⚠
- Images should be at least 600x350 pixels.
- Use images with minimal reflections for good results.
- The identity card should fill about 75% of the image; if it does not, crop the image so that only the identity card area remains.
- For faster processing, resize the image and use a CUDA GPU.
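The size recommendations above can be checked before an image is handed to the reader. The following is only a minimal sketch: the helper names and the 1200-pixel width cap are assumptions for illustration, not part of this library. It works on plain width and height values, which you can obtain from whichever image library you already use.

```python
# Hypothetical helpers, not part of ThaiPersonalCardExtract.
MIN_WIDTH, MIN_HEIGHT = 600, 350  # minimum size recommended above


def meets_minimum_size(width: int, height: int) -> bool:
    """Return True if the image is at least 600x350 pixels."""
    return width >= MIN_WIDTH and height >= MIN_HEIGHT


def downscaled_size(width: int, height: int, max_width: int = 1200):
    """Return a (width, height) pair no wider than max_width.

    The aspect ratio is preserved. max_width=1200 is an assumed value
    chosen for this example; tune it for your own images.
    """
    if width <= max_width:
        return width, height
    scale = max_width / width
    return max_width, max(1, round(height * scale))


print(meets_minimum_size(640, 480))
print(downscaled_size(2400, 1400))
```

Pass the returned size to the resize function of your image library before calling the reader.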
Installation
Install using pip
For the stable release,
pip install thai-personal-card-extract
For the latest development release,
pip install git+git://github.com/ggafiled/ThaiPersonalCardExtrac.git
Note 1: on Windows, install tesseract first by following the guide on Medium at https://medium.com/@navapat.tpb/734dae2fb4d3 and make sure the setup is complete.
Note 2: on Linux, install tesseract by following the official instructions at https://github.com/tesseract-ocr/tesseract
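To confirm that tesseract is installed before creating a reader, a check like the one below may help. It is a sketch that uses only the Python standard library; resolve_tesseract_cmd is a hypothetical helper name and is not provided by this package.

```python
import os
import shutil
from typing import Optional


def resolve_tesseract_cmd(candidate: Optional[str] = None) -> Optional[str]:
    """Return a usable tesseract path, or None if none is found.

    candidate is an explicit path (for example the value you plan to pass
    as tesseract_cmd). On Windows the path is often written without the
    ".exe" suffix, so that variant is tried as well.
    """
    if candidate:
        for path in (candidate, candidate + ".exe"):
            if os.path.isfile(path):
                return path
    # Fall back to searching the directories listed in PATH.
    return shutil.which("tesseract")


cmd = resolve_tesseract_cmd("D:/Program Files/Tesseract-OCR/tesseract")
print(cmd if cmd else "tesseract not found; see the installation notes")
```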
Usage
# With built-in config options.
import ThaiPersonalCardExtract as card
reader = card.PersonalCard(
lang=card.THAI,
provider=card.DEFAULT,
tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract",
save_extract_result=True,
path_to_save="D:/dev/ThaiPersonalCardExtract/examples/extract")
result = reader.extractInfo('examples/card.jpg')
print(result)
# Free-style usage
from ThaiPersonalCardExtract import PersonalCard
# On Windows, pass tesseract_cmd to point at your tesseract command.
reader = PersonalCard(lang="mix", tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract")
result = reader.extractInfo('examples/card.jpg')
print(result)
# Free-style usage with a driving license (Beta)
from ThaiPersonalCardExtract import DrivingLicense
# On Windows, pass tesseract_cmd to point at your tesseract command.
reader = DrivingLicense(lang="mix", tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract")
result = reader.extractInfo('examples/card.jpg')
print(result)
The output is a dictionary; each key holds one field that the library was able to extract.
{
"Identification_Number": "9999999999999",
"FullNameTH": "นาย อายุมฺมุราเสะ",
"PrefixTH": "นาย",
"NameTH": "อายุมฺมุราเสะ",
"LastNameTH": "อายุมฺมุราเสะ",
"PrefixEN": "Me",
"NameEN": "Shoys",
"LastNameEN": "Hinata",
"BirthdayTH": "21 มี.ย. 2539",
"BirthdayEN": "21 Jun..1996",
"Religion": "พุทธ",
"Address": "ท๒ 99/1 มิชีโฮะ เขตฮานามิกาวา อำเภอชิบ;",
"DateOfIssueTH": "11 ส.ค. 2554",
"DateOfIssueEN": "~11 Ang. 2021",
"DateOfExpiryTH": "11 ส.ค. 2574",
"DateOfExpiryEN": "21 ug. 2092"
}
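Because OCR output can contain misread characters, it may be worth validating the Identification_Number value before using it. Thai national ID numbers carry a check digit: the first 12 digits are weighted 13 down to 2, summed, and the 13th digit must equal (11 - sum mod 11) mod 10. The sketch below is a post-processing step you would write yourself; is_valid_thai_id is a hypothetical name and not a function of this library.

```python
def is_valid_thai_id(number: str) -> bool:
    """Check the 13-digit Thai national ID check digit."""
    if len(number) != 13 or not (number.isascii() and number.isdigit()):
        return False
    digits = [int(ch) for ch in number]
    # Weights 13, 12, ..., 2 apply to the first 12 digits.
    total = sum(d * w for d, w in zip(digits[:12], range(13, 1, -1)))
    return (11 - total % 11) % 10 == digits[12]


# "result" stands in for the dictionary returned by reader.extractInfo(...).
result = {"Identification_Number": "9999999999999"}
print(is_valid_thai_id(result["Identification_Number"]))
```

The placeholder number in the sample output above does not satisfy the check digit, so the call prints False for it.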
To set the lang attribute to tha:
from ThaiPersonalCardExtract import PersonalCard
# On Windows, pass tesseract_cmd to point at your tesseract command.
reader = PersonalCard(lang="tha", tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract")
result = reader.extractInfo('examples/card.jpg')
print(result)
The output is a dictionary that contains only the Thai fields.
{
"Identification_Number": "9999999999999",
"FullNameTH": "นาย อายุมฺมุราเสะ",
"PrefixTH": "นาย",
"NameTH": "อายุมฺมุราเสะ",
"LastNameTH": "อายุมฺมุราเสะ",
"BirthdayTH": "21 มี.ย. 2539",
"Religion": "พุทธ",
"Address": "ท๒ 99/1 มิชีโฮะ เขตฮานามิกาวา อำเภอชิบ;",
"DateOfIssueTH": "11 ส.ค. 2554",
"DateOfExpiryTH": "11 ส.ค. 2574"
}
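The Thai date fields use Buddhist Era years, which are 543 years ahead of the Gregorian calendar (2539 corresponds to 1996, as in the sample birthday). If only the Thai fields are extracted, the Gregorian year can be derived afterwards. This is a minimal sketch; buddhist_year_to_gregorian is a hypothetical helper, and it assumes the year appears as a four-digit number in the text.

```python
import re
from typing import Optional

BUDDHIST_ERA_OFFSET = 543  # Buddhist Era year minus Gregorian year


def buddhist_year_to_gregorian(date_text: str) -> Optional[int]:
    """Return the Gregorian year for a Thai date string, or None.

    The last four-digit number in the text is taken as the year.
    """
    years = re.findall(r"\d{4}", date_text)
    if not years:
        return None
    return int(years[-1]) - BUDDHIST_ERA_OFFSET


print(buddhist_year_to_gregorian("21 มี.ย. 2539"))
```

When lang is mix, the same conversion could also be used to cross-check the Thai and English date fields against each other.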
You can also set the OCR provider to one of the following:
default (uses both easyocr and tesseract; recommended)
easyocr
tesseract
from ThaiPersonalCardExtract import PersonalCard
# On Windows, pass tesseract_cmd to point at your tesseract command.
reader = PersonalCard(lang="tha", provider="default", tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract")
result = reader.extractInfo('examples/card.jpg')
print(result)
Config Options
You can pass the following keyword options when creating an instance.
Parameter name | Value type | Description |
---|---|---|
lang | String | Language of the expected results: mix (all areas, both tha and eng), tha, or eng. Default is 'mix'. |
provider | String | OCR provider: default (uses both easyocr and tesseract; recommended), easyocr, or tesseract. Default is 'default'. |
template_threshold | Double | Rate used to calculate template similarity. Default is 0.7. |
sift_rate | Int | Feature keypoint rate. Default is 25,000. |
tesseract_cmd | String | Path to your tesseract command. For Windows only. |
save_extract_result | Boolean | Set to True to save the extracted images. Default is False. |
path_to_save | String | Path where the extracted images are saved; used together with save_extract_result=True. |
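If the options come from user input or a settings file, they can be checked against the table above before they are passed to the constructor. The sketch below is illustrative only: check_options is a hypothetical helper, and two of its rules are assumptions rather than documented behavior (that template_threshold lies between 0 and 1, and that path_to_save is needed when save_extract_result is True).

```python
ALLOWED_LANGS = {"mix", "tha", "eng"}
ALLOWED_PROVIDERS = {"default", "easyocr", "tesseract"}


def check_options(lang="mix", provider="default", template_threshold=0.7,
                  save_extract_result=False, path_to_save=None):
    """Return a list of problems found in the given options (empty if none)."""
    problems = []
    if lang not in ALLOWED_LANGS:
        problems.append(f"lang must be one of {sorted(ALLOWED_LANGS)}")
    if provider not in ALLOWED_PROVIDERS:
        problems.append(f"provider must be one of {sorted(ALLOWED_PROVIDERS)}")
    # Assumption: the similarity threshold is a fraction in (0, 1].
    if not 0.0 < template_threshold <= 1.0:
        problems.append("template_threshold should be in (0, 1]")
    # Assumption: saving results needs a target directory.
    if save_extract_result and not path_to_save:
        problems.append("path_to_save is needed when save_extract_result=True")
    return problems


print(check_options(lang="tha", provider="easyocr"))
print(check_options(lang="jp"))
```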
Donate Me ☕
Mr.Nattapol Krobklang