Used to build the server and client endpoints of an OCR service.
KOCR
Introduction
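Struggling to extract information from PDFs? Give OCR a try! 🤩
Text in a PDF file is easy to read, but extracting it is often a headache 😓. Don't worry, OCR (optical character recognition) is here to help! 🙌
OCR converts the images inside a PDF file into editable text and reports the position of each piece of text. 🤯 This means you can easily:
- Copy text from a PDF document into other applications 📑
- Search a PDF file for specific keywords 🔎
- Automatically organize table data 📊
- Analyze and process text more efficiently 📈
This project aims to provide a convenient, practical way to extract information from PDF files quickly and efficiently. 🚀
It uses the open-source pretrained PaddleOCR models 💪 and provides both a server and a client:
- Runs entirely locally! Once the models are on disk, no internet connection is needed, so you can use it any time! 🌎
- Simple to use! Setup is easy, so you can get started quickly 🚀
Unlock the unlimited potential of your PDF files! ✨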
Prerequisites
Usage - Server
There are two ways to start the server:
- pip
- Docker (recommended)
pip
conda create -n kocr python=3.11 -y -q
conda activate kocr
pip install kocr
Before starting the server, you can set the model directories and the port it runs on:
export OCR_MODEL_ROOT=/data/models/paddleocr
export DET_MODEL=/det/en/en_PP-OCRv3_det_infer
export REC_MODEL=/rec/en/en_PP-OCRv4_rec_infer
export CLS_MODEL=/cls/ch/ch_ppocr_mobile_v2.0_cls_slim_infer
export PORT=8868
python -m kocr.api_server
If the server starts correctly, you should see:
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8868 (Press CTRL+C to quit)
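Besides reading the log, you can confirm from another process that something is listening on the port. The snippet below is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical helper, not part of kocr, and it only checks that a TCP connection can be opened, not that the OCR API itself is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the example configuration above, `is_port_open("127.0.0.1", 8868)` should return `True` once the startup messages have appeared.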
Docker
- Pull the Docker image
docker pull kime541200/kocr:1.0
- Create Docker container
sudo docker run -d \
    --gpus='"device=0"' \
    -v /data/models/paddleocr:/data/models/paddleocr \
    -e OCR_MODEL_ROOT=/data/models/paddleocr \
    -e DET_MODEL=/det/en/en_PP-OCRv3_det_infer \
    -e REC_MODEL=/rec/en/en_PP-OCRv4_rec_infer \
    -e CLS_MODEL=/cls/ch/ch_ppocr_mobile_v2.0_cls_slim_infer \
    -e OCR_PORT=8868 \
    -p 8868:8868 \
    -w /usr/src/app/kocr \
    --restart unless-stopped \
    --name kocr \
    kime541200/kocr:1.0 \
    python api_server.py
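Where:
- OCR_MODEL_ROOT is the root directory that holds the OCR models
- DET_MODEL is the directory of the detection model
- REC_MODEL is the directory of the recognition model
- CLS_MODEL is the directory of the text-direction classification model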
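By default, the server reads the detection model from {OCR_MODEL_ROOT}{DET_MODEL}. If the model is not there, it is downloaded into that directory (this requires an internet connection). The other two models work the same way.
The container example above mounts a host directory into the container with -v /data/models/paddleocr:/data/models/paddleocr because my models are stored under /data/models/paddleocr on the host. Adjust this to match your own setup.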
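The {OCR_MODEL_ROOT}{DET_MODEL} lookup described above can be sketched as follows. This is an illustration only, not kocr's actual code: `resolve_model_dirs` is a hypothetical helper, and the plain string concatenation is an assumption based on the example values, where each model variable starts with `/`.

```python
import os


def resolve_model_dirs(env=None):
    """Build the full model directories from the environment (hypothetical helper).

    Assumes each path is OCR_MODEL_ROOT followed directly by the model variable.
    """
    env = os.environ if env is None else env
    root = env.get("OCR_MODEL_ROOT", "")
    # Plain concatenation: on POSIX, os.path.join(root, "/det/...") would drop
    # `root` because the second component starts with "/".
    return {name: root + env.get(name, "")
            for name in ("DET_MODEL", "REC_MODEL", "CLS_MODEL")}


if __name__ == "__main__":
    # Report which model directories already exist on disk
    for name, path in resolve_model_dirs().items():
        print(name, path, "found" if os.path.isdir(path) else "missing")
```

Running this with the same environment variables as the server is a quick way to see which directories would be used, and which models are not yet on disk, before starting it offline.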
-e OCR_PORT sets the port the server listens on.
Once the container is created, the server runs in the background, for example at http://0.0.0.0:8868.
Usage - Client
PDF OCR
from kocr.app.client.classes.OcrClient import OcrClient
from kocr.app.ocr.utils.utils import decode_base64_image, draw_text_box

ocr_client = OcrClient(host='http://127.0.0.1:8868')  # change the IP and port if needed


def run():
    # Set `specific_pages` to `None` to stream every page in the PDF file
    for result in ocr_client.send_pdf(pdf_path='/path/to/file.pdf', specific_pages=[1, 3, 21]):
        img = decode_base64_image(result['base64_img'])
        draw_text_box(img=img, ocr_results=result['result'])


if __name__ == "__main__":
    run()
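If you would rather keep each streamed page on disk than only draw it, the `base64_img` field can be decoded with the standard library. The sketch below is an assumption-heavy illustration: `save_base64_image` is a hypothetical helper, and it assumes `base64_img` holds the base64 text of an encoded image file, optionally with a `data:` URI prefix. The source does not state the exact format, so check it against your server's output.

```python
import base64
from pathlib import Path


def save_base64_image(b64_str: str, out_path: str) -> int:
    """Decode a base64 string and write the bytes to `out_path` (hypothetical helper).

    Returns the number of bytes written.
    """
    # Drop an optional "data:image/...;base64," prefix (an assumption, may not occur)
    if b64_str.startswith("data:") and "," in b64_str:
        b64_str = b64_str.split(",", 1)[1]
    data = base64.b64decode(b64_str)
    Path(out_path).write_bytes(data)
    return len(data)
```

Inside the loop above, this could be called as `save_base64_image(result['base64_img'], f"page_{i}.img")` with `i` taken from `enumerate`. The file extension is left generic because the image format is not documented.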
Image OCR
from PIL import Image

from kocr.app.client.classes.OcrClient import OcrClient
from kocr.app.ocr.utils.utils import image_to_base64, decode_base64_image, draw_text_box

ocr_client = OcrClient(host='http://127.0.0.1:8868')  # change the IP and port if needed


def run():
    # Load a local image
    image = Image.open("/path/to/image.jpg")
    img_base64 = image_to_base64(image)
    response = ocr_client.send_image(img_base64=img_base64)
    # Handle the server's response
    if response.status_code == 200:
        payload = response.json()  # parse the JSON body once
        img = decode_base64_image(base64_str=payload['base64_img'])
        draw_text_box(img=img, ocr_results=payload['result'])
    else:
        print(f"Failed to send image. Status code: {response.status_code}")


if __name__ == "__main__":
    run()
Sliding-window OCR (for larger images)
from PIL import Image

from kocr.app.client.classes.OcrClient import OcrClient
from kocr.app.ocr.classes import OcrConfig
from kocr.app.ocr.utils.utils import image_to_base64, decode_base64_image, draw_text_box

ocr_client = OcrClient(host='http://127.0.0.1:8868')  # change the IP and port if needed


def run():
    # Load a local image
    image = Image.open("/path/to/large.jpg")
    img_base64 = image_to_base64(image)
    # The sliding-window settings must be provided
    config = {
        "slice": {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
    }
    response = ocr_client.send_image(img_base64=img_base64, config=OcrConfig(**config))
    # Handle the server's response
    if response.status_code == 200:
        payload = response.json()  # parse the JSON body once
        img = decode_base64_image(base64_str=payload['base64_img'])
        draw_text_box(img=img, ocr_results=payload['result'])
    else:
        print(f"Failed to send image. Status code: {response.status_code}")


if __name__ == "__main__":
    run()
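To give a feel for what the stride settings control, here is a generic sketch of sliding-window tiling. It is not kocr's implementation: how the server interprets `horizontal_stride` and `vertical_stride`, and how it merges results using `merge_x_thres` and `merge_y_thres`, is not documented here. The function name and the separate window-size parameters are hypothetical.

```python
def sliding_windows(width, height, win_w, win_h, stride_x, stride_y):
    """Return (left, top, right, bottom) boxes that tile a width x height image.

    Windows start every `stride_x` / `stride_y` pixels and are clipped at the
    image border, so edge tiles may be smaller than win_w x win_h.
    """
    if stride_x <= 0 or stride_y <= 0:
        raise ValueError("strides must be positive")
    boxes = []
    for top in range(0, height, stride_y):
        for left in range(0, width, stride_x):
            right = min(left + win_w, width)
            bottom = min(top + win_h, height)
            boxes.append((left, top, right, bottom))
    return boxes
```

Each box is in the same (left, upper, right, lower) order that Pillow's `Image.crop` accepts. A stride smaller than the window size makes neighboring tiles overlap, which is typically why a merge step for duplicated detections is needed afterwards.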