Awesome OCR toolkit based on PaddlePaddle

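🚀 Introduction

Since its release, PaddleOCR has been widely adopted across academia, industry, and research thanks to its state-of-the-art algorithms and proven real-world deployments. It is used in many well-known open-source projects, including Umi-OCR, OmniParser, MinerU, and RAGFlow, and has become the tool of choice for many developers in open-source OCR. On May 20, 2025, the PaddlePaddle team released PaddleOCR 3.0, which is fully compatible with the official release of the PaddlePaddle 3.0 framework. The release further improves text recognition accuracy, supports recognition of multiple text types and of handwriting, and meets the strong demand from large-model applications for high-precision parsing of complex documents. Combined with ERNIE 4.5 Turbo, it significantly improves key information extraction accuracy, and it adds support for domestic hardware such as KUNLUNXIN and Ascend.

PaddleOCR 3.0 introduces three major new features:

  • PP-OCRv5, an all-scenario text recognition model: a single model supports five text types and complex handwriting; overall recognition accuracy is 13 percentage points higher than the previous generation
  • PP-StructureV3, a general document parsing solution: high-precision parsing of PDFs across many scenarios and layouts, ahead of many open-source and closed-source solutions on public benchmarks
  • PP-ChatOCRv4, an intelligent document understanding solution: native support for ERNIE 4.5 Turbo, with accuracy 15 percentage points higher than the previous generation

Beyond its model library, PaddleOCR 3.0 provides easy-to-learn, easy-to-use tools covering model training, inference, and service deployment, so that developers can bring AI applications into production quickly.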

PaddleOCR Architecture

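📣 Recent Updates

🔥🔥2025.05.20: Official release of PaddleOCR 3.0, including:

  • PP-OCRv5: High-accuracy text recognition for all scenarios

    1. 🌐 A single model supports five text types (Simplified Chinese, Traditional Chinese, Chinese Pinyin, English, and Japanese).
    2. ✍️ Supports complex handwriting recognition: significantly better performance on cursive strokes and non-standard handwriting.
    3. 🎯 Higher overall recognition accuracy: SOTA accuracy in a range of application scenarios, with a gain of 13 percentage points over the previous version, PP-OCRv4.
  • PP-StructureV3: General document parsing solution

    1. 🧮 Supports high-precision parsing of PDFs across multiple scenarios, ahead of many open-source and closed-source solutions on the OmniDocBench benchmark.
    2. 🧠 Specialized capabilities: seal recognition, chart-to-table conversion, recognition of tables containing nested formulas or images, vertical text parsing, and complex table structure analysis.
  • PP-ChatOCRv4: Intelligent document understanding solution

    1. 🔥 Key information extraction accuracy on document images (PDF/PNG/JPG) is 15 percentage points higher than the previous generation.
    2. 💻 Native support for ERNIE 4.5 Turbo, plus compatibility with large models deployed through tools such as PaddleNLP, Ollama, and vLLM.
    3. 🤝 Integrates PP-DocBee2, supporting extraction and understanding of common complex document content such as printed text, handwriting, seals, tables, and charts.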

⚡ Quick Start

1. Online Demo

Online demos are available on AI Studio.

2. Installation

Follow the Installation Guide to install PaddlePaddle 3.0, then install paddleocr.

# Install paddleocr
pip install paddleocr
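
After installing, it can help to confirm which version pip actually placed in the current environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and is not part of paddleocr.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("paddleocr")
    if version is None:
        print("paddleocr is not installed; run: pip install paddleocr")
    else:
        print(f"paddleocr {version} is installed")
```

The same helper can be pointed at `paddlepaddle` to check that the framework dependency is present as well.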

3. Run inference from the command line

# Run PP-OCRv5 inference
paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png

# Run PP-StructureV3 inference
paddleocr pp_structurev3 -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png

# Before running PP-ChatOCRv4 inference, obtain a Qianfan API key.
# The -k value is the field to extract ("number of passengers permitted in the driver's cab").
paddleocr pp_chatocrv4_doc -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png -k 驾驶室准乘人数 --qianfan_api_key your_api_key

# Show detailed options for "paddleocr ocr"
paddleocr ocr --help
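
When the command line tool needs to be driven from a script, one option is to build the argument list in Python and hand it to `subprocess`. This is only a sketch: it uses the `ocr` subcommand and the `-i` flag shown above, while the helper names `build_ocr_command` and `run_ocr_cli` are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_ocr_command(image: str) -> List[str]:
    """Build the argv for `paddleocr ocr -i <image>` (flags taken from the CLI examples)."""
    return ["paddleocr", "ocr", "-i", image]


def run_ocr_cli(image: str) -> int:
    """Run the CLI if it is on PATH and return its exit code."""
    if shutil.which("paddleocr") is None:
        raise FileNotFoundError("paddleocr executable not found; install it with pip first")
    # Passing a list (no shell=True) avoids shell quoting problems in file names.
    completed = subprocess.run(build_ocr_command(image), check=False)
    return completed.returncode


if __name__ == "__main__":
    print(" ".join(build_ocr_command("general_ocr_002.png")))
```

Building the command as a list rather than a single string keeps paths with spaces intact.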

4. Run inference through the Python API

4.1 PP-OCRv5 Example

from paddleocr import PaddleOCR
# Initialize a PaddleOCR instance
ocr = PaddleOCR()
# Run OCR inference on a sample image
result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")
# Visualize the results and save them as JSON
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")
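
A common next step after saving the JSON is to keep only the lines recognized with high confidence. The sketch below works on a plain dictionary and assumes the result contains parallel `rec_texts` and `rec_scores` lists, possibly nested under a `res` key; these field names are assumptions, so check them against the output of `res.print()` before relying on them. The helper name `high_confidence_texts` is hypothetical.

```python
from typing import Any, Dict, List


def high_confidence_texts(record: Dict[str, Any], min_score: float = 0.8) -> List[str]:
    """Return recognized texts whose score is at least `min_score`.

    Assumes `record` holds parallel lists `rec_texts` and `rec_scores`,
    optionally wrapped in a top-level `res` key (assumed field names).
    """
    payload = record.get("res", record)
    texts = payload.get("rec_texts", [])
    scores = payload.get("rec_scores", [])
    return [text for text, score in zip(texts, scores) if score >= min_score]


if __name__ == "__main__":
    # Hand-written sample data, not real model output.
    sample = {"rec_texts": ["Hello", "wor1d"], "rec_scores": [0.98, 0.42]}
    print(high_confidence_texts(sample))
```

To use it on a saved file, load the file with `json.load` and pass the resulting dictionary to the helper.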

4.2 PP-StructureV3 Example

from pathlib import Path
from paddleocr import PPStructureV3

pipeline = PPStructureV3()

# For Image
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png")

# Print the results, then save them as JSON and Markdown
for res in output:
    res.print() 
    res.save_to_json(save_path="output") 
    res.save_to_markdown(save_path="output") 

# For PDF File
input_file = "./your_pdf_file.pdf"
output_path = Path("./output")

output = pipeline.predict(input_file)

markdown_list = []
markdown_images = []

for res in output:
    md_info = res.markdown
    markdown_list.append(md_info)
    markdown_images.append(md_info.get("markdown_images", {}))

markdown_texts = pipeline.concatenate_markdown_pages(markdown_list)

mkd_file_path = output_path / f"{Path(input_file).stem}.md"
mkd_file_path.parent.mkdir(parents=True, exist_ok=True)

with open(mkd_file_path, "w", encoding="utf-8") as f:
    f.write(markdown_texts)

for item in markdown_images:
    if item:
        for path, image in item.items():
            file_path = output_path / path
            file_path.parent.mkdir(parents=True, exist_ok=True)
            image.save(file_path)
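
The PDF steps above can be wrapped in a single reusable function. This is a sketch, not part of paddleocr: `pdf_to_markdown` is a hypothetical helper that relies only on the calls already used in the example (`predict`, `res.markdown`, `concatenate_markdown_pages`, and `image.save`), and it takes the pipeline as a parameter so it can be exercised with a stand-in object.

```python
from pathlib import Path
from typing import Any


def pdf_to_markdown(pipeline: Any, input_file: str, output_dir: str = "./output") -> Path:
    """Parse `input_file` with `pipeline` and write one Markdown file plus its images.

    `pipeline` is expected to behave like the PPStructureV3 instance in the
    example above; this wrapper itself is a hypothetical convenience helper.
    """
    output_path = Path(output_dir)

    # Collect the per-page Markdown info returned by the pipeline.
    markdown_list = [res.markdown for res in pipeline.predict(input_file)]

    # Merge all pages into a single Markdown document.
    markdown_text = pipeline.concatenate_markdown_pages(markdown_list)
    md_file = output_path / f"{Path(input_file).stem}.md"
    md_file.parent.mkdir(parents=True, exist_ok=True)
    md_file.write_text(markdown_text, encoding="utf-8")

    # Save the images referenced by the Markdown next to it.
    for md_info in markdown_list:
        for rel_path, image in md_info.get("markdown_images", {}).items():
            target = output_path / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            image.save(target)

    return md_file


# Usage, assuming paddleocr is installed:
#   from paddleocr import PPStructureV3
#   pdf_to_markdown(PPStructureV3(), "./your_pdf_file.pdf")
```

Passing the pipeline in, rather than constructing it inside the function, lets one loaded pipeline be reused across many files.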

4.3 PP-ChatOCRv4 Example

from paddleocr import PPChatOCRv4Doc

chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "ernie-3.5-8k",
    "base_url": "https://qianfan.baidubce.com/v2",
    "api_type": "openai",
    "api_key": "api_key",  # your api_key
}

retriever_config = {
    "module_name": "retriever",
    "model_name": "embedding-v1",
    "base_url": "https://qianfan.baidubce.com/v2",
    "api_type": "qianfan",
    "api_key": "api_key",  # your api_key
}

mllm_chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "PP-DocBee",
    "base_url": "http://127.0.0.1:8080/",  # your local mllm service url
    "api_type": "openai",
    "api_key": "api_key",  # your api_key
}

pipeline = PPChatOCRv4Doc()

visual_predict_res = pipeline.visual_predict(
    input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png",
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_common_ocr=True,
    use_seal_recognition=True,
    use_table_recognition=True,
)

visual_info_list = []
for res in visual_predict_res:
    visual_info_list.append(res["visual_info"])
    layout_parsing_result = res["layout_parsing_result"]

vector_info = pipeline.build_vector(
    visual_info_list, flag_save_bytes_vector=True, retriever_config=retriever_config
)
mllm_predict_res = pipeline.mllm_pred(
    input="vehicle_certificate-1.png",
    key_list=["驾驶室准乘人数"],  # field to extract: "number of passengers permitted in the driver's cab"
    mllm_chat_bot_config=mllm_chat_bot_config,
)
mllm_predict_info = mllm_predict_res["mllm_res"]
chat_result = pipeline.chat(
    key_list=["驾驶室准乘人数"],  # same field as in the mllm_pred call
    visual_info=visual_info_list,
    vector_info=vector_info,
    mllm_predict_info=mllm_predict_info,
    chat_bot_config=chat_bot_config,
    retriever_config=retriever_config,
)
print(chat_result)
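
The configuration dictionaries above hard-code a placeholder `api_key`. One way to keep a real key out of source files is to read it from an environment variable. The sketch below rebuilds the `chat_bot_config` dictionary from the example; the variable name `QIANFAN_API_KEY` and the helper `chat_bot_config_from_env` are assumptions made for illustration, not names defined by paddleocr.

```python
import os
from typing import Dict, Mapping, Optional


def chat_bot_config_from_env(
    env_var: str = "QIANFAN_API_KEY",  # hypothetical variable name
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build the chat_bot config shown above, taking the API key from the environment."""
    env = os.environ if environ is None else environ
    api_key = env.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable before running")
    return {
        "module_name": "chat_bot",
        "model_name": "ernie-3.5-8k",
        "base_url": "https://qianfan.baidubce.com/v2",
        "api_type": "openai",
        "api_key": api_key,
    }


if __name__ == "__main__":
    try:
        config = chat_bot_config_from_env()
        print("chat_bot config ready for model", config["model_name"])
    except RuntimeError as exc:
        print(exc)
```

The returned dictionary can be passed as `chat_bot_config` to `pipeline.chat` in place of the literal defined earlier.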

5. Using domestic (Chinese-made) hardware

⛰️ Advanced Guides

🔄 Results Showcase

PP-OCRv5 Demo

PP-StructureV3 Demo

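👩‍👩‍👧‍👦 Developer Community

Scan the QR code to follow the PaddlePaddle official account
Scan the QR code to join the technical discussion group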

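🏆 Awesome Projects Using PaddleOCR

PaddleOCR would not be where it is today without community contributions! 💗 Heartfelt thanks to all developers, partners, and contributors!

Project Name       Description
RAGFlow            RAG-based AI workflow engine
MinerU             Tool for converting multiple document types to Markdown
Umi-OCR            Open-source batch offline OCR software
OmniParser         Screen parsing tool for pure-vision GUI agents
QAnything          Question-answering system for any kind of content
PDF-Extract-Kit    Toolkit for efficient extraction from complex PDF documents
Dango-Translator   Real-time on-screen translation tool
More projects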

👩‍👩‍👧‍👦 Contributors

🌟 Star

Star History Chart

📄 License

This project is released under the Apache 2.0 license.

🎓 Citation

@misc{paddleocr2020,
title={PaddleOCR, Awesome multilingual OCR toolkits based on PaddlePaddle.},
author={PaddlePaddle Authors},
howpublished = {\url{https://github.com/PaddlePaddle/PaddleOCR}},
year={2020}
}
