A practical tool for converting PDF to Markdown


RapidDoc - High-Speed Document Parsing System


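😺 Introduction

RapidDoc is a lightweight open-source framework focused on document parsing. It supports OCR, layout analysis, formula recognition, table recognition, and reading-order recovery, and converts complex PDF documents to Markdown, JSON, WORD, and HTML.

It also parses docx/doc, pptx/ppt, and xlsx/xls files natively (without models).

The framework is a secondary development of MinerU with the VLM removed. It focuses on efficient document parsing in pipeline mode and keeps a reasonable parsing speed even on CPU.

The default models come mainly from PaddleOCR's PP-StructureV3 series (OCR, layout analysis, formula recognition, reading-order recovery, and some of the table recognition models). All of them have been converted to ONNX format for efficient inference on CPU/GPU.

Custom OCR, formula, and table models are also supported: implement the batch_predict method of CustomBaseModel. Integration with the PaddleOCRVL series models is currently built in.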
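
The custom-model hook described above can be pictured with a small sketch. The source names only `CustomBaseModel` and its `batch_predict` method; the argument and return types are not given here, so the base class below is a hypothetical stand-in defined locally, and the signature is an assumption made for illustration. Check the real class in the RapidDoc repository before writing an actual integration.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Sequence


class CustomBaseModelStandIn(ABC):
    """Hypothetical stand-in, NOT rapid_doc's real CustomBaseModel."""

    @abstractmethod
    def batch_predict(self, items: Sequence[Any]) -> List[Any]:
        """Assumed contract: take a batch of inputs, return one result per input."""


class UppercaseTextModel(CustomBaseModelStandIn):
    """Toy model: 'recognizes' each input by upper-casing its string form."""

    def batch_predict(self, items: Sequence[Any]) -> List[Any]:
        return [str(item).upper() for item in items]


model = UppercaseTextModel()
results = model.batch_predict(["hello", "world"])
print(results)
```

The point of the batch interface is that the framework can hand over many crops at once, so a real implementation could forward the whole batch to its backend in a single call.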

KittyDoc has become a member of the RapidAI open-source family.

✨ If this project helps you, your star is what motivates me to keep improving it!


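👏 Features

OCR
- Uses RapidOCR, which supports multiple inference engines
- On CPU, OpenVINO is the default (fast, with higher memory usage); on GPU, torch is the default
Layout recognition
- Uses the PP-DocLayout series ONNX models (v2, plus-L, L, M, S)
  - PP-DocLayoutV3: built-in reading order, supports irregular (non-rectangular) boxes, used by default
  - PP-DocLayoutV2: built-in reading order
  - PP-DocLayout_plus-L: good accuracy and stable at runtime
  - PP-DocLayout-L: fast, with good accuracy
  - PP-DocLayout-S: extremely fast, with some missed detections
Formula recognition
- Uses the PP-FormulaNet_plus series ONNX models (L, M, S)
  - PP-FormulaNet_plus-L: slow, supports onnx
  - PP-FormulaNet_plus-M: used by default, supports onnx and torch
  - PP-FormulaNet_plus-S: fastest, supports onnx, not accurate enough on complex formulas
- Can be configured to recognize display (block) formulas only
- In a CUDA environment, torch inference is the default; ONNX GPU inference of the formula model raises errors that remain unresolved upstream (see PaddleOCR/issues/15125, PaddleX/issues/4238, Paddle2ONNX/issues/1593)
Table recognition
- Built on rapid_table_self and extended into a multi-model cascade:
  - Table classification (distinguishes wired from wireless tables)
  - UNET for wired table recognition, plus SLANET_plus/UNITABLE for wireless table recognition
Reading-order recovery
- PP-DocLayoutV2 and PP-DocLayoutV3 use the reading order built into the layout model
- Other layout models use the PP-StructureV3 reading-order recovery algorithm, which is based on the xycut algorithm and the layout results
Inference
- All models run through ONNXRuntime; OCR can be configured to use other inference engines
- Apart from the OCR and PP-DocLayout-M/S models, OpenVINO inference raises errors that are hard to resolve for now (see PaddleOCR/issues/16277)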
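
The reading-order item above refers to an xycut-based algorithm. As a rough illustration of the idea only, here is a minimal recursive XY-cut over bounding boxes. It is not the PP-StructureV3 implementation (which also uses the layout results); the function names, the `(x0, y0, x1, y1)` box convention, and the fallback sort are assumptions made for this sketch.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), origin at top-left


def _split(indices: List[int], boxes: Sequence[Box], axis: int) -> List[List[int]]:
    """Group boxes whose projections onto `axis` (0 = x, 1 = y) overlap.

    A gap in the projection profile starts a new group.
    """
    lo, hi = axis, axis + 2
    ordered = sorted(indices, key=lambda i: boxes[i][lo])
    groups = [[ordered[0]]]
    reach = boxes[ordered[0]][hi]  # farthest extent covered so far
    for i in ordered[1:]:
        if boxes[i][lo] >= reach:
            groups.append([i])  # empty band found: cut here
        else:
            groups[-1].append(i)
        reach = max(reach, boxes[i][hi])
    return groups


def xy_cut(boxes: Sequence[Box], indices: Optional[List[int]] = None, axis: int = 1) -> List[int]:
    """Return box indices in reading order by alternating horizontal and vertical cuts."""
    if indices is None:
        indices = list(range(len(boxes)))
    if len(indices) <= 1:
        return list(indices)
    groups = _split(indices, boxes, axis)
    if len(groups) == 1:
        # No cut along this axis: try the other one.
        axis = 1 - axis
        groups = _split(indices, boxes, axis)
        if len(groups) == 1:
            # Boxes overlap on both axes: fall back to top-to-bottom, left-to-right.
            return sorted(indices, key=lambda i: (boxes[i][1], boxes[i][0]))
    order: List[int] = []
    for group in groups:
        order.extend(xy_cut(boxes, group, 1 - axis))
    return order


# A full-width title above a two-column body, listed in scrambled order.
page = [
    (55, 20, 100, 40),  # 0: right column, first block
    (0, 55, 45, 90),    # 1: left column, second block
    (0, 0, 100, 10),    # 2: title
    (55, 45, 100, 90),  # 3: right column, second block
    (0, 20, 45, 50),    # 4: left column, first block
]
print(xy_cut(page))  # [2, 4, 1, 0, 3]
```

The title is separated by a horizontal cut first, then the body is split into columns by a vertical cut, and each column is read top to bottom.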

Benchmark Results

1. OmniDocBench

The following are RapidDoc's evaluation results on OmniDocBench v1.6.

The pipeline uses PP-DocLayoutV3, PP-OCRv6-small, PP-FormulaNet_plus-M, and UNET_SLANET_PLUS.

Comprehensive evaluation of document parsing on OmniDocBench (v1.6_full)

| Method | Model Type | Size | Overall↑ | Text Edit↓ | Formula CDM↑ | Table TEDS↑ | Table TEDS-S↑ | Read Order Edit↓ |
|---|---|---|---|---|---|---|---|---|
| MinerU2.5-Pro | Specialized VLMs | 1.2B | 95.75 | 0.036 | 97.45 | 93.42 | 95.92 | 0.120 |
| GLM-OCR | Specialized VLMs | 0.9B | 95.22 | 0.044 | 97.18 | 92.83 | 95.39 | 0.133 |
| PaddleOCR-VL-1.5 | Specialized VLMs | 0.9B | 94.93 | 0.038 | 96.89 | 91.67 | 94.37 | 0.130 |
| PaddleOCR-VL | Specialized VLMs | 0.9B | 94.18 | 0.040 | 95.91 | 90.65 | 93.74 | 0.135 |
| Youtu-Parsing | Specialized VLMs | 2.5B | 93.74 | 0.044 | 93.63 | 92.02 | 95.00 | 0.116 |
| Qianfan-OCR | Specialized VLMs | 4B | 93.90 | 0.04 | 95.08 | 90.53 | 93.31 | 0.13 |
| Ovis2.6-30B-A3B | General VLMs | 30B | 93.70 | 0.035 | 95.17 | 89.44 | 92.40 | 0.135 |
| Logics-Parsing-v2 | Specialized VLMs | 4B | 93.33 | 0.041 | 95.65 | 88.42 | 91.98 | 0.137 |
| ABot-OCR | Specialized VLMs | 2B | 93.30 | 0.037 | 94.86 | 88.69 | 91.87 | 0.137 |
| FireRed-OCR | Specialized VLMs | 2B | 93.26 | 0.037 | 95.44 | 88.04 | 91.06 | 0.131 |
| MinerU-2.5 | Specialized VLMs | 1.2B | 93.04 | 0.045 | 95.77 | 87.88 | 91.47 | 0.130 |
| Gemini 3 Pro | General VLMs | - | 92.91 | 0.064 | 95.99 | 89.15 | 92.96 | 0.165 |
| Gemini 3 Flash | General VLMs | - | 92.62 | 0.066 | 95.16 | 89.29 | 93.51 | 0.172 |
| dots.ocr | Specialized VLMs | 3B | 90.77 | 0.048 | 89.95 | 87.18 | 90.58 | 0.138 |
| OpenDoc-0.1B | Specialized VLMs | 0.1B | 90.67 | 0.049 | 93.02 | 83.88 | 87.45 | 0.140 |
| DeepSeek-OCR 2 | Specialized VLMs | 3B | 90.25 | 0.050 | 91.84 | 83.89 | 87.75 | 0.144 |
| RapidDoc | Pipeline Tools | - | 90.157 | 0.047 | 93.777 | 81.394 | 88.402 | 0.136 |
| HunyuanOCR | Specialized VLMs | 1B | 89.95 | 0.088 | 87.68 | 91.01 | 93.23 | 0.171 |
| Qwen3-VL-235B | General VLMs | 235B | 89.78 | 0.063 | 92.55 | 83.07 | 86.75 | 0.166 |
| Dolphin-v2 | Specialized VLMs | 3B | 89.50 | 0.069 | 91.01 | 84.40 | 87.44 | 0.150 |
| OCRVerse | Specialized VLMs | 4B | 88.60 | 0.063 | 89.61 | 82.44 | 86.27 | 0.163 |
| MonkeyOCR-pro-3B | Specialized VLMs | 3B | 88.57 | 0.074 | 88.74 | 84.35 | 88.62 | 0.189 |
| GPT-5.2 | General VLMs | - | 86.59 | 0.114 | 88.21 | 82.95 | 87.93 | 0.193 |
| Dolphin-1.5 | Specialized VLMs | 0.3B | 86.52 | 0.094 | 87.49 | 81.43 | 84.82 | 0.167 |
| MinerU-Pipeline | Pipeline Tools | - | 86.47 | 0.055 | 83.07 | 81.88 | 88.68 | 0.153 |
| olmOCR | Specialized VLMs | 7B | 85.74 | 0.139 | 88.10 | 83.00 | 87.17 | 0.216 |
| Mistral OCR | Specialized VLMs | - | 85.66 | 0.097 | 89.91 | 76.78 | 80.93 | 0.171 |
| Kimi K2.5 | General VLMs | 1T | 84.53 | 0.107 | 83.50 | 80.76 | 84.00 | 0.211 |
| InternVL3.5-241B | General VLMs | 241B | 83.76 | 0.130 | 89.95 | 74.35 | 79.78 | 0.215 |
| Nanonets-OCR-s | Specialized VLMs | 3B | 83.61 | 0.108 | 81.46 | 80.18 | 84.51 | 0.213 |
| POINTS-Reader | Specialized VLMs | 3B | 83.37 | 0.096 | 85.72 | 73.98 | 77.40 | 0.198 |
| Marker | Pipeline Tools | - | 78.44 | 0.157 | 85.24 | 65.77 | 73.24 | 0.243 |
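
The source does not say how the Overall column is computed. A plausible reading, assumed here, is the average of three terms: the text edit distance converted to a 0-100 score, the formula CDM, and the table TEDS. The sketch below applies that assumed formula to RapidDoc's row; the function name is made up for this example.

```python
def overall_score(text_edit: float, formula_cdm: float, table_teds: float) -> float:
    """Assumed definition: mean of (1 - text edit) * 100, formula CDM, and table TEDS."""
    return ((1.0 - text_edit) * 100.0 + formula_cdm + table_teds) / 3.0


# RapidDoc's row from the table above.
rapiddoc = overall_score(text_edit=0.047, formula_cdm=93.777, table_teds=81.394)
print(round(rapiddoc, 3))  # 90.157
```

Under this assumption the result matches RapidDoc's reported Overall of 90.157. Rows reported with fewer decimals can differ in the last digit because their inputs are already rounded. TEDS-S and the reading-order edit distance do not enter this formula.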

🛠️ Installing RapidDoc

Install with pip

pip install "rapid-doc[cpu]" -i https://mirrors.aliyun.com/pypi/simple
or
pip install "rapid-doc[gpu]" -i https://mirrors.aliyun.com/pypi/simple

Install from source

# Clone the repository
git clone https://github.com/RapidAI/RapidDoc.git
cd RapidDoc

# Install dependencies
pip install -e ".[cpu]" -i https://mirrors.aliyun.com/pypi/simple
or
pip install -e ".[gpu]" -i https://mirrors.aliyun.com/pypi/simple
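
After either install route, one quick way to confirm that the package is visible to the current interpreter is to query its installed version. This is a generic check using the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this sketch, and the distribution name `rapid-doc` is taken from the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if rapid-doc is installed in this environment, else None.
print(installed_version("rapid-doc"))
```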

Using GPU inference

# rapid-doc[gpu] installs the latest onnxruntime-gpu by default
# Make sure the onnxruntime-gpu version matches your GPU environment; see https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements

# Select the GPU in Python (must be set before importing rapid_doc)
import os
# Use the default GPU (cuda:0)
os.environ['MINERU_DEVICE_MODE'] = "cuda"
# Or pick a GPU by index instead, for example the second GPU (cuda:1)
# os.environ['MINERU_DEVICE_MODE'] = "cuda:1"
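
As a hedged illustration of the setting above, the sketch below chooses the value of `MINERU_DEVICE_MODE` from what ONNX Runtime reports. `onnxruntime.get_available_providers()` is a real ONNX Runtime call; the helper `pick_device_mode` is made up for this example, and `"cpu"` as an accepted value is an assumption, since the source shows only `"cuda"` and `"cuda:1"`.

```python
import os
from typing import Iterable


def pick_device_mode(providers: Iterable[str], gpu_index: int = 0) -> str:
    """Map ONNX Runtime provider names to a device-mode string."""
    if "CUDAExecutionProvider" in set(providers):
        return "cuda" if gpu_index == 0 else f"cuda:{gpu_index}"
    return "cpu"  # assumed fallback value, not confirmed by the source


try:
    import onnxruntime as ort
    available = ort.get_available_providers()
except ImportError:
    available = []  # onnxruntime is not installed in this environment

# Set the variable first, and import rapid_doc only afterwards.
os.environ["MINERU_DEVICE_MODE"] = pick_device_mode(available)
print(os.environ["MINERU_DEVICE_MODE"])
```

A listed `CUDAExecutionProvider` only means the installed build supports CUDA. It does not guarantee that a compatible driver and GPU are present, so the version requirements linked above still apply.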

Inference with the PaddleOCRVL series

To deploy the VL model, refer to the official documentation.

import os
os.environ['PADDLEOCRVL_VERSION'] = "v1.6"
os.environ['PADDLEOCRVL_VL_REC_BACKEND'] = "vllm-server"
os.environ['PADDLEOCRVL_VL_VL_REC_SERVER_URL'] = "http://localhost:8118/v1"

from rapid_doc.model.layout.rapid_layout_self import ModelType as LayoutModelType
from rapid_doc.model.custom.paddleocr_vl.paddleocr_vl import PaddleOCRVLTableModel, PaddleOCRVLOCRModel, PaddleOCRVLFormulaModel
layout_config = {
    "model_type": LayoutModelType.PP_DOCLAYOUTV3,
}
ocr_config = {
    "custom_model": PaddleOCRVLOCRModel()
}
formula_config = {
    "custom_model": PaddleOCRVLFormulaModel()
}
table_config = {
    "custom_model": PaddleOCRVLTableModel()
}

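Deploying RapidDoc with Docker

RapidDoc provides a convenient Docker deployment that helps set up the environment quickly and avoids some tricky compatibility problems.

See the Docker deployment instructions in the documentation; the image has been pushed to Docker Hub.

📋 Usage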

import os
from pathlib import Path
from rapid_doc import RapidDoc
__dir__ = Path(__file__).resolve().parent.parent
output_dir = os.path.join(__dir__, "output")

doc_path_list = [
    __dir__ / "demo/pdfs/示例1-论文模板.pdf",
    __dir__ / "demo/docx/test.docx",
]
engine = RapidDoc()
outputs = engine(doc_path_list, output_dir=output_dir)
for output in outputs:
    print(output.markdown)
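
The loop above only prints each result. As a small follow-up sketch, the helper below writes each result's `markdown` text to its own file. It relies only on the `.markdown` attribute shown in the example; the helper name, the `document_<n>.md` naming scheme, and the stand-in objects used for the demonstration are assumptions, and the engine may already write its own files to `output_dir`.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Iterable, List


def save_markdown(outputs: Iterable[Any], out_dir: str) -> List[Path]:
    """Write each output's `.markdown` text to out_dir/document_<n>.md."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for index, output in enumerate(outputs):
        path = target / f"document_{index}.md"
        path.write_text(output.markdown, encoding="utf-8")
        written.append(path)
    return written


# Demonstration with stand-in objects instead of real RapidDoc results.
with tempfile.TemporaryDirectory() as tmp:
    fake_outputs = [
        SimpleNamespace(markdown="# Page one\n"),
        SimpleNamespace(markdown="# Page two\n"),
    ]
    paths = save_markdown(fake_outputs, tmp)
    names = [p.name for p in paths]
    first_text = paths[0].read_text(encoding="utf-8")

print(names)
```

With real results, `save_markdown(outputs, output_dir)` would be called right after `engine(...)` returns.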

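Online Demo

Gradio-based online demo

A web UI built with Gradio. The interface is minimal, includes only the core parsing features, and requires no login.

📋 Usage Examples

Model Download

If no model path is specified, the models are downloaded automatically on first run.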

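📌 TODO

Cross-page table merging
Checkbox recognition with OpenCV (off by default; OpenCV recognition produces some false positives)
FastAPI service, with Docker image builds for both CPU and GPU
Text-based PDFs: extract table text without OCR
Text-based PDFs: extract text-box bboxes with pypdfium2
Text-based PDFs: table parsing in three orientations (0/90/270 degrees)
Text-based PDFs: extract the original images with pypdfium2 (the default screenshot approach lowers sharpness and may crop image edges)
Extraction of formulas and images inside tables
Improved reading order, with recovery of complex layouts such as multi-column and vertical text
torch inference for formulas, with GPU acceleration
OpenVINO support for the layout and table models
Markdown to docx and HTML conversion
PP-DocLayoutV2 layout recognition + reading order
OmniDocBench evaluation
Custom OCR, table, and formula models; support for the PaddleOCR-VL series
Native parsing of docx/doc, pptx/ppt, and xlsx/xls (without models)
Seal text detection
Document orientation correction for 90° and 270° (off by default); table orientation correction for 90° and 270° (on by default)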

🙏 Acknowledgements

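⚖️ License

This project is adapted from MinerU. The YOLO model used by the original project has been removed and replaced with PP-StructureV3 series ONNX models.
Because the AGPL-licensed YOLO model has been removed, the project as a whole is no longer bound by the AGPL.

This project is released under the Apache 2.0 license.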

Latest release: 0.9.8 (Jun 28, 2026)
