doc-json-sdk: call the cloud DocMind parsing service
DOC-JSON-SDK (PYTHON)
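What is DOC-JSON
DOC-JSON-SDK features
- Provides deserialization objects for the doc-json results produced by DocMind document structuring, plus an SDK of helper functions
Use cases
Use case: calling DocMind Document Intelligent Parsing
Integration
- Install from source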
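# Prepare the environment with uv (creates .venv and installs dependencies)
uv sync
# Activate the virtual environment
source .venv/bin/activate
# Build
uv build
twine check $pkg_path
# Upload
twine upload -r aliyun-pypi $pkg_path --verbose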
- Python 3.10 or later
Cloud environment
pip install doc-json-sdk
- Set the environment variables for DocMind Document Intelligent Parsing
export ALIBABA_CLOUD_ACCESS_KEY_ID=<access_key_id>
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=<access_key_secret>
# Call the service
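Before calling the public cloud service, it can help to confirm that both variables are actually visible to the Python process. The following is a minimal sketch using only the standard library; the helper name `missing_credentials` is hypothetical and not part of doc-json-sdk.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = (
    "ALIBABA_CLOUD_ACCESS_KEY_ID",
    "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
)


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; not an SDK function.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("DocMind credentials found.")
```

Running this check once at startup gives a clearer message than a failed request later on.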
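Usage examples
1. Obtain the JSON data:
- Call Document Intelligent Parsing through the API with the official Alibaba Cloud SDK
2. Load JSON / call the public cloud service
The object to load can be:
- A doc-json string object (read here from a local file)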
from doc_json_sdk.loader.document_model_loader import DocumentModelLoader
def test_local_json_document():
    file_path = "gongshi.json"
    loader = DocumentModelLoader()
    # Open the saved doc-json file and let the loader deserialize it
    with open(file_path, "r", encoding="utf-8") as fp:
        document = loader.load(doc_json_fp=fp)
- Public cloud call (requires ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET to be set)
from doc_json_sdk.loader.document_model_loader import DocumentModelLoader
from doc_json_sdk.handler.document_handler import DocumentExtractHandler, DocumentDigitalExtractHandler
def test_document_handler():
    file_path = "gongshi.png"
    file_url = None
    # DocumentExtractHandler: document intelligent parsing; DocumentDigitalExtractHandler: digital document parsing
    loader = DocumentModelLoader(handler=DocumentExtractHandler())
    document = loader.load(file_path=file_path, file_url=file_url)
- Formula enhancement / Markdown output / saving the JSON
from doc_json_sdk.loader.document_model_loader import DocumentModelLoader
from doc_json_sdk.handler.document_handler import DocumentExtractHandler
def test_render_formula_markdown():
    file_path = "gongshi.png"
    file_url = None
    handler = DocumentExtractHandler()
    loader = DocumentModelLoader(handler=handler)
    document = loader.load(file_path=file_path, file_url=file_url,
                           formula_enhancement=True,
                           markdown_result=True,
                           save_json_path="/Users/sanchuan/Downloads/docmind.json")
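The example above writes the JSON to a fixed, user-specific path. One way to keep outputs next to each other without hard-coding a home directory is to derive the paths from the input file name. This is only a sketch; `output_paths` is a hypothetical helper, not an SDK function, and the `output` directory name is an assumption.

```python
from pathlib import Path


def output_paths(input_file, out_dir):
    """Return (json_path, markdown_path) named after the input file.

    Hypothetical helper for illustration; creates out_dir if needed.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(input_file).stem  # "gongshi.png" -> "gongshi"
    return out_dir / f"{stem}.json", out_dir / f"{stem}.md"


if __name__ == "__main__":
    json_path, md_path = output_paths("gongshi.png", "output")
    print(json_path, md_path)
```

The resulting `str(json_path)` could then be passed as `save_json_path`, and `md_path` used when writing a rendered Markdown file.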
- Private deployment call (set PRIVATE_DOCMIND_HOST or pass the host explicitly)
from doc_json_sdk.loader.document_model_loader import DocumentModelLoader
from doc_json_sdk.handler.document_private_handler import PrivateDocumentExtractHandler, PrivateDigitalDocumentExtractHandler
def test_private_document_handler():
    file_path = "gongshi.png"
    file_url = None
    loader = DocumentModelLoader(handler=PrivateDocumentExtractHandler(host="127.0.0.1:7001"))
    document = loader.load(file_path=file_path, file_url=file_url)
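When the host may come from either the PRIVATE_DOCMIND_HOST variable or an explicit argument, the choice can be made in one place before constructing the handler. The sketch below is an assumption about how a caller might do this; `resolve_private_host` is a hypothetical helper, and letting the explicit value win is a design choice of this sketch, not documented SDK behavior.

```python
import os


def resolve_private_host(explicit_host=None, env=None):
    """Pick the private DocMind host: explicit argument first, then env var.

    Hypothetical helper for illustration; not an SDK function.
    """
    env = os.environ if env is None else env
    host = explicit_host or env.get("PRIVATE_DOCMIND_HOST")
    if not host:
        raise ValueError(
            "No host given: pass one explicitly or set PRIVATE_DOCMIND_HOST"
        )
    return host


if __name__ == "__main__":
    # With an explicit value, the environment is not consulted.
    print(resolve_private_host("127.0.0.1:7001"))
```

The returned string could then be passed as `host=` to `PrivateDocumentExtractHandler`, as in the example above.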
3. Helper functions
3.1 Converting a DocumentModel to Markdown
Use the built-in renderer to produce Markdown
from doc_json_sdk.loader.document_model_loader import DocumentModelLoader
from doc_json_sdk.handler.document_handler import DocumentExtractHandler, DocumentDigitalExtractHandler
from doc_json_sdk.render.document_model_render import DocumentModelRender
def test_render_markdown():
    file_path = "gongshi.png"
    file_url = None
    loader = DocumentModelLoader(handler=DocumentExtractHandler())
    document = loader.load(file_path=file_path, file_url=file_url, markdown_result=True)
    render = DocumentModelRender(document_model=document)
    # Write the rendered Markdown to disk as UTF-8
    with open("/Users/sanchuan/Downloads/docmind.md", "w", encoding="utf-8") as f:
        f.write(render.render_markdown_result())
3.2 Working with Layout blocks
A LayoutModel object carries content information (from digital parsing/OCR), layout type information (from OCR/NLP), and logical relationship information (from NLP).
from doc_json_sdk.model.enums.layout_type_enum import LayoutTypeEnum
# `document` is the DocumentModel returned by loader.load(...)
for layout in document:
    type_enum = layout.get_layout_type_enum()
    if type_enum in (LayoutTypeEnum.Elements.FOOTER,
                     LayoutTypeEnum.Elements.HEADER,
                     LayoutTypeEnum.Elements.NOTE):
        # headers, footers, and notes
        pass
    elif type_enum == LayoutTypeEnum.Elements.IMAGE:
        # skip images that are head_line or split_line rules
        if "_line" in layout.type:
            continue
    elif type_enum == LayoutTypeEnum.Elements.TABLE:
        # table
        pass
    else:
        # paragraph, or a note attached to a table or figure
        pass
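The same skip-and-dispatch pattern can be tried without the SDK installed. The sketch below counts blocks per type string and ignores any type containing `_line`, which is broader than the loop above (that loop only skips such types for images). `FakeLayout` is a hypothetical stand-in that models only the `type` attribute, and the type strings `text` and `table` are illustrative assumptions; only `head_line` and `split_line` are named in the loop above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FakeLayout:
    """Hypothetical stand-in for a layout block; only `type` is modeled."""
    type: str


def count_layout_types(layouts, skip_marker="_line"):
    """Count blocks per type string, skipping types that contain skip_marker."""
    counts = Counter()
    for layout in layouts:
        if skip_marker in layout.type:
            continue  # e.g. head_line / split_line decorations
        counts[layout.type] += 1
    return counts


if __name__ == "__main__":
    sample = [
        FakeLayout("text"),
        FakeLayout("table"),
        FakeLayout("head_line"),
        FakeLayout("text"),
    ]
    print(count_layout_types(sample))
```

Because the function only reads `layout.type`, it should also accept real layout objects that expose that attribute, though this has not been checked against the SDK.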
File details
Details for the file docmind_doc_json_sdk-1.1.0-py3-none-any.whl.
File metadata
- Download URL: docmind_doc_json_sdk-1.1.0-py3-none-any.whl
- Upload date:
- Size: 62.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b5e4d88a0313cd21a1494a7bd0cef78fcf6afe934e40ad9150c6fb82e8867562 |
| MD5 | fec2f3c8dfc56ab3ce0ad26ccfe9b34f |
| BLAKE2b-256 | 7357466ba4ff5b22df2411ea4dc6559449d664d71058b17071c76f72f7c68255 |