Skip to main content

doc-json-sdk 调用云上docmind解析能力

Project description

DOC-JSON-SDK (PYTHON)

什么是DOC-JSON

doc-json-model 简要描述

DOC-JSON-SDK功能特点

  • 提供DocMind文档结构化输出的doc-json结果反序列化对象,以及辅助功能函数SDK

使用场景

使用场景: DocMind 文档智能解析调用

阿里云官网 文档智能解析调用

集成方式

  • 源码安装
#uv 准备环境
uv install
#使用虚拟环境
uv shell
# 构建
uv build
twine check $pkg_path
# 上传
twine upload -r aliyun-pypi pkg_path --verbose
  • python 3.10以上 环境

云上环境

pip install docmind-doc-json-sdk
  • 设置DocMind文档智能解析环境变量
export ALIBABA_CLOUD_ACCESS_KEY_ID=<access_key_id>
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=<access_key_secret>
#调用服务

功能方法示例

1、获得json数据:

2、json加载/公有云服务调用

加载对象可以是:

  • doc-json 字符串对象
from doc_json_sdk.loader.document_model_loader import DocumentModelLoader
def test_local_json_document():
    file_path = "gongshi.json"
    loader = DocumentModelLoader()
    document = loader.load(doc_json_fp=open(file_path,"r"))
  • 公有云环境调用(配置ALIBABA_CLOUD_ACCESS_KEY_ID,ALIBABA_CLOUD_ACCESS_KEY_SECRET)
from doc_json_sdk.loader.document_model_loader import DocumentModelLoader
from doc_json_sdk.handler.document_handler import DocumentExtractHandler,DocumentDigitalExtractHandler
def test_document_hander():
    file_path = "gongshi.png"
    file_url = None
    # DocumentExtractHandler:文档智能解析,DocumentDigitalExtractHandler:文档电子解析
    loader = DocumentModelLoader(handler=DocumentExtractHandler())
    document = loader.load(file_path=file_path,file_url=file_url)
  • 公式参数调用/markdown输出/json保存
from doc_json_sdk.loader.document_model_loader import DocumentModelLoader
from doc_json_sdk.handler.document_handler import DocumentExtractHandler
def test_render_formula_markdown():
    file_path = "gongshi.png"
    file_url = None
    handler = DocumentExtractHandler()
    loader = DocumentModelLoader(handler=handler)
    document = loader.load(file_path=file_path,file_url=file_url,
                           formula_enhancement=True,
                           markdown_result=True,
                           save_json_path="/Users/sanchuan/Downloads/docmind.json")
  • 私有化服务调用(配置PRIVATE_DOCMIND_HOST或显式传入)
from doc_json_sdk.loader.document_model_loader import DocumentModelLoader
from doc_json_sdk.handler.document_private_handler import  PrivateDocumentExtractHandler,PrivateDigitalDocumentExtractHandler
def test_private_document_hander():
    file_path = "gongshi.png"
    file_url = None
    loader = DocumentModelLoader(handler=PrivateDocumentExtractHandler(host="127.0.0.1:7001"))
    document = loader.load(file_path=file_path,file_url=file_url)

3、功能函数

3.1 对DocumentModel使用处理为markdown

使用内置函数处理为markdown

from doc_json_sdk.loader.document_model_loader import DocumentModelLoader
from doc_json_sdk.handler.document_handler import DocumentExtractHandler,DocumentDigitalExtractHandler
from doc_json_sdk.render.document_model_render import DocumentModelRender
def test_render_markdown():
    file_path = "gongshi.png"
    file_url = None
    loader = DocumentModelLoader(handler=DocumentExtractHandler())
    document = loader.load(file_path=file_path,file_url=file_url,markdown_result=True)
    render = DocumentModelRender(document_model=document)
    with open("/Users/sanchuan/Downloads/docmind.md","w") as f:
        f.write(render.render_markdown_result())

3.2 对Layout版面块使用

LayoutModel 对象分为内容信息(来源电子解析/OCR)、版面类型信息(来源OCR/NLP)、逻辑关系信息(来源NLP)

doc-json-layout-model 简要描述

from doc_json_sdk.model.enums.layout_type_enum import LayoutTypeEnum

for layout in document:
    type_enum = layout.get_layout_type_enum()
    if (type_enum == LayoutTypeEnum.Elements.FOOTER or
            type_enum == LayoutTypeEnum.Elements.HEADER or
            type_enum == LayoutTypeEnum.Elements.NOTE):
        #  header and footer notes
        pass
    elif type_enum == LayoutTypeEnum.Elements.IMAGE:
        # image with head_line or split_line
        if layout.type.find("_line")!=-1:
            continue
    elif type_enum == LayoutTypeEnum.Elements.TABLE:
        #table
        pass
    else:
        # paragraph or note(table or figure)
        pass

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

docmind_doc_json_sdk-1.1.1-py3-none-any.whl (42.5 kB view details)

Uploaded Python 3

File details

Details for the file docmind_doc_json_sdk-1.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for docmind_doc_json_sdk-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 248e95367aa68ad566577d37557fcf2ec32f3b7f1935e0d28ccb0d5180c9d267
MD5 098723b1705339319b70e4d398d1aabe
BLAKE2b-256 687dea9a7fc9af06f58d20e8e6c45dbed8fe5f47ac0ecd99ef121f415642fa4d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page