
Storage for documents

doc_store

DocStore is a client library for managing documents, pages, layouts, and content. This document describes the complete DocClient API.


Installation and Configuration

from doc_store import DocClient

Quick Start

from doc_store import DocClient

# Create a client instance
client = DocClient(server_url="http://localhost:8000")

# Fetch a document
doc = client.get_doc("doc-xxx")

# Iterate over its pages
for page in doc.pages:
    print(page.id, page.image_path)

# Close the client
client.close()

# Or use the client as a context manager
with DocClient(server_url="http://localhost:8000") as client:
    doc = client.get_doc("doc-xxx")

Data Models

Doc

A document is the top-level data structure and represents a single PDF file.

Fields

Field Type Description
id str Unique document identifier
rid int Database row ID
create_time int | None Creation timestamp
update_time int | None Last-update timestamp
pdf_path str S3 path of the PDF file
pdf_filename str | None PDF file name
pdf_filesize int PDF file size (bytes)
pdf_hash str SHA256 hash of the PDF file
num_pages int Number of pages in the PDF
page_width float Page width
page_height float Page height
metadata dict PDF metadata
orig_path str | None Path of the original file (e.g. Word/PPT)
orig_filesize int | None Original file size
orig_filename str | None Original file name
orig_hash str | None Hash of the original file
tags list[str] List of tags
attrs dict[str, AttrValueType] Attribute dictionary
metrics dict[str, MetricValueType] Metric dictionary
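
Because `pdf_hash` identifies a document by content, you can check whether a local PDF is already stored before uploading it. The sketch below computes a SHA256 digest and file size for a local file; `local_pdf_fingerprint` is a hypothetical helper, and it assumes the server stores the lowercase hexadecimal digest of the raw file bytes (the encoding is not stated above, so compare against a known `pdf_hash` before relying on it).

```python
import hashlib
import os


def local_pdf_fingerprint(path: str, chunk_size: int = 1 << 20) -> tuple[str, int]:
    """Return (sha256_hex, size_in_bytes) for a local file.

    Assumption: doc_store's pdf_hash is the lowercase hex SHA256 of the raw bytes.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large PDFs are never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getsize(path)


# Usage (requires a running server; `client` is a DocClient):
#   pdf_hash, size = local_pdf_fingerprint("report.pdf")
#   existing = client.try_get_doc_by_pdf_hash(pdf_hash)
```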

Dynamic properties

Property Type Description
pdf_bytes bytes Raw binary content of the PDF file
pdf PDFDocument PDF document object
pages list[Page] All pages of the document, sorted by page index

Methods

# Find pages of the document
def find_pages(
    self,
    query: dict | None = None,
    skip: int | None = None,
    limit: int | None = None,
) -> Iterable[Page]

# Insert a page
def insert_page(self, page_idx: int, page_input: DocPageInput) -> Page

Page

A page represents a single page of a document.

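Fields

Field Type Description
id str Unique page identifier
rid int Database row ID
create_time int | None Creation timestamp
update_time int | None Last-update timestamp
doc_id str | None ID of the parent document
page_idx int | None Page index
image_path str S3 path of the page image
image_filesize int Image file size (bytes)
image_hash str SHA256 hash of the image
image_width int Image width (pixels)
image_height int Image height (pixels)
image_dpi int | None Image DPI
providers list[str] Providers that have already processed this page
old_image_path str | None Image path before migration
tags list[str] List of tags
attrs dict[str, AttrValueType] Attribute dictionary
metrics dict[str, MetricValueType] Metric dictionary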

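Dynamic properties

Property Type Description
image_bytes bytes Raw binary content of the page image
image PIL.Image.Image Page image object
image_presigned_link str Presigned link to the page image (valid for 24 hours)
image_pub_link str Public link to the page image
super_block Block The page's super block
doc Doc | None Parent document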

Methods

# Try to get a layout (returns None if it does not exist)
def try_get_layout(self, provider: str, expand: bool = False) -> Layout | None

# Get a layout
def get_layout(self, provider: str, expand: bool = False) -> Layout

# Find layouts
def find_layouts(
    self,
    query: dict | None = None,
    skip: int | None = None,
    limit: int | None = None,
) -> Iterable[Layout]

# Find blocks
def find_blocks(
    self,
    query: dict | None = None,
    skip: int | None = None,
    limit: int | None = None,
) -> Iterable[Block]

# Find contents
def find_contents(
    self,
    query: dict | None = None,
    skip: int | None = None,
    limit: int | None = None,
) -> Iterable[Content]

# Insert a layout
def insert_layout(
    self,
    provider: str,
    layout_input: LayoutInput,
    insert_blocks: bool = False,
    upsert: bool = False,
) -> Layout

# Update or insert a layout
def upsert_layout(
    self,
    provider: str,
    layout_input: LayoutInput,
    insert_blocks: bool = False,
) -> Layout

# Insert a single block
def insert_block(self, block_input: BlockInput) -> Block

# Insert blocks in bulk
def insert_blocks(self, blocks: list[BlockInput]) -> list[Block]

# Insert a layout made of blocks that carry content
def insert_content_blocks_layout(
    self,
    provider: str,
    content_blocks: list[ContentBlockInput],
    upsert: bool = False,
) -> Layout

Layout

A layout is the result of structural analysis of a page and contains multiple blocks.

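Fields

Field Type Description
id str Unique layout identifier
rid int Database row ID
create_time int | None Creation timestamp
update_time int | None Last-update timestamp
page_id str ID of the parent page
provider str Identifier of the layout provider
masks list[MaskBlock] List of mask blocks
blocks list[Block] List of blocks
relations list[dict] List of relations between blocks
contents list[Content] List of contents
is_human_label bool Whether the layout was labeled by a human
tags list[str] List of tags
attrs dict[str, AttrValueType] Attribute dictionary
metrics dict[str, MetricValueType] Metric dictionary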

Dynamic properties

Property Type Description
page Page Parent page
masked_image PIL.Image.Image Page image with the masks applied
framed_image PIL.Image.Image Page image with block outlines drawn

Methods

# List all content versions
def list_versions(self) -> list[str]

# List all blocks
def list_blocks(self) -> list[Block]

# List the contents of the given version
def list_contents(self, version: str | None = None) -> list[Content]

# Expand the layout (load all of its blocks and contents)
def expand(self) -> Layout

Block

A block is a region of a page, such as a title, paragraph, table, or image.

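Fields

Field Type Description
id str Unique block identifier
rid int Database row ID
create_time int | None Creation timestamp
update_time int | None Last-update timestamp
layout_id str | None ID of the parent layout
provider str | None Identifier of the block provider
page_id str | None ID of the parent page
type str Block type (e.g. title, text, table, image)
bbox list[float] Normalized bounding box [x1, y1, x2, y2], each value in the range 0-1
angle Literal[None, 0, 90, 180, 270] Rotation angle
score float | None Detection confidence score
image_path str | None Image path of a standalone block
image_filesize int | None Image file size
image_hash str | None Image hash
image_width int | None Image width
image_height int | None Image height
versions list[str] Content versions that already exist
tags list[str] List of tags
attrs dict[str, AttrValueType] Attribute dictionary
metrics dict[str, MetricValueType] Metric dictionary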
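
A block's `bbox` is normalized to the range 0-1, so cutting it out of a page image means scaling by the page's `image_width` and `image_height`. The helpers below are hypothetical (not part of doc_store) and sketch that conversion, assuming (x1, y1) is the top-left and (x2, y2) the bottom-right corner, the usual image convention, which the table above does not state explicitly. The returned tuple has the (left, upper, right, lower) shape that PIL's `Image.crop` accepts.

```python
def bbox_to_pixels(
    bbox: list[float], width: int, height: int
) -> tuple[int, int, int, int]:
    """Map a normalized [x1, y1, x2, y2] box (0-1) to integer pixel coordinates."""
    if len(bbox) != 4:
        raise ValueError("bbox must have four values: [x1, y1, x2, y2]")
    # Clamp to [0, 1] in case a detector slightly overshoots the page edge.
    x1, y1, x2, y2 = (min(max(v, 0.0), 1.0) for v in bbox)
    if x1 > x2 or y1 > y2:
        raise ValueError("expected x1 <= x2 and y1 <= y2")
    return (round(x1 * width), round(y1 * height), round(x2 * width), round(y2 * height))


def pixels_to_bbox(
    box: tuple[int, int, int, int], width: int, height: int
) -> list[float]:
    """Inverse mapping: a pixel box back to normalized coordinates."""
    x1, y1, x2, y2 = box
    return [x1 / width, y1 / height, x2 / width, y2 / height]
```

For example, with a page image loaded as `page.image`, `page.image.crop(bbox_to_pixels(block.bbox, page.image_width, page.image_height))` would give the block region under the stated assumption.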

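Dynamic properties

Property Type Description
page Page | None Parent page
image PIL.Image.Image Block image (cropped from the page, or the standalone image)
image_bytes bytes Raw binary content of the block image
image_pub_link str Public link to the block image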

Methods

# Try to get content (returns None if it does not exist)
def try_get_content(self, version: str) -> Content | None

# Get content
def get_content(self, version: str) -> Content

# Find contents
def find_contents(
    self,
    query: dict | None = None,
    skip: int | None = None,
    limit: int | None = None,
) -> Iterable[Content]

# Insert content
def insert_content(
    self,
    version: str,
    content_input: ContentInput,
    upsert: bool = False,
) -> Content

# Update or insert content
def upsert_content(self, version: str, content_input: ContentInput) -> Content

Content

A content record holds the recognition result for a block, such as OCR text or a LaTeX formula.

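Fields

Field Type Description
id str Unique content identifier
rid int Database row ID
create_time int | None Creation timestamp
update_time int | None Last-update timestamp
page_id str | None ID of the parent page
block_id str ID of the parent block
version str Content version (e.g. ocr-v1, llm-gpt4)
format str Content format (e.g. text, html, latex, markdown)
content str Content text
is_human_label bool Whether the content was labeled by a human
tags list[str] List of tags
attrs dict[str, AttrValueType] Attribute dictionary
metrics dict[str, MetricValueType] Metric dictionary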

Dynamic properties

Property Type Description
page Page | None Parent page
block Block Parent block

Value

A Value stores extended data for an element, such as an embedding vector.

Fields

Field Type Description
id str Unique value identifier
rid int Database row ID
elem_id str ID of the owning element
key str Key name
type str Value type (e.g. ndarray, json)
value Any Value payload

Dynamic properties

Property Type Description
elem DocElement Owning element

Methods

# Decode the value (e.g. decode a base64-encoded ndarray)
def decode(self) -> Value
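
`Value.decode()` is described as turning a base64-encoded ndarray back into an array, and `decode_value=True` on the client does this automatically. The actual wire format is not documented here, so the sketch below only illustrates the general round trip under an assumed format: a base64 string of little-endian float32 values. It uses the standard library (`struct`, `base64`) instead of NumPy, and both function names are hypothetical.

```python
import base64
import struct


def encode_float32(values: list[float]) -> str:
    """Pack floats as little-endian float32 and base64-encode the bytes."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_float32(encoded: str) -> list[float]:
    """Inverse of encode_float32; each float32 occupies 4 bytes."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```

Values that are exactly representable in float32 (such as 1.0 or -2.5) survive the round trip unchanged; others, such as 0.1, come back slightly different because float32 has less precision than a Python float.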

Task

Tasks are used to run operations on documents asynchronously.

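Fields

Field Type Description
id str Unique task identifier
rid int Database row ID
target str ID of the target element
batch_id str Batch ID
command str Command name
args dict[str, Any] Command arguments
priority int Priority
status str Task status
create_user str User who created the task
update_user str | None User who last updated the task
error_message str | None Error message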

DocClient API

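Initialization

Constructor arguments and their defaults:

client = DocClient(
    server_url=None,       # server URL (str | None)
    prefix="/api/v1",      # API path prefix (str)
    timeout=300,           # read timeout in seconds (int)
    connect_timeout=30,    # connection timeout in seconds (int)
    decode_value=True,     # automatically decode Value objects (bool)
)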

Health check

# Check server health
def health_check(self, show_stats: bool = False) -> dict

Read operations

Getting a single element

# Get a document
def get_doc(self, doc_id: str) -> Doc
def get_doc_by_pdf_path(self, pdf_path: str) -> Doc
def get_doc_by_pdf_hash(self, pdf_hash: str) -> Doc

# Get a page
def get_page(self, page_id: str) -> Page
def get_page_by_image_path(self, image_path: str) -> Page
def get_page_by_image_hash(self, image_hash: str) -> Page

# Get a layout
def get_layout(self, layout_id: str, expand: bool = False) -> Layout
def get_layout_by_page_id_and_provider(
    self, page_id: str, provider: str, expand: bool = False
) -> Layout

# Get a block
def get_block(self, block_id: str) -> Block
def get_block_by_image_path(self, image_path: str) -> Block
def get_super_block(self, page_id: str) -> Block

# Get content
def get_content(self, content_id: str) -> Content
def get_content_by_block_id_and_version(self, block_id: str, version: str) -> Content

# Get a value
def get_value(self, value_id: str) -> Value
def get_value_by_elem_id_and_key(self, elem_id: str, key: str) -> Value

# Get an evaluation layout / evaluation content
def get_eval_layout(self, eval_layout_id: str) -> EvalLayout
def get_eval_content(self, eval_content_id: str) -> EvalContent

Try methods (return None if the element does not exist)

def try_get(self, elem_id: str) -> DocElement | None
def try_get_doc(self, doc_id: str) -> Doc | None
def try_get_doc_by_pdf_path(self, pdf_path: str) -> Doc | None
def try_get_doc_by_pdf_hash(self, pdf_hash: str) -> Doc | None
def try_get_page(self, page_id: str) -> Page | None
def try_get_page_by_image_path(self, image_path: str) -> Page | None
def try_get_page_by_image_hash(self, image_hash: str) -> Page | None
def try_get_layout(self, layout_id: str, expand: bool = False) -> Layout | None
def try_get_layout_by_page_id_and_provider(
    self, page_id: str, provider: str, expand: bool = False
) -> Layout | None
def try_get_block(self, block_id: str) -> Block | None
def try_get_block_by_image_path(self, image_path: str) -> Block | None
def try_get_content(self, content_id: str) -> Content | None
def try_get_content_by_block_id_and_version(
    self, block_id: str, version: str
) -> Content | None
def try_get_value(self, value_id: str) -> Value | None
def try_get_value_by_elem_id_and_key(self, elem_id: str, key: str) -> Value | None
def try_get_user(self, name: str) -> User | None
def try_get_task(self, task_id: str) -> Task | None

Querying elements

# Generic find method (returns a streaming iterator)
def find(
    self,
    elem_type: ElemType | type,                 # element type: "doc", "page", "layout", "block", "content", "value"
    query: dict | list[dict] | None = None,     # query conditions (MongoDB-style)
    query_from: ElemType | type | None = None,  # element type to query from
    skip: int | None = None,                    # number of results to skip
    limit: int | None = None,                   # maximum number of results
) -> Iterable[Element]

# Count elements
def count(
    self,
    elem_type: ElemType | type,
    query: dict | list[dict] | None = None,
    query_from: ElemType | type | None = None,
    estimated: bool = False,                    # return an estimated count
) -> int

# Get the distinct values of a field
def distinct_values(
    self,
    elem_type: ElemType,
    field: Literal["tags", "providers", "provider", "versions", "version"],
    query: dict | None = None,
) -> list[str]

Type-specific find methods

def find_docs(
    self,
    query: dict | list[dict] | None = None,
    skip: int | None = None,
    limit: int | None = None,
) -> Iterable[Doc]

def find_pages(
    self,
    query: dict | list[dict] | None = None,
    doc_id: str | None = None,
    skip: int | None = None,
    limit: int | None = None,
) -> Iterable[Page]

def find_layouts(
    self,
    query: dict | list[dict] | None = None,
    page_id: str | None = None,
    skip: int | None = None,
    limit: int | None = None,
) -> Iterable[Layout]

def find_blocks(
    self,
    query: dict | list[dict] | None = None,
    page_id: str | None = None,
    skip: int | None = None,
    limit: int | None = None,
) -> Iterable[Block]

def find_contents(
    self,
    query: dict | list[dict] | None = None,
    page_id: str | None = None,
    block_id: str | None = None,
    skip: int | None = None,
    limit: int | None = None,
) -> Iterable[Content]

def find_values(
    self,
    query: dict | list[dict] | None = None,
    elem_id: str | None = None,
    key: str | None = None,
    skip: int | None = None,
    limit: int | None = None,
) -> Iterable[Value]
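
The `find_*` methods already return a streaming iterator, so most code can simply loop over the result. When work needs to be split into fixed-size chunks (for example, to checkpoint progress), `skip` and `limit` can be combined as sketched below. `iter_batches` is a hypothetical helper; it only assumes the find method accepts `skip` and `limit` keyword arguments, as the signatures above show. Note that skip-based paging can miss or repeat elements if the matching data changes between calls.

```python
from collections.abc import Callable, Iterable, Iterator
from typing import Any, TypeVar

T = TypeVar("T")


def iter_batches(
    find: Callable[..., Iterable[T]], batch_size: int, **kwargs: Any
) -> Iterator[list[T]]:
    """Yield lists of at most `batch_size` results by advancing `skip`."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    skip = 0
    while True:
        batch = list(find(skip=skip, limit=batch_size, **kwargs))
        if not batch:
            return
        yield batch
        if len(batch) < batch_size:
            return  # a short batch means nothing is left
        skip += len(batch)


# Usage (requires a server):
#   for docs in iter_batches(client.find_docs, 100):
#       ...
```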

Convenience methods for listing distinct values

def doc_tags(self) -> list[str]
def page_tags(self) -> list[str]
def page_providers(self) -> list[str]
def layout_providers(self) -> list[str]
def layout_tags(self) -> list[str]
def block_tags(self) -> list[str]
def block_versions(self) -> list[str]
def content_versions(self) -> list[str]
def content_tags(self) -> list[str]

Write operations

Inserting elements

# Insert a document
def insert_doc(self, doc_input: DocInput, skip_ext_check: bool = False) -> Doc

# Insert a page
def insert_page(self, page_input: PageInput, skip_hash_check: bool = False) -> Page

# Insert a layout
def insert_layout(
    self,
    page_id: str,
    provider: str,
    layout_input: LayoutInput,
    insert_blocks: bool = False,
    upsert: bool = False,
) -> Layout

# Insert blocks
def insert_block(self, page_id: str, block_input: BlockInput) -> Block
def insert_blocks(self, page_id: str, blocks: list[BlockInput]) -> list[Block]
def insert_standalone_block(self, block_input: StandaloneBlockInput) -> Block

# Insert content
def insert_content(
    self,
    block_id: str,
    version: str,
    content_input: ContentInput,
    upsert: bool = False,
) -> Content

# Insert a layout made of blocks that carry content
def insert_content_blocks_layout(
    self,
    page_id: str,
    provider: str,
    content_blocks: list[ContentBlockInput],
    upsert: bool = False,
) -> Layout

# Insert a value
def insert_value(self, elem_id: str, key: str, value_input: ValueInput) -> Value

# Insert an evaluation layout / evaluation content
def insert_eval_layout(
    self,
    layout_id: str,
    provider: str,
    blocks: list[EvalLayoutBlock] | None = None,
    relations: list[dict] | None = None,
) -> EvalLayout

def insert_eval_content(
    self,
    content_id: str,
    version: str,
    format: str,
    content: str,
) -> EvalContent

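Convenience insert methods (from local files)

# Upload a local PDF file and insert it as a document
def insert_local_doc(self, local_pdf_path: str) -> Doc

# Upload a local image and insert it as a page
def insert_local_page(self, local_image_path: str) -> Page

# Upload a local image and insert it as a standalone block
def insert_local_block(self, type: str, local_image_path: str) -> Block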

Upsert methods (update or insert)

def upsert_layout(
    self, page_id: str, provider: str, layout_input: LayoutInput, insert_blocks: bool = False
) -> Layout

def upsert_content(self, block_id: str, version: str, content_input: ContentInput) -> Content

Tags and attributes

Single-element operations

# Tag operations
def add_tag(self, elem_id: str, tag: str) -> None
def del_tag(self, elem_id: str, tag: str) -> None

# Attribute operations
def add_attr(self, elem_id: str, name: str, attr_input: AttrInput) -> None
def add_attrs(self, elem_id: str, attrs: dict[str, AttrValueType]) -> None
def del_attr(self, elem_id: str, name: str) -> None

# Metric operations
def add_metric(self, elem_id: str, name: str, metric_input: MetricInput) -> None
def del_metric(self, elem_id: str, name: str) -> None

# Combined tag/attribute/metric operations
def tagging(self, elem_id: str, tagging_input: TaggingInput) -> None

Batch operations

def batch_add_tag(self, elem_type: ElemType, tag: str, elem_ids: list[str]) -> None
def batch_del_tag(self, elem_type: ElemType, tag: str, elem_ids: list[str]) -> None
def batch_tagging(self, elem_type: ElemType, inputs: list[TaggingInput]) -> None

Task operations

# List tasks
def list_tasks(
    self,
    query: dict | None = None,
    target: str | None = None,
    batch_id: str | None = None,
    command: str | None = None,
    status: str | None = None,
    create_user: str | None = None,
    skip: int | None = None,
    limit: int | None = None,
) -> list[Task]

# Get a task
def get_task(self, task_id: str) -> Task

# Insert a task
def insert_task(self, target_id: str, task_input: TaskInput) -> Task

# Grab new tasks (for task workers)
def grab_new_tasks(self, command: str, num: int = 10, hold_sec: int = 3600) -> list[Task]
def grab_new_task(self, command: str, hold_sec: int = 3600) -> Task | None

# Update task status
def update_task(
    self,
    task_id: str,
    command: str,
    status: Literal["done", "error", "skipped"],
    error_message: str | None = None,
    check_mismatch: bool = False,
    task: Task | None = None,
) -> None

def update_grabbed_task(
    self,
    task: Task,
    status: Literal["done", "error", "skipped"],
    error_message: str | None = None,
) -> None

# Count tasks
def count_tasks(self, command: str | None = None) -> list[TaskCount]

Administration

User management

def list_users(self) -> list[User]
def get_user(self, name: str) -> User
def insert_user(self, user_input: UserInput) -> User
def update_user(self, name: str, user_update: UserUpdate) -> User

Known names (tag/attribute/metric definitions)

def list_known_names(self) -> list[KnownName]
def insert_known_name(self, known_name_input: KnownNameInput) -> KnownName
def update_known_name(self, name: str, known_name_update: KnownNameUpdate) -> KnownName
def add_known_option(self, attr_name: str, option_name: str, option_input: KnownOptionInput) -> None
def del_known_option(self, attr_name: str, option_name: str) -> None

S3 bucket management

def list_s3_buckets(self) -> list[S3Bucket]

Embedding operations

# Embedding model management
def list_embedding_models(self) -> list[EmbeddingModel]
def get_embedding_model(self, name: str) -> EmbeddingModel
def insert_embedding_model(self, embedding_model: EmbeddingModel) -> EmbeddingModel
def update_embedding_model(self, name: str, update: EmbeddingModelUpdate) -> EmbeddingModel

# Add embedding vectors
def add_embeddings(
    self,
    elem_type: EmbeddableElemType,  # "page" | "block"
    model: str,
    embeddings: list[EmbeddingInput],
) -> None

# Search embedding vectors
def search_embeddings(
    self,
    elem_type: EmbeddableElemType,
    model: str,
    query: EmbeddingQuery,
) -> list[Embedding]
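
`search_embeddings` returns up to `k` elements whose vectors are closest to the query vector. The similarity metric the server uses is not documented here; the sketch below assumes cosine similarity and ranks a small in-memory set of vectors. This can be handy for sanity-checking results locally. `cosine_similarity` and `top_k` are hypothetical helpers, not doc_store functions.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float], candidates: dict[str, list[float]], k: int
) -> list[tuple[str, float]]:
    """Return up to k (elem_id, score) pairs, highest cosine similarity first."""
    scored = [(cosine_similarity(query, vec), elem_id) for elem_id, vec in candidates.items()]
    return [(elem_id, score) for score, elem_id in heapq.nlargest(k, scored)]
```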

Trigger operations

def list_triggers(self) -> list[Trigger]
def get_trigger(self, trigger_id: str) -> Trigger
def insert_trigger(self, trigger_input: TriggerInput) -> Trigger
def update_trigger(self, trigger_id: str, trigger_input: TriggerInput) -> Trigger
def delete_trigger(self, trigger_id: str) -> None

File operations

# Read a file
def read_file(self, file_path: str, allow_local: bool = True) -> bytes

# Read an image
def read_image(self, file_path: str) -> PIL.Image.Image

# Get a file's size
def stat_file_size(self, file_path: str, allow_local: bool = True) -> int

# Upload a local file to S3
def upload_local_file(
    self,
    file_type: Literal["doc", "page", "block"],
    file_path: str,
) -> str

# Get an S3 client
def get_s3_client(self, path: str) -> boto3.client

Convenience methods

Generic get

# Infer the element type from the ID prefix and fetch the element
def get(self, elem_id: str) -> DocElement

Iteration

# Process elements in parallel
def iterate(
    self,
    elem_type: ElemType | type,
    func: Callable[[int, DocElement], None],
    query: dict | list[dict] | None = None,
    query_from: ElemType | type | None = None,
    max_workers: int = 10,
    total: int | None = None,
) -> None
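
`iterate` applies `func` to every matching element using up to `max_workers` workers. Its internal scheduling is not described here. The sketch below is a rough local equivalent built on `concurrent.futures.ThreadPoolExecutor`, assuming the `int` passed to `func` is the element's position in the iteration; `iterate_parallel` is a hypothetical name. Unlike a streaming implementation, it submits every item up front, so it suits moderate result sizes.

```python
from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor
from typing import TypeVar

T = TypeVar("T")


def iterate_parallel(
    items: Iterable[T], func: Callable[[int, T], None], max_workers: int = 10
) -> None:
    """Call func(index, item) for every item on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, idx, item) for idx, item in enumerate(items)]
        for future in futures:
            # Re-raise an exception from func (checked in submission order).
            future.result()
```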

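Exception handling

DocClient defines the following exception types:

Exception Description
ElementNotFoundError Element does not exist
NotFoundError Resource does not exist (generic)
DocExistsError Document already exists (carries pdf_path and pdf_hash)
PageExistsError Page already exists (carries image_path and image_hash)
ElementExistsError Element already exists
AlreadyExistsError Resource already exists (generic)
UnauthorizedError Not authorized
TaskMismatchError Task status mismatch

Example: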

from doc_store.interface import ElementNotFoundError, DocExistsError

try:
    doc = client.get_doc("doc-xxx")
except ElementNotFoundError:
    print("Document not found")

try:
    doc = client.insert_doc(DocInput(pdf_path="s3://..."))
except DocExistsError as e:
    print(f"Document already exists: {e.pdf_path}, hash: {e.pdf_hash}")
    doc = client.get_doc_by_pdf_hash(e.pdf_hash)

Input models

DocInput

class DocInput:
    pdf_path: str                    # S3 path of the PDF file (required)
    pdf_filename: str | None = None  # PDF file name
    orig_path: str | None = None     # path of the original file (Word/PPT)
    orig_filename: str | None = None # original file name
    tags: list[str] | None = None    # list of tags

PageInput

class PageInput:
    image_path: str                   # S3 path of the image (required)
    image_dpi: int | None = None      # image DPI
    doc_id: str | None = None         # ID of the parent document
    page_idx: int | None = None       # page index
    tags: list[str] | None = None     # list of tags

LayoutInput

class LayoutInput:
    blocks: list[ContentBlockInput]   # list of blocks (required)
    masks: list[MaskBlock] = []       # list of masks
    relations: list[dict] | None = None  # list of relations
    is_human_label: bool = False      # whether labeled by a human
    tags: list[str] | None = None     # list of tags

BlockInput

class BlockInput:
    type: str                         # block type (required)
    bbox: list[float]                 # normalized bounding box [x1, y1, x2, y2] (required)
    angle: Literal[0, 90, 180, 270] | None = None  # rotation angle
    score: float | None = None        # confidence score
    tags: list[str] | None = None     # list of tags

ContentBlockInput

class ContentBlockInput(BlockInput):
    format: str | None = None         # content format
    content: str | None = None        # content text
    content_tags: list[str] | None = None  # tags for the content

StandaloneBlockInput

class StandaloneBlockInput:
    type: str                         # block type (required)
    image_path: str                   # S3 path of the image (required)
    tags: list[str] | None = None     # list of tags

ContentInput

class ContentInput:
    format: str                       # content format (required)
    content: str                      # content text (required)
    is_human_label: bool = False      # whether labeled by a human
    tags: list[str] | None = None     # list of tags

ValueInput

class ValueInput:
    value: Any                        # value (required)
    type: str | None = None           # value type

AttrInput

class AttrInput:
    value: str | list[str] | int | bool  # attribute value (required)

MetricInput

class MetricInput:
    value: float | int                # metric value (required)

TaggingInput

class TaggingInput:
    elem_id: str | None = None        # element ID (required for batch operations)
    tags: list[str] | None = None     # tags to add
    attrs: dict[str, AttrValueType] | None = None  # attributes to add
    metrics: dict[str, MetricValueType] | None = None  # metrics to add
    del_tags: list[str] | None = None  # tags to remove
    del_attrs: list[str] | None = None  # attributes to remove
    del_metrics: list[str] | None = None  # metrics to remove

TaskInput

class TaskInput:
    command: str                      # command name (required)
    args: dict[str, Any] | None = None  # command arguments
    priority: int = 0                 # priority
    batch_id: str | None = None       # batch ID

EmbeddingInput

class EmbeddingInput:
    elem_id: str                      # element ID (required)
    vector: list[float]               # embedding vector (required)

EmbeddingQuery

class EmbeddingQuery:
    vector: list[float]               # query vector (required)
    k: int                            # number of results to return (required)
    show_vector: bool = False         # whether to include vectors in the results

TriggerInput

class TriggerInput:
    name: str                         # trigger name (required)
    description: str                  # description (required)
    condition: TriggerCondition       # trigger condition (required)
    actions: list[TriggerAction]      # list of trigger actions (required)
    disabled: bool = False            # whether the trigger is disabled
    display_order: float | None = None  # display order

Common element methods

All objects derived from DocElement (Doc, Page, Layout, Block, Content) provide the following methods:

Tag operations

def add_tag(self, tag: str) -> None
def del_tag(self, tag: str) -> None

Attribute operations

def add_attr(self, name: str, attr_input: AttrInput) -> None
def add_attrs(self, attrs: dict[str, AttrValueType]) -> None
def del_attr(self, name: str) -> None

Metric operations

def add_metric(self, name: str, metric_input: MetricInput) -> None
def del_metric(self, name: str) -> None

Combined tag/attribute/metric operations

def tagging(self, tagging_input: TaggingInput) -> None

Value operations

def try_get_value(self, key: str) -> Value | None
def get_value(self, key: str) -> Value
def find_values(
    self,
    query: dict | None = None,
    skip: int | None = None,
    limit: int | None = None,
) -> Iterable[Value]
def insert_value(self, key: str, value_input: ValueInput) -> Value

Task operations

def list_tasks(
    self,
    query: dict | None = None,
    batch_id: str | None = None,
    command: str | None = None,
    status: str | None = None,
    create_user: str | None = None,
    skip: int | None = None,
    limit: int | None = None,
) -> list[Task]
def insert_task(self, task_input: TaskInput) -> Task

Download files

Source Distribution

doc_store-0.7.9.tar.gz (7.3 MB)

Uploaded Source

Built Distribution

doc_store-0.7.9-py3-none-any.whl (7.3 MB)

Uploaded Python 3

File details

Details for the file doc_store-0.7.9.tar.gz.

File metadata

  • Download URL: doc_store-0.7.9.tar.gz
  • Size: 7.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for doc_store-0.7.9.tar.gz
Algorithm Hash digest
SHA256 4eb3db1b804ba01fd5d801d51de1d1ef60da737deb81c285d912053a18a3229e
MD5 aee396f887c6a7b2582003d3ac8b7b81
BLAKE2b-256 cfa8e199198f942ce551a9f03fd653a2cb4126ce5f7e7c5f8c7c938d01402d61

File details

Details for the file doc_store-0.7.9-py3-none-any.whl.

File metadata

  • Download URL: doc_store-0.7.9-py3-none-any.whl
  • Size: 7.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for doc_store-0.7.9-py3-none-any.whl
Algorithm Hash digest
SHA256 347476d65fc3316c0eb968947af7f38617c58587bbf872272a9ad1710a68e959
MD5 6125ba046ae6f23aa8177ea2bb5d4a98
BLAKE2b-256 33012e385389d0b17c118161373735b0220312cf319eddc35fb43622d115384e
