Document processing SDK for Agents and RAG
xParse Client
A Python SDK for document parsing, built for Agent and RAG workflows
SDK Installation
[!NOTE] This SDK supports Python 3.9 and later.
uv (recommended)
uv add xparse-client
pip
pip install xparse-client
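Quick Start
API Overview
| API | Purpose | Returns |
|---|---|---|
| `client.parse.run()` | Parse a document synchronously | `ParseResponse` |
| `client.parse.create_job()` | Create an asynchronous parse job | `AsyncJobResponse` |
| `client.parse.get_job()` | Query the status of an asynchronous job | `JobStatusResponse` |
| `client.parse.wait_job()` | Poll until an asynchronous job reaches a terminal state | `JobStatusResponse` |
1. Environment setup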
export TEXTIN_APP_ID="your-app-id"
export TEXTIN_SECRET_CODE="your-secret-code"
You can obtain your credentials from the TextIn developer console.
2. Synchronous parsing
from xparse_client import XParseClient, ParseConfig, Capabilities, Scope

client = XParseClient()

with open("document.pdf", "rb") as f:
    result = client.parse.run(
        file=f,
        filename="document.pdf",
        config=ParseConfig(
            capabilities=Capabilities(
                include_table_structure=True,
                title_tree=True,
            ),
            scope=Scope(page_range="1-10"),
        ),
    )

print(f"Parsed {len(result.elements)} elements")

# Access the Markdown output
if result.markdown:
    print(result.markdown)

# Iterate over the elements
for el in result.elements:
    print(f"[{el.type}] {el.text[:80]}")
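As a follow-up to the loop above, the sketch below tallies elements by type and builds previews that tolerate a missing `text`. It does not call the SDK: the `SimpleNamespace` objects are hypothetical stand-ins that only mimic the `type` and `text` attributes used above, the type names are illustrative, and treating an absent `text` as an empty string is an assumption.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_elements(elements, preview_len=80):
    """Count elements per type and build a short text preview for each one."""
    counts = Counter(el.type for el in elements)
    previews = []
    for el in elements:
        # Assumption: `text` may be missing or None for some element types.
        text = getattr(el, "text", None) or ""
        previews.append(f"[{el.type}] {text[:preview_len]}")
    return counts, previews


# Hypothetical stand-ins for parsed elements (not real SDK objects).
sample_elements = [
    SimpleNamespace(type="title", text="Quarterly report"),
    SimpleNamespace(type="paragraph", text="Revenue grew in the third quarter."),
    SimpleNamespace(type="table", text=None),
    SimpleNamespace(type="paragraph", text="Costs were flat."),
]

counts, previews = summarize_elements(sample_elements)
for line in previews:
    print(line)
print(dict(counts))
```

With a real response, `summarize_elements(result.elements)` would be the equivalent call, provided the elements expose the same two attributes.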
3. Asynchronous jobs
For large files, use server-side asynchronous jobs:
import httpx

with open("large_document.pdf", "rb") as f:
    job = client.parse.create_job(
        file=f,
        filename="large_document.pdf",
        webhook="https://example.com/callback",  # optional
    )

print(f"Job created: {job.job_id}")

result = client.parse.wait_job(job_id=job.job_id, timeout=300.0, poll_interval=5.0)

if result.is_completed:
    # An async job returns a result_url; download it separately to get the parse result
    resp = httpx.get(result.result_url)
    print(resp.json())
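The `wait_job()` call above polls until the job reaches a terminal state or the timeout expires. The sketch below shows the general shape of such a loop; it is not the SDK's implementation. `poll_until_terminal`, the status strings, and the injectable `clock` and `sleep` parameters are hypothetical names introduced for illustration.

```python
import time


def poll_until_terminal(get_status, timeout=300.0, poll_interval=5.0,
                        terminal=("completed", "failed"),
                        clock=time.monotonic, sleep=time.sleep):
    """Call get_status() until it returns a terminal status or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        # Give up if the next poll would land after the deadline.
        if clock() + poll_interval > deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(poll_interval)


class FakeClock:
    """Deterministic clock so the demo runs instantly."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


# Hypothetical status sequence; a real job reports whatever the service defines.
statuses = iter(["pending", "processing", "completed"])
fake = FakeClock()
final = poll_until_terminal(lambda: next(statuses), timeout=30.0, poll_interval=5.0,
                            clock=fake.time, sleep=fake.sleep)
print(final, fake.now)
```

Injecting the clock and sleep function keeps the loop testable without real waiting; in production code the defaults (`time.monotonic` and `time.sleep`) apply.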
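Configuration
Authentication
The SDK resolves credentials automatically in the following order of precedence: constructor arguments > environment variables > .env file
# Option 1: environment variables with a no-argument constructor (recommended)
client = XParseClient()
# Option 2: pass credentials directly
client = XParseClient(
    app_id="your-app-id",
    secret_code="your-secret-code",
)
# Option 3: .env file (requires pip install xparse-client[dotenv])
client = XParseClient()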
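The precedence described in this section (constructor arguments, then environment variables, then the `.env` file) can be pictured as a first-match lookup. This is a minimal sketch, not the SDK's actual code: `resolve_credentials` and its `environ` and `dotenv_values` parameters are hypothetical, and resolving each field independently is an assumption. Only the `TEXTIN_APP_ID` and `TEXTIN_SECRET_CODE` variable names come from this README.

```python
import os


def resolve_credentials(app_id=None, secret_code=None, environ=None, dotenv_values=None):
    """Return (app_id, secret_code), taking the first source that provides each value."""
    environ = os.environ if environ is None else environ
    dotenv_values = dotenv_values or {}

    def first_set(explicit, key):
        # Order matters: explicit argument, then environment, then .env contents.
        for candidate in (explicit, environ.get(key), dotenv_values.get(key)):
            if candidate:
                return candidate
        raise ValueError(f"missing credential: {key}")

    return (
        first_set(app_id, "TEXTIN_APP_ID"),
        first_set(secret_code, "TEXTIN_SECRET_CODE"),
    )


# Illustrative values only.
env = {"TEXTIN_APP_ID": "env-id"}
dotenv = {"TEXTIN_APP_ID": "file-id", "TEXTIN_SECRET_CODE": "file-secret"}
resolved = resolve_credentials(secret_code="arg-secret", environ=env, dotenv_values=dotenv)
print(resolved)
```

Here the app ID comes from the environment because no argument was passed, while the explicit `secret_code` argument overrides the `.env` value.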
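Timeouts and retries
client = XParseClient(
    timeout=120.0,  # request timeout in seconds (default: 630)
    max_retries=3,  # maximum number of retries (default: 3)
)
Custom API endpoint
client = XParseClient(
    server_url="https://custom-api.example.com"
)
Custom HTTP client
Pass an httpx.Client to customize low-level network settings such as proxies and SSL certificates. The SDK still handles authentication, retries, and error mapping automatically: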
import httpx

http_client = httpx.Client(
    proxy="http://proxy.example.com:8080",
    verify="/path/to/custom-ca.pem",
)
client = XParseClient(
    app_id="your-app-id",
    secret_code="your-secret-code",
    http_client=http_client,
)
Resource management
with XParseClient() as client:
    result = client.parse.run(...)
# The connection is closed automatically on exit
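The `with` form guarantees cleanup on exit, even when the body raises. The sketch below illustrates that mechanism with a hypothetical `DemoClient` stand-in; it assumes nothing about how `XParseClient` manages its connections internally.

```python
class DemoClient:
    """Hypothetical stand-in that records whether close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions raised in the with-body


demo = DemoClient()
try:
    with demo as c:
        raise RuntimeError("parse failed")
except RuntimeError:
    pass

print(demo.closed)
```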
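Error handling
Error class hierarchy
HTTP-layer errors:
| Error class | Description |
|---|---|
| `XParseClientError` | Base error class; catches every SDK error |
| `ValidationError` | Client-side parameter validation failed |
| `ServerError` | Server error (HTTP 5xx) |
| `APIError` | API request error (base class) |
Business-layer errors (HTTP 200 with a business code):
| Error class | Business code | Description |
|---|---|---|
| `AuthenticationError` | 40101/40102 | Authentication failed |
| `PermissionDeniedError` | 40103 | IP address not on the allowlist |
| `InsufficientBalanceError` | 40003 | Insufficient balance |
| `InvalidParameterError` | 40004 | Invalid parameter |
| `UnsupportedFileTypeError` | 40301 | Unsupported file type |
| `FileSizeError` | 40302 | File too large (500 MB limit) |
| `CorruptedFileError` | 40422 | Corrupted file |
| `PasswordProtectedError` | 40423 | PDF requires a password |
| `ServiceUnavailableError` | 30203 | Service temporarily unavailable |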
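Because business-layer errors arrive with HTTP 200, the business code in the response is what identifies the failure. The sketch below turns the business-code table into a lookup, which can be handy for logging or metrics. It is illustrative only: `error_name_for` is a hypothetical helper, integer codes are an assumption, and falling back to `BusinessError` for unlisted codes is a guess rather than documented SDK behavior.

```python
# Business codes and error class names copied from the business-layer error table.
BUSINESS_CODE_TO_ERROR = {
    40101: "AuthenticationError",
    40102: "AuthenticationError",
    40103: "PermissionDeniedError",
    40003: "InsufficientBalanceError",
    40004: "InvalidParameterError",
    40301: "UnsupportedFileTypeError",
    40302: "FileSizeError",
    40422: "CorruptedFileError",
    40423: "PasswordProtectedError",
    30203: "ServiceUnavailableError",
}


def error_name_for(business_code, default="BusinessError"):
    """Return the error class name listed for a business code, or a generic fallback."""
    return BUSINESS_CODE_TO_ERROR.get(business_code, default)


print(error_name_for(40302))
print(error_name_for(49999))
```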
Error handling example
from xparse_client import XParseClient
from xparse_client.exceptions import (
    XParseClientError, BusinessError, AuthenticationError, APIError
)

client = XParseClient()

try:
    with open("document.pdf", "rb") as f:
        result = client.parse.run(file=f, filename="document.pdf")
except AuthenticationError as e:
    print(f"Authentication failed: {e.message}")
except BusinessError as e:
    print(f"Business error [{e.business_code}]: {e.message}, x_request_id={e.x_request_id}")
except APIError as e:
    print(f"API error [HTTP {e.status_code}]: {e.message}, x_request_id={e.x_request_id}")
except XParseClientError as e:
    print(f"SDK error: {e.message}")
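Getting the request ID
Every API request returns an x_request_id. Include this ID when contacting technical support to speed up troubleshooting:
with open("document.pdf", "rb") as f:
    result = client.parse.run(file=f, filename="document.pdf")
print(f"x_request_id={result.x_request_id}")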
Debugging and logging
import logging
logging.getLogger("xparse_client").setLevel(logging.DEBUG)
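Setting the level on a logger is not enough to see DEBUG output when no handler is configured, because Python's last-resort handler only emits records at WARNING and above. The sketch below attaches a stream handler to the `xparse_client` logger named above. The format string is just an example, and which messages the SDK actually logs at DEBUG is not specified here.

```python
import logging
import sys

logger = logging.getLogger("xparse_client")
logger.setLevel(logging.DEBUG)

# Attach a handler so DEBUG records have somewhere to go.
handler = logging.StreamHandler(sys.stderr)
handler.setLevel(logging.DEBUG)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
logger.addHandler(handler)

# Emit a test record through the same logger to confirm the handler is wired up.
logger.debug("debug logging enabled")
```

Calling `logging.basicConfig(level=logging.DEBUG)` is a shorter alternative, at the cost of enabling DEBUG output for every library that logs through the root logger.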
Local development
git clone https://github.com/intsig-textin/xparse-python-client.git
cd xparse-python-client
uv sync --dev
make test
make format
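Common commands
make test       # run all tests
make test-unit  # run unit tests
make test-cov   # measure code coverage
make format     # format the code
make lint       # lint the code
Related resources
- Full documentation | GitHub | PyPI
- TextIn developer console | Issue tracker
Troubleshooting
| Problem | Solution |
|---|---|
| `AuthenticationError` | Check `TEXTIN_APP_ID` and `TEXTIN_SECRET_CODE` |
| `FileSizeError` | Files are limited to 500 MB |
| `TimeoutException` | Increase the timeout: `XParseClient(timeout=300.0)` |
License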