


Extract Agent

This README covers only the shortest working procedure.

Scenario: prepare a workspace from local PDFs, then run agentic-extract iterations on it.

0. Quick start

See quick start.

For common command-line usage, see agentic-extract CLI.

For the maintainer release process, see the release-process skill.

1. Install

uv tool install extract-agent

Installing extract-agent also exposes the following commands:

  • agentic-extract
  • xdev
  • xdev-config
  • pdf-ai-explorer
  • tree-sitter-cli

2. Configuration

~/.config/xdev/config.json

{
  "pdf_parse_concurrent": 1
}

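By default, PDF parsing uses the local ppx command (provided by memect-ppx), so first make sure that ppx parse <pdf> works. pdf_parse_concurrent controls how many files PPX parses at the same time when processing a batch of PDFs.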
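A minimal Python sketch of the two preparation steps above, assuming only what this section states. It checks that a ppx executable is on PATH (it does not run ppx parse, so it cannot tell whether parsing actually works) and sets pdf_parse_concurrent in ~/.config/xdev/config.json while keeping any other keys already in the file. The helper names ppx_available and write_xdev_config are hypothetical, not part of extract-agent.

```python
import json
import shutil
from pathlib import Path

# Path taken from this README.
XDEV_CONFIG = Path.home() / ".config" / "xdev" / "config.json"


def ppx_available() -> bool:
    """True if a `ppx` executable is found on PATH (does not run `ppx parse`)."""
    return shutil.which("ppx") is not None


def write_xdev_config(path: Path, pdf_parse_concurrent: int = 1) -> None:
    """Set pdf_parse_concurrent, keeping any other keys already in the file."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config["pdf_parse_concurrent"] = pdf_parse_concurrent
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
```

For example, write_xdev_config(XDEV_CONFIG, 2) would let PPX parse two PDFs at a time. Merging into the existing file rather than overwriting it avoids dropping settings written by other tools.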

~/.config/agentic-extract/config.json

{
  "model": "openai/gpt-4.1",
  "api_base": "https://api.openai.com/v1",
  "api_key": "YOUR_API_KEY"
}

Alternatively, just run:

xdev-config

It writes both the global agentic-extract and the global xdev configuration.
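If you prefer to script the setup (for example in CI), the agentic-extract config file can also be written directly. The sketch below assumes the file needs only the three keys shown above; the function name write_agentic_config is hypothetical. Unlike xdev-config, it writes only the agentic-extract file, not the xdev one.

```python
import json
import os
from pathlib import Path

# Path taken from this README.
AGENTIC_CONFIG = Path.home() / ".config" / "agentic-extract" / "config.json"


def write_agentic_config(path: Path, model: str, api_base: str, api_key: str) -> None:
    """Write the three keys shown above; an existing file is replaced."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"model": model, "api_base": api_base, "api_key": api_key}
    path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    # The file holds an API key: restrict it to the current user (effective on POSIX).
    os.chmod(path, 0o600)
```

For example: write_agentic_config(AGENTIC_CONFIG, "openai/gpt-4.1", "https://api.openai.com/v1", os.environ["OPENAI_API_KEY"]), where the environment variable name is only an illustration. On Windows, os.chmod only toggles the read-only flag, so the permission restriction has no real effect there.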

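If the backend serves a model such as GLM through an OpenAI-compatible API, still write it as openai/<model_name>. For the official DeepSeek API, however, write deepseek/<model_name> explicitly: this enables the DeepSeek-specific formatter, which correctly preserves reasoning_content when thinking is combined with tool calls. For example, a GLM model on an OpenAI-compatible endpoint: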

{
  "model": "openai/GLM-5",
  "api_base": "http://your-openai-compatible-endpoint",
  "api_key": "YOUR_API_KEY"
}

Example for the official DeepSeek API:

{
  "model": "deepseek/deepseek-v4-pro",
  "api_base": "https://api.deepseek.com/v1",
  "api_key": "YOUR_DEEPSEEK_API_KEY"
}
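The prefix rule above can be expressed as a small helper. This is a hypothetical sketch: it treats an api_base whose host is api.deepseek.com as the official DeepSeek API and every other endpoint as a generic OpenAI-compatible one. Detecting the official service by hostname is an assumption made for illustration, not how extract-agent decides.

```python
from urllib.parse import urlparse


def model_id(model_name: str, api_base: str) -> str:
    """Return "<provider>/<model_name>" following the prefix rule described above."""
    # Assumption: the official DeepSeek API is recognised by its hostname.
    host = urlparse(api_base).hostname or ""
    provider = "deepseek" if host == "api.deepseek.com" else "openai"
    return f"{provider}/{model_name}"
```

Applied to the two examples above, it yields openai/GLM-5 and deepseek/deepseek-v4-pro.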

3. One-step mode

If you don't want to split the work into two manual steps ("import data" and "run"), you can run everything at once:

agentic-extract auto \
  --workspace /path/to/workspace \
  --pdfs-dir /path/to/pdfs

4. Create a workspace

xdev init /path/to/workspace
cd /path/to/workspace

5. Import data from local PDFs

xdev import-data --pdfs /path/to/pdfs

After importing, take a quick look:

xdev list

6. Debugging

Inspect a single document:

xdev doc <doc_id>

Run a single document:

xdev run <doc_id>
xdev run --workspace /path/to/workspace --pdf /path/to/file.pdf
xdev run --workspace /path/to/workspace --docjson /path/to/file.json

--docjson automatically detects both canonical DocJSON and PPX DocJSON; you don't need to specify the format.
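To debug a whole folder, you could repeat xdev run --pdf for every PDF in it. This sketch assumes one xdev run invocation per file, as in the commands above, and that xdev exits with a non-zero code on failure; xdev_run_commands and run_all are hypothetical helpers. The runner parameter defaults to subprocess.run and can be replaced with a stub when testing.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def xdev_run_commands(workspace: Path, pdf_dir: Path) -> List[List[str]]:
    """Build one `xdev run --workspace ... --pdf ...` command per PDF, sorted by path."""
    # On POSIX, Path.glob is case-sensitive, so files ending in ".PDF" are skipped.
    return [
        ["xdev", "run", "--workspace", str(workspace), "--pdf", str(pdf)]
        for pdf in sorted(pdf_dir.glob("*.pdf"))
    ]


def run_all(commands: List[List[str]], runner: Callable = subprocess.run) -> List[int]:
    """Run every command, continuing after failures, and return the exit codes."""
    return [runner(cmd).returncode for cmd in commands]
```

Collecting exit codes instead of stopping at the first error lets one run report every document that fails.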

Evaluate:

xdev eval

7. Run agentic-extract

agentic-extract run --workspace /path/to/workspace

To check only the configuration and connectivity first:

agentic-extract run --workspace /path/to/workspace --dry-run

To clear the run state and start over:

agentic-extract run --workspace /path/to/workspace --reset
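The manual flow from sections 4, 5 and 7 can be scripted end to end. This sketch runs the same commands in the same order and does a --dry-run before the real run. It assumes each command exits non-zero on failure (so check=True stops the sequence early) and runs xdev import-data and xdev list inside the workspace, matching the cd in section 4. manual_pipeline and execute are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional, Tuple

# (argv, working directory); None means "use the current directory".
Step = Tuple[List[str], Optional[Path]]


def manual_pipeline(workspace: Path, pdfs_dir: Path) -> List[Step]:
    """Sections 4, 5 and 7 of this README, with a dry run before the real run."""
    ws = str(workspace)
    return [
        (["xdev", "init", ws], None),
        (["xdev", "import-data", "--pdfs", str(pdfs_dir)], workspace),
        (["xdev", "list"], workspace),
        (["agentic-extract", "run", "--workspace", ws, "--dry-run"], None),
        (["agentic-extract", "run", "--workspace", ws], None),
    ]


def execute(steps: List[Step], runner: Callable = subprocess.run) -> None:
    """Run the steps in order; check=True raises CalledProcessError on the first failure."""
    for argv, cwd in steps:
        runner(argv, cwd=cwd, check=True)
```

Building the steps as data and running them separately keeps the command list easy to inspect or print before anything is executed.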

8. Continue iterating

Add a new PDF:

xdev import-data --add-pdf /path/to/new.pdf

Re-parse:

xdev import-data --reparse

Then continue the run:

agentic-extract run --workspace /path/to/workspace
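A sketch of one iteration round, assuming --add-pdf takes a single file per call (as in the example above) and that a re-parse, when requested, should happen after the new files are added. iteration_steps and execute are hypothetical helpers; as elsewhere in this README, the xdev commands run inside the workspace.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Tuple

# (argv, working directory); None means "use the current directory".
Step = Tuple[List[str], Optional[Path]]


def iteration_steps(workspace: Path, new_pdfs: Iterable[Path], reparse: bool = False) -> List[Step]:
    """One round: add new PDFs, optionally re-parse, then resume agentic-extract."""
    steps: List[Step] = [
        (["xdev", "import-data", "--add-pdf", str(pdf)], workspace) for pdf in new_pdfs
    ]
    if reparse:
        steps.append((["xdev", "import-data", "--reparse"], workspace))
    steps.append((["agentic-extract", "run", "--workspace", str(workspace)], None))
    return steps


def execute(steps: List[Step], runner: Callable = subprocess.run) -> None:
    """Run the steps in order and stop at the first non-zero exit (CalledProcessError)."""
    for argv, cwd in steps:
        runner(argv, cwd=cwd, check=True)
```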



Download files


Source Distribution

extract_agent-0.4.1.tar.gz (11.0 MB)

Uploaded Source

Built Distribution


extract_agent-0.4.1-py3-none-any.whl (2.0 MB)

Uploaded Python 3

File details

Details for the file extract_agent-0.4.1.tar.gz.

File metadata

  • Download URL: extract_agent-0.4.1.tar.gz
  • Upload date:
  • Size: 11.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.8 (uv publish, Ubuntu 24.04 "noble", CI)

File hashes

Hashes for extract_agent-0.4.1.tar.gz
Algorithm Hash digest
SHA256 565e874d71517a53c7e10668b52236bc0d2dea51bd0a4c20b32bfdd3683b8a45
MD5 d809c357905b0af24096229f3417f27e
BLAKE2b-256 94c70aefd6138b0510e723552fc61ded3e1386e9e28f0c30b9ddd8da3f09249a


File details

Details for the file extract_agent-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: extract_agent-0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 2.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.8 (uv publish, Ubuntu 24.04 "noble", CI)

File hashes

Hashes for extract_agent-0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 68af4781fcdb9664589dc22a8617c2fdfe10e2ba016126d8c256074d026c9c12
MD5 e43f30014ae8f31a87331d9aa630b862
BLAKE2b-256 4748140409ffd880c9e4cfd6ea9e42d4848df75debbd19818f29f448b5ab9206
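To check a downloaded file against the SHA256 digests listed above, here is a minimal sketch using only hashlib. It streams the file in 1 MiB chunks so large archives are never fully loaded into memory. EXPECTED_SHA256 copies the published digests; the function names are hypothetical.

```python
import hashlib
from pathlib import Path

# Digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "extract_agent-0.4.1.tar.gz": "565e874d71517a53c7e10668b52236bc0d2dea51bd0a4c20b32bfdd3683b8a45",
    "extract_agent-0.4.1-py3-none-any.whl": "68af4781fcdb9664589dc22a8617c2fdfe10e2ba016126d8c256074d026c9c12",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in chunks so large archives stay out of memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """True if the file's SHA256 matches the published digest for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published SHA256 for {path.name}")
    return sha256_of(path) == expected
```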

