Extract Agent
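This README covers only the shortest working workflow.
Scenario: prepare a workspace from local PDFs, then run agentic-extract iterations.
0. Quick start
For common command-line usage, see agentic-extract CLI
For the maintainer release process, see the release-process skill
1. Installation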
uv tool install extract-agent
Installing extract-agent also exposes these commands:
- agentic-extract
- xdev
- xdev-config
- pdf-ai-explorer
- tree-sitter-cli
2. Configuration
~/.config/xdev/config.json
{
"pdf_parse_concurrent": 1
}
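PDF parsing uses the local ppx command (from memect-ppx) by default, so first confirm that ppx parse <pdf> works. pdf_parse_concurrent controls how many files PPX parses at the same time when processing PDFs in batch.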
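The check and the setting described above can be scripted. The following is a minimal sketch, not part of extract-agent: the helper names `ppx_on_path` and `set_pdf_parse_concurrent` are hypothetical, and the rule that the value must be at least 1 is an assumption rather than something the README states. It only tests whether a `ppx` executable is on PATH and updates the one key while keeping any other keys already in the file.

```python
import json
import shutil
from pathlib import Path

# Config path as given in this README; the helper names are illustrative only.
XDEV_CONFIG = Path.home() / ".config" / "xdev" / "config.json"


def ppx_on_path() -> bool:
    """Return True if a `ppx` executable can be found on PATH."""
    return shutil.which("ppx") is not None


def set_pdf_parse_concurrent(value: int, path: Path = XDEV_CONFIG) -> dict:
    """Set pdf_parse_concurrent in the config file, keeping any other keys."""
    # Assumption: values below 1 make no sense for a concurrency setting.
    if value < 1:
        raise ValueError("pdf_parse_concurrent must be at least 1")
    # Load the existing config if there is one, otherwise start empty.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config["pdf_parse_concurrent"] = value
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `set_pdf_parse_concurrent(2)` would write to the default path shown above. Finding `ppx` on PATH does not prove that `ppx parse <pdf>` succeeds, so running it once on a sample PDF is still worthwhile.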
~/.config/agentic-extract/config.json
{
"model": "openai/gpt-4.1",
"api_base": "https://api.openai.com/v1",
"api_key": "YOUR_API_KEY"
}
Alternatively, run:
xdev-config
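It writes both the global agentic-extract and xdev configuration.
If the backend is a model such as GLM served over an OpenAI-compatible API, still write the model as openai/<model_name>. For the official DeepSeek API, however, it is recommended to write deepseek/<model_name> explicitly. This enables the DeepSeek-specific formatter, which correctly preserves reasoning_content in thinking + tools scenarios. For example: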
{
"model": "openai/GLM-5",
"api_base": "http://your-openai-compatible-endpoint",
"api_key": "YOUR_API_KEY"
}
Official DeepSeek API example:
{
"model": "deepseek/deepseek-v4-pro",
"api_base": "https://api.deepseek.com/v1",
"api_key": "YOUR_DEEPSEEK_API_KEY"
}
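The prefix rule above can be captured in a small helper. This is a sketch under stated assumptions: `model_config` is a hypothetical function, not part of agentic-extract, and recognising the official DeepSeek API by the hostname `api.deepseek.com` is a heuristic taken from the example URL in this README. Every other endpoint is treated as OpenAI-compatible.

```python
from urllib.parse import urlparse

# Assumption: the official DeepSeek API is recognised by this hostname.
DEEPSEEK_HOST = "api.deepseek.com"


def model_config(model_name: str, api_base: str, api_key: str) -> dict:
    """Build a config dict, choosing the provider prefix from the endpoint."""
    host = urlparse(api_base).hostname or ""
    # Official DeepSeek endpoint gets "deepseek/", anything else "openai/".
    prefix = "deepseek" if host == DEEPSEEK_HOST else "openai"
    return {
        "model": f"{prefix}/{model_name}",
        "api_base": api_base,
        "api_key": api_key,
    }
```

For example, `model_config("GLM-5", "http://your-openai-compatible-endpoint", "YOUR_API_KEY")` yields the `openai/GLM-5` form shown earlier. The resulting dict could be serialized with `json.dumps` into ~/.config/agentic-extract/config.json.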
3. One-shot mode
If you would rather not split the work into the two manual steps of "import data" and "run", you can run:
agentic-extract auto \
--workspace /path/to/workspace \
--pdfs-dir /path/to/pdfs
4. Create a workspace
xdev init /path/to/workspace
cd /path/to/workspace
5. Import data from local PDFs
xdev import-data --pdfs /path/to/pdfs
After importing, check the result first:
xdev list
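Sections 4 and 5 can be driven from a script when the same setup is repeated for several workspaces. The sketch below is illustrative: `setup_steps` and `run_steps` are hypothetical helpers, and the only commands used are the ones shown above. Because the README runs `xdev import-data` and `xdev list` after `cd /path/to/workspace`, the sketch passes the workspace as the working directory instead of assuming a `--workspace` flag for those subcommands.

```python
import subprocess
from typing import Optional


def setup_steps(workspace: str, pdfs_dir: str) -> list[tuple[list[str], Optional[str]]]:
    """Return (argv, cwd) pairs mirroring the init, import and list commands."""
    return [
        (["xdev", "init", workspace], None),
        # The next two commands run inside the workspace, like after `cd`.
        (["xdev", "import-data", "--pdfs", pdfs_dir], workspace),
        (["xdev", "list"], workspace),
    ]


def run_steps(steps: list[tuple[list[str], Optional[str]]]) -> None:
    """Run each step in order and stop at the first non-zero exit code."""
    for argv, cwd in steps:
        subprocess.run(argv, cwd=cwd, check=True)
```

A call such as `run_steps(setup_steps("/path/to/workspace", "/path/to/pdfs"))` would execute the three commands in sequence.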
6. Debugging
Inspect a single document:
xdev doc <doc_id>
Run a single document:
xdev run <doc_id>
xdev run --workspace /path/to/workspace --pdf /path/to/file.pdf
xdev run --workspace /path/to/workspace --docjson /path/to/file.json
--docjson automatically detects whether the file is canonical DocJSON or PPX DocJSON, so there is no need to specify the format.
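The three forms of `xdev run` above differ only in how the target is named. As a rough sketch, the hypothetical helper below picks the form from the target string. Choosing by file extension (`.pdf`, `.json`) is this sketch's own heuristic, not something xdev does, and requiring a workspace for the file forms simply mirrors the commands as written in this README.

```python
from pathlib import Path
from typing import Optional


def xdev_run_argv(target: str, workspace: Optional[str] = None) -> list[str]:
    """Build an `xdev run` command line for a doc_id, a PDF, or a DocJSON file."""
    suffix = Path(target).suffix.lower()
    if suffix == ".pdf":
        flag = "--pdf"
    elif suffix == ".json":
        flag = "--docjson"
    else:
        # Anything else is treated as a doc_id, run from inside the workspace.
        return ["xdev", "run", target]
    if workspace is None:
        raise ValueError("this sketch needs a workspace for --pdf / --docjson")
    return ["xdev", "run", "--workspace", workspace, flag, target]
```

The returned list can be passed to `subprocess.run`. For a doc_id the working directory should be the workspace, as in the first form above.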
Evaluate:
xdev eval
7. Run agentic-extract
agentic-extract run --workspace /path/to/workspace
To check only the configuration and connectivity first:
agentic-extract run --workspace /path/to/workspace --dry-run
To clear the run state and start over:
agentic-extract run --workspace /path/to/workspace --reset
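One way to combine these commands is to gate the real run on the dry run. The sketch below assumes that `--dry-run` reports a failed check through a non-zero exit code, which this README does not confirm. `agentic_run_argv` and `checked_run` are hypothetical helpers. The flags are kept mutually exclusive because the README shows them only one at a time.

```python
import subprocess
from typing import Callable, Optional

ALLOWED_FLAGS = ("--dry-run", "--reset")


def agentic_run_argv(workspace: str, flag: Optional[str] = None) -> list[str]:
    """Build an `agentic-extract run` command with at most one extra flag."""
    if flag is not None and flag not in ALLOWED_FLAGS:
        raise ValueError(f"unsupported flag in this sketch: {flag}")
    argv = ["agentic-extract", "run", "--workspace", workspace]
    if flag is not None:
        argv.append(flag)
    return argv


def checked_run(workspace: str,
                runner: Callable[[list[str]], int] = subprocess.call) -> int:
    """Dry-run first; start the real run only if the dry run exits with 0."""
    code = runner(agentic_run_argv(workspace, "--dry-run"))
    if code != 0:
        # Assumption: a non-zero exit code means the check failed.
        return code
    return runner(agentic_run_argv(workspace))
```

The `runner` parameter defaults to `subprocess.call`, which returns the exit code. Passing a stub instead makes the logic testable without invoking the CLI.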
8. Continuing to iterate
Add a new PDF:
xdev import-data --add-pdf /path/to/new.pdf
Re-parse:
xdev import-data --reparse
Then continue running:
agentic-extract run --workspace /path/to/workspace
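When several PDFs arrive at once, the iteration above can be planned as a list of commands. This is a sketch only: `iteration_steps` is a hypothetical helper, and it issues one `xdev import-data --add-pdf` call per file because the README does not say whether the flag accepts multiple paths. As elsewhere in this README, the xdev commands are assumed to run from inside the workspace directory.

```python
from typing import Optional


def iteration_steps(workspace: str, new_pdfs: list[str],
                    reparse: bool = False) -> list[tuple[list[str], Optional[str]]]:
    """Return (argv, cwd) pairs: add each PDF, optionally re-parse, then run."""
    steps: list[tuple[list[str], Optional[str]]] = [
        (["xdev", "import-data", "--add-pdf", pdf], workspace) for pdf in new_pdfs
    ]
    if reparse:
        steps.append((["xdev", "import-data", "--reparse"], workspace))
    # A single agentic-extract run follows once all imports are planned.
    steps.append((["agentic-extract", "run", "--workspace", workspace], None))
    return steps
```

Each pair can be executed with `subprocess.run(argv, cwd=cwd, check=True)` so that a failed import stops the sequence before the run starts.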