SciVerse Agent Tools — OpenAI/Anthropic/LangChain compatible tool schema and async client for SciVerse retrieval APIs
Project description
sciverse
SciVerse open-platform Python SDK + CLI for academic paper retrieval. Wraps
five retrieval tools (search_papers, semantic_search, read_content,
list_catalog, get_resource) behind one async client + ready-to-use
OPENAI_TOOLS / ANTHROPIC_TOOLS constants for direct tool-calling.
Tools:
search_papers (structured metadata) / semantic_search (semantic retrieval) / read_content (raw-text slicing) / list_catalog (field introspection) / get_resource (paper image binaries)
Install
pip install sciverse
# or, if you only want the CLI:
pipx install sciverse
Configure once (no env vars needed afterwards)
sciverse auth login
# - opens https://sciverse.space/tokens in your browser
# - paste the token you create
# - saved to ~/.sciverse/credentials.json (file mode 0600)
After this, any AgentToolsClient() constructed without explicit arguments picks
the token up automatically. Resolution order: explicit argument →
SCIVERSE_API_TOKEN env var → credentials file → default.
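The resolution order can be pictured as a small lookup. The sketch below is for illustration only (the function name and JSON key are assumptions, not the SDK's internals; only the order and the file location come from the docs above):

```python
import json
import os
from pathlib import Path

def resolve_token(explicit=None, default=None):
    """Illustrative sketch of the documented resolution order:
    explicit arg -> SCIVERSE_API_TOKEN env var ->
    ~/.sciverse/credentials.json -> default."""
    if explicit:
        return explicit
    env = os.environ.get("SCIVERSE_API_TOKEN")
    if env:
        return env
    creds = Path.home() / ".sciverse" / "credentials.json"
    if creds.is_file():
        try:
            saved = json.loads(creds.read_text()).get("token")
            if saved:
                return saved
        except (OSError, ValueError):
            pass  # unreadable / malformed file falls through to the default
    return default
```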
CLI
sciverse auth login [--token <t>] [--endpoint <url>] [--no-browser]
sciverse auth status # show masked token / endpoint / saved_at
sciverse auth logout # delete the credentials file
--token is useful in CI scripts (it skips the interactive paste). --no-browser
is for remote / headless machines.
Quick start
import asyncio
from sciverse import AgentToolsClient

async def main():
    async with AgentToolsClient() as c:  # token + endpoint auto-resolved
        r = await c.semantic_search(query="Transformer attention mechanism", top_k=3)
        for hit in r["hits"]:
            print(hit["doc_id"], hit["score"], hit["title"])

asyncio.run(main())
Long-lived client (web server / agent runtime)
client = AgentToolsClient()  # construct once at startup
try:
    while serving:
        r = await client.semantic_search(query=...)
        ...
finally:
    await client.aclose()  # release the underlying httpx connection pool
Five retrieval tools
# 1. Structured metadata search (Boolean filters + sort + pagination)
await c.search_papers(
    query="transformer",            # full-text BM25 (optional)
    authors=["Hinton"],
    year_from=2020, year_to=2024,
    journals=["Nature", "Science"],
    sort_by_year="desc",            # "desc" / "asc" / "none"
    page_size=10,
)
# 2. Natural-language semantic search (vector + BM25 hybrid, returns chunks)
await c.semantic_search(query="How does attention work?", top_k=10, mode="balanced")
# 3. Byte-range read of original paper text
# (use doc_id + offset from semantic_search hits)
await c.read_content(doc_id="p_xxx", offset=0, limit=8192)
# 4. Schema introspection — call once to discover field names + enum values
await c.list_catalog(include_sample_values=True)
# 5. Fetch a paper figure / table image (when the Markdown returned by
#    read_content contains image placeholders)
bytes_, mime_type = await c.get_resource(file_name="dt=xxx/p_yyy/f3.png")
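Hits from semantic_search carry the doc_id and offset that read_content expects, so the two compose naturally. A minimal sketch of that hand-off, written against any client with the signatures above (the helper name and the exact "offset" key on a hit are assumptions):

```python
async def top_hit_context(client, query, window=8192):
    """Fetch the original text around the best semantic hit.

    Works with any client exposing semantic_search / read_content with the
    signatures shown above; the "offset" key on a hit is an assumption here.
    """
    r = await client.semantic_search(query=query, top_k=1)
    if not r["hits"]:
        return None
    hit = r["hits"][0]
    return await client.read_content(
        doc_id=hit["doc_id"], offset=hit.get("offset", 0), limit=window
    )
```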
Use with Anthropic / OpenAI tool-calling
The SDK exports ready-to-use tool schemas matching each provider's spec —
drop straight into messages.create(tools=...) or
chat.completions.create(tools=...).
from anthropic import Anthropic
from sciverse import ANTHROPIC_TOOLS, AgentToolsClient

anthropic = Anthropic()

async with AgentToolsClient() as sv:
    messages = [{"role": "user", "content": "Find 3 transformer papers"}]
    resp = anthropic.messages.create(
        model="claude-opus-4-7",
        max_tokens=4096,
        tools=ANTHROPIC_TOOLS,
        messages=messages,
    )
    # ... handle tool_use blocks by dispatching to sv.search_papers / ...
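Dispatching tool_use blocks is mechanical when the tool names match the client's method names one-to-one (an assumption here, though the five tools listed above are named like the methods). A minimal sketch:

```python
async def dispatch_tool_use(sv, block):
    """Route one Anthropic tool_use block to the matching client method.

    Anthropic tool_use blocks expose .name (the tool name) and .input
    (a dict of arguments); the name->method mapping is assumed one-to-one.
    """
    handler = getattr(sv, block.name)  # e.g. sv.search_papers
    return await handler(**block.input)
```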
from openai import OpenAI
from sciverse import OPENAI_TOOLS, AgentToolsClient

openai = OpenAI()

async with AgentToolsClient() as sv:
    resp = openai.chat.completions.create(
        model="gpt-4o",
        tools=OPENAI_TOOLS,
        messages=[{"role": "user", "content": "Find 3 transformer papers"}],
    )
    # ... handle tool_calls similarly
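The OpenAI side is almost identical, except arguments arrive as a JSON string rather than a dict. A hedged sketch (again assuming tool names match the client's method names):

```python
import json

async def dispatch_tool_call(sv, tool_call):
    """Route one OpenAI tool call to the matching client method.

    OpenAI serializes tool_call.function.arguments as a JSON string,
    so decode it before splatting into the method call.
    """
    args = json.loads(tool_call.function.arguments)
    return await getattr(sv, tool_call.function.name)(**args)
```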
For the Claude Agent SDK / OpenAI Agents SDK (both accept an MCP server config
and run the agent loop for you), see sciverse-mcp-server.
Error handling
Non-2xx responses raise httpx.HTTPStatusError. The platform error body has the
shape {code, message, request_id}:
import httpx

try:
    await c.search_papers(query="x")
except httpx.HTTPStatusError as e:
    print(e.response.status_code, e.response.text)
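If you want the structured fields rather than raw text, the documented error-body shape can be unpacked defensively. A sketch (the helper is not part of the SDK, and real responses may be non-JSON, e.g. from a proxy):

```python
import json

def parse_error_body(text):
    """Extract (code, message, request_id) from a platform error body,
    falling back to the raw text for non-JSON responses."""
    try:
        body = json.loads(text)
    except ValueError:
        return None, text, None
    return body.get("code"), body.get("message"), body.get("request_id")
```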
| HTTP | Meaning |
|---|---|
| 400 | Bad request (unknown field, conflicting query+sort, ...) |
| 401 | Token missing / invalid / user disabled |
| 403 | Field permission denied |
| 429 | Rate limit (60 req / 60s per user, shared across protected endpoints) |
| 502 | Upstream metadata-service unavailable |
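For 429s, a jittered exponential backoff usually suffices given the 60 req / 60 s budget. A generic sketch (the httpx wiring described in the docstring is an assumption about how you would hook it up, not part of the SDK):

```python
import asyncio
import random

async def call_with_retry(call, *, is_rate_limit, retries=3, base_delay=0.5):
    """Retry an async call with jittered exponential backoff while
    is_rate_limit(exc) is true; re-raise anything else.

    In practice the predicate would check isinstance(e, httpx.HTTPStatusError)
    and e.response.status_code == 429.
    """
    for attempt in range(retries + 1):
        try:
            return await call()
        except Exception as e:
            if attempt == retries or not is_rate_limit(e):
                raise
            # 0.5s, 1s, 2s, ... plus a little jitter to avoid thundering herds
            await asyncio.sleep(base_delay * 2 ** attempt + random.random() * 0.1)
```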
Typed request models (optional)
from sciverse.types import SearchPapersRequest, SemanticSearchRequest
# Pydantic v2 models — for explicit validation when constructing requests.
Links
- Source repo: https://github.com/opendatalab/SciVerse-agent-tools
- Changelog: https://github.com/opendatalab/SciVerse-agent-tools/blob/main/CHANGELOG.md
- Console (get a token): https://sciverse.space
- License: Apache-2.0
File details
Details for the file sciverse-0.3.0.tar.gz:
- Size: 37.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12

| Algorithm | Hash digest |
|---|---|
| SHA256 | 2188a4598d616ef4f77651ab370360a4e7e9bc15d95d3fba6d319ec3db2d6594 |
| MD5 | 7daea331020ed704b0e0514ecd882e47 |
| BLAKE2b-256 | 230b0be98df3e71e445b6b027e8daa8d0f673d4c84829face3803e5c427c9611 |
File details
Details for the file sciverse-0.3.0-py3-none-any.whl:
- Size: 18.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12

| Algorithm | Hash digest |
|---|---|
| SHA256 | 02d086eab67dc407ba751c745d0c3466e54b87e845b77ce813ed6ee282f6128b |
| MD5 | 0527092e44dac6706b72f4cca0cc85fb |
| BLAKE2b-256 | d93232d78dbcfff8fd9e07e44fdce48acade84bc4a4139a5b5af94836d9ccb45 |