LLM Agent experimentation harness for SWE tasks
Project description
llm-harness-swe
LLM Agent 实验基座。提供多 Agent 协作基础设施,用于 SWE(Software Engineering)领域的 LLM Agent 实验。
安装
pip install -e ".[anthropic,openai]"
核心概念
| 概念 | 说明 |
|---|---|
| Harness | IoC 容器,装配 provider、tools、memory、tracker、skills |
| AgentLoop | ReAct 循环:LLM 推理 → 工具调用 → 执行 → 重复 |
| AgentDefinition | Agent 角色定义(system_prompt、工具集、模型、最大轮数) |
| SubagentManager | 子 Agent 管理器,上下文隔离发射,spawn 工具排除防递归 |
| SessionMemory | 会话级记忆,每样本隔离,存当前样本的分析、尝试轨迹 |
| ProjectMemory | 项目级记忆,跨会话共享,存通用模式和经验 |
| Tracker | 会话级 JSONL 轨迹记录,记录所有 LLM 推理和工具调用 |
| Skill | 可挂载的诊断知识包(SOP 文档 + 参考 + 脚本) |
快速开始
import asyncio
from llm_harness_swe import (
Harness, Config, AgentConfig, AgentDefinition,
ProjectMemory,
)
# 1. 装配 Harness
config = Config(
agent=AgentConfig(model="claude-sonnet-4-6", provider="anthropic"),
memory={"project_memory_dir": "./my_project/memory"},
)
harness = Harness.from_config(config)
# 2. 注册 Agent 角色
harness.agent_definitions = {
"analyzer": AgentDefinition(
name="analyzer",
description="分析代码结构,理解项目上下文",
system_prompt="你是代码分析专家。用 glob、grep、ast、git 理解项目结构。",
tools=["read_file", "glob", "grep", "ast", "git"],
),
"generator": AgentDefinition(
name="generator",
description="生成和修改代码",
system_prompt="你是代码生成专家。先读相关文件,再写代码,最后验证。",
tools=["read_file", "write_file", "edit_file", "exec"],
max_turns=20,
),
}
# 3. 为每个样本创建会话
async def run_sample(sample_id: str, prompt: str):
session_dir = f"./my_project/sessions/{sample_id}"
harness.create_session(session_dir)
await harness.start_session()
result = await harness.process(
user_prompt=prompt,
system_prompt="你是主 Agent。根据任务类型选择合适的子 Agent。",
)
await harness.stop_session()
return result
# 4. 运行
result = asyncio.run(run_sample("func_001", "为 calculate_tax 函数生成单元测试"))
print(result)
工具集
| 工具 | 类型 | 说明 |
|---|---|---|
read_file |
文件 | 读取文件内容(含图片) |
write_file |
文件 | 写入新文件 |
edit_file |
文件 | 精确字符串替换编辑 |
list_dir |
文件 | 列出目录内容 |
exec |
系统 | 执行 shell 命令(可选沙箱隔离) |
glob |
搜索 | 文件模式匹配 |
grep |
搜索 | 正则表达式内容搜索 |
git |
版本 | 只读 git 操作(diff/log/blame/status/show) |
ast |
分析 | Python AST 解析(函数/类/导入/结构) |
web_search |
网络 | 网页搜索 |
web_fetch |
网络 | 获取网页内容 |
skill |
知识 | 调用注册的 Skill |
spawn |
Agent | 发射上下文隔离的子 Agent(运行时注入) |
memory_read |
记忆 | 读取分层记忆(运行时注入) |
memory_write |
记忆 | 写入分层记忆(运行时注入) |
定制
自定义 Tool
from llm_harness_swe import BaseTool, ToolResult, ToolExecutionContext
from pydantic import BaseModel, Field
class MyInput(BaseModel):
param: str = Field(..., description="参数说明")
class MyTool(BaseTool):
name = "my_tool"
description = "工具描述,LLM 根据此描述决定何时调用"
input_model = MyInput
async def execute(self, args: MyInput, ctx: ToolExecutionContext) -> ToolResult:
return ToolResult(output=f"结果: {args.param}")
def is_read_only(self, args: MyInput) -> bool:
return True
# 注册
harness.tools.register(MyTool())
自定义 Skill
Skill 是一个目录,包含 SKILL.md(SOP 文档)和可选的参考文件、脚本:
my_skills/
└── diagnose-coverage/
├── SKILL.md # SOP 诊断流程
├── references/
│ └── patterns.md # 常见未覆盖模式
└── scripts/
└── analyze.py # AST 分析脚本
from llm_harness_swe import load_skills_from_dirs
skills = load_skills_from_dirs(["my_skills/", "other_skills/"])
harness = Harness(..., skills=skills)
样本调度器
基座不提供调度器,实验人员自行实现:
for sample_id, prompt in dataset:
session_dir = f"./workspace/sessions/{sample_id}"
harness.create_session(session_dir)
await harness.start_session()
result = await harness.process(prompt, system_prompt=SOP)
await harness.stop_session()
save_result(sample_id, result)
# 可选:分析 track.jsonl,write_memory 积累经验
会话目录结构
workspace/
├── project_memory/
│ └── MEMORY.md # 项目级共享经验
└── sessions/
├── sample_001/
│ ├── track.jsonl # 该样本完整执行轨迹
│ └── session_memory/
│ └── MEMORY.md # 样本级隔离记忆
└── sample_002/
├── track.jsonl
└── session_memory/
└── MEMORY.md
架构
Harness (IoC 容器)
├─ Provider (Anthropic / OpenAI compat)
├─ ToolRegistry (13 个内置工具 + 自定义)
├─ AgentLoop (ReAct 循环)
├─ SubagentManager (隔离发射 + contextvars + 双信号 abort)
├─ SessionMemory + ProjectMemory (分层记忆)
├─ EventBus + Tracker (会话级轨迹记录)
├─ SkillRegistry (可挂载诊断知识包)
├─ ContextBuilder (可插拔系统提示)
├─ Sandbox (可选命令沙箱)
└─ BackgroundTaskManager (子进程任务管理)
许可
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
llm_harness_swe-0.1.0.tar.gz
(58.7 kB
view details)
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file llm_harness_swe-0.1.0.tar.gz.
File metadata
- Download URL: llm_harness_swe-0.1.0.tar.gz
- Upload date:
- Size: 58.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fe847d2fccbf57eaff99d0fb7223a45b28e5bbd6f382e309946a189b12a60391
|
|
| MD5 |
020aebc6c623ec9cf2a7c1470992ba65
|
|
| BLAKE2b-256 |
a1721841c861c66f00a7c7e054f092159163e56a2818dbb88ad258474ed26cb2
|
File details
Details for the file llm_harness_swe-0.1.0-py3-none-any.whl.
File metadata
- Download URL: llm_harness_swe-0.1.0-py3-none-any.whl
- Upload date:
- Size: 80.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8a760759d33f8c62acc6770940fa3e18c5d7ce82cbfd4be91d08fa2509ed6aa9
|
|
| MD5 |
e9f3429813fcd968603333667933a6a8
|
|
| BLAKE2b-256 |
40a58d2ae114c1a5b9abf2562598185ef0566b6c9b36ce08a58780945e201fdc
|