Skip to main content

LLM Agent experimentation harness for SWE tasks

Project description

llm-harness-swe

LLM Agent 实验基座。提供多 Agent 协作基础设施,用于 SWE(Software Engineering)领域的 LLM Agent 实验。

安装

pip install -e ".[anthropic,openai]"

核心概念

概念 说明
Harness IoC 容器,装配 provider、tools、memory、tracker、skills
AgentLoop ReAct 循环:LLM 推理 → 工具调用 → 执行 → 重复
AgentDefinition Agent 角色定义(system_prompt、工具集、模型、最大轮数)
SubagentManager 子 Agent 管理器,上下文隔离发射,spawn 工具排除防递归
SessionMemory 会话级记忆,每样本隔离,存当前样本的分析、尝试轨迹
ProjectMemory 项目级记忆,跨会话共享,存通用模式和经验
Tracker 会话级 JSONL 轨迹记录,记录所有 LLM 推理和工具调用
Skill 可挂载的诊断知识包(SOP 文档 + 参考 + 脚本)

快速开始

import asyncio
from llm_harness_swe import (
    Harness, Config, AgentConfig, AgentDefinition,
    ProjectMemory,
)

# 1. 装配 Harness
config = Config(
    agent=AgentConfig(model="claude-sonnet-4-6", provider="anthropic"),
    memory={"project_memory_dir": "./my_project/memory"},
)
harness = Harness.from_config(config)

# 2. 注册 Agent 角色
harness.agent_definitions = {
    "analyzer": AgentDefinition(
        name="analyzer",
        description="分析代码结构,理解项目上下文",
        system_prompt="你是代码分析专家。用 glob、grep、ast、git 理解项目结构。",
        tools=["read_file", "glob", "grep", "ast", "git"],
    ),
    "generator": AgentDefinition(
        name="generator",
        description="生成和修改代码",
        system_prompt="你是代码生成专家。先读相关文件,再写代码,最后验证。",
        tools=["read_file", "write_file", "edit_file", "exec"],
        max_turns=20,
    ),
}

# 3. 为每个样本创建会话
async def run_sample(sample_id: str, prompt: str):
    session_dir = f"./my_project/sessions/{sample_id}"
    harness.create_session(session_dir)
    await harness.start_session()

    result = await harness.process(
        user_prompt=prompt,
        system_prompt="你是主 Agent。根据任务类型选择合适的子 Agent。",
    )

    await harness.stop_session()
    return result

# 4. 运行
result = asyncio.run(run_sample("func_001", "为 calculate_tax 函数生成单元测试"))
print(result)

工具集

工具 类型 说明
read_file 文件 读取文件内容(含图片)
write_file 文件 写入新文件
edit_file 文件 精确字符串替换编辑
list_dir 文件 列出目录内容
exec 系统 执行 shell 命令(可选沙箱隔离)
glob 搜索 文件模式匹配
grep 搜索 正则表达式内容搜索
git 版本 只读 git 操作(diff/log/blame/status/show)
ast 分析 Python AST 解析(函数/类/导入/结构)
web_search 网络 网页搜索
web_fetch 网络 获取网页内容
skill 知识 调用注册的 Skill
spawn Agent 发射上下文隔离的子 Agent(运行时注入)
memory_read 记忆 读取分层记忆(运行时注入)
memory_write 记忆 写入分层记忆(运行时注入)

定制

自定义 Tool

from llm_harness_swe import BaseTool, ToolResult, ToolExecutionContext
from pydantic import BaseModel, Field

class MyInput(BaseModel):
    param: str = Field(..., description="参数说明")

class MyTool(BaseTool):
    name = "my_tool"
    description = "工具描述,LLM 根据此描述决定何时调用"
    input_model = MyInput

    async def execute(self, args: MyInput, ctx: ToolExecutionContext) -> ToolResult:
        return ToolResult(output=f"结果: {args.param}")

    def is_read_only(self, args: MyInput) -> bool:
        return True

# 注册
harness.tools.register(MyTool())

自定义 Skill

Skill 是一个目录,包含 SKILL.md(SOP 文档)和可选的参考文件、脚本:

my_skills/
└── diagnose-coverage/
    ├── SKILL.md          # SOP 诊断流程
    ├── references/
    │   └── patterns.md   # 常见未覆盖模式
    └── scripts/
        └── analyze.py    # AST 分析脚本
from llm_harness_swe import load_skills_from_dirs

skills = load_skills_from_dirs(["my_skills/", "other_skills/"])
harness = Harness(..., skills=skills)

样本调度器

基座不提供调度器,实验人员自行实现:

for sample_id, prompt in dataset:
    session_dir = f"./workspace/sessions/{sample_id}"
    harness.create_session(session_dir)

    await harness.start_session()
    result = await harness.process(prompt, system_prompt=SOP)
    await harness.stop_session()

    save_result(sample_id, result)
    # 可选:分析 track.jsonl,write_memory 积累经验

会话目录结构

workspace/
├── project_memory/
│   └── MEMORY.md              # 项目级共享经验
└── sessions/
    ├── sample_001/
    │   ├── track.jsonl        # 该样本完整执行轨迹
    │   └── session_memory/
    │       └── MEMORY.md      # 样本级隔离记忆
    └── sample_002/
        ├── track.jsonl
        └── session_memory/
            └── MEMORY.md

架构

Harness (IoC 容器)
  ├─ Provider (Anthropic / OpenAI compat)
  ├─ ToolRegistry (13 个内置工具 + 自定义)
  ├─ AgentLoop (ReAct 循环)
  ├─ SubagentManager (隔离发射 + contextvars + 双信号 abort)
  ├─ SessionMemory + ProjectMemory (分层记忆)
  ├─ EventBus + Tracker (会话级轨迹记录)
  ├─ SkillRegistry (可挂载诊断知识包)
  ├─ ContextBuilder (可插拔系统提示)
  ├─ Sandbox (可选命令沙箱)
  └─ BackgroundTaskManager (子进程任务管理)

许可

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm_harness_swe-0.1.0.tar.gz (58.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llm_harness_swe-0.1.0-py3-none-any.whl (80.6 kB view details)

Uploaded Python 3

File details

Details for the file llm_harness_swe-0.1.0.tar.gz.

File metadata

  • Download URL: llm_harness_swe-0.1.0.tar.gz
  • Upload date:
  • Size: 58.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for llm_harness_swe-0.1.0.tar.gz
Algorithm Hash digest
SHA256 fe847d2fccbf57eaff99d0fb7223a45b28e5bbd6f382e309946a189b12a60391
MD5 020aebc6c623ec9cf2a7c1470992ba65
BLAKE2b-256 a1721841c861c66f00a7c7e054f092159163e56a2818dbb88ad258474ed26cb2

See more details on using hashes here.

File details

Details for the file llm_harness_swe-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for llm_harness_swe-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8a760759d33f8c62acc6770940fa3e18c5d7ce82cbfd4be91d08fa2509ed6aa9
MD5 e9f3429813fcd968603333667933a6a8
BLAKE2b-256 40a58d2ae114c1a5b9abf2562598185ef0566b6c9b36ce08a58780945e201fdc

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page