LLM Agent experimentation harness for SWE tasks

These details have not been verified by PyPI

Project links

Repository

Project description

llm-harness-swe

LLM Agent 实验基座。提供多 Agent 协作基础设施，用于 SWE（Software Engineering）领域的 LLM Agent 实验。

安装

pip install -e ".[anthropic,openai]"

核心概念

概念	说明
Harness	IoC 容器，装配 provider、tools、memory、tracker、skills
AgentLoop	ReAct 循环：LLM 推理 → 工具调用 → 执行 → 重复
AgentDefinition	Agent 角色定义（system_prompt、工具集、模型、最大轮数）
SubagentManager	子 Agent 管理器，上下文隔离发射，spawn 工具排除防递归
SessionMemory	会话级记忆，每样本隔离，存当前样本的分析、尝试轨迹
ProjectMemory	项目级记忆，跨会话共享，存通用模式和经验
Tracker	会话级 JSONL 轨迹记录，记录所有 LLM 推理和工具调用
Skill	可挂载的诊断知识包（SOP 文档 + 参考 + 脚本）

快速开始

import asyncio
from llm_harness_swe import (
    Harness, Config, AgentConfig, AgentDefinition,
    ProjectMemory,
)

# 1. 装配 Harness
config = Config(
    agent=AgentConfig(model="claude-sonnet-4-6", provider="anthropic"),
    memory={"project_memory_dir": "./my_project/memory"},
)
harness = Harness.from_config(config)

# 2. 注册 Agent 角色
harness.agent_definitions = {
    "analyzer": AgentDefinition(
        name="analyzer",
        description="分析代码结构，理解项目上下文",
        system_prompt="你是代码分析专家。用 glob、grep、ast、git 理解项目结构。",
        tools=["read_file", "glob", "grep", "ast", "git"],
    ),
    "generator": AgentDefinition(
        name="generator",
        description="生成和修改代码",
        system_prompt="你是代码生成专家。先读相关文件，再写代码，最后验证。",
        tools=["read_file", "write_file", "edit_file", "exec"],
        max_turns=20,
    ),
}

# 3. 为每个样本创建会话
async def run_sample(sample_id: str, prompt: str):
    session_dir = f"./my_project/sessions/{sample_id}"
    harness.create_session(session_dir)
    await harness.start_session()

    result = await harness.process(
        user_prompt=prompt,
        system_prompt="你是主 Agent。根据任务类型选择合适的子 Agent。",
    )

    await harness.stop_session()
    return result

# 4. 运行
result = asyncio.run(run_sample("func_001", "为 calculate_tax 函数生成单元测试"))
print(result)

工具集

工具	类型	说明
`read_file`	文件	读取文件内容（含图片）
`write_file`	文件	写入新文件
`edit_file`	文件	精确字符串替换编辑
`list_dir`	文件	列出目录内容
`exec`	系统	执行 shell 命令（可选沙箱隔离）
`glob`	搜索	文件模式匹配
`grep`	搜索	正则表达式内容搜索
`git`	版本	只读 git 操作（diff/log/blame/status/show）
`ast`	分析	Python AST 解析（函数/类/导入/结构）
`web_search`	网络	网页搜索
`web_fetch`	网络	获取网页内容
`skill`	知识	调用注册的 Skill
`spawn`	Agent	发射上下文隔离的子 Agent（运行时注入）
`memory_read`	记忆	读取分层记忆（运行时注入）
`memory_write`	记忆	写入分层记忆（运行时注入）

定制

自定义 Tool

from llm_harness_swe import BaseTool, ToolResult, ToolExecutionContext
from pydantic import BaseModel, Field

class MyInput(BaseModel):
    param: str = Field(..., description="参数说明")

class MyTool(BaseTool):
    name = "my_tool"
    description = "工具描述，LLM 根据此描述决定何时调用"
    input_model = MyInput

    async def execute(self, args: MyInput, ctx: ToolExecutionContext) -> ToolResult:
        return ToolResult(output=f"结果: {args.param}")

    def is_read_only(self, args: MyInput) -> bool:
        return True

# 注册
harness.tools.register(MyTool())

自定义 Skill

Skill 是一个目录，包含 SKILL.md（SOP 文档）和可选的参考文件、脚本：

my_skills/
└── diagnose-coverage/
    ├── SKILL.md          # SOP 诊断流程
    ├── references/
    │   └── patterns.md   # 常见未覆盖模式
    └── scripts/
        └── analyze.py    # AST 分析脚本

from llm_harness_swe import load_skills_from_dirs

skills = load_skills_from_dirs(["my_skills/", "other_skills/"])
harness = Harness(..., skills=skills)

样本调度器

基座不提供调度器，实验人员自行实现：

for sample_id, prompt in dataset:
    session_dir = f"./workspace/sessions/{sample_id}"
    harness.create_session(session_dir)

    await harness.start_session()
    result = await harness.process(prompt, system_prompt=SOP)
    await harness.stop_session()

    save_result(sample_id, result)
    # 可选：分析 track.jsonl，write_memory 积累经验

会话目录结构

workspace/
├── project_memory/
│   └── MEMORY.md              # 项目级共享经验
└── sessions/
    ├── sample_001/
    │   ├── track.jsonl        # 该样本完整执行轨迹
    │   └── session_memory/
    │       └── MEMORY.md      # 样本级隔离记忆
    └── sample_002/
        ├── track.jsonl
        └── session_memory/
            └── MEMORY.md

架构

Harness (IoC 容器)
  ├─ Provider (Anthropic / OpenAI compat)
  ├─ ToolRegistry (13 个内置工具 + 自定义)
  ├─ AgentLoop (ReAct 循环)
  ├─ SubagentManager (隔离发射 + contextvars + 双信号 abort)
  ├─ SessionMemory + ProjectMemory (分层记忆)
  ├─ EventBus + Tracker (会话级轨迹记录)
  ├─ SkillRegistry (可挂载诊断知识包)
  ├─ ContextBuilder (可插拔系统提示)
  ├─ Sandbox (可选命令沙箱)
  └─ BackgroundTaskManager (子进程任务管理)

许可

MIT

Project details

These details have not been verified by PyPI

Project links

Repository

Release history Release notifications | RSS feed

This version

0.1.0

May 25, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm_harness_swe-0.1.0.tar.gz (58.7 kB view details)

Uploaded May 25, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

llm_harness_swe-0.1.0-py3-none-any.whl (80.6 kB view details)

Uploaded May 25, 2026 Python 3

File details

Details for the file llm_harness_swe-0.1.0.tar.gz.

File metadata

Download URL: llm_harness_swe-0.1.0.tar.gz
Upload date: May 25, 2026
Size: 58.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for llm_harness_swe-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`fe847d2fccbf57eaff99d0fb7223a45b28e5bbd6f382e309946a189b12a60391`
MD5	`020aebc6c623ec9cf2a7c1470992ba65`
BLAKE2b-256	`a1721841c861c66f00a7c7e054f092159163e56a2818dbb88ad258474ed26cb2`

See more details on using hashes here.

File details

Details for the file llm_harness_swe-0.1.0-py3-none-any.whl.

File metadata

Download URL: llm_harness_swe-0.1.0-py3-none-any.whl
Upload date: May 25, 2026
Size: 80.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for llm_harness_swe-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8a760759d33f8c62acc6770940fa3e18c5d7ce82cbfd4be91d08fa2509ed6aa9`
MD5	`e9f3429813fcd968603333667933a6a8`
BLAKE2b-256	`40a58d2ae114c1a5b9abf2562598185ef0566b6c9b36ce08a58780945e201fdc`

See more details on using hashes here.

llm-harness-swe 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

llm-harness-swe

安装

核心概念

快速开始

工具集

定制

自定义 Tool

自定义 Skill

样本调度器

会话目录结构

架构

许可

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes