Education Language Model Evaluation System

Project description


ELMES - Evaluating Large Language Models in Educational Scenarios

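ELMES (Evaluating Large Language Models in Educational Scenarios) is a Python framework that provides agent orchestration and automated evaluation for a wide range of LLM tasks across different scenarios. Its modular architecture, YAML-based configuration, and extensible entities make it well suited to building, configuring, and evaluating complex agent-based workflows.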

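Core Features

Modular architecture: built on pydantic-ai and pydantic-graph for flexible agent orchestration
YAML-driven configuration: define multi-turn dialogue scenarios, models, agents, and workflows in simple YAML files
Multi-turn dialogue support: handles complex multi-agent interaction scenarios, including routers and conditional transitions
Automated evaluation: an LLM-as-Judge evaluation system with multi-dimensional scoring
MCP integration: supports Model Context Protocol (MCP) servers to extend agent capabilities
Visual analysis: built-in radar charts and stacked bar charts that present evaluation results at a glance
Workflow visualization: automatically generates Mermaid flowcharts of agent interactions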

Tech Stack

Python 3.10+
pydantic-ai: building and managing AI agents
pydantic-graph: defining and executing graph workflows
pydantic-evals: LLM-as-Judge evaluation
Click: building the CLI
Matplotlib: data visualization
FastMCP: MCP server integration

Installation

# Install dependencies with uv
uv sync

# Or use pip
pip install -e .

Optional OpenAI support:

uv add --dev openai

Quick Start

1. Configure the environment

Create a configuration file, config.yaml, based on config.yaml.example:

globals:
  concurrency: 16
  recursion_limit: 3
  output_dir: "./generated"

models:
  teacher_model:
    type: openai
    api_key: <YOUR_API_KEY>
    base_url: <YOUR_BASE_URL>
    model: gpt-4o

agents:
  teacher:
    model: teacher_model
    system_prompt: "You are a patient teacher..."

directions:
  - START -> teacher
  - teacher -> END

tasks:
  start_prompt: "Teaching topic: {topic}"
  mode: union
  content:
    topic:
      - "Mathematics"
      - "Physics"

evaluation:
  name: teaching_quality
  judge_model: teacher_model
  target: gpt-4o
  fields:
    - name: clarity
      rubric: Whether the teaching content is clear and easy to understand
      reason: true

2. Generate dialogues

elmes generate --config config.yaml

3. Evaluate the results

elmes eval --config config.yaml

4. Visualize the results

elmes visualize ./generated

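CLI Commands

`generate` - Generate dialogues

Generates multi-turn dialogue data from the configuration.

elmes generate --config config.yaml --output ./results --debug

Options:

--config: path to the configuration file (default: config.yaml)
--output: output directory (defaults to globals.output_dir)
--debug: enable debug mode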

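`eval` - Evaluate dialogues

Evaluates the quality of the generated dialogues using LLM-as-Judge.

elmes eval --config config.yaml --input ./generated --output ./eval_results

Options:

--config: path to the configuration file (required)
--input: directory containing the generation results (inferred automatically by default)
--output: output directory for the evaluation results
--avg/--no-avg: whether to compute average scores (default: enabled)
--include-reasons/--no-include-reasons: whether to include scoring rationales (default: enabled)
--debug: enable debug mode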
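To make `--avg` concrete, here is a minimal sketch of per-dimension averaging. It assumes each judged dialogue yields one numeric score per rubric field, collected as a `{field_name: score}` dict; that layout and the function name `average_scores` are hypothetical, not ELMES's actual output schema.

```python
from collections import defaultdict
from statistics import mean


def average_scores(results: list[dict[str, float]]) -> dict[str, float]:
    """Average each rubric field over all judged dialogues.

    `results` is a hypothetical list of {field_name: score} dicts, one per
    evaluated dialogue; a field missing from a dialogue is simply not counted.
    """
    per_field: dict[str, list[float]] = defaultdict(list)
    for scores in results:
        for field, score in scores.items():
            per_field[field].append(score)
    return {field: mean(values) for field, values in per_field.items()}


if __name__ == "__main__":
    judged = [{"clarity": 4.0}, {"clarity": 5.0}]
    print(average_scores(judged))  # {'clarity': 4.5}
```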

`export` - Export data

Exports dialogue data to different formats.

# Export as JSON
elmes export json --input ./generated --output ./exported.json

# Export in Label Studio format
elmes export label-studio --input ./generated --output ./label_studio.json
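The sketch below shows one plausible way dialogues could be wrapped for Label Studio import. Label Studio imports a JSON list of tasks whose payload sits under a `data` key; the input message shape (`role`/`content`), the `dialogue` field name, and the helper `to_label_studio_tasks` are assumptions for illustration, and the real `elmes export label-studio` output may be structured differently.

```python
import json


def to_label_studio_tasks(dialogues: list[list[dict[str, str]]]) -> list[dict]:
    """Wrap each dialogue as one Label Studio task.

    Input messages are assumed to look like {"role": ..., "content": ...}.
    Each becomes an {"author": ..., "text": ...} item, the default keys
    read by Label Studio's Paragraphs tag.
    """
    tasks = []
    for messages in dialogues:
        items = [{"author": m["role"], "text": m["content"]} for m in messages]
        # Label Studio reads each task's payload from its top-level "data" key.
        tasks.append({"data": {"dialogue": items}})
    return tasks


if __name__ == "__main__":
    sample = [[
        {"role": "teacher", "content": "What is 2 + 2?"},
        {"role": "student", "content": "4"},
    ]]
    print(json.dumps(to_label_studio_tasks(sample), ensure_ascii=False, indent=2))
```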

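`visualize` - Visualize evaluation results

Generates stacked bar charts and radar charts from CSV files.

elmes visualize ./generated --x-rotation 30

Arguments:

input_dir: directory containing the CSV files
--x-rotation: rotation angle of the X-axis labels (default: 30)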
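A radar chart places each evaluation dimension on its own spoke. This stdlib-only sketch computes the polygon vertices for one model: dimension k of n sits at angle 2π·k/n, the radius is the score divided by a maximum score, and the first vertex is repeated to close the outline. The function name `radar_vertices` and the normalization by `max_score` are assumptions; ELMES's own plotting is built on Matplotlib and may differ.

```python
import math


def radar_vertices(scores: list[float], max_score: float) -> list[tuple[float, float]]:
    """Return the closed (x, y) outline of one model's radar polygon.

    Dimension k of n sits on the spoke at angle 2*pi*k/n, and its radius
    is score / max_score, so a perfect score touches the unit circle.
    The first vertex is appended again so the outline closes when drawn.
    """
    n = len(scores)
    vertices = []
    for k, score in enumerate(scores):
        angle = 2 * math.pi * k / n
        radius = score / max_score
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices + vertices[:1]


if __name__ == "__main__":
    # Four hypothetical dimensions scored out of 5.
    for x, y in radar_vertices([4.0, 3.5, 5.0, 2.0], max_score=5.0):
        print(f"({x:+.2f}, {y:+.2f})")
```

With Matplotlib, the usual approach is to pass the angles and raw scores to an Axes created with `projection="polar"` rather than converting to x/y coordinates by hand.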

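`draw` - Draw the workflow graph

Generates an agent workflow graph from the configuration.

elmes draw --config config.yaml --output workflow.png

Options:

--config: path to the configuration file
--output: output file path (.png or .mmd)
--print: print the Mermaid code to the console
--direction: graph direction (TB/LR/RL/BT, default: LR)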
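As a rough illustration of the kind of Mermaid code `--print` emits, this sketch turns plain `A -> B` direction rules into Mermaid flowchart syntax using the same TB/LR/RL/BT direction keywords. `directions_to_mermaid` is a hypothetical helper, not ELMES's renderer: router rules such as `router:any_keyword_router(...)` are skipped rather than expanded into their branches.

```python
def directions_to_mermaid(directions: list[str], direction: str = "LR") -> str:
    """Render plain "A -> B" rules as Mermaid flowchart source.

    Router rules ("A -> router:...") are skipped, not expanded into branches.
    """
    if direction not in {"TB", "LR", "RL", "BT"}:
        raise ValueError(f"unsupported direction: {direction}")
    lines = [f"flowchart {direction}"]
    for rule in directions:
        # Split on the first "->" only and trim whitespace around node names.
        source, _, target = (part.strip() for part in rule.partition("->"))
        if not target or target.startswith("router:"):
            continue
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(directions_to_mermaid(["START -> teacher", "teacher -> END"]))
```

For the Quick Start directions this prints `flowchart LR` followed by the edges `START --> teacher` and `teacher --> END`. Keeping the node name as uppercase `END` also sidesteps Mermaid's reserved lowercase `end` keyword.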

`hash` - Compute the configuration hash

Computes the MD5 hash of the configuration file, which determines the name of the results subdirectory.

elmes hash --config config.yaml
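A minimal equivalent using Python's `hashlib` is sketched below. Whether `elmes hash` digests the raw file bytes, as assumed here, or a normalized form of the parsed configuration is not stated above, so this digest may not match the subdirectory name ELMES produces. `config_md5` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def config_md5(path: str) -> str:
    """Return the hex MD5 digest of the config file's raw bytes (assumed input)."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


if __name__ == "__main__":
    if Path("config.yaml").exists():
        print(config_md5("config.yaml"))
```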

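Configuration Reference

Global configuration (globals)

globals:
  concurrency: 16 # number of concurrent tasks
  recursion_limit: 3 # maximum number of recursive calls
  output_dir: "./generated" # output directory for results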
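One common way to honour a setting like `concurrency: 16` is an `asyncio.Semaphore` that caps how many tasks run at once. The sketch below assumes asyncio-based execution (plausible given pydantic-ai's async API, but not confirmed here) and replaces the model call with `asyncio.sleep`; `run_with_limit` is a hypothetical name.

```python
import asyncio


async def run_with_limit(prompts: list[str], concurrency: int = 16) -> tuple[list[str], int]:
    """Process every prompt while keeping at most `concurrency` running at once.

    Returns the results (in input order) and the peak number of tasks that
    ran simultaneously, to show that the limit holds.
    """
    semaphore = asyncio.Semaphore(concurrency)
    running = 0
    peak = 0

    async def handle(prompt: str) -> str:
        nonlocal running, peak
        async with semaphore:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for a model call (hypothetical)
            running -= 1
            return f"done: {prompt}"

    # gather preserves input order in its results.
    results = await asyncio.gather(*(handle(p) for p in prompts))
    return list(results), peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_with_limit([f"task {i}" for i in range(40)], concurrency=16))
    print(len(results), peak)
```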

Model configuration (models)

models:
  model_alias:
    type: openai
    api_key: <API_KEY>
    base_url: <BASE_URL>
    model: gpt-4o
    kargs:
      temperature: 0.7

Agent configuration (agents)

agents:
  agent_name:
    model: model_alias
    system_prompt: "Prompt text"
    memory:
      enable: true
      keep_turns: 3
    tools:
      - calculator # MCP tool name
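The `memory.keep_turns` setting suggests a sliding window over the conversation. This sketch assumes a turn is one `user` message followed by one `assistant` reply and that system messages are always retained; both assumptions, and the helper name `trim_history`, are illustrative rather than taken from ELMES.

```python
def trim_history(messages: list[dict[str, str]], keep_turns: int) -> list[dict[str, str]]:
    """Keep system messages plus the last `keep_turns` user/assistant turns.

    Hypothetical rule: one turn = one "user" message + one "assistant" reply,
    so the window is the last 2 * keep_turns non-system messages.
    """
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    if keep_turns <= 0:
        return system  # guard: dialogue[-0:] would return the whole list
    return system + dialogue[-2 * keep_turns:]
```

With `keep_turns: 3`, a conversation of five completed turns would be cut down to the system prompt plus the three most recent turns.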

Routing configuration (directions)

directions:
  - START -> teacher
  - teacher -> router:any_keyword_router(keywords=["<end>"], exists_to=END, else_to="student")
  - student -> teacher

Supported routers:

any_keyword_router: keyword-matching router
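Judging from its parameters in the directions example, `any_keyword_router` sends control to `exists_to` when any keyword appears in the agent's output and to `else_to` otherwise. A minimal sketch of that decision, assuming plain case-sensitive substring matching (the function signature here is hypothetical):

```python
END = "END"  # hypothetical stand-in for the graph's terminal node


def any_keyword_router(message: str, keywords: list[str], exists_to: str, else_to: str) -> str:
    """Pick the next node: `exists_to` if any keyword occurs in `message`, else `else_to`."""
    # Plain, case-sensitive substring test (an assumption about the matching rule).
    if any(keyword in message for keyword in keywords):
        return exists_to
    return else_to


if __name__ == "__main__":
    print(any_keyword_router("Well done, see you next time! <end>", ["<end>"], END, "student"))  # END
    print(any_keyword_router("Can you try the next step?", ["<end>"], END, "student"))  # student
```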

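Task configuration (tasks)

tasks:
  start_prompt: "Initial prompt {variable}"
  mode: union # or iter
  content:
    variable:
      - "value 1"
      - "value 2"

union mode: generates one task for every combination of all field values
iter mode: iterates through the content entry by entry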
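The two modes can be sketched with `itertools.product` (union) and `zip` (iter). Reading iter mode as pairing the i-th value of every field is an assumption; with a single field, as in the Quick Start example, both modes yield the same tasks. `expand_tasks` is a hypothetical helper.

```python
from itertools import product


def expand_tasks(start_prompt: str, content: dict[str, list[str]], mode: str) -> list[str]:
    """Fill `start_prompt` once per generated combination of field values.

    union: Cartesian product of all field values.
    iter:  the i-th values of every field are used together (zip), which
           stops at the shortest list; this reading is an assumption.
    """
    fields = list(content)
    if mode == "union":
        rows = product(*(content[f] for f in fields))
    elif mode == "iter":
        rows = zip(*(content[f] for f in fields))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [start_prompt.format(**dict(zip(fields, row))) for row in rows]


if __name__ == "__main__":
    topics = {"topic": ["Mathematics", "Physics"]}
    print(expand_tasks("Teaching topic: {topic}", topics, "union"))
```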

Evaluation configuration (evaluation)

evaluation:
  name: eval_name
  judge_model: model_alias
  target: target_name
  fields:
    - name: dimension_name
      rubric: Description of the scoring rubric
      reason: true # whether to generate a scoring rationale

MCP configuration (mcps)

mcps:
  tool_name:
    type: stdio # stdio / http-with-sse / streamable-http
    command: "python"
    args: ["script.py"]
    timeout: 30
    env:
      KEY: "value"
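For a `type: stdio` entry, `command`, `args`, and `env` describe a local process that the MCP client starts and then talks to over stdin/stdout. The sketch below only shows how those fields could map onto a process launch: layering `env` over the current environment is an assumption, `timeout` is left out because its exact meaning (per request or per connection) is not specified above, and `stdio_process_args` is a hypothetical name.

```python
import os
import subprocess
import sys


def stdio_process_args(spec: dict) -> dict:
    """Map a `type: stdio` MCP entry onto subprocess keyword arguments.

    Assumption: `env` is layered over the current environment rather than
    replacing it. `timeout` is deliberately not mapped here.
    """
    if spec.get("type") != "stdio":
        raise ValueError("only stdio entries describe a local process")
    return {
        "args": [spec["command"], *spec.get("args", [])],
        "env": {**os.environ, **spec.get("env", {})},
    }


if __name__ == "__main__":
    spec = {
        "type": "stdio",
        "command": sys.executable,  # stands in for "python"
        "args": ["-c", "import os; print(os.environ['KEY'])"],
        "timeout": 30,
        "env": {"KEY": "value"},
    }
    # A real MCP server keeps running and exchanges JSON-RPC over stdin/stdout;
    # this one-shot child process only shows how command/args/env are used.
    done = subprocess.run(capture_output=True, text=True, **stdio_process_args(spec))
    print(done.stdout.strip())  # value
```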

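Project Structure

elmes/
├── src/elmes/
│   ├── cli/           # CLI command implementations
│   │   ├── generate/  # dialogue generation
│   │   ├── eval/      # dialogue evaluation
│   │   ├── export/    # data export
│   │   ├── visualize/ # visualization
│   │   ├── draw/      # workflow drawing
│   │   └── hash_/     # hash computation
│   ├── config/        # configuration models (Pydantic)
│   ├── graph/         # graph workflow implementation
│   ├── agent/         # agent builders
│   ├── model/         # model providers
│   └── mcp/           # MCP server integration
├── example/           # example configurations
├── tests/             # tests
└── docs/              # documentation and assets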

Project details

Release history

1.0.0 (this version): Mar 15, 2026
0.1.13: Jul 28, 2025
0.1.12: Jul 25, 2025
0.1.11: Jul 4, 2025
0.1.10: Jun 23, 2025
0.1.9: Jun 20, 2025
0.1.8: Jun 18, 2025
0.1.7: Jun 18, 2025

Download files

Source Distribution

elmes-1.0.0.tar.gz (12.0 MB)

Uploaded Mar 15, 2026 Source

Built Distribution

elmes-1.0.0-py3-none-any.whl (11.7 MB)

Uploaded Mar 15, 2026 Python 3

File details

Details for the file elmes-1.0.0.tar.gz.

File metadata

Download URL: elmes-1.0.0.tar.gz
Upload date: Mar 15, 2026
Size: 12.0 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for elmes-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`d3d9ea2b3b6aa78213643cda386a9c61c11a781ba8e72b9daf717964a2b03f34`
MD5	`23327d5035867785788a079ca87743ed`
BLAKE2b-256	`f366b7f586210fbbda2f446b587a89fb157b5d84b49689105c2d871d09c871d1`


File details

Details for the file elmes-1.0.0-py3-none-any.whl.

File metadata

Download URL: elmes-1.0.0-py3-none-any.whl
Upload date: Mar 15, 2026
Size: 11.7 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for elmes-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6820f579b5d2dfadf8ac51e2dad37f1f7f0f19766e98755ceef6c5ae44dadc48`
MD5	`5ee28b988ccc7380df25c2c419e9da55`
BLAKE2b-256	`f6ad24e39fe61b8c01e1489e971e3468658841bfa5f853cc4d44dd2a209801da`
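To check a downloaded file against the SHA256 digests listed above, stream it through `hashlib.sha256`. The helper name `sha256_of` is illustrative. pip can also enforce digests itself through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
import os

# SHA256 digests copied from the File hashes tables above.
EXPECTED = {
    "elmes-1.0.0.tar.gz": "d3d9ea2b3b6aa78213643cda386a9c61c11a781ba8e72b9daf717964a2b03f34",
    "elmes-1.0.0-py3-none-any.whl": "6820f579b5d2dfadf8ac51e2dad37f1f7f0f19766e98755ceef6c5ae44dadc48",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    for name, expected in EXPECTED.items():
        if os.path.exists(name):  # only check files present in the working directory
            print(name, "OK" if sha256_of(name) == expected else "MISMATCH")
```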

