Lightweight data annotation toolkit - generate standalone HTML labeling interfaces
Project description
DataLabel
Serverless Human-in-the-Loop Annotation Framework
with LLM Pre-Labeling and Inter-Annotator Agreement Analysis
零服务器人机协同标注框架 — LLM 预标注 · 多标注者一致性分析 · 离线 HTML 界面
Zero-dependency annotation tool with LLM-assisted pre-labeling, multi-annotator agreement metrics, and offline HTML interface
Abstract · Problem Statement · Formal Framework · Architecture · Key Innovations · Quick Start · Annotation Types · LLM Assisted Annotation · MCP Server · Ecosystem · References
Abstract
高质量标注数据是监督学习和 RLHF 的基础,但现有标注工具面临两个矛盾:重量级平台(Label Studio / Prodigy)部署运维成本高,轻量工具则缺少质量保证机制。DataLabel 提出零服务器标注范式 (serverless annotation paradigm)——生成独立 HTML 文件,浏览器直接打开即可标注,无需服务器、无需网络。
系统实现「Schema 定义 → LLM 预标注 → 人工校准 → 质量分析 → 一致性评估 → 冲突裁决」的完整标注管线。通过 LLM 预标注 (pre-labeling) 加速标注启动,通过标注者间一致性分析 (Inter-Annotator Agreement, IAA) 量化标注质量,通过多策略合并与冲突裁决 (adjudication) 产出高质量最终标签。
DataLabel implements a serverless annotation framework that generates self-contained HTML files for offline labeling. The system provides LLM-assisted pre-labeling (Kimi / OpenAI / Anthropic), 5 annotation types (scoring, single/multi-choice, text, ranking), multi-annotator result merging with 3 strategies, and rigorous IAA metrics (Cohen's $\kappa$, Fleiss' $\kappa$, Krippendorff's $\alpha$). Exposes 12 MCP tools, 6 resources, and 3 prompt templates for AI IDE integration.
Problem Statement
数据标注领域面临三个结构性问题:
| 根本性问题 | 形式化定义 | 现有工具局限 | DataLabel 的方法 |
|---|---|---|---|
| 部署壁垒 Deployment Barrier |
标注平台需要服务器、数据库、网络环境,部署成本 $\gg$ 标注本身 | Label Studio 需 Docker + PostgreSQL;Prodigy 需 Python 运行时 | 零服务器:生成独立 HTML,浏览器直接打开,离线可用 |
| 冷启动延迟 Cold Start Latency |
标注员从零开始标注,前期效率低,标注指南编写耗时 | 无预标注能力,标注指南靠人工编写 | LLM 预标注 + 自动指南生成,人工从"校准"而非"从零标注"开始 |
| 质量黑箱 Quality Opacity |
标注质量缺少量化指标,分歧处理凭经验 $\implies$ 一致性 $\kappa$ 未知 | 基础工具不提供 IAA,或仅支持两人比较 | 多指标一致性分析(Cohen's / Fleiss' $\kappa$ / Krippendorff's $\alpha$)+ LLM 分歧分析 + 可视化仪表盘 |
DataLabel 不是另一个标注平台。它是标注数据的生产工具——从 Schema 定义到最终标签的完整管线,零运维成本,质量可量化。
Formal Framework
Annotation Model
标注任务形式化为四元组 $\mathcal{L} = \langle X, Y, A, \phi \rangle$:
| 符号 | 定义 | 说明 |
|---|---|---|
| $X = {x_1, \ldots, x_n}$ | 待标注样本集 | 由 Schema 定义字段结构 |
| $Y$ | 标签空间 | $Y \in {\mathbb{R}, C, 2^C, \Sigma^*, S_k}$,对应 5 种标注类型 |
| $A = {a_1, \ldots, a_m}$ | 标注者集合 | 人工标注者 + LLM 预标注者 |
| $\phi: X \times A \to Y$ | 标注函数 | $\phi(x, a)$ 为标注者 $a$ 对样本 $x$ 的标签 |
五种标签空间对应五种标注类型:评分 ($\mathbb{R}$)、单选 ($C$)、多选 ($2^C$)、文本 ($\Sigma^*$)、排序 ($S_k$,$k$ 元素的全排列)。
Inter-Annotator Agreement (IAA)
Cohen's Kappa(两标注者):
$$\kappa = \frac{p_o - p_e}{1 - p_e}$$
其中 $p_o$ 为观测一致率,$p_e$ 为随机一致率。$\kappa > 0.8$ 表示高度一致,$\kappa < 0.4$ 表示标注指南需修订。
Fleiss' Kappa(多标注者名义变量):
$$\kappa_F = \frac{\bar{P} - \bar{P}e}{1 - \bar{P}e}, \quad \bar{P} = \frac{1}{N} \sum_i P_i, \quad P_i = \frac{1}{n(n-1)} \sum_k n{ik}(n{ik} - 1)$$
Krippendorff's Alpha(支持缺失数据、多类型量表):
$$\alpha = 1 - \frac{D_o}{D_e}$$
其中 $D_o$ 为观测不一致度,$D_e$ 为期望不一致度。
Merging Strategies
多标注者结果合并支持三种策略:
| 策略 | 评分 | 单选 | 多选 | 排序 |
|---|---|---|---|---|
| Majority | 众数 | 众数 | 交集/并集 | Borda 计数 |
| Average | 算术平均 | 众数 | 交集/并集 | Borda 计数 |
| Strict | 全一致 | 全一致 | 全一致 | 全一致 |
Strict 模式下未达一致的任务自动标记为 needs_review,进入冲突裁决流程。
Architecture
graph LR
S["Schema<br/>Definition"] --> P["LLM Pre-Label<br/>(Optional)"]
P --> G["HTML Generator<br/>Self-Contained"]
G --> B["Browser<br/>Offline Annotation"]
B --> R["Results<br/>JSON/JSONL/CSV"]
R --> Q["Quality<br/>LLM Analysis"]
R --> M["Merge<br/>3 Strategies"]
M --> IAA["IAA Metrics<br/>κ / α"]
M --> D["Dashboard<br/>HTML Report"]
style G fill:#0969da,color:#fff,stroke:#0969da
style M fill:#8b5cf6,color:#fff,stroke:#8b5cf6
style IAA fill:#2da44e,color:#fff,stroke:#2da44e
style D fill:#e5534b,color:#fff,stroke:#e5534b
style S fill:#1a1a2e,color:#e0e0e0,stroke:#444
style P fill:#1a1a2e,color:#e0e0e0,stroke:#444
style B fill:#1a1a2e,color:#e0e0e0,stroke:#444
style R fill:#1a1a2e,color:#e0e0e0,stroke:#444
style Q fill:#1a1a2e,color:#e0e0e0,stroke:#444
Annotation Pipeline
| 步骤 | 命令 | 产出 |
|---|---|---|
| 1. 生成指南 | knowlyr-datalabel gen-guidelines schema.json |
guide.md (可选) |
| 2. LLM 预标注 | knowlyr-datalabel prelabel schema.json tasks.json |
pre.json (可选) |
| 3. 生成界面 | knowlyr-datalabel create schema.json tasks.json |
annotator.html |
| 4. 分发标注 | 发送 HTML 给标注员 | 浏览器中完成标注 |
| 5. 收集结果 | 标注员导出 JSON/JSONL/CSV | results_*.json |
| 6. 质量分析 | knowlyr-datalabel quality schema.json results_*.json |
report.json (可选) |
| 7. 合并分析 | knowlyr-datalabel merge results_*.json |
merged.json + IAA |
| 8. 进度仪表盘 | knowlyr-datalabel dashboard results_*.json |
dashboard.html |
Key Innovations
1. Serverless Annotation Architecture
生成的 HTML 包含所有样式、逻辑和数据——无需服务器、无需网络、无需安装。标注数据保存在 localStorage,支持断点续标。
- 零依赖部署:发送 HTML 文件即完成分发
- 离线可用:飞机模式、内网环境均可使用
- 暗黑模式:一键切换,跟随系统偏好
- 快捷键:
←→导航、数字键评分/选择、Ctrl+Z撤销 - 大数据集:任务侧边栏 + 分页 (25/50/100/200) + 搜索过滤,支持 1000+ 任务
2. LLM-Assisted Pre-Labeling and Quality Analysis
LLM 介入标注管线的三个环节,从"人工从零标注"转变为"人工校准 LLM 预标":
| 环节 | 功能 | 效果 |
|---|---|---|
| 预标注 | LLM 批量生成初始标签 | 标注员从"校准"开始,效率提升 |
| 质量分析 | 检测可疑标注、分析分歧原因 | 量化质量问题,指导修订 |
| 指南生成 | 根据 Schema 和样例生成标注指南 | 消除指南编写的冷启动 |
支持三个 LLM Provider:
| Provider | 环境变量 | 默认模型 |
|---|---|---|
| Moonshot (Kimi) | MOONSHOT_API_KEY |
moonshot-v1-8k |
| OpenAI | OPENAI_API_KEY |
gpt-4o-mini |
| Anthropic | ANTHROPIC_API_KEY |
claude-sonnet-4-20250514 |
3. Multi-Metric Inter-Annotator Agreement
三种 IAA 指标覆盖不同场景:
| 指标 | 适用场景 | 范围 |
|---|---|---|
| Cohen's $\kappa$ | 两标注者 | $[-1, 1]$ |
| Fleiss' $\kappa$ | 多标注者、名义变量 | $[-1, 1]$ |
| Krippendorff's $\alpha$ | 多标注者、支持缺失数据 | $[-1, 1]$ |
输出两两一致矩阵 + 总体一致性 + 分歧任务列表。一致率 <40% 时建议回顾标注指南。
knowlyr-datalabel iaa ann1.json ann2.json ann3.json
4. Multi-Strategy Result Merging
三种合并策略适配不同质量要求:majority(通用)、average(连续评分)、strict(高质量要求,未一致标记 needs_review)。
支持 Borda 计数法合并排序标注,交集/并集策略合并多选标注。
5. Visual Analytics Dashboard
从标注结果生成独立 HTML 仪表盘(同样零依赖、离线可用):
| 区块 | 内容 |
|---|---|
| 概览卡片 | 总任务数、标注员数、平均完成率、一致率 |
| 标注员进度 | 每位标注员的完成进度条 |
| 标注值分布 | SVG 柱状图展示标注值分布 |
| 一致性热力图 | Cohen's $\kappa$ 两两矩阵 + Fleiss' $\kappa$ + Krippendorff's $\alpha$ |
| 标注分歧表 | 存在分歧的任务列表(支持搜索过滤) |
| 时间分析 | 按天统计标注量趋势图 |
6. Five Annotation Types
通过 Schema 中的 annotation_config 配置,覆盖主流标注场景:
| 类型 | 标签空间 | 适用场景 |
|---|---|---|
| Scoring | $\mathbb{R}$ | 质量评分、相关性打分 |
| Single Choice | $C$ | 情感分类、意图识别 |
| Multi Choice | $2^C$ | 多标签分类、属性标注 |
| Text | $\Sigma^*$ | 翻译、纠错、改写 |
| Ranking | $S_k$ | 偏好排序、RLHF 比较 |
7. DataRecipe Integration
直接从 DataRecipe 分析结果生成标注界面——自动推断 Schema、提取样例、生成任务:
knowlyr-datalabel generate ./analysis_output/my_dataset/
8. Conflict Adjudication
冲突裁决工具提供三种策略:多数投票、专家优先、最长回答,通过 MCP adjudicate 工具或 CLI 调用。
Quick Start
pip install knowlyr-datalabel
可选依赖
pip install knowlyr-datalabel[mcp] # MCP 服务器
pip install knowlyr-datalabel[llm] # LLM 分析 (Kimi/OpenAI)
pip install knowlyr-datalabel[llm-all] # LLM 分析 (含 Anthropic)
pip install knowlyr-datalabel[all] # 全部功能
# 1. 创建标注界面
knowlyr-datalabel create schema.json tasks.json -o annotator.html
# 2. LLM 预标注(可选)
knowlyr-datalabel prelabel schema.json tasks.json -o pre.json -p moonshot
# 3. 浏览器打开 annotator.html,完成标注,导出结果
# 4. 合并多标注员结果 + IAA 分析
knowlyr-datalabel merge ann1.json ann2.json ann3.json -o merged.json
# 5. 生成进度仪表盘
knowlyr-datalabel dashboard ann1.json ann2.json -o dashboard.html
Schema 格式示例
{
"project_name": "我的标注项目",
"fields": [
{"name": "instruction", "display_name": "指令", "type": "text"},
{"name": "response", "display_name": "回复", "type": "text"}
],
"scoring_rubric": [
{"score": 1, "label": "优秀", "description": "回答完整准确"},
{"score": 0.5, "label": "一般", "description": "回答基本正确"},
{"score": 0, "label": "差", "description": "回答错误或离题"}
]
}
Python SDK
from datalabel import AnnotatorGenerator, ResultMerger
# 生成标注界面
gen = AnnotatorGenerator()
gen.generate(schema=schema, tasks=tasks, output_path="annotator.html")
# 合并标注结果
merger = ResultMerger()
result = merger.merge(["ann1.json", "ann2.json"], output_path="merged.json", strategy="majority")
print(f"一致率: {result.agreement_rate:.1%}")
# 计算 IAA
metrics = merger.calculate_iaa(["ann1.json", "ann2.json", "ann3.json"])
print(f"Fleiss' κ: {metrics['fleiss_kappa']:.3f}")
print(f"Krippendorff's α: {metrics['krippendorff_alpha']:.3f}")
Annotation Types
5 种标注类型配置详情
1. Scoring (默认)
{
"scoring_rubric": [
{"score": 1, "description": "优秀"},
{"score": 0.5, "description": "一般"},
{"score": 0, "description": "差"}
]
}
2. Single Choice
{
"annotation_config": {
"type": "single_choice",
"options": [
{"value": "positive", "label": "正面"},
{"value": "negative", "label": "负面"},
{"value": "neutral", "label": "中性"}
]
}
}
3. Multi Choice
{
"annotation_config": {
"type": "multi_choice",
"options": [
{"value": "informative", "label": "信息丰富"},
{"value": "accurate", "label": "准确"},
{"value": "fluent", "label": "流畅"}
]
}
}
4. Text
{
"annotation_config": {
"type": "text",
"placeholder": "请输入翻译...",
"max_length": 500
}
}
5. Ranking
{
"annotation_config": {
"type": "ranking",
"options": [
{"value": "a", "label": "结果A"},
{"value": "b", "label": "结果B"},
{"value": "c", "label": "结果C"}
]
}
}
LLM Assisted Annotation
Pre-Labeling
# Kimi 预标注
knowlyr-datalabel prelabel schema.json tasks.json -o pre.json -p moonshot
# OpenAI 预标注
knowlyr-datalabel prelabel schema.json tasks.json -o pre.json -p openai
# 指定模型和批大小
knowlyr-datalabel prelabel schema.json tasks.json -o pre.json -p moonshot -m kimi-k2 --batch-size 10
Quality Analysis
# 单标注员质量检查
knowlyr-datalabel quality schema.json results.json -o report.json -p moonshot
# 多标注员分歧分析
knowlyr-datalabel quality schema.json ann1.json ann2.json -o report.json
Guidelines Generation
knowlyr-datalabel gen-guidelines schema.json -t tasks.json -o guidelines.md -l zh
MCP Server
{
"mcpServers": {
"knowlyr-datalabel": {
"command": "uv",
"args": ["--directory", "/path/to/data-label", "run", "python", "-m", "datalabel.mcp_server"]
}
}
}
Tools (12)
| Tool | Description |
|---|---|
generate_annotator |
从 DataRecipe 分析结果生成标注界面 |
create_annotator |
从 Schema 和任务创建标注界面 |
merge_annotations |
合并多个标注结果 |
calculate_iaa |
计算标注员间一致性 |
validate_schema |
验证 Schema 和任务数据格式 |
export_results |
导出为 JSON/JSONL/CSV |
import_tasks |
导入任务数据 |
generate_dashboard |
生成进度仪表盘 HTML |
llm_prelabel |
LLM 自动预标注 |
llm_quality_analysis |
LLM 标注质量分析 |
llm_gen_guidelines |
LLM 标注指南生成 |
adjudicate |
冲突裁决 |
Resources (6) · Prompts (3)
| Resources | Prompts |
|---|---|
datalabel://schemas/{type} — 5 种 Schema 模板 |
create-annotation-schema — 引导生成 Schema |
datalabel://reference/annotation-types — 标注类型说明 |
review-annotations — 分析标注质量 |
annotation-workflow — 完整工作流引导 |
CLI Reference
完整命令列表
| 命令 | 功能 |
|---|---|
knowlyr-datalabel create <schema> <tasks> -o <out> |
创建标注界面 |
knowlyr-datalabel create ... --page-size 100 |
自定义分页 |
knowlyr-datalabel create ... -g guidelines.md |
附带标注指南 |
knowlyr-datalabel generate <dir> |
从 DataRecipe 结果生成 |
knowlyr-datalabel merge <files...> -o <out> |
合并标注结果 |
knowlyr-datalabel merge ... -s majority|average|strict |
指定合并策略 |
knowlyr-datalabel iaa <files...> |
计算标注一致性 |
knowlyr-datalabel dashboard <files...> -o <out> |
生成仪表盘 |
knowlyr-datalabel validate <schema> [-t tasks] |
验证格式 |
knowlyr-datalabel export <file> -o <out> -f json|jsonl|csv |
导出转换 |
knowlyr-datalabel import-tasks <file> -o <out> |
导入任务 |
knowlyr-datalabel prelabel <schema> <tasks> -o <out> |
LLM 预标注 |
knowlyr-datalabel quality <schema> <results...> |
LLM 质量分析 |
knowlyr-datalabel gen-guidelines <schema> -o <out> |
LLM 指南生成 |
Docker
docker build -t knowlyr-datalabel .
# 创建标注界面
docker run --rm -v $(pwd):/data knowlyr-datalabel \
create schema.json tasks.json -o annotator.html
# 合并标注结果
docker run --rm -v $(pwd):/data knowlyr-datalabel \
merge ann1.json ann2.json -o merged.json
Ecosystem
Architecture Diagram
graph LR
Radar["Radar<br/>Discovery"] --> Recipe["Recipe<br/>Analysis"]
Recipe --> Synth["Synth<br/>Generation"]
Recipe --> Label["Label<br/>Annotation"]
Synth --> Check["Check<br/>Quality"]
Label --> Check
Check --> Audit["Audit<br/>Model Audit"]
Crew["Crew<br/>Deliberation Engine"]
Agent["Agent<br/>RL Framework"]
ID["ID<br/>Identity Runtime"]
Crew -.->|能力定义| ID
ID -.->|身份 + 记忆| Crew
Crew -.->|轨迹 + 奖励| Agent
Agent -.->|优化策略| Crew
style Label fill:#0969da,color:#fff,stroke:#0969da
style Crew fill:#2da44e,color:#fff,stroke:#2da44e
style Agent fill:#8b5cf6,color:#fff,stroke:#8b5cf6
style ID fill:#e5534b,color:#fff,stroke:#e5534b
style Radar fill:#1a1a2e,color:#e0e0e0,stroke:#444
style Recipe fill:#1a1a2e,color:#e0e0e0,stroke:#444
style Synth fill:#1a1a2e,color:#e0e0e0,stroke:#444
style Check fill:#1a1a2e,color:#e0e0e0,stroke:#444
style Audit fill:#1a1a2e,color:#e0e0e0,stroke:#444
| Layer | Project | Description | Repo |
|---|---|---|---|
| Discovery | AI Dataset Radar | 数据集竞争情报、趋势分析 | GitHub |
| Analysis | DataRecipe | 逆向分析、Schema 提取、成本估算 | GitHub |
| Production | DataSynth | LLM 批量合成 | GitHub |
| Production | DataLabel | 零服务器标注 · LLM 预标注 · IAA 分析 | You are here |
| Quality | DataCheck | 规则验证、重复检测、分布分析 | GitHub |
| Audit | ModelAudit | 蒸馏检测、模型指纹 | GitHub |
| Identity | knowlyr-id | 身份系统 + AI 员工运行时 | GitHub |
| Deliberation | Crew | 对抗式多智能体协商 · 持久记忆进化 · MCP 原生 | GitHub |
| Agent Training | knowlyr-agent | Gymnasium 风格 RL 框架 · 过程奖励模型 · SFT/DPO/GRPO | GitHub |
Development
git clone https://github.com/liuxiaotong/data-label.git
cd data-label
pip install -e ".[all,dev]"
pytest # 296 test cases
CI: GitHub Actions,Python 3.10+,Codecov 覆盖率。Tag push 自动发布 PyPI + GitHub Release。
References
- Inter-Annotator Agreement — Artstein, R. & Poesio, M., 2008. Inter-Coder Agreement for Computational Linguistics. Computational Linguistics — IAA 指标的系统性综述
- Cohen's Kappa — Cohen, J., 1960. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement — 两标注者一致性度量
- Fleiss' Kappa — Fleiss, J.L., 1971. Measuring Nominal Scale Agreement Among Many Raters. Psychological Bulletin — 多标注者一致性推广
- Krippendorff's Alpha — Krippendorff, K., 2011. Computing Krippendorff's Alpha-Reliability. — 支持缺失数据的通用一致性度量
- Active Learning — Settles, B., 2009. Active Learning Literature Survey. CS Technical Report, University of Wisconsin-Madison — 主动学习选择策略
- RLHF — Christiano, P. et al., 2017. Deep RL from Human Preferences. arXiv:1706.03741 — 人类偏好标注驱动的强化学习
License
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file knowlyr_datalabel-0.3.1.tar.gz.
File metadata
- Download URL: knowlyr_datalabel-0.3.1.tar.gz
- Upload date:
- Size: 144.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7e67bd95d5f11146e397333a109ac50babff6166a7c15b53e3e51d002a72bc7a
|
|
| MD5 |
97bbe528e42ec1fe0fc22ba00d848eb3
|
|
| BLAKE2b-256 |
23afe500a55bbafdc859f66fe1f7b8efb28beb544ad8921748bc8341bb3a527c
|
File details
Details for the file knowlyr_datalabel-0.3.1-py3-none-any.whl.
File metadata
- Download URL: knowlyr_datalabel-0.3.1-py3-none-any.whl
- Upload date:
- Size: 68.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1e9f928334a2575f0844e21bef163bbd939d335bfab2fa9bac0c920fbe9c5d4a
|
|
| MD5 |
d290817ff77189b7fd66d6a078f1762e
|
|
| BLAKE2b-256 |
c578841d564baaac7bdaae3f199b19b911ec7fdb6a202a92b78075c86f5b8695
|