LLM distillation detection and model fingerprint audit tool - text source detection, model identity verification, and distillation analysis
ModelAudit
LLM Distillation Detection and Model Fingerprinting
via Statistical Forensics
LLM distillation detection and model fingerprint auditing — statistical forensics · behavioral signatures · cross-model lineage inference
Detect unauthorized model distillation through behavioral probing, stylistic fingerprinting, and representation similarity analysis
Abstract · Problem Statement · Formal Framework · Architecture · Key Innovations · Quick Start · Detection Methods · MCP Server · Ecosystem · References
Abstract
Knowledge distillation of large language models has become a central threat to model intellectual property: by imitating a teacher model's output distribution, a student model can replicate its capabilities without authorization. Existing detection methods either require white-box weight access (usually unavailable in practice) or analyze only surface text features (easy to evade).
ModelAudit implements a multi-method distillation detection framework based on statistical forensics. The system extracts model fingerprints $\mathcal{F}(M)$ through 20 behavioral probes, applies hypothesis testing $H_0: M_S \perp M_T$ to determine distillation relationships, and combines 4 complementary methods — LLMmap (behavioral probing), DLI (lineage inference via Jensen-Shannon divergence), REEF (CKA representation similarity), and StyleAnalysis (12-family stylistic signatures) — to form a complete black-box to white-box audit chain. The built-in benchmark achieves 100% detection accuracy across 6 model families (14 samples).
Problem Statement
Distillation detection faces three fundamental challenges:
| Fundamental challenge | Formal definition | Limitation of existing methods | ModelAudit's approach |
|---|---|---|---|
| Distillation opacity | The distillation process $M_S \leftarrow \text{KD}(M_T)$ is invisible to outside observers; only the input-output behavior of $M_T$ and $M_S$ is observable | Depend on white-box weight access (inapplicable to API-only models) | Behavioral probe fingerprints: 20 probe prompts extract observable behavioral features; works fully black-box |
| Stylistic convergence | RLHF alignment drives different models toward similar output styles, $\text{style}(M_i) \approx \text{style}(M_j)$ | Simple text features (length, word frequency) lack discriminative power | Multi-dimensional signatures: 10 dimensions (self-knowledge, safety boundaries, injection tests, format control, ...) capture deeper behavioral differences |
| Cross-model incomparability | API formats, parameters, and behavioral conventions differ across providers | No single method covers all settings; black-box and white-box approaches are siloed | Four-method fusion: LLMmap + DLI + REEF + StyleAnalysis covers everything from behavior to representations |
ModelAudit is not a general-purpose model evaluation tool. It focuses on a single question: has this model copied another model's capabilities without authorization? It answers with quantifiable audit conclusions derived from behavioral fingerprint extraction and statistical tests.
Formal Framework
Model Fingerprint Extraction
A model fingerprint is defined as the distribution of behavioral responses over a probe set:
$$\mathcal{F}(M) = \{\, p_M(y \mid x_i) \,\}_{i=1}^{N}$$
where $\{x_i\}_{i=1}^{N}$ are $N = 20$ probe prompts covering 10 dimensions: self-knowledge, safety boundaries, injection tests, reasoning, creativity, multilinguality, format control, role-play, code generation, and summarization. Each probe response $y$ is mapped to a feature vector $\phi(y) \in \mathbb{R}^d$.
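As an illustration, a minimal sketch of a feature map $\phi$; the feature choices here (verbosity, refusal markers, JSON validity, structural density) are hypothetical examples, not ModelAudit's actual extractor:

```python
import json
import re

def phi(response: str) -> list[float]:
    """Map one probe response to a small numeric feature vector (illustrative)."""
    def is_json(s: str) -> bool:
        try:
            json.loads(s)
            return True
        except ValueError:
            return False

    words = response.split()
    return [
        float(len(words)),                                                # verbosity
        float(bool(re.search(r"\bI (can't|cannot|won't)\b", response))),  # refusal marker
        float(is_json(response.strip())),                                 # format compliance
        response.count("\n") / max(len(words), 1),                        # structural density
    ]

# A fingerprint concatenates phi over all probe responses (20 in ModelAudit).
fingerprint = [x for resp in ["Sure, here you go.", '{"ok": true}'] for x in phi(resp)]
```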
Distillation Hypothesis Testing
Distillation detection is formalized as a hypothesis test:
$$H_0: M_S \perp M_T \quad \text{vs} \quad H_1: M_S \leftarrow M_T$$
The test statistic (LLMmap method) is the Pearson correlation between fingerprint vectors:
$$\text{sim}(M_1, M_2) = \frac{\sum_i (\phi_i^{(1)} - \bar{\phi}^{(1)})(\phi_i^{(2)} - \bar{\phi}^{(2)})}{\sqrt{\sum_i (\phi_i^{(1)} - \bar{\phi}^{(1)})^2 \cdot \sum_i (\phi_i^{(2)} - \bar{\phi}^{(2)})^2}}$$
When $\text{sim}(M_S, M_T) > \delta$ (default $\delta = 0.7$), $H_0$ is rejected and the pair is flagged as a suspected distillation.
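The decision rule above can be sketched in a few lines, assuming fingerprints are plain NumPy vectors:

```python
import numpy as np

def pearson_sim(f1: np.ndarray, f2: np.ndarray) -> float:
    """Pearson correlation between two fingerprint vectors."""
    f1c, f2c = f1 - f1.mean(), f2 - f2.mean()
    return float(f1c @ f2c / (np.linalg.norm(f1c) * np.linalg.norm(f2c)))

def suspected_distillation(f_student, f_teacher, delta: float = 0.7) -> bool:
    # Reject H0 (independence) when similarity exceeds the threshold delta.
    return pearson_sim(np.asarray(f_student), np.asarray(f_teacher)) > delta

f_t = np.array([0.9, 0.1, 0.8, 0.3])
f_s = np.array([0.85, 0.15, 0.75, 0.35])   # near-copy of the teacher
print(suspected_distillation(f_s, f_t))    # → True
```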
Behavioral Lineage Inference (DLI)
Lineage inference based on Jensen-Shannon divergence:
$$D_{JS}(P \,\|\, Q) = \frac{1}{2} D_{KL}(P \,\|\, M) + \frac{1}{2} D_{KL}(Q \,\|\, M), \quad M = \frac{P + Q}{2}$$
The JS divergence of the behavioral signatures is computed per probe dimension; the per-dimension scores are aggregated to decide the lineage relationship.
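A minimal NumPy sketch of the JS divergence between two behavioral signature distributions; base-2 logs keep the value in $[0, 1]$:

```python
import numpy as np

def js_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence between two discrete distributions (assumed normalized)."""
    m = 0.5 * (p + q)

    def kl(a: np.ndarray, b: np.ndarray) -> float:
        mask = a > 0                      # 0 * log(0) is treated as 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.7, 0.2, 0.1])   # teacher's signature on one probe dimension
q = np.array([0.6, 0.3, 0.1])   # student's signature
print(round(js_divergence(p, q), 4))  # small value → similar behavior
```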
Representation Similarity (REEF)
In the white-box setting, intermediate hidden states are compared via Centered Kernel Alignment (CKA):
$$\text{CKA}(X, Y) = \frac{\|Y^\top X\|_F^2}{\|X^\top X\|_F \cdot \|Y^\top Y\|_F}$$
where $X$ and $Y$ are the hidden-layer activation matrices of the teacher and student on identical inputs. A layer-wise CKA heat map reveals at which layers the distillation occurred.
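Linear CKA matching the formula above can be sketched as follows, with column centering and Frobenius norms over $(n_{\text{samples}}, d)$ activation matrices:

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (n_samples, d)."""
    X = X - X.mean(axis=0)   # column-center each representation
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(num / den)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 128))          # e.g. teacher layer activations
Y = X @ rng.normal(size=(128, 128))     # linear transform of the same features
print(round(linear_cka(X, X), 3))       # → 1.0 (identical representations)
print(round(linear_cka(X, Y), 3))       # high, but below 1.0
```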
Architecture
graph LR
P["Probe Library<br/>20 Prompts × 10 Dims"] --> E["AuditEngine<br/>Concurrent Probing"]
E --> F["Fingerprint<br/>Feature Extraction"]
F --> L["LLMmap<br/>Pearson Correlation"]
F --> D["DLI<br/>JS Divergence"]
F --> S["StyleAnalysis<br/>12-Family Signatures"]
F --> R["REEF<br/>CKA Similarity"]
L --> V["Verdict Engine<br/>Hypothesis Testing"]
D --> V
S --> V
R --> V
V --> Rep["Audit Report<br/>6-Section Markdown"]
style E fill:#0969da,color:#fff,stroke:#0969da
style V fill:#8b5cf6,color:#fff,stroke:#8b5cf6
style Rep fill:#2da44e,color:#fff,stroke:#2da44e
style P fill:#1a1a2e,color:#e0e0e0,stroke:#444
style F fill:#1a1a2e,color:#e0e0e0,stroke:#444
style L fill:#1a1a2e,color:#e0e0e0,stroke:#444
style D fill:#1a1a2e,color:#e0e0e0,stroke:#444
style S fill:#1a1a2e,color:#e0e0e0,stroke:#444
style R fill:#1a1a2e,color:#e0e0e0,stroke:#444
Layered Architecture
| Layer | Module | Responsibility |
|---|---|---|
| Probing | probes/prompts.py | 20 probe prompts covering 10 behavioral dimensions |
| Engine | engine.py | Unified entry point; concurrent probing via ThreadPoolExecutor (4 workers) |
| Methods | methods/ | Registry of the 4 detection methods, layered by black-box/white-box |
| Fingerprint | models.py | Pydantic data models; fingerprint feature vectors |
| Cache | cache.py | SHA-256 collision-resistant fingerprint cache with TTL expiry |
| Report | report.py | 6-section structured audit report generation |
| Benchmark | benchmark.py | Built-in evaluation set: 14 samples × 6 model families |
| Interface | cli.py · mcp_server.py | CLI + 8 MCP tools |
Key Innovations
1. Multi-Method Forensic Analysis
Four complementary detection methods cover the full black-box to white-box audit chain:
| Method | Type | Principle | Reference |
|---|---|---|---|
| LLMmap | Black-box | 20 probe prompts; Pearson correlation over response patterns | USENIX Security 2025 |
| DLI | Black-box | Behavioral signatures + Jensen-Shannon divergence lineage inference | ICLR 2026 |
| REEF | White-box | Layer-wise CKA similarity over hidden states | NeurIPS 2024 |
| StyleAnalysis | Style analysis | Stylistic signatures for 12 model families + language detection | — |
Each method is usable on its own; fusing several raises verdict confidence. The built-in benchmark achieves 100% detection accuracy across 6 model families.
2. Behavioral Probing with 10-Dimensional Coverage
Going beyond simple text statistics, structured probes across 10 cognitive dimensions extract deep behavioral differences:
| Dimension | What is probed |
|---|---|
| Self-knowledge | Model identity, creator, training cutoff |
| Safety boundaries | Refusal policy, phrasing differences |
| Injection tests | Responses to prompt injection |
| Knowledge & reasoning | Knowledge boundaries, logical reasoning, ethical judgment |
| Creative writing | Narrative style, use of analogy |
| Multilinguality | Chinese responses, multilingual translation |
| Format control | JSON output, Markdown tables |
| Role-play | Character consistency, creative expression |
| Code generation | Coding style, commenting habits |
| Summarization | Information compression, expression density |
These dimensions retain significant inter-model differences even after RLHF alignment, making them a reliable source of fingerprint features.
3. Cross-Provider Audit Chain
Distillation audits can span providers: the teacher and student models may come from different APIs:
# Cross-provider audit: Anthropic teacher vs Moonshot student
knowlyr-modelaudit audit \
--teacher claude-opus --teacher-provider anthropic \
--student kimi-k2.5 --student-provider openai \
--student-api-base https://api.moonshot.cn/v1 \
-o report.md
A detailed 6-section audit report is generated automatically: audit targets → methods → results (fingerprint details + per-probe breakdown) → key findings → verdict → limitations statement.
4. Concurrent Probing with Intelligent Caching
Probe prompts are sent concurrently via ThreadPoolExecutor (4 workers by default); the fingerprint cache uses SHA-256 keys for collision resistance plus TTL expiry:
- First probe run: concurrent API calls; fingerprints are cached automatically in .modelaudit_cache/
- Subsequent audits: cached fingerprints are reused, avoiding repeated API calls
- Smart retries: exponential backoff, classification of auth/rate-limit errors, configurable timeout and retry count
12 recognizable model families: gpt-4 · gpt-3.5 · claude · llama · gemini · qwen · deepseek · mistral · yi · phi · cohere · chatglm
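The caching and concurrency described above could look roughly like this; `probe_cached`, `probe_model`, and the TTL value are illustrative names and assumptions, not ModelAudit's internal API:

```python
import hashlib
import json
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

CACHE_DIR = Path(".modelaudit_cache")
TTL_SECONDS = 7 * 24 * 3600            # assumed TTL; ModelAudit's default may differ

def cache_key(model: str, prompt: str) -> str:
    """SHA-256 over model + prompt: collision-resistant cache key."""
    return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

def probe_cached(model: str, prompt: str, call_api) -> str:
    CACHE_DIR.mkdir(exist_ok=True)
    path = CACHE_DIR / f"{cache_key(model, prompt)}.json"
    if path.exists():
        entry = json.loads(path.read_text())
        if time.time() - entry["ts"] < TTL_SECONDS:
            return entry["response"]          # cache hit: no API call
    response = call_api(model, prompt)        # cache miss or expired: hit the API
    path.write_text(json.dumps({"ts": time.time(), "response": response}))
    return response

def probe_model(model: str, prompts: list[str], call_api) -> list[str]:
    # 4 workers, results returned in prompt order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda p: probe_cached(model, p, call_api), prompts))
```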
Quick Start
pip install knowlyr-modelaudit
Optional extras:

pip install knowlyr-modelaudit[blackbox]   # black-box fingerprinting (openai, anthropic, httpx)
pip install knowlyr-modelaudit[whitebox]   # white-box fingerprinting (torch, transformers)
pip install knowlyr-modelaudit[mcp]        # MCP server
pip install knowlyr-modelaudit[all]        # everything
# 1. Detect text source
knowlyr-modelaudit detect texts.jsonl
# 2. Verify model identity
knowlyr-modelaudit verify gpt-4o --provider openai
# 3. Compare model fingerprints
knowlyr-modelaudit compare gpt-4o claude-sonnet --provider openai
# 4. Full distillation audit
knowlyr-modelaudit audit --teacher gpt-4o --student my-model -o report.md
# 5. Run the benchmark
knowlyr-modelaudit benchmark
Python SDK
from modelaudit import AuditEngine

engine = AuditEngine()

# Detect text source
results = engine.detect(["Hello! I'd be happy to help..."])
for r in results:
    print(f"{r.predicted_model}: {r.confidence:.2%}")

# Compare model fingerprints
result = engine.compare("gpt-4o", "my-model", method="llmmap")
print(f"similarity: {result.similarity:.4f}")
print(f"derived: {'yes' if result.is_derived else 'no'}")

# Full audit (cross-provider)
audit = engine.audit(
    "claude-opus", "kimi-k2.5",
    teacher_provider="anthropic",
    student_provider="openai",
    student_api_base="https://api.moonshot.cn/v1",
)
print(f"{audit.verdict} (confidence: {audit.confidence:.3f})")
Detection Methods
Probe dimension details (20 probes): the 10 dimensions and their probe content are listed in the table under Key Innovations above.
MCP Server
{
  "mcpServers": {
    "knowlyr-modelaudit": {
      "command": "uv",
      "args": ["--directory", "/path/to/model-audit", "run", "python", "-m", "modelaudit.mcp_server"]
    }
  }
}
| Tool | Description |
|---|---|
| `detect_text_source` | Detect the source of text data |
| `verify_model` | Verify model identity |
| `compare_models` | Black-box comparison (llmmap / dli / style) |
| `compare_models_whitebox` | White-box comparison (REEF CKA) |
| `audit_distillation` | Full distillation audit |
| `audit_memorization` | Memorization detection (prefix-completion similarity) |
| `audit_report` | Generate a comprehensive audit report |
| `audit_watermark` | Watermark detection (zero-width characters / statistical features / bigram uniqueness rate) |
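For intuition, two of the watermark checks can be sketched as follows; the exact heuristics and thresholds inside `audit_watermark` may differ:

```python
# Zero-width characters are invisible in rendered text but survive copy-paste,
# making them a simple provenance watermark channel.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def zero_width_count(text: str) -> int:
    """Count zero-width watermark characters embedded in the text."""
    return sum(1 for ch in text if ch in ZERO_WIDTH)

def bigram_uniqueness(text: str) -> float:
    """Fraction of distinct word bigrams; low values indicate repetitive phrasing."""
    words = text.lower().split()
    bigrams = list(zip(words, words[1:]))
    return len(set(bigrams)) / len(bigrams) if bigrams else 1.0

sample = "the model said the model said the model said"
print(zero_width_count("invisible\u200bmark"))  # → 1
print(bigram_uniqueness(sample))                # → 0.375 (highly repetitive)
```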
CLI Reference
Full command list

| Command | Function |
|---|---|
| `knowlyr-modelaudit detect <file>` | Detect text data source |
| `knowlyr-modelaudit detect <file> -n 50` | Limit the number of items checked |
| `knowlyr-modelaudit verify <model>` | Verify model identity |
| `knowlyr-modelaudit compare <a> <b>` | Compare two model fingerprints |
| `knowlyr-modelaudit audit --teacher <a> --student <b>` | Full distillation audit |
| `knowlyr-modelaudit audit ... --teacher-provider anthropic` | Cross-provider audit |
| `knowlyr-modelaudit audit ... --no-cache` | Skip the cache |
| `knowlyr-modelaudit audit ... -f json` | JSON report format |
| `knowlyr-modelaudit cache list` | List cached fingerprints |
| `knowlyr-modelaudit cache clear` | Clear all caches |
| `knowlyr-modelaudit benchmark` | Run the built-in benchmark |
| `knowlyr-modelaudit benchmark --label claude` | Filter by model family |
| `knowlyr-modelaudit methods` | List available detection methods |
Ecosystem
Architecture Diagram
graph LR
Radar["Radar<br/>Discovery"] --> Recipe["Recipe<br/>Analysis"]
Recipe --> Synth["Synth<br/>Generation"]
Recipe --> Label["Label<br/>Annotation"]
Synth --> Check["Check<br/>Quality"]
Label --> Check
Check --> Audit["Audit<br/>Model Audit"]
Crew["Crew<br/>Deliberation Engine"]
Agent["Agent<br/>RL Framework"]
ID["ID<br/>Identity Runtime"]
Crew -.->|capability definitions| ID
ID -.->|identity + memory| Crew
Crew -.->|trajectories + rewards| Agent
Agent -.->|optimized policies| Crew
style Audit fill:#0969da,color:#fff,stroke:#0969da
style Crew fill:#2da44e,color:#fff,stroke:#2da44e
style Agent fill:#8b5cf6,color:#fff,stroke:#8b5cf6
style ID fill:#e5534b,color:#fff,stroke:#e5534b
style Radar fill:#1a1a2e,color:#e0e0e0,stroke:#444
style Recipe fill:#1a1a2e,color:#e0e0e0,stroke:#444
style Synth fill:#1a1a2e,color:#e0e0e0,stroke:#444
style Label fill:#1a1a2e,color:#e0e0e0,stroke:#444
style Check fill:#1a1a2e,color:#e0e0e0,stroke:#444
| Layer | Project | Description | Repo |
|---|---|---|---|
| Discovery | AI Dataset Radar | Dataset competitive intelligence, trend analysis | GitHub |
| Analysis | DataRecipe | Reverse analysis, schema extraction, cost estimation | GitHub |
| Production | DataSynth / DataLabel | Batch LLM synthesis / lightweight annotation | GitHub · GitHub |
| Quality | DataCheck | Rule validation, duplicate detection, distribution analysis | GitHub |
| Audit | ModelAudit | Distillation detection · model fingerprinting · statistical forensics | You are here |
| Identity | knowlyr-id | Identity system + AI employee runtime | GitHub |
| Deliberation | Crew | Adversarial multi-agent deliberation · persistent memory evolution · MCP-native | GitHub |
| Agent Training | knowlyr-agent | Gymnasium-style RL framework · process reward models · SFT/DPO/GRPO | GitHub |
Development
git clone https://github.com/liuxiaotong/model-audit.git
cd model-audit
pip install -e ".[all,dev]"
pytest
CI: GitHub Actions, Python 3.10+. Tag pushes automatically publish to PyPI and create a GitHub Release.
References
- LLMmap — Haller, R. et al., 2025. LLMmap: Fingerprinting For Large Language Models. USENIX Security — basis of the behavioral probing fingerprint
- DLI — Chen, W. et al., 2026. Detecting LLM Distillation via Behavioral Lineage Inference. ICLR — distillation lineage inference via JS divergence
- REEF — Jia, J. et al., 2024. REEF: Representation Encoding Fingerprints for Large Language Models. NeurIPS — white-box CKA representation similarity
- Knowledge Distillation — Hinton, G. et al., 2015. Distilling the Knowledge in a Neural Network. arXiv:1503.02531 — foundational work on knowledge distillation
- CKA — Kornblith, S. et al., 2019. Similarity of Neural Network Representations Revisited. ICML — the representation similarity measure
- Model Fingerprinting — Cao, X. et al., 2021. IPGuard: Protecting Intellectual Property of Deep Neural Networks via Fingerprinting the Classification Boundary. AsiaCCS