AI content safety scanner — scan, detect, and sanitize unsafe content in text
GuardRail 🛡️
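A content-moderation guardrail for LLMs: it keeps tool output from polluting the conversation context and avoids HTTP 400 errors from DeepSeek / OpenAI / Claude.
What is this?
Anyone who uses an LLM API has run into this problem: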
HTTP 400: Content Exists Risk ← DeepSeek
HTTP 400: content_policy_violation ← OpenAI
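The whole session is ruined, and you have to clear the context and start over.
The root cause is neither your system prompt nor the user input. It is sensitive content in tool output (search results, scraped web pages, file reads) that gets mixed into the conversation context.
GuardRail puts a safety guardrail between the LLM API and external content: scan → detect → sanitize, so that dirty data cannot pollute your session.
Real-world case
PA searched GitHub for projects matching the keyword 「量化」 ("quantization" or "quantitative"), and the search results included funNLP:
⭐ fighting41love/funNLP: reactionary word list, violence and terrorism word list, sensitive word lexicon, Chinese rumor data...
These keywords triggered DeepSeek's moderation. All 47 messages were rejected and the whole session was ruined.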
# One line of code fixes it
from guardrail import GuardRail
gr = GuardRail()
result = gr.scan(tool_output)
if not result.safe:
    tool_output = result.sanitized  # sensitive words are sanitized, the session is safe
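To show where such a check could sit in an agent loop, here is a minimal sketch that scans tool output before it is appended to the message history. It does not import the library: `ScanResult`, `fake_scan`, and `append_tool_result` are hypothetical stand-ins written for this illustration, and the `"tool"` role is an assumption about the message format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ScanResult:
    """Hypothetical stand-in for the object returned by gr.scan()."""
    safe: bool
    sanitized: str


def fake_scan(text: str) -> ScanResult:
    """Toy scanner used only for this sketch: flags one placeholder word."""
    blocked = "BADWORD"
    if blocked in text:
        return ScanResult(safe=False, sanitized=text.replace(blocked, "[REMOVED]"))
    return ScanResult(safe=True, sanitized=text)


def append_tool_result(
    messages: List[Dict[str, str]],
    tool_output: str,
    scan: Callable[[str], ScanResult],
) -> None:
    """Scan tool output first and store only the safe version in the history."""
    result = scan(tool_output)
    content = tool_output if result.safe else result.sanitized
    messages.append({"role": "tool", "content": content})


history: List[Dict[str, str]] = [{"role": "user", "content": "search GitHub"}]
append_tool_result(history, "repo A: BADWORD list", fake_scan)
append_tool_result(history, "repo B: plotting library", fake_scan)
```

Passing the scanner in as a callable keeps the loop independent of any particular scanner, so a real `gr.scan` could presumably be dropped in if its result exposes `safe` and `sanitized` as the snippet above suggests.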
Installation
pip install guardrail-safety
Quick start
Integrate in 3 lines of code
from guardrail import GuardRail
gr = GuardRail()
result = gr.scan("search results containing sensitive content...")
print(result.safe)       # False
print(result.triggers)   # ['illegal_content']
print(result.sanitized)  # sensitive words have been replaced
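Four integration modes
| Mode | Use case | Example |
|---|---|---|
| Python library | Call directly from agent code | gr.scan(text) |
| CLI | Manual scanning from the command line | guardrail scan --text "..." |
| Proxy middleware | Automatic interception for FastAPI / Flask | GuardRailMiddleware() |
| Hermes Skill | Loaded automatically by the agent | skill_view('content-safety-scanner') |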
Detailed usage
Python API
from guardrail import GuardRail, Sanitizer
# Initialize the scanner
gr = GuardRail()
# Scan text
result = gr.scan("My IP is 192.168.1.1 and I used sqlmap to scan.")
print(result.safe)       # False
print(result.triggers)   # ['ip_address', 'hacker_tools']
print(result.sanitized)  # "My IP is [REDACTED IP ADDRESS] and I used [HACKING TOOL REFERENCE REMOVED] to scan."
# Check an LLM request (a list of messages)
messages = [
    {"role": "user", "content": "search results..."},
    {"role": "assistant", "content": "reply content..."},
]
safe = gr.check_request(messages)  # True / False
Three sanitization strategies
from guardrail import Sanitizer
from guardrail.sanitizer import MatchInfo
sanitizer = Sanitizer()
# "敏感词" means "sensitive word"
matches = [MatchInfo("敏感词", 0, 3, "political_sensitive")]
# Replace with a safe description ("文本" means "text")
sanitizer.sanitize("文本", strategy="replace", matches=matches)
# → "[POLITICALLY SENSITIVE CONTENT REMOVED]"
# Mask with ***
sanitizer.sanitize("文本", strategy="mask", matches=matches)
# → "***"
# Remove the whole line ("干净行" means "clean line", "敏感行" means "sensitive line")
sanitizer.sanitize("干净行\n敏感行\n干净行", strategy="remove", matches=matches)
# → "干净行\n干净行"
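The difference between the three strategies is easier to see in a self-contained form. The sketch below is not the library's `Sanitizer`. It is a minimal reimplementation under stated assumptions: `Match` and `DESCRIPTIONS` are hypothetical, offsets are treated as half-open `[start, end)` character ranges, `mask` writes one `*` per matched character, and `remove` drops every line that overlaps a match.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    """Hypothetical match record: matched text, [start, end) offsets, category."""
    text: str
    start: int
    end: int
    category: str


# Hypothetical category -> description mapping used by the "replace" strategy.
DESCRIPTIONS = {"political_sensitive": "[POLITICALLY SENSITIVE CONTENT REMOVED]"}


def sanitize(text: str, matches: List[Match], strategy: str = "replace") -> str:
    if strategy == "remove":
        # Drop every line that overlaps at least one match.
        kept, pos = [], 0
        for line in text.split("\n"):
            line_end = pos + len(line)
            hit = any(m.start < line_end and m.end > pos for m in matches)
            if not hit:
                kept.append(line)
            pos = line_end + 1  # step over the newline character
        return "\n".join(kept)

    # Work right to left so earlier offsets stay valid after each substitution.
    for m in sorted(matches, key=lambda m: m.start, reverse=True):
        if strategy == "replace":
            filler = DESCRIPTIONS.get(m.category, "[REDACTED]")
        elif strategy == "mask":
            filler = "*" * (m.end - m.start)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        text = text[:m.start] + filler + text[m.end:]
    return text


sample = "clean line\nbad word here\nclean line"
start = sample.index("bad word")
hits = [Match("bad word", start, start + len("bad word"), "political_sensitive")]

replaced = sanitize(sample, hits, "replace")
masked = sanitize(sample, hits, "mask")
removed = sanitize(sample, hits, "remove")
```

`replace` keeps the surrounding sentence readable, `mask` preserves the text length, and `remove` is the most aggressive option because it discards the clean part of the line as well.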
CLI usage
# Scan text
guardrail scan --text "Check this IP: 10.0.0.1"
# Scan a file
guardrail scan --file search_result.txt
# JSON output
guardrail scan --text "sensitive content" --json
# Sanitize (replace strategy)
guardrail sanitize --text "sensitive data" --strategy replace
# Sanitize (mask strategy)
guardrail sanitize --text "sensitive data" --strategy mask
# Sanitize (remove strategy)
guardrail sanitize --text "sensitive data" --strategy remove
Proxy middleware
from guardrail.proxy import GuardRailMiddleware
middleware = GuardRailMiddleware()
# Check a single piece of text
safe, details = middleware.check("user input text")
if not safe:
    print(f"Triggered: {details['triggers']}")
    print(f"Safe version: {details['sanitized']}")
# Check a list of LLM messages
all_safe, per_msg = middleware.check_request(messages)
# Sanitize a response directly
safe_text = middleware.sanitize_response("response with sensitive data")
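A request-level check of this kind can be thought of as a per-message scan plus an overall verdict. The sketch below illustrates that idea without importing the library. `toy_scanner`, the `Scanner` signature, and the keys of the per-message dictionaries are assumptions made for this example, not the actual return format of `check_request`.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical scanner signature: text -> (safe, triggers).
Scanner = Callable[[str], Tuple[bool, List[str]]]


def toy_scanner(text: str) -> Tuple[bool, List[str]]:
    """Stand-in scanner for this sketch: flags one tool name."""
    triggers = ["hacker_tools"] if "sqlmap" in text.lower() else []
    return (not triggers, triggers)


def check_request(
    messages: List[Dict[str, str]], scanner: Scanner
) -> Tuple[bool, List[Dict[str, object]]]:
    """Scan each message and report an overall verdict plus per-message details."""
    per_msg: List[Dict[str, object]] = []
    for index, message in enumerate(messages):
        safe, triggers = scanner(message.get("content", ""))
        per_msg.append(
            {"index": index, "role": message.get("role"),
             "safe": safe, "triggers": triggers}
        )
    all_safe = all(item["safe"] for item in per_msg)
    return all_safe, per_msg


messages = [
    {"role": "user", "content": "How do I audit my own site?"},
    {"role": "tool", "content": "Top result: run sqlmap against the target"},
]
all_safe, per_msg = check_request(messages, toy_scanner)
```

Returning per-message details alongside the overall flag lets the caller sanitize or drop only the offending message instead of discarding the whole context.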
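Rule library
GuardRail is driven by YAML rule files and supports both exact matching and regex patterns:
| Category | Description | Example trigger words |
|---|---|---|
| political_sensitive | Politically sensitive terms | tiananmen, falun gong... |
| illegal_content | References to illegal content | cocaine, ransomware... |
| personal_info | Personal private information | social security number... |
| hacker_tools | References to hacking tools | sqlmap, metasploit... |

| Regex pattern | Description |
|---|---|
| ip_address | IP addresses |
| email | Email addresses |
| phone_number | Phone numbers |
| credit_card | Credit card numbers |
| api_key | API keys |
| jwt_token | JWT tokens |
| url | URLs |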
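The two rule types can be combined into a single detection pass. The following sketch uses only the standard `re` module. The word lists reuse the example trigger words from the table, while both regexes are simplified patterns written for this illustration and are not the ones shipped in the rule files.

```python
import re
from typing import Dict, List

# Illustrative rules only; the shipped rule files may differ.
TRIGGER_WORDS: Dict[str, List[str]] = {
    "hacker_tools": ["sqlmap", "metasploit"],
    "illegal_content": ["cocaine", "ransomware"],
}

PATTERNS: Dict[str, re.Pattern] = {
    # Deliberately loose IPv4 pattern: it does not validate the 0-255 range.
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # Simplified email pattern, not RFC 5322 compliant.
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def detect(text: str) -> List[str]:
    """Return the categories triggered by `text`, regex categories first."""
    lowered = text.lower()
    triggers = [name for name, rx in PATTERNS.items() if rx.search(text)]
    for category, words in TRIGGER_WORDS.items():
        # Case-insensitive substring match stands in for "exact matching".
        if any(word in lowered for word in words):
            triggers.append(category)
    return triggers


found = detect("My IP is 192.168.1.1 and I used sqlmap to scan.")
clean = detect("Nothing to see here.")
```

Substring matching is cheap but can produce false positives inside longer words, which is one reason a rule set might prefer word-boundary regexes for short trigger words.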
Custom rules
To extend the rules, create YAML files in the guardrail/rules/ directory:
# trigger_words.yml
my_category:
  - "sensitive word 1"
  - "sensitive word 2"
# patterns.yml
custom_pattern:
  pattern: '\b\d{6}\b'  # matches a 6-digit number
  description: '[REDACTED CUSTOM]'
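As a rough illustration of how a custom pattern rule might be applied, the sketch below starts from the dictionary that a YAML parser would typically produce for the patterns.yml example and uses `description` as the replacement text. That use of `description`, and the helper names `compile_rules` and `apply_rules`, are assumptions made for this example.

```python
import re
from typing import Dict, Tuple

# The structure a YAML parser might return for the patterns.yml example
# (assumption: each rule name maps to `pattern` and `description`).
custom_rules = {
    "custom_pattern": {
        "pattern": r"\b\d{6}\b",
        "description": "[REDACTED CUSTOM]",
    }
}

Compiled = Dict[str, Tuple[re.Pattern, str]]


def compile_rules(rules: Dict[str, Dict[str, str]]) -> Compiled:
    """Compile each rule once and fail early on an invalid regex."""
    compiled: Compiled = {}
    for name, rule in rules.items():
        try:
            compiled[name] = (re.compile(rule["pattern"]), rule["description"])
        except re.error as exc:
            raise ValueError(f"rule {name!r} has an invalid pattern: {exc}") from exc
    return compiled


def apply_rules(text: str, compiled: Compiled) -> str:
    """Replace every match of every rule with that rule's description."""
    for regex, description in compiled.values():
        # A function replacement keeps backslashes in `description` literal.
        text = regex.sub(lambda _match: description, text)
    return text


rules = compile_rules(custom_rules)
redacted = apply_rules("code 123456, order 1234567", rules)
```

The word boundaries in `\b\d{6}\b` mean a 7-digit number is left alone, which is worth covering in the test cases that the contribution guide asks for.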
Architecture
┌─────────────────────────────────────────────┐
│              Your Agent / App               │
└────────────┬────────────────┬───────────────┘
             │                │
   ┌─────────▼──────┐   ┌─────▼──────────┐
   │ GuardRail CLI  │   │ GuardRail Lib  │
   │ (manual scan)  │   │ (Python import)│
   └─────────┬──────┘   └─────┬──────────┘
             │                │
   ┌─────────▼────────────────▼──────────┐
   │             Rule engine             │
   │  ┌─────────────┐ ┌───────────────┐  │
   │  │ Exact match │ │ Regex pattern │  │
   │  └─────────────┘ └───────────────┘  │
   └─────────┬───────────────────────────┘
             │
   ┌─────────▼───────────────────────────┐
   │         Sanitization engine         │
   │       replace / mask / remove       │
   └─────────────────────────────────────┘
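Contributing
Rule contributions and improvements are welcome!
- Fork this repository
- Create a feature branch (git checkout -b feature/my-rule)
- Commit your changes (git commit -m 'Add new rule category')
- Push the branch (git push origin feature/my-rule)
- Open a Pull Request
Rule contribution guide
- Add YAML rule files under guardrail/rules/
- Each category needs at least 5 trigger words
- Regex patterns must come with test cases
- For Chinese sensitive words, follow the moderation standards of LLM APIs operated in mainland China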
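License
MIT License. See LICENSE for details.
Acknowledgements
GuardRail grew out of a real bug:
PA searched GitHub for 「量化」 → funNLP sensitive words got mixed in → DeepSeek 400 → root-cause diagnosis → guardrail design → open source
Thanks to every developer building agents on LLM APIs. Any of you may run into this problem.
Author: zhangzhiwei610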
File details
Details for the file guardrail_safety-0.1.0.tar.gz.
File metadata
- Download URL: guardrail_safety-0.1.0.tar.gz
- Upload date:
- Size: 16.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cdeccdb63a98f1fc669d2002e40f11d2d064ad4ea1e5fb201bc86a2718e39ea7 |
| MD5 | 2cb45ecfaf938b121396986dc346907a |
| BLAKE2b-256 | 36594af0d798f18d47c0a749e827b0dc87d948d5df8cdd832fd7b0cab9404666 |
File details
Details for the file guardrail_safety-0.1.0-py3-none-any.whl.
File metadata
- Download URL: guardrail_safety-0.1.0-py3-none-any.whl
- Upload date:
- Size: 13.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fc5036a86350ba4f9d8bebe1311fd27048835f76741f75f3b5c9d73c1d404ea8 |
| MD5 | a02cf8de1a3c2c7c52f3628d8c8b7c5d |
| BLAKE2b-256 | 47963549fee2d6fd016e496ab9fdd208c306712ec13587a06b0d95cbdb7b7052 |