AI content safety scanner — scan, detect, and sanitize unsafe content in text

GuardRail 🛡️

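An LLM content-moderation guardrail that keeps tool output from polluting the conversation context and avoids HTTP 400 errors from DeepSeek / OpenAI / Claude.

What is this?

Anyone who calls LLM APIs has run into this problem: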

HTTP 400: Content Exists Risk       ← DeepSeek
HTTP 400: content_policy_violation  ← OpenAI

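The whole session is ruined, and you have to clear the context and start over.

The root cause is neither your system prompt nor the user input. It is sensitive content in tool output (search results, scraped web pages, file reads) that gets mixed into the conversation context.

GuardRail puts a safety guardrail between the LLM API and external content: scan → detect → sanitize, so that dirty data cannot pollute your session.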
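
The scan → detect → sanitize flow can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not GuardRail's implementation: the `TRIGGER_WORDS` table, the `ScanResult` dataclass, and the `scan` function are hypothetical names chosen for this example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rule table for illustration: category -> trigger words.
TRIGGER_WORDS = {
    "hacker_tools": ["sqlmap", "metasploit"],
    "illegal_content": ["ransomware"],
}


@dataclass
class ScanResult:
    safe: bool
    triggers: list = field(default_factory=list)
    sanitized: str = ""


def scan(text: str) -> ScanResult:
    """Scan text, record which categories matched, and return a redacted copy."""
    triggers = []
    sanitized = text
    for category, words in TRIGGER_WORDS.items():
        for word in words:
            pattern = re.compile(re.escape(word), re.IGNORECASE)
            if pattern.search(sanitized):
                if category not in triggers:
                    triggers.append(category)
                sanitized = pattern.sub(f"[{category.upper()} REMOVED]", sanitized)
    return ScanResult(safe=not triggers, triggers=triggers, sanitized=sanitized)


result = scan("I ran sqlmap against the test server.")
print(result.safe)       # False
print(result.triggers)   # ['hacker_tools']
print(result.sanitized)  # I ran [HACKER_TOOLS REMOVED] against the test server.
```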

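A real case

PA searched GitHub for projects related to "量化" (quantitative trading), and funNLP turned up in the search results:

⭐ fighting41love/funNLP: reactionary word list, violence and terrorism word list, sensitive word lexicon, Chinese rumor data...

These keywords triggered DeepSeek's moderation. All 47 messages were rejected and the whole session was ruined.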

# Fixed with one line of code
from guardrail import GuardRail
gr = GuardRail()
result = gr.scan(tool_output)
if not result.safe:
    tool_output = result.sanitized  # sensitive words are redacted, the session stays safe
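
In an agent loop, the natural place for this check is the point where a tool result is appended to the message history. The sketch below relies only on the `safe` and `sanitized` attributes of the scan result shown above. `append_tool_result` and `StubScanner` are hypothetical: the stub stands in for `GuardRail()` so the example runs without the package installed, and the `"tool"` role is an assumption to adapt to your API.

```python
from types import SimpleNamespace


class StubScanner:
    """Stand-in for GuardRail(): flags one hard-coded word (illustration only)."""

    def scan(self, text):
        bad = "badword"
        return SimpleNamespace(
            safe=bad not in text,
            sanitized=text.replace(bad, "[REMOVED]"),
        )


def append_tool_result(messages, tool_output, scanner):
    """Append tool output to the history, using the sanitized text if unsafe."""
    result = scanner.scan(tool_output)
    content = tool_output if result.safe else result.sanitized
    messages.append({"role": "tool", "content": content})
    return result.safe


history = [{"role": "user", "content": "search GitHub for quant projects"}]
was_safe = append_tool_result(history, "repo list with badword inside", StubScanner())
print(was_safe)                # False
print(history[-1]["content"])  # repo list with [REMOVED] inside
```

Because the raw tool output never enters `history`, a single bad search result cannot invalidate the messages that came before it.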

Installation

pip install guardrail-safety

Quick start

Integrate in 3 lines of code

from guardrail import GuardRail

gr = GuardRail()
result = gr.scan("Search results containing sensitive content...")
print(result.safe)      # False
print(result.triggers)  # ['illegal_content']
print(result.sanitized) # sensitive words have been replaced

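Four integration modes

Mode	Scenario	Example
Python library	Called directly from Agent code	`gr.scan(text)`
CLI	Manual scanning from the command line	`guardrail scan --text "..."`
Proxy middleware	Automatic interception in FastAPI / Flask	`GuardRailMiddleware()`
Hermes Skill	Loaded automatically by the Agent	`skill_view('content-safety-scanner')`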

Detailed usage

Python API

from guardrail import GuardRail, Sanitizer

# Initialize the scanner
gr = GuardRail()

# Scan a piece of text
result = gr.scan("My IP is 192.168.1.1 and I used sqlmap to scan.")
print(result.safe)       # False
print(result.triggers)   # ['ip_address', 'hacker_tools']
print(result.sanitized)  # "My IP is [REDACTED IP ADDRESS] and I used [HACKING TOOL REFERENCE REMOVED] to scan."

# Check an LLM request (a list of messages)
messages = [
    {"role": "user", "content": "Search results..."},
    {"role": "assistant", "content": "Reply content..."},
]
safe = gr.check_request(messages)  # True / False
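
A request-level check of this kind reduces to checking each message in turn. The sketch below is an illustration of that idea, not the package's code: `find_unsafe_messages`, `request_is_safe`, and `toy_check` are hypothetical names. In real use, `is_safe` could be something like `lambda text: gr.scan(text).safe`. Knowing which messages fail lets you sanitize only those instead of clearing the whole context.

```python
def find_unsafe_messages(messages, is_safe):
    """Return the indexes of messages whose content fails the is_safe check."""
    return [
        i for i, msg in enumerate(messages)
        if isinstance(msg.get("content"), str) and not is_safe(msg["content"])
    ]


def request_is_safe(messages, is_safe):
    """A request is safe only if every message in it is safe."""
    return not find_unsafe_messages(messages, is_safe)


# Hypothetical checker: treats any text containing "sqlmap" as unsafe.
def toy_check(text):
    return "sqlmap" not in text.lower()


messages = [
    {"role": "user", "content": "Summarize these search results."},
    {"role": "tool", "content": "Tutorial: scanning with sqlmap"},
]
print(find_unsafe_messages(messages, toy_check))  # [1]
print(request_is_safe(messages, toy_check))       # False
```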

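Three sanitization strategies

from guardrail import Sanitizer
from guardrail.sanitizer import MatchInfo

sanitizer = Sanitizer()
# String literals are left untranslated so they stay consistent with the match arguments.
# "敏感词" = "sensitive word", "文本" = "text".
matches = [MatchInfo("敏感词", 0, 3, "political_sensitive")]

# Replace with a safe description
sanitizer.sanitize("文本", strategy="replace", matches=matches)
# → "[POLITICALLY SENSITIVE CONTENT REMOVED]"

# Mask with ***
sanitizer.sanitize("文本", strategy="mask", matches=matches)
# → "***"

# Remove the whole line ("干净行" = "clean line", "敏感行" = "sensitive line")
sanitizer.sanitize("干净行\n敏感行\n干净行", strategy="remove", matches=matches)
# → "干净行\n干净行"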
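
To make the difference between the three strategies concrete, here is a standalone sketch that applies each one to the same matched span. It is not the package's `Sanitizer`: the `Span` record and this `sanitize` function are hypothetical, offsets are assumed to be half-open `[start, end)`, and masking one `*` per character is also an assumption (the package may use a fixed-length mask).

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """Hypothetical match record: half-open [start, end) offsets plus a category."""
    start: int
    end: int
    category: str


def sanitize(text: str, spans: List[Span], strategy: str = "replace") -> str:
    if strategy == "remove":
        # Drop every line that overlaps a matched span.
        kept, offset = [], 0
        for line in text.split("\n"):
            line_end = offset + len(line)
            hit = any(s.start < line_end and s.end > offset for s in spans)
            if not hit:
                kept.append(line)
            offset = line_end + 1  # skip the newline
        return "\n".join(kept)

    # Apply replacements right to left so earlier offsets stay valid.
    for s in sorted(spans, key=lambda s: s.start, reverse=True):
        if strategy == "replace":
            filler = f"[{s.category.upper()} REMOVED]"
        elif strategy == "mask":
            filler = "*" * (s.end - s.start)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        text = text[:s.start] + filler + text[s.end:]
    return text


text = "clean line\nuses sqlmap here\nclean line"
spans = [Span(16, 22, "hacker_tools")]  # offsets of "sqlmap"
print(sanitize(text, spans, "replace"))  # middle line: uses [HACKER_TOOLS REMOVED] here
print(sanitize(text, spans, "mask"))     # middle line: uses ****** here
print(sanitize(text, spans, "remove"))   # middle line dropped entirely
```

`replace` keeps the most context for the model, `mask` preserves text length, and `remove` is the most aggressive because it discards the surrounding line as well.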

CLI usage

# Scan text
guardrail scan --text "Check this IP: 10.0.0.1"

# Scan a file
guardrail scan --file search_result.txt

# JSON output
guardrail scan --text "sensitive content" --json

# Sanitize (replace strategy)
guardrail sanitize --text "sensitive data" --strategy replace

# Sanitize (mask strategy)
guardrail sanitize --text "sensitive data" --strategy mask

# Sanitize (remove strategy)
guardrail sanitize --text "sensitive data" --strategy remove

Proxy middleware

from guardrail.proxy import GuardRailMiddleware

middleware = GuardRailMiddleware()

# Check a single piece of text
safe, details = middleware.check("user input text")
if not safe:
    print(f"Triggered: {details['triggers']}")
    print(f"Safe version: {details['sanitized']}")

# Check a list of LLM messages
all_safe, per_msg = middleware.check_request(messages)

# Sanitize a response directly
safe_text = middleware.sanitize_response("response with sensitive data")

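Rule library

GuardRail is driven by YAML rule files and supports both exact matching and regex patterns:

Category	Description	Example trigger words
`political_sensitive`	Politically sensitive terms	tiananmen, falun gong...
`illegal_content`	References to illegal content	cocaine, ransomware...
`personal_info`	Personal private information	social security number...
`hacker_tools`	References to hacking tools	sqlmap, metasploit...

Regex pattern	Description
`ip_address`	IP address
`email`	Email address
`phone_number`	Phone number
`credit_card`	Credit card number
`api_key`	API key
`jwt_token`	JWT token
`url`	URL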
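
As a rough idea of what regex-based detection for a few of these categories could look like, consider the sketch below. The patterns are illustrative only and the package's own regexes may differ. For example, this `ip_address` pattern does not check that each octet is at most 255. `PATTERNS` and `detect` are hypothetical names.

```python
import re

# Illustrative patterns only; the package's own regexes may differ.
PATTERNS = {
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "jwt_token": re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"),
}


def detect(text: str) -> dict:
    """Return {pattern_name: [matched strings]} for every pattern that hits."""
    found = {}
    for name, pattern in PATTERNS.items():
        hits = pattern.findall(text)
        if hits:
            found[name] = hits
    return found


print(detect("Contact ops@example.com from 10.0.0.1"))
# {'ip_address': ['10.0.0.1'], 'email': ['ops@example.com']}
```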

Custom rules

To extend the rules, create YAML files in the guardrail/rules/ directory:

# trigger_words.yml
my_category:
  - "sensitive word 1"
  - "sensitive word 2"

# patterns.yml
custom_pattern:
  pattern: '\b\d{6}\b'  # matches a 6-digit number
  description: '[REDACTED CUSTOM]'
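
Before adding a pattern, it helps to see how it behaves on sample text. The sketch below mirrors the patterns.yml entry as a plain Python dict and applies it with the standard `re` module. It assumes that `description` is the text substituted for each match, which is an inference from the sample outputs in this README rather than documented behavior. `apply_rules` is a hypothetical helper.

```python
import re

# Mirrors the patterns.yml entry above as a plain dict (no YAML parser needed here).
custom_rules = {
    "custom_pattern": {
        "pattern": r"\b\d{6}\b",
        "description": "[REDACTED CUSTOM]",
    },
}


def apply_rules(text: str, rules: dict) -> str:
    """Replace every match of each rule's pattern with that rule's description."""
    for rule in rules.values():
        # A callable replacement avoids backslash/group-reference surprises.
        text = re.sub(rule["pattern"], lambda m, d=rule["description"]: d, text)
    return text


print(apply_rules("Your code is 482913, order 12345678.", custom_rules))
# Your code is [REDACTED CUSTOM], order 12345678.
```

The word boundaries matter here: the 8-digit order number is left alone because no 6-digit run inside it is bounded on both sides.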

Architecture

┌─────────────────────────────────────────────┐
│          Your Agent / application           │
└────────────┬────────────────┬───────────────┘
             │                │
   ┌─────────▼──────┐  ┌──────▼─────────┐
   │ GuardRail CLI  │  │ GuardRail Lib  │
   │ (manual scan)  │  │ (Python import)│
   └─────────┬──────┘  └──────┬─────────┘
             │                │
   ┌─────────▼────────────────▼──────────┐
   │         Rule engine                 │
   │  ┌─────────────┐  ┌─────────────┐   │
   │  │ Exact match │  │ Regex match │   │
   │  └─────────────┘  └─────────────┘   │
   └─────────┬───────────────────────────┘
             │
   ┌─────────▼──────────────────────────┐
   │         Sanitization engine        │
   │  replace / mask / remove           │
   └────────────────────────────────────┘

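Contributing

Contributions of rules and improvements are welcome!

Fork this repository
Create a feature branch (git checkout -b feature/my-rule)
Commit your changes (git commit -m 'Add new rule category')
Push to the branch (git push origin feature/my-rule)
Open a Pull Request

Rule contribution guidelines

Add YAML rule files under guardrail/rules/
Provide at least 5 trigger words per category
Regex patterns must come with test cases
For Chinese sensitive words, follow the moderation standards of Chinese domestic LLM APIs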
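
A test case for a regex contribution can be as small as a list of strings that must match and a list that must not. The sketch below uses the standard `unittest` module and the 6-digit pattern from the custom rules example. The project's actual test layout is not described here, so the class and file organization are assumptions.

```python
import re
import unittest

SIX_DIGITS = re.compile(r"\b\d{6}\b")  # the 6-digit custom pattern


class TestSixDigitPattern(unittest.TestCase):
    def test_matches(self):
        # Exactly six digits, bounded by non-word characters or string edges.
        for text in ["code 123456", "123456", "id=654321;"]:
            self.assertIsNotNone(SIX_DIGITS.search(text), text)

    def test_non_matches(self):
        # Too short, too long, or glued to other word characters.
        for text in ["12345", "1234567", "abc123456"]:
            self.assertIsNone(SIX_DIGITS.search(text), text)
```

Saved to a file, this runs with `python -m unittest`. Negative cases are worth as much as positive ones, since an over-broad pattern will redact harmless text.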

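License

MIT License; see LICENSE for details.

Acknowledgements

GuardRail grew out of a real bug:

PA searched GitHub for "量化" (quantitative trading) → funNLP sensitive words got mixed in → DeepSeek 400 → root cause diagnosed → guardrail designed → open-sourced

Thanks to every developer building Agents on LLM APIs. Any of you may run into this problem.

Author: zhangzhiwei610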

Project details

This version

0.1.0

Jun 1, 2026

Download files

Source Distribution

guardrail_safety-0.1.0.tar.gz (16.7 kB view details)

Uploaded Jun 1, 2026 Source

Built Distribution

guardrail_safety-0.1.0-py3-none-any.whl (13.4 kB view details)

Uploaded Jun 1, 2026 Python 3

File details

Details for the file guardrail_safety-0.1.0.tar.gz.

File metadata

Download URL: guardrail_safety-0.1.0.tar.gz
Upload date: Jun 1, 2026
Size: 16.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for guardrail_safety-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`cdeccdb63a98f1fc669d2002e40f11d2d064ad4ea1e5fb201bc86a2718e39ea7`
MD5	`2cb45ecfaf938b121396986dc346907a`
BLAKE2b-256	`36594af0d798f18d47c0a749e827b0dc87d948d5df8cdd832fd7b0cab9404666`

File details

Details for the file guardrail_safety-0.1.0-py3-none-any.whl.

File metadata

Download URL: guardrail_safety-0.1.0-py3-none-any.whl
Upload date: Jun 1, 2026
Size: 13.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for guardrail_safety-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fc5036a86350ba4f9d8bebe1311fd27048835f76741f75f3b5c9d73c1d404ea8`
MD5	`a02cf8de1a3c2c7c52f3628d8c8b7c5d`
BLAKE2b-256	`47963549fee2d6fd016e496ab9fdd208c306712ec13587a06b0d95cbdb7b7052`

guardrail-safety 0.1.0
