
A zero-config, local-first privacy layer for AI APIs with semantic-preserving de-identification.

Project description

YinShield

YinShield is a local-first privacy layer for LLM workflows.

The current release ships as:

  • PyPI package: yinshield
  • Local HTTP service: yinshield serve
  • OpenClaw thin plugin: @serein-213/openclaw-yinshield

Status

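  • Recommended release positioning: 0.1.0 alpha
  • Intended use: local single-user privacy layer, developer integration testing, OpenClaw integration trials
  • Most stable mode: mode="placeholder"
  • Still being refined: recovery rate and false-positive control of mode="alias" on more realistic English data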

What Works Now

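  • Chinese and English PII de-identification: Chinese names, English names, mobile numbers, US phone, Chinese resident ID numbers, SSN, email, WeChat ID, bank card, bank account number, account-opening bank, landline, license plate, passport, Unified Social Credit Code, tax ID, company name, address, birthday, DOB, IP, VIN, EIN, medical record number, MRN, order number, courier waybill number, tracking number, customer ID, membership number, contract number
  • Two replacement modes:
    • mode="placeholder": 张三 -> <PERSON_1>
    • mode="alias": 张三 -> 陈明
  • Three strategy levels:
    • loose: only high-confidence entities
    • balanced: the default, suited to general conversation and customer-service text
    • strict: also covers context-dependent entities and business identifiers
  • Session consistency: the same entity keeps the same replacement across turns, and sessions can be persisted to a file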
  • OpenAI-compatible integration:
    • ShieldedOpenAI
    • ShieldedAsyncOpenAI
    • chat.completions
    • responses
    • stream=True
    • base_url=...
  • Local HTTP service:
    • POST /health
    • POST /mask
    • POST /unmask
    • POST /messages/mask
  • OpenClaw integration:
    • yinshield_mask
    • yinshield_unmask
    • yinshield_shield_messages
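Placeholder mode and per-session consistency can be illustrated with a toy sketch. This is not YinShield's implementation. The phone regex, the `ToySession` class and the `<KIND_N>` numbering are assumptions modeled on the `<PERSON_1>` example above.

```python
import re
from typing import Dict

# Toy detector for 11-digit mainland-China mobile numbers (1[3-9] prefix).
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")


class ToySession:
    """Keeps value -> placeholder assignments stable across turns."""

    def __init__(self) -> None:
        self.forward: Dict[str, str] = {}   # original value -> placeholder
        self.counters: Dict[str, int] = {}  # entity kind -> last index used

    def placeholder_for(self, value: str, kind: str) -> str:
        if value not in self.forward:
            self.counters[kind] = self.counters.get(kind, 0) + 1
            self.forward[value] = f"<{kind}_{self.counters[kind]}>"
        return self.forward[value]


def mask_phones(text: str, session: ToySession) -> str:
    return PHONE_RE.sub(lambda m: session.placeholder_for(m.group(0), "PHONE"), text)


session = ToySession()
turn1 = mask_phones("phone 13812345678", session)
turn2 = mask_phones("call 13812345678 or 13900001111", session)
print(turn1)  # phone <PHONE_1>
print(turn2)  # call <PHONE_1> or <PHONE_2>
```

The number seen in the first turn keeps `<PHONE_1>` in the second turn. A new value gets the next free index.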

Installation

pip install yinshield

For local release validation:

python -m unittest discover -s tests -v
python benchmarks/run_benchmark.py --dataset benchmarks/mini_realistic_dataset.json --mode placeholder --strategy strict --output benchmarks/mini_realistic_results.placeholder.json
python benchmarks/run_benchmark.py --dataset benchmarks/mini_realistic_dataset.json --mode alias --strategy strict --output benchmarks/mini_realistic_results.alias.json
node --check openclaw-plugin/src/index.js
python -m build

Release

Prepare the next release version:

python scripts/sync_release_version.py 0.1.0
python scripts/check_version_consistency.py
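The contents of scripts/check_version_consistency.py are not shown here. The rough sketch below illustrates what such a check involves by comparing the version in a pyproject.toml-style file with the one in a package.json-style file for the OpenClaw plugin. Both file layouts and the helper name `find_mismatches` are assumptions.

```python
import json
import re
from typing import List

# Matches the first line starting with `version = "..."`; a real check should
# parse TOML properly, since other tables may also contain a `version` key.
PYPROJECT_VERSION_RE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)


def find_mismatches(pyproject_text: str, package_json_text: str, expected: str) -> List[str]:
    """Return one message per file whose version differs from `expected`."""
    match = PYPROJECT_VERSION_RE.search(pyproject_text)
    found = {
        "pyproject.toml": match.group(1) if match else None,
        "package.json": json.loads(package_json_text).get("version"),
    }
    return [f"{name}: {version!r} != {expected!r}" for name, version in found.items() if version != expected]


pyproject = '[project]\nname = "yinshield"\nversion = "0.1.0"\n'
package_json = '{"name": "@serein-213/openclaw-yinshield", "version": "0.1.0"}'
print(find_mismatches(pyproject, package_json, "0.1.0"))  # []
```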

Full release steps are documented in RELEASE.md.

Alpha release notes:

Quick Start For OpenClaw

pip install yinshield
python -m yinshield.install_openclaw
openclaw plugins install @serein-213/openclaw-yinshield
openclaw plugins enable openclaw-yinshield
yinshield serve

python -m yinshield.install_openclaw will:

  • generate an auth token
  • scaffold the OpenClaw plugin config
  • print the exact yinshield serve --auth-token ... command to run
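The three installer steps can be sketched with the standard library. This is a hypothetical illustration, not the installer's code. The function name `build_install_plan` is made up, and the config shape mirrors the plugin config JSON shown below. The real installer's file locations are not documented here.

```python
import secrets
import shlex
from typing import Tuple


def build_install_plan(
    base_url: str = "http://127.0.0.1:27811", mode: str = "placeholder"
) -> Tuple[dict, str]:
    token = secrets.token_urlsafe(32)  # random URL-safe bearer token
    plugin_config = {
        "plugins": {
            "entries": {
                "openclaw-yinshield": {
                    "enabled": True,
                    "config": {"baseUrl": base_url, "mode": mode, "authToken": token},
                }
            }
        }
    }
    # The same token must be passed to the server so the plugin can authenticate.
    command = shlex.join(["yinshield", "serve", "--auth-token", token])
    return plugin_config, command


config, serve_cmd = build_install_plan()
print(serve_cmd)
```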

Installed CLI alias:

yinshield-install-openclaw

Shell bootstrap for users who prefer a one-shot script:

bash scripts/setup-openclaw-yinshield.sh

If you host this script yourself, the curl-style one-liner would be:

curl -fsSL https://your-domain/setup-openclaw-yinshield.sh | bash

OpenClaw plugin config:

{
  "plugins": {
    "entries": {
      "openclaw-yinshield": {
        "enabled": true,
        "config": {
          "baseUrl": "http://127.0.0.1:27811",
          "mode": "placeholder",
          "authToken": "change-me"
        }
      }
    }
  }
}
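If an OpenClaw config file already lists other plugins, the entry above should be merged in rather than overwrite the file. The sketch below is a generic recursive merge. It assumes the config is plain JSON and that a deep merge matches how OpenClaw reads `plugins.entries`, which is not confirmed here. "some-other-plugin" is a hypothetical entry.

```python
import copy


def merge_config(base: dict, overlay: dict) -> dict:
    """Recursively merge `overlay` into a copy of `base`; overlay wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


existing = {"plugins": {"entries": {"some-other-plugin": {"enabled": True}}}}
yinshield_entry = {
    "plugins": {
        "entries": {
            "openclaw-yinshield": {
                "enabled": True,
                "config": {"baseUrl": "http://127.0.0.1:27811", "mode": "placeholder", "authToken": "change-me"},
            }
        }
    }
}
merged = merge_config(existing, yinshield_entry)
print(sorted(merged["plugins"]["entries"]))
```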

Basic Usage

from yinshield import Shield, ShieldSession

shield = Shield(
    mode="placeholder",   # or "alias"
    strategy="balanced",  # loose | balanced | strict
)

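# A session keeps entity -> replacement assignments consistent across calls
session = ShieldSession()
# "Recipient: Zhang San, phone 13812345678, shipping address: 88 Jianguo Road, Chaoyang District, Beijing."
raw_text = "收件人:张三,手机号13812345678,收货地址:北京市朝阳区建国路88号。"

# mask() returns the masked text and the replacement mapping
masked_text, mapping = shield.mask(raw_text, session=session)
print(masked_text)

# unmask() restores the original values using the same session
restored = shield.unmask(masked_text, session=session)
print(restored)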
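Unmasking with an explicit mapping (as in the POST /unmask payload) needs some care in alias mode, where one alias can be a prefix of another. The sketch below assumes the mapping is a plain replacement -> original dict. It is not YinShield's own unmask, and the aliases 陈明 and 陈明华 are made-up examples.

```python
import re
from typing import Dict


def unmask(text: str, mapping: Dict[str, str]) -> str:
    """Replace every key of `mapping` (replacement -> original) in one pass."""
    if not mapping:
        return text
    # Longest keys first: an alias such as 陈明 could be a prefix of another
    # alias such as 陈明华, and the longer one must win.
    alternatives = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in alternatives))
    # A single regex pass also means restored values are never rescanned.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


print(unmask("我叫<PERSON_1>,手机号<PHONE_1>", {"<PERSON_1>": "张三", "<PHONE_1>": "13812345678"}))
print(unmask("陈明华和陈明", {"陈明": "张三", "陈明华": "李四"}))
```

With the keys in the opposite order, the second call would turn 陈明华 into 张三华.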

Session Persistence

from yinshield import Shield

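shield = Shield(mode="alias", strategy="strict")
shield.mask("联系人:王小明,手机号13812345678。")  # "Contact: Wang Xiaoming, phone 13812345678."
shield.save_session("yinshield-session.json")

# A new Shield that loads the saved session reuses the same replacements
another = Shield(mode="alias", strategy="strict")
another.load_session("yinshield-session.json")
masked, _ = another.mask("请再次联系王小明,手机号13812345678。")  # "Please contact Wang Xiaoming again, phone 13812345678."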
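The on-disk format of a YinShield session file is not documented here. The sketch below shows a plausible way to persist a replacement -> original mapping as UTF-8 JSON with an atomic write. The helper names and the JSON layout are assumptions. Reloading a session must let replacements be mapped back to originals, so such a file presumably holds the original PII in plaintext and deserves the same protection as the raw data.

```python
import json
import os
import tempfile
from typing import Dict


def save_mapping(path: str, mapping: Dict[str, str]) -> None:
    # Write to a temporary file in the same directory, then atomically replace,
    # so an interrupted write never leaves a truncated file at `path`.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(mapping, f, ensure_ascii=False, indent=2)  # keep Chinese readable
    os.replace(tmp_path, path)


def load_mapping(path: str) -> Dict[str, str]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as workdir:
    session_path = os.path.join(workdir, "session.json")
    save_mapping(session_path, {"<PERSON_1>": "王小明", "<PHONE_1>": "13812345678"})
    restored = load_mapping(session_path)
print(restored)
```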

Local HTTP Service

Start the bridge:

yinshield serve

Default bind:

  • host: 127.0.0.1
  • port: 27811

Custom bind:

yinshield serve --host 127.0.0.1 --port 27811 --mode placeholder --strategy balanced --auth-token change-me

HTTP API:

POST /health

{}

POST /mask

{
  "text": "我叫张三,手机号13812345678",
  "mode": "placeholder",
  "session_id": "chat-1"
}

POST /unmask

{
  "text": "我叫<PERSON_1>,手机号<PHONE_1>",
  "mapping": {
    "<PERSON_1>": "张三",
    "<PHONE_1>": "13812345678"
  }
}

POST /messages/mask

{
  "messages": [
    { "role": "user", "content": "我叫张三,手机号13812345678" },
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "订单号20240324ABC123" }
      ]
    }
  ],
  "mode": "placeholder",
  "session_id": "chat-1"
}
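The /messages/mask payload accepts `content` either as a string or as a list of typed parts. The sketch below walks that structure and applies a masking function to each text segment. `mask_messages` and `toy_mask` are hypothetical, and non-text parts are passed through unchanged.

```python
import re
from typing import Callable, Dict, List


def mask_messages(messages: List[Dict], mask: Callable[[str], str]) -> List[Dict]:
    out = []
    for message in messages:
        message = dict(message)  # shallow copy; the caller's messages are not mutated
        content = message.get("content")
        if isinstance(content, str):
            message["content"] = mask(content)
        elif isinstance(content, list):
            message["content"] = [
                {**part, "text": mask(part["text"])} if part.get("type") == "text" else part
                for part in content
            ]
        out.append(message)
    return out


def toy_mask(text: str) -> str:
    # Stand-in for a real detector: only masks 11-digit mobile numbers.
    return re.sub(r"(?<!\d)1[3-9]\d{9}(?!\d)", "<PHONE>", text)


messages = [
    {"role": "user", "content": "我叫张三,手机号13812345678"},
    {"role": "user", "content": [{"type": "text", "text": "订单号20240324ABC123"}]},
]
result = mask_messages(messages, toy_mask)
print(result)
```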

Notes:

  • The HTTP service is stateless by default.
  • To reuse aliases/placeholders across turns, pass session_id.
  • If --auth-token is omitted, yinshield serve generates a temporary token and prints it.
  • Authenticate each request to the local service with the header Authorization: Bearer <token>.
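A minimal standard-library client for POST /mask is sketched below. The endpoint, port, payload fields and bearer header come from this README. The `Content-Type` header and the helper name `build_mask_request` are assumptions. The response format is not documented here, so the commented `json.load` call assumes a JSON body.

```python
import json
import urllib.request
from typing import Optional


def build_mask_request(
    text: str,
    token: str,
    session_id: Optional[str] = None,
    base_url: str = "http://127.0.0.1:27811",
) -> urllib.request.Request:
    payload = {"text": text, "mode": "placeholder"}
    if session_id is not None:
        payload["session_id"] = session_id  # reuse replacements across turns
    return urllib.request.Request(
        f"{base_url}/mask",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_mask_request("我叫张三,手机号13812345678", token="change-me", session_id="chat-1")
# With `yinshield serve --auth-token change-me` running, send it with:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```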

OpenAI-Compatible Wrapper

from yinshield import ShieldedOpenAI

client = ShieldedOpenAI(
    api_key="YOUR_OPENAI_API_KEY",
    base_url="https://api.openai.com/v1",  # DeepSeek / OpenAI-compatible providers also work
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "我叫张三,手机号是13812345678"}
    ],
)

print(response.choices[0].message.content)
# PII is masked before the request is sent, and the response is unmasked automatically

Current wrapper coverage:

  • client.chat.completions.create(...)
  • client.chat.completions.create(..., stream=True)
  • client.responses.create(...)
  • client.responses.create(..., stream=True)
  • await async_client.chat.completions.create(...)
  • await async_client.responses.create(...)
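This README does not describe how the wrapper unmasks streamed output. One general difficulty is that a placeholder such as `<PERSON_1>` can arrive split across chunks. The sketch below buffers a possibly incomplete placeholder before restoring values. The `<NAME_N>` pattern and the 32-character cap are assumptions based on the examples here, and alias mode would need a different approach.

```python
import re
from typing import Dict, Iterable, Iterator

PLACEHOLDER_RE = re.compile(r"<[A-Z]+_\d+>")
MAX_PLACEHOLDER_LEN = 32  # a "<..." tail this long cannot be a placeholder


def _restore(text: str, mapping: Dict[str, str]) -> str:
    # Unknown placeholders are left untouched.
    return PLACEHOLDER_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


def unmask_stream(chunks: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        cut = buffer.rfind("<")
        tail = buffer[cut:] if cut != -1 else ""
        if tail and ">" not in tail and len(tail) < MAX_PLACEHOLDER_LEN:
            # Hold back a possibly incomplete placeholder until more text arrives.
            ready, buffer = buffer[:cut], tail
        else:
            ready, buffer = buffer, ""
        if ready:
            yield _restore(ready, mapping)
    if buffer:
        yield _restore(buffer, mapping)  # flush whatever is left at end of stream


chunks = ["我叫<PER", "SON_1>,电话<PHO", "NE_1>。"]
mapping = {"<PERSON_1>": "张三", "<PHONE_1>": "13812345678"}
print("".join(unmask_stream(chunks, mapping)))
```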

Async Wrapper

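import asyncio

from yinshield import ShieldedAsyncOpenAI


async def main() -> None:
    client = ShieldedAsyncOpenAI(api_key="YOUR_OPENAI_API_KEY")

    response = await client.responses.create(
        model="gpt-4.1-mini",
        input="我叫张三,手机号13812345678",
    )

    print(response.output_text)


# `await` needs an async context; asyncio.run provides one in a plain script
asyncio.run(main())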

CLI

python -m yinshield --mode alias --strategy strict --session-file .yinshield.json \
  "收件人:张三,手机号13812345678,订单号20240324ABC123"

Run the local service with a persistent session file:

yinshield serve --session-file .yinshield-http-session.json

OpenClaw Installer

python -m yinshield.install_openclaw

Equivalent installed command:

yinshield-install-openclaw

Preview without writing files:

python -m yinshield.install_openclaw --print-only

Benchmark

Local benchmark script:

python benchmarks/run_benchmark.py --mode placeholder --strategy strict
python benchmarks/run_benchmark.py --mode alias --strategy strict
python benchmarks/run_benchmark.py --dataset benchmarks/mini_realistic_dataset.json --mode placeholder --strategy strict --output benchmarks/mini_realistic_results.placeholder.json
python benchmarks/run_benchmark.py --dataset benchmarks/mini_realistic_dataset.json --mode alias --strategy strict --output benchmarks/mini_realistic_results.alias.json

Current sample-set results:

  • placeholder + strict: precision=1.0 recall=1.0 false_positive_rate=0.0 recovery_rate=1.0 semantic_proxy=0.3662
  • alias + strict: precision=1.0 recall=1.0 false_positive_rate=0.0 recovery_rate=1.0 semantic_proxy=0.8182

Mini realistic-set results:

  • placeholder + strict: precision=0.9765 recall=0.9765 false_positive_rate=0.0645 recovery_rate=1.0 semantic_proxy=0.321
  • alias + strict: precision=0.954 recall=0.9765 false_positive_rate=0.129 recovery_rate=0.9032 semantic_proxy=0.75
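The sketch below computes entity-level precision and recall using the standard definitions with exact span matching. This README does not say how run_benchmark.py matches spans or defines false_positive_rate and recovery_rate, so it illustrates the metrics only and does not reproduce the benchmark. The spans are made-up data.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type)


def precision_recall(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float]:
    tp = len(gold & predicted)  # exact match on start, end and type
    # Convention chosen here: with nothing predicted (or nothing to find), score 1.0.
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(gold) if gold else 1.0
    return precision, recall


gold = {(2, 4, "PERSON"), (8, 19, "PHONE")}
predicted = {(2, 4, "PERSON"), (8, 19, "PHONE"), (22, 26, "ORDER_ID")}
p, r = precision_recall(gold, predicted)
print(round(p, 4), r)  # 0.6667 1.0
```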

The current sample set includes:

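  • Chinese identity fields and business identifiers
  • English names, US phone, SSN, DOB, EIN, MRN, tracking number
  • Mixed Chinese-English names and addresses
  • English address variants with Apt/Unit/Suite
  • Negative samples that check for false positives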

The mini realistic set adds:

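  • 30 small evaluation samples closer to real-world distributions
  • Chinese customer service / finance / healthcare / logistics
  • English customer records / compliance / healthcare / logistics
  • Mixed Chinese-English text
  • Stricter negative samples and recovery-rate checks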

semantic_proxy is only a local format-preservation heuristic, not a downstream LLM task benchmark.

Coverage Audit

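Current rule coverage amounts to de-identification of high-frequency, explicitly labeled fields in Chinese and English business text, plus weak context-based recognition. It is not general-purpose semantic NER.

Supported:

  • Basic identity: Chinese names, English names, mobile numbers, US phone, Chinese resident ID numbers, SSN, birthday, DOB, email, WeChat ID
  • Addresses and locations: Chinese residential address variants, English street addresses, English addresses with Apt/Unit/Suite
  • Business and finance: company names, Unified Social Credit Code, tax ID, EIN, bank card, bank account number, account-opening bank
  • Travel, vehicles and devices: license plate, passport, VIN, IPv4 address
  • Medical and business identifiers: medical record number, MRN, order number, courier waybill number, tracking number, customer ID, membership number, contract number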
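Several of the supported identifiers have a built-in check digit that a detector can use to reject look-alike digit strings. This README does not say whether YinShield uses it. The sketch below implements the standard check for 18-digit Chinese resident ID numbers (GB 11643-1999, ISO 7064 MOD 11-2). The function name is hypothetical.

```python
import re

WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"  # indexed by (weighted sum mod 11)


def is_valid_cn_id(candidate: str) -> bool:
    # 17 ASCII digits followed by a digit or X (case-insensitive).
    if not re.fullmatch(r"[0-9]{17}[0-9Xx]", candidate):
        return False
    total = sum(int(d) * w for d, w in zip(candidate[:17], WEIGHTS))
    return candidate[17].upper() == CHECK_CHARS[total % 11]


print(is_valid_cn_id("11010519491231002X"))  # True
print(is_valid_cn_id("110105194912310021"))  # False: wrong check character
```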

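Partially supported:

  • Chinese names: strong after cues such as 我叫 ("my name is"), 联系人 ("contact"), 收件人 ("recipient") and 患者 ("patient"); recognition of names in free narrative sentences is still limited
  • Chinese addresses: strong for province/city/district/road/number formats; still limited for colloquial addresses, shortened park or building names, and short addresses without an administrative-division prefix
  • English names and company names: reliable for explicit fields and some natural phrasings, but still limited for long complex sentences, abbreviations, and cross-sentence references
  • alias mode: on more realistic English company names and English addresses, recovery rate and false-positive rate are still worse than in placeholder mode
  • Company information: company names and Unified Social Credit Code/EIN are reliable, but legal representative, account holder name, business license number and similar fields are not covered yet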
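A minimal regex sketch shows why cue-based names are caught while narrative names are missed. The cue list comes from the bullet above. The assumption that a name is 2 to 3 Han characters and the function name `find_cued_names` are illustrative, not YinShield's rules.

```python
import re
from typing import List

# Cue word, optional full-width or ASCII colon, then 2-3 Han characters.
# The fixed length is a crude boundary rule: in "我叫张三今年30岁" it would
# wrongly capture 张三今, which is why entity boundaries are hard.
CUE_NAME_RE = re.compile(r"(?:我叫|联系人|收件人|患者)[::]?\s*([\u4e00-\u9fff]{2,3})")


def find_cued_names(text: str) -> List[str]:
    return CUE_NAME_RE.findall(text)


print(find_cued_names("收件人:张三,手机号13812345678"))  # ['张三']
print(find_cued_names("昨天张三来过店里"))  # [] (narrative sentence, no cue)
```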

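Unsupported or still weak:

  • MAC addresses, GPS coordinates, precise geolocation
  • Invoice numbers, device serial numbers, organization codes, vehicle fields other than license plates
  • True semantic entity recognition, entity disambiguation, weak-context inference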

Next

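  • Broader English entity support
  • OpenClaw auto-starting the local service
  • More robust context recognition and entity boundaries
  • Anthropic / LiteLLM / LangChain integration
  • More realistic semantic evaluation on downstream tasks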

License

Apache-2.0 License

Download files


Source Distribution

yinshield-0.1.0.tar.gz (41.5 kB)

Uploaded Source

Built Distribution


yinshield-0.1.0-py3-none-any.whl (41.5 kB)

Uploaded Python 3

File details

Details for the file yinshield-0.1.0.tar.gz.

File metadata

  • Download URL: yinshield-0.1.0.tar.gz
  • Upload date:
  • Size: 41.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for yinshield-0.1.0.tar.gz:

  • SHA256: 4c41a4885564d3ecfeb4f33724a4338cc3329d9970807eb23bc4612d13194b43
  • MD5: 3b163737636542d14f9bffa58d059bbe
  • BLAKE2b-256: 16e23965badd354e9dec7d4fed9c9bd2e726f9fd58500c8966f2a0055faab9c3


File details

Details for the file yinshield-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: yinshield-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 41.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for yinshield-0.1.0-py3-none-any.whl:

  • SHA256: 7bea2fcf01b10369637d70f632ba9fe8f55b6abe40714a33c111ec56efc6b118
  • MD5: e20221f195ffb9a40f4b36475512b2bb
  • BLAKE2b-256: 415ac8a240b2f8c79e7e676791d1a4e04fee4bdac9a39943b257c881a8297c85
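A downloaded file can be checked against the SHA256 digests above with a short standard-library sketch. The function names are illustrative. The in-memory demo hashes b"abc", whose SHA-256 is the standard test vector.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 16) -> str:
    """Hash a binary stream in fixed-size chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    for block in iter(lambda: stream.read(chunk_size), b""):
        digest.update(block)
    return digest.hexdigest()


def sha256_of_file(path: str) -> str:
    with open(path, "rb") as f:
        return sha256_of_stream(f)


def matches(actual_hex: str, expected_hex: str) -> bool:
    # Digests are compared case-insensitively after trimming whitespace.
    return actual_hex.lower() == expected_hex.strip().lower()


demo = sha256_of_stream(io.BytesIO(b"abc"))
print(demo)
# For the source distribution:
#   matches(sha256_of_file("yinshield-0.1.0.tar.gz"),
#           "4c41a4885564d3ecfeb4f33724a4338cc3329d9970807eb23bc4612d13194b43")
```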
