Streaming keyword guard for LLM output.

LLM Stream Guard

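A Python SDK for streaming sensitive-word blocking. It does not wrap any model vendor's SDK; it only takes an arbitrary AsyncIterable[str] text stream and, before the text is sent to the client, performs deterministic sensitive-word detection, cross-chunk blocking, and safe-prefix output.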

License: MIT

Installation

Install from PyPI:

pip install llm-stream-guard

Install from a local wheel:

uv pip install ./dist/llm_stream_guard-0.1.1-py3-none-any.whl

Or:

python -m pip install ./dist/llm_stream_guard-0.1.1-py3-none-any.whl

Word list

Your application maintains its own word-list file, for example:

# block_words.txt
hydrangea
violet comet
forbidden nebula

Format rules:

  • One block word per line.
  • Blank lines are ignored.
  • Lines starting with # are ignored.
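
The format rules above can be sketched as a small parser. This is only an illustration, not the SDK's own loader: parse_block_words and load_block_words are hypothetical names, and trimming surrounding whitespace and reading the file as UTF-8 are assumptions.

```python
from pathlib import Path


def parse_block_words(text: str) -> list[str]:
    """Parse word-list text: one word per line, skipping blanks and '#' comments."""
    words = []
    for raw_line in text.splitlines():
        line = raw_line.strip()  # assumption: surrounding whitespace is not significant
        if not line or line.startswith("#"):
            continue
        words.append(line)
    return words


def load_block_words(path: str) -> list[str]:
    # Hypothetical helper; assumes the file is UTF-8 encoded.
    return parse_block_words(Path(path).read_text(encoding="utf-8"))


sample = "# block_words.txt\nhydrangea\n\nviolet comet\nforbidden nebula\n"
print(parse_block_words(sample))  # ['hydrangea', 'violet comet', 'forbidden nebula']
```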

Basic usage

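from llm_stream_guard import BlockedEvent, DeltaEvent, StreamGuard

guard = StreamGuard.from_file("block_words.txt")


async def guarded_stream(model_stream, cancel_model):
    # `yield` is only valid inside a function, so the loop lives in an async generator.
    async for event in guard.wrap(model_stream, on_block=cancel_model):
        if isinstance(event, DeltaEvent):
            # Text that has already been checked and can be forwarded to the client.
            yield event.text
        elif isinstance(event, BlockedEvent):
            # A block word was hit: report it and stop streaming.
            yield {"type": "blocked", "word": event.word}
            break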

Model stream integration

The SDK only requires the model output to be an AsyncIterable[str]:

from collections.abc import AsyncIterable


async def model_stream() -> AsyncIterable[str]:
    async for chunk in native_model_sdk_stream:
        yield extract_text_delta(chunk)

Then hand it to the guard:

guard = StreamGuard.from_file("block_words.txt")

async for event in guard.wrap(model_stream(), on_block=cancel_model):
    ...
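
For local experiments without a real model, a stub stream is often enough. The sketch below is an assumption-laden stand-in: fake_model_stream and collect are hypothetical helpers, and the fixed chunk size only imitates how a model might split its output.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_model_stream(text: str, chunk_size: int = 5) -> AsyncIterator[str]:
    """Yield `text` in fixed-size pieces, imitating a chunked model response."""
    for start in range(0, len(text), chunk_size):
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield text[start:start + chunk_size]


async def collect(stream) -> list[str]:
    # Drain any async iterable of strings into a list.
    return [chunk async for chunk in stream]


chunks = asyncio.run(collect(fake_model_stream("hello hydrangea world")))
print(chunks)  # ['hello', ' hydr', 'angea', ' worl', 'd']
```

Because the stub is an async generator, fake_model_stream(...) can be passed wherever model_stream() appears above; note that it happens to split hydrangea across two chunks.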

Cross-chunk blocking

If the word list contains:

hydrangea

and the model output is split into:

chunk1: "hello hydr"
chunk2: "angea world"

the SDK first emits the safe prefix:

hello 

and returns a BlockedEvent once the second chunk completes the match, so hydrangea is never leaked to the client.
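
One way to picture this behavior is a buffer that holds back any tail that could still grow into a block word. The sketch below is a simplified, synchronous illustration under that assumption, not the SDK's actual algorithm; split_safe_prefix and guard_chunks are hypothetical names, and events are plain tuples rather than DeltaEvent and BlockedEvent.

```python
def split_safe_prefix(buffer: str, words: list[str]) -> tuple[str, str]:
    """Return (safe, held); `held` is the longest tail that is a prefix of a block word."""
    longest = max((len(w) for w in words), default=1)
    max_hold = min(len(buffer), longest - 1)
    for size in range(max_hold, 0, -1):
        tail = buffer[-size:]
        if any(w.startswith(tail) for w in words):
            return buffer[:-size], tail
    return buffer, ""


def guard_chunks(chunks, words):
    """Yield ("delta", text) events, then ("blocked", word) on the first hit."""
    held = ""
    for chunk in chunks:
        buffer = held + chunk
        hits = [(buffer.find(w), w) for w in words if w in buffer]
        if hits:
            index, word = min(hits)  # earliest match wins
            if buffer[:index]:
                yield ("delta", buffer[:index])
            yield ("blocked", word)
            return
        safe, held = split_safe_prefix(buffer, words)
        if safe:
            yield ("delta", safe)
    if held:
        yield ("delta", held)  # stream ended without completing a match


print(list(guard_chunks(["hello hydr", "angea world"], ["hydrangea"])))
# [('delta', 'hello '), ('blocked', 'hydrangea')]
```

Holding back only the longest matching tail is sufficient here because any shorter candidate tail is contained in it. If the stream continues with something harmless (for example "ant", giving "hydrant"), the held text is released on the next chunk.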

Directory word lists

All .txt files in a directory can be loaded:

guard = StreamGuard.from_directory("/path/to/Vocabulary", min_word_length=2)

If you use konsheng/Sensitive-lexicon:

git clone https://github.com/konsheng/Sensitive-lexicon.git /tmp/Sensitive-lexicon

guard = StreamGuard.from_directory(
    "/tmp/Sensitive-lexicon/Vocabulary",
    min_word_length=2,
)

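Note: general-purpose sensitive-word lists tend to include short words and words with a high false-positive rate. For production, it is better to filter them and write the result into your own block_words.txt.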
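
A filtering pass of that kind might look like the sketch below. It is a hypothetical helper (build_block_words is not part of the SDK), and it assumes UTF-8 files, one word per line, and that dropping entries shorter than min_word_length is the only filter you want; real curation usually also needs manual review.

```python
from pathlib import Path


def build_block_words(source_dir: str, output_path: str, min_word_length: int = 2) -> int:
    """Merge every .txt file in source_dir into one de-duplicated word list."""
    words: set[str] = set()
    for path in sorted(Path(source_dir).glob("*.txt")):
        # Assumption: files are UTF-8; undecodable bytes are replaced rather than fatal.
        content = path.read_text(encoding="utf-8", errors="replace")
        for raw_line in content.splitlines():
            word = raw_line.strip()
            if not word or word.startswith("#"):
                continue
            if len(word) < min_word_length:
                continue  # very short entries are a common source of false positives
            words.add(word)
    Path(output_path).write_text("\n".join(sorted(words)) + "\n", encoding="utf-8")
    return len(words)
```

The resulting file follows the same one-word-per-line format, so it can be passed to StreamGuard.from_file.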

Normalization

To match variants such as different letter case, full-width/half-width forms, or inserted symbols:

guard = StreamGuard.from_file(
    "block_words.txt",
    drop_separators=True,
)

For example, if the word list contains:

violetcomet

it can match:

VIOLET-comet
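
As a rough mental model of what such normalization involves, the sketch below folds width and case with the standard library and optionally strips separators. It is an assumption, not the SDK's implementation: normalize is a hypothetical function, and treating every non-alphanumeric character as a separator is a guess at what drop_separators means.

```python
import unicodedata


def normalize(text: str, drop_separators: bool = False) -> str:
    # NFKC folds full-width forms such as "ＶＩＯＬＥＴ" into their ASCII equivalents.
    folded = unicodedata.normalize("NFKC", text).casefold()
    if drop_separators:
        # Assumption: a "separator" is any character that is not a letter or digit.
        folded = "".join(ch for ch in folded if ch.isalnum())
    return folded


print(normalize("VIOLET-comet", drop_separators=True))  # violetcomet
print(normalize("ＶＩＯＬＥＴ　comet", drop_separators=True))  # violetcomet
```

Applying the same function to both the word list and the incoming text keeps the comparison symmetric.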

Local verification

uv run pytest python_tests
uv run python examples/basic_usage.py
uv run python examples/agno_terminal_chat.py
uv run python scripts/test_wheel_package.py

scripts/test_wheel_package.py rebuilds the wheel, installs it into a temporary venv, and runs the post-install usage tests.

Packaging

uv build

Artifacts:

dist/llm_stream_guard-0.1.1-py3-none-any.whl
dist/llm_stream_guard-0.1.1.tar.gz

Publishing to PyPI

First create a PyPI API token, then publish:

uv publish --token "$PYPI_TOKEN"

Alternatively, use an environment variable:

export UV_PUBLISH_TOKEN="pypi-..."
uv publish

It is advisable to publish to TestPyPI first to verify:

uv publish \
  --publish-url https://test.pypi.org/legacy/ \
  --token "$TEST_PYPI_TOKEN"

After the official release, colleagues can run:

pip install llm-stream-guard

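Note:

  • PyPI package names must be unique; when I checked, llm-stream-guard returned 404, which means the name was not yet taken.
  • The same version number cannot be uploaded twice. After a fix, bump the version number, for example to 0.1.1.
  • The wheel does not include your application's word list; colleagues need to provide their own block_words.txt or another word-list path.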

Publishing to a private index

If your company has a private pip index:

uv publish \
  --publish-url "https://your-private-index.example.com/legacy/" \
  --token "$PRIVATE_PYPI_TOKEN"

Colleagues install with:

pip install \
  --index-url "https://your-private-index.example.com/simple/" \
  llm-stream-guard

Anthropic-Compatible trial script

The repository includes a trial script that talks to a real model:

uv run python scripts/chat_anthropic_compatible.py

Environment variables:

ANTHROPIC_COMPATIBLE_API_KEY=
ANTHROPIC_COMPATIBLE_BASE_URL=https://open.bigmodel.cn/api/anthropic
ANTHROPIC_COMPATIBLE_MODEL=glm-5.1

This script uses the Anthropic Messages-compatible protocol: POST /v1/messages, the anthropic-version header, and content_block_delta SSE events. The SDK core is not tied to Anthropic, OpenAI, or BigModel.
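
To show how content_block_delta events can become the plain text stream the SDK expects, here is a minimal sketch that pulls text deltas out of already-received SSE lines. It is not the repository's script: text_deltas is a hypothetical function, it works on an in-memory list instead of a live HTTP response, and the sample payloads are illustrative.

```python
import json
from collections.abc import Iterable, Iterator


def text_deltas(sse_lines: Iterable[str]) -> Iterator[str]:
    """Yield the text carried by content_block_delta events, ignoring everything else."""
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        payload = json.loads(line[len("data:"):].strip())
        if payload.get("type") != "content_block_delta":
            continue
        delta = payload.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")


sample = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "hello "}}',
    "",
    "event: ping",
    'data: {"type": "ping"}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "world"}}',
]
print("".join(text_deltas(sample)))  # hello world
```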

Agno terminal chat example

To use the BIGMODEL_* settings from the current project's .env directly:

uv sync --group dev
uv run python examples/agno_current_env_chat.py

The script reads:

BIGMODEL_API_KEY=
BIGMODEL_BASE_URL=https://open.bigmodel.cn/api/anthropic
BIGMODEL_MODEL=glm-5.1

If your application uses Agno's agent.arun(..., stream=True), you can run:

uv sync --group dev
uv run python examples/agno_terminal_chat.py

Environment variables:

AGNO_OPENAI_API_KEY=
AGNO_OPENAI_BASE_URL=
AGNO_MODEL_ID=gpt-4o-mini

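If AGNO_OPENAI_API_KEY is not set, the example falls back to OPENAI_API_KEY. The example depends on agno and openai, which are listed only in the dev dependencies and do not become runtime dependencies of the SDK.

The core adaptation logic converts Agno events into a text stream: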

async def agno_text_stream(agent, prompt):
    async for event in agent.arun(prompt, stream=True):
        text = getattr(event, "content", None)
        if isinstance(text, str) and text:
            yield text

Then pass it to:

guard.wrap(agno_text_stream(agent, prompt), on_block=cancel_agent_stream)
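
The adapter can be exercised without Agno installed by feeding it stand-in events. In the sketch below, FakeEvent and FakeAgent are hypothetical test doubles that only mimic the shape used above (an async-iterable arun and a content attribute); they say nothing about Agno's real event classes.

```python
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeEvent:
    content: Any = None


class FakeAgent:
    """Hypothetical stand-in for an Agno agent; only mimics arun(..., stream=True)."""

    def __init__(self, events):
        self._events = events

    async def arun(self, prompt, stream=True):
        for event in self._events:
            yield event


async def agno_text_stream(agent, prompt):
    async for event in agent.arun(prompt, stream=True):
        text = getattr(event, "content", None)
        if isinstance(text, str) and text:  # drop None, empty, and non-text content
            yield text


async def main():
    events = [FakeEvent("hel"), FakeEvent(None), FakeEvent(""), FakeEvent(42), FakeEvent("lo")]
    return [text async for text in agno_text_stream(FakeAgent(events), "hi")]


print(asyncio.run(main()))  # ['hel', 'lo']
```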

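Project structure

llm_stream_guard/                     SDK source code
python_tests/                         Behavior tests
examples/basic_usage.py               Minimal usage example
examples/agno_current_env_chat.py     Agno chat example using the current .env
examples/agno_terminal_chat.py        Agno terminal chat example
scripts/test_wheel_package.py         Wheel installation check
scripts/chat_anthropic_compatible.py  Anthropic-compatible trial script
block_words.txt                       Local test word list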


