
A sensitive-word filtering module based on the DFA algorithm


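DFAFilter: Sensitive-Word Filtering Module

Overview

An efficient sensitive-word filtering system built on the DFA (Deterministic Finite Automaton) algorithm. It supports:

  • Real-time sensitive-word detection
  • Automatic replacement of sensitive words
  • Dynamic word-list updates
  • Filtering of mixed Chinese and English text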
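To make the DFA idea concrete, here is a minimal sketch of how such a filter could be built as a character trie whose nodes act as automaton states. The class TrieFilter and everything inside it are hypothetical and written for illustration; this is not the module's actual implementation. It also assumes that every character of a matched word is replaced by the replacement character, which the documentation does not state.

```python
_END = object()  # sentinel key marking "a word ends at this state"


class TrieFilter:
    """Hypothetical stand-in for a DFA-style filter (illustration only)."""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.add(word)

    def add(self, word):
        """Insert a word: each character is one state transition."""
        if not word:
            return
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True

    def _match_len(self, text, start):
        """Length of the longest word beginning at `start`, or 0."""
        node, longest = self.root, 0
        for offset in range(start, len(text)):
            node = node.get(text[offset])
            if node is None:
                break
            if _END in node:
                longest = offset - start + 1
        return longest

    def check(self, text):
        """True if any word from the trie occurs in the text."""
        return any(self._match_len(text, i) for i in range(len(text)))

    def filter(self, text, replace_char="*"):
        """Replace every matched word with replace_char, one per character."""
        out, i = [], 0
        while i < len(text):
            n = self._match_len(text, i)
            if n:
                out.append(replace_char * n)
                i += n
            else:
                out.append(text[i])
                i += 1
        return "".join(out)


demo = TrieFilter(["敏感词1", "bad"])
print(demo.filter("a bad 敏感词1!"))
```

Each scan position walks at most as many states as the longest word in the trie, so with a bounded word length the total work grows linearly with the text length.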

Quick Start

1. Configure the sensitive-word list

Add the following to config.toml:

[DFAFilter]
# Initial sensitive-word list (one word per line)
initial_words = """
敏感词1
敏感词2
敏感词3
"""

2. Install the module

pip install ErisPulse-DFAFilter

Or install from a local checkout:

pip install -e .

3. Use the module

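from ErisPulse import sdk

# Get the DFA filter instance
dfa = sdk.DFAFilter

# Check for sensitive words ("测试文本" means "test text")
has_sensitive = dfa.check("测试文本")

# Filter a text ("包含敏感词的文本" means "text containing a sensitive word")
filtered_text = dfa.filter("包含敏感词的文本")

# Add a new word
dfa.add("新敏感词")

# Replace the word list (one word per line)
dfa.update("新词1\n新词2\n新词3")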

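API Reference

check(text: str) -> bool

Purpose: checks whether the text contains a sensitive word.

Parameters:

  • text: the text to check

Returns:

  • True: the text contains a sensitive word
  • False: the text contains no sensitive word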

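filter(text: str, replace_char: str = '*') -> str

Purpose: filters the sensitive words in the text by replacing them with the given character.

Parameters:

  • text: the original text
  • replace_char: the replacement character, '*' by default

Returns:

  • The filtered text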

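add(word: str) -> bool

Purpose: adds a single sensitive word.

Parameters:

  • word: the sensitive word to add

Returns:

  • True: the word was added
  • False: the word is already in the word list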

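remove(word: str) -> bool

Purpose: removes a single sensitive word.

Parameters:

  • word: the sensitive word to remove

Returns:

  • True: the word was removed
  • False: the word does not exist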

list() -> list

Purpose: lists all sensitive words.

Returns:

  • The list of sensitive words

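update(words_str: str) -> bool

Purpose: replaces the sensitive-word list entirely (the existing word list is cleared first).

Parameters:

  • words_str: a multi-line string of sensitive words (one word per line)

Returns:

  • True: the update succeeded
  • False: the update failed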

clear() -> bool

Purpose: clears the sensitive-word list (the initial configuration is kept).

Returns:

  • True: the word list was cleared
  • False: clearing failed

Examples

Basic usage

from ErisPulse import sdk

dfa = sdk.DFAFilter

# Check whether the text contains sensitive words
text = "这是一段包含敏感词1和敏感词2的测试文本"
if dfa.check(text):
    filtered = dfa.filter(text)  # mask sensitive words (replaced with * by default)
    print(f"Filtered result: {filtered}")

Adding a sensitive word at runtime

# Add a single sensitive word
dfa.add("新敏感词")
print(f"Check result after adding: {dfa.check('包含新敏感词的文本')}")

Replacing the word list in bulk

# Replace the whole word list (the existing word list is cleared first)
new_words = "词A\n词B\n词C"
dfa.update(new_words)
print(f"Current word list: {dfa.list()}")

Removing a sensitive word

# Remove the given sensitive word
dfa.remove("敏感词1")
print(f"Check result after removal: {dfa.check('包含敏感词1的文本')}")

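Technical Details

Data storage

  • Initial configuration: stored under the initial_words key of the [DFAFilter] table in config.toml
  • Dynamic storage: persisted to a SQLite database through sdk.storage
  • On load, the initial configuration and the dynamic storage are merged automatically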
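The merge on load can be pictured as an order-preserving union of the two word sources. The function merge_words below is a hypothetical sketch; the module's real merge order and deduplication rules are not documented, so the "initial words first" ordering is an assumption.

```python
def merge_words(initial, dynamic):
    """Combine both sources, dropping blanks and duplicates.

    Assumption: initial words come first and the first occurrence wins.
    """
    cleaned = (word.strip() for word in [*initial, *dynamic])
    # dict.fromkeys keeps insertion order while removing duplicates.
    return list(dict.fromkeys(word for word in cleaned if word))


merged = merge_words(["敏感词1", "敏感词2"], ["敏感词2", "新敏感词"])
print(merged)
```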

Lazy loading

The module is lazily loaded: it is initialized only on first access, which speeds up startup.
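One common way to get this behaviour in Python is functools.cached_property, which defers the expensive load until the attribute is first read and then reuses the result. The class LazyFilter and its loader callback are hypothetical; the module may implement lazy loading differently, and the substring check here is a deliberate simplification rather than a DFA.

```python
from functools import cached_property


class LazyFilter:
    """Hypothetical wrapper that loads its word list on first use only."""

    def __init__(self, load_words):
        # Nothing is loaded here, so construction stays cheap.
        self._load_words = load_words

    @cached_property
    def words(self):
        # Runs once, on first access; the result is cached on the instance.
        return frozenset(self._load_words())

    def check(self, text):
        # Simplified substring scan, just to trigger the lazy load.
        return any(word in text for word in self.words)
```

Constructing LazyFilter costs nothing up front; the loader runs the first time check() or words is used.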

Persistence

All dynamically added sensitive words are persisted to the database automatically and survive a restart.
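The effect can be illustrated with the standard-library sqlite3 module standing in for sdk.storage, whose API is not shown here. The class SqliteWordStore, the table name words, and the helper demo_roundtrip are all hypothetical; the sketch only shows that a word written by one connection is still there after reopening the database file.

```python
import os
import sqlite3
import tempfile


class SqliteWordStore:
    """Hypothetical store; a stand-in for sdk.storage, not its real API."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY)"
            )

    def add(self, word):
        """Insert the word; return False if it was already stored."""
        with self.conn:  # commits on success
            cur = self.conn.execute(
                "INSERT OR IGNORE INTO words (word) VALUES (?)", (word,)
            )
        return cur.rowcount == 1

    def load(self):
        rows = self.conn.execute("SELECT word FROM words ORDER BY word")
        return [row[0] for row in rows]

    def close(self):
        self.conn.close()


def demo_roundtrip():
    """Write with one connection, then read back with a fresh one."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "words.db")
        first = SqliteWordStore(path)
        added = first.add("新敏感词")
        duplicate = first.add("新敏感词")
        first.close()  # simulates shutting the application down

        second = SqliteWordStore(path)  # simulates a restart
        reloaded = second.load()
        second.close()
    return added, duplicate, reloaded
```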

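Configuration

config.toml settings

[DFAFilter]
# Initial sensitive-word list (one word per line)
initial_words = """
敏感词1
敏感词2
敏感词3
"""

Note: clear() only clears the dynamically stored sensitive words; it does not delete the initial word list in the configuration file.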

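Performance Characteristics

  • DFA-based, with O(n) time complexity, where n is the length of the text
  • Filters mixed Chinese and English text
  • Strips special characters automatically; only Chinese characters, digits, and letters are checked
  • The dynamic word list is persisted automatically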
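The special-character stripping can be sketched with a regular expression that keeps only digits, ASCII letters, and common CJK ideographs. The function strip_special and the exact character ranges are assumptions; the module's real rule set is not documented here and may cover a different range.

```python
import re

# Assumption: "Chinese characters" means the CJK Unified Ideographs block
# U+4E00..U+9FFF, and "letters" means ASCII A-Z and a-z.
_NOT_ALLOWED = re.compile(r"[^0-9A-Za-z\u4e00-\u9fff]")


def strip_special(text):
    """Remove everything except Chinese characters, digits, and letters."""
    return _NOT_ALLOWED.sub("", text)


print(strip_special("敏*感-词 1!"))
```

Stripping before matching means a word split by punctuation or spaces, as in the example above, is reduced to its plain form before the automaton sees it.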

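Version History

2.0.0

  • Ported to the ErisPulse 2+ architecture
  • Inherits from the BaseModule base class
  • Uses standard pyproject.toml configuration
  • Migrated to sdk.storage for persistent storage
  • Implements full lifecycle management

1.1.1

  • Initial release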
