Sensitive-word filtering module based on the DFA algorithm
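DFAFilter: sensitive-word filtering module
Overview
An efficient sensitive-word filter built on a DFA (Deterministic Finite Automaton). It supports:
- Real-time detection of sensitive words
- Automatic replacement of sensitive words
- Updating the word list at runtime
- Filtering of mixed Chinese and English text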
Quick start
1. Configure the word list
Add the following to config.toml:
[DFAFilter]
# Initial word list (one word per line)
initial_words = """
敏感词1
敏感词2
敏感词3
"""
2. Install the module
pip install ErisPulse-DFAFilter
Or install from a local checkout:
pip install -e .
3. Use the module
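from ErisPulse import sdk
# Get the DFA filter instance
dfa = sdk.DFAFilter
# Check whether a text contains sensitive words
has_sensitive = dfa.check("测试文本")
# Filter a text
filtered_text = dfa.filter("包含敏感词的文本")
# Add a new word
dfa.add("新敏感词")
# Replace the whole word list (one word per line)
dfa.update("新词1\n新词2\n新词3")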
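API reference
check(text: str) -> bool
Purpose: check whether the text contains any sensitive word.
Parameters:
- text: the text to check
Returns:
- True: the text contains a sensitive word
- False: the text contains no sensitive word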
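filter(text: str, replace_char: str = '*') -> str
Purpose: filter the sensitive words in the text by replacing them with the given character.
Parameters:
- text: the original text
- replace_char: the replacement character, '*' by default
Returns:
- the filtered text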
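add(word: str) -> bool
Purpose: add a single sensitive word.
Parameters:
- word: the sensitive word to add
Returns:
- True: the word was added
- False: the word is already in the word list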
remove(word: str) -> bool
Purpose: remove a single sensitive word.
Parameters:
- word: the sensitive word to remove
Returns:
- True: the word was removed
- False: the word does not exist
list() -> list
Purpose: list all sensitive words.
Returns:
- the list of sensitive words
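update(words_str: str) -> bool
Purpose: replace the entire word list (the existing word list is cleared first).
Parameters:
- words_str: a multi-line string of sensitive words (one word per line)
Returns:
- True: the update succeeded
- False: the update failed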
clear() -> bool
Purpose: clear the word list (the initial configuration is kept).
Returns:
- True: the word list was cleared
- False: clearing failed
Examples
Basic usage
from ErisPulse import sdk
dfa = sdk.DFAFilter
# Check whether the text contains sensitive words
text = "这是一段包含敏感词1和敏感词2的测试文本"
if dfa.check(text):
    filtered = dfa.filter(text)  # Filter sensitive words (replaced with '*' by default)
    print(f"Filtered result: {filtered}")
Adding a sensitive word at runtime
# Add a single sensitive word
dfa.add("新敏感词")
print(f"Check result after adding: {dfa.check('包含新敏感词的文本')}")
Replacing the word list in bulk
# Replace the whole word list (the existing word list is cleared first)
new_words = "词A\n词B\n词C"
dfa.update(new_words)
print(f"Current word list: {dfa.list()}")
Removing a sensitive word
# Remove the given sensitive word
dfa.remove("敏感词1")
print(f"Check result after removal: {dfa.check('包含敏感词1的文本')}")
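Technical details
Data storage
- Initial configuration: stored under the initial_words key of the [DFAFilter] table in config.toml
- Dynamic storage: persisted to a SQLite database through sdk.storage
- On load, the initial configuration and the dynamic storage are merged automatically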
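The merge step can be pictured as a set union of the configured words and the persisted ones. The sketch below is only an illustration: the API of `sdk.storage` is not shown in this document, so it uses the standard-library `sqlite3` module directly as a stand-in, and the table name `dynamic_words` and all function names are hypothetical.

```python
import sqlite3


def ensure_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per dynamically added word.
    conn.execute("CREATE TABLE IF NOT EXISTS dynamic_words (word TEXT PRIMARY KEY)")


def add_dynamic_word(conn: sqlite3.Connection, word: str) -> None:
    # INSERT OR IGNORE keeps the table free of duplicates.
    conn.execute("INSERT OR IGNORE INTO dynamic_words (word) VALUES (?)", (word,))
    conn.commit()


def load_merged_words(conn: sqlite3.Connection, initial_words: list[str]) -> set[str]:
    """Union of the configured initial words and the persisted dynamic words."""
    ensure_schema(conn)
    rows = conn.execute("SELECT word FROM dynamic_words").fetchall()
    return set(initial_words) | {row[0] for row in rows}


conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
ensure_schema(conn)
add_dynamic_word(conn, "新敏感词")
add_dynamic_word(conn, "敏感词1")  # also present in the initial list

merged = load_merged_words(conn, ["敏感词1", "敏感词2"])
print(sorted(merged))
```

Using a set for the merged result means a word that appears in both sources is stored once.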
Lazy loading
The module is loaded lazily: it is initialized only on first access, which shortens startup time.
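The module's own loading code is not shown here. As a generic sketch of the idea, `functools.cached_property` defers an expensive load until the attribute is first read and then reuses the result. The class `LazyWordList` and its `load_count` counter are hypothetical names used only for this illustration.

```python
from functools import cached_property
from typing import Callable, Iterable


class LazyWordList:
    """Hypothetical holder that loads its words only on first access."""

    def __init__(self, loader: Callable[[], Iterable[str]]):
        self._loader = loader
        self.load_count = 0  # how many times the loader actually ran

    @cached_property
    def words(self) -> frozenset[str]:
        # Runs once; the result is then cached on the instance.
        self.load_count += 1
        return frozenset(self._loader())


holder = LazyWordList(lambda: ["敏感词1", "敏感词2"])
count_before = holder.load_count   # nothing has been loaded yet
first = holder.words               # triggers the load
second = holder.words              # served from the cache
print(count_before, holder.load_count)
```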
Persistence
Every sensitive word added at runtime is persisted to the database automatically and is still present after a restart.
Configuration
config.toml
[DFAFilter]
# Initial word list (one word per line)
initial_words = """
敏感词1
敏感词2
敏感词3
"""
Note: clear() only removes the dynamically stored sensitive words; it does not delete the initial word list in the configuration file.
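To make the note concrete, the following is a small in-memory model of two separate word sets, one from the configuration and one added at runtime. It is an assumption-laden illustration rather than the module's implementation: the class `WordListModel` is hypothetical, and it only models `add()` and `clear()` as described in this document.

```python
class WordListModel:
    """Hypothetical model: initial words (from config) plus dynamic words."""

    def __init__(self, initial_words):
        self.initial = set(initial_words)  # never touched by clear()
        self.dynamic = set()               # words added at runtime

    def add(self, word: str) -> bool:
        # Mirrors the documented return value: False if the word already exists.
        if word in self.initial or word in self.dynamic:
            return False
        self.dynamic.add(word)
        return True

    def clear(self) -> bool:
        # Only the dynamic set is emptied; the initial words survive.
        self.dynamic.clear()
        return True

    def words(self) -> list[str]:
        return sorted(self.initial | self.dynamic)


model = WordListModel(["敏感词1", "敏感词2"])
added_new = model.add("新敏感词")       # True: new word
added_existing = model.add("敏感词1")   # False: already in the initial list
model.clear()
print(model.words())
```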
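Performance characteristics
- Based on the DFA algorithm, with O(n) time complexity, where n is the length of the text
- Supports filtering of mixed Chinese and English text
- Special characters are skipped automatically; only Chinese characters, digits, and letters are checked
- The dynamic word list is persisted automatically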
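The points above can be sketched with a character trie, a common way to realize a DFA-style word filter. This is a minimal illustration and not the module's source: the class `TrieFilter`, the helper `_is_checked`, the CJK range U+4E00 to U+9FFF, and the one-replacement-character-per-matched-character behavior are all assumptions. The scan restarts at every position, so its cost is roughly the text length times the longest word length, which is linear in n when word lengths are bounded.

```python
_END = ""  # end-of-word marker; cannot collide with a single-character key


def _is_checked(ch: str) -> bool:
    """Assumption: only CJK ideographs, ASCII digits, and ASCII letters count."""
    return "\u4e00" <= ch <= "\u9fff" or (ch.isascii() and ch.isalnum())


class TrieFilter:
    """Hypothetical trie-based filter, not the module's implementation."""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.add(word)

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            if _is_checked(ch):
                node = node.setdefault(ch, {})
        if node is not self.root:  # ignore words with no checked characters
            node[_END] = True

    def _match_end(self, text: str, start: int) -> int:
        """Index just past the longest match beginning at start, or -1."""
        if not _is_checked(text[start]):
            return -1
        node, end = self.root, -1
        for i in range(start, len(text)):
            ch = text[i]
            if not _is_checked(ch):
                continue  # skip special characters inside a candidate match
            node = node.get(ch)
            if node is None:
                break
            if _END in node:
                end = i + 1
        return end

    def check(self, text: str) -> bool:
        return any(self._match_end(text, i) != -1 for i in range(len(text)))

    def filter(self, text: str, replace_char: str = "*") -> str:
        out, i = [], 0
        while i < len(text):
            end = self._match_end(text, i)
            if end == -1:
                out.append(text[i])
                i += 1
            else:
                out.append(replace_char * (end - i))
                i = end
        return "".join(out)


demo = TrieFilter(["敏感词1", "bad"])
print(demo.check("这是敏感词1"), demo.filter("a bad day"))
```

Skipping special characters during matching is what lets a spaced-out or punctuated variant such as "b-a-d" still match in this sketch.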
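Version history
2.0.0
- Ported to the ErisPulse2+ architecture
- Inherits from the BaseModule base class
- Uses the standard pyproject.toml configuration
- Migrated to sdk.storage for persistent storage
- Implements full lifecycle management
1.1.1
- Initial release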