A simple word filtering library for Python.
simple-word-filter
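simple-word-filter is a lightweight, extensible sensitive-word filtering library for Python. It ships with several matching algorithms and can be integrated quickly into text moderation or content filtering workflows.
Key features
- Multiple matching modes: three built-in matchers, simple, regex, and trie, let you trade accuracy against performance as needed.
- Extensible architecture: register a custom matcher with a decorator to support special matching strategies.
- Unified API: contains, match_all, match_first, replace, and related methods behave consistently across all matchers.
- Built-in benchmarking: WordFilter.matcher_speed_test quickly estimates how fast each matcher runs.
- Modern Python: built on Python 3.10+ type annotations, so the code is easy to read and maintain.
Requirements
- Python 3.10 or later
Installation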
pip install simple-word-filter
Quick start
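from simple_word_filter import WordFilter

# Sample blocklist: "敏感词" (sensitive word), "违禁品" (contraband), "badword"
blocked = ["敏感词", "违禁品", "badword"]
wf = WordFilter(blocked, mode="trie")

# "This is a piece of text containing a sensitive word"
text = "这是一段包含敏感词的文本"

wf.contains(text)
# True
wf.match_all(text)
# [('敏感词', 6)]  (word, start index)
wf.replace(text, repl_char="*")
# '这是一段包含***的文本'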
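To make the semantics of match_all and replace concrete, here is a minimal standalone sketch of a sequential-scan filter. It does not use the library and is not its implementation: the helper names find_all and mask are hypothetical, and the (word, start index) result shape is assumed from the examples in this README.

```python
def find_all(text: str, words: list[str]) -> list[tuple[str, int]]:
    """Return (word, start_index) for every occurrence of every word."""
    matches = []
    for word in words:
        if not word:
            continue  # skip empty strings, which would match at every position
        start = text.find(word)
        while start != -1:
            matches.append((word, start))
            # Step one character forward so overlapping hits are found too.
            start = text.find(word, start + 1)
    # Sort by position so results read left to right.
    return sorted(matches, key=lambda m: m[1])


def mask(text: str, words: list[str], repl_char: str = "*") -> str:
    """Replace every character covered by a match with repl_char."""
    chars = list(text)
    for word, start in find_all(text, words):
        for i in range(start, start + len(word)):
            chars[i] = repl_char
    return "".join(chars)


words = ["敏感词", "违禁品", "badword"]
text = "这是一段包含敏感词的文本"
print(find_all(text, words))  # [('敏感词', 6)]
print(mask(text, words))      # 这是一段包含***的文本
```

Masking character by character keeps the output the same length as the input, which is convenient when offsets into the text must stay valid after filtering.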
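Choosing a matching mode
| Mode | Best for | Characteristics |
|---|---|---|
| simple | Small word lists; the simplest implementation | Scans the text sequentially; easy to understand, moderate performance |
| regex | Cases that need regular-expression features | Supports complex pattern matching; flexible, but more expensive to build |
| trie | Large word lists; performance-sensitive use | Based on a trie; efficient lookups |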
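The "more expensive to build" note for regex mode can be illustrated with a standalone sketch that compiles a word list into a single alternation pattern. This is an assumption about how such a matcher could work, not the library's code; build_pattern is a hypothetical helper.

```python
import re


def build_pattern(words: list[str]) -> re.Pattern[str]:
    """Compile a word list into one alternation pattern (the up-front cost)."""
    # Escape each word so characters like "." are matched literally, and put
    # longer alternatives first so "badword" is preferred over its prefix "bad".
    escaped = sorted((re.escape(w) for w in words if w), key=len, reverse=True)
    if not escaped:
        return re.compile(r"(?!)")  # a pattern that never matches
    return re.compile("|".join(escaped))


pattern = build_pattern(["bad", "badword", "a.b"])
hits = [(m.group(), m.start()) for m in pattern.finditer("a badword, a.b and axb")]
print(hits)  # [('badword', 2), ('a.b', 11)]
```

Note that finditer returns non-overlapping matches only, so a regex-based matcher built this way reports at most one word per text position.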
Call BaseMatcher.available_matchers() to list the modes that are currently available.
from simple_word_filter import BaseMatcher
print(BaseMatcher.available_matchers())
# ['simple', 'regex', 'trie']
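For intuition about why trie mode suits large word lists, here is a minimal standalone trie matcher. It is a sketch under assumptions, not the library's implementation: TrieNode, build_trie, and trie_match_all are hypothetical names, and matches are reported as (word, start index) tuples.

```python
class TrieNode:
    """One node per character; is_end marks the end of a stored word."""

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.is_end = False


def build_trie(words: list[str]) -> TrieNode:
    root = TrieNode()
    for word in words:
        if not word:
            continue  # an empty word would mark the root itself as a match
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root


def trie_match_all(root: TrieNode, text: str) -> list[tuple[str, int]]:
    """Walk the trie from every start position and collect all stored words."""
    matches = []
    for start in range(len(text)):
        node = root
        for end in range(start, len(text)):
            node = node.children.get(text[end])
            if node is None:
                break  # no stored word continues with this character
            if node.is_end:
                matches.append((text[start:end + 1], start))
    return matches


root = build_trie(["foo", "foobar", "bar"])
print(trie_match_all(root, "xfoobar"))
# [('foo', 1), ('foobar', 1), ('bar', 4)]
```

The work done at each start position is bounded by the longest stored word rather than by the number of words, which is the usual reason a trie scales better than scanning the list word by word.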
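Custom matchers
from simple_word_filter import BaseMatcher


@BaseMatcher.matcher("suffix")
class SuffixMatcher(BaseMatcher):
    """Matches listed words that appear at the very end of the text."""

    def match_all(self, text: str):
        matches = []
        for word in self._word_list:
            if text.endswith(word):
                # Report the word together with its start index.
                matches.append((word, len(text) - len(word)))
        return matches

    def match_first(self, text: str):
        # Compute the matches once instead of calling match_all twice.
        matches = self.match_all(text)
        return matches[0] if matches else None


# Once registered, the matcher can be used like a built-in mode.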
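Decorator-based registration of this kind is commonly built on a class-level registry. The following standalone sketch shows one way it could work; the Matcher class and its register and create methods are hypothetical stand-ins and do not reflect the internals of BaseMatcher.

```python
from typing import Callable


class Matcher:
    """Hypothetical base class that keeps a name -> class registry."""

    _registry: dict[str, type["Matcher"]] = {}

    def __init__(self, word_list: list[str]) -> None:
        self._word_list = list(word_list)

    @classmethod
    def register(cls, name: str) -> Callable[[type["Matcher"]], type["Matcher"]]:
        """Return a class decorator that stores the class under `name`."""
        def decorator(matcher_cls: type["Matcher"]) -> type["Matcher"]:
            if name in cls._registry:
                raise ValueError(f"matcher {name!r} is already registered")
            cls._registry[name] = matcher_cls
            return matcher_cls  # hand the class back unchanged
        return decorator

    @classmethod
    def create(cls, name: str, word_list: list[str]) -> "Matcher":
        """Look up a registered class by name and instantiate it."""
        return cls._registry[name](word_list)

    def match_all(self, text: str) -> list[tuple[str, int]]:
        raise NotImplementedError


@Matcher.register("suffix")
class SuffixMatcher(Matcher):
    def match_all(self, text: str) -> list[tuple[str, int]]:
        return [
            (word, len(text) - len(word))
            for word in self._word_list
            if word and text.endswith(word)
        ]


matcher = Matcher.create("suffix", ["bar", "foo"])
print(matcher.match_all("foobar"))  # [('bar', 3)]
print(sorted(Matcher._registry))    # ['suffix']
```

Because the decorator returns the class unchanged, a registered matcher can still be imported and instantiated directly, and looking it up by name is simply an additional entry point.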
Quick performance check
from simple_word_filter import WordFilter
best_filter = WordFilter.matcher_speed_test(
word_list=["foo", "bar", "baz"],
sample_words=["foo", "bar", "baz", "qux"],
)
print(best_filter.mode)
# Prints the mode that ran fastest in the test
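The general idea behind such a speed test can be sketched with the standard timeit module. This standalone example compares two hypothetical strategies, a plain scan and a precompiled regex, and does not reproduce how matcher_speed_test itself measures or selects a matcher; all function and variable names here are illustrative.

```python
import re
import timeit


def scan_contains(words: list[str], text: str) -> bool:
    """Plain scan: check each word with the `in` operator."""
    return any(word in text for word in words)


def regex_contains(pattern: re.Pattern[str], text: str) -> bool:
    """Precompiled alternation: one search call per text."""
    return pattern.search(text) is not None


words = ["foo", "bar", "baz"]
samples = ["foo", "bar", "baz", "qux"]
pattern = re.compile("|".join(map(re.escape, words)))

# Each candidate processes the full sample list once per call.
candidates = {
    "scan": lambda: [scan_contains(words, s) for s in samples],
    "regex": lambda: [regex_contains(pattern, s) for s in samples],
}

# Take the best of several repeats to reduce noise from other processes.
timings = {
    name: min(timeit.repeat(fn, number=1000, repeat=3))
    for name, fn in candidates.items()
}
fastest = min(timings, key=timings.get)
print(timings)
print(fastest)
```

Results depend on the word list, the sample texts, and the machine, so it is worth benchmarking with data that resembles real traffic.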
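Developer guide
git clone https://github.com/Sparrived/simple-word-filter.git
cd simple-word-filter
uv sync --dev # or install the development dependencies with pip
Run the tests:
pytest
Release process
The repository is configured with GitHub Actions. After a commit that changes __version__ in src/simple_word_filter/__init__.py is pushed to master, the workflow automatically:
- Builds the distribution packages and uploads them to a GitHub Release (tagged v<version>).
- Uploads the same artifacts to PyPI.
You can also trigger the Upload Python Package workflow manually on GitHub.
License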
MIT License © Sparrived