Text preprocessing.
Proces
🐨 Text preprocessing.
1 Installation
⚠️ Note:
- Local installation requires Python 3.6 or later;
- Use the latest version of proces whenever possible.
Install with pip
pip install proces -U
Install from source
git clone https://github.com/Ailln/proces.git
cd proces && python setup.py install
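After either installation route, it can be useful to confirm which version ended up in the environment. The following is a minimal sketch using the standard library's `importlib.metadata` (available from Python 3.8, which is newer than the 3.6 minimum noted above). The helper name `installed_version` is hypothetical and not part of proces.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed version string, or None when proces is not installed.
print(installed_version("proces"))
```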
2 Usage
from proces import preprocess
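# By default the steps run in order: handle blank characters, uppercase to lowercase, Traditional to Simplified Chinese, full-width to half-width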
result = preprocess("Today, 你 幹 什 麼 !")
# result: today,你干什么!
# Configure the pipeline, for example to remove blank characters only
result = preprocess("Today, 你 幹 什 麼 !", pipelines=["handle_blank_character"])
# result: Today,你幹什麼!
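The `pipelines` argument above selects which steps run and in what order. As an illustration of that idea only (not the proces implementation), here is a minimal sketch of a name-to-function pipeline. The names `STEPS` and `run_pipeline` are hypothetical, the two step functions are simple stand-ins, and using `uppercase_to_lowercase` as a pipeline step name is an assumption based on the function name.

```python
from typing import Callable, Dict, List, Optional


def remove_blank(text: str) -> str:
    """Stand-in step: drop every run of whitespace."""
    return "".join(text.split())


def to_lower(text: str) -> str:
    """Stand-in step: lowercase the text."""
    return text.lower()


# Hypothetical registry; insertion order doubles as the default execution order.
STEPS: Dict[str, Callable[[str], str]] = {
    "handle_blank_character": remove_blank,
    "uppercase_to_lowercase": to_lower,
}


def run_pipeline(text: str, pipelines: Optional[List[str]] = None) -> str:
    """Apply the named steps in order; all registered steps when none are given."""
    names = list(STEPS) if pipelines is None else pipelines
    for name in names:
        if name not in STEPS:
            raise ValueError(f"unknown pipeline step: {name}")
        text = STEPS[name](text)
    return text


print(run_pipeline("Today, 你 幹 什 麼 !", pipelines=["handle_blank_character"]))
# Today,你幹什麼!
```

Keeping the steps in a registry keyed by name lets callers reorder or drop steps without touching the step functions themselves.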
# Use the individual methods on their own
from proces import filter_unusual_characters, filter_
from proces import handle_blank_character
from proces import uppercase_to_lowercase
from proces import traditional_to_simplified
from proces import full_angle_to_half_angle
from proces import handle_substitute
# Remove unusual characters
result = filter_unusual_characters("【你是个恶魔😈啊�】")
# result: 【你是个恶魔啊】
# The short alias filter_ works as well
result = filter_("【你是个恶魔😈啊�】")
# result: 【你是个恶魔啊】
# Handle blank characters
result = handle_blank_character("空 白 字 符")
# result: 空白字符
result = handle_blank_character("空 白 字 符", ",")
# result: 空,白,字,符
# Uppercase to lowercase
result = uppercase_to_lowercase("UP to low")
# result: up to low
# Traditional to Simplified Chinese
result = traditional_to_simplified("我幹什麼不干你事")
# result: 我干什么不干你事
# Full-width to half-width
result = full_angle_to_half_angle("你好!")
# result: 你好!
# Substitute some characters
result = handle_substitute("你好!/:-", r"/:-", "表情")
# result: 你好!表情
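For readers unfamiliar with the full-width to half-width step, the conversion is commonly done by shifting code points: the full-width forms U+FF01 to U+FF5E sit exactly 0xFEE0 above their ASCII counterparts, and the ideographic space U+3000 maps to a plain space. The sketch below illustrates that general technique. It is not taken from proces, and the function name `to_half_width` is hypothetical.

```python
def to_half_width(text: str) -> str:
    """Map full-width ASCII variants and the ideographic space to half-width."""
    converted = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                 # ideographic (full-width) space
            converted.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:     # full-width '!' through '~'
            converted.append(chr(code - 0xFEE0))
        else:                              # everything else is left untouched
            converted.append(ch)
    return "".join(converted)


print(to_half_width("你好!"))  # 你好!
```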
## Sensitive information filtering
from proces import mask_phone, mask_address
# Mask phone numbers
result = mask_phone("手机号 13397238231")
# result: 手机号 133********
# Mask addresses
result = mask_address("我在浙江杭州余杭区")
# result: 我在浙江杭州***
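The phone-masking output above keeps the first three digits and replaces the remaining eight with asterisks. A minimal regex-based sketch that reproduces this output for 11-digit mainland China mobile numbers follows. It is an assumption about one way to get this behaviour, not the proces implementation, and `mask_phone_sketch` is a hypothetical name.

```python
import re

# An 11-digit number starting with 1, not embedded in a longer digit run.
_PHONE = re.compile(r"(?<!\d)(1\d{2})\d{8}(?!\d)")


def mask_phone_sketch(text: str) -> str:
    """Keep the first three digits of each matched number and mask the other eight."""
    return _PHONE.sub(lambda m: m.group(1) + "*" * 8, text)


print(mask_phone_sketch("手机号 13397238231"))  # 手机号 133********
```

The lookarounds keep longer digit strings, such as order numbers, from being partially masked.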
3 TODO
- Add a way to get all methods available to preprocess
- Decorators
4 License