Proces

Pypi MIT License stars

🐨 Text preprocessing.

1 Installation

⚠️ Note:

  1. Local installation requires Python 3.6 or later;
  2. Use the latest version of proces whenever possible.

Install with pip

pip install proces -U

Install from the source repository

git clone https://github.com/Ailln/proces.git
cd proces && python setup.py install
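
After either installation route, a quick environment check can confirm the two points from the note above. This is a minimal sketch using only the standard library; the helper names `python_is_supported` and `proces_is_installed` are hypothetical and are not part of proces.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in the installation note


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter meets the stated minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def proces_is_installed():
    """Return True if a top-level module named `proces` can be found."""
    return importlib.util.find_spec("proces") is not None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("proces installed:", proces_is_installed())
```

If the second line prints `False`, the package is not visible to the interpreter that ran the script, which usually means pip installed it into a different environment.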

2 Usage

from proces import preprocess

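# By default these steps run in order: delete blank characters, uppercase to lowercase, Traditional to Simplified Chinese, full-width to half-width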
result = preprocess("Today, 你 幹 什 麼 !")
# result: today,你干什么!

# Configure the pipeline, for example to delete only blank characters
result = preprocess("Today, 你 幹 什 麼 !", pipelines=["delete_blank_character"])
# result: Today,你幹什麼!

# Use the individual functions on their own
from proces import delete_blank_character
from proces import uppercase_to_lowercase
from proces import traditional_to_simplified
from proces import full_angle_to_half_angle

# Delete blank characters
result = delete_blank_character("空 白 字 符")
# result: 空白字符

# Uppercase to lowercase
result = uppercase_to_lowercase("UP to low")
# result: up to low

# Traditional Chinese to Simplified Chinese
result = traditional_to_simplified("我幹什麼不干你事")
# result: 我干什么不干你事

# Full-width to half-width (the input ends with the full-width "!", U+FF01)
result = full_angle_to_half_angle("你好!")
# result: 你好!
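
To make the idea of an ordered, configurable pipeline concrete, here is a minimal standalone sketch. It is not the proces implementation: every function below is a hypothetical stand-in, the Traditional-to-Simplified table covers only the two characters used in the example, and of the step names only "delete_blank_character" is confirmed by the usage above (the other three keys are assumed to match the function names). The full-width conversion relies on the fixed Unicode offset 0xFEE0 between U+FF01..U+FF5E and their ASCII counterparts.

```python
import re

# Hypothetical, deliberately tiny table for illustration only;
# real Traditional-to-Simplified conversion needs a full dictionary.
_T2S_DEMO = str.maketrans({"幹": "干", "麼": "么"})


def strip_blanks(text):
    """Remove every whitespace character, including the full-width space."""
    return re.sub(r"\s+", "", text)


def to_lower(text):
    """Lowercase cased characters; Chinese characters are unaffected."""
    return text.lower()


def to_simplified_demo(text):
    """Replace only the characters listed in the demo table."""
    return text.translate(_T2S_DEMO)


def to_half_width(text):
    """Map full-width forms (U+FF01..U+FF5E) and U+3000 to ASCII."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:  # ideographic (full-width) space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:
            code -= 0xFEE0  # fixed offset to the ASCII counterpart
        out.append(chr(code))
    return "".join(out)


# Step names mapped to functions; insertion order is the default order.
STEPS = {
    "delete_blank_character": strip_blanks,
    "uppercase_to_lowercase": to_lower,
    "traditional_to_simplified": to_simplified_demo,
    "full_angle_to_half_angle": to_half_width,
}


def run_pipeline(text, pipelines=None):
    """Apply the named steps in order; None means every step."""
    names = list(STEPS) if pipelines is None else pipelines
    for name in names:
        text = STEPS[name](text)
    return text


if __name__ == "__main__":
    print(run_pipeline("Today, 你 幹 什 麼 !"))
    print(run_pipeline("Today, 你 幹 什 麼 !", pipelines=["delete_blank_character"]))
```

Keeping the steps in a name-to-function mapping means a caller can select, drop, or reorder steps by passing a list of names, which mirrors how the `pipelines` argument is used above.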

3 TODO

  • preprocess test
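
One possible shape for that test, as a sketch only: it reuses the inputs and expected results from the usage section above, and skips itself when proces is not installed. The class name `PreprocessTest` is hypothetical.

```python
import importlib.util
import unittest

# Skip the whole test case when the package is not importable.
HAS_PROCES = importlib.util.find_spec("proces") is not None


@unittest.skipUnless(HAS_PROCES, "proces is not installed")
class PreprocessTest(unittest.TestCase):
    def test_default_pipeline(self):
        from proces import preprocess

        # Expected value copied from the usage section.
        self.assertEqual(preprocess("Today, 你 幹 什 麼 !"), "today,你干什么!")

    def test_single_step(self):
        from proces import preprocess

        result = preprocess(
            "Today, 你 幹 什 麼 !", pipelines=["delete_blank_character"]
        )
        # Expected value copied from the usage section.
        self.assertEqual(result, "Today,你幹什麼!")


if __name__ == "__main__":
    unittest.main()
```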

4 License

