A package for processing complex text with mixed Chinese and English characters
Complex Text Tools
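A Python package for processing complex text that mixes Chinese and English characters. It removes redundant spaces and computes text length according to specific rules.
Features
- Removes redundant spaces between Chinese characters
- Removes redundant spaces between Chinese and English characters
- Handles spacing around punctuation correctly
- Computes text length according to specific rules (Chinese characters, English words, numbers, equations, and so on)
- Fixes punctuation in Chinese text (converts English punctuation to Chinese punctuation)
- Processes mixed-language text efficiently
Installation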
pip install complex-text-tools
Usage
Removing extra spaces
from complex_text_tools import remove_extra_spaces
text = "这 是 中文 测试 文本 , mixed English text here , 还 有 symbols : ; ! "
clean_text = remove_extra_spaces(text)
print(clean_text)
# Output: "这是中文测试文本,mixed English text here,还有 symbols:;!"
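To illustrate the kind of rule involved, here is a minimal regex sketch of space removal. It is not the package's implementation: the helper name `strip_cjk_spaces`, the choice of the CJK Unified Ideographs range as "Chinese characters", and the punctuation set are all assumptions made for this example, and its output may differ from that of `remove_extra_spaces`.

```python
import re

# Assumption: "Chinese character" means the CJK Unified Ideographs block only.
CJK = r"\u4e00-\u9fff"
# Assumption: an illustrative set of ASCII and full-width punctuation marks.
PUNCT = r",.;:!?\uff0c\u3002\uff1b\uff1a\uff01\uff1f"


def strip_cjk_spaces(text: str) -> str:
    """Hypothetical helper; a sketch, not the package's own code."""
    # 1. Drop whitespace that sits between two Chinese characters.
    text = re.sub(rf"(?<=[{CJK}])\s+(?=[{CJK}])", "", text)
    # 2. Drop whitespace that precedes a punctuation mark.
    text = re.sub(rf"\s+(?=[{PUNCT}])", "", text)
    # 3. Drop whitespace between a punctuation mark and a Chinese character.
    text = re.sub(rf"(?<=[{PUNCT}])\s+(?=[{CJK}])", "", text)
    return text.strip()
```

Spaces between English words are left alone because neither lookaround matches a Latin letter, so only the spacing next to Chinese characters and punctuation is touched.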
Computing effective text length
from complex_text_tools import count_eff_len
text = "这是一段包含 English words 和 123.45 数字的 mixed 文本"
result = count_eff_len(text)
print(result)
# Output: 15
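The exact counting rules are not spelled out here, so the following is only a rough sketch of one possible rule set: each Chinese character, each English word, and each number counts as one unit. The name `rough_effective_length` and these rules are assumptions for illustration. Equations are not handled, and the result need not match what `count_eff_len` returns.

```python
import re

# Assumed rules: every alternative below counts as one unit of length.
TOKEN = re.compile(
    r"[\u4e00-\u9fff]"             # a single Chinese character
    r"|\d+(?:\.\d+)?"              # an integer or decimal number
    r"|[A-Za-z]+(?:'[A-Za-z]+)*"   # an English word, apostrophes allowed
)


def rough_effective_length(text: str) -> int:
    """Hypothetical helper; counts tokens under the assumed rules above."""
    # findall returns one entry per match because the pattern has no
    # capturing groups; spaces and punctuation match nothing and are skipped.
    return len(TOKEN.findall(text))
```

Counting tokens rather than characters is what keeps a long English word or a decimal number from inflating the length relative to Chinese text.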
Fixing punctuation
from complex_text_tools import fix_punctuation
text = "这是中文文本,但使用了英文标点.这看起来不太自然,对吗?"
fixed_text = fix_punctuation(text)
print(fixed_text)
# Output: "这是中文文本,但使用了英文标点。这看起来不太自然,对吗?"
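A conversion like this can be sketched as a lookup table applied only where an ASCII punctuation mark directly follows a Chinese character. The helper name `to_chinese_punctuation`, the mapping, and the "follows a Chinese character" condition are assumptions for this example; `fix_punctuation` may use different rules.

```python
import re

# Assumed mapping from ASCII punctuation to full-width Chinese punctuation.
ASCII_TO_FULLWIDTH = {
    ",": "\uff0c",  # full-width comma
    ".": "\u3002",  # ideographic full stop
    "?": "\uff1f",  # full-width question mark
    "!": "\uff01",  # full-width exclamation mark
    ":": "\uff1a",  # full-width colon
    ";": "\uff1b",  # full-width semicolon
}

# Match an ASCII mark only when the preceding character is a Chinese character.
_AFTER_CJK = re.compile(r"(?<=[\u4e00-\u9fff])[,.?!:;]")


def to_chinese_punctuation(text: str) -> str:
    """Hypothetical helper; a sketch, not the package's own code."""
    return _AFTER_CJK.sub(lambda m: ASCII_TO_FULLWIDTH[m.group()], text)
```

Requiring a preceding Chinese character leaves decimal points in numbers and punctuation inside English sentences unchanged.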
License
This project is licensed under the MIT License. See the LICENSE file for details.