A package for processing complex text with mixed Chinese and English characters
Complex Text Tools
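A Python package for processing complex text that mixes Chinese and English characters. It normalizes spacing, fixes punctuation, and computes text length according to specific rules.
Features
- Normalize spacing: automatically adds a space between Chinese text and English words or digits, and removes spaces between Chinese characters
- Fix punctuation: converts between Chinese and English punctuation based on context, and repairs mismatched brackets
- Compute effective length: follows Word's word-count rules and supports Chinese date formats
Installation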
pip install complex-text-tools
Usage
Normalizing spaces
from complex_text_tools import remove_extra_spaces
# Automatically add spaces between Chinese and English text
text1 = "这是中文English文本"
print(remove_extra_spaces(text1))
# Output: "这是中文 English 文本"
# Automatically add spaces between Chinese text and digits
text2 = "数量是100个"
print(remove_extra_spaces(text2))
# Output: "数量是 100 个"
# Remove spaces between Chinese characters
text3 = "这 是 中 文"
print(remove_extra_spaces(text3))
# Output: "这是中文"
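The three spacing rules above can be approximated with a few regular expressions. This is a minimal sketch for illustration only, not the package's implementation: the name `normalize_spaces_sketch` is hypothetical, and it assumes "Chinese" means the CJK Unified Ideographs range U+4E00 to U+9FFF and "English/digits" means ASCII letters and digits.

```python
import re

# Assumption of this sketch: only the basic CJK Unified Ideographs block.
CJK = r"\u4e00-\u9fff"


def normalize_spaces_sketch(text: str) -> str:
    """Hypothetical helper; not part of complex_text_tools."""
    # Remove whitespace that sits between two Chinese characters.
    text = re.sub(rf"(?<=[{CJK}])\s+(?=[{CJK}])", "", text)
    # Insert a space where a Chinese character is followed by an ASCII letter or digit.
    text = re.sub(rf"(?<=[{CJK}])(?=[A-Za-z0-9])", " ", text)
    # Insert a space where an ASCII letter or digit is followed by a Chinese character.
    text = re.sub(rf"(?<=[A-Za-z0-9])(?=[{CJK}])", " ", text)
    return text


print(normalize_spaces_sketch("这是中文English文本"))  # 这是中文 English 文本
print(normalize_spaces_sketch("数量是100个"))          # 数量是 100 个
print(normalize_spaces_sketch("这 是 中 文"))          # 这是中文
```

Lookbehind and lookahead assertions keep each match zero-width, so existing characters are never consumed and text that is already spaced correctly passes through unchanged.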
Computing effective text length
from complex_text_tools import count_eff_len
# Counting a Chinese date (following Word's rules)
text = "2024年1月15日"
print(count_eff_len(text))
# Output: 6 (2024=1, 年=1, 1=1, 月=1, 15=1, 日=1)
# Counting mixed text
text2 = "这是一段包含 English 和 123.45 的文本"
print(count_eff_len(text2))
# Output: 13
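The date breakdown above suggests a counting rule in which each Chinese character counts as one unit and each unbroken run of other non-space characters (a number, an English word) also counts as one. The sketch below implements only that simplified rule. The name `count_units_sketch` is hypothetical, the CJK range is an assumption, and the package's actual rules (for punctuation or decimals, for instance) are not documented here, so its counts may differ on other inputs.

```python
import re

# One token is either a single CJK ideograph, or a maximal run of
# characters that are neither whitespace nor CJK ideographs.
TOKEN = re.compile(r"[\u4e00-\u9fff]|[^\s\u4e00-\u9fff]+")


def count_units_sketch(text: str) -> int:
    """Hypothetical helper; a simplified Word-like count, not count_eff_len."""
    return len(TOKEN.findall(text))


print(count_units_sketch("2024年1月15日"))  # 6
print(count_units_sketch("Hello 世界"))     # 3
```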
Fixing punctuation
from complex_text_tools import fix_punctuation
# Convert parentheses based on the surrounding context
text1 = "(3)中文内容"
print(fix_punctuation(text1))
# Output: "(3)中文内容"
text2 = "(3)english"
print(fix_punctuation(text2))
# Output: "(3)english"
# Repair mismatched square brackets
text3 = "[测试】内容"
print(fix_punctuation(text3))
# Output: "【测试】内容"
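The mixed-bracket repair in the last example can be illustrated with a pair of regular expressions. This is a sketch under assumptions, not the package's code: `fix_mixed_brackets_sketch` is a hypothetical name, and the policy of always normalizing a mismatched pair to full-width `【】` is inferred from the single example above.

```python
import re

# A pair whose two ends come from different bracket sets, with no other
# square bracket of either kind in between.
ASCII_OPEN = re.compile(r"\[([^\[\]【】]*)】")   # "[" ... "】"
ASCII_CLOSE = re.compile(r"【([^\[\]【】]*)\]")  # "【" ... "]"


def fix_mixed_brackets_sketch(text: str) -> str:
    """Hypothetical helper; rewrites mismatched pairs as full-width brackets."""
    text = ASCII_OPEN.sub(r"【\1】", text)
    text = ASCII_CLOSE.sub(r"【\1】", text)
    return text


print(fix_mixed_brackets_sketch("[测试】内容"))  # 【测试】内容
print(fix_mixed_brackets_sketch("【测试]内容"))  # 【测试】内容
print(fix_mixed_brackets_sketch("[ok] 内容"))    # [ok] 内容
```

Pairs that already match are left alone, so the function can be applied repeatedly without changing its result.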
License
This project is licensed under the MIT License. See the LICENSE file for details.
File details
Details for the file complex_text_tools-0.3.0.tar.gz.
File metadata
- Download URL: complex_text_tools-0.3.0.tar.gz
- Upload date:
- Size: 7.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf116fa2f780f463727d6c67d7d1fa575061698c7c896a51a1b8442835d19ada |
| MD5 | 8748ac38e0828275f9c09cc0e54e105c |
| BLAKE2b-256 | f0261aa0f358d56e8531c27681c99f0606c309588f7f9bb1cb37c0613bbe31ca |
File details
Details for the file complex_text_tools-0.3.0-py3-none-any.whl.
File metadata
- Download URL: complex_text_tools-0.3.0-py3-none-any.whl
- Upload date:
- Size: 6.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0686353cc6d408aa6628d82b16ff1ffab8aacab446c374b953c3cd5af1dd0b73 |
| MD5 | bc892f2d7858e0bc66bc6fffbfd20c25 |
| BLAKE2b-256 | 12399f69cbd7800281033582d90f6bb3de66b33bd683debb8ddce9b556fe00df |