A package for processing complex text with mixed Chinese and English characters

Complex Text Tools

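A Python package for processing complex text that mixes Chinese and English characters. It normalizes spaces, fixes punctuation, and calculates text length according to specific rules.

Features

  • Normalize spaces: automatically adds a space between Chinese text and English text or digits, and removes spaces between Chinese characters
  • Fix punctuation: converts between Chinese and English punctuation based on context and repairs mismatched brackets
  • Calculate effective length: follows Word's word-count rules and supports Chinese date formats

Installation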

pip install complex-text-tools

Usage

Normalizing spaces

from complex_text_tools import remove_extra_spaces

# Add spaces between Chinese and English text
text1 = "这是中文English文本"
print(remove_extra_spaces(text1))
# Output: "这是中文 English 文本"

# Add spaces between Chinese text and digits
text2 = "数量是100个"
print(remove_extra_spaces(text2))
# Output: "数量是 100 个"

# Remove spaces between Chinese characters
text3 = "这 是 中 文"
print(remove_extra_spaces(text3))
# Output: "这是中文"
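
To make the spacing rules above concrete, here is a minimal regex-based sketch that reproduces the three documented examples. It is not the package's implementation: the helper name `normalize_spaces_sketch` is hypothetical, and it assumes that "Chinese" means only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and that "English/digits" means ASCII letters and digits.

```python
import re

# Assumption: treat only the basic CJK Unified Ideographs block as Chinese.
CJK = r"\u4e00-\u9fff"


def normalize_spaces_sketch(text: str) -> str:
    """Hypothetical helper approximating the documented spacing rules."""
    # Drop whitespace that sits between two Chinese characters.
    text = re.sub(rf"(?<=[{CJK}])\s+(?=[{CJK}])", "", text)
    # Insert a space where Chinese is directly followed by an ASCII letter or digit.
    text = re.sub(rf"(?<=[{CJK}])(?=[A-Za-z0-9])", " ", text)
    # Insert a space where an ASCII letter or digit is directly followed by Chinese.
    text = re.sub(rf"(?<=[A-Za-z0-9])(?=[{CJK}])", " ", text)
    return text


print(normalize_spaces_sketch("这是中文English文本"))
print(normalize_spaces_sketch("数量是100个"))
print(normalize_spaces_sketch("这 是 中 文"))
```

The lookbehind and lookahead assertions match positions rather than characters, so neighbouring characters are never consumed and consecutive boundaries are all handled in a single pass. The real `remove_extra_spaces` may cover more cases, such as punctuation or other scripts.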

Calculating effective text length

from complex_text_tools import count_eff_len

# Count a Chinese date string (following Word's rules)
text = "2024年1月15日"
print(count_eff_len(text))
# Output: 6 (2024=1, 年=1, 1=1, 月=1, 15=1, 日=1)

# Count mixed text
text2 = "这是一段包含 English 和 123.45 的文本"
print(count_eff_len(text2))
# Output: 13
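
The counting rule can be pictured as tokenization: each Chinese character is one unit and each unbroken run of ASCII letters or digits is one unit. The sketch below is an assumption about how such a rule could work, not the package's code. The name `count_units_sketch` is hypothetical, and ignoring punctuation and whitespace is a guess that happens to match the two documented results.

```python
import re

# Assumption: one unit per run of ASCII letters/digits, one unit per
# character in the basic CJK block; everything else is ignored.
_UNIT = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


def count_units_sketch(text: str) -> int:
    """Hypothetical helper: count Word-style units in mixed text."""
    return len(_UNIT.findall(text))


print(count_units_sketch("2024年1月15日"))
print(count_units_sketch("这是一段包含 English 和 123.45 的文本"))
```

Under this sketch the decimal point splits `123.45` into two runs, which is one way to arrive at 13 for the mixed example. Whether `count_eff_len` reaches that number the same way is not stated in the documentation.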

Fixing punctuation

from complex_text_tools import fix_punctuation

# Convert parentheses based on the surrounding context
text1 = "(3)中文内容"
print(fix_punctuation(text1))
# Output: "(3)中文内容"

text2 = "(3)english"
print(fix_punctuation(text2))
# Output: "(3)english"

# Repair mismatched square brackets
text3 = "[测试】内容"
print(fix_punctuation(text3))
# Output: "【测试】内容"
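
The mismatched-bracket repair shown above can be sketched as follows. This is an illustration under stated assumptions rather than the package's logic: the helper `fix_mixed_brackets_sketch` is hypothetical, and the rule for choosing a bracket style (full-width when the enclosed text contains a Chinese character, ASCII otherwise) is a guess based on the single documented example.

```python
import re

_CJK = re.compile(r"[\u4e00-\u9fff]")
# Match only mismatched pairs: "[...】" or "【...]", with no nested brackets.
_MIXED = re.compile(r"\[([^\[\]【】]*)】|【([^\[\]【】]*)\]")


def fix_mixed_brackets_sketch(text: str) -> str:
    """Hypothetical helper: make mismatched square brackets consistent."""

    def repl(match):
        # Exactly one of the two groups is set, depending on which branch matched.
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        # Assumption: Chinese content gets full-width brackets, otherwise ASCII.
        if _CJK.search(inner):
            return f"【{inner}】"
        return f"[{inner}]"

    return _MIXED.sub(repl, text)


print(fix_mixed_brackets_sketch("[测试】内容"))
```

Pairs that already match are left untouched because the pattern only recognizes one ASCII bracket combined with one full-width bracket.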

License

This project is licensed under the MIT License. See the LICENSE file for details.
