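A feature extraction tool for Chinese characters. It extracts pinyin, tone, component and radical decomposition, and four-corner codes from Chinese characters, and can convert them into tensors for use as model input.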

char_featurizer

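char_featurizer is a feature extraction tool for Chinese characters. It extracts a character's pronunciation (initial, final, and tone), its glyph structure (components and radical), and its four-corner code. It can also convert these features into tensors to be used as model input features. This project builds on 安德森's character extraction tool, optimizing and consolidating it.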

char_featurizer currently supports:

1. Glyph feature extraction

2. Pronunciation feature extraction

3. Four-corner code extraction

4. Tensor conversion
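
To make the pronunciation features concrete, here is a minimal sketch of how a pinyin syllable with a trailing tone digit can be split into initial, final, and tone. It is not char_featurizer's implementation: the `split_pinyin` helper and the `INITIALS` table are hypothetical names introduced for illustration, and treating `y` and `w` as initials is a simplifying assumption.

```python
# Hypothetical helper, not part of char_featurizer.
# Two-letter initials come first so that "zh" is matched before "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_pinyin(syllable: str):
    """Split a numbered-tone syllable such as 'ming2' into (initial, final, tone)."""
    tone = 0  # 0 means "no tone digit given" in this sketch
    if syllable and syllable[-1].isdigit():
        tone = int(syllable[-1])
        syllable = syllable[:-1]
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):], tone
    # Zero-initial syllable, e.g. 'an1'
    return "", syllable, tone


print(split_pinyin("ming2"))   # 明
print(split_pinyin("zhong1"))  # 中
```

Each of the three parts can then be treated as a separate categorical feature of the character.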

II. Installation and Usage

1. Installation

pip install char_featurizer

2. Usage

1. Character feature extraction

from char_featurizer import Featurizer

# Create the featurizer
featurizer = Featurizer()

data = '明天去你家玩'

# Extract the features of the input string and print them
result = featurizer.featurize(data)
print(result)

2. Using the features as model input
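
The original README leaves this section without an example. The following is a minimal sketch of one common way to turn symbolic features into fixed-size integer input, assuming each sentence has already been reduced to a list of feature symbols. The `build_vocab` and `encode` helpers are hypothetical and not part of char_featurizer, and the sample tone lists are written by hand rather than taken from the library's output.

```python
from typing import Dict, List

PAD = "<pad>"  # padding symbol, mapped to index 0
UNK = "<unk>"  # fallback for symbols not seen when building the vocabulary


def build_vocab(sequences: List[List[str]]) -> Dict[str, int]:
    """Assign an integer index to every distinct feature symbol."""
    vocab = {PAD: 0, UNK: 1}
    for seq in sequences:
        for symbol in seq:
            if symbol not in vocab:
                vocab[symbol] = len(vocab)
    return vocab


def encode(sequences: List[List[str]], vocab: Dict[str, int], max_len: int) -> List[List[int]]:
    """Map symbols to indices, then truncate or pad every row to max_len."""
    batch = []
    for seq in sequences:
        ids = [vocab.get(symbol, vocab[UNK]) for symbol in seq[:max_len]]
        ids += [vocab[PAD]] * (max_len - len(ids))
        batch.append(ids)
    return batch


# Hand-written tone features for '明天去' and '你家' (illustrative only)
tones = [["2", "1", "4"], ["3", "1"]]
vocab = build_vocab(tones)
ids = encode(tones, vocab, max_len=4)
print(ids)
```

The resulting rectangular list of indices can be wrapped in a tensor by the framework of your choice and fed to an embedding layer, with one vocabulary per feature type (initial, final, tone, radical, and so on).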

3. Related resources

1. Online lookup tool for Chinese four-corner codes

III. Update News

  • 2020.5.4: V1 completed

IV. Resources

