
pypinyin based on g2pW, with torch inference


hanpinyin

Improves the accuracy of pypinyin using g2pW.

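Features:

  • Supports inference with torch.
  • Pinyin accuracy can be improved by training the model.
  • Functionality and usage stay largely consistent with pypinyin, including support for multiple pinyin styles.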

Usage

Installing dependencies

  1. Install PyTorch

  2. Download and unzip G2PWModel:

    wget https://storage.googleapis.com/esun-ai/g2pW/G2PWModel-v2-onnx.zip
    unzip G2PWModel-v2-onnx.zip
    
  3. Install git-lfs

  4. Download bert-base-chinese:

    git lfs install
    git clone https://huggingface.co/bert-base-chinese
    
  5. Install this project:

    pip install hanpinyin
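Before running the example, it can help to confirm that both downloads ended up where G2PWPinyin will look for them. The sketch below uses a hypothetical helper, missing_model_dirs, which is not part of hanpinyin; its default paths are simply the G2PWModel/ and bert-base-chinese/ directories used in the usage example.

```python
from pathlib import Path


def missing_model_dirs(model_dir="G2PWModel/", model_source="bert-base-chinese/"):
    """Hypothetical helper: return the expected model directories that do not exist.

    The defaults mirror the model_dir and model_source values used in the
    usage example; pass your own paths if the models were downloaded elsewhere.
    """
    return [p for p in (model_dir, model_source) if not Path(p).is_dir()]


# An empty list means both directories exist (paths are relative to the current directory).
print("missing:", missing_model_dirs())
```

The check only looks for the directories themselves, not for specific files inside them, since the exact file layout of the model archives is not documented here.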
    

Example

>>> from pypinyin import Style
>>> from hanpinyin import G2PWPinyin

# model_dir and model_source must point to the downloaded model data directories
>>> g2pw = G2PWPinyin(model_dir='G2PWModel/',
...                   model_source='bert-base-chinese/',
...                   v_to_u=False, neutral_tone_with_five=True)
>>> han = '然而,他红了20年以后,他竟退出了大家的视线。'

# def lazy_pinyin(self, hans, style=Style.NORMAL, errors='default', strict=True, **kwargs)
# The lazy_pinyin method returns the pinyin; each parameter has the same meaning and effect as in pypinyin.
# The v_to_u and neutral_tone_with_five parameters can only be set when initializing G2PWPinyin.

>>> g2pw.lazy_pinyin(han)
['ran', 'er', ',', 'ta', 'hong', 'le', '20', 'nian', 'yi', 'hou', ',', 'ta', 'jing', 'tui', 'chu', 'le', 'da', 'jia', 'de', 'shi', 'xian', '。']

>>> g2pw.lazy_pinyin(han, style=Style.TONE)
['rán', 'ér', ',', 'tā', 'hóng', 'le', '20', 'nián', 'yǐ', 'hòu', ',', 'tā', 'jìng', 'tuì', 'chū', 'le', 'dà', 'jiā', 'de', 'shì', 'xiàn', '。']

>>> g2pw.lazy_pinyin(han, style=Style.TONE3)
['ran2', 'er2', ',', 'ta1', 'hong2', 'le5', '20', 'nian2', 'yi3', 'hou4', ',', 'ta1', 'jing4', 'tui4', 'chu1', 'le5', 'da4', 'jia1', 'de5', 'shi4', 'xian4', '。']
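Because the converter above was created with neutral_tone_with_five=True, neutral-tone syllables such as 了 and 的 come back in Style.TONE3 as le5 and de5 rather than a bare le and de (pypinyin's default). To illustrate what the option changes, here is a hypothetical post-processing function, mark_neutral_tone, which is not part of hanpinyin or pypinyin. It assumes every all-lowercase token is a pinyin syllable, so Latin text in the input would be mis-marked; in practice, just pass the flag when constructing G2PWPinyin.

```python
import re

# Hypothetical helper for illustration only; hanpinyin does this itself
# when G2PWPinyin is created with neutral_tone_with_five=True.
# Assumption: any token made only of lowercase letters (including ü) is a
# toneless pinyin syllable; digits and punctuation are passed through unchanged.
_TONELESS = re.compile(r"[a-zü]+")


def mark_neutral_tone(tone3_pinyin):
    """Append '5' to TONE3 syllables that have no tone digit."""
    return [s + "5" if _TONELESS.fullmatch(s) else s for s in tone3_pinyin]


print(mark_neutral_tone(['ta1', 'hong2', 'le', '20', 'nian2', ',']))
# ['ta1', 'hong2', 'le5', '20', 'nian2', ',']
```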

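Offline use

By default, even when the model data is available locally, the transformers module that this project relies on still downloads some model metadata from huggingface.co. To disable these requests and run fully offline, set the environment variables TRANSFORMERS_OFFLINE=1 and HF_DATASETS_OFFLINE=1. See the official transformers documentation for details.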
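The variables can also be set from Python. A minimal sketch, assuming they must be in place before hanpinyin (and therefore transformers) is imported; some transformers versions read them only once, at import time, so setting them afterwards may have no effect.

```python
import os

# Set the variables before hanpinyin is imported: hanpinyin uses transformers,
# and transformers may read these variables only once, at import time.
os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_DATASETS_OFFLINE"] = "1"

# Import and construct the converter only after the variables are set, e.g.:
# from hanpinyin import G2PWPinyin
# g2pw = G2PWPinyin(model_dir='G2PWModel/', model_source='bert-base-chinese/')
```

Setting them in the shell instead, for example TRANSFORMERS_OFFLINE=1 HF_DATASETS_OFFLINE=1 python app.py (app.py is a placeholder), avoids the import-order concern entirely.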

Model training

See the instructions in the official g2pW documentation.


Download files


Source Distribution

hanpinyin-0.1.0.tar.gz (260.6 kB)

Built Distribution

hanpinyin-0.1.0-py2.py3-none-any.whl (275.6 kB; Python 2, Python 3)

File details

Details for the file hanpinyin-0.1.0.tar.gz.

File metadata

  • Download URL: hanpinyin-0.1.0.tar.gz
  • Size: 260.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.8

File hashes

Hashes for hanpinyin-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        9bd1362b5f3795278361dd240b79e620796653ec9ae710cc2936cb0c9d3292c8
MD5           5a8d66581079053722fb44738d163dd3
BLAKE2b-256   f713f5d92a5434b7c4324dacdd650152c89bebee56710b581450fd2440a23ed2


File details

Details for the file hanpinyin-0.1.0-py2.py3-none-any.whl.

File metadata

  • Download URL: hanpinyin-0.1.0-py2.py3-none-any.whl
  • Size: 275.6 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.8

File hashes

Hashes for hanpinyin-0.1.0-py2.py3-none-any.whl
Algorithm     Hash digest
SHA256        bd5dd737e40e299fc591f9bb32938bd01e3bc3a1aae0606a30a8e4259a1d7b9d
MD5           ca8a1c5e9afd52497dbfe06219931983
BLAKE2b-256   72b2bf7fe756ea0144f5c9e62833c84323fe29ce76ebd6d5d98455357a31b56e

