Chinese Mistake-Type Text Augmentation

Installation

pip install zh-mistake-text-aug

Usage (Pipeline)

from zh_mistake_text_aug import Pipeline
import random

random.seed(7)  # seed Python's random module for repeatable runs
pipeline = Pipeline()
augs = pipeline("中文語料生成")  # iterable of augmented results
for aug in augs:
    print(aug)

Output:
type='MissingWordMaker' correct='中文語料生成' incorrect='中文料生成' incorrect_start_at=2 incorrect_end_at=2 span='語'
type='MissingVocabMaker' correct='中文語料生成' incorrect='語料生成' incorrect_start_at=0 incorrect_end_at=2 span='中文'
type='PronounceSimilarWordMaker' correct='中文語料生成' incorrect='中文語尥生成' incorrect_start_at=3 incorrect_end_at=3 span='尥'
type='PronounceSameWordMaker' correct='中文語料生成' incorrect='諥文語料生成' incorrect_start_at=0 incorrect_end_at=0 span='諥'
type='PronounceSimilarVocabMaker' correct='中文語料生成' incorrect='鍾文語料生成' incorrect_start_at=0 incorrect_end_at=2 span='鍾文'
type='PronounceSameVocabMaker' correct='中文語料生成' incorrect='中文预料生成' incorrect_start_at=2 incorrect_end_at=4 span='预料'
type='RedundantWordMaker' correct='中文語料生成' incorrect='成中文語料生成' incorrect_start_at=0 incorrect_end_at=0 span='成'
type='MistakWordMaker' correct='中文語料生成' incorrect='谁文語料生成' incorrect_start_at=0 incorrect_end_at=0 span='谁'
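
The fields in the output above can be checked against each other. The sketch below does not call the library. It copies the eight printed records into a hypothetical `AugRecord` dataclass (a stand-in, not the library's own result type) and tests one reading of the fields that is only inferred from this sample, not from documentation: `span` starts at `incorrect_start_at`, and it is found in `correct` for the `Missing*` makers (the removed text) and in `incorrect` for all other makers (the inserted or substituted text). `incorrect_end_at` is not used, because the sample shows it equal to the start for single-character makers and equal to start plus length for vocabulary makers.

```python
from dataclasses import dataclass


@dataclass
class AugRecord:
    """Hypothetical stand-in mirroring the fields printed by the pipeline."""
    type: str
    correct: str
    incorrect: str
    incorrect_start_at: int
    incorrect_end_at: int
    span: str


def span_source(rec: AugRecord) -> str:
    # Assumption inferred from the sample output: deletion makers report the
    # removed text, which only exists in the correct sentence.
    return rec.correct if rec.type.startswith("Missing") else rec.incorrect


def span_matches(rec: AugRecord) -> bool:
    # Slice by start + len(span) instead of trusting incorrect_end_at.
    start = rec.incorrect_start_at
    return span_source(rec)[start:start + len(rec.span)] == rec.span


SRC = "中文語料生成"
# Records copied by hand from the output printed above.
RECORDS = [
    AugRecord("MissingWordMaker", SRC, "中文料生成", 2, 2, "語"),
    AugRecord("MissingVocabMaker", SRC, "語料生成", 0, 2, "中文"),
    AugRecord("PronounceSimilarWordMaker", SRC, "中文語尥生成", 3, 3, "尥"),
    AugRecord("PronounceSameWordMaker", SRC, "諥文語料生成", 0, 0, "諥"),
    AugRecord("PronounceSimilarVocabMaker", SRC, "鍾文語料生成", 0, 2, "鍾文"),
    AugRecord("PronounceSameVocabMaker", SRC, "中文预料生成", 2, 4, "预料"),
    AugRecord("RedundantWordMaker", SRC, "成中文語料生成", 0, 0, "成"),
    AugRecord("MistakWordMaker", SRC, "谁文語料生成", 0, 0, "谁"),
]

for rec in RECORDS:
    print(rec.type, span_matches(rec))
```

Slicing by `len(span)` keeps the check independent of whether the end index is inclusive or exclusive, which the sample output alone does not settle.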

Available methods

from zh_mistake_text_aug.data_maker import ...
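Data Maker                       Description
MissingWordMaker                 Randomly delete a character
MissingVocabMaker                Randomly delete a word
PronounceSimilarWordMaker        Randomly replace a character with a similar one
PronounceSimilarWordPlusMaker    Find similar-sounding characters by edit distance and replace with a high-frequency character
PronounceSimilarVocabMaker       Replace a word with a similar-sounding word
PronounceSameWordMaker           Replace a character with a same-sounding character
PronounceSameVocabMaker          Replace a word with a same-sounding word
RedundantWordMaker               Randomly duplicate an adjacent character as a redundant character
MistakWordMaker                  Randomly replace a character
MistakeWordHighFreqMaker         Randomly replace a high-frequency character
MissingWordHighFreqMaker         Randomly delete a high-frequency character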
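
To make two of the descriptions above concrete, here is a minimal sketch of a random character deletion and an adjacent-character duplication in plain Python. It is not the library's implementation: the function names `drop_random_char` and `duplicate_adjacent_char` are hypothetical, and the real makers may pick positions differently or apply extra filtering.

```python
import random


def drop_random_char(text: str, rng: random.Random):
    """Delete one character at a random index (illustrative only)."""
    i = rng.randrange(len(text))
    return text[:i] + text[i + 1:], i


def duplicate_adjacent_char(text: str, rng: random.Random):
    """Insert a copy of a randomly chosen character right next to itself."""
    i = rng.randrange(len(text))
    return text[:i] + text[i] + text[i:], i


rng = random.Random(7)  # local generator, so global random state is untouched
source = "中文語料生成"

missing, at = drop_random_char(source, rng)
print(f"missing char at {at}: {missing}")

redundant, at = duplicate_adjacent_char(source, rng)
print(f"redundant char at {at}: {redundant}")
```

Passing an explicit `random.Random` instance keeps each call reproducible without reseeding the module-level generator.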
