Chinese Error-Type Text Augmentation
Installation
pip install zh-mistake-text-aug
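Usage (Pipeline)
from zh_mistake_text_aug import Pipeline
import random

random.seed(7)  # seed the RNG so repeated runs produce the same mistakes
pipeline = Pipeline()
augs = pipeline("中文語料生成")  # augmented samples generated from this sentence
for aug in augs:
    print(aug)

Output: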
type='MissingWordMaker' correct='中文語料生成' incorrect='中文料生成' incorrect_start_at=2 incorrect_end_at=2 span='語'
type='MissingVocabMaker' correct='中文語料生成' incorrect='語料生成' incorrect_start_at=0 incorrect_end_at=2 span='中文'
type='PronounceSimilarWordMaker' correct='中文語料生成' incorrect='中文語尥生成' incorrect_start_at=3 incorrect_end_at=3 span='尥'
type='PronounceSameWordMaker' correct='中文語料生成' incorrect='諥文語料生成' incorrect_start_at=0 incorrect_end_at=0 span='諥'
type='PronounceSimilarVocabMaker' correct='中文語料生成' incorrect='鍾文語料生成' incorrect_start_at=0 incorrect_end_at=2 span='鍾文'
type='PronounceSameVocabMaker' correct='中文語料生成' incorrect='中文预料生成' incorrect_start_at=2 incorrect_end_at=4 span='预料'
type='RedundantWordMaker' correct='中文語料生成' incorrect='成中文語料生成' incorrect_start_at=0 incorrect_end_at=0 span='成'
type='MistakWordMaker' correct='中文語料生成' incorrect='谁文語料生成' incorrect_start_at=0 incorrect_end_at=0 span='谁'
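Each printed record pairs a corrupted sentence (`incorrect`) with its original (`correct`), so one way to use them is as (source, target) pairs, for example for a text-correction dataset. The sketch below dumps such records to JSON Lines. `AugRecord` and `to_jsonl` are hypothetical names introduced here; only the field names are taken from the sample output above, and whether the Pipeline's own record objects could be passed in directly (they would need the same attribute names) is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Iterable


# Hypothetical stand-in for the Pipeline's records: the field names are copied
# from the sample output, not from the library's source code.
@dataclass
class AugRecord:
    type: str
    correct: str
    incorrect: str
    incorrect_start_at: int
    incorrect_end_at: int
    span: str


def to_jsonl(records: Iterable[AugRecord]) -> str:
    """Serialize records as one {"source", "target", "type"} object per line."""
    lines = []
    for r in records:
        # Skip records where no mistake was actually introduced.
        if r.incorrect == r.correct:
            continue
        row = {"source": r.incorrect, "target": r.correct, "type": r.type}
        # ensure_ascii=False keeps the Chinese characters readable in the file.
        lines.append(json.dumps(row, ensure_ascii=False))
    return "\n".join(lines) + ("\n" if lines else "")
```

For instance, `to_jsonl([AugRecord("MissingWordMaker", "中文語料生成", "中文料生成", 2, 2, "語")])` yields a single line whose `source` is `中文料生成` and whose `target` is `中文語料生成`.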
Available data makers
from zh_mistake_text_aug.data_maker import ...
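Data Maker | Description |
---|---|
MissingWordMaker | Randomly deletes a character |
MissingVocabMaker | Randomly deletes a word |
PronounceSimilarWordMaker | Replaces a random character with a similar-sounding one |
PronounceSimilarWordPlusMaker | Uses edit distance to find similar-sounding characters and substitutes a high-frequency one |
PronounceSimilarVocabMaker | Replaces a word with a similar-sounding word |
PronounceSameWordMaker | Replaces a character with a homophone |
PronounceSameVocabMaker | Replaces a word with a homophone |
RedundantWordMaker | Randomly duplicates a neighboring character as a redundant character |
MistakWordMaker | Randomly replaces a character |
MistakeWordHighFreqMaker | Randomly replaces a high-frequency character |
MissingWordHighFreqMaker | Randomly deletes a high-frequency character |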
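To make a few rows of the table concrete, here is a toy sketch of three of the behaviours it lists: deleting a character (compare `MissingWordMaker`), duplicating a character (compare `RedundantWordMaker`) and swapping in a homophone (compare `PronounceSameWordMaker`). This is not the library's implementation. The function names, the `Aug` tuple and the tiny `HOMOPHONES` table are all hypothetical; the real makers rely on their own pronunciation and frequency resources, which this page does not document. The offsets loosely mirror the sample output, where single-character edits report `incorrect_start_at == incorrect_end_at`.

```python
import random
from typing import Dict, List, NamedTuple, Optional


class Aug(NamedTuple):
    """Hypothetical record; field names copied from the sample Pipeline output."""
    type: str
    correct: str
    incorrect: str
    incorrect_start_at: int
    incorrect_end_at: int
    span: str


# Hypothetical, hand-made homophone table (same Mandarin syllable and tone).
HOMOPHONES: Dict[str, List[str]] = {
    "中": ["鐘", "忠"],  # zhong1
    "語": ["雨", "羽"],  # yu3
    "料": ["廖"],        # liao4
}


def missing_word(text: str, rng: random.Random) -> Aug:
    """Delete one random character; span is the character that was removed."""
    i = rng.randrange(len(text))
    return Aug("MissingWord", text, text[:i] + text[i + 1:], i, i, text[i])


def redundant_word(text: str, rng: random.Random) -> Aug:
    """Insert a copy of a random character directly next to the original."""
    i = rng.randrange(len(text))
    return Aug("RedundantWord", text, text[:i] + text[i] + text[i:], i, i, text[i])


def same_pronounce_word(text: str, rng: random.Random) -> Optional[Aug]:
    """Replace one character that has a known homophone; None if none does."""
    candidates = [i for i, ch in enumerate(text) if ch in HOMOPHONES]
    if not candidates:
        return None
    i = rng.choice(candidates)
    new = rng.choice(HOMOPHONES[text[i]])
    return Aug("PronounceSameWord", text, text[:i] + new + text[i + 1:], i, i, new)
```

Passing an explicit `random.Random(seed)` instead of seeding the global module keeps each toy maker reproducible on its own; both `missing_word` and `redundant_word` expect non-empty input.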
Download files
Source Distribution
zh-mistake-text-aug-0.1.2.tar.gz (17.8 kB)
Built Distribution
zh_mistake_text_aug-0.1.2-py3-none-any.whl
Hashes for zh-mistake-text-aug-0.1.2.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | 6a1bfe770d49ada4e7ea651181ba6ea3a15497a69d14ebc1f52a8647379386b6 |
MD5 | 2480196e5d95e8bf2a7028f25a79e16e |
BLAKE2b-256 | 4bb94384722c2ae39be0b96404fe0854a3b144f02feaf4ee5377abde96df56a3 |
Hashes for zh_mistake_text_aug-0.1.2-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 086b3e552777bbc55152d4b16064c599f40d8f856a9fa24aa614c7ac51e2cec1 |
MD5 | 40b1de57bf2be91a2953d6fae84f5c81 |
BLAKE2b-256 | 398b8791c051a0100659ef4a8978f25c9bed5e3350d05387f7036344cfa0c83b |
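The published digests can be used to check a downloaded archive before installing it. Below is a minimal sketch using only the standard library (`hashlib`, `hmac`, `pathlib`). `sha256_of`, `verify` and `EXPECTED_SHA256` are illustrative names, and the digest values are copied from the SHA256 rows above for release 0.1.2.

```python
import hashlib
import hmac
from pathlib import Path
from typing import Optional

# SHA256 digests copied from the tables above (release 0.1.2).
EXPECTED_SHA256 = {
    "zh-mistake-text-aug-0.1.2.tar.gz":
        "6a1bfe770d49ada4e7ea651181ba6ea3a15497a69d14ebc1f52a8647379386b6",
    "zh_mistake_text_aug-0.1.2-py3-none-any.whl":
        "086b3e552777bbc55152d4b16064c599f40d8f856a9fa24aa614c7ac51e2cec1",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so it never has to be read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected: Optional[str] = None) -> bool:
    """Compare against `expected`, or look the digest up by file name."""
    if expected is None:
        expected = EXPECTED_SHA256[path.name]  # KeyError for unknown files
    # hexdigest() is lowercase, so normalise the expected value before comparing.
    return hmac.compare_digest(sha256_of(path), expected.strip().lower())
```

For example, `verify(Path("zh-mistake-text-aug-0.1.2.tar.gz"))` returns `True` only if the local file matches the published SHA256. pip can also enforce hashes itself through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).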