Chinese Error-Type Text Augmentation
Generates Chinese corpus text containing labeled error types.
Installation
pip install zh-mistake-text-gen
Usage (Pipeline)
from zh_mistake_text_gen import Pipeline
import random
random.seed(7)
pipeline = Pipeline()
augs = pipeline("中文語料生成",k=8)
for aug in augs:
    print(aug)
Output:
type='MissingWordMaker' correct='中文語料生成' incorrect='中文料生成' incorrect_start_at=2 incorrect_end_at=2 span='語'
type='MissingVocabMaker' correct='中文語料生成' incorrect='語料生成' incorrect_start_at=0 incorrect_end_at=2 span='中文'
type='PronounceSimilarWordMaker' correct='中文語料生成' incorrect='中文語尥生成' incorrect_start_at=3 incorrect_end_at=3 span='尥'
type='PronounceSameWordMaker' correct='中文語料生成' incorrect='諥文語料生成' incorrect_start_at=0 incorrect_end_at=0 span='諥'
type='PronounceSimilarVocabMaker' correct='中文語料生成' incorrect='鍾文語料生成' incorrect_start_at=0 incorrect_end_at=2 span='鍾文'
type='PronounceSameVocabMaker' correct='中文語料生成' incorrect='中文预料生成' incorrect_start_at=2 incorrect_end_at=4 span='预料'
type='RedundantWordMaker' correct='中文語料生成' incorrect='成中文語料生成' incorrect_start_at=0 incorrect_end_at=0 span='成'
type='MistakWordMaker' correct='中文語料生成' incorrect='谁文語料生成' incorrect_start_at=0 incorrect_end_at=0 span='谁'
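The printed records above follow a `key='value'` layout, so they can be turned back into dictionaries and then into (incorrect, correct) training pairs. The following is a minimal sketch that relies only on the textual layout shown in the sample output. `FIELD_RE`, `parse_record`, and `to_pair` are hypothetical helpers and not part of zh_mistake_text_gen, and the regular expression would not cope with values that themselves contain a single quote.

```python
import re

# Hypothetical helper: matches key='text' or key=123 as printed in the sample output.
FIELD_RE = re.compile(r"(\w+)=(?:'([^']*)'|(\d+))")


def parse_record(line):
    """Turn one printed record line into a dict; numeric fields become int."""
    record = {}
    for key, text, number in FIELD_RE.findall(line):
        record[key] = int(number) if number else text
    return record


def to_pair(record):
    """Return an (incorrect, correct) sentence pair, e.g. for a correction model."""
    return record["incorrect"], record["correct"]


sample = (
    "type='MissingWordMaker' correct='中文語料生成' incorrect='中文料生成' "
    "incorrect_start_at=2 incorrect_end_at=2 span='語'"
)
record = parse_record(sample)
print(record["type"], to_pair(record))
```

If the objects returned by the pipeline expose these fields as attributes, reading them directly would be preferable to parsing the printed text; that is not confirmed by this README.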
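Documentation
Pipeline
- __init__
  - makers = None : maker instances, optional
  - maker_weight = None : probability of each maker being selected, optional
- __call__
  - x : input sentence (str), required
  - error_per_sent : number of errors to introduce per sentence. Default: 1
  - no_change_on_gen_fail : whether the sentence may be left unchanged when a generation method fails. When enabled, no error is raised; when disabled, an error is raised. Default: False
  - verbose = True : debug messages, optional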
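To make the interplay of `maker_weight` and `no_change_on_gen_fail` concrete, here is a toy sketch of one generation step. It is not the library's implementation: `missing_char`, `no_change`, and `run_once` are hypothetical stand-ins, and it assumes that weights act as relative sampling probabilities and that a failing maker signals failure by raising an exception.

```python
import random


def missing_char(sentence, rng):
    """Toy maker: delete one random character; fails on very short input."""
    if len(sentence) < 2:
        raise ValueError("sentence too short to delete a character")
    i = rng.randrange(len(sentence))
    return sentence[:i] + sentence[i + 1:]


def no_change(sentence, rng):
    """Toy maker: return the sentence untouched."""
    return sentence


def run_once(sentence, makers, maker_weight=None,
             no_change_on_gen_fail=False, rng=random):
    """Pick one maker by weight and apply it (assumed behaviour, not the real API)."""
    maker = rng.choices(makers, weights=maker_weight, k=1)[0]
    try:
        return maker(sentence, rng)
    except ValueError:
        if no_change_on_gen_fail:
            return sentence  # fall back to the unchanged sentence
        raise  # otherwise surface the failure to the caller


rng = random.Random(7)
print(run_once("中文語料生成", [missing_char, no_change], [0.8, 0.2], rng=rng))
print(run_once("中", [missing_char], no_change_on_gen_fail=True, rng=rng))
```

Passing a seeded `random.Random` instance keeps the sketch reproducible without touching the global random state.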
Available makers
from zh_mistake_text_gen.data_maker import *
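Data Maker | Description |
---|---|
NoChangeMaker | No transformation |
MissingWordMaker | Randomly deletes a character |
MissingVocabMaker | Randomly deletes a word |
PronounceSimilarWordMaker | Randomly replaces a character with a similar-sounding character |
PronounceSimilarWordPlusMaker | Uses edit distance to find similar-sounding candidates and replaces the character with a high-frequency one |
PronounceSimilarVocabMaker | Replaces a word with a similar-sounding word |
PronounceSameWordMaker | Replaces a character with a homophone character |
PronounceSameVocabMaker | Replaces a word with a homophone word |
RedundantWordMaker | Randomly duplicates a neighboring character as a redundant character |
RandomInsertVacabMaker | Randomly inserts a word |
MistakWordMaker | Randomly replaces a character |
MistakeWordHighFreqMaker | Randomly replaces a character with a high-frequency character |
MissingWordHighFreqMaker | Randomly deletes a high-frequency character |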
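The character-level makers in the table can be pictured as small string edits that also report where the edit happened. The sketch below mirrors the record fields seen in the sample output (`correct`, `incorrect`, `incorrect_start_at`, `incorrect_end_at`, `span`) for a deletion and a replacement. `delete_char_at` and `replace_char_at` are hypothetical helpers with a fixed index for reproducibility; the real makers choose positions and replacement characters on their own, and their internals are not shown in this README.

```python
def delete_char_at(correct, index):
    """Sketch of a missing-character edit: drop the character at `index`."""
    return {
        "correct": correct,
        "incorrect": correct[:index] + correct[index + 1:],
        "incorrect_start_at": index,
        "incorrect_end_at": index,
        "span": correct[index],  # the character that was removed
    }


def replace_char_at(correct, index, new_char):
    """Sketch of a substitution edit: swap the character at `index` for `new_char`."""
    return {
        "correct": correct,
        "incorrect": correct[:index] + new_char + correct[index + 1:],
        "incorrect_start_at": index,
        "incorrect_end_at": index,
        "span": new_char,  # the character that was written in
    }


print(delete_char_at("中文語料生成", 2))
print(replace_char_at("中文語料生成", 0, "諥"))
```

With these inputs the two records carry the same field values as the MissingWordMaker and PronounceSameWordMaker lines in the sample output, apart from the `type` field.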