Chinese Error-Type Text Augmentation
Generates Chinese corpus text containing specific types of errors.
Installation
pip install zh-mistake-text-gen
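Usage (Pipeline)
from zh_mistake_text_gen import Pipeline
import random
random.seed(7)  # seed Python's RNG so repeated runs sample the same mistakes
pipeline = Pipeline()
# Ask for up to 8 augmented variants of the input sentence
augs = pipeline("中文語料生成", k=8)
for aug in augs:
    print(aug)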
type='MissingWordMaker' correct='中文語料生成' incorrect='中文料生成' incorrect_start_at=2 incorrect_end_at=2 span='語'
type='MissingVocabMaker' correct='中文語料生成' incorrect='語料生成' incorrect_start_at=0 incorrect_end_at=2 span='中文'
type='PronounceSimilarWordMaker' correct='中文語料生成' incorrect='中文語尥生成' incorrect_start_at=3 incorrect_end_at=3 span='尥'
type='PronounceSameWordMaker' correct='中文語料生成' incorrect='諥文語料生成' incorrect_start_at=0 incorrect_end_at=0 span='諥'
type='PronounceSimilarVocabMaker' correct='中文語料生成' incorrect='鍾文語料生成' incorrect_start_at=0 incorrect_end_at=2 span='鍾文'
type='PronounceSameVocabMaker' correct='中文語料生成' incorrect='中文预料生成' incorrect_start_at=2 incorrect_end_at=4 span='预料'
type='RedundantWordMaker' correct='中文語料生成' incorrect='成中文語料生成' incorrect_start_at=0 incorrect_end_at=0 span='成'
type='MistakWordMaker' correct='中文語料生成' incorrect='谁文語料生成' incorrect_start_at=0 incorrect_end_at=0 span='谁'
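The records printed above share a fixed set of fields, and the sample suggests how the indices relate to `span`. The sketch below is a reading aid only: `Record` and `recover_span` are hypothetical names that are not part of zh_mistake_text_gen, and the slicing rule is inferred from these eight sample lines alone (single-character makers report `incorrect_end_at` equal to `incorrect_start_at`, vocab makers report an end that behaves like an exclusive bound, and `Missing*` makers take the span from the correct sentence).

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in mirroring the fields shown in the sample output."""
    type: str
    correct: str
    incorrect: str
    incorrect_start_at: int
    incorrect_end_at: int
    span: str


def recover_span(rec: Record) -> str:
    """Rebuild `span` from the indices, using a rule inferred from the sample."""
    # Missing* makers delete text, so the span only exists in the correct sentence.
    source = rec.correct if rec.type.startswith("Missing") else rec.incorrect
    # Treat end == start as a one-character span; otherwise end is exclusive.
    end = max(rec.incorrect_end_at, rec.incorrect_start_at + 1)
    return source[rec.incorrect_start_at:end]


# The eight records copied from the sample output above.
SAMPLES = [
    Record("MissingWordMaker", "中文語料生成", "中文料生成", 2, 2, "語"),
    Record("MissingVocabMaker", "中文語料生成", "語料生成", 0, 2, "中文"),
    Record("PronounceSimilarWordMaker", "中文語料生成", "中文語尥生成", 3, 3, "尥"),
    Record("PronounceSameWordMaker", "中文語料生成", "諥文語料生成", 0, 0, "諥"),
    Record("PronounceSimilarVocabMaker", "中文語料生成", "鍾文語料生成", 0, 2, "鍾文"),
    Record("PronounceSameVocabMaker", "中文語料生成", "中文预料生成", 2, 4, "预料"),
    Record("RedundantWordMaker", "中文語料生成", "成中文語料生成", 0, 0, "成"),
    Record("MistakWordMaker", "中文語料生成", "谁文語料生成", 0, 0, "谁"),
]

# Collect the makers whose recovered span disagrees with the reported one.
mismatches = [r.type for r in SAMPLES if recover_span(r) != r.span]
print(mismatches)
```

If the rule holds for a given release, `mismatches` stays empty; any entry in it would signal that the index convention differs from what this sample implies.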
Documentation
Pipeline
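- `__init__`
  - makers=None: maker instances, optional
  - maker_weight=None: probability of each maker being drawn, optional
- `__call__`
  - x: input sentence (str), required
  - k=1: maximum number of results to return, optional
  - verbose=True: print debug messages, optional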
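One way to picture how `makers`, `maker_weight`, and `k` could interact is weighted sampling with the standard library. The following is a minimal sketch under that assumption, not the library's implementation: `toy_pipeline`, `drop_char`, and `duplicate_char` are hypothetical names, and the real `Pipeline` may select makers differently (for example, without replacement).

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# A toy maker takes a sentence and an RNG and returns a mistaken
# sentence, or None when it cannot produce one.
Maker = Callable[[str, random.Random], Optional[str]]


def drop_char(x: str, rng: random.Random) -> Optional[str]:
    """Delete one randomly chosen character."""
    if len(x) < 2:
        return None
    i = rng.randrange(len(x))
    return x[:i] + x[i + 1:]


def duplicate_char(x: str, rng: random.Random) -> Optional[str]:
    """Repeat one randomly chosen character next to itself."""
    if not x:
        return None
    i = rng.randrange(len(x))
    return x[:i] + x[i] + x[i:]


def toy_pipeline(
    x: str,
    makers: Sequence[Maker],
    maker_weight: Optional[Sequence[float]] = None,
    k: int = 1,
    seed: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Draw a maker k times by weight and keep the attempts that succeed."""
    rng = random.Random(seed)
    results = []
    for _ in range(k):
        # weights=None means every maker is equally likely.
        maker = rng.choices(list(makers), weights=maker_weight, k=1)[0]
        out = maker(x, rng)
        if out is not None and out != x:
            results.append((maker.__name__, out))
    return results  # at most k entries


results = toy_pipeline(
    "中文語料生成",
    makers=[drop_char, duplicate_char],
    maker_weight=[1.0, 0.0],  # a zero weight means the second maker is never drawn
    k=5,
    seed=7,
)
for name, text in results:
    print(name, text)
```

Because a maker can fail on some inputs, `k` acts as an upper bound on the number of results rather than a guarantee, which matches the "maximum number of results" wording of the `k` parameter.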
Available methods
from zh_mistake_text_gen.data_maker import *
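| Data Maker | Description |
|---|---|
| MissingWordMaker | Randomly deletes a character |
| MissingVocabMaker | Randomly deletes a word |
| PronounceSimilarWordMaker | Randomly replaces a character with a similar-sounding character |
| PronounceSimilarWordPlusMaker | Uses edit distance to find similar-sounding characters and replaces with a high-frequency one |
| PronounceSimilarVocabMaker | Replaces a word with a similar-sounding word |
| PronounceSameWordMaker | Replaces a character with a same-sounding character |
| PronounceSameVocabMaker | Replaces a word with a same-sounding word |
| RedundantWordMaker | Randomly copies an adjacent character as a redundant character |
| RandomInsertVacabMaker | Randomly inserts a word |
| MistakWordMaker | Randomly replaces a character |
| MistakeWordHighFreqMaker | Randomly replaces a high-frequency character |
| MissingWordHighFreqMaker | Randomly deletes a high-frequency character |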
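To make the idea behind a same-sound substitution maker concrete, here is a minimal self-contained sketch. It does not use or reproduce the library's code: `same_sound_substitute` and the `HOMOPHONES` table are hypothetical, the table is a tiny hand-written sample rather than the library's data, and the returned dictionary simply mirrors the field names seen in the sample output.

```python
import random
from typing import Dict, List, Optional

# Tiny hand-written homophone table (illustrative only).
HOMOPHONES: Dict[str, List[str]] = {
    "中": ["忠", "終", "鐘"],  # zhong, first tone
    "生": ["升", "聲"],        # sheng, first tone
    "成": ["城", "誠", "程"],  # cheng, second tone
}


def same_sound_substitute(x: str, rng: random.Random) -> Optional[dict]:
    """Replace one character that has a known homophone; None if none applies."""
    candidates = [i for i, ch in enumerate(x) if ch in HOMOPHONES]
    if not candidates:
        return None
    i = rng.choice(candidates)
    replacement = rng.choice(HOMOPHONES[x[i]])
    return {
        "type": "ToySameSoundMaker",  # hypothetical maker name
        "correct": x,
        "incorrect": x[:i] + replacement + x[i + 1:],
        "incorrect_start_at": i,
        "incorrect_end_at": i,  # single character: end equals start
        "span": replacement,
    }


result = same_sound_substitute("中文語料生成", random.Random(7))
print(result)
```

Returning `None` when no character has a known homophone lets a caller skip the maker for that sentence instead of emitting an unchanged string.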