Chinese Mistake-Type Text Augmentation
Installation
pip install zh-mistake-text-aug
Usage (Pipeline)
from zh_mistake_text_aug import Pipeline
import random

random.seed(7)  # seed the RNG so the sampled mistakes are repeatable
pipeline = Pipeline()
augs = pipeline("中文語料生成")  # one augmented record per mistake type
for aug in augs:
    print(aug)
Output:
type='MissingWordMaker' correct='中文語料生成' incorrect='中文料生成' incorrect_start_at=2 incorrect_end_at=2 span='語'
type='MissingVocabMaker' correct='中文語料生成' incorrect='語料生成' incorrect_start_at=0 incorrect_end_at=2 span='中文'
type='PronounceSimilarWordMaker' correct='中文語料生成' incorrect='中文語尥生成' incorrect_start_at=3 incorrect_end_at=3 span='尥'
type='PronounceSameWordMaker' correct='中文語料生成' incorrect='諥文語料生成' incorrect_start_at=0 incorrect_end_at=0 span='諥'
type='PronounceSimilarVocabMaker' correct='中文語料生成' incorrect='鍾文語料生成' incorrect_start_at=0 incorrect_end_at=2 span='鍾文'
type='PronounceSameVocabMaker' correct='中文語料生成' incorrect='中文预料生成' incorrect_start_at=2 incorrect_end_at=4 span='预料'
type='RedundantWordMaker' correct='中文語料生成' incorrect='成中文語料生成' incorrect_start_at=0 incorrect_end_at=0 span='成'
type='MistakWordMaker' correct='中文語料生成' incorrect='谁文語料生成' incorrect_start_at=0 incorrect_end_at=0 span='谁'
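Each printed record carries the same six fields (`type`, `correct`, `incorrect`, `incorrect_start_at`, `incorrect_end_at`, `span`). A minimal sketch of turning such records into training data might look like the following. The `AugRecord` dataclass and the `to_jsonl` helper are hypothetical stand-ins written for this example; they are not part of `zh_mistake_text_aug`, and the two records are copied by hand from the output above so the snippet runs without the package installed.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AugRecord:
    """Hypothetical stand-in mirroring the printed fields; not the library's own class."""
    type: str
    correct: str
    incorrect: str
    incorrect_start_at: int
    incorrect_end_at: int
    span: str


def to_jsonl(records):
    """Serialize records as JSON Lines, keeping Chinese text readable (no \\u escapes)."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


# Two records copied from the sample output above.
records = [
    AugRecord("MissingWordMaker", "中文語料生成", "中文料生成", 2, 2, "語"),
    AugRecord("RedundantWordMaker", "中文語料生成", "成中文語料生成", 0, 0, "成"),
]

# (incorrect, correct) pairs are a common input shape for correction models.
pairs = [(r.incorrect, r.correct) for r in records]

jsonl = to_jsonl(records)
print(jsonl)
```

The sketch deliberately avoids slicing with `incorrect_start_at` and `incorrect_end_at`: in the sample output the end index appears inclusive for some makers and exclusive for others, so it is worth checking the convention for each maker before relying on it.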
Available methods
from zh_mistake_text_aug.data_maker import ...
| Data Maker | Description |
|---|---|
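| MissingWordMaker | Randomly drop a character |
| MissingVocabMaker | Randomly drop a word |
| PronounceSimilarWordMaker | Randomly replace a character with a similar-sounding character |
| PronounceSimilarWordPlusMaker | Use edit distance to find similar-sounding characters and replace with a high-frequency one |
| PronounceSimilarVocabMaker | Replace a word with a similar-sounding word |
| PronounceSameWordMaker | Replace a character with a same-sounding character |
| PronounceSameVocabMaker | Replace a word with a same-sounding word |
| RedundantWordMaker | Randomly duplicate an adjacent character as a redundant character |
| MistakWordMaker | Randomly replace a character |
| MistakeWordHighFreqMaker | Randomly replace a high-frequency character |
| MissingWordHighFreqMaker | Randomly delete a high-frequency character |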
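To make the descriptions in the table concrete, here is a rough plain-Python sketch of three of the simpler mistake types: dropping a character, duplicating a neighbouring character, and replacing a character. It follows the table descriptions only and is not the package's implementation; the function names, the `(text, index)` return shape, and the `pool` parameter are assumptions made for this illustration.

```python
import random


def drop_char(text, rng):
    """Remove one randomly chosen character (compare MissingWordMaker)."""
    if not text:
        raise ValueError("text must not be empty")
    i = rng.randrange(len(text))
    return text[:i] + text[i + 1:], i


def duplicate_neighbor(text, rng):
    """Insert a copy of a random character right beside it (compare RedundantWordMaker)."""
    if not text:
        raise ValueError("text must not be empty")
    i = rng.randrange(len(text))
    return text[:i] + text[i] + text[i:], i


def replace_char(text, rng, pool):
    """Swap one random character for a different one from `pool` (compare MistakWordMaker)."""
    if not text:
        raise ValueError("text must not be empty")
    i = rng.randrange(len(text))
    candidates = [c for c in pool if c != text[i]]
    if not candidates:
        raise ValueError("pool has no character different from the original")
    return text[:i] + rng.choice(candidates) + text[i + 1:], i


rng = random.Random(7)  # a local RNG keeps the example repeatable
sentence = "中文語料生成"
pool = "的一是不了"  # hypothetical stand-in for a high-frequency character list

for make in (drop_char, duplicate_neighbor):
    mistaken, index = make(sentence, rng)
    print(make.__name__, mistaken, index)

mistaken, index = replace_char(sentence, rng, pool)
print("replace_char", mistaken, index)
```

The pronunciation-based makers would additionally need a character-to-pronunciation mapping, which is outside the scope of this sketch.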