Data augmentation for Chinese text

Project description

textda is a data augmentation library for Chinese text, written for Python 3.

Usage

textda provides two functions for Chinese text data augmentation: data_expansion and translate_batch.

Install textda

Install with pip:

pip install textda

Use data_expansion to expand your data:

from textda.data_expansion import data_expansion

# Generate augmented variants of a single sentence with the default settings.
print(data_expansion('生活里的惬意，无需等到春暖花开'))

Example output (the operations are random, so results vary between runs):

['生活里面的惬意，无需等到春暖花开', 
'生活里的等到春暖花开',
'生活里无需惬意，的等到春暖花开', 
'生活里的惬意，无需等到春暖花开', 
'生活里的惬意，并不需要等到春暖花开', 
'生活无需的惬意，里等到春暖花开', 
'生活里的惬意，等到无需春暖花开']

Parameters:

:param sentence: input sentence text
:param alpha_sr: synonym replacement rate. Larger values replace more words with synonyms
:param alpha_ri: random insertion rate. Larger values insert more words
:param alpha_rs: random swap rate. Larger values swap more words
:param p_rd: random deletion probability. Larger values delete more words
:param num_aug: number of times each method is repeated
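
To make the rate parameters concrete, here is a minimal sketch of two of the four operations, random swap and random deletion, applied to a pre-segmented word list. It is not textda's implementation: the helper names random_swap and random_deletion are hypothetical, the word segmentation is assumed, and the rule n = max(1, int(alpha * len(words))) follows the EDA paper listed under Reference rather than anything confirmed about textda's internals.

```python
import random


def random_swap(words, alpha_rs, rng):
    """Swap two randomly chosen positions n times, n = max(1, int(alpha_rs * len(words)))."""
    words = list(words)
    if len(words) < 2:
        return words
    n = max(1, int(alpha_rs * len(words)))
    for _ in range(n):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p_rd, rng):
    """Drop each word independently with probability p_rd; never return an empty list."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p_rd]
    return kept if kept else [rng.choice(words)]


# Assumed segmentation of the sample sentence (a segmenter such as jieba
# would produce something similar).
words = ['生活', '里', '的', '惬意', '，', '无需', '等到', '春暖花开']
rng = random.Random(0)  # seeded so the sketch is repeatable

print(''.join(random_swap(words, alpha_rs=0.1, rng=rng)))
print(''.join(random_deletion(words, p_rd=0.1, rng=rng)))
```

Raising alpha_rs increases the number of swaps in proportion to sentence length, while raising p_rd makes each word more likely to be dropped.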

Use the parameters alpha_sr, alpha_ri, alpha_rs, p_rd, and num_aug to control the output.

If the augmented data is meant for a linear classifier, which is insensitive to word position, set alpha_ri and alpha_rs to 0:

from textda.data_expansion import data_expansion

# alpha_ri=0 and alpha_rs=0: no random insertion or random swap.
print(data_expansion('生活里的惬意，无需等到春暖花开', alpha_ri=0, alpha_rs=0))

Output:

['生活里的惬意，无需等到春暖花开',
'，无需春暖花开',
'生活里面的惬意，无需等到春暖花开',
'生活里的惬意，需等到春暖花开']
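
In a classification setting, each augmented sentence keeps the label of the sentence it came from. The sketch below shows that bookkeeping with a stand-in augmenter so it runs without textda installed. expand_dataset and fake_augment are hypothetical names; in practice you would pass data_expansion (optionally wrapped with your chosen alpha values) as the augment argument.

```python
def expand_dataset(samples, augment):
    """Return (text, label) pairs where every variant inherits its source label.

    Duplicates are dropped while preserving first-seen order.
    """
    seen = set()
    expanded = []
    for text, label in samples:
        for variant in [text] + list(augment(text)):
            if (variant, label) not in seen:
                seen.add((variant, label))
                expanded.append((variant, label))
    return expanded


def fake_augment(text):
    """Stand-in for data_expansion: the original plus one trivially altered copy."""
    return [text, text.replace('的', '')]


train = [('生活里的惬意，无需等到春暖花开', 'positive')]
for text, label in expand_dataset(train, fake_augment):
    print(label, text)
```

Deduplicating matters because the augmented list can contain the unchanged input sentence, as the outputs above show.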

Use translate_batch to augment a file by round-trip translation (Chinese to English and back to Chinese):

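import os

from textda.youdao_translate import translate_batch

# Directory containing the source file; avoids shadowing the built-in dir().
data_dir = './data'
translate_batch(os.path.join(data_dir, 'insurance_train'), batch_num=30)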

Translation results (Chinese -> English -> Chinese):

颜色碰掉了一个角不延迟,但事情或他们不赠送,或发送,眉笔打开已经破碎,磨山楂,也不打破一只手,轻轻刷掉,持久性不长,
这个用户没有填写评价内容
颜色非常不喜欢它
不说话,缓慢的新领域
不太容易染好骑吗
不是很好我喜欢!
没有颜色的眼影
应该有大礼物盒眼影,礼物不礼物盒,没有一起破碎粉碎好的眼影不买礼物清洁剂脏就像商品是压力
没有生产日期,我不知道是否真实,总是觉得有点奇怪
是一个小飞粉吗
但是一些混合的颜色
有几次,现在这个东西,笔是空的
眼影有点小,少一点。
不好的颜色,粉红色
明星不想买,坏了,不容易,不要在乎太多!
一开始我已经联系快递,快递一直拖,说他将返回将联系快递服务
画不是,是不好的
物理和照片有很大的区别
不要把眼影刷不是很方便
感觉好干,颜色更暗
打破了在运输途中,有点太脆弱…
盒子有点坏了,还没有发送。

Parameters:

:param file_path: source file path
:param batch_num: default 30
:param reWrite: default True. True rewrites the output file; False appends the data to the end of it
:param suffix: suffix of the new file
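
The sketch below illustrates one plausible reading of these parameters; it is not textda's code. It assumes that batch_num is the number of lines handled per batch, that suffix is appended to the source path to name the output file, and that reWrite selects between overwrite and append mode. process_in_batches and the identity transform are hypothetical; a real round trip would call a translation service where transform is used.

```python
import os
import tempfile


def process_in_batches(file_path, transform, batch_num=30, rewrite=True, suffix='_out'):
    """Apply transform to batches of batch_num lines and write them to file_path + suffix."""
    with open(file_path, encoding='utf-8') as src:
        lines = [line.rstrip('\n') for line in src]
    out_path = file_path + suffix
    mode = 'w' if rewrite else 'a'  # overwrite vs. append
    with open(out_path, mode, encoding='utf-8') as dst:
        for start in range(0, len(lines), batch_num):
            batch = lines[start:start + batch_num]
            for line in transform(batch):
                dst.write(line + '\n')
    return out_path


# Demo on a temporary file with an identity transform (no network access needed).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'insurance_train')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('眼影有点小\n颜色非常不喜欢它\n')
    # First call overwrites the output file, second call appends to it.
    out = process_in_batches(path, transform=lambda batch: batch, batch_num=1)
    out = process_in_batches(path, transform=lambda batch: batch, batch_num=1, rewrite=False)
    with open(out, encoding='utf-8') as f:
        result_lines = f.read().splitlines()
    print(len(result_lines))
```

Batching keeps each request to a translation service small, and append mode lets repeated runs accumulate results in one file.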

Reference:

https://github.com/jasonwei20/eda_nlp

Code for the ICLR 2019 Workshop paper "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks": https://arxiv.org/abs/1901.11196

License

MIT

Project details

Release history

0.1.0.6 (this version): May 29, 2019
0.1.0.5: May 29, 2019

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

textda-0.1.0.6-py3-none-any.whl (14.0 kB)

Uploaded May 29, 2019, Python 3

textda-0.1.0.6-py2.py3-none-any.whl (14.0 kB)

Uploaded Aug 11, 2019, Python 2, Python 3

File details

Details for the file textda-0.1.0.6-py3-none-any.whl.

File metadata

Download URL: textda-0.1.0.6-py3-none-any.whl
Upload date: May 29, 2019
Size: 14.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.6.8

File hashes

Hashes for textda-0.1.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e3564367c85bd915eede083bcea2537559d209b85c3b1fa5ca6272e800298647`
MD5	`2457625e6ba1c9cbc9539788420e335b`
BLAKE2b-256	`45c328473db1835202ce6c2f16393273cef29662e84eef662cd108ac82611247`

File details

Details for the file textda-0.1.0.6-py2.py3-none-any.whl.

File metadata

Download URL: textda-0.1.0.6-py2.py3-none-any.whl
Upload date: Aug 11, 2019
Size: 14.0 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.33.0 CPython/3.6.8

File hashes

Hashes for textda-0.1.0.6-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`28c6baabd9ca539648cb8c8cb68c34bf1dfdfaf4fdeb61638bb6adbd5da2fb34`
MD5	`8618be2a6192ad5cc0f776ec852d696b`
BLAKE2b-256	`b3ed091104cd0788ee166ecc8b6e4e90b4360a5397355052725e6d42937d97c4`
