Data Augmentation for Japanese Text
AugLy-jp
Data Augmentation for Japanese Text on AugLy
Augmenter
base_text = "あらゆる現実をすべて自分のほうへねじ曲げたのだ"
Augmenter | Augmented | Description |
---|---|---|
SynonymAugmenter | あらゆる現実をすべて自身のほうへねじ曲げたのだ | Substitutes a similar word based on the Sudachi synonym dictionary |
WordEmbsAugmenter | あらゆる現実をすべて関心のほうへねじ曲げたのだ | Uses word2vec, GloVe, or fastText embeddings to apply augmentation |
FillMaskAugmenter | つまり現実を、未来な未来まで変えたいんだ | Uses a masked language model to generate text |
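To illustrate the idea behind the synonym row above, here is a minimal, library-free sketch of synonym substitution. It is not AugLy-jp's API: the function `synonym_augment`, the `TOY_SYNONYMS` table, and the hand-tokenized `base_tokens` are all hypothetical stand-ins. A real pipeline would presumably get tokens from a morphological analyzer and candidates from the Sudachi synonym dictionary.

```python
import random
from typing import Dict, List, Optional

# Hypothetical, hand-written table standing in for the Sudachi synonym dictionary.
TOY_SYNONYMS: Dict[str, List[str]] = {"自分": ["自身"]}


def synonym_augment(
    tokens: List[str],
    synonyms: Dict[str, List[str]],
    rng: Optional[random.Random] = None,
) -> str:
    """Replace one randomly chosen token that has a synonym, then rejoin the text."""
    rng = rng or random.Random()
    # Indices of tokens for which the table offers at least one synonym.
    candidates = [i for i, tok in enumerate(tokens) if synonyms.get(tok)]
    if not candidates:
        # Nothing to substitute: return the text unchanged.
        return "".join(tokens)
    i = rng.choice(candidates)
    replaced = list(tokens)
    replaced[i] = rng.choice(synonyms[tokens[i]])
    # Japanese is written without spaces, so tokens are joined directly.
    return "".join(replaced)


# base_text from above, tokenized by hand for this sketch (an assumption).
base_tokens = ["あらゆる", "現実", "を", "すべて", "自分", "の", "ほう", "へ", "ねじ曲げ", "た", "の", "だ"]
augmented = synonym_augment(base_tokens, TOY_SYNONYMS, random.Random(0))
print(augmented)  # あらゆる現実をすべて自身のほうへねじ曲げたのだ
```

Because the toy table has exactly one entry with one synonym, the output is deterministic here. With a full dictionary, the seeded `random.Random` keeps results reproducible across runs.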
Prerequisites
Software | Install (Mac) |
---|---|
Python 3.8.11 | pyenv install 3.8.11 |
Poetry 1.1.* | curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py \| python |
Get Started
Installation
pip install augly_jp
Or clone this repository:
git clone https://github.com/chck/AugLy-jp.git
poetry install
Test with reformat
poetry run task test
Reformat
poetry run task fmt
Lint
poetry run task lint
Inspired by
- https://github.com/facebookresearch/AugLy
- https://github.com/makcedward/nlpaug
- https://github.com/QData/TextAttack
License
This software includes work distributed under the Apache License 2.0 [1].