Textmentations
Textmentations is a Python library for augmenting Korean text. It is inspired by albumentations and uses albumentations as a dependency.
Installation
pip install textmentations
A simple example
Textmentations implements its text augmentation techniques on top of TextTransform, which inherits from the albumentations BasicTransform.
This allows textmentations to reuse the existing functionality of albumentations.
import textmentations as T
text = "어제 식당에 갔다. 목이 너무 말랐다. 먼저 물 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다."
rd = T.RandomDeletion(deletion_prob=0.1, min_words_per_sentence=0.8)
ri = T.RandomInsertion(insertion_prob=0.2, n_times=1)
rs = T.RandomSwap(alpha=1)
sr = T.SynonymReplacement(replacement_prob=0.2)
eda = T.Compose([rd, ri, rs, sr])
print(rd(text=text)["text"])
# 식당에 갔다. 목이 너무 말랐다. 먼저 물 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다.
print(ri(text=text)["text"])
# 어제 최근 식당에 갔다. 목이 너무 말랐다. 먼저 물 한 잔을 마셨다 음료수. 그리고 탕수육을 맛있게 먹었다.
print(rs(text=text)["text"])
# 어제 갔다 식당에. 목이 너무 말랐다. 물 먼저 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다.
print(sr(text=text)["text"])
# 과거 식당에 갔다. 목이 너무 말랐다. 먼저 소주 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다.
print(eda(text=text)["text"])
# 식당에 어제 과거 갔다. 너무 말랐다. 먼저 상수 한 잔을 마셨다 맹물. 그리고 맛있게 먹었다.
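To make the call convention in the example above more concrete, here is a minimal pure-Python sketch of how a probability-gated text transform and a composition of such transforms could behave. It does not import textmentations or albumentations and is not the library's implementation. The names `SketchTransform`, `SketchRandomDeletion`, `SketchRandomSwap`, `SketchCompose`, `n_swaps`, and the `rng` argument are hypothetical, and reading a float `min_words_per_sentence` as the fraction of words to keep in each sentence is an assumption.

```python
import math
import random


class SketchTransform:
    """Hypothetical stand-in for a probability-gated transform (not the library's class)."""

    def __init__(self, p=1.0, rng=None):
        self.p = p
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, *, text):
        # Apply the transform with probability p; otherwise pass the text through.
        if self.rng.random() < self.p:
            text = self.apply(text)
        return {"text": text}

    def apply(self, text):
        raise NotImplementedError


def split_sentences(text):
    # Naive splitter: assumes sentences are separated by ". ".
    return [s.strip().rstrip(".") for s in text.split(". ") if s.strip()]


def join_sentences(sentences):
    return ". ".join(sentences) + "." if sentences else ""


class SketchRandomDeletion(SketchTransform):
    def __init__(self, deletion_prob=0.1, min_words_per_sentence=0.8, **kwargs):
        super().__init__(**kwargs)
        self.deletion_prob = deletion_prob
        self.min_words_per_sentence = min_words_per_sentence

    def apply(self, text):
        result = []
        for sentence in split_sentences(text):
            words = sentence.split()
            # Assumption: the float is the minimum fraction of words to keep.
            min_keep = max(1, math.floor(len(words) * self.min_words_per_sentence))
            deletions_left = len(words) - min_keep
            kept = []
            for word in words:
                if deletions_left > 0 and self.rng.random() < self.deletion_prob:
                    deletions_left -= 1  # drop this word
                else:
                    kept.append(word)
            result.append(" ".join(kept))
        return join_sentences(result)


class SketchRandomSwap(SketchTransform):
    def __init__(self, n_swaps=1, **kwargs):
        super().__init__(**kwargs)
        self.n_swaps = n_swaps

    def apply(self, text):
        sentences = [s.split() for s in split_sentences(text)]
        # Only sentences with at least two words can have words swapped.
        candidates = [words for words in sentences if len(words) >= 2]
        for _ in range(self.n_swaps):
            if not candidates:
                break
            words = self.rng.choice(candidates)
            i, j = self.rng.sample(range(len(words)), 2)
            words[i], words[j] = words[j], words[i]  # swaps in place
        return join_sentences([" ".join(words) for words in sentences])


class SketchCompose:
    """Chains transforms, feeding each one's "text" output into the next."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, *, text):
        for transform in self.transforms:
            text = transform(text=text)["text"]
        return {"text": text}


text = "어제 식당에 갔다. 목이 너무 말랐다. 먼저 물 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다."
rng = random.Random(0)  # seeded so repeated runs give the same result
pipeline = SketchCompose([
    SketchRandomDeletion(deletion_prob=0.1, min_words_per_sentence=0.8, rng=rng),
    SketchRandomSwap(n_swaps=1, rng=rng),
])
augmented = pipeline(text=text)["text"]
print(augmented)
```

Each transform takes `text` as a keyword argument and returns a dictionary with a `"text"` key, which is why the sketch's composition can chain transforms without knowing what each one does.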
List of augmentations
- AEDA
- BackTranslation
- ContextualInsertion
- ContextualReplacement
- IterativeMaskFilling
- RandomDeletion
- RandomDeletionSentence
- RandomInsertion
- RandomSwap
- RandomSwapSentence
- SynonymReplacement
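As an illustration of one technique in this list, the following is a minimal sketch of the idea behind AEDA as described in the paper listed under References: punctuation marks drawn from a fixed set are inserted at random positions, and the number of insertions is chosen between 1 and one third of the word count. The function name `aeda_sketch` and its `ratio` and `rng` parameters are hypothetical; this is not how the textmentations AEDA transform is implemented or configured.

```python
import random

# The six punctuation marks used in the AEDA paper.
PUNCTUATION = (".", ";", "?", ":", "!", ",")


def aeda_sketch(text, ratio=1 / 3, rng=None):
    """Insert random punctuation marks before randomly chosen words."""
    rng = rng if rng is not None else random.Random()
    words = text.split()
    if not words:
        return text
    # Number of insertions: between 1 and ratio * word count (at least 1).
    max_insertions = max(1, int(len(words) * ratio))
    n_insertions = rng.randint(1, max_insertions)
    # Distinct slots; slot i means "insert a mark before words[i]".
    slots = set(rng.sample(range(len(words)), n_insertions))
    out = []
    for i, word in enumerate(words):
        if i in slots:
            out.append(rng.choice(PUNCTUATION))
        out.append(word)
    return " ".join(out)


text = "어제 식당에 갔다. 목이 너무 말랐다. 먼저 물 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다."
result = aeda_sketch(text, rng=random.Random(0))
print(result)
```

The original words and their order are left untouched, so only standalone punctuation tokens are added.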
References
- AEDA: An Easier Data Augmentation Technique for Text Classification
- Conditional BERT Contextual Augmentation
- Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations
- EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
- Iterative Mask Filling: An Effective Text Augmentation Method Using Masked Language Modeling
- Korean Stopwords
- Korean WordNet
- albumentations
- kykim/albert-kor-base