A Python library for augmenting Korean text.
Textmentations
Textmentations is a Python library for augmenting Korean text. It is inspired by albumentations and uses albumentations as a dependency.
Installation
pip install textmentations
A simple example
Textmentations builds its text augmentation techniques on TextTransform, which inherits from albumentations' BasicTransform.
This lets textmentations reuse the existing functionality of albumentations.
import textmentations as T
text = "어제 식당에 갔다. 목이 너무 말랐다. 먼저 물 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다."
rd = T.RandomDeletion(deletion_prob=0.1, min_words_per_sentence=0.8)
ri = T.RandomInsertion(insertion_prob=0.2, n_times=1)
rs = T.RandomSwap(alpha=1)
sr = T.SynonymReplacement(replacement_prob=0.2)
eda = T.Compose([rd, ri, rs, sr])
print(rd(text=text)["text"])
# 식당에 갔다. 목이 너무 말랐다. 먼저 물 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다.
print(ri(text=text)["text"])
# 어제 최근 식당에 갔다. 목이 너무 말랐다. 먼저 물 한 잔을 마셨다 음료수. 그리고 탕수육을 맛있게 먹었다.
print(rs(text=text)["text"])
# 어제 갔다 식당에. 목이 너무 말랐다. 물 먼저 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다.
print(sr(text=text)["text"])
# 과거 식당에 갔다. 목이 너무 말랐다. 먼저 소주 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다.
print(eda(text=text)["text"])
# 식당에 어제 과거 갔다. 너무 말랐다. 먼저 상수 한 잔을 마셨다 맹물. 그리고 맛있게 먹었다.
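To make the calling convention in the example above more concrete, here is a minimal standard-library sketch of the pattern: a transform takes `text` as a keyword argument, applies itself with some probability, and returns a dict. The names `SketchTransform`, `SketchRandomSwap`, and `SketchCompose` and the parameter `p` are hypothetical and exist only for illustration; this is not how textmentations or albumentations implement their classes.

```python
import random


class SketchTransform:
    """Hypothetical base class: applies itself with probability p."""

    def __init__(self, p=1.0):
        self.p = p

    def apply(self, text):
        raise NotImplementedError

    def __call__(self, *, text):
        # Keyword-only input and dict output, mirroring rd(text=text)["text"].
        if random.random() < self.p:
            text = self.apply(text)
        return {"text": text}


class SketchRandomSwap(SketchTransform):
    """Swaps two randomly chosen words inside each sentence."""

    def apply(self, text):
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        if not sentences:
            return text
        swapped = []
        for sentence in sentences:
            words = sentence.split()
            if len(words) >= 2:
                i, j = random.sample(range(len(words)), 2)
                words[i], words[j] = words[j], words[i]
            swapped.append(" ".join(words))
        return ". ".join(swapped) + "."


class SketchCompose:
    """Chains transforms, passing the dict from one to the next."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, *, text):
        data = {"text": text}
        for transform in self.transforms:
            data = transform(**data)
        return data


pipeline = SketchCompose([SketchRandomSwap(p=1.0)])
print(pipeline(text="어제 식당에 갔다. 목이 너무 말랐다.")["text"])
```

Because every transform returns the same dict shape it accepts, a composed pipeline can be called exactly like a single transform.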
List of augmentations
- AEDA
- BackTranslation
- ContextualInsertion
- ContextualReplacement
- IterativeMaskFilling
- RandomDeletion
- RandomDeletionSentence
- RandomInsertion
- RandomSwap
- RandomSwapSentence
- SynonymReplacement
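As an illustration of one technique from this list, the following is a rough sketch of AEDA as the AEDA paper describes it: punctuation marks are inserted at random positions, with the number of insertions drawn between 1 and one third of the word count. The function name `aeda_sketch` and its `rng` parameter are assumptions made for this example, and whitespace tokenization is a simplification; the textmentations AEDA transform may behave differently.

```python
import random

# Punctuation marks used by AEDA according to the paper.
PUNCTUATION = [".", ";", "?", ":", "!", ","]


def aeda_sketch(text, rng=random):
    """Insert random punctuation marks between whitespace-separated words."""
    words = text.split()
    if not words:
        return text
    # Draw between 1 and one third of the word count (at least 1).
    max_insertions = max(1, len(words) // 3)
    n_insertions = rng.randint(1, max_insertions)
    # Pick distinct gaps: 0 is before the first word, len(words) after the last.
    positions = rng.sample(range(len(words) + 1), n_insertions)
    # Insert from the rightmost gap first so earlier indices stay valid.
    for position in sorted(positions, reverse=True):
        words.insert(position, rng.choice(PUNCTUATION))
    return " ".join(words)


print(aeda_sketch("어제 식당에 갔다 목이 너무 말랐다"))
```

The original words keep their order; only standalone punctuation tokens are added.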
References
- AEDA: An Easier Data Augmentation Technique for Text Classification
- Conditional BERT Contextual Augmentation
- Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations
- EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
- Iterative Mask Filling: An Effective Text Augmentation Method Using Masked Language Modeling
- Korean Stopwords
- Korean WordNet
- albumentations
- kykim/albert-kor-base
File details
Details for the file textmentations-1.4.0.tar.gz.
File metadata
- Download URL: textmentations-1.4.0.tar.gz
- Upload date:
- Size: 49.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 77b1816077c08cc2956f698d8a9825804f09b369a89e4ab50ca36e27839930b1 |
| MD5 | 932501f77344f0a0d755ea5fa3166b7a |
| BLAKE2b-256 | 2b94eb13efb0b226fab8d120c69ed4560b8d39100f30e7e49283e339d46d3a1d |
File details
Details for the file textmentations-1.4.0-py3-none-any.whl.
File metadata
- Download URL: textmentations-1.4.0-py3-none-any.whl
- Upload date:
- Size: 49.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0cde4484974ac184c88cc96cb17f304625c7357ff6bbb4d73c3cf341b156e11d |
| MD5 | a13a8d39b558ce80d8cadf886168fa24 |
| BLAKE2b-256 | 11f31953d24f57ebaf7997b65e81df8fbca4bfa6540bae88b8a6680d595f2749 |
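A downloaded file can be checked against the digests listed above. The sketch below uses only Python's `hashlib`; the helper name `file_digests` and the local file path are assumptions for illustration, and BLAKE2b-256 is computed here as BLAKE2b with a 32-byte digest.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return hex digests of a file for the three listed algorithms."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as handle:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

For example, `file_digests("textmentations-1.4.0.tar.gz")["SHA256"]` should equal the SHA256 value listed for that file, assuming the archive was saved under that name in the current directory.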