Textmentations
Textmentations is a Python library for text augmentation that is specialized for Korean. It is inspired by albumentations and uses albumentations as a dependency.
Installation
pip install textmentations
A simple example
Textmentations provides various text augmentation techniques, each implemented as a TextTransform, which inherits from the albumentations BasicTransform.
This lets Textmentations reuse existing albumentations functionality, such as Compose.
from albumentations import Compose
from textmentations import RandomDeletionWords, RandomDeletionSentences, RandomSwapWords, RandomSwapSentences

# Sample text, roughly: "In the morning I enjoyed jjajangmyeon. At lunch I enjoyed jjamppong. At dinner I enjoyed jjamjjamyeon."
text = "아침에는 짜장면을 맛있게 먹었다. 점심에는 짬뽕을 맛있게 먹었다. 저녁에는 짬짜면을 맛있게 먹었다."

# Individual transforms
dw = RandomDeletionWords(deletion_prob=0.5, min_words_each_sentence=1)
ds = RandomDeletionSentences(deletion_prob=0.5, min_sentences=2)
sw = RandomSwapWords()
ss = RandomSwapSentences()

# Chain several transforms with the albumentations Compose
mixed_transforms = Compose([sw, ss, dw, ds])

# Each transform takes text=... and returns a dict with a "text" key.
# The outputs below are samples; the transforms are random, so results vary between runs.
print(dw(text=text)["text"])
# 먹었다. 점심에는 맛있게 먹었다. 저녁에는 짬짜면을 맛있게 먹었다.
print(ds(text=text)["text"])
# 아침에는 짜장면을 맛있게 먹었다. 저녁에는 짬짜면을 맛있게 먹었다.
print(sw(text=text)["text"])
# 짜장면을 아침에는 맛있게 먹었다. 점심에는 짬뽕을 맛있게 먹었다. 저녁에는 짬짜면을 맛있게 먹었다.
print(ss(text=text)["text"])
# 아침에는 짜장면을 맛있게 먹었다. 저녁에는 짬짜면을 맛있게 먹었다. 점심에는 짬뽕을 맛있게 먹었다.
print(mixed_transforms(text=text)["text"])
# 저녁에는 먹었다 짬짜면을. 점심에는 짬뽕을.
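The example calls every transform with a `text=` keyword and reads the result from a dict. The following is a minimal pure-Python sketch of that calling convention, not the albumentations or Textmentations code. The helpers `swap_first_two_words`, `drop_last_sentence`, and `compose` are hypothetical and deterministic, so the chaining is easy to follow.

```python
from typing import Callable, Dict, List

# A transform takes text=... and returns a dict with a "text" key.
Transform = Callable[..., Dict[str, str]]


def swap_first_two_words(text: str) -> Dict[str, str]:
    """Hypothetical transform: swap the first two whitespace-separated words."""
    words = text.split()
    if len(words) >= 2:
        words[0], words[1] = words[1], words[0]
    return {"text": " ".join(words)}


def drop_last_sentence(text: str) -> Dict[str, str]:
    """Hypothetical transform: drop the final sentence if more than one exists."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if len(sentences) > 1:
        sentences = sentences[:-1]
    return {"text": ". ".join(sentences) + "."}


def compose(transforms: List[Transform]) -> Transform:
    """Hypothetical stand-in for Compose: feed each output dict to the next transform."""
    def pipeline(text: str) -> Dict[str, str]:
        data = {"text": text}
        for transform in transforms:
            data = transform(**data)
        return data
    return pipeline


pipeline = compose([swap_first_two_words, drop_last_sentence])
result = pipeline(text="아침에는 짜장면을 먹었다. 점심에는 짬뽕을 먹었다.")["text"]
print(result)
# 짜장면을 아침에는 먹었다.
```

Because each step returns the same dict shape it accepts, single transforms and composed pipelines can be called in the same way.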
List of augmentations
RandomDeletionWords
RandomDeletionSentences
RandomSwapWords
RandomSwapSentences
SynonymsReplacement
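To make parameters such as deletion_prob and min_words_each_sentence more concrete, here is a rough pure-Python sketch of what random word deletion could look like. It is an assumption about the general technique, not the Textmentations implementation. The function name `random_deletion_words`, the `seed` parameter, and the choice to split sentences on "." are all hypothetical.

```python
import random
from typing import List, Optional


def random_deletion_words(
    text: str,
    deletion_prob: float = 0.5,
    min_words_each_sentence: int = 1,
    seed: Optional[int] = None,
) -> str:
    """Delete each word with probability deletion_prob, sentence by sentence."""
    rng = random.Random(seed)  # local RNG so a seed gives repeatable results
    augmented: List[str] = []
    for sentence in (s.strip() for s in text.split(".")):
        if not sentence:
            continue
        words = sentence.split()
        kept = [w for w in words if rng.random() >= deletion_prob]
        # If too few words survived, keep a random subset in original order instead.
        if len(kept) < min_words_each_sentence:
            n = min(min_words_each_sentence, len(words))
            indices = sorted(rng.sample(range(len(words)), n))
            kept = [words[i] for i in indices]
        if kept:
            augmented.append(" ".join(kept))
    return ". ".join(augmented) + "." if augmented else ""


sample = "아침에는 짜장면을 맛있게 먹었다. 점심에는 짬뽕을 맛있게 먹었다. 저녁에는 짬짜면을 맛있게 먹었다."
unchanged = random_deletion_words(sample, deletion_prob=0.0)
shortened = random_deletion_words(sample, deletion_prob=0.5, seed=0)
print(unchanged == sample)
print(shortened)
```

With deletion_prob=0.0 nothing is deleted, and with any higher probability each sentence still keeps at least min_words_each_sentence words.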