
A Python library for text augmentation that is specialized for Korean.


Textmentations

Textmentations is a Python library for text augmentation that is specialized for Korean. It is inspired by albumentations and uses albumentations as a dependency.

Installation

pip install textmentations

A simple example

Textmentations implements its text augmentation techniques as subclasses of TextTransform, which inherits from the albumentations BasicTransform.

This lets textmentations reuse existing albumentations functionality, such as chaining transforms with Compose.

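from albumentations import Compose
from textmentations import RandomDeletionWords, RandomDeletionSentences, RandomSwapWords, RandomSwapSentences

# Meaning: "In the morning I ate jajangmyeon deliciously. At lunch I ate jjamppong deliciously. In the evening I ate jjamjjamyeon deliciously."
text = "아침에는 짜장면을 맛있게 먹었다. 점심에는 짬뽕을 맛있게 먹었다. 저녁에는 짬짜면을 맛있게 먹었다."
dw = RandomDeletionWords(deletion_prob=0.5, min_words_each_sentence=1)  # randomly delete words, keeping at least one word per sentence
ds = RandomDeletionSentences(deletion_prob=0.5, min_sentences=2)  # randomly delete sentences, keeping at least two sentences
sw = RandomSwapWords()  # randomly swap words
ss = RandomSwapSentences()  # randomly swap sentences
mixed_transforms = Compose([sw, ss, dw, ds])  # chain all four transforms with albumentations' Compose

# The transforms are random, so the outputs below show one possible run.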

print(dw(text=text)["text"])
# 먹었다. 점심에는 맛있게 먹었다. 저녁에는 짬짜면을 맛있게 먹었다.

print(ds(text=text)["text"])
# 아침에는 짜장면을 맛있게 먹었다. 저녁에는 짬짜면을 맛있게 먹었다.

print(sw(text=text)["text"])
# 짜장면을 아침에는 맛있게 먹었다. 점심에는 짬뽕을 맛있게 먹었다. 저녁에는 짬짜면을 맛있게 먹었다.

print(ss(text=text)["text"])
# 아침에는 짜장면을 맛있게 먹었다. 저녁에는 짬짜면을 맛있게 먹었다. 점심에는 짬뽕을 맛있게 먹었다.

print(mixed_transforms(text=text)["text"])
# 저녁에는 먹었다 짬짜면을. 점심에는 짬뽕을.
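The example above relies on the albumentations calling convention: a transform is called with keyword arguments (here `text=...`) and returns a dict, from which the augmented string is read via `["text"]`, and Compose runs transforms in sequence. The following is a minimal, dependency-free sketch of that convention only. `SketchTextTransform`, `ReverseSentences` and `SketchCompose` are hypothetical names, and this is not the actual albumentations or textmentations code, which does considerably more. The per-call probability `p` mirrors the `p` argument of albumentations transforms.

```python
import random
from typing import Callable, Dict, List


class SketchTextTransform:
    """Hypothetical stand-in for an albumentations-style transform (not the real TextTransform)."""

    def __init__(self, p: float = 1.0) -> None:
        self.p = p  # probability of applying the transform on each call

    def apply(self, text: str) -> str:
        raise NotImplementedError

    def __call__(self, **data: str) -> Dict[str, str]:
        # Keyword arguments in, dict out: the augmented string is read back via ["text"].
        result = dict(data)
        if random.random() < self.p:
            result["text"] = self.apply(data["text"])
        return result


class ReverseSentences(SketchTextTransform):
    """Hypothetical deterministic transform: reverses sentence order (splits on '.')."""

    def apply(self, text: str) -> str:
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        return " ".join(s + "." for s in reversed(sentences))


class SketchCompose:
    """Hypothetical stand-in for albumentations.Compose: runs transforms in order."""

    def __init__(self, transforms: List[Callable[..., Dict[str, str]]]) -> None:
        self.transforms = transforms

    def __call__(self, **data: str) -> Dict[str, str]:
        for transform in self.transforms:
            data = transform(**data)
        return data


text = "아침에는 짜장면을 맛있게 먹었다. 점심에는 짬뽕을 맛있게 먹었다."
print(ReverseSentences()(text=text)["text"])
# 점심에는 짬뽕을 맛있게 먹었다. 아침에는 짜장면을 맛있게 먹었다.
print(SketchCompose([ReverseSentences(), ReverseSentences()])(text=text)["text"])
# 아침에는 짜장면을 맛있게 먹었다. 점심에는 짬뽕을 맛있게 먹었다.
```

Because every transform takes and returns the same dict shape, any sequence of them can be chained without glue code, which is what makes reusing Compose possible.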

List of augmentations

  • RandomDeletionWords
  • RandomDeletionSentences
  • RandomSwapWords
  • RandomSwapSentences
  • SynonymsReplacement
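The four random operations in the list can be approximated with plain string handling. The sketch below is one plausible reading of them, inferred from their names, their parameters and the example outputs above; it is not textmentations' implementation. Assumptions: sentences end with "." and words are separated by whitespace, a seeded `random.Random` is passed in explicitly, and the fallback used when too many items are deleted (keeping a random subset of the minimum size) is a hypothetical choice. All function names are hypothetical. SynonymsReplacement is not sketched because it needs a synonym resource.

```python
import random
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): "." is the only sentence terminator.
    return [s.strip() for s in text.split(".") if s.strip()]


def join_sentences(sentences: List[str]) -> str:
    return " ".join(s + "." for s in sentences)


def random_keep(items: List[str], deletion_prob: float, minimum: int, rng: random.Random) -> List[str]:
    """Drop each item with probability deletion_prob, but keep at least `minimum` items."""
    kept = [item for item in items if rng.random() >= deletion_prob]
    if len(kept) < minimum:
        # Hypothetical fallback: keep a random subset of the minimum size, in original order.
        idx = sorted(rng.sample(range(len(items)), min(minimum, len(items))))
        kept = [items[i] for i in idx]
    return kept


def delete_words(text: str, deletion_prob: float, min_words_each_sentence: int, rng: random.Random) -> str:
    sentences = [random_keep(s.split(), deletion_prob, min_words_each_sentence, rng) for s in split_sentences(text)]
    return join_sentences([" ".join(words) for words in sentences if words])


def delete_sentences(text: str, deletion_prob: float, min_sentences: int, rng: random.Random) -> str:
    return join_sentences(random_keep(split_sentences(text), deletion_prob, min_sentences, rng))


def swap_words(text: str, rng: random.Random) -> str:
    """Swap two random words inside one randomly chosen sentence."""
    sentences = [s.split() for s in split_sentences(text)]
    candidates = [i for i, words in enumerate(sentences) if len(words) >= 2]
    if candidates:
        target = sentences[rng.choice(candidates)]  # mutated in place below
        i, j = rng.sample(range(len(target)), 2)
        target[i], target[j] = target[j], target[i]
    return join_sentences([" ".join(words) for words in sentences])


def swap_sentences(text: str, rng: random.Random) -> str:
    """Swap the positions of two random sentences."""
    sentences = split_sentences(text)
    if len(sentences) >= 2:
        i, j = rng.sample(range(len(sentences)), 2)
        sentences[i], sentences[j] = sentences[j], sentences[i]
    return join_sentences(sentences)


rng = random.Random(0)
text = "아침에는 짜장면을 맛있게 먹었다. 점심에는 짬뽕을 맛있게 먹었다. 저녁에는 짬짜면을 맛있게 먹었다."
print(delete_words(text, deletion_prob=0.5, min_words_each_sentence=1, rng=rng))
print(delete_sentences(text, deletion_prob=0.5, min_sentences=2, rng=rng))
print(swap_words(text, rng))
print(swap_sentences(text, rng))
```

Passing an explicit `random.Random` instance keeps runs reproducible without touching the global random state.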

Download files

Source distribution: textmentations-0.0.1.tar.gz (176.7 kB)

Built distribution: textmentations-0.0.1-py3-none-any.whl (181.9 kB, Python 3)
